Prerequisites and setup

To complete the lineage use case tutorials, perform the following setup steps:

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.


  3. Enable the BigQuery, Data lineage, Dataform, BigQuery Data Transfer, and Secret Manager APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.


    For new projects, the BigQuery API is automatically enabled.
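
If you prefer the command line to the console, the API enablement described in the last setup step can also be scripted. The following Python sketch only builds the `gcloud services enable` command for the five APIs and prints it. It assumes that the gcloud CLI is installed and authenticated. The function name `enable_apis_command` and the project ID `my-lineage-project` are illustrative placeholders, not values from the tutorials.

```python
import shlex

# Service names for the five APIs named in the setup steps.
REQUIRED_APIS = [
    "bigquery.googleapis.com",
    "datalineage.googleapis.com",
    "dataform.googleapis.com",
    "bigquerydatatransfer.googleapis.com",
    "secretmanager.googleapis.com",
]


def enable_apis_command(project_id):
    """Build the gcloud argument list that enables the required APIs.

    The command is only constructed here, not executed.
    """
    if not project_id:
        raise ValueError("project_id must not be empty")
    return ["gcloud", "services", "enable", *REQUIRED_APIS, f"--project={project_id}"]


# "my-lineage-project" is a placeholder; substitute your own project ID.
print(shlex.join(enable_apis_command("my-lineage-project")))
```

To run the command instead of printing it, you could pass the returned list to `subprocess.run(..., check=True)`. The caller still needs the Service Usage Admin role described above.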

Required roles

To get the permissions that you need to complete the lineage use case tutorials, ask your administrator to grant you the following IAM roles on your projects:

  • Data Lineage Viewer (roles/datalineage.viewer): on the project where lineage is recorded and viewed.
  • BigQuery Data Viewer (roles/bigquery.dataViewer): on the table's storage project.
  • BigQuery Resource Viewer (roles/bigquery.resourceViewer): on the job's compute project.
  • Dataplex Catalog Viewer (roles/dataplex.catalogViewer): on the project where catalog entries are stored.
  • Dataform Editor (roles/dataform.editor): on the project where your workspaces and repositories are located.

The following list describes the project types and services associated with the required roles:

  • The storage project stores the BigQuery datasets and tables.
  • The compute project runs BigQuery jobs and data transformations. It processes your data and stores the lineage metadata.
  • Catalog entries contain metadata that describes your tables, allowing you to find and organize them without accessing the underlying data.
  • The lineage project records and visualizes the history of your data and its transformations.
  • Dataform is a service used to build, version control, and run SQL-based data pipelines. It transforms raw data into clean, documented datasets.

For more information about granting roles, see Manage access to projects, folders, and organizations. You might also be able to get the required permissions through custom roles or other predefined roles.
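
An administrator who grants these roles from the command line might script the assignments. The following Python sketch maps each role from the list above to the kind of project it applies to and builds one `gcloud projects add-iam-policy-binding` command per role. The mapping keys (`lineage`, `storage`, `compute`, `catalog`, `dataform`), the function name, the member, and the project IDs are all hypothetical. A single project can fill several of these positions.

```python
import shlex

# Role -> kind of project it is granted on, following the role list above.
# The kind names are labels invented for this sketch.
ROLE_TARGETS = {
    "roles/datalineage.viewer": "lineage",
    "roles/bigquery.dataViewer": "storage",
    "roles/bigquery.resourceViewer": "compute",
    "roles/dataplex.catalogViewer": "catalog",
    "roles/dataform.editor": "dataform",
}


def grant_role_commands(member, projects):
    """Build one add-iam-policy-binding command per required role.

    member   -- IAM member string, for example "user:alex@example.com"
    projects -- dict that maps each project kind to a project ID
    """
    commands = []
    for role, kind in ROLE_TARGETS.items():
        commands.append([
            "gcloud", "projects", "add-iam-policy-binding", projects[kind],
            f"--member={member}",
            f"--role={role}",
        ])
    return commands


# Placeholder values: here one project fills every position.
example_projects = {kind: "my-lineage-project" for kind in ROLE_TARGETS.values()}
for command in grant_role_commands("user:alex@example.com", example_projects):
    print(shlex.join(command))
```

The sketch only prints the commands so that they can be reviewed before anyone runs them.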

Get started

To complete the tutorials, use the Data lineage use cases repository. This repository contains predefined code to set up datasets and run data transformations.

Datasets overview

Each tutorial uses a different real-world dataset, such as healthcare provider, employment, or business data, to demonstrate data lineage in realistic scenarios.

Set up datasets

To track data changes with data lineage, perform this one-time setup:

  1. Create a personal access token and store it in Secret Manager.
  2. Link the repository to Dataform.

After setup, run the data transformations to process the data and generate lineage.
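
As one possible way to carry out the first setup step from the command line, the following Python sketch builds the two gcloud commands that create a secret and add the token as its first version. It assumes that the gcloud CLI is installed and authenticated. The function name, the secret name `lineage-repo-token`, and the project ID are hypothetical, and the tutorials might prescribe a different procedure.

```python
import shlex


def store_token_commands(project_id, secret_id):
    """Build the commands that create a secret and add a version to it.

    The second command uses --data-file=- so that the token is read from
    standard input and never appears in the argument list.
    """
    create = [
        "gcloud", "secrets", "create", secret_id,
        "--replication-policy=automatic",
        f"--project={project_id}",
    ]
    add_version = [
        "gcloud", "secrets", "versions", "add", secret_id,
        "--data-file=-",
        f"--project={project_id}",
    ]
    return create, add_version


# Placeholder project ID and secret name.
create_cmd, add_cmd = store_token_commands("my-lineage-project", "lineage-repo-token")
print(shlex.join(create_cmd))
print(shlex.join(add_cmd))
```

To run the second command, you could call `subprocess.run(add_cmd, input=token_bytes, check=True)`, where `token_bytes` holds the personal access token. This keeps the token out of your shell history.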