Connect a Git repository and run a workflow
This quickstart walks you through creating a Dataform repository, connecting it to an existing third-party Git repository, and running a workflow. You perform the following tasks using either the Google Cloud console or the Dataform API:
- Create a Dataform repository.
- Connect the repository to the dataform-co/dataform-example-project-bigquery GitHub repository.
- Create and initialize a development workspace.
- Add a new view to the project.
- Compile the project and execute the workflow in BigQuery.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the BigQuery, Dataform, and Secret Manager APIs.
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Required roles
To get the permissions that you need to perform all the tasks in this quickstart, ask your administrator to grant you the following IAM roles:
- Dataform Admin (roles/dataform.admin) on the project or repository
- BigQuery Data Editor (roles/bigquery.dataEditor) on the project or specific datasets
- BigQuery Job User (roles/bigquery.jobUser) on the project
- Service Account User (roles/iam.serviceAccountUser) on the custom service account
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
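As a rough sketch of how an administrator might grant these roles with the gcloud CLI, the following shell function prints the commands instead of running them. The function name, the principal, and the service account email are hypothetical placeholders, and the sketch assumes that the first three roles are granted at the project level rather than on a repository or dataset.

```shell
# Print (without running) gcloud commands that grant the quickstart roles.
# $1: project ID
# $2: principal, for example user:alex@example.com (placeholder)
# $3: email of the custom service account (placeholder)
print_role_grants() {
  for role in roles/dataform.admin roles/bigquery.dataEditor roles/bigquery.jobUser; do
    printf 'gcloud projects add-iam-policy-binding %s --member="%s" --role="%s"\n' \
      "$1" "$2" "$role"
  done
  # The Service Account User role is granted on the service account itself.
  printf 'gcloud iam service-accounts add-iam-policy-binding %s --member="%s" --role="%s"\n' \
    "$3" "$2" roles/iam.serviceAccountUser
}
```

Review the printed commands before you run any of them, because role bindings at the project level apply to every resource in the project.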
Create a Dataform repository
A repository is the main container for your Dataform project. Select one of the following options:
Console
Go to the BigQuery Dataform page.
Click Create repository.
On the Create repository page, do the following:
- In the Repository ID field, enter quickstart-repo.
- In the Region list, select a region, for example europe-west4.
- In the Service account list, select a custom service account for the repository.
- Click Create.
- Click Go to repositories.
You have successfully created a Dataform repository. Next, you can connect the Dataform repository to a remote Git repository.
API
To create a repository, use the projects.locations.repositories.create method.
Run the API request with the following information:
- Endpoint: POST https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories
- Query parameter: repositoryId=REPOSITORY_ID
Alternatively, in your terminal, run the following curl command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"serviceAccount": "SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com"}' \
"https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories?repositoryId=REPOSITORY_ID"
Replace the following:
- SERVICE_ACCOUNT_NAME: the ID of the custom service account created to run BigQuery jobs.
- PROJECT_ID: the unique identifier of the Google Cloud project where you want to create the Dataform repository.
- LOCATION: the Google Cloud region where you want to create the repository, for example europe-west4.
- REPOSITORY_ID: the unique identifier for your new Dataform repository, for example quickstart-repo.
You have successfully created a Dataform repository. Next, you can connect the Dataform repository to a remote Git repository.
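Because the same placeholders recur in most requests on this page, it can help to build the URL and body from parameters once. The following is a minimal sketch, not part of the Dataform tooling; the function names are hypothetical, and the values passed to them are your own project details.

```shell
# Base URL of the Dataform API v1, as used by the requests on this page.
DATAFORM_API="https://dataform.googleapis.com/v1"

# Print the URL for the repositories.create request.
# $1: project ID, $2: location, $3: repository ID
create_repository_url() {
  printf '%s/projects/%s/locations/%s/repositories?repositoryId=%s' \
    "$DATAFORM_API" "$1" "$2" "$3"
}

# Print the JSON body that sets the repository's custom service account.
# $1: service account name, $2: project ID
create_repository_body() {
  printf '{"serviceAccount": "%s@%s.iam.gserviceaccount.com"}' "$1" "$2"
}
```

For example, you might pass "$(create_repository_body my-sa my-project)" to the -d option of curl and "$(create_repository_url my-project europe-west4 quickstart-repo)" as the request URL.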
Connect to the Git repository
To connect your Dataform repository to the remote Git repository, select one of the following options:
Console
Go to the Secret Manager page.
Click Create secret.
In the Name field, enter dataform-git-token.
In the Secret value field, enter your GitHub personal access token (PAT).
For instructions on how to create a PAT, see Managing your personal access tokens.
We recommend setting an expiration date for your token according to your organization's security policies.
Click Create secret.
On the secret details page, click the Permissions tab, and then click Grant access.
In the New principals field, enter your Dataform service agent: service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com.
Replace PROJECT_NUMBER with your Google Cloud project number. For details on finding your project number, see Find the project name, number, and ID.
In the Select a role field, select Secret Manager > Secret Manager Secret Accessor.
Click Save.
In the Google Cloud console, go to the Dataform page.
Click quickstart-repo.
On the repository page, click Settings > Connect with Git.
In the Link to remote repository pane, select HTTPS.
In the Remote Git repository URL field, enter https://github.com/dataform-co/dataform-example-project-bigquery.git.
In the Default remote branch name field, enter master.
In the Secret menu, select dataform-git-token.
Click Link.
You have successfully connected your Dataform repository to a remote Git repository and granted the necessary permissions. Next, you can create and initialize a development workspace.
API
To store your Git personal access token, create a secret in Secret Manager with the projects.secrets.create method. Run the API request with the following information:
- Endpoint: POST https://secretmanager.googleapis.com/v1/projects/PROJECT_ID/secrets
- Query parameter: secretId=dataform-git-token
- Body:
{ "replication": { "automatic": {} } }
Alternatively, in your terminal, run the following curl command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{ "replication": { "automatic": {} } }' \
"https://secretmanager.googleapis.com/v1/projects/PROJECT_ID/secrets?secretId=dataform-git-token"
Add a version to the secret containing your GitHub personal access token (PAT). For instructions on how to create a PAT, see Managing your personal access tokens. We recommend setting an expiration date for your token according to your organization's security policies.
To add a secret version, use the projects.secrets.addVersion method. Run the API request with the following information:
- Endpoint: POST https://secretmanager.googleapis.com/v1/projects/PROJECT_ID/secrets/dataform-git-token:addVersion
- Body:
{ "payload": { "data": "GITHUB_PAT" } }
In the request body, the data field must contain the PAT as a Base64-encoded string.
Alternatively, in your terminal, run the following curl command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d "{ \"payload\": { \"data\": \"$(printf '%s' 'GITHUB_PAT' | base64 | tr -d '\n')\" } }" \
"https://secretmanager.googleapis.com/v1/projects/PROJECT_ID/secrets/dataform-git-token:addVersion"
Replace GITHUB_PAT with your GitHub personal access token. The command substitution converts your PAT to a Base64-encoded string before curl sends the request. The request body is enclosed in double quotes because the shell doesn't expand $(...) inside single quotes.
To let Dataform access the secret, grant the Secret Manager Secret Accessor role (roles/secretmanager.secretAccessor) to the Dataform service agent. To grant the role, select one of the following options:
gcloud
Run the gcloud secrets add-iam-policy-binding command:
gcloud secrets add-iam-policy-binding dataform-git-token \
--member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
Replace PROJECT_NUMBER with your Google Cloud project number. For details on finding your project number, see Find the project name, number, and ID.
Secret Manager API
Use the projects.secrets.setIamPolicy method. Run the API request with the following information:
- Endpoint: POST https://secretmanager.googleapis.com/v1/projects/PROJECT_ID/secrets/dataform-git-token:setIamPolicy
- Body:
{
"policy": {
"bindings": [
{
"role": "roles/secretmanager.secretAccessor",
"members": [ "serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com" ]
}
]
}
}
Alternatively, in your terminal, run the following curl command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{
"policy": {
"bindings": [
{
"role": "roles/secretmanager.secretAccessor",
"members": [ "serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com" ]
}
]
}
}' \
"https://secretmanager.googleapis.com/v1/projects/PROJECT_ID/secrets/dataform-git-token:setIamPolicy"
Replace PROJECT_NUMBER with your Google Cloud project number. For details on finding your project number, see Find the project name, number, and ID.
To connect your repository to a remote Git repository, use the projects.locations.repositories.patch method. Run the API request with the following information:
- Endpoint: PATCH https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID
- Query parameter: updateMask=gitRemoteSettings
Alternatively, in your terminal, run the following curl command:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{
"gitRemoteSettings": {
"url": "https://github.com/dataform-co/dataform-example-project-bigquery.git",
"defaultBranch": "master",
"authenticationTokenSecretVersion": "projects/PROJECT_ID/secrets/dataform-git-token/versions/1"
}
}' \
"https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID?updateMask=gitRemoteSettings"
You have successfully connected your Dataform repository to a remote Git repository and granted the necessary permissions. Next, you can create and initialize a development workspace.
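The secret version body and the service agent address in the API steps follow fixed patterns, so you can generate them instead of editing them by hand. The following is a minimal sketch under that assumption; both function names are hypothetical, and the token and project number that you pass in are your own values.

```shell
# Print the addVersion request body for a token.
# $1: the personal access token
secret_version_body() {
  # printf '%s' avoids a trailing newline in the encoded value;
  # tr removes any line wrapping that base64 adds to long input.
  printf '{"payload":{"data":"%s"}}' "$(printf '%s' "$1" | base64 | tr -d '\n')"
}

# Print the Dataform service agent email.
# $1: project number, which you can look up with:
#   gcloud projects describe PROJECT_ID --format="value(projectNumber)"
dataform_service_agent() {
  printf 'service-%s@gcp-sa-dataform.iam.gserviceaccount.com' "$1"
}
```

Because the body is produced by a function call, you can pass it to curl as -d "$(secret_version_body "$TOKEN")", where TOKEN is a hypothetical variable that holds your PAT. This avoids placing a command substitution inside single quotes, where the shell would not expand it.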
Create and initialize a development workspace
A workspace is an isolated development environment. To create and initialize a workspace, select one of the following options:
Console
Go to the BigQuery Dataform page.
Click quickstart-repo.
In your repository, go to the Development Workspaces tab.
Click Create development workspace.
In the Workspace ID field, enter dev-workspace.
Click Create.
On the Development Workspaces tab, select the dev-workspace workspace.
You have successfully created and initialized a development workspace. Next, you can configure the workflow settings.
API
To create a workspace, use the projects.locations.repositories.workspaces.create method.
Run the API request with the following information:
- Endpoint: POST https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/workspaces
- Query parameter: workspaceId=WORKSPACE_ID
Alternatively, in your terminal, run the following curl command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d "{}" \
"https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/workspaces?workspaceId=WORKSPACE_ID"
Replace WORKSPACE_ID with the unique identifier for
your new Dataform development workspace—for example,
dev-workspace.
You have successfully created and initialized a development workspace. Next, you can configure the workflow settings.
Configure workflow settings
In this section, you update the project ID in the workflow_settings.yaml
file to ensure that Dataform executes the workflow in your
Google Cloud project. To configure the workflow settings, select one of the
following options:
Console
Go to the BigQuery Dataform page.
Click quickstart-repo.
In your repository, go to the Development Workspaces tab, and then click dev-workspace.
In the Files pane, select workflow_settings.yaml.
In the file, replace the value of defaultProject with your project ID.
The file is automatically saved.
You have successfully updated your workflow settings. Next, you can add a new source declaration to your project.
API
Create a local file named workflow_settings.yaml and paste the following configuration into the file:
defaultProject: PROJECT_ID
defaultDataset: dataform
dataformCoreVersion: CORE_VERSION
Replace CORE_VERSION with the latest stable (non-beta) version of Dataform core, for example 3.0.43. You can find the latest version listed in Releases.
In your terminal, encode the file content into a Base64 string:
base64 -w 0 workflow_settings.yaml
Copy the resulting output string to use in the SETTINGS_DEFINITION placeholder should you decide to use the alternative curl command later in these steps.
To update your workflow settings, use the projects.locations.repositories.workspaces.writeFile method. Run the API request with the following information:
- Endpoint: POST https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/workspaces/WORKSPACE_ID:writeFile
Alternatively, in your terminal, run the following curl command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{
"path": "workflow_settings.yaml",
"contents": "SETTINGS_DEFINITION"
}' \
"https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/workspaces/WORKSPACE_ID:writeFile"
Replace SETTINGS_DEFINITION with the YAML file's content as a Base64-encoded string.
You have successfully updated your workflow settings. Next, you can add a new source declaration to your project.
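If you use the API, you can avoid copying the Base64 string by hand. The following is a minimal sketch of a helper that builds a writeFile request body directly from a local file; the function name is hypothetical, and the sketch assumes that the workspace path contains no characters that need JSON escaping. It pipes the output of base64 through tr instead of using the -w option, because not every base64 implementation supports -w.

```shell
# Print a writeFile request body for a local file.
# $1: path of the file inside the workspace, for example workflow_settings.yaml
# $2: path of the local file to upload
write_file_body() {
  # Base64 output uses only JSON-safe characters, so it needs no escaping.
  printf '{"path":"%s","contents":"%s"}' "$1" "$(base64 < "$2" | tr -d '\n')"
}
```

You might then pass "$(write_file_body workflow_settings.yaml ./workflow_settings.yaml)" to the -d option of the curl command. The same helper applies to the tags.sqlx and top_question_tags.sqlx files in the following sections.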
Create a source
In this section, you add a new SQLX source declaration to your project. The declaration points to an existing BigQuery table so that Dataform can reference it as a data source in your workflow. To create the new source, select one of the following options:
Console
Go to the BigQuery Dataform page.
Click quickstart-repo.
In your repository, go to the Development Workspaces tab, and then click dev-workspace.
In the Files pane, select the definitions folder.
Click More file actions > Create file.
In the Add a file path field, enter definitions/sources/tags.sqlx.
Click Create file.
In the SQL editor for the new definitions/sources/tags.sqlx file, paste the following code:
config {
  type: "declaration",
  database: "bigquery-public-data",
  schema: "stackoverflow",
  name: "tags"
}
You have successfully created a source declaration. Next, you can add a new view to your project.
API
Create a local file named tags.sqlx.
Paste the following code into the tags.sqlx file:
config {
  type: "declaration",
  database: "bigquery-public-data",
  schema: "stackoverflow",
  name: "tags"
}
In your terminal, encode the file content into a single continuous string:
base64 -w 0 tags.sqlx
Copy the resulting output string to use in the SOURCE_DEFINITION placeholder should you decide to use the alternative curl command later in these steps.
To create a source declaration file in your workspace, use the projects.locations.repositories.workspaces.writeFile method. Run the API request with the following information:
- Endpoint: POST https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/workspaces/WORKSPACE_ID:writeFile
Alternatively, in your terminal, run the following curl command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{
"path": "definitions/sources/tags.sqlx",
"contents": "SOURCE_DEFINITION"
}' \
"https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/workspaces/WORKSPACE_ID:writeFile"
Replace SOURCE_DEFINITION with the SQLX file's content as a Base64-encoded string.
You have successfully created a source declaration. Next, you can add a new view to your project.
Create a view
In this section, you add a new SQLX file to your project that defines a view. To create the new view, select one of the following options:
Console
Go to the BigQuery Dataform page.
Click quickstart-repo.
In your repository, go to the Development Workspaces tab, and then click dev-workspace.
In the Files pane, select the definitions folder.
Click More file actions > Create file.
In the Add a file path field, enter definitions/top_question_tags.sqlx.
Click Create file.
In the SQL editor for the new definitions/top_question_tags.sqlx file, paste the following code:
config {
  type: "view",
  name: "top_question_tags",
  tags: ["daily"],
  schema: "reporting",
}

select
  tag_name,
  count
from ${ref("tags")}
order by count desc
limit 100
You have successfully created a view. Next, you can compile your project.
API
Create a local file named top_question_tags.sqlx.
Paste the following code into the top_question_tags.sqlx file:
config {
  type: "view",
  name: "top_question_tags",
  tags: ["daily"],
  schema: "reporting",
}

select
  tag_name,
  count
from ${ref("tags")}
order by count desc
limit 100
In your terminal, encode the file content into a single continuous string:
base64 -w 0 top_question_tags.sqlx
Copy the resulting output string to use in the VIEW_DEFINITION placeholder should you decide to use the alternative curl command later in these steps.
To create a view definition file in your workspace, use the projects.locations.repositories.workspaces.writeFile method. Run the API request with the following information:
- Endpoint: POST https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/workspaces/WORKSPACE_ID:writeFile
Alternatively, in your terminal, run the following curl command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{
"path": "definitions/top_question_tags.sqlx",
"contents": "VIEW_DEFINITION"
}' \
"https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/workspaces/WORKSPACE_ID:writeFile"
Replace VIEW_DEFINITION with the SQLX file's content as a Base64-encoded string.
You have successfully created a view. Next, you can compile your project.
Compile the project
Compilation converts SQLX files into a pure SQL execution graph. To compile the project, select one of the following options:
Console
The Google Cloud console compiles your project automatically. You can verify the compilation in the Compiled graph tab in your workspace.
You have successfully compiled your project and verified the execution graph. Next, you can execute your workflow in BigQuery.
API
To create a compilation result based on your workspace, use the projects.locations.repositories.compilationResults.create method. Run the API request with the following information:
- Endpoint: POST https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/compilationResults
Alternatively, in your terminal, run the following curl command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{
"workspace": "projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/workspaces/WORKSPACE_ID"
}' \
"https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/compilationResults"
To verify that your project compiled successfully, use the projects.locations.repositories.compilationResults.get method. Run the API request with the following information:
- Endpoint: GET https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/compilationResults/COMPILATION_ID
Alternatively, in your terminal, run the following curl command:
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/compilationResults/COMPILATION_ID"
Replace COMPILATION_ID with the unique identifier for your compilation result. This ID is provided in the response of the compilation request in the previous step.
In the response, check the compilationErrors field. If the list is empty, your project compiled successfully.
You have successfully compiled your project and verified the execution graph. Next, you can execute your workflow in BigQuery.
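If you script the API path, you might want to stop before running the workflow when compilation fails. The following is a rough sketch of such a check. It uses plain text matching rather than a JSON parser, the function name is hypothetical, and it assumes that a successful response either omits the compilationErrors field or returns it as an empty array.

```shell
# Succeed (exit status 0) when the JSON on stdin contains a non-empty
# compilationErrors array. This is text matching, not JSON parsing.
has_compilation_errors() {
  # Strip whitespace so that "[ ]" and "[]" are treated the same,
  # then look for an opening bracket followed by anything but "]".
  tr -d ' \t\n' | grep -q '"compilationErrors":\[[^]]'
}
```

For example, you could pipe the output of the compilationResults.get request into has_compilation_errors and exit your script when the function succeeds. A JSON-aware tool is more robust if one is available in your environment.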
Run the workflow
To trigger the execution of your workflow in BigQuery, select one of the following options:
Console
Go to the BigQuery Dataform page.
Click quickstart-repo.
In your repository, go to the Development Workspaces tab, and then click dev-workspace.
In the toolbar, click Start Execution > Execute actions.
Select All actions.
Click Start execution.
You have successfully run your workflow.
API
To trigger a workflow invocation, use the projects.locations.repositories.workflowInvocations.create method.
Run the API request with the following information:
- Endpoint: POST https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/workflowInvocations
Alternatively, in your terminal, run the following curl command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{
"compilationResult": "projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/compilationResults/COMPILATION_ID"
}' \
"https://dataform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/repositories/REPOSITORY_ID/workflowInvocations"
You have successfully run your workflow.
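The workflow invocation request refers to the compilation result by its full resource name. The following is a minimal sketch of two helpers for that step. The function names are hypothetical, and the sketch assumes that the compilation response reports the resource name in the form projects/.../compilationResults/COMPILATION_ID.

```shell
# Print the last path segment of a resource name, for example the
# COMPILATION_ID at the end of a compilation result name.
# $1: full resource name
resource_id() {
  printf '%s' "${1##*/}"
}

# Print the workflowInvocations.create request body.
# $1: full resource name of the compilation result
workflow_invocation_body() {
  printf '{"compilationResult":"%s"}' "$1"
}
```

With these helpers, you can reuse the resource name from the compilation step as is for the request body, and use resource_id only where a request URL needs the bare COMPILATION_ID.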
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Delete the BigQuery datasets
To avoid incurring charges for BigQuery assets, delete the
datasets created by this workflow, such as the datasets named reporting
and staging.
In the Google Cloud console, go to the BigQuery page.
In the Explorer panel, expand your project and select a dataset.
Click the Actions menu, and then select Delete.
In the Delete dataset dialog, enter delete into the field, and then click Delete.
Delete the Secret Manager secret
To clean up your security resources, delete the secret used for the Git connection.
In the Google Cloud console, go to the Secret Manager page.
Select the dataform-git-token secret.
Click Delete.
In the confirmation dialog, enter the secret name to confirm, and then click Delete.
Delete the Dataform development workspace
Creating a Dataform development workspace incurs no costs. If you still want to delete the development workspace, follow these steps:
In the Google Cloud console, go to the Dataform page.
Click quickstart-repo.
In the Development Workspaces tab, click the More menu by dev-workspace, and then select Delete.
To confirm, click Delete.
Delete the Dataform repository
Creating a Dataform repository incurs no costs. If you still want to delete the repository, follow these steps:
In the Google Cloud console, go to the Dataform page.
By quickstart-repo, click the More menu, and then select Delete.
In the Delete repository window, enter the name of the repository to confirm deletion.
To confirm, click Delete.
What's next
- To learn how to declare data sources in Dataform, see Declare a data source.
- To learn how to create views and tables in Dataform, see Create tables.
- To learn more about version control in Dataform, see Version control your code.
- To learn how to schedule workflow runs, see Schedule runs with workflow configurations.