This document describes how to create a BigQuery subscription. You can use the Google Cloud console, the Google Cloud CLI, the client libraries, or the Pub/Sub API to create a BigQuery subscription.
Before you begin
Before reading this document, ensure that you're familiar with the following:
How subscriptions work.
The workflow for BigQuery subscriptions.
How to configure a dead letter topic to handle message failures.
In addition to your familiarity with Pub/Sub and BigQuery, ensure that you meet the following prerequisites before you create a BigQuery subscription:
A BigQuery table exists. Alternatively, you can create one when you create the BigQuery subscription as described in the later sections of this document.
The schema of the Pub/Sub topic and the schema of the BigQuery table are compatible. If you use an incompatible BigQuery table, you get a compatibility-related error message. For more information, see Schema compatibility.
Required roles and permissions
To get the permissions that you need to create a BigQuery subscription, ask your administrator to grant you the Pub/Sub Editor (roles/pubsub.editor) IAM role on the project.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to create a BigQuery subscription.
Required permissions
The following permissions are required to create a BigQuery subscription:
- pubsub.subscriptions.create on the project
- pubsub.topics.attachSubscription on the topic
You might also be able to get these permissions with custom roles or other predefined roles.
Cross-project subscriptions
If you create a subscription in one project for a topic in another project, you must have the pubsub.subscriptions.create permission on the project in which you are creating the subscription and the pubsub.topics.attachSubscription permission on the topic.
Grant IAM roles to the service account
Pub/Sub uses an Identity and Access Management (IAM) service account to access Google Cloud resources. By default, it uses the Pub/Sub service agent (service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com).
To enable Pub/Sub to write to a BigQuery table, the service account requires the BigQuery Data Editor (roles/bigquery.dataEditor) role. You can grant this role on either the project or the table, as follows:
Project
In the Google Cloud console, go to the IAM page.
Select Include Google-provided role grants.
Find the row for the Cloud Pub/Sub service account and click Edit principal.
Click Add another role and select the BigQuery Data Editor role.
For more information, see Grant an IAM role by using the console.
Table
In the Google Cloud console, go to BigQuery Studio.
In the Explorer pane search box labeled Filter by name and labels, type the name of the table and press Enter.
In the search results, click the name of the table to which you want to grant permission.
In the Details tab, click Share > Manage permissions.
Click Add principal, then enter the service account identifier in the following format: service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com.
In the Assign roles list, select BigQuery Data Editor.
Click Save. The principal is granted the role on the resource.
Use a custom service account
If you grant the BigQuery Data Editor role to the Cloud Pub/Sub service account, any user who has permission to create a subscription in your project can write to the BigQuery table. If you want to provide more granular permissions, configure a user-managed service account instead.
The following permissions are required to configure a user-managed service account to write to BigQuery:
The user-managed service account must have the BigQuery Data Editor role.
The Cloud Pub/Sub service account must have the iam.serviceAccounts.getAccessToken permission on the user-managed service account.
The user creating the subscription must have the iam.serviceAccounts.actAs permission on the user-managed service account.
When you create the subscription, specify the user-managed service account as the subscription service account.
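The three requirements above can be sketched as a small checklist. This is a minimal Python sketch, not an IAM API call: the Grant record, the principal strings, and the missing_grants helper are hypothetical, and grants are compared literally by name.

```python
from dataclasses import dataclass

# Default Pub/Sub service agent address, as given in this document.
PUBSUB_AGENT = "service-{project_number}@gcp-sa-pubsub.iam.gserviceaccount.com"


@dataclass(frozen=True)
class Grant:
    principal: str   # who holds the grant
    permission: str  # role or permission name, compared literally
    resource: str    # what the grant applies to (hypothetical identifiers)


def missing_grants(grants, project_number, custom_sa, creator, table):
    """Return the required grants that are absent from `grants`."""
    agent = PUBSUB_AGENT.format(project_number=project_number)
    required = [
        # The user-managed service account writes rows to the table.
        Grant(custom_sa, "roles/bigquery.dataEditor", table),
        # The Pub/Sub service agent obtains access tokens for that account.
        Grant(agent, "iam.serviceAccounts.getAccessToken", custom_sa),
        # The person creating the subscription must be able to act as it.
        Grant(creator, "iam.serviceAccounts.actAs", custom_sa),
    ]
    present = set(grants)
    return [g for g in required if g not in present]
```

A real check would also expand roles into the permissions they contain (for example, the Service Account User role includes iam.serviceAccounts.actAs). This literal comparison does not do that.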
BigQuery subscription properties
BigQuery subscriptions support all of the common subscription properties. The following sections describe properties that are specific to BigQuery subscriptions.
Use topic schema
This option lets Pub/Sub use the schema of the Pub/Sub topic to which the subscription is attached. In addition, Pub/Sub writes the fields in messages to the corresponding columns in the BigQuery table.
When you use this option, remember to check the following additional requirements:
The fields in the topic schema and the BigQuery schema must have the same names and their types must be compatible with each other.
Any optional field in the topic schema must also be optional in the BigQuery schema.
Required fields in the topic schema don't need to be required in the BigQuery schema.
If there are BigQuery fields that are not present in the topic schema, these BigQuery fields must be in mode NULLABLE.
If the topic schema has additional fields that are not present in the BigQuery schema and these fields can be dropped, select the option Drop unknown fields.
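The topic-schema requirements can be expressed as a compatibility check. The following Python sketch assumes that the topic schema and the table schema have already been read into plain dictionaries. The check_topic_schema helper and its COMPATIBLE type table are hypothetical, and the type pairs are an illustrative Avro subset rather than the full rules in Schema compatibility.

```python
# Illustrative subset of Avro-to-BigQuery type pairs; see Schema compatibility
# for the full rules.
COMPATIBLE = {
    "string": {"STRING"},
    "long": {"INTEGER"},
    "double": {"FLOAT"},
    "boolean": {"BOOLEAN"},
    "bytes": {"BYTES"},
}


def check_topic_schema(topic, table, drop_unknown_fields=False):
    """Return a list of problems; an empty list means the schemas fit.

    topic: {field_name: (avro_type, is_optional)}
    table: {column_name: (bigquery_type, mode)}, mode is NULLABLE or REQUIRED
    """
    problems = []
    for name, (avro_type, optional) in topic.items():
        if name not in table:
            # Extra topic fields are allowed only if they can be dropped.
            if not drop_unknown_fields:
                problems.append(f"{name}: not in table; enable Drop unknown fields")
            continue
        bq_type, mode = table[name]
        if bq_type not in COMPATIBLE.get(avro_type, set()):
            problems.append(f"{name}: {avro_type} is not compatible with {bq_type}")
        # Optional in the topic means the column must be NULLABLE too.
        # Required topic fields may map to NULLABLE or REQUIRED columns.
        if optional and mode != "NULLABLE":
            problems.append(f"{name}: optional in topic but {mode} in table")
    for name, (_, mode) in table.items():
        # Columns that the topic never fills must accept NULL.
        if name not in topic and mode != "NULLABLE":
            problems.append(f"{name}: not in topic, so it must be NULLABLE")
    return problems
```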
You can select only one of the subscription properties, Use topic schema or Use table schema.
If you don't select the Use topic schema or Use table schema option, ensure that the BigQuery table has a column called data of type BYTES, STRING, or JSON. Pub/Sub writes the message to this BigQuery column.
Changes to the Pub/Sub topic schema or the BigQuery table schema might not take effect immediately for messages written to the BigQuery table. For example, suppose the Drop unknown fields option is enabled and a field is present in the Pub/Sub schema but not in the BigQuery schema. After you add the field to the BigQuery schema, messages written to the BigQuery table might still omit it for some time. Eventually, the schemas synchronize and subsequent messages include the field.
When you use the Use topic schema option for your BigQuery subscription, you can also take advantage of BigQuery change data capture (CDC). CDC updates your BigQuery tables by processing and applying changes to existing rows.
To learn more about this feature, see Stream table updates with change data capture.
To learn how to use this feature with BigQuery subscriptions, see BigQuery change data capture.
Use table schema
This option lets Pub/Sub use the schema of the BigQuery table to write the fields of a JSON message to the corresponding columns. When you use this option, remember to check the following additional requirements:
The name of each column in the BigQuery table can contain only letters (a-z, A-Z), numbers (0-9), or underscores (_).
Published messages must be in JSON format.
If a BigQuery table column has the JSON data type, the corresponding field in your Pub/Sub message must be valid JSON in an escaped string. For example, for a column named myData, the message field must be "myData": "{\"key\":\"value\"}". BigQuery rejects messages that don't contain valid JSON.
The following JSON conversions are supported:
| JSON type | BigQuery data type |
|---|---|
| string | NUMERIC, BIGNUMERIC, DATE, TIME, DATETIME, or TIMESTAMP |
| number | NUMERIC, BIGNUMERIC, DATE, TIME, DATETIME, or TIMESTAMP |
- When using number to DATE, DATETIME, TIME, or TIMESTAMP conversions, the number must adhere to the supported representations.
- When using number to NUMERIC or BIGNUMERIC conversions, the precision and range of values are limited to those accepted by the IEEE 754 standard for floating-point arithmetic. If you require high precision or a wider range of values, use string to NUMERIC or BIGNUMERIC conversions instead.
- When using string to NUMERIC or BIGNUMERIC conversions, Pub/Sub assumes the string is a human-readable number (for example, "123.124"). If processing the string as a human-readable number fails, Pub/Sub treats the string as bytes encoded with the BigDecimalByteStringEncoder.
If the subscription's topic has a schema associated with it, then the message encoding property must be set to JSON.
If there are BigQuery fields that are not present in the messages, these BigQuery fields must be in mode NULLABLE.
If the messages have additional fields that are not present in the BigQuery schema and these fields can be dropped, select the option Drop unknown fields.
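The following Python sketch builds a message body that satisfies the table-schema requirements. It validates column names, embeds the value for a JSON column as an escaped JSON string, and sends NUMERIC values as human-readable strings to avoid IEEE 754 rounding. The build_message helper and the example column names are hypothetical. Only the escaping shape and the string-versus-number guidance come from this section.

```python
import json
import re
from decimal import Decimal

# Use table schema: column names may contain only letters, digits, underscores.
_COLUMN_NAME = re.compile(r"[A-Za-z0-9_]+")


def valid_column_name(name):
    return _COLUMN_NAME.fullmatch(name) is not None


def build_message(row, json_columns=(), numeric_columns=()):
    r"""Serialize one row as a JSON message body for Use table schema.

    json_columns: columns of BigQuery type JSON. Their values are embedded
        as escaped JSON strings, for example "{\"key\":\"value\"}".
    numeric_columns: NUMERIC or BIGNUMERIC columns. Their values are sent as
        human-readable decimal strings rather than JSON numbers.
    """
    body = {}
    for name, value in row.items():
        if not valid_column_name(name):
            raise ValueError(f"invalid BigQuery column name: {name!r}")
        if name in json_columns:
            # JSON inside a string: serialize once here, and again below.
            value = json.dumps(value, separators=(",", ":"))
        elif name in numeric_columns:
            if isinstance(value, float):
                # A float has already lost precision; pass str or Decimal.
                raise TypeError(f"{name}: pass NUMERIC values as str or Decimal")
            value = str(Decimal(value))
        body[name] = value
    return json.dumps(body).encode("utf-8")


if __name__ == "__main__":
    print(build_message(
        {"myData": {"key": "value"}, "amount": "12345678901234567.89"},
        json_columns={"myData"},
        numeric_columns={"amount"},
    ).decode("utf-8"))
```

If 12345678901234567.89 were encoded as a JSON number, it would be rounded to the nearest double, 12345678901234568, before reaching BigQuery. The sketch therefore sends NUMERIC values as strings.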
You can select only one of the subscription properties, Use topic schema or Use table schema.
If you don't select the Use topic schema or Use table schema option, ensure that the BigQuery table has a column called data of type BYTES, STRING, or JSON. Pub/Sub writes the message to this BigQuery column.
Changes to the BigQuery table schema might not take effect immediately for messages written to the BigQuery table. For example, suppose the Drop unknown fields option is enabled and a field is present in the messages but not in the BigQuery schema. After you add the field to the BigQuery schema, messages written to the BigQuery table might still omit it for some time. Eventually, the schema synchronizes and subsequent messages include the field.
When you use the Use table schema option for your BigQuery subscription, you can also take advantage of BigQuery change data capture (CDC). CDC updates your BigQuery tables by processing and applying changes to existing rows.
To learn more about this feature, see Stream table updates with change data capture.
To learn how to use this feature with BigQuery subscriptions, see BigQuery change data capture.
Drop unknown fields
This option is used with the Use topic schema or Use table schema option. When enabled, this option lets Pub/Sub drop any field that is present in the topic schema or message but not in the BigQuery schema. The fields that are not part of the BigQuery schema are dropped when writing the message to the BigQuery table.
If Drop unknown fields is not set, messages with extra fields are not written to BigQuery and remain in the subscription backlog unless you configure a dead letter topic.
The Drop unknown fields setting does not affect fields that are defined in neither the Pub/Sub topic schema nor the BigQuery table schema. In this case, a valid Pub/Sub message is delivered to the subscription. However, because BigQuery has no columns defined for these extra fields, the fields are dropped when the message is written to BigQuery. To avoid losing these fields, ensure that every field in the Pub/Sub message is also in the BigQuery table schema.
The behavior regarding extra fields can also depend on the specific schema type (Avro, Protocol Buffer) and encoding (JSON, Binary) used. For information about how these factors affect the handling of extra fields, see the documentation for your specific schema type and encoding.
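The write-time behavior described in this section can be modeled with a short Python sketch. It is a simplified model under stated assumptions, not Pub/Sub's implementation. The prepare_row helper is hypothetical, a return value of None stands for a message that stays in the backlog (or goes to a dead letter topic), and schema-type and encoding differences are ignored.

```python
def prepare_row(message, table_columns, drop_unknown_fields, topic_fields=None):
    """Simplified model of how one message maps to a BigQuery row.

    message: the decoded message fields, as a dict.
    table_columns: column names in the BigQuery table.
    topic_fields: field names in the topic schema when Use topic schema is
        set; None models Use table schema, where the message's own fields
        are checked.
    Returns the row to write, or None if the message can't be written
    (it stays in the backlog or goes to a dead letter topic).
    """
    checked = set(message) if topic_fields is None else set(topic_fields)
    unknown = checked - set(table_columns)
    if unknown and not drop_unknown_fields:
        return None
    # Anything without a column is not written: dropped unknown fields, and
    # fields defined in neither the topic schema nor the table.
    return {k: v for k, v in message.items() if k in table_columns}
```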
Write metadata
This option lets Pub/Sub write the metadata of each message to additional columns in the BigQuery table. Otherwise, the metadata is not written to the BigQuery table.
If you select the Write metadata option, ensure that the BigQuery table has the fields described in the following table.
If you don't select the Write metadata option, then the destination BigQuery table only requires the data field, unless use_topic_schema is true. If you select both the Write metadata and Use topic schema options, then the schema of the topic must not contain any fields whose names match those of the metadata parameters. This limitation includes camelCase versions of these snake_case parameter names.
| Parameter | Data type | Description |
|---|---|---|
| subscription_name | STRING | Name of a subscription. |
| message_id | STRING | ID of a message. |
| publish_time | TIMESTAMP | The time of publishing a message. |
| data | BYTES, STRING, or JSON | The message body. |
| attributes | STRING or JSON | A JSON object containing all message attributes. It also contains additional fields that are part of the Pub/Sub message, including the ordering key, if present. |
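For the Write metadata option, the following Python sketch produces the metadata columns in the JSON schema-file format that the bq command-line tool accepts. It also lists topic-schema field names that would collide with metadata names in either snake_case or camelCase. The helper names are hypothetical. The column types come from the preceding table, and omitted modes default to NULLABLE.

```python
import json

# Metadata columns, in the order of the Write metadata table.
METADATA_FIELDS = ["subscription_name", "message_id", "publish_time", "data", "attributes"]


def metadata_schema(data_type="BYTES", attributes_type="STRING"):
    """Return the metadata columns as a bq-style JSON schema list."""
    if data_type not in {"BYTES", "STRING", "JSON"}:
        raise ValueError(f"data must be BYTES, STRING, or JSON, not {data_type}")
    if attributes_type not in {"STRING", "JSON"}:
        raise ValueError(f"attributes must be STRING or JSON, not {attributes_type}")
    # No "mode" key: BigQuery treats the columns as NULLABLE by default.
    return [
        {"name": "subscription_name", "type": "STRING"},
        {"name": "message_id", "type": "STRING"},
        {"name": "publish_time", "type": "TIMESTAMP"},
        {"name": "data", "type": data_type},
        {"name": "attributes", "type": attributes_type},
    ]


def to_camel(snake):
    """Convert subscription_name to subscriptionName."""
    head, *rest = snake.split("_")
    return head + "".join(part.capitalize() for part in rest)


def metadata_conflicts(topic_field_names):
    """Topic fields that clash with metadata names (snake_case or camelCase)."""
    reserved = set(METADATA_FIELDS) | {to_camel(n) for n in METADATA_FIELDS}
    return sorted(set(topic_field_names) & reserved)


if __name__ == "__main__":
    print(json.dumps(metadata_schema(data_type="JSON"), indent=2))
```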
Service account
You have the following options to write messages to a BigQuery table:
Configure a custom service account so that only users who have the iam.serviceAccounts.actAs permission on the service account can create a subscription that writes to the table. An example role that includes the iam.serviceAccounts.actAs permission is the Service Account User (roles/iam.serviceAccountUser) role.
Use the default Pub/Sub service agent, which lets any user who can create subscriptions in the project create a subscription that writes to the table. The Pub/Sub service agent is the default setting when you don't specify a custom service account.
Create a BigQuery subscription
To create a subscription with BigQuery delivery, perform the following steps.
Console
In the Google Cloud console, go to the Create subscription page.
For the Subscription ID field, enter a name. For information on how to name a subscription, see Guidelines to name a topic or a subscription.
In the Select a Cloud Pub/Sub topic box, type or select the topic to receive messages from.
For Delivery type, select Write to BigQuery.
Select the BigQuery table:
For Project, select the Google Cloud project that contains the BigQuery table.
For Dataset, select an existing dataset or click Create new dataset to create a new dataset. For information about creating a dataset, see Create datasets.
In the Table field, enter the name of the table. To create a new table, click the link that takes you to the BigQuery Create new table page. The page opens in a separate tab. For information about creating a table, see Create and use tables.
For Schema Configuration, select one of the following options:
Don't use a schema. Pub/Sub writes the message bytes to a column named data.
Use topic schema. Pub/Sub uses the schema that is associated with the topic. For more information, see Use topic schema.
Use table schema. Pub/Sub uses the schema of the BigQuery table. For more information, see Use table schema.
Optional. To write message metadata to the BigQuery table, select Write metadata. For more information, see Write metadata.
Optional. To drop fields that are not present in the BigQuery table schema, select Drop unknown fields. For more information, see Drop unknown fields.
Configure the common subscription properties as needed. We strongly recommend that you enable Dead lettering to handle message failures. For more information, see Dead letter topic.
Click Create.
gcloud
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
To create a Pub/Sub subscription, use the gcloud pubsub subscriptions create command:
gcloud pubsub subscriptions create SUBSCRIPTION_ID \
    --topic=TOPIC_ID \
    --bigquery-table=PROJECT_ID.DATASET_ID.TABLE_ID
If you want to use a custom service account, provide it as an additional argument:
gcloud pubsub subscriptions create SUBSCRIPTION_ID \
    --topic=TOPIC_ID \
    --bigquery-table=PROJECT_ID.DATASET_ID.TABLE_ID \
    --bigquery-service-account-email=SERVICE_ACCOUNT_NAME
Replace the following:
SUBSCRIPTION_ID: Specifies the ID of the subscription.
TOPIC_ID: Specifies the ID of the topic. The topic requires a schema only if you use the topic schema to write messages.
PROJECT_ID: Specifies the ID of the project.
DATASET_ID: Specifies the ID of an existing dataset. To create a dataset, see Create datasets.
TABLE_ID: Specifies the ID of an existing table. The table requires a data field if your topic doesn't have a schema. To create a table, see Create an empty table with a schema definition.
SERVICE_ACCOUNT_NAME: Specifies the email address of the service account to use to write to BigQuery.
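If you script subscription creation, a small Python helper can assemble the gcloud arguments and enforce two rules from this document: select only one schema option, and use Drop unknown fields only with a schema option. The helper is hypothetical. The --topic, --bigquery-table, and --bigquery-service-account-email flags appear in the commands above. The --use-topic-schema, --use-table-schema, --write-metadata, and --drop-unknown-fields flag names are assumptions, so confirm them with gcloud pubsub subscriptions create --help before use.

```python
import shlex


def bigquery_subscription_command(
    subscription_id, topic_id, project_id, dataset_id, table_id, *,
    use_topic_schema=False, use_table_schema=False,
    write_metadata=False, drop_unknown_fields=False,
    service_account_email=None,
):
    """Return a gcloud argument list for a BigQuery subscription."""
    if use_topic_schema and use_table_schema:
        raise ValueError("select only one of Use topic schema or Use table schema")
    if drop_unknown_fields and not (use_topic_schema or use_table_schema):
        raise ValueError("Drop unknown fields needs a schema option")
    cmd = [
        "gcloud", "pubsub", "subscriptions", "create", subscription_id,
        f"--topic={topic_id}",
        # Table reference in the PROJECT_ID.DATASET_ID.TABLE_ID form used above.
        f"--bigquery-table={project_id}.{dataset_id}.{table_id}",
    ]
    # Assumed flag names for the optional properties; verify with --help.
    if use_topic_schema:
        cmd.append("--use-topic-schema")
    if use_table_schema:
        cmd.append("--use-table-schema")
    if write_metadata:
        cmd.append("--write-metadata")
    if drop_unknown_fields:
        cmd.append("--drop-unknown-fields")
    if service_account_email:
        cmd.append(f"--bigquery-service-account-email={service_account_email}")
    return cmd


if __name__ == "__main__":
    cmd = bigquery_subscription_command(
        "my-sub", "my-topic", "my-project", "my_dataset", "my_table",
        use_table_schema=True, drop_unknown_fields=True,
    )
    print(shlex.join(cmd))  # review the command before running it
```

To run the command, pass the list to subprocess.run(cmd, check=True). Returning a list rather than a string avoids shell-quoting problems with IDs.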
C++
Before trying this sample, follow the C++ setup instructions in Quickstart: Using Client Libraries. For more information, see the Pub/Sub C++ API reference documentation.
C#
Before trying this sample, follow the C# setup instructions in Quickstart: Using Client Libraries. For more information, see the Pub/Sub C# API reference documentation.
Go
The following sample uses the major version of the Go Pub/Sub client library (v2). If you are still using the v1 library, see the migration guide to v2. To see a list of v1 code samples, see the deprecated code samples.
Before trying this sample, follow the Go setup instructions in Quickstart: Using Client Libraries. For more information, see the Pub/Sub Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in Quickstart: Using Client Libraries. For more information, see the Pub/Sub Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in Quickstart: Using Client Libraries. For more information, see the Pub/Sub Node.js API reference documentation.
Node.ts
Before trying this sample, follow the Node.js setup instructions in Quickstart: Using Client Libraries. For more information, see the Pub/Sub Node.js API reference documentation.
PHP
Before trying this sample, follow the PHP setup instructions in Quickstart: Using Client Libraries. For more information, see the Pub/Sub PHP API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in Quickstart: Using Client Libraries. For more information, see the Pub/Sub Python API reference documentation.
Ruby
The following sample uses Ruby Pub/Sub client library v3. If you are still using the v2 library, see the migration guide to v3. To see a list of Ruby v2 code samples, see the deprecated code samples.
Before trying this sample, follow the Ruby setup instructions in Quickstart: Using Client Libraries. For more information, see the Pub/Sub Ruby API reference documentation.
Monitor a BigQuery subscription
Cloud Monitoring provides a number of metrics to monitor subscriptions.
For a list of all the available metrics related to Pub/Sub and their descriptions, see the Monitoring documentation for Pub/Sub.
You can also monitor subscriptions from within Pub/Sub.
What's next
- Create or modify a subscription with gcloud commands.
- Create or modify a subscription with REST APIs.
- Troubleshoot a BigQuery subscription.