This page gives you an overview of the BigQuery destination. It describes write behavior and known limitations of Datastream when you replicate data to BigQuery.
Write behavior
The maximum event size when you stream data into BigQuery is 20 MB.
When you configure your stream, you can select the way that Datastream writes your change data to BigQuery. For more information, see Configure write mode.
Table metadata
Datastream appends a STRUCT column named datastream_metadata to each
table that's written to the BigQuery destination.
Merge write mode
If a table has a primary key at the source, then the column contains the following fields:
UUID: This field has theSTRINGdata type.SOURCE_TIMESTAMP: This field has theINTEGERdata type.
If a table doesn't have a primary key, then the column contains an additional field: IS_DELETED. This field has the BOOLEAN data type, and it indicates whether the data that Datastream streams to the destination is associated with a DELETE operation at the source. Tables without primary keys are append-only.
Append-only write mode
The datastream_metadata column contains the same fields for tables with and
without primary keys:
UUID: This field has theSTRINGdata type.SOURCE_TIMESTAMP: This field has theINTEGERdata type.CHANGE_SEQUENCE_NUMBER: This field has theSTRINGdata type. It's an internal sequence number used by Datastream for each change event.CHANGE_TYPE: This field has theSTRINGdata type. It indicates the type of the change event:INSERT,UPDATE-INSERT,UPDATE-DELETEorDELETE.SORT_KEYS: This field contains an array ofSTRINGvalues. You can use the values to sort the change events.
Control BigQuery costs
BigQuery costs are charged separately from Datastream. To learn how to control your BigQuery costs, see BigQuery CDC pricing.
Cost management best practices
When using Datastream with BigQuery as a destination, consider the following cost management best practices:
- Understand BigQuery CDC costs: when using Datastream with BigQuery as a destination, change data capture (CDC) involves background merge operations that are billed as Analysis SKU usage.
- Use BigQuery reservations: to manage and cap the costs associated with CDC merge jobs, we recommend that you purchase a BigQuery slot reservation. This provides a fixed capacity for a fixed price, making your costs more predictable.
- Tune
max_staleness: Themax_stalenessparameter in BigQuery is a direct trade-off between data freshness and cost. A lower value means more frequent merge operations and higher costs, but fresher data. Tune this parameter to a level that meets your business requirements for data freshness without exceeding your budget. For more information, see Manage table staleness.
Known limitations
Known limitations for using BigQuery as a destination include:
- By default, Datastream doesn't support adding a primary key to a table that's already replicated to BigQuery without a primary key, or removing a primary key from a table that's replicated to BigQuery with a primary key. If you need to perform such changes, contact Google Support. For information about changing the primary key definition for a source table that already has a primary key, see Diagnose issues.
Primary keys in BigQuery must be of the following data types:
DATEBOOLGEOGRAPHYINT64NUMERICBIGNUMERICSTRINGTIMESTAMPDATETIME
Tables that contain primary keys of unsupported data types aren't replicated by Datastream.
BigQuery doesn't support table names with
.,$,/,@, or+characters. Datastream replaces such characters with underscores when creating destination tables.For example,
table.namein the source database becomestable_namein BigQuery.For more information on table names in BigQuery, see Table naming.
BigQuery doesn't support more than four clustering columns. When replicating a table with more than four primary key columns, Datastream uses four primary key columns as the clustering columns.
Datastream maps out-of-range date and time literals such as PostgreSQL infinity date types to the following values:
- Positive
DATEto the value of9999-12-31 - Negative
DATEto the value of0001-01-01 - Positive
TIMESTAMPto the value of9999-12-31 23:59:59.999000 UTC - Negative
TIMESTAMPto the value of0001-01-01 00:00:00 UTC
- Positive
BigQuery doesn't support streaming tables which have primary keys of
FLOATorREALdata types. Such tables aren't replicated. To learn more about BigQuery date types and ranges, see Data types.If your source is Salesforce, the Dataset for each schema configuration option isn't supported.