Snowflake to BigQuery migration tutorials

This document provides end-to-end examples and tutorials for the different ways you can set up a Snowflake to BigQuery migration pipeline.

Snowflake migration pipeline examples

You can migrate your data from Snowflake to BigQuery using one of three approaches: ELT, ETL, or partner tools.

Extract, load, and transform

You can set up an extract, load, and transform (ELT) process with two methods:

  • Use a pipeline to extract data from Snowflake and load the data into BigQuery.
  • Extract data from Snowflake using other Google Cloud products.

Use a pipeline to extract data from Snowflake

To extract data from Snowflake and load it directly into Cloud Storage, use the snowflake2bq tool.

You can then load your data from Cloud Storage into BigQuery with standard BigQuery load tooling, for example the bq command-line tool, the BigQuery Data Transfer Service, or a LOAD DATA SQL statement.
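
The following is a minimal load sketch that uses the LOAD DATA SQL statement. It assumes a BigQuery dataset named mydataset that already exists, a target table named mytable, and Parquet files under gs://mybucket/extract/; all three are hypothetical names that you replace with your own:

    -- mydataset.mytable and the gs:// path are placeholders.
    -- LOAD DATA appends to the table and creates it if it doesn't exist.
    LOAD DATA INTO mydataset.mytable
    FROM FILES (
      format = 'PARQUET',
      uris = ['gs://mybucket/extract/*.parquet']
    );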

Other tools to extract data from Snowflake

You can also use the following tools to extract data from Snowflake:

Other tools to load data to BigQuery

You can also use the following tools to load data to BigQuery:

Extract, transform, and load

If you want to transform your data before loading it into BigQuery, consider the following tools:

Partner tools for migration

Multiple vendors specialize in the enterprise data warehouse (EDW) migration space. For a list of key partners and the solutions they provide, see BigQuery partners.

Snowflake export tutorial

The following tutorial shows a sample data export from Snowflake that uses the COPY INTO <location> Snowflake command. For a detailed, step-by-step process that includes code samples, see the Google Cloud professional services Snowflake to BigQuery tool.

Prepare for the export

To prepare your Snowflake data for export, unload it into a Cloud Storage bucket or an Amazon Simple Storage Service (Amazon S3) bucket by following these steps:

Cloud Storage

This tutorial exports the data in Parquet format.

  1. Use Snowflake SQL statements to create a named file format specification.

    create or replace file format NAMED_FILE_FORMAT
        type = 'PARQUET';

    Replace NAMED_FILE_FORMAT with a name for the file format. For example, my_parquet_unload_format.

  2. Create an integration with the CREATE STORAGE INTEGRATION command.

    create storage integration INTEGRATION_NAME
        type = external_stage
        storage_provider = gcs
        enabled = true
        storage_allowed_locations = ('BUCKET_NAME');

    Replace the following:

    • INTEGRATION_NAME: a name for the storage integration. For example, gcs_int
    • BUCKET_NAME: the path to the Cloud Storage bucket. For example, gcs://mybucket/extract/
  3. Retrieve the Cloud Storage service account for Snowflake with the DESCRIBE INTEGRATION command.

    desc storage integration INTEGRATION_NAME;

    The output is similar to the following:

    +-----------------------------+---------------+---------------------------------------------------------------------------+------------------+
    | property                    | property_type | property_value                                                            | property_default |
    +-----------------------------+---------------+---------------------------------------------------------------------------+------------------+
    | ENABLED                     | Boolean       | true                                                                      | false            |
    | STORAGE_ALLOWED_LOCATIONS   | List          | gcs://mybucket1/path1/,gcs://mybucket2/path2/                             | []               |
    | STORAGE_BLOCKED_LOCATIONS   | List          | gcs://mybucket1/path1/sensitivedata/,gcs://mybucket2/path2/sensitivedata/ | []               |
    | STORAGE_GCP_SERVICE_ACCOUNT | String        | service-account-id@iam.gserviceaccount.com                                |                  |
    +-----------------------------+---------------+---------------------------------------------------------------------------+------------------+
  4. Grant the service account listed as STORAGE_GCP_SERVICE_ACCOUNT read and write access to the bucket specified in the storage integration command. In this example, grant the service-account-id@iam.gserviceaccount.com service account read and write access to the BUCKET_NAME bucket.

  5. Create an external Cloud Storage stage that references the integration that you created previously.

    create or replace stage STAGE_NAME
        url='BUCKET_NAME'
        storage_integration = INTEGRATION_NAME
        file_format = NAMED_FILE_FORMAT;

    Replace the following:

    • STAGE_NAME: a name for the Cloud Storage stage object. For example, my_ext_unload_stage
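
Taken together with the example values, the Cloud Storage preparation steps look like the following sketch. The bucket path gcs://mybucket/extract/ is the example path from step 2, and granting the service account access to the bucket (step 4) happens in Google Cloud rather than in Snowflake SQL, so it isn't shown here:

    -- Step 1: named Parquet file format
    create or replace file format my_parquet_unload_format
        type = 'PARQUET';

    -- Step 2: storage integration that Snowflake uses to access the bucket
    create storage integration gcs_int
        type = external_stage
        storage_provider = gcs
        enabled = true
        storage_allowed_locations = ('gcs://mybucket/extract/');

    -- Step 3: look up the service account that needs access to the bucket
    desc storage integration gcs_int;

    -- Step 5: external stage that combines the integration and the file format
    create or replace stage my_ext_unload_stage
        url='gcs://mybucket/extract/'
        storage_integration = gcs_int
        file_format = my_parquet_unload_format;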

Amazon S3

The following example shows how to move data from a Snowflake table to an Amazon S3 bucket:

  1. In Snowflake, configure a storage integration object to allow Snowflake to write to an Amazon S3 bucket referenced in an external stage.

    This step involves configuring access permissions to the Amazon S3 bucket, creating the Amazon Web Services (AWS) IAM role, and creating a storage integration in Snowflake with the CREATE STORAGE INTEGRATION command:

    create storage integration INTEGRATION_NAME
        type = external_stage
        storage_provider = s3
        enabled = true
        storage_aws_role_arn = 'arn:aws:iam::001234567890:role/myrole'
        storage_allowed_locations = ('BUCKET_NAME');

    Replace the following:

    • INTEGRATION_NAME: a name for the storage integration. For example, s3_int
    • BUCKET_NAME: the path to the Amazon S3 bucket to load files to. For example, s3://unload/files/
  2. Retrieve the AWS IAM user with the DESCRIBE INTEGRATION command.

    desc integration INTEGRATION_NAME;

    The output is similar to the following:

    +---------------------------+---------------+-----------------------------------------------------------------------------+------------------+
    | property                  | property_type | property_value                                                              | property_default |
    +---------------------------+---------------+-----------------------------------------------------------------------------+------------------+
    | ENABLED                   | Boolean       | true                                                                        | false            |
    | STORAGE_ALLOWED_LOCATIONS | List          | s3://mybucket1/mypath1/,s3://mybucket2/mypath2/                             | []               |
    | STORAGE_BLOCKED_LOCATIONS | List          | s3://mybucket1/mypath1/sensitivedata/,s3://mybucket2/mypath2/sensitivedata/ | []               |
    | STORAGE_AWS_IAM_USER_ARN  | String        | arn:aws:iam::123456789001:user/abc1-b-self1234                              |                  |
    | STORAGE_AWS_ROLE_ARN      | String        | arn:aws:iam::001234567890:role/myrole                                       |                  |
    | STORAGE_AWS_EXTERNAL_ID   | String        | MYACCOUNT_SFCRole=                                                          |                  |
    +---------------------------+---------------+-----------------------------------------------------------------------------+------------------+
  3. Create a role that has the CREATE STAGE privilege for the schema, and the USAGE privilege for the storage integration:

    create role ROLE_NAME;
    grant create stage on schema public to role ROLE_NAME;
    grant usage on integration INTEGRATION_NAME to role ROLE_NAME;

    Replace ROLE_NAME with a name for the role. For example, myrole. For an example of granting and switching to this role, see the sketch after these steps.

  4. Grant the AWS IAM user permissions to access the Amazon S3 bucket, and create an external stage with the CREATE STAGE command:

    use schema mydb.public;

    create or replace stage STAGE_NAME
        url='BUCKET_NAME'
        storage_integration = INTEGRATION_NAME
        file_format = NAMED_FILE_FORMAT;

    Replace the following:

    • STAGE_NAME: a name for the external stage object. For example, my_ext_unload_stage
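
The preceding steps create a role with the required privileges but don't show switching to it. The following is a minimal sketch that grants the example role myrole to your user and activates it for the session; MY_USER is a hypothetical user name:

    -- MY_USER is a placeholder for your Snowflake user name.
    grant role myrole to user MY_USER;
    use role myrole;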

Export Snowflake data

After you have prepared your data, you can move it to Google Cloud. Use the COPY INTO command to copy data from the Snowflake database table into the Cloud Storage or Amazon S3 bucket by specifying the external stage object, STAGE_NAME.

    copy into @STAGE_NAME/d1
    from TABLE_NAME;

Replace TABLE_NAME with the name of your Snowflake database table.

This command copies the table data to the stage object, which is linked to the Cloud Storage or Amazon S3 bucket. The exported files include the d1 prefix.
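
To confirm that the unload produced files, you can list the contents of the stage path with the Snowflake LIST command:

    -- Lists the files that were unloaded with the d1 prefix.
    list @STAGE_NAME/d1;

The output shows the name, size, and last-modified timestamp of each exported file.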

Other export methods

To use Azure Blob Storage for your data exports, follow the steps detailed in Unloading into Microsoft Azure. Then, transfer the exported files into Cloud Storage using Storage Transfer Service.