The CREATE MODEL statement for remote models over open generative AI models
This document describes the CREATE MODEL statement for creating remote models in BigQuery over open text embedding (Preview) and text generation models deployed to Vertex AI. After you create a remote model, you can use it with the ML.GENERATE_EMBEDDING or ML.GENERATE_TEXT function, depending on the model type.
CREATE MODEL syntax
You can automatically deploy the open model at the same time that you create the remote model, or you can manually deploy the open model first, and then use the deployed open model with the remote model. BigQuery manages the Vertex AI resources for automatically deployed open models. You must manage the Vertex AI resources for manually deployed open models. For more information, see Deploy open models.
Automatically deployed
{CREATE MODEL | CREATE MODEL IF NOT EXISTS | CREATE OR REPLACE MODEL}
`project_id.dataset.model_name`
REMOTE WITH CONNECTION {DEFAULT | `project_id.region.connection_id`}
OPTIONS (
  { HUGGING_FACE_MODEL_ID = 'hugging_face_model_id'
    | MODEL_GARDEN_MODEL_NAME = 'model_garden_model_name' }
  [, HUGGING_FACE_TOKEN = 'hugging_face_token' ]
  [, MACHINE_TYPE = 'machine_type' ]
  [, MIN_REPLICA_COUNT = min_replica_count ]
  [, MAX_REPLICA_COUNT = max_replica_count ]
  [, RESERVATION_AFFINITY_TYPE = {'NO_RESERVATION' | 'ANY_RESERVATION' | 'SPECIFIC_RESERVATION'} ]
  [, RESERVATION_AFFINITY_KEY = 'compute.googleapis.com/reservation-name' ]
  [, RESERVATION_AFFINITY_VALUES = reservation_affinity_values ]
  [, ENDPOINT_IDLE_TTL = endpoint_idle_ttl ]
);
CREATE MODEL
Creates and trains a new model in the specified dataset. If the model name already exists, CREATE MODEL returns an error.
CREATE MODEL IF NOT EXISTS
Creates and trains a new model only if the model doesn't exist in the specified dataset.
CREATE OR REPLACE MODEL
Creates and trains a model and replaces an existing model with the same name in the specified dataset.
model_name
The name of the model you're creating or replacing. The model name must be unique in the dataset: no other model or table can have the same name. The model name must follow the same naming rules as a BigQuery table. A model name can:
- Contain up to 1,024 characters
- Contain letters (upper or lower case), numbers, and underscores
model_name is case-sensitive.
If you don't have a default project configured, then you must prepend the project ID to the model name in the following format, including backticks:
`[PROJECT_ID].[DATASET].[MODEL]`
For example, `myproject.mydataset.mymodel`.
REMOTE WITH CONNECTION
Syntax
`[PROJECT_ID].[LOCATION].[CONNECTION_ID]`
BigQuery uses a Cloud resource connection to interact with the Vertex AI endpoint.
The connection elements are as follows:
- PROJECT_ID: the project ID of the project that contains the connection.
- LOCATION: the location used by the connection. The connection must be in the same location as the dataset that contains the model.
- CONNECTION_ID: the connection ID, for example myconnection. To find your connection ID, view the connection details in the Google Cloud console. The connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example projects/myproject/locations/connection_location/connections/myconnection.

To use a default connection, specify DEFAULT instead of the connection string containing PROJECT_ID.LOCATION.CONNECTION_ID.
You must grant the Vertex AI User role to the connection's service account in the project where you create the model. For more information, see Grant or revoke a single IAM role.
Example
`myproject.us.my_connection`
HUGGING_FACE_MODEL_ID
Syntax
HUGGING_FACE_MODEL_ID = 'hugging_face_model_id'
Description
The model ID for a supported Hugging Face model, in the format provider_name/model_name. For example, deepseek-ai/DeepSeek-R1. You can get the model ID by clicking the model name in the Hugging Face Model Hub and then copying the model ID from the top of the model card.
Arguments
A STRING value that specifies the model ID.
MODEL_GARDEN_MODEL_NAME
Syntax
MODEL_GARDEN_MODEL_NAME = 'model_garden_model_name'
Description
The model ID and model version of a supported Vertex AI Model Garden model, in the format publishers/publisher/models/model_name@model_version. For example, publishers/openai/models/gpt-oss@gpt-oss-120b. You can get the model ID by clicking the model card in the Vertex AI Model Garden and then copying the model ID from the Model ID field. You can get the default model version by copying it from the Version field on the model card. To see other model versions that you can use, click Deploy model and then click the Resource ID field.
Arguments
A STRING value that specifies the model ID and model version.
HUGGING_FACE_TOKEN
Syntax
HUGGING_FACE_TOKEN = 'hugging_face_token'
Description
The Hugging Face User Access Token to use. You can only specify a value for this option if you also specify a value for the HUGGING_FACE_MODEL_ID option. The token must have the read role scope or broader. This option is required when the model identified by the HUGGING_FACE_MODEL_ID value is a Hugging Face gated or private model.
Some gated models require explicit agreement to their terms of service before access is granted. To agree to these terms, follow these steps:
- Navigate to the model's page on the Hugging Face website.
- Locate and review the model's terms of service. A link to the service agreement is typically found on the model card.
- Accept the terms as prompted on the page.
Arguments
A STRING value that specifies the Hugging Face User Access Token.
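For example, the token can be supplied alongside a gated model's ID. The following sketch is illustrative only: the BigQuery model name and token value are placeholders, and you would substitute your own read-scoped token.

```sql
-- Sketch: create a remote model over a gated Hugging Face model.
-- The BigQuery model name and token value are placeholders.
CREATE OR REPLACE MODEL `myproject.mydataset.my_gated_model`
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  HUGGING_FACE_MODEL_ID = 'meta-llama/Llama-3.1-8B-Instruct',
  -- The token must have at least the read role scope.
  HUGGING_FACE_TOKEN = 'hf_your_read_scoped_token'
);
```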
MACHINE_TYPE
Syntax
MACHINE_TYPE = 'machine_type'
Description
The machine type to use when deploying the model to Vertex AI. For information about supported machine types, see Machine types.
Arguments
A STRING value that specifies the machine type. If you don't specify a value for the MACHINE_TYPE option, the Vertex AI Model Garden default machine type for the model is used.
MIN_REPLICA_COUNT
Syntax
MIN_REPLICA_COUNT = min_replica_count
Description
The minimum number of machine replicas used when deploying the model on a Vertex AI endpoint. The service increases or decreases the replica count as required by the inference load on the endpoint. The number of replicas used is never lower than the MIN_REPLICA_COUNT value and never higher than the MAX_REPLICA_COUNT value.
Arguments
An INT64 value that specifies the minimum replica count. The MIN_REPLICA_COUNT value must be in the range [1, 4096]. The default value is 1.
MAX_REPLICA_COUNT
Syntax
MAX_REPLICA_COUNT = max_replica_count
Description
The maximum number of machine replicas used when deploying the model on a Vertex AI endpoint. The service increases or decreases the replica count as required by the inference load on the endpoint. The number of replicas used is never lower than the MIN_REPLICA_COUNT value and never higher than the MAX_REPLICA_COUNT value.
Arguments
An INT64 value that specifies the maximum replica count. The MAX_REPLICA_COUNT value must be in the range [1, 4096]. The default value is the MIN_REPLICA_COUNT value.
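Taken together, the machine and replica options might look like the following sketch. The machine type shown is only an illustrative assumption, not a recommendation; check the supported machine types for the model you deploy.

```sql
-- Sketch: pin a machine type and allow autoscaling between 1 and 5 replicas.
-- The machine type and model names are illustrative placeholders.
CREATE OR REPLACE MODEL `myproject.mydataset.my_scaled_model`
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  HUGGING_FACE_MODEL_ID = 'intfloat/multilingual-e5-small',
  MACHINE_TYPE = 'g2-standard-12',
  MIN_REPLICA_COUNT = 1,
  MAX_REPLICA_COUNT = 5
);
```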
RESERVATION_AFFINITY_TYPE
Syntax
RESERVATION_AFFINITY_TYPE = {'NO_RESERVATION' | 'ANY_RESERVATION' | 'SPECIFIC_RESERVATION'}
Description
Determines whether the deployed model uses Compute Engine reservations to provide assured virtual machine (VM) availability when serving predictions, and specifies whether the model uses VMs from all available reservations or just one specific reservation. For more information, see Compute Engine reservation affinity.
You can only use Compute Engine reservations that are shared with Vertex AI. For more information, see Allow a reservation to be consumed.
Arguments
This option accepts the following values:
- NO_RESERVATION: no reservation is consumed when your model is deployed to a Vertex AI endpoint. Specifying NO_RESERVATION has the same effect as not specifying a reservation affinity.
- ANY_RESERVATION: the Vertex AI model deployment consumes virtual machines (VMs) from Compute Engine reservations that are in the current project or that are shared with the project, and that are configured for automatic consumption. Only VMs that meet the following qualifications are used:
  - They use the machine type specified by the MACHINE_TYPE value.
  - If the BigQuery dataset in which you are creating the remote model is in a single region, the reservation must be in the same region. If the dataset is in the US multiregion, the reservation must be in the us-central1 region. If the dataset is in the EU multiregion, the reservation must be in the europe-west4 region.
  If there isn't enough capacity in the available reservations, or if no suitable reservations are found, the system provisions on-demand Compute Engine VMs to meet the resource requirements.
- SPECIFIC_RESERVATION: the Vertex AI model deployment consumes VMs only from the reservation that you specify in the RESERVATION_AFFINITY_VALUES value. This reservation must be configured for specifically targeted consumption. Deployment fails if the specified reservation doesn't have sufficient capacity.
RESERVATION_AFFINITY_KEY
Syntax
RESERVATION_AFFINITY_KEY = 'compute.googleapis.com/reservation-name'
Description
The key for a Compute Engine reservation. You must specify this option when the RESERVATION_AFFINITY_TYPE value is SPECIFIC_RESERVATION.
Arguments
The string compute.googleapis.com/reservation-name.
RESERVATION_AFFINITY_VALUES
Syntax
RESERVATION_AFFINITY_VALUES = reservation_affinity_values
Description
The full resource name of the Compute Engine reservation, in the following format:
projects/myproject/zones/reservation_zone/reservations/reservation_name
You can get the reservation name and zone from the Reservations page of the Google Cloud console. For more information, see View reservations.
You must specify this option when the RESERVATION_AFFINITY_TYPE value is SPECIFIC_RESERVATION.
Arguments
An ARRAY<STRING> value that specifies the resource name. For example, RESERVATION_AFFINITY_VALUES = ['projects/myProject/zones/us-central1-a/reservations/myReservationName'].
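Putting the three reservation options together, a SPECIFIC_RESERVATION deployment might look like the following sketch. The project, zone, reservation, and model names are placeholders.

```sql
-- Sketch: deploy the open model using one specific Compute Engine reservation.
-- All resource names here are illustrative placeholders.
CREATE OR REPLACE MODEL `myproject.mydataset.my_reserved_model`
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  MODEL_GARDEN_MODEL_NAME = 'publishers/openai/models/gpt-oss@gpt-oss-20b',
  RESERVATION_AFFINITY_TYPE = 'SPECIFIC_RESERVATION',
  RESERVATION_AFFINITY_KEY = 'compute.googleapis.com/reservation-name',
  RESERVATION_AFFINITY_VALUES = ['projects/myproject/zones/us-central1-a/reservations/my_reservation']
);
```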
ENDPOINT_IDLE_TTL
Syntax
ENDPOINT_IDLE_TTL = endpoint_idle_ttl
Description
The duration of inactivity after which the open model is automatically undeployed from the Vertex AI endpoint.
Model inactivity is defined as the amount of time that has passed since any of the following operations were performed on the model:
- Running the CREATE MODEL statement.
- Running the ALTER MODEL statement with the DEPLOY_MODEL argument set to TRUE.
- Sending an inference request to the model endpoint. For example, by running the ML.GENERATE_EMBEDDING or ML.GENERATE_TEXT function.
Each of these operations resets the inactivity timer to zero. The reset is triggered at the start of the BigQuery job that performs the operation.
After the model is undeployed, inference requests sent to the model return an error. The BigQuery model object remains unchanged, including model metadata. To use the model for inference again, you must redeploy it by running the ALTER MODEL statement on the model and setting the DEPLOY_MODEL option to TRUE.
Arguments
An INTERVAL value. Specify an interval literal value between 390 minutes (6.5 hours) and 7 days. For example, specify INTERVAL 8 HOUR to have the model undeployed after 8 hours of idleness. The default value is 390 minutes (6.5 hours).
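For example, the following sketch (the model name and model ID are placeholders) undeploys the model after 12 hours of inactivity:

```sql
-- Sketch: automatically undeploy the model after 12 idle hours.
-- The model and dataset names are illustrative placeholders.
CREATE OR REPLACE MODEL `myproject.mydataset.my_ttl_model`
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  HUGGING_FACE_MODEL_ID = 'deepseek-ai/DeepSeek-R1',
  ENDPOINT_IDLE_TTL = INTERVAL 12 HOUR
);
```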
Manually deployed
{CREATE MODEL | CREATE MODEL IF NOT EXISTS | CREATE OR REPLACE MODEL}
`project_id.dataset.model_name`
REMOTE WITH CONNECTION {DEFAULT | `project_id.region.connection_id`}
OPTIONS (
  ENDPOINT = 'vertex_ai_endpoint'
);
CREATE MODEL
Creates and trains a new model in the specified dataset. If the model name already exists, CREATE MODEL returns an error.
CREATE MODEL IF NOT EXISTS
Creates and trains a new model only if the model doesn't exist in the specified dataset.
CREATE OR REPLACE MODEL
Creates and trains a model and replaces an existing model with the same name in the specified dataset.
model_name
The name of the model you're creating or replacing. The model name must be unique in the dataset: no other model or table can have the same name. The model name must follow the same naming rules as a BigQuery table. A model name can:
- Contain up to 1,024 characters
- Contain letters (upper or lower case), numbers, and underscores
model_name is case-sensitive.
If you don't have a default project configured, then you must prepend the project ID to the model name in the following format, including backticks:
`[PROJECT_ID].[DATASET].[MODEL]`
For example, `myproject.mydataset.mymodel`.
REMOTE WITH CONNECTION
Syntax
`[PROJECT_ID].[LOCATION].[CONNECTION_ID]`
BigQuery uses a Cloud resource connection to interact with the Vertex AI endpoint.
The connection elements are as follows:
- PROJECT_ID: the project ID of the project that contains the connection.
- LOCATION: the location used by the connection. The connection must be in the same location as the dataset that contains the model.
- CONNECTION_ID: the connection ID, for example myconnection. To find your connection ID, view the connection details in the Google Cloud console. The connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example projects/myproject/locations/connection_location/connections/myconnection.

To use a default connection, specify DEFAULT instead of the connection string containing PROJECT_ID.LOCATION.CONNECTION_ID.
You must grant the Vertex AI User role to the connection's service account in the project where you create the model. For more information, see Grant or revoke a single IAM role.
Example
`myproject.us.my_connection`
ENDPOINT
Syntax
ENDPOINT = 'vertex_ai_endpoint'
Description
The Vertex AI endpoint of the deployed open model. The endpoint must be a shared public endpoint. Dedicated public endpoints, Private Service Connect endpoints, and private endpoints aren't supported.
Arguments
A STRING value that specifies the shared public endpoint of a model deployed to Vertex AI, in the format https://location-aiplatform.googleapis.com/v1/projects/project/locations/location/endpoints/endpoint_id.
The following code shows an example endpoint:
ENDPOINT = 'https://us-central1-aiplatform.googleapis.com/v1/projects/myproject/locations/us-central1/endpoints/1234'
Supported open models
You can create a remote model over a deployed open model from either the Vertex AI Model Garden or Hugging Face.
Supported Vertex AI Model Garden models
The following open text generation models in the Vertex AI Model Garden are supported:
- Gemma 3
- Gemma 2
- CodeGemma
- TxGemma
- DeepSeek R1
- DeepSeek V3
- Llama 4
- Llama 3.3
- Llama 3.2
- Llama 3.1
- Llama Guard
- Llama 3
- Llama 2
- Llama 2 (Quantized)
- Code Llama
- Mistral Self-host (7B & Nemo)
- Mixtral
- Falcon-instruct (PEFT)
- Phi-4
- Phi-3
- QwQ
- Qwen2
- Vicuna
The following open embedding generation models in the Vertex AI Model Garden are supported:
Supported Hugging Face models
BigQuery ML supports Hugging Face models that use the Text Embeddings Inference API or the Text Generation Inference API and can be deployed to Vertex AI.
If you use the Vertex AI API to deploy the open model as a Vertex AI endpoint, you don't have to check whether the model has been verified by Vertex AI. Otherwise, you must verify that the model is supported before deploying it.
To find supported Hugging Face models, do the following:
- On Hugging Face, open the list of Text Embeddings Inference or Text Generation Inference models, depending on the type of model that you want to create.
- Locate a model that you are interested in and copy the model ID.
- Open the Vertex AI Model Garden.
- Click Deploy from Hugging Face.
- On the Deploy from Hugging Face pane, paste the model ID into the Hugging Face URL field.
- Press Enter. A green check circle is shown if the model can be deployed in Vertex AI Model Garden.
Deploy open models
If you are creating a remote model over a supported open model, you can automatically deploy the open model at the same time that you create the remote model by specifying the Vertex AI Model Garden or Hugging Face model ID in the CREATE MODEL statement. Alternatively, you can manually deploy the open model first, and then use that open model with the remote model by specifying the model endpoint in the CREATE MODEL statement.
Automatically deployed models
If you choose to automatically deploy the open model, the service uses the credentials of the connection that you specify in the CREATE MODEL statement to deploy the open model to a Vertex AI shared public endpoint. The Vertex AI endpoint is created in the same project in which you create the remote model. The Vertex AI resource IDs for the BigQuery-managed model and endpoint begin with bq-managed-. The location of the endpoint is determined as follows:
- If the BigQuery dataset in which you are creating the remote model is in a single region, the endpoint is created in the same region.
- If the dataset is in the US multiregion, the endpoint is created in the us-central1 region.
- If the dataset is in the EU multiregion, the endpoint is created in the europe-west4 region.
Automatically deployed models offer the following benefits:
- Automatic Vertex AI resource management
- Reserve open model resources by using Compute Engine reservations
- Automatic or immediate open model undeployment to save costs
Automatic Vertex AI resource management
When you automatically deploy an open model, BigQuery manages the open model's Vertex AI resources for you. If you alter or delete the remote model in BigQuery, the related open model in Vertex AI is altered or deleted as well.
Compute Engine reservation affinity
You can use Compute Engine reservations with open models that you automatically deploy to Vertex AI. Compute Engine reservations provide assured virtual machine (VM) availability for the open model when serving predictions. For more information, see Use reservations with inference.
Use the RESERVATION_AFFINITY_TYPE, RESERVATION_AFFINITY_KEY, and RESERVATION_AFFINITY_VALUES options to specify Compute Engine reservation affinity configurations for the open model.
Automatic or immediate open model undeployment
By default, BigQuery automatically undeploys the model after a specified period of idleness. You can specify how long the model can remain idle before undeployment by setting the ENDPOINT_IDLE_TTL option in the CREATE MODEL or ALTER MODEL statement. This helps you save costs by undeploying the model when it isn't being used.
Alternatively, you can use the ALTER MODEL statement to manually undeploy an open model from a Vertex AI endpoint in order to immediately stop billing. To manually undeploy an open model, set the DEPLOY_MODEL argument of the ALTER MODEL statement to FALSE.
After the model is undeployed, inference requests sent to the model return an error. The BigQuery model object remains unchanged, including model metadata. To use the model for inference again, you must redeploy it by running the ALTER MODEL statement on the model and setting the DEPLOY_MODEL option to TRUE.
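As a sketch, assuming the DEPLOY_MODEL option is set through ALTER MODEL ... SET OPTIONS, undeploying and later redeploying looks like the following (the model name is a placeholder):

```sql
-- Sketch: manually undeploy the model to stop Vertex AI billing immediately.
ALTER MODEL `myproject.mydataset.my_model`
SET OPTIONS (DEPLOY_MODEL = FALSE);

-- Later, redeploy the model before sending inference requests again.
ALTER MODEL `myproject.mydataset.my_model`
SET OPTIONS (DEPLOY_MODEL = TRUE);
```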
Manually deploy Vertex AI Model Garden models
To deploy an open model from the Model Garden, do the following:
Go to Model Garden.
Locate the model's model card in the Model Garden and click it.
Use one of the deployment options on the model card. All models have a Deploy option for deploying directly, and an Open Notebook option for deploying the model by using a Colab Enterprise notebook. Some models also offer a Fine-Tune option for deploying a fine-tuned version of the model.
Follow the workflow of the deployment option that you have chosen. You must select Public (shared endpoint) as the value for the Endpoint access field in the deployment workflow. For more information, see Use models in Model Garden.
Manually deploy Hugging Face models
To deploy a Hugging Face text embedding (Preview) or text generation model, do the following:
UI
- On Hugging Face, open the list of Text Embeddings Inference (Preview) or Text Generation Inference models, depending on the type of model that you want to create.
- Locate a model that you are interested in and copy the model ID.
- Open the Vertex AI Model Garden.
- Click Deploy from Hugging Face.
- On the Deploy from Hugging Face pane, paste the model ID into the Hugging Face URL field.
- Press Enter.
- Select region and machine specifications appropriate to your use case.
- For Endpoint access, select Public (Shared endpoint).
- Click Deploy.
Notebook
- On Hugging Face, open the list of Text Embeddings Inference (Preview) or Text Generation Inference models, depending on the type of model that you want to create.
- Locate a model that you are interested in and copy the model ID.
- Open the Vertex AI Model Garden.
- Click Deploy from Hugging Face.
- On the Deploy from Hugging Face pane, paste the model ID into the Hugging Face URL field.
- Press Enter.
- Select Open Notebook.
- Click the name of the notebook that you want to use.
To stop billing for a model, undeploy the model.
As an alternative to using the UI or a notebook, you can also use the Vertex AI API to deploy the open model as a Vertex AI endpoint.
Locations
For information about supported locations, see Locations for remote models.
Limitations
- You can only use open models to process text data. Multimodal data isn't supported.
- Canceling a model creation job that includes open model deployment doesn't cause the allocated Vertex AI resources to be cleaned up, because Vertex AI doesn't support canceling in-progress model deployments. Instead of canceling the job, wait for it to complete, and then replace, alter, or delete the BigQuery ML model to release Vertex AI resources.
- Don't directly modify the Vertex AI resources for an open model that BigQuery manages in Vertex AI. Doing so can lead to unexpected behavior. For example, manually undeploying a model while it is processing an inference job can lead to the whole inference query failing.
- Time travel isn't supported for automatically deployed open models. The Vertex AI resources associated with the model aren't recovered.
- If you want to delete a BigQuery dataset that contains a BigQuery ML model with BigQuery-managed resources in Vertex AI, you must delete the BigQuery ML model from the dataset first. Otherwise, the Vertex AI resources remain active, and you must go to Vertex AI to manually delete them.
Examples
The following examples show how to create a remote model over an open model.
Create a remote model over a deployed open model
The following example creates a BigQuery ML remote model over a model deployed to a Vertex AI endpoint:
CREATE MODEL `project_id.mydataset.mymodel`
REMOTE WITH CONNECTION `myproject.us.test_connection`
OPTIONS (
  ENDPOINT = 'https://us-central1-aiplatform.googleapis.com/v1/projects/myproject/locations/us-central1/endpoints/1234'
);
Create a Vertex AI Model Garden model and run text generation
The following example creates a BigQuery-managed Vertex AI model over the gpt-oss@gpt-oss-20b model from Vertex AI Model Garden:

CREATE MODEL `project_id.mydataset.my_model_garden_model`
REMOTE WITH CONNECTION DEFAULT
OPTIONS (model_garden_model_name = 'publishers/openai/models/gpt-oss@gpt-oss-20b');

SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `project_id.mydataset.my_model_garden_model`,
  (SELECT 'What is the purpose of dreams?' AS prompt));
Create a Hugging Face model and run embedding generation
The following example creates a BigQuery-managed Vertex AI model over the intfloat/multilingual-e5-small model from Hugging Face with ten machine replicas, and uses it for large-scale embedding generation:

CREATE MODEL `project_id.mydataset.my_hugging_face_model`
REMOTE WITH CONNECTION DEFAULT
OPTIONS (hugging_face_model_id = 'intfloat/multilingual-e5-small',
  max_replica_count = 10);

SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `project_id.mydataset.my_hugging_face_model`,
  TABLE `project_id.mydataset.customer_feedback`);
What's next
- Try generating text using your data.
- Try generating text embeddings using your data.
- Try generating text embeddings with an open model.
- Try generating video embeddings using your data.
- For more information about the supported SQL statements and functions for remote models that use HTTPS endpoints, see End-to-end user journey for each model.