The ML.GENERATE_TEXT function
This document describes the ML.GENERATE_TEXT function, which lets you perform
generative natural language tasks by using any combination of text and
unstructured data from BigQuery
standard tables, or
unstructured data from BigQuery
object tables.
The function works by sending requests to a BigQuery ML remote model that represents a Vertex AI model, and then returning that model's response. The following types of remote models are supported:
- Remote models over any of the generally available or preview Gemini models.
- Remote models over the following partner models: Claude, Llama, and Mistral AI models.
- Remote models over supported open models.
Several of the ML.GENERATE_TEXT function's arguments provide the
parameters that shape the Vertex AI model's response.
You can use the ML.GENERATE_TEXT function to perform tasks such as
classification, sentiment analysis, image captioning, and transcription.
Prompt design can strongly affect the responses returned by the Vertex AI model. For more information, see Introduction to prompting or Design multimodal prompts.
Input
The input you can provide to ML.GENERATE_TEXT varies depending on the
Vertex AI model that you reference from your remote model.
Input for Gemini models
When you use the Gemini models, you can use the following types of input:
- Text data from standard tables.
- ObjectRefRuntime values that are generated by the OBJ.GET_ACCESS_URL function. You can use ObjectRef values from standard tables as input to the OBJ.GET_ACCESS_URL function. (Preview) For one way to build this kind of prompt, see the sketch after the following requirements.
- Unstructured data from an object table.
When you analyze unstructured data, that data must meet the following requirements:
- Content must be in one of the supported formats that are described in the Gemini API model mimeType parameter.
- If you are analyzing a video, the maximum supported length is two minutes. If the video is longer than two minutes, ML.GENERATE_TEXT only returns results for the first two minutes.
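The following sketch shows one way to combine a text instruction with an ObjectRefRuntime value in a prompt. The mydataset.gemini_model remote model and the mydataset.report_images table with an ObjectRef column named image are assumptions for this example; adapt the names to your own resources.

SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,  -- assumed remote model over a Gemini model
  (
    -- Build a STRUCT prompt: an instruction string plus an ObjectRefRuntime
    -- value generated from the hypothetical `image` ObjectRef column.
    SELECT ('Summarize the content of this image in one sentence',
            OBJ.GET_ACCESS_URL(image, 'r')) AS prompt
    FROM `mydataset.report_images`
  ),
  STRUCT(TRUE AS flatten_json_output));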
Input for other models
For all other types of models, you can analyze text data from a standard table.
Syntax for standard tables
ML.GENERATE_TEXT syntax differs depending on the Vertex AI
model that your remote model references. Choose the option appropriate for your
use case.
Gemini
ML.GENERATE_TEXT(
MODEL `PROJECT_ID.DATASET.MODEL`,
{ TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) },
STRUCT(
[MAX_OUTPUT_TOKENS AS max_output_tokens]
[, TOP_P AS top_p]
[, TEMPERATURE AS temperature]
[, FLATTEN_JSON_OUTPUT AS flatten_json_output]
[, STOP_SEQUENCES AS stop_sequences]
[, GROUND_WITH_GOOGLE_SEARCH AS ground_with_google_search]
[, SAFETY_SETTINGS AS safety_settings]
[, REQUEST_TYPE AS request_type])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- TABLE: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use the QUERY_STATEMENT argument instead and provide a SELECT statement that includes an alias for an existing table column. An error occurs if no prompt column is available.
- QUERY_STATEMENT: the GoogleSQL query that generates the prompt data. The query must produce a column named prompt. Within the query, you can provide the prompt value in the following ways:
  - Specify a STRING value. For example, ('Write a poem about birds').
  - Specify a STRUCT value that contains one or more fields. You can use the following types of fields within the STRUCT value:

    | Field type | Description | Examples |
    |---|---|---|
    | STRING | A string literal, or the name of a STRING column. | String literal: 'Is Seattle a US city?' String column name: my_string_column |
    | ARRAY<STRING> | You can only use string literals in the array. | Array of string literals: ['Is ', 'Seattle', ' a US city'] |
    | ObjectRefRuntime | An ObjectRefRuntime value returned by the OBJ.GET_ACCESS_URL function. The OBJ.GET_ACCESS_URL function takes an ObjectRef value as input, which you can provide by either specifying the name of a column that contains ObjectRef values, or by constructing an ObjectRef value. ObjectRefRuntime values must have the access_url.read_url and details.gcs_metadata.content_type elements of the JSON value populated. | Function call with ObjectRef column: OBJ.GET_ACCESS_URL(my_objectref_column, 'r') Function call with constructed ObjectRef value: OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image.jpg', 'myconnection'), 'r') |
    | ARRAY<ObjectRefRuntime> | ObjectRefRuntime values returned from multiple calls to the OBJ.GET_ACCESS_URL function. The OBJ.GET_ACCESS_URL function takes an ObjectRef value as input, which you can provide by either specifying the name of a column that contains ObjectRef values, or by constructing an ObjectRef value. ObjectRefRuntime values must have the access_url.read_url and details.gcs_metadata.content_type elements of the JSON value populated. | Function calls with ObjectRef columns: [OBJ.GET_ACCESS_URL(my_objectref_column1, 'r'), OBJ.GET_ACCESS_URL(my_objectref_column2, 'r')] Function calls with constructed ObjectRef values: [OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image1.jpg', 'myconnection'), 'r'), OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image2.jpg', 'myconnection'), 'r')] |

    The function combines STRUCT fields similarly to a CONCAT operation and concatenates the fields in their specified order. The same is true for the elements of any arrays used within the struct. The following table shows some examples of STRUCT prompt values and how they are interpreted:

    | Struct field types | Struct value | Semantic equivalent |
    |---|---|---|
    | STRUCT<STRING> | ('Describe the city of Seattle') | 'Describe the city of Seattle' |
    | STRUCT<STRING, STRING, STRING> | ('Describe the city ', my_city_column, ' in 15 words') | 'Describe the city my_city_column_value in 15 words' |
    | STRUCT<STRING, ARRAY<STRING>> | ('Describe ', ['the city of', 'Seattle']) | 'Describe the city of Seattle' |
    | STRUCT<STRING, ObjectRefRuntime> | ('Describe this city', OBJ.GET_ACCESS_URL(image_objectref_column, 'r')) | 'Describe this city' image |
    | STRUCT<STRING, ObjectRefRuntime, ObjectRefRuntime> | ('If the city in the first image is within the country of the second image, provide a ten word description of the city', OBJ.GET_ACCESS_URL(city_image_objectref_column, 'r'), OBJ.GET_ACCESS_URL(country_image_objectref_column, 'r')) | 'If the city in the first image is within the country of the second image, provide a ten word description of the city' city_image country_image |
- MAX_OUTPUT_TOKENS: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 1024.
- TOP_P: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the TOP_P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the TOP_P value is 0.5, then the model selects either A or B as the next token by using the TEMPERATURE value and doesn't consider C.
- TEMPERATURE: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower TEMPERATURE values are good for prompts that require a more deterministic and less open-ended or creative response, while higher TEMPERATURE values can lead to more diverse or creative results. A TEMPERATURE value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.
- FLATTEN_JSON_OUTPUT: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- STOP_SEQUENCES: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- GROUND_WITH_GOOGLE_SEARCH: a BOOL value that determines whether the Vertex AI model uses Grounding with Google Search when generating responses. You can only use this argument with Gemini 1.5 Pro and Gemini 1.5 Flash models. Grounding lets the model use additional information from the internet when generating a response, in order to make model responses more specific and factual. When both flatten_json_output and this field are set to TRUE, an additional ml_generate_text_grounding_result column is included in the results, providing the sources that the model used to gather additional information. The default is FALSE.
- SAFETY_SETTINGS: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
  For more information, refer to the definition of safety category and blocking threshold.
- REQUEST_TYPE: a STRING value that specifies the type of inference request to send to the Gemini model. The request type determines what quota the request uses. Valid values are as follows:
  - DEDICATED: The ML.GENERATE_TEXT function only uses Provisioned Throughput quota. The ML.GENERATE_TEXT function returns the error Provisioned throughput is not purchased or is not active if Provisioned Throughput quota isn't available.
  - SHARED: The ML.GENERATE_TEXT function only uses dynamic shared quota (DSQ), even if you have purchased Provisioned Throughput quota.
  - UNSPECIFIED: The ML.GENERATE_TEXT function uses quota as follows:
    - If you haven't purchased Provisioned Throughput quota, the ML.GENERATE_TEXT function uses DSQ quota.
    - If you have purchased Provisioned Throughput quota, the ML.GENERATE_TEXT function uses the Provisioned Throughput quota first. If requests exceed the Provisioned Throughput quota, the overflow traffic uses DSQ quota.
  The default value is UNSPECIFIED.
Details
The model and input table must be in the same region.
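As a minimal sketch of how these arguments combine in practice, the following call assumes a hypothetical mydataset.gemini_model remote model and mydataset.prompt_table table, shortens the response, and routes the request to dynamic shared quota:

SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(
    256 AS max_output_tokens,      -- keep responses short
    0.2 AS temperature,            -- mostly deterministic output
    TRUE AS flatten_json_output,   -- return the generated text in a separate column
    'SHARED' AS request_type));    -- use dynamic shared quota only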
Claude
You must enable Claude models in Vertex AI before you can use them. For more information, see Enable a partner model.
ML.GENERATE_TEXT(
MODEL `PROJECT_ID.DATASET.MODEL`,
{ TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) },
STRUCT(
[MAX_OUTPUT_TOKENS AS max_output_tokens]
[, TOP_K AS top_k]
[, TOP_P AS top_p]
[, FLATTEN_JSON_OUTPUT AS flatten_json_output])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- TABLE: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use the QUERY_STATEMENT argument instead and provide a SELECT statement that includes an alias for an existing table column. An error occurs if no prompt column is available.
- QUERY_STATEMENT: the GoogleSQL query that generates the prompt data. The query must produce a column named prompt.
- MAX_OUTPUT_TOKENS: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,4096]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 1024.
- TOP_K: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value.
  A TOP_K value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a TOP_K value of 3 means that the next token is selected from among the three most probable tokens by using the TEMPERATURE value.
  For each token selection step, the TOP_K tokens with the highest probabilities are sampled. Then tokens are further filtered based on the TOP_P value, with the final token selected using temperature sampling.
- TOP_P: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the TOP_P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the TOP_P value is 0.5, then the model selects either A or B as the next token by using the TEMPERATURE value and doesn't consider C.
- FLATTEN_JSON_OUTPUT: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
Details
The model and input table must be in the same region.
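A minimal sketch of a Claude call that sets these sampling arguments; the mydataset.claude_model remote model and mydataset.prompt_table table are assumptions for the example:

SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.claude_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(
    512 AS max_output_tokens,
    20 AS top_k,    -- sample from the 20 most probable tokens
    0.8 AS top_p,   -- then keep tokens within the 0.8 probability mass
    TRUE AS flatten_json_output));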
Llama
You must enable Llama models in Vertex AI before you can use them. For more information, see Enable a partner model.
ML.GENERATE_TEXT(
MODEL `PROJECT_ID.DATASET.MODEL`,
{ TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) },
STRUCT(
[MAX_OUTPUT_TOKENS AS max_output_tokens]
[, TOP_P AS top_p]
[, TEMPERATURE AS temperature]
[, FLATTEN_JSON_OUTPUT AS flatten_json_output]
[, STOP_SEQUENCES AS stop_sequences])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- TABLE: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use the QUERY_STATEMENT argument instead and provide a SELECT statement that includes an alias for an existing table column. An error occurs if no prompt column is available.
- QUERY_STATEMENT: the GoogleSQL query that generates the prompt data. The query must produce a column named prompt.
- MAX_OUTPUT_TOKENS: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,4096]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 1024.
- TOP_P: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95. If you don't specify a value, the model determines an appropriate value.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the TOP_P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the TOP_P value is 0.5, then the model selects either A or B as the next token by using the TEMPERATURE value and doesn't consider C.
- TEMPERATURE: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower TEMPERATURE values are good for prompts that require a more deterministic and less open-ended or creative response, while higher TEMPERATURE values can lead to more diverse or creative results. A TEMPERATURE value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.
- FLATTEN_JSON_OUTPUT: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- STOP_SEQUENCES: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
Details
The model and input table must be in the same region.
Mistral AI
You must enable Mistral AI models in Vertex AI before you can use them. For more information, see Enable a partner model.
ML.GENERATE_TEXT(
MODEL `PROJECT_ID.DATASET.MODEL`,
{ TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) },
STRUCT(
[MAX_OUTPUT_TOKENS AS max_output_tokens]
[, TOP_P AS top_p]
[, TEMPERATURE AS temperature]
[, FLATTEN_JSON_OUTPUT AS flatten_json_output]
[, STOP_SEQUENCES AS stop_sequences])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- TABLE: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use the QUERY_STATEMENT argument instead and provide a SELECT statement that includes an alias for an existing table column. An error occurs if no prompt column is available.
- QUERY_STATEMENT: the GoogleSQL query that generates the prompt data. The query must produce a column named prompt.
- MAX_OUTPUT_TOKENS: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,4096]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 1024.
- TOP_P: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95. If you don't specify a value, the model determines an appropriate value.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the TOP_P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the TOP_P value is 0.5, then the model selects either A or B as the next token by using the TEMPERATURE value and doesn't consider C.
- TEMPERATURE: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower TEMPERATURE values are good for prompts that require a more deterministic and less open-ended or creative response, while higher TEMPERATURE values can lead to more diverse or creative results. A TEMPERATURE value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.
- FLATTEN_JSON_OUTPUT: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- STOP_SEQUENCES: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
Details
The model and input table must be in the same region.
Open models
ML.GENERATE_TEXT(
MODEL `PROJECT_ID.DATASET.MODEL`,
{ TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) },
STRUCT(
[MAX_OUTPUT_TOKENS AS max_output_tokens]
[, TOP_K AS top_k]
[, TOP_P AS top_p]
[, TEMPERATURE AS temperature]
[, FLATTEN_JSON_OUTPUT AS flatten_json_output])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- TABLE: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use the QUERY_STATEMENT argument instead and provide a SELECT statement that includes an alias for an existing table column. An error occurs if no prompt column is available.
- QUERY_STATEMENT: the GoogleSQL query that generates the prompt data. The query must produce a column named prompt.
- MAX_OUTPUT_TOKENS: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,4096]. Specify a lower value for shorter responses and a higher value for longer responses. If you don't specify a value, the model determines an appropriate value.
- TOP_K: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value.
  A TOP_K value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a TOP_K value of 3 means that the next token is selected from among the three most probable tokens by using the TEMPERATURE value.
  For each token selection step, the TOP_K tokens with the highest probabilities are sampled. Then tokens are further filtered based on the TOP_P value, with the final token selected using temperature sampling.
- TOP_P: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the TOP_P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the TOP_P value is 0.5, then the model selects either A or B as the next token by using the TEMPERATURE value and doesn't consider C.
- TEMPERATURE: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower TEMPERATURE values are good for prompts that require a more deterministic and less open-ended or creative response, while higher TEMPERATURE values can lead to more diverse or creative results. A TEMPERATURE value of 0 is deterministic, meaning that the highest probability response is always selected. If you don't specify a value, the model determines an appropriate value.
- FLATTEN_JSON_OUTPUT: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
Details
The model and input table must be in the same region.
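Because the Examples section later in this document doesn't include an open-model call, here is a hedged sketch of the general shape of one; the mydataset.open_llm_model remote model and mydataset.prompt_table table are assumptions for the example:

SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.open_llm_model`,  -- assumed remote model over an open model
  TABLE `mydataset.prompt_table`,
  STRUCT(
    256 AS max_output_tokens,
    40 AS top_k,
    0.95 AS top_p,
    0.2 AS temperature,
    TRUE AS flatten_json_output));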
Syntax for object tables
Use the following syntax to use ML.GENERATE_TEXT with Gemini
models and object table data.
ML.GENERATE_TEXT(
MODEL `PROJECT_ID.DATASET.MODEL`,
{ TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) },
STRUCT(
PROMPT AS prompt
    [, MAX_OUTPUT_TOKENS AS max_output_tokens]
[, TOP_P AS top_p]
[, TEMPERATURE AS temperature]
[, FLATTEN_JSON_OUTPUT AS flatten_json_output]
[, STOP_SEQUENCES AS stop_sequences]
[, SAFETY_SETTINGS AS safety_settings])
)
Arguments
ML.GENERATE_TEXT takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.
  You can confirm which model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- TABLE: the name of the object table that contains the content to analyze. For more information on what types of content you can analyze, see Input.
  The Cloud Storage bucket used by the input object table must be in the same project where you have created the model and where you are calling the ML.GENERATE_TEXT function.
- QUERY_STATEMENT: the GoogleSQL query that generates the image data. You can only specify WHERE and ORDER BY clauses in the query.
- PROMPT: a STRING value that contains the prompt to use to analyze the visual content. The prompt value must contain less than 16,000 tokens. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words.
- MAX_OUTPUT_TOKENS: an INT64 value that sets the maximum number of tokens that can be generated in the response. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 1024.
- TOP_P: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.
  Tokens are selected from the most to least probable until the sum of their probabilities equals the TOP_P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the TOP_P value is 0.5, then the model selects either A or B as the next token by using the TEMPERATURE value and doesn't consider C.
- TEMPERATURE: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower TEMPERATURE values are good for prompts that require a more deterministic and less open-ended or creative response, while higher TEMPERATURE values can lead to more diverse or creative results. A TEMPERATURE value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.
- FLATTEN_JSON_OUTPUT: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.
- STOP_SEQUENCES: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
- SAFETY_SETTINGS: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.
  Supported categories are as follows:
  - HARM_CATEGORY_HATE_SPEECH
  - HARM_CATEGORY_DANGEROUS_CONTENT
  - HARM_CATEGORY_HARASSMENT
  - HARM_CATEGORY_SEXUALLY_EXPLICIT
  Supported thresholds are as follows:
  - BLOCK_NONE (Restricted)
  - BLOCK_LOW_AND_ABOVE
  - BLOCK_MEDIUM_AND_ABOVE (Default)
  - BLOCK_ONLY_HIGH
  - HARM_BLOCK_THRESHOLD_UNSPECIFIED
For more information, refer to the definition of safety category and blocking threshold.
Details
The model and input table must be in the same region.
Output
ML.GENERATE_TEXT returns the input table plus the following columns:
Gemini API models
- ml_generate_text_result: This is the JSON response from the projects.locations.endpoints.generateContent call to the model. The generated text is in the text element. If you specified the safety_settings argument in the ML.GENERATE_TEXT function, the safety attributes are in the safety_ratings element. This column is returned when flatten_json_output is FALSE.
- ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
- ml_generate_text_rai_result: a STRING value that contains the generated safety ratings. This column is returned when flatten_json_output is TRUE and you have specified the safety_settings argument in the ML.GENERATE_TEXT function.
- ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
- ml_generate_text_grounding_result: a STRING value that contains a list of the grounding sources that the model used to gather additional information. This column is returned when both flatten_json_output and ground_with_google_search are TRUE.
Claude models
- ml_generate_text_result: This is the JSON response from the projects.locations.endpoints.rawPredict call to the model. The generated text is in the content element. This column is returned when flatten_json_output is FALSE.
- ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
- ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
Llama models
- ml_generate_text_result: This is the JSON response from the projects.locations.endpoints.rawPredict call to the model. The generated text is in the content element. This column is returned when flatten_json_output is FALSE.
- ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
- ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
Mistral AI models
- ml_generate_text_result: This is the JSON response from the projects.locations.endpoints.rawPredict call to the model. The generated text is in the content element. This column is returned when flatten_json_output is FALSE.
- ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
- ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
Open models
- ml_generate_text_result: This is the JSON response from the projects.locations.endpoints.predict call to the model. The generated text is in the predictions element. This column is returned when flatten_json_output is FALSE.
- ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
- ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
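As a hedged sketch of working with these columns, the following query assumes a hypothetical mydataset.gemini_model remote model and mydataset.prompt_table table, flattens the JSON response, and surfaces any per-row failures through ml_generate_text_status:

SELECT
  prompt,
  ml_generate_text_llm_result,
  ml_generate_text_status
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(TRUE AS flatten_json_output))
-- Sort failed rows (non-empty status) first; an empty status means success.
ORDER BY ml_generate_text_status DESC;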
Examples
Text analysis
Example 1
This example shows a request to a Claude model that provides a single prompt.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.claude_model`, (SELECT 'What is the purpose of dreams?' AS prompt));
Example 2
This example shows a request to a Gemini model that provides
prompt data from a table column named question that is aliased as
prompt.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.gemini_model`, (SELECT question AS prompt FROM `mydataset.prompt_table`));
Example 3
This example shows a request to a Gemini model that concatenates strings and a table column to provide the prompt data.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  (
    SELECT CONCAT(
      'Classify the sentiment of the following text as positive or negative. Text: ',
      input_column,
      'Sentiment:') AS prompt
    FROM `mydataset.input_table`
  ));
Example 4
This example shows a request to a Gemini model
that excludes model responses that contain
the strings Golf or football.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(
    .15 AS TEMPERATURE,
    TRUE AS flatten_json_output,
    ['Golf', 'football'] AS stop_sequences));
Example 5
This example shows a request to a Gemini model with the following characteristics:
- Provides prompt data from a table column that's named prompt.
- Flattens the JSON response into separate columns.
- Retrieves and returns public web data for response grounding.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(
    TRUE AS flatten_json_output,
    TRUE AS ground_with_google_search));
Example 6
This example shows a request to a Gemini model with the following characteristics:
- Provides prompt data from a table column that's named prompt.
- Returns a shorter generated text response.
- Filters out unsafe responses by using safety settings.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(
    75 AS max_output_tokens,
    [STRUCT('HARM_CATEGORY_HATE_SPEECH' AS category,
            'BLOCK_LOW_AND_ABOVE' AS threshold),
     STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category,
            'BLOCK_MEDIUM_AND_ABOVE' AS threshold)] AS safety_settings));
Visual content analysis
Example 1
This example adds product description information to a table by analyzing
the object data in an ObjectRef column named image:
UPDATE mydataset.products
SET image_description = (
  SELECT ml_generate_text_llm_result
  FROM ML.GENERATE_TEXT(
    MODEL `mydataset.gemini_model`,
    (
      SELECT ('Can you describe the following image?',
              OBJ.GET_ACCESS_URL(image, 'r')) AS prompt
    ),
    STRUCT(TRUE AS FLATTEN_JSON_OUTPUT))
)
WHERE image IS NOT NULL;
Example 2
This example analyzes visual content from an object table that's named
dogs and identifies the breed of dog contained in the content. The content
returned is filtered by the specified safety settings:
SELECT uri, ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.dog_identifier_model`,
  TABLE `mydataset.dogs`,
  STRUCT(
    'What is the breed of the dog?' AS PROMPT,
    .01 AS TEMPERATURE,
    TRUE AS FLATTEN_JSON_OUTPUT,
    [STRUCT('HARM_CATEGORY_HATE_SPEECH' AS category,
            'BLOCK_LOW_AND_ABOVE' AS threshold),
     STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category,
            'BLOCK_MEDIUM_AND_ABOVE' AS threshold)] AS safety_settings));
Audio content analysis
This example translates and transcribes audio content from an object table
that's named feedback:
SELECT uri, ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.audio_model`,
  TABLE `mydataset.feedback`,
  STRUCT(
    'What is the content of this audio clip, translated into Spanish?' AS PROMPT,
    .01 AS TEMPERATURE,
    TRUE AS FLATTEN_JSON_OUTPUT));
PDF content analysis
This example classifies PDF content from an object table
that's named documents:
SELECT uri, ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.classify_model`,
  TABLE `mydataset.documents`,
  STRUCT(
    'Classify this document using the following categories: legal, tax-related, real estate' AS PROMPT,
    .2 AS TEMPERATURE,
    TRUE AS FLATTEN_JSON_OUTPUT));
Use Vertex AI Provisioned Throughput
You can use
Vertex AI Provisioned Throughput
with the ML.GENERATE_TEXT function to provide consistent high throughput for
requests. The remote model that you reference in the ML.GENERATE_TEXT function
must use a
supported Gemini model
in order for you to use Provisioned Throughput.
To use Provisioned Throughput,
calculate your Provisioned Throughput requirements
and then
purchase Provisioned Throughput
quota before running the ML.GENERATE_TEXT function. When you purchase
Provisioned Throughput, do the following:
- For Model, select the same Gemini model as the one used by the remote model that you reference in the ML.GENERATE_TEXT function.
- For Region, select the same region as the dataset that contains the remote model that you reference in the ML.GENERATE_TEXT function, with the following exceptions:
  - If the dataset is in the US multi-region, select the us-central1 region.
  - If the dataset is in the EU multi-region, select the europe-west4 region.
After you submit the order, wait for the order to be approved and appear on the Orders page.
After you have purchased Provisioned Throughput quota, use the
request_type argument to determine how the ML.GENERATE_TEXT function uses
the quota.
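For example, after your order is active, a call that should only draw on Provisioned Throughput quota might look like the following sketch; the model and table names are assumptions for the example:

SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(
    TRUE AS flatten_json_output,
    'DEDICATED' AS request_type));  -- fail instead of falling back to shared quota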
Locations
ML.GENERATE_TEXT must run in the same
region or multi-region as the remote model that the
function references. See the following topics for more information:
- For Gemini model supported regions, see Google model endpoint locations. Gemini models are also available in the US and EU multi-regions.
- For Claude, Llama, and Mistral AI model supported regions, see Google Cloud partner model endpoint locations.
Quotas
See Vertex AI and Cloud AI service functions quotas and limits.
Known issues
This section contains information about known issues.
Resource exhausted errors
Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:
A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>
This issue occurs because BigQuery query jobs finish successfully
even if the function fails for some of the rows. The function fails when the
volume of API calls to the remote endpoint exceeds the quota limits for that
service. This issue occurs most often when you are running multiple parallel
batch queries. BigQuery retries these calls, but if the retries
fail, the resource exhausted error message is returned.
To iterate through inference calls until all rows are successfully processed,
you can use the
BigQuery remote inference SQL scripts
or the
BigQuery remote inference pipeline Dataform package.
To try the BigQuery ML remote inference SQL script, see
Handle quota errors by calling ML.GENERATE_TEXT iteratively.
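If you prefer to script the retry yourself, the following is a simplified, hedged sketch of the same idea rather than the linked script; it assumes a hypothetical mydataset.generated_results table for storing output and a prompt column that uniquely identifies each row:

-- First pass: store all results, including rows that failed with a
-- retryable RESOURCE EXHAUSTED error (non-empty ml_generate_text_status).
CREATE OR REPLACE TABLE `mydataset.generated_results` AS
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  TABLE `mydataset.prompt_table`,
  STRUCT(TRUE AS flatten_json_output));

-- Retry pass: regenerate only the failed rows and merge the new output back.
MERGE `mydataset.generated_results` AS results
USING (
  SELECT *
  FROM ML.GENERATE_TEXT(
    MODEL `mydataset.gemini_model`,
    (SELECT prompt FROM `mydataset.generated_results`
     WHERE ml_generate_text_status != ''),
    STRUCT(TRUE AS flatten_json_output))
) AS retries
ON results.prompt = retries.prompt
WHEN MATCHED THEN UPDATE SET
  ml_generate_text_llm_result = retries.ml_generate_text_llm_result,
  ml_generate_text_status = retries.ml_generate_text_status;

In practice, you would repeat the retry step until no rows have a non-empty status; the linked SQL script and Dataform package automate that loop.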
What's next
- Try a tutorial on generating text using a public dataset.
- Get step-by-step instructions on how to generate text using your own data.
- Get step-by-step instructions on how to tune an LLM and use it to generate text.
- For more information about using Vertex AI models to generate text and embeddings, see Generative AI overview.
- For more information about using Cloud AI APIs to perform AI tasks, see AI application overview.
- For more information about supported SQL statements and functions for generative AI models, see End-to-end user journeys for generative AI models.