The Chat Completions API is an OpenAI-compatible endpoint, designed to make it easier to interface with Gemini on Vertex AI by using the OpenAI libraries for Python and REST. If you're already using the OpenAI libraries, you can use this API as a low-cost way to switch between calling OpenAI models and Vertex AI-hosted models to compare output, cost, and scalability, without changing your existing code. If you aren't already using the OpenAI libraries, we recommend that you use the Google Gen AI SDK. To migrate your existing OpenAI SDK code to the Google Gen AI SDK, see Migrate from OpenAI SDK to Google Gen AI SDK.
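As a sketch of what "without changing your existing code" means in practice, the snippet below builds the Vertex AI base URL that the OpenAI Python SDK can be pointed at. The project ID, location, and model name are placeholders, and the commented client usage assumes an access token obtained separately (for example, from Application Default Credentials).

```python
# Sketch: pointing the OpenAI Python SDK at Vertex AI's
# OpenAI-compatible endpoint. PROJECT_ID, LOCATION, and the
# model name below are placeholder assumptions.

def chat_completions_base_url(project_id: str, location: str) -> str:
    """Build the Vertex AI OpenAI-compatible base URL."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1"
        f"/projects/{project_id}/locations/{location}/endpoints/openapi"
    )

# With the OpenAI SDK, the only changes to existing code are the
# base_url, the api_key (a short-lived access token), and the model:
#
#   from openai import OpenAI
#   client = OpenAI(
#       base_url=chat_completions_base_url("my-project", "us-central1"),
#       api_key=access_token,  # e.g. obtained via google.auth
#   )
#   resp = client.chat.completions.create(
#       model="google/gemini-2.5-flash",
#       messages=[{"role": "user", "content": "Why is the sky blue?"}],
#   )

print(chat_completions_base_url("my-project", "us-central1"))
```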
Supported models
The Chat Completions API supports both Gemini models and select self-deployed models from Model Garden.
Gemini models
The following models provide support for the Chat Completions API:
- Gemini 3.1 Flash-Lite
- Gemini 3.1 Pro
- Gemini 3 Flash
- Gemini 3 Pro
- Gemini 2.5 Pro
- Gemini 2.5 Flash
- Gemini 2.0 Flash
- Gemini 2.0 Flash-Lite
Self-deployed models from Model Garden
The Hugging Face Text Generation Inference (HF TGI) and Vertex AI Model Garden prebuilt vLLM containers support the Chat Completions API. However, not every model deployed to these containers supports it. The following table lists the most popular supported models for each container:
| HF TGI | vLLM |
|---|---|
Supported parameters
For Google models, the Chat Completions API supports the following OpenAI parameters. For a description of each parameter, see OpenAI's documentation on Creating chat completions. Parameter support for third-party models varies by model. To see which parameters are supported, consult the model's documentation.
| Parameter | Notes |
|---|---|
| messages | |
| model | |
| detail | For models older than Gemini 3, the detail field must be consistent across all messages and contents (it is request-level). For Gemini 3 and later, it corresponds to a part-level `media_resolution`. For more information, see Media Resolution. |
| max_completion_tokens | Alias for max_tokens. |
| modalities | Supports audio, image, and text. |
| max_tokens | |
| n | |
| frequency_penalty | |
| presence_penalty | |
| reasoning_effort | Configures how much time and how many tokens are used on a response. Only one of reasoning_effort or extra_body.google.thinking_config may be specified. |
| response_format | |
| seed | Corresponds to GenerationConfig.seed. |
| stop | |
| stream | |
| temperature | |
| top_p | |
| tools | |
| tool_choice | |
| web_search_options | Corresponds to the GoogleSearch tool. No sub-options are supported. |
| function_call | This field is deprecated, but supported for backward compatibility. |
| functions | This field is deprecated, but supported for backward compatibility. |
If you pass any unsupported parameter, it is ignored.
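To make the table concrete, here is a hedged sketch of a request body that combines several of the supported parameters above. The model name is a placeholder; parameter semantics follow the table (for example, max_completion_tokens as an alias for max_tokens, seed mapping to GenerationConfig.seed).

```python
# Sketch of a Chat Completions request body using supported
# parameters from the table above. The model name is a placeholder.
request = {
    "model": "google/gemini-2.5-flash",
    "messages": [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Name three prime numbers."},
    ],
    "max_completion_tokens": 256,  # alias for max_tokens
    "temperature": 0.2,
    "top_p": 0.95,
    "seed": 42,                    # maps to GenerationConfig.seed
    "stop": ["\n\n"],
    "stream": False,
}
# Any parameter outside the supported set would simply be ignored.
print(sorted(request))
```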
Multimodal input parameters
The Chat Completions API supports select multimodal inputs.
| Parameter | Notes |
|---|---|
| input_audio | |
| image_url | |
In general, the data parameter can be a URI or a combination of MIME type and base64-encoded bytes in the form "data:<MIME-TYPE>;base64,<BASE64-ENCODED-BYTES>".
For a full list of MIME types, see GenerateContent.
For more information about OpenAI's base64 encoding, see their documentation.
For usage, see our multimodal input examples.
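The following is a minimal sketch of an image_url content part using the "data:<MIME-TYPE>;base64,…" form described above. The byte string is a placeholder, not a valid image; it only illustrates the encoding step.

```python
import base64

# Sketch: building a multimodal message with an image_url part.
# The bytes below are placeholder data, not a decodable PNG.
fake_png = b"\x89PNG\r\n\x1a\n"
data_uri = "data:image/png;base64," + base64.b64encode(fake_png).decode()

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": data_uri}},
    ],
}
# The data URI carries the MIME type before the comma.
print(data_uri.split(",")[0])
```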
Gemini-specific parameters
Gemini supports several features that aren't available in OpenAI models.
These features can still be passed in as parameters, but they must be nested within an
extra_content or extra_body field or they are ignored.
extra_body features
Include a google field to contain any Gemini-specific
extra_body features.
{
...,
"extra_body": {
"google": {
...,
// Add extra_body features here.
}
}
}
| Parameter | Notes |
|---|---|
| safety_settings | Corresponds to the Gemini SafetySetting. |
| cached_content | Corresponds to the Gemini generateContent.cached_content field. |
| thinking_config | Corresponds to the Gemini GenerationConfig.ThinkingConfig. |
| thought_tag_marker | Used to separate a model's thoughts from its responses for models with Thinking available. If not specified, no tags are returned around the model's thoughts. If present, subsequent queries strip the thought tags and mark the thoughts appropriately, which preserves the appropriate context for those queries. |
| stream_function_call_arguments | Streams function call arguments back as segments of JSON. For more information, see Streaming function call arguments. |
| tools | Specify tools similar to `GenerateContent`. For more information, see Tool. |
| media_resolution | Specify a request-level media resolution similar to `GenerateContent`. For more information, see MediaResolution. |
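As a hedged sketch of the extra_body pattern above, the request below nests thinking_config and thought_tag_marker under the google field. The field names inside thinking_config follow the Gemini GenerationConfig.ThinkingConfig; treat the exact keys and values as assumptions, and the model name as a placeholder.

```python
# Sketch: passing Gemini-specific features through extra_body.google.
# Keys inside thinking_config follow GenerationConfig.ThinkingConfig;
# the model name and values are placeholder assumptions.
request = {
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Plan a 3-step proof."}],
    "extra_body": {
        "google": {
            "thinking_config": {
                "include_thoughts": True,  # return the model's thoughts
            },
            "thought_tag_marker": "think",  # tag thoughts in the output
        }
    },
}
# With the OpenAI SDK, the same dict is passed via the create() call's
# extra_body keyword:
#   client.chat.completions.create(..., extra_body=request["extra_body"])
print(list(request["extra_body"]["google"]))
```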
extra_content features
extra_content lets you specify Gemini-specific content that would otherwise be ignored.
Include a google field to contain any Gemini-specific
extra_content features.
{
...,
"extra_content": {
"google": {
...,
// Add extra_content features here.
}
}
}
| Parameter | Notes |
|---|---|
| thought | Explicitly marks whether a part is a thought and takes precedence over thought_tag_marker. It helps distinguish between different steps in a thought process, especially in tool-use scenarios where intermediate steps might be mistaken for final answers. By tagging specific parts of the input as thoughts, you can guide the model to treat them as internal reasoning rather than user-facing responses. |
| thought_signature | A bytes field that provides a thought signature to validate against thoughts returned by the model. This field is distinct from thought, which is a boolean field. For more information, see Thought signatures. |
| parts | Specific to a tool message; passes multimodal function response parts back to the model. For more information, see FunctionResponsePart and Multimodal function response. |
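The extra_content pattern can be sketched as follows: a content part carrying the model's intermediate reasoning is tagged with the boolean thought field so it isn't mistaken for a user-facing answer. Treat the exact nesting as an assumption based on the table above.

```python
# Sketch: marking one part of an assistant message as internal
# reasoning via extra_content.google.thought, which takes precedence
# over thought_tag_marker. Nesting is an assumption from the table.
thought_part = {
    "type": "text",
    "text": "First factor n, then check each prime divisor.",
    "extra_content": {"google": {"thought": True}},
}
answer_part = {"type": "text", "text": "n is composite."}

message = {"role": "assistant", "content": [thought_part, answer_part]}
print(message["content"][0]["extra_content"]["google"]["thought"])
```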
What's next
- Learn more about authentication and credentialing with the OpenAI-compatible syntax.
- See examples of calling the Chat Completions API with the OpenAI-compatible syntax.
- See examples of calling the Inference API with the OpenAI-compatible syntax.
- See examples of calling the Function Calling API with OpenAI-compatible syntax.
- Learn more about the Gemini API.
- Learn more about migrating from Azure OpenAI to the Gemini API.
- To migrate your existing OpenAI SDK code to use the Google Gen AI SDK, see Migrate from OpenAI SDK to Google Gen AI SDK.