OpenAI models are available on Vertex AI as managed APIs and as self-deployed models. To reduce perceived latency for end users, you can stream responses. A streamed response uses server-sent events (SSE) to return output incrementally.
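As a minimal sketch of how SSE streaming works on the client side, the loop below parses the `data:` lines that OpenAI-compatible APIs emit and stops at the `[DONE]` sentinel. The chunk shape (`choices[0].delta.content`) is an assumption based on the OpenAI-compatible chat completions format, not a value confirmed by this page.

```python
# Minimal sketch of consuming an SSE stream. The "data: {json}" line format
# and the [DONE] sentinel are assumptions based on OpenAI-compatible APIs.
import json

def parse_sse_chunks(lines):
    """Yield the JSON payload of each SSE data line, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":  # sentinel that ends the stream
            break
        yield json.loads(payload)

# Example with a simulated stream instead of a live HTTP response:
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_chunks(stream))
print(text)  # prints "Hello"
```

In a real client, `lines` would be the line iterator of an authenticated HTTP response with `stream=True` in the request body.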
gpt-oss 120B
OpenAI gpt-oss 120B is a 120B open-weight language model released under the Apache 2.0 license. It is well-suited for reasoning and function calling use cases. The model is optimized for deployment on consumer hardware.
The 120B model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running on a single 80GB GPU.
Go to the gpt-oss 120B model card
gpt-oss 20B
OpenAI gpt-oss 20B is a 20B open-weight language model released under the Apache 2.0 license. It is well-suited for reasoning and function calling use cases. The model is optimized for deployment on consumer hardware.
The 20B model delivers similar results to OpenAI o3-mini on common benchmarks and can run on edge devices with 16GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure.
Go to the gpt-oss 20B model card
Use OpenAI models
For managed models, you can use curl commands to send requests to the Vertex AI endpoint using the following model names:
- For gpt-oss 120B, use gpt-oss-120b-maas
- For gpt-oss 20B, use gpt-oss-20b-maas
To learn how to make streaming and non-streaming calls to OpenAI models, see Call open model APIs.
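For illustration, a request to the managed endpoint can be sketched as follows. The URL pattern, region, and payload fields are assumptions based on Vertex AI's OpenAI-compatible chat completions surface; see Call open model APIs for the exact values.

```python
# Hypothetical sketch of building a chat completions request for a managed
# gpt-oss model. Endpoint path, region, and payload shape are assumptions.
import json

def build_request(project_id: str, region: str, model: str, prompt: str):
    """Return the (url, body) pair for an OpenAI-compatible chat call."""
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{region}/"
        "endpoints/openapi/chat/completions"
    )
    body = {
        "model": model,
        "stream": True,  # set False for a non-streaming call
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(body)

url, body = build_request(
    "my-project", "us-central1", "gpt-oss-120b-maas", "Hello"
)
# Send with an authenticated POST, for example:
#   curl -X POST "$URL" \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     -H "Content-Type: application/json" \
#     -d "$BODY"
```

Setting `"stream": true` makes the endpoint return SSE chunks instead of a single JSON response.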
To use a self-deployed Vertex AI model:
- Navigate to the Model Garden console.
- Find the relevant Vertex AI model.
- Click Enable and complete the provided form to get the necessary commercial use licenses.
For more information about deploying and using partner models, see Deploy a partner model and make prediction requests.
OpenAI model region availability
OpenAI models are available in the following regions:
| Model | Regions |
|---|---|
| gpt-oss 120B | |
| gpt-oss 20B | |
What's next
Learn how to Call open model APIs.