OpenAI models

OpenAI models are available for use as managed APIs and self-deployed models on Vertex AI. You can stream responses to reduce perceived latency for end users. A streamed response uses server-sent events (SSE) to deliver the response incrementally.
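As a sketch of how a client consumes such a stream, the helper below parses server-sent event lines of the form `data: {...}` and concatenates the content deltas. The payload shape follows the OpenAI-compatible chat-completions streaming format; the exact field names are assumptions, not taken from this page.

```python
import json

def collect_stream(sse_lines):
    """Assemble a full reply from OpenAI-style SSE chunks.

    Each event line looks like 'data: {json}', and the stream
    conventionally ends with the sentinel 'data: [DONE]'.
    """
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip comments and keep-alive blank lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break
        chunk = json.loads(payload)
        # Each chunk carries an incremental piece of the answer.
        delta = chunk["choices"][0]["delta"].get("content", "")
        parts.append(delta)
    return "".join(parts)

# Two synthetic chunks followed by the end-of-stream sentinel:
events = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    'data: [DONE]',
]
print(collect_stream(events))  # Hello, world
```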

gpt-oss 120B

OpenAI gpt-oss 120B is a 120B-parameter open-weight language model released under the Apache 2.0 license. It is well suited for reasoning and function-calling use cases.

The 120B model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running on a single 80 GB GPU.

Go to the gpt-oss 120B model card

gpt-oss 20B

OpenAI gpt-oss 20B is a 20B-parameter open-weight language model released under the Apache 2.0 license. It is well suited for reasoning and function-calling use cases. The model is optimized for deployment on consumer hardware.

The 20B model delivers results similar to OpenAI o3-mini on common benchmarks and can run on edge devices with 16 GB of memory, making it well suited for on-device use cases, local inference, and rapid iteration without costly infrastructure.

Go to the gpt-oss 20B model card

Use OpenAI models

For managed models, you can use curl commands to send requests to the Vertex AI endpoint using the following model names:

  • For gpt-oss 120B, use gpt-oss-120b-maas
  • For gpt-oss 20B, use gpt-oss-20b-maas

To learn how to make streaming and non-streaming calls to OpenAI models, see Call open model APIs.
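To illustrate the shape of such a request, the sketch below builds an endpoint URL and an OpenAI-style chat-completions body for a managed model. The URL pattern and the `openapi/chat/completions` path are assumptions based on general Vertex AI conventions, not taken from this page; verify both against the Call open model APIs guide before use.

```python
def chat_completions_url(project, region):
    # Hypothetical endpoint pattern for the OpenAI-compatible
    # chat-completions API on Vertex AI; confirm the exact path
    # in the "Call open model APIs" documentation.
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/"
        "endpoints/openapi/chat/completions"
    )

def chat_request(model, prompt, stream=False):
    # Request body in the OpenAI chat-completions format.
    # Set stream=True to receive server-sent events.
    return {
        "model": model,  # e.g. "gpt-oss-120b-maas"
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

url = chat_completions_url("my-project", "us-central1")
body = chat_request("gpt-oss-120b-maas", "Hello!")
```

The body would then be POSTed to `url` with an OAuth 2.0 access token in the `Authorization` header, for example via curl or any HTTP client.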

To use a self-deployed Vertex AI model:

  1. Navigate to the Model Garden console.
  2. Find the relevant Vertex AI model.
  3. Click Enable and complete the provided form to get the necessary commercial use licenses.

For more information about deploying and using partner models, see Deploy a partner model and make prediction requests.

OpenAI model region availability

OpenAI models are available in the following regions:

Model          Region        Context length (tokens)   Max output (tokens)
gpt-oss 120B   global        131,072                   131,072
gpt-oss 120B   us-central1   131,072                   131,072
gpt-oss 20B    us-central1   131,072                   32,768
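The limits above can be encoded as a small lookup table. The helper below is illustrative only (the model and region names come from this page; the validation logic is an assumption about how a client might guard its requests):

```python
# Published limits, in tokens, from the region table above.
MODEL_LIMITS = {
    ("gpt-oss-120b-maas", "global"):      {"context": 131_072, "max_output": 131_072},
    ("gpt-oss-120b-maas", "us-central1"): {"context": 131_072, "max_output": 131_072},
    ("gpt-oss-20b-maas",  "us-central1"): {"context": 131_072, "max_output": 32_768},
}

def validate_request(model, region, max_tokens):
    """Raise ValueError if the model/region pair is unavailable or
    the requested output budget exceeds the published maximum."""
    limits = MODEL_LIMITS.get((model, region))
    if limits is None:
        raise ValueError(f"{model} is not available in {region}")
    if max_tokens > limits["max_output"]:
        raise ValueError(
            f"max_tokens={max_tokens} exceeds the "
            f"{limits['max_output']}-token output limit"
        )

validate_request("gpt-oss-20b-maas", "us-central1", 4096)  # OK
```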

What's next

Learn how to Call open model APIs.