Use open models through Model as a Service (MaaS)

This document describes how to use open models through Model as a Service (MaaS) on Vertex AI. MaaS provides serverless access to selected partner and open-source models, eliminating the need to provision or manage infrastructure.

Model Garden is a centralized library of AI and ML models from Google, Google Partners, and open models (open-weight and open-source), including MaaS models. Model Garden provides multiple ways to deploy available models on Vertex AI, including models from Hugging Face.

For more information about MaaS, see the partner models documentation.

Before you begin

To use MaaS models, you must enable the Vertex AI API in your Google Cloud project.

gcloud services enable aiplatform.googleapis.com

Enable the model's API

Before you can use a MaaS model, you must enable its API. To do this, go to the model's card in Model Garden and enable it from there. Note that some models available through MaaS are also available for self-deployment, and each offering has its own Model Garden model card; the MaaS card includes "API Service" in its name.
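MaaS model IDs follow a publisher/model pattern, such as meta/llama-3.3-70b-instruct-maas. If you manage several model IDs in your own tooling, a small helper can split an ID into its parts. Note that parse_maas_model_id is an illustrative name for this sketch, not part of the Google Gen AI SDK:

```python
def parse_maas_model_id(model_id: str) -> tuple[str, str]:
    """Split a "publisher/model-name" MaaS model ID into its two parts."""
    publisher, _, name = model_id.partition("/")
    if not name:
        raise ValueError(f"not a publisher/model ID: {model_id!r}")
    return publisher, name

# Example:
# parse_maas_model_id("meta/llama-3.3-70b-instruct-maas")
# returns ("meta", "llama-3.3-70b-instruct-maas")
```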

Call the model using the Google Gen AI SDK for Python

The following example calls the Llama 3.3 model using the Google Gen AI SDK for Python. Before you run the sample, replace PROJECT_ID with your Google Cloud project ID and LOCATION with a region that supports the model.

from google import genai
from google.genai import types

PROJECT_ID="PROJECT_ID"
LOCATION="LOCATION"
MODEL="meta/llama-3.3-70b-instruct-maas"  # Model ID from the model's "API Service" card in Model Garden

# Define the prompt to send to the model.
prompt = "What is the distance between earth and moon?"

# Initialize the Google Gen AI SDK client.
client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
)

# Prepare the content for the request.
contents: types.ContentListUnion = [
    types.Content(
        role="user",
        parts=[
            types.Part.from_text(text=prompt)
        ]
    )
]

# Configure generation parameters.
generate_content_config = types.GenerateContentConfig(
    temperature=0,
    top_p=0,
    max_output_tokens=4096,
)

try:
    # Send the prepared contents along with the generation config.
    response = client.models.generate_content(
        model=MODEL,
        contents=contents,
        config=generate_content_config,
    )
    # Print the model's text response.
    print(response.text)
except Exception as e:
    print(f"{MODEL} call failed due to {e}")
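Serverless MaaS endpoints can occasionally return transient errors (for example, quota or temporary availability errors), so production callers often retry with exponential backoff rather than failing on the first exception. The following is a minimal sketch of that pattern; call_with_retry is a hypothetical helper, not part of the Google Gen AI SDK:

```python
import time


def call_with_retry(fn, max_attempts=3, base_delay=1.0):
    """Call fn() and retry with exponential backoff if it raises.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, and so on
    between attempts, and re-raises the last error when attempts run out.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of retries; surface the last error.
            time.sleep(base_delay * 2 ** attempt)


# Usage with the sample above, for example:
# response = call_with_retry(
#     lambda: client.models.generate_content(
#         model=MODEL, contents=contents, config=generate_content_config
#     )
# )
```

In real code you would typically catch only retryable error types instead of the broad Exception used here for brevity.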

What's next