Model Garden lets you self-deploy select partner models (preview). Self-deployed models aren't serverless: you must deploy them on Vertex AI before you can use them. These models deploy securely within your Google Cloud project and VPC network. For more information, see the self-deployed models documentation.
Purchase self-deployable partner models
To deploy a self-deployable partner model on Vertex AI, you must first purchase it through Google Cloud Marketplace. To purchase a model, do the following:
1. Go to Model Garden.
2. In Model collections, click Self-deployable partner models to filter the list of models.
3. Click the model card of the partner model that you want to purchase.
4. Click Contact sales.
5. Complete the form and submit your request.
After you submit the form, a Google Cloud sales representative contacts you to finalize the purchase.
Deploy models
After you purchase a self-deployable partner model, you can deploy it to a Vertex AI endpoint by using one-click deployment, which simplifies the process by pre-configuring the necessary settings.
You can perform one-click deployment using either the Google Cloud console or the Vertex AI SDK for Python.
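Before you deploy, you can also use the SDK to see which models are self-deployable and which machine configurations Model Garden has verified for a given model. The following is a minimal sketch that assumes a recent Vertex AI SDK for Python with the model_garden module (the same module used in the deployment sample later in this section); PROJECT_ID, LOCATION, and PARTNER_MODEL_ID are placeholders.

import vertexai
from vertexai import model_garden

vertexai.init(project="PROJECT_ID", location="LOCATION")

# List the model IDs that can be self-deployed from Model Garden.
for model_name in model_garden.list_deployable_models():
    print(model_name)

# Inspect the verified machine and container configurations for one model.
model = model_garden.OpenModel("PARTNER_MODEL_ID")
for option in model.list_deploy_options():
    print(option)

The configurations returned by list_deploy_options correspond to the settings that one-click deployment pre-fills for you.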
Console
To deploy a partner model in the Google Cloud console, do the following:
1. Go to Model Garden.
2. Locate and click the model card of the partner model that you want to use.
3. Click Deploy model.
4. Configure your deployment settings as prompted.
5. Click Deploy.
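Deployment can take some time to complete. To confirm that the endpoint was created, whether you deployed from the console or the SDK, you can list the endpoints in your project. The following is a minimal sketch, assuming the google-cloud-aiplatform package is installed; PROJECT_ID and LOCATION are placeholders.

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="LOCATION")

# Print every Vertex AI endpoint in the project and region,
# including the one created by the deployment above.
for endpoint in aiplatform.Endpoint.list():
    print(endpoint.display_name, endpoint.resource_name)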
Python
The following sample shows how to deploy a partner model using the Vertex AI SDK for Python. Replace the placeholder values with your specific information.
import vertexai
from vertexai import model_garden

vertexai.init(project="PROJECT_ID", location="LOCATION")

# Replace with the actual partner model ID from Model Garden.
model = model_garden.OpenModel("PARTNER_MODEL_ID")

endpoint = model.deploy(
    accept_eula=True,
    machine_type="MACHINE_TYPE",  # e.g., "a3-ultragpu-8g"
    accelerator_type="ACCELERATOR_TYPE",  # e.g., "NVIDIA_H200_141GB"
    accelerator_count=ACCELERATOR_COUNT,  # e.g., 8
    serving_container_image_uri="SERVING_CONTAINER_IMAGE_URI",
    endpoint_display_name="ENDPOINT_DISPLAY_NAME",
    model_display_name="MODEL_DISPLAY_NAME",
    use_dedicated_endpoint=True,
)

print(f"Model deployed to endpoint: {endpoint.resource_name}")
What's next
- Choose an open model serving option
- Use open models with Model as a Service (MaaS)
- Deploy open models with prebuilt containers
- Deploy open models with a custom vLLM container