The Vertex AI SDK includes the following prediction classes. One class is for batch predictions. The others are related to online predictions or Vector Search predictions. For more information, see Overview of getting predictions on Vertex AI.
Batch prediction class
A batch prediction is a group of asynchronous prediction requests. You request
batch predictions from the model resource without needing to deploy the model to
an endpoint. Batch predictions are suitable when you don't need an immediate
response and want to process data with a single request.
BatchPredictionJob is the one class in the
Vertex AI SDK that is specific to batch predictions.
BatchPredictionJob
The BatchPredictionJob class represents a
group of asynchronous prediction requests. There are two ways to create a batch
prediction job:
- The preferred way to create a batch prediction job is to use the batch_predict method on your trained Model. This method requires the following parameters:
  - instances_format: The format of the batch prediction request file: jsonl, csv, bigquery, tf-record, tf-record-gzip, or file-list.
  - predictions_format: The format of the batch prediction response file: jsonl, csv, bigquery, tf-record, tf-record-gzip, or file-list.
  - gcs_source: A list of one or more Cloud Storage paths to your batch prediction requests.
  - gcs_destination_prefix: The Cloud Storage path to which Vertex AI writes the predictions.

  The following code is an example of how you might call Model.batch_predict:

  batch_prediction_job = model.batch_predict(
      instances_format="jsonl",
      predictions_format="jsonl",
      job_display_name="your_job_display_name_string",
      gcs_source=["gs://path/to/my/dataset.jsonl"],
      gcs_destination_prefix="gs://path/to/my/destination",
      model_parameters=None,
      starting_replica_count=1,
      max_replica_count=5,
      machine_type="n1-standard-4",
      sync=True
  )
- The second way to create a batch prediction job is to call the BatchPredictionJob.create method. The BatchPredictionJob.create method requires four parameters:
  - job_display_name: A name that you assign to the batch prediction job. Note that while job_display_name is required for BatchPredictionJob.create, it is optional for Model.batch_predict.
  - model_name: The fully qualified name or ID of the trained Model you use for the batch prediction job.
  - instances_format: The format of the batch prediction request file: jsonl, csv, bigquery, tf-record, tf-record-gzip, or file-list.
  - predictions_format: The format of the batch prediction response file: jsonl, csv, bigquery, tf-record, tf-record-gzip, or file-list.
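The following is a minimal sketch of how you might prepare and submit a BatchPredictionJob.create call. The helper names build_create_args and submit_batch_job are hypothetical and not part of the Vertex AI SDK, and the format check only mirrors the values listed above. The Cloud Storage arguments are included as an assumption about where your requests and results live.

```python
from typing import Any, Dict, List

# Format names copied from the parameter descriptions above.
SUPPORTED_FORMATS = {
    "jsonl", "csv", "bigquery", "tf-record", "tf-record-gzip", "file-list",
}


def build_create_args(
    job_display_name: str,
    model_name: str,
    instances_format: str,
    predictions_format: str,
    gcs_source: List[str],
    gcs_destination_prefix: str,
) -> Dict[str, Any]:
    """Validate the format names and collect keyword arguments (hypothetical helper)."""
    for label, value in (
        ("instances_format", instances_format),
        ("predictions_format", predictions_format),
    ):
        if value not in SUPPORTED_FORMATS:
            raise ValueError(f"{label}={value!r} is not one of {sorted(SUPPORTED_FORMATS)}")
    return {
        "job_display_name": job_display_name,
        "model_name": model_name,
        "instances_format": instances_format,
        "predictions_format": predictions_format,
        "gcs_source": gcs_source,
        "gcs_destination_prefix": gcs_destination_prefix,
    }


def submit_batch_job(args: Dict[str, Any]):
    """Submit the job. Requires the SDK, credentials, and an existing model."""
    # Imported lazily so build_create_args works without the SDK installed.
    from google.cloud import aiplatform

    return aiplatform.BatchPredictionJob.create(**args)


args = build_create_args(
    job_display_name="your_job_display_name_string",
    model_name="1234567890",  # placeholder model ID
    instances_format="jsonl",
    predictions_format="jsonl",
    gcs_source=["gs://path/to/my/dataset.jsonl"],
    gcs_destination_prefix="gs://path/to/my/destination",
)
```

Calling submit_batch_job(args) would then start the job. Checking the arguments locally first means a mistyped format name is caught before any request is sent.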
 
Online prediction classes
Online predictions are synchronous requests made to a model endpoint. You must deploy your model to an endpoint before you can make an online prediction request. Use online predictions when you want predictions that are generated based on application input or when you need a fast prediction response.
Endpoint
Before you can get online predictions from your model, you must deploy your model to an endpoint. When you deploy a model to an endpoint, you associate the physical machine resources with the model so it can serve online predictions.
You can deploy more than one model to one endpoint. You can also deploy one model to more than one endpoint. For more information, see Considerations for deploying models.
To create an Endpoint resource, you deploy your
model. When you call the
Model.deploy
method, it creates and returns an Endpoint.
The following is a sample code snippet that shows how to create a custom training job, create and train a model, and then deploy the model to an endpoint.
from google.cloud import aiplatform

# Create your custom training job
job = aiplatform.CustomTrainingJob(
    display_name="my_custom_training_job",
    script_path="task.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest",
    requirements=["google-cloud-bigquery>=2.20.0", "db-dtypes"],
    model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest"
)
# Start the training and create your model; 'dataset' and 'project_id' are
# assumed to be defined earlier in your code
model = job.run(
    dataset=dataset,
    model_display_name="my_model_name",
    bigquery_destination=f"bq://{project_id}"
)
# Create an endpoint and deploy your model to that endpoint
endpoint = model.deploy(deployed_model_display_name="my_deployed_model")
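# Get predictions using test data in a DataFrame named 'df_my_test_data'.
# Endpoint.predict expects a list of instances, so convert the rows first;
# the exact instance format depends on your model.
predictions = endpoint.predict(instances=df_my_test_data.values.tolist())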
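Endpoint.predict takes its instances argument as a list of JSON-serializable instances, so tabular test data usually needs to be converted first. The following sketch turns rows, represented here as plain dictionaries, into such a list. The helper name rows_to_instances and the feature names are hypothetical, and the instance shape that your model expects depends on how it was trained.

```python
from typing import Any, Dict, List, Sequence


def rows_to_instances(
    rows: Sequence[Dict[str, Any]], feature_order: Sequence[str]
) -> List[List[Any]]:
    """Convert row dictionaries to ordered feature lists (hypothetical helper)."""
    instances = []
    for row in rows:
        missing = [name for name in feature_order if name not in row]
        if missing:
            raise KeyError(f"row is missing features: {missing}")
        # Keep a fixed feature order so every instance has the same layout.
        instances.append([row[name] for name in feature_order])
    return instances


rows = [
    {"age": 39, "hours_per_week": 40},
    {"hours_per_week": 13, "age": 50},
]
instances = rows_to_instances(rows, feature_order=["age", "hours_per_week"])

# With a deployed endpoint, the list could then be sent with:
#   predictions = endpoint.predict(instances=instances)
```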
PrivateEndpoint
A private endpoint is like an Endpoint resource,
except predictions are sent across a secure network to the Vertex AI
online prediction service. Use a private endpoint if your organization wants to
keep all traffic private.
To use a private endpoint, you must configure Vertex AI to peer with a Virtual Private Cloud (VPC). A VPC is required for the private prediction endpoint to connect directly with Vertex AI. For more information, see Set up VPC network peering and Use private endpoints for online prediction.
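As a rough sketch, a private endpoint can be created by passing the peered VPC network to PrivateEndpoint.create. The helper names vpc_network_name and create_private_endpoint are hypothetical, and the example assumes that VPC network peering is already set up. The network value uses the projects/PROJECT_NUMBER/global/networks/NETWORK_NAME form.

```python
def vpc_network_name(project_number: str, network: str) -> str:
    """Build the full network resource name (hypothetical helper)."""
    if not str(project_number).isdigit():
        raise ValueError("expected a numeric project number, not a project ID")
    return f"projects/{project_number}/global/networks/{network}"


def create_private_endpoint(display_name: str, project_number: str, network: str):
    """Create the endpoint. Requires the SDK, credentials, and a peered VPC."""
    # Imported lazily so vpc_network_name works without the SDK installed.
    from google.cloud import aiplatform

    return aiplatform.PrivateEndpoint.create(
        display_name=display_name,
        network=vpc_network_name(project_number, network),
    )


network_name = vpc_network_name("123456789123", "my_vpc")
```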
ModelDeploymentMonitoringJob
Use the
ModelDeploymentMonitoringJob
resource to monitor your model and receive alerts if it deviates in a way that
might impact the quality of your model's predictions. 
When the input data deviates from the data used to train your model, the model's performance can deteriorate, even if the model hasn't changed. Model monitoring analyzes input data for feature skew and drift:
- Skew occurs when the production feature data distribution deviates from the feature data used to train the model.
- Drift occurs when the production feature data changes significantly over time.
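To make the idea of a distribution deviating more concrete, the following sketch compares the category frequencies of one feature in training data and in production data. The distance function (the largest absolute difference between category frequencies) and the alert threshold are illustrative assumptions. They are not a description of how the monitoring service computes its statistics, and the sample values are made up.

```python
from collections import Counter
from typing import Dict, Sequence


def category_frequencies(values: Sequence[str]) -> Dict[str, float]:
    """Return the share of each category in a non-empty list of values."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    return {category: count / len(values) for category, count in counts.items()}


def l_infinity_distance(baseline: Sequence[str], current: Sequence[str]) -> float:
    """Largest absolute difference between the two frequency distributions."""
    p = category_frequencies(baseline)
    q = category_frequencies(current)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in set(p) | set(q))


# Made-up values for a single categorical feature.
training_values = ["red", "red", "blue", "green"]
production_values = ["red", "blue", "blue", "blue"]

distance = l_infinity_distance(training_values, production_values)

ALERT_THRESHOLD = 0.3  # illustrative threshold, chosen for this example only
skew_detected = distance > ALERT_THRESHOLD
```

Comparing production data against training data in this way corresponds to skew. Comparing two production time windows with the same function would correspond to drift.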
For more information, see Introduction to Vertex AI model monitoring. For an example of how to implement Vertex AI monitoring with the Vertex AI SDK, see the Vertex AI model monitoring with explainable AI feature attributions notebook on GitHub.
Vector Search prediction classes
Vector Search is a managed service that builds similarity indexes, or vectors, to perform similarity matching. There are two high-level steps to perform similarity matching:
- Create a vector representation of your data. Data can be text, images, video, audio, or tabular data. 
- Vector Search uses the endpoints of the vectors you create to perform a high-scale, low-latency search for similar vectors.
For more information, see Vector Search overview and the Create a Vector Search index notebook on GitHub.
MatchingEngineIndex
The MatchingEngineIndex class represents
the indexes, or vectors, you create that Vector Search uses to
perform its similarity search.
There are two search algorithms you can use for your index:
- TreeAhConfig uses the tree-AH algorithm (shallow tree using asymmetric hashing). Use MatchingEngineIndex.create_tree_ah_index to create an index that uses the tree-AH algorithm.
- BruteForceConfig uses a standard linear search. Use MatchingEngineIndex.create_brute_force_index to create an index that uses a standard linear search.
For more information about how you can configure your indexes, see Configure indices.
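To illustrate what a standard linear search means, the following sketch compares a query vector against every stored vector and keeps the closest matches. It is a local, pure-Python illustration with made-up vectors and hypothetical function names, not the service's implementation. It uses squared L2 distance as the distance measure.

```python
from typing import Dict, List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_neighbors(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Score every vector against the query and return the k closest."""
    scored = [(vector_id, squared_l2(query, v)) for vector_id, v in vectors.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Made-up two-dimensional vectors.
vectors = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [3.0, 4.0]}
nearest = brute_force_neighbors([0.9, 1.2], vectors, k=2)
```

Because every vector is scored, the result is exact, but the cost grows linearly with the number of vectors.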
The following code is an example of creating an index that uses the tree-AH algorithm:
my_tree_ah_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="my_display_name",
    contents_delta_uri="gs://my_bucket/embeddings",
    dimensions=1,
    approximate_neighbors_count=150,
    distance_measure_type="SQUARED_L2_DISTANCE",
    leaf_node_embedding_count=100,
    leaf_nodes_to_search_percent=50,
    description="my description",
    labels={ "label_name": "label_value" }
)
The following code is an example of creating an index that uses the brute force algorithm:
my_brute_force_index = aiplatform.MatchingEngineIndex.create_brute_force_index(
    display_name="my_display_name",
    contents_delta_uri="gs://my_bucket/embeddings",
    dimensions=1,
    distance_measure_type="SQUARED_L2_DISTANCE",
    description="my description",
    labels={ "label_name": "label_value" }
)
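The contents_delta_uri argument points to a Cloud Storage directory that holds your embeddings. The following sketch prepares embedding records locally, assuming a JSON layout with one record per line and an id and an embedding field in each record. The helper name and sample values are hypothetical, and you should confirm the accepted input format in the Vector Search documentation before uploading. The length of each embedding is checked against the dimensions value used for the index.

```python
import json
from typing import Dict, List, Sequence


def embeddings_to_json_lines(
    embeddings: Dict[str, Sequence[float]], dimensions: int
) -> List[str]:
    """Serialize embeddings as one JSON record per line (hypothetical helper)."""
    lines = []
    for item_id, vector in embeddings.items():
        if len(vector) != dimensions:
            raise ValueError(
                f"{item_id} has {len(vector)} dimensions, expected {dimensions}"
            )
        lines.append(json.dumps({"id": item_id, "embedding": list(vector)}))
    return lines


# Made-up two-dimensional embeddings.
lines = embeddings_to_json_lines(
    {"item-1": [0.1, 0.2], "item-2": [0.3, 0.4]}, dimensions=2
)
payload = "\n".join(lines) + "\n"  # file contents to upload to Cloud Storage
```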
MatchingEngineIndexEndpoint
Use the
MatchingEngineIndexEndpoint class
to create and retrieve an endpoint. After you deploy an index to your endpoint,
you get an IP address that you use to run your queries.
The following code is an example of creating a matching engine index endpoint and then deploying a matching engine index to it:
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name="sample_index_endpoint",
    description="index endpoint description",
    network="projects/123456789123/global/networks/my_vpc"
)
my_index_endpoint = my_index_endpoint.deploy_index(
    index=my_tree_ah_index, deployed_index_id="my_matching_engine_index_id"
)
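After the index is deployed, you can run queries against it. The following sketch wraps the match method of an index endpoint that is peered with a VPC network, as in the preceding example. The helper name nearest_ids is hypothetical, and the sketch assumes that each returned neighbor exposes id and distance attributes. The query vector must have the same number of dimensions as the index.

```python
from typing import Any, List, Sequence, Tuple


def nearest_ids(
    index_endpoint: Any,
    deployed_index_id: str,
    query: Sequence[float],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Return (id, distance) pairs for the k nearest neighbors of one query."""
    responses = index_endpoint.match(
        deployed_index_id=deployed_index_id,
        queries=[list(query)],  # match accepts a batch of query vectors
        num_neighbors=k,
    )
    # One result list is returned per query; this helper sends a single query.
    return [(neighbor.id, neighbor.distance) for neighbor in responses[0]]
```

With the endpoint created above, a call might look like nearest_ids(my_index_endpoint, "my_matching_engine_index_id", [0.5], k=5). The call must come from a machine that can reach the peered network.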
What's next
- Learn about the Vertex AI SDK.