Method: endpoints.predict

Full name: projects.locations.endpoints.predict

Performs an online inference. Use this method to run inference on Google's generative AI models and on custom models deployed to Gemini Enterprise Agent Platform. It supports both generative AI tasks (such as image generation, virtual try-on, text generation, and multimodal embeddings) and traditional machine learning tasks (such as classification and regression).

To run inference on a base (non-tuned) Gemini model, see endpoints.generateContent.

Endpoint

POST https://{service-endpoint}/v1/{endpoint}:predict

Where {service-endpoint} is one of the supported service endpoints.

Path parameters

endpoint string

Required. The resource name of the publisher model or endpoint requested to serve the prediction. For Google models like Embedding or Veo, use the publisher model format. For tuned models or other models deployed to an Agent Platform endpoint, use the endpoint format.

  • Publisher model format: projects/{project}/locations/{location}/publishers/google/models/{model}
  • Endpoint format: projects/{project}/locations/{location}/endpoints/{endpoint}
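As a rough illustration of how the two resource name formats plug into the request URL, here is a minimal Python sketch. The helper names (`publisher_model_name`, `endpoint_name`, `predict_url`) are hypothetical and not part of any client library, and the service endpoint, project, location, and endpoint ID values are placeholders to replace with your own.

```python
def publisher_model_name(project: str, location: str, model: str) -> str:
    """Build a resource name in the publisher model format (Google models)."""
    return f"projects/{project}/locations/{location}/publishers/google/models/{model}"


def endpoint_name(project: str, location: str, endpoint_id: str) -> str:
    """Build a resource name in the endpoint format (tuned or deployed models)."""
    return f"projects/{project}/locations/{location}/endpoints/{endpoint_id}"


def predict_url(service_endpoint: str, resource_name: str) -> str:
    """Fill in the URL template https://{service-endpoint}/v1/{endpoint}:predict."""
    return f"https://{service_endpoint}/v1/{resource_name}:predict"


if __name__ == "__main__":
    # All values below are placeholders, not real hosts or resources.
    name = endpoint_name("my-project", "my-location", "1234567890")
    print(predict_url("SERVICE_ENDPOINT", name))
```

The resource name is substituted into the path as-is, so the slashes inside it become part of the URL path.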

Request body

The request body contains data with the following structure:

Fields
instances[] value (Value format)

Required. The instances that are the input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request. If that limit is exceeded, the prediction call returns an error for AutoML Models; for customer-created Models, the behavior is as documented by that Model. The schema of each instance depends on the type of request.

parameters value (Value format)

The parameters that govern the prediction. The schema of the parameters depends on the type of request.

labels map (key: string, value: string)

Optional. User labels, used for Imagen billing only. Only Imagen supports labels; for other use cases, they are ignored.
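The fields above might be assembled into a JSON request body along the following lines. This is a sketch under stated assumptions: `build_predict_body` and `chunk_instances` are hypothetical helpers, the instance and parameter contents are placeholders (the real schema depends on the model being called), and treating an empty `instances` list as an error is an assumption made here, not documented behavior. The per-request limit passed to `chunk_instances` is likewise something you would take from the documentation of the specific deployed model.

```python
import json
from typing import Any, Iterator, Mapping, Optional, Sequence


def build_predict_body(
    instances: Sequence[Any],
    parameters: Optional[Any] = None,
    labels: Optional[Mapping[str, str]] = None,
) -> str:
    """Serialize a predict request body with the documented top-level fields."""
    if not instances:
        # Assumption: an empty list is rejected client-side because the field is required.
        raise ValueError("instances is required")
    body: dict = {"instances": list(instances)}
    if parameters is not None:
        body["parameters"] = parameters
    if labels:
        # Per the field description, labels only matter for Imagen billing.
        body["labels"] = dict(labels)
    return json.dumps(body)


def chunk_instances(instances: Sequence[Any], max_per_request: int) -> Iterator[list]:
    """Split instances into batches no larger than a model's per-request limit."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    for start in range(0, len(instances), max_per_request):
        yield list(instances[start:start + max_per_request])


if __name__ == "__main__":
    # Placeholder instances; the real schema depends on the type of request.
    all_instances = [{"feature": i} for i in range(5)]
    for batch in chunk_instances(all_instances, max_per_request=2):
        print(build_predict_body(batch, parameters={"example_option": True}))
```

Splitting on the client side keeps each call under the deployed model's instance limit, at the cost of issuing several requests.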

Response body

If successful, the response body contains an instance of PredictResponse.