Vertex AI provides model evaluation metrics, such as precision and recall, to help you determine the performance of your models. Vertex AI calculates evaluation metrics by using the test set.
How you use model evaluation metrics
Model evaluation metrics provide quantitative measurements of how your model performed on the test set. How you interpret and use those metrics depends on your business need and the problem your model is trained to solve. For example, you might have a lower tolerance for false positives than for false negatives or the other way around. These kinds of questions affect which metrics you would focus on.
For more information about iterating on your model to improve its performance, see Iterate on your model.
Evaluation metrics returned by Vertex AI
Vertex AI returns several different evaluation metrics such as precision, recall, and confidence thresholds. The metrics that Vertex AI returns depend on your model's objective. For example, Vertex AI provides different evaluation metrics for an image object detection model compared to an image classification model.
A schema file, downloadable from a Cloud Storage location, determines which evaluation metrics Vertex AI provides for each objective. The following tabs provide links to the schema files and describe the evaluation metrics for each model objective.
You can view and download schema files from the following Cloud Storage
location:
gs://google-cloud-aiplatform/schema/modelevaluation/
- IoU threshold: An intersection over union threshold value that determines which inferences to return. A model returns inferences that are at this value or higher. The higher the threshold, the closer the predicted bounding box values must be to the actual bounding box values.
- Mean average precision: Also known as the average precision. This value ranges from zero to one, where a higher value indicates a higher-quality model.
- Confidence threshold: A confidence score that determines which inferences to return. A model returns inferences that are at this value or higher. A higher confidence threshold increases precision but lowers recall. Vertex AI returns confidence metrics at different threshold values to show how the threshold affects precision and recall.
- Recall: The fraction of inferences with this class that the model correctly predicted. Also called true positive rate.
- Precision: The fraction of classification inferences produced by the model that were correct.
- F1 score: The harmonic mean of precision and recall. F1 is a useful metric if you're looking for a balance between precision and recall and there's an uneven class distribution.
- Bounding box mean average precision: The single metric for bounding box evaluations: the meanAveragePrecision averaged over all boundingBoxMetrics.
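The definitions above can be made concrete with a small worked example. The following is a minimal sketch, not the code Vertex AI uses internally: the function names, the box format (x_min, y_min, x_max, y_max), and the sample inferences are all assumptions made for illustration. It computes intersection over union for two boxes, and shows how raising the confidence threshold trades recall for precision.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and their harmonic mean (F1) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def metrics_at_threshold(
    scored: List[Tuple[float, bool]], total_positives: int, threshold: float
) -> Tuple[float, float, float]:
    """Keep inferences at or above the threshold, then score them.

    scored holds (confidence, is_correct) pairs; total_positives is the
    number of ground-truth objects, so missed objects count as false negatives.
    """
    kept = [is_correct for confidence, is_correct in scored if confidence >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    fn = total_positives - tp
    return precision_recall_f1(tp, fp, fn)


# Made-up inferences: (confidence, whether the inference was correct).
sample = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]

low = metrics_at_threshold(sample, total_positives=4, threshold=0.3)
high = metrics_at_threshold(sample, total_positives=4, threshold=0.85)
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

With this sample data, the lower threshold keeps all five inferences (precision 0.6, recall 0.75), while the higher threshold keeps only one (precision 1.0, recall 0.25).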
Getting evaluation metrics
You can get an aggregate set of evaluation metrics for your model and, for some objectives, evaluation metrics for a particular class or label. Evaluation metrics for a particular class or label are also known as an evaluation slice. The following content describes how to get aggregate evaluation metrics and evaluation slices by using the Google Cloud console or API.
Google Cloud console
- In the Google Cloud console, in the Vertex AI section, go to the Models page. 
- In the Region drop-down, select the region where your model is located. 
- From the list of models, click your model, which opens the model's Evaluate tab.
  - In the Evaluate tab, you can view your model's aggregate evaluation metrics, such as the Average precision and Recall.
  - If the model objective has evaluation slices, the console shows a list of labels. You can click a label to view evaluation metrics for that label.
API
API requests for getting evaluation metrics are the same for each data type and objective, but the outputs are different. The following samples show the same request but different responses.
Getting aggregate model evaluation metrics
The aggregate model evaluation metrics provide information about the model as a whole. To see information about a specific slice, list the model evaluation slices.
To view aggregate model evaluation metrics, use the
projects.locations.models.evaluations.get
method.
For the bounding box metric, Vertex AI returns an array of metric values at different IoU threshold values (between 0 and 1) and confidence threshold values (between 0 and 1). For example, you can narrow in on evaluation metrics at an IoU threshold of 0.85 and a confidence threshold of 0.8228. By viewing these different threshold values, you can see how they affect other metrics such as precision and recall.
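To narrow in on one pair of threshold values, you can search the returned array. The following is a minimal sketch that assumes the metrics object has already been parsed into a Python dictionary. The field names iouThreshold, confidenceMetrics, and confidenceThreshold are assumptions based on the object detection metrics schema, so check them against the schema file for your objective. The function name and the sample values are hypothetical.

```python
import math
from typing import Any, Dict, Optional


def find_confidence_metrics(
    metrics: Dict[str, Any],
    iou_threshold: float,
    confidence_threshold: float,
    tol: float = 1e-4,
) -> Optional[Dict[str, Any]]:
    """Return the metrics entry at the given IoU and confidence thresholds.

    Thresholds are compared with a small tolerance because they are floats.
    Returns None when no entry matches.
    """
    for bbox_entry in metrics.get("boundingBoxMetrics", []):
        if not math.isclose(
            bbox_entry.get("iouThreshold", -1.0), iou_threshold, abs_tol=tol
        ):
            continue
        for conf_entry in bbox_entry.get("confidenceMetrics", []):
            if math.isclose(
                conf_entry.get("confidenceThreshold", -1.0),
                confidence_threshold,
                abs_tol=tol,
            ):
                return conf_entry
    return None


# Made-up values for illustration only; not real evaluation output.
sample_metrics = {
    "boundingBoxMetrics": [
        {
            "iouThreshold": 0.5,
            "confidenceMetrics": [
                {"confidenceThreshold": 0.8228, "precision": 0.95, "recall": 0.6},
            ],
        },
        {
            "iouThreshold": 0.85,
            "confidenceMetrics": [
                {"confidenceThreshold": 0.1, "precision": 0.5, "recall": 0.7},
                {"confidenceThreshold": 0.8228, "precision": 0.9, "recall": 0.4},
            ],
        },
    ]
}

entry = find_confidence_metrics(sample_metrics, 0.85, 0.8228)
```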
Select a tab that corresponds to your language or environment:
REST
Before using any of the request data, make the following replacements:
- LOCATION: Region where your model is stored.
- PROJECT: Your project ID.
- MODEL_ID: The ID of the model resource.
- PROJECT_NUMBER: Your project's automatically generated project number.
- EVALUATION_ID: ID for the model evaluation (appears in the response).
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/models/MODEL_ID/evaluations
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/models/MODEL_ID/evaluations"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/models/MODEL_ID/evaluations" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
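If you prefer to call the REST endpoint directly from Python, the request shown in the REST tab can be sketched with the standard library alone. This is an illustrative alternative, not the Vertex AI SDK sample: the function names are hypothetical, and the access token is assumed to come from gcloud auth print-access-token, as in the curl command.

```python
import json
import urllib.request


def evaluations_url(project: str, location: str, model_id: str) -> str:
    """Build the evaluations URL from the placeholders used in the REST sample."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/models/{model_id}/evaluations"
    )


def build_request(url: str, access_token: str) -> urllib.request.Request:
    """Create an authenticated GET request without sending it."""
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def fetch_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder values; replace them with your own project, region, and model.
url = evaluations_url("my-project", "us-central1", "1234567890")
request = build_request(url, access_token="ACCESS_TOKEN")
# evaluations = fetch_json(request)  # Uncomment to send the request.
```

Separating request construction from sending makes it possible to inspect the URL and headers before any network call is made.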
Listing all evaluation slices
The
projects.locations.models.evaluations.slices.list
method lists all evaluation slices for your model. You must have the model's
evaluation ID, which you can get when you view the aggregated evaluation
metrics.
You can use model evaluation slices to determine how the model performed on a
specific label. The value field tells you which label the metrics are for.
For the bounding box metric, Vertex AI returns an array of metric values at different IoU threshold values (between 0 and 1) and confidence threshold values (between 0 and 1). For example, you can narrow in on evaluation metrics at an IoU threshold of 0.85 and a confidence threshold of 0.8228. By viewing these different threshold values, you can see how they affect other metrics such as precision and recall.
REST
Before using any of the request data, make the following replacements:
- LOCATION: Region where the model is located. For example, us-central1.
- PROJECT: Your project ID.
- MODEL_ID: The ID of your model.
- EVALUATION_ID: ID of the model evaluation that contains the evaluation slices to list.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/models/MODEL_ID/evaluations/EVALUATION_ID/slices
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/models/MODEL_ID/evaluations/EVALUATION_ID/slices"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/models/MODEL_ID/evaluations/EVALUATION_ID/slices" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
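After you have the list response, you can index the slices by label to look up how the model performed on a specific one. The following is a minimal sketch that assumes the JSON response has been parsed into a dictionary. The modelEvaluationSlices and slice field names are assumptions based on the API reference, the function name is hypothetical, and the sample response is made up, with the resource names elided.

```python
from typing import Any, Dict


def slices_by_label(list_response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map each slice's label (the value field) to its slice entry."""
    result: Dict[str, Dict[str, Any]] = {}
    for item in list_response.get("modelEvaluationSlices", []):
        label = item.get("slice", {}).get("value")
        if label is not None:
            result[label] = item
    return result


# Made-up response for illustration; resource names are elided on purpose.
sample_response = {
    "modelEvaluationSlices": [
        {
            "name": "projects/.../slices/...",
            "slice": {"dimension": "annotationSpec", "value": "door"},
            "metrics": {"boundingBoxMeanAveragePrecision": 0.61},
        },
        {
            "name": "projects/.../slices/...",
            "slice": {"dimension": "annotationSpec", "value": "door_with_knob"},
            "metrics": {"boundingBoxMeanAveragePrecision": 0.48},
        },
    ]
}

by_label = slices_by_label(sample_response)
door_metrics = by_label["door"]["metrics"]
```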
Getting metrics for a single slice
To view evaluation metrics for a single slice, use the
projects.locations.models.evaluations.slices.get
method. You must have the slice ID, which is provided when you list all
slices. The following sample applies to all data types and
objectives.
REST
Before using any of the request data, make the following replacements:
- LOCATION: Region where the model is located. For example, us-central1.
- PROJECT: Your project ID.
- MODEL_ID: The ID of your model.
- EVALUATION_ID: ID of the model evaluation that contains the evaluation slice to retrieve.
- SLICE_ID: ID of an evaluation slice to get.
- PROJECT_NUMBER: Your project's automatically generated project number.
- EVALUATION_METRIC_SCHEMA_FILE_NAME: The name of a schema file that defines the evaluation metrics to return, such as classification_metrics_1.0.0.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/models/MODEL_ID/evaluations/EVALUATION_ID/slices/SLICE_ID
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/models/MODEL_ID/evaluations/EVALUATION_ID/slices/SLICE_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/models/MODEL_ID/evaluations/EVALUATION_ID/slices/SLICE_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
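Because the slice ID comes from the list response, it can be convenient to extract it from a slice's resource name. The following is a minimal sketch that assumes the name follows the same path layout as the request URL above. The function names are hypothetical, and the sample name uses placeholder IDs.

```python
import re
from typing import Dict

# Assumed layout of a slice resource name, mirroring the request URL path.
_SLICE_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/models/(?P<model_id>[^/]+)"
    r"/evaluations/(?P<evaluation_id>[^/]+)"
    r"/slices/(?P<slice_id>[^/]+)$"
)


def parse_slice_name(name: str) -> Dict[str, str]:
    """Split a slice resource name into its component IDs."""
    match = _SLICE_NAME.match(name)
    if match is None:
        raise ValueError(f"Unexpected slice resource name: {name!r}")
    return match.groupdict()


def slice_url(parts: Dict[str, str]) -> str:
    """Build the URL for getting a single slice from the parsed IDs."""
    return (
        "https://{location}-aiplatform.googleapis.com/v1/"
        "projects/{project}/locations/{location}/models/{model_id}"
        "/evaluations/{evaluation_id}/slices/{slice_id}"
    ).format(**parts)


# Placeholder IDs for illustration only.
sample_name = (
    "projects/123/locations/us-central1/models/456/evaluations/789/slices/1011"
)
parts = parse_slice_name(sample_name)
single_slice_url = slice_url(parts)
```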
Iterate on your model
Model evaluation metrics provide a starting point for debugging your model when the model isn't meeting your expectations. For example, low precision and recall scores can indicate that your model needs additional training data or has inconsistent labels. Perfect precision and recall can indicate that the test data is too easy to predict and that the model might not generalize well.
You can iterate on your training data and create a new model. After you create a new model, you can compare the evaluation metrics between the existing model and the new model.
The following suggestions can help you improve models that label items, such as classification or object detection models:
- Consider adding more examples or a wider range of examples in your training data. For example, for an image object detection model, you might include wider angle images, higher or lower resolution images, or different points of view. For more guidance, see Preparing data.
- Consider removing classes or labels that don't have a lot of examples. Insufficient examples prevent the model from consistently and confidently making predictions about those classes or labels.
- Machines can't interpret the names of your classes or labels and don't understand the nuances between them, such as "door" and "door_with_knob." You must provide data to help machines recognize such nuances.
- Augment your data with more examples of true positives and true negatives, especially examples that are close to a decision boundary to mitigate model confusion.
- Specify your own data split (training, validation, and test). By default, Vertex AI randomly assigns items to each set. Therefore, near-duplicates can end up in both the training and validation sets, which could lead to overfitting and then poor performance on the test set. For more information about setting your own data split, see About data splits for AutoML models.
- If your model's evaluation metrics include a confusion matrix, you can see if the model is confusing two labels, where the model is predicting a particular label significantly more than the true label. Review your data and make sure the examples are correctly labeled.
- If you had a short training time (low maximum number of node hours), you might
get a higher-quality model by allowing it to train for a longer period of time
(higher maximum number of node hours).
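To act on the confusion matrix suggestion above, you can rank the off-diagonal cells to find the label pairs that the model confuses most often. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the matrix is assumed to have one row per true label and one column per predicted label, and the counts are made up.

```python
from typing import List, Sequence, Tuple


def most_confused_pairs(
    labels: Sequence[str], matrix: Sequence[Sequence[int]], top_n: int = 3
) -> List[Tuple[str, str, int]]:
    """Return (true_label, predicted_label, count) for the largest errors.

    Diagonal cells are correct predictions, so only off-diagonal cells
    with a nonzero count are considered.
    """
    pairs = []
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i != j and count > 0:
                pairs.append((labels[i], labels[j], count))
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs[:top_n]


# Made-up counts: rows are true labels, columns are predicted labels.
sample_labels = ["door", "door_with_knob", "window"]
sample_matrix = [
    [40, 9, 1],
    [12, 35, 3],
    [0, 2, 48],
]

worst = most_confused_pairs(sample_labels, sample_matrix, top_n=2)
```

In this sample, the two largest errors both involve "door" and "door_with_knob", which suggests reviewing the labels and examples for those two classes first.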