Overview of creating managed datasets on Vertex AI

You can use a managed dataset to provide the source data used to train AutoML and custom models on Vertex AI. A managed dataset is required for AutoML and is optional for custom training.

Permissions and access control

When you use data from a Cloud Storage bucket to create a dataset, Vertex AI requires permissions to access the data. Vertex AI uses a special Google-managed service account known as a Service Agent to securely access your data. For more information on the roles required and how the Service Agent works, see Access control with IAM.
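As a rough illustration of what granting that access can involve, the sketch below builds the Service Agent's email address and an IAM binding that could be added to a bucket policy. It assumes the standard Vertex AI Service Agent address format and assumes the bucket needs an explicit read grant (for example, because it lives in a different project). The helper names and the choice of `roles/storage.objectViewer` are illustrative, not taken from this page; check Access control with IAM for the roles your setup actually requires.

```python
def vertex_ai_service_agent(project_number: str) -> str:
    """Return the Vertex AI Service Agent email for a project number.

    Assumption: the agent follows the usual
    service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com format.
    """
    return f"service-{project_number}@gcp-sa-aiplatform.iam.gserviceaccount.com"


def bucket_read_binding(project_number: str,
                        role: str = "roles/storage.objectViewer") -> dict:
    """Build an IAM binding (hypothetical helper) granting the agent read access.

    The returned dict has the shape of one entry in a bucket policy's
    "bindings" list; it is only constructed here, not applied to any bucket.
    """
    member = f"serviceAccount:{vertex_ai_service_agent(project_number)}"
    return {"role": role, "members": [member]}


if __name__ == "__main__":
    # "123456789012" is a placeholder project number.
    print(bucket_read_binding("123456789012"))
```

The binding is plain data, so you can review it before adding it to a bucket's IAM policy with the tool of your choice.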

Create a managed dataset for AutoML models

You can create managed datasets for training AutoML models by using the Google Cloud console or the Vertex AI API. The instructions vary slightly depending on your data type and model objective. Start by preparing your training data.
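To show how the data type affects an API call, here is a minimal sketch that assembles the URL and JSON body for a Vertex AI REST `datasets.create` request without sending it. It assumes the public `image_1.0.0.yaml` and `tabular_1.0.0.yaml` metadata schema files; the function name, project ID, and region are placeholders. Sending the request also requires an OAuth access token, and a tabular dataset additionally needs its data source specified in the request metadata. Neither is shown here.

```python
import json

# Assumption: metadata schema files published under this public bucket path.
SCHEMA_ROOT = "gs://google-cloud-aiplatform/schema/dataset/metadata"
METADATA_SCHEMAS = {
    "image": f"{SCHEMA_ROOT}/image_1.0.0.yaml",
    "tabular": f"{SCHEMA_ROOT}/tabular_1.0.0.yaml",
}


def build_create_dataset_request(project: str, location: str,
                                 display_name: str, data_type: str):
    """Return (url, body) for a datasets.create call (hypothetical helper).

    Only the metadata schema URI differs between data types in this sketch.
    Nothing is sent over the network.
    """
    if data_type not in METADATA_SCHEMAS:
        raise ValueError(f"unsupported data type: {data_type!r}")
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/datasets"
    )
    body = {
        "displayName": display_name,
        "metadataSchemaUri": METADATA_SCHEMAS[data_type],
    }
    return url, body


if __name__ == "__main__":
    # "my-project" and "us-central1" are placeholder values.
    url, body = build_create_dataset_request(
        "my-project", "us-central1", "flowers", "image"
    )
    print(url)
    print(json.dumps(body, indent=2))
```

Keeping the schema lookup in one table makes the per-data-type difference explicit and causes unsupported types to fail early.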

Image

Learn how to create a managed dataset for the following types of image AutoML models:

Tabular

Learn how to create a managed dataset for the following types of tabular AutoML models:

Create a managed dataset for custom trained models

The instructions for creating a managed dataset for custom training are the same regardless of your data type or model objective.

For details, see Use managed datasets.

View managed datasets using Dataplex Universal Catalog

Dataplex Universal Catalog is a fully managed, scalable metadata management service that provides a centralized location to search for datasets across projects and regions. It's integrated with Vertex AI and offers similar capabilities to the deprecated Data Catalog.

You can use Dataplex Universal Catalog to discover, understand, and enrich your data with aspects (which are similar to Data Catalog tags).

For details on managing metadata and aspects for your Vertex AI resources, see Manage aspects and enrich metadata in the Dataplex Universal Catalog.