Use data insights for structured data

This document explains how to generate, view, and manage data insights for your structured data. Using AI-powered data insights helps you accelerate data exploration by automatically generating descriptions, relationship graphs, and SQL queries from your table and dataset metadata.

In BigQuery Studio, you can generate data insights for BigQuery datasets, tables, views, BigLake tables, and BigQuery external tables.

In Knowledge Catalog, you can generate data insights for BigLake and Iceberg REST Catalog tables.

Before you begin

Before using data insights, ensure you have completed the following prerequisites:

Required roles

To get the permissions that you need to use data insights, ask your administrator to grant you the following IAM roles:

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to use data insights. The exact permissions that are required are listed in the following Required permissions section.

Required permissions

The following permissions are required to use data insights:

  • dataplex.datascans.create
  • dataplex.datascans.get
  • dataplex.datascans.getData
  • dataplex.datascans.run

You might also be able to get these permissions with custom roles or other predefined roles.
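If you script your setup, one way to confirm that you hold these permissions is to call the Resource Manager projects.testIamPermissions method, which returns the subset of requested permissions that the caller has. The following Python sketch only builds the request and interprets a response. It assumes that you check permissions at the project level, and it leaves authentication and the HTTP call to you.

```python
import json

# Permissions that this document lists as required for data insights.
REQUIRED_PERMISSIONS = [
    "dataplex.datascans.create",
    "dataplex.datascans.get",
    "dataplex.datascans.getData",
    "dataplex.datascans.run",
]


def build_test_permissions_request(project_id: str) -> tuple[str, str]:
    """Return (url, json_body) for a POST to projects.testIamPermissions."""
    url = (
        "https://cloudresourcemanager.googleapis.com/v1/"
        f"projects/{project_id}:testIamPermissions"
    )
    body = json.dumps({"permissions": REQUIRED_PERMISSIONS})
    return url, body


def missing_permissions(response: dict) -> list[str]:
    """List required permissions absent from a testIamPermissions response.

    The response contains only the requested permissions that the caller
    holds; the "permissions" key can be omitted when there are none.
    """
    granted = set(response.get("permissions", []))
    return [p for p in REQUIRED_PERMISSIONS if p not in granted]
```

For example, if the response lists only dataplex.datascans.get, missing_permissions returns the other three permissions, which you can then request from your administrator.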

Enable APIs

To use data insights, enable the following APIs in your project:

  • Dataplex API
  • BigQuery API
  • Gemini for Google Cloud API

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission.

For more information about enabling the Gemini for Google Cloud API, see Enable the Gemini for Google Cloud API in a Google Cloud project.
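As an alternative to the console, the Service Usage API's services.batchEnable method can enable several APIs in one call. The following Python sketch builds that request without sending it. It assumes that the Gemini for Google Cloud API uses the service name cloudaicompanion.googleapis.com; confirm the service names in the API Library before relying on them. The call requires the serviceusage.services.enable permission.

```python
import json

# Service names for the APIs listed above. The Gemini for Google Cloud API
# name (cloudaicompanion.googleapis.com) is an assumption to verify.
SERVICES = [
    "dataplex.googleapis.com",
    "bigquery.googleapis.com",
    "cloudaicompanion.googleapis.com",
]


def build_batch_enable_request(project_id: str) -> tuple[str, str]:
    """Return (url, json_body) for a POST to services.batchEnable."""
    url = (
        "https://serviceusage.googleapis.com/v1/"
        f"projects/{project_id}/services:batchEnable"
    )
    return url, json.dumps({"serviceIds": SERVICES})
```

The method returns a long-running operation, so a script should wait for that operation to finish before it uses the APIs.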

Prepare data

For BigLake tables, make sure that your data is stored in Cloud Storage and that the BigLake table has been created.

For Iceberg REST Catalog tables, ensure your tables are registered in the BigLake metastore.

Generate insights in BigQuery

Data insights for BigQuery datasets, tables, views, BigLake tables, and BigQuery external tables are generated by Gemini in BigQuery, and you can generate them only in BigQuery Studio.

You must first set up Gemini in BigQuery, then generate insights. After you generate insights, you can view and modify them in Knowledge Catalog.

For more information about generating insights in BigQuery, see the following documents:

Generate insights for Iceberg REST Catalog tables

  1. In the Google Cloud console, go to the Knowledge Catalog Search page.

  2. In the Filters panel, select BigLake.

  3. Select the Iceberg REST Catalog table for which you want to generate insights.

  4. Click the Insights tab. If the tab is empty, insights haven't been generated for this table yet.

  5. To generate insights and attach them permanently to the table as aspects, click Generate and publish. This makes the insights indexable, searchable, and visible to other users in your organization in Knowledge Catalog.

    To generate insights and view them only during your current session, click Generate without publishing. Use this option if you need only a quick analysis of the data and don't want to save the metadata to Knowledge Catalog.

    For more information about the differences between the Generate and publish and Generate without publishing modes, see Modes for generating data insights.

  6. Select the region in which to generate insights, and then click Generate.

    Insights can take a few minutes to populate.

  7. Click the Insights tab and review the following:

    • Descriptions: AI-generated summaries that explain the purpose of the table and describe specific columns.
    • Sample queries: A list of SQL queries tailored to the table's schema and content.

  8. To view the SQL query that answers a question, click the question.
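Because insights take a few minutes to populate, a script that reads them right after generation should poll instead of reading once. The following Python sketch polls with exponential backoff. The insights_ready callable is hypothetical: you supply it, for example by looking up the table's entry and checking whether the insight aspects are present.

```python
import time
from typing import Callable


def wait_for_insights(
    insights_ready: Callable[[], bool],
    timeout_s: float = 600.0,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll insights_ready() with exponential backoff until it returns True.

    Returns True if insights became available before timeout_s elapsed
    (measured as total time slept), otherwise False.
    """
    delay = initial_delay_s
    waited = 0.0
    while True:
        if insights_ready():
            return True
        if waited >= timeout_s:
            return False
        # Never sleep past the overall timeout or longer than max_delay_s.
        step = min(delay, max_delay_s, timeout_s - waited)
        sleep(step)
        waited += step
        delay *= 2  # Double the wait between checks.
```

The sleep parameter defaults to time.sleep; passing a different function lets you test the helper without actually waiting.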

Review the generated insights for a resource

To view the generated insights for a resource, complete the following steps:

  1. In the Google Cloud console, go to the Knowledge Catalog Search page.

  2. Search for the resource for which you want to view insights.

  3. In the search results, click the resource to open its entry details page.

  4. Review the Descriptions and Queries generated for the selected resource.

  5. To see how data points connect, click the Relationships (Preview) tab and review the relationship graphs. Relationships are available only at the table level, not at the dataset level.
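The same lookup can be scripted with the Dataplex API: searchEntries finds the resource, and lookupEntry with view=ALL returns the entry together with its aspects, which is where published insights are attached. The following Python sketch only builds the two GET URLs. The project ID, location, and entry name are placeholders, and authentication is omitted.

```python
from urllib.parse import urlencode

DATAPLEX = "https://dataplex.googleapis.com/v1"


def search_entries_url(project_id: str, query: str, page_size: int = 10) -> str:
    """URL for projects.locations.searchEntries, which uses the global location."""
    params = urlencode({"query": query, "pageSize": page_size})
    return f"{DATAPLEX}/projects/{project_id}/locations/global:searchEntries?{params}"


def lookup_entry_url(project_id: str, location: str, entry_name: str) -> str:
    """URL for projects.locations.lookupEntry that requests all aspects.

    entry_name is the full resource name of the entry, for example
    projects/P/locations/L/entryGroups/G/entries/E.
    """
    params = urlencode({"entry": entry_name, "view": "ALL"})
    return f"{DATAPLEX}/projects/{project_id}/locations/{location}:lookupEntry?{params}"
```

The entry name is passed as a query parameter, so urlencode escapes its slashes.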

Manage table insights

After you generate and publish table insights, you can review and manage them as metadata aspects in the Knowledge Catalog. Table-level insights include table and column descriptions, and sample queries.

Update generated descriptions for a table

You can update table and column descriptions only by using the Dataplex API. To do so, use the entries.patch method.
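The following Python sketch builds an entries.patch request that replaces a single aspect on an entry. The updateMask=aspects and aspectKeys query parameters limit the update to the named aspect. The aspect key and the description field inside the aspect data are hypothetical placeholders; read the entry first (for example, with lookupEntry) to find the actual aspect type and data schema that hold the generated descriptions. Column-level descriptions might live in path-qualified aspects, which this sketch doesn't cover.

```python
import json
from urllib.parse import urlencode

DATAPLEX = "https://dataplex.googleapis.com/v1"


def build_aspect_patch(
    entry_name: str, aspect_key: str, description: str
) -> tuple[str, str]:
    """Return (url, json_body) for a PATCH to entries.patch.

    entry_name: projects/{project}/locations/{location}/entryGroups/{group}/entries/{entry}
    aspect_key: {project}.{location}.{aspect_type}; which aspect type holds
                the generated descriptions is an assumption to confirm.
    """
    # Update only the aspects field, and within it only this aspect key.
    params = urlencode({"updateMask": "aspects", "aspectKeys": aspect_key})
    url = f"{DATAPLEX}/{entry_name}?{params}"
    # HYPOTHETICAL aspect data schema: a single "description" field.
    body = {"aspects": {aspect_key: {"data": {"description": description}}}}
    return url, json.dumps(body)
```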

Update generated queries for a table

You can update the generated queries for a table by using either the Google Cloud console or the Dataplex API.

Console

  1. On the Knowledge Catalog Search page, search for the table whose generated queries you want to update.

  2. In the search results, click the table to open its entry details page.

  3. In the Queries section, click Edit.

  4. Update the query description as needed.

  5. Manage ownership: By default, the Source of each query is set to Agent. If you modify a query and change its source to User, subsequent insight generation runs don't override your changes. If the source remains Agent, the query might be replaced when insights are regenerated.

  6. Manage overrides: To prevent all queries from being overridden when insights are regenerated, set the User managed option to True. This setting applies to the entire set of queries in that metadata aspect, so none of your manual changes are lost.
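The ownership and override rules in the preceding steps can be summarized as a small model of which queries survive a regeneration run. The following Python sketch is illustrative only and isn't how the service is implemented; in particular, the order in which kept and regenerated queries are combined is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    question: str
    sql: str
    source: str  # "Agent" or "User", mirroring the console's Source field


def merge_regenerated(
    existing: list[Query], regenerated: list[Query], user_managed: bool
) -> list[Query]:
    """Model which queries remain after insights are regenerated.

    - If User managed is True, the whole set is left untouched.
    - Otherwise, queries owned by User are kept and Agent queries are
      replaced by the newly generated ones.
    """
    if user_managed:
        return list(existing)
    kept = [q for q in existing if q.source == "User"]
    return kept + list(regenerated)
```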

REST

To update queries for a table, use the entries.patch method.
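When you edit queries through the API, you typically read the current aspect data, change it, and send it back in the entries.patch body. The following Python sketch shows that read-modify-write step. The schema it uses (a queries list whose items have question, sql, and source fields) is hypothetical; inspect the actual aspect data returned for your table before adapting it.

```python
import copy
import json


def mark_query_user_owned(aspect_data: dict, question: str, new_sql: str) -> dict:
    """Return a copy of HYPOTHETICAL queries-aspect data with one query edited.

    Assumed schema: {"queries": [{"question": ..., "sql": ..., "source": ...}]}.
    Setting source to "User" mirrors the console's ownership setting, so
    later generation runs shouldn't override the edited query.
    """
    updated = copy.deepcopy(aspect_data)  # Leave the caller's data unchanged.
    for query in updated.get("queries", []):
        if query.get("question") == question:
            query["sql"] = new_sql
            query["source"] = "User"
            return updated
    raise KeyError(f"no query with question {question!r}")


def build_patch_body(aspect_key: str, aspect_data: dict) -> str:
    """Wrap aspect data in an entries.patch request body."""
    return json.dumps({"aspects": {aspect_key: {"data": aspect_data}}})
```

Send the body with updateMask=aspects so that only the aspects field is modified.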

Update generated relationships for a table

You can update relationships only by using the Dataplex API. To do so, use the entries.patch method.

Manage dataset insights

Dataset-level insights focus on high-level descriptions and dataset-wide queries.

Update generated descriptions for a dataset

You can update dataset descriptions only by using the Dataplex API. To do so, use the entries.patch method.
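To call entries.patch for a dataset, you need the dataset's entry name. The following Python sketch assumes that BigQuery resources are represented in the system @bigquery entry group, with entry IDs of the form bigquery.googleapis.com/projects/PROJECT/datasets/DATASET. Confirm the exact name for your resource, for example with searchEntries, before using it.

```python
from typing import Optional


def bigquery_entry_name(
    project: str, location: str, dataset: str, table: Optional[str] = None
) -> str:
    """Build the assumed Dataplex entry name for a BigQuery dataset or table."""
    resource = f"bigquery.googleapis.com/projects/{project}/datasets/{dataset}"
    if table is not None:
        resource += f"/tables/{table}"
    return (
        f"projects/{project}/locations/{location}"
        f"/entryGroups/@bigquery/entries/{resource}"
    )
```

The resulting name goes directly into the entries.patch URL path, for example https://dataplex.googleapis.com/v1/ followed by the entry name.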

Update generated queries for a dataset

You can update the generated queries for a dataset by using either the Google Cloud console or the Dataplex API.

Console

  1. On the Knowledge Catalog Search page, search for the dataset whose generated queries you want to update.

  2. In the search results, click the dataset to open its entry details page.

  3. In the Queries section, click Edit.

  4. Update the query description as needed.

  5. Manage ownership: By default, the Source of each query is set to Agent. If you modify a query and change its source to User, subsequent insight generation runs don't override your changes. If the source remains Agent, the query might be replaced when insights are regenerated.

  6. Manage overrides: To prevent all queries from being overridden when insights are regenerated, set the User managed option to True. This setting applies to the entire set of queries in that metadata aspect, so none of your manual changes are lost.

REST

To update queries for a dataset, use the entries.patch method.

What's next