Explore Knowledge Catalog
Knowledge Catalog (formerly Dataplex Universal Catalog) is an AI-powered data governance solution that gives agents high-quality data context for grounding generated content. This page lists hands-on use cases to help you get started with Knowledge Catalog.
Build and power AI agents
Build an agent to discover your data
Run complex, natural language queries on enterprise data assets, using a discovery agent that makes Knowledge Catalog API calls (Python).
Build an agent to enrich metadata
Generate AI-powered overviews for your data assets at scale, using an enrichment agent that makes Knowledge Catalog API calls (Python).
Use the Gemini CLI agent to test data context
Verify that Knowledge Catalog can distinguish between source data and temporary derivatives, using natural language queries to Gemini CLI connected to a local MCP server.
Establish data governance
Build a data foundation
Set up a realistic, "messy" data lake in BigQuery, apply rigid metadata tags (aspects) to differentiate valid data from noise, and use Gemini CLI to verify that the agent follows your rules.
Set up foundational governance
Attach structured, schema-driven metadata (aspects) and business definitions (glossaries) to your data assets (entries) using the Google Cloud console.
Build a governed Iceberg lakehouse
Create Apache Iceberg tables, enforce centralized data policies for column-level security, define security policies, and visualize automated data lineage.
Analyze data lineage
Analyze the impact of data changes
Identify how data transformations affect downstream resources, data integrity, and workflows.
Analyze causes of a PII leak
Trace the flow of sensitive data back to the process that moves it from a trusted to an untrusted location.
Optimize storage costs
Reduce storage costs by identifying assets that are not actively used as sources for other processes.
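The three lineage use cases above all reduce to simple graph questions once lineage is viewed as directed edges from a source asset to a target asset. The following is a minimal sketch under that assumption. It works on a small in-memory edge list and does not call any Knowledge Catalog API. The asset names, the `EDGES` and `ASSETS` values, and the helper functions are all hypothetical and exist only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source asset, target asset).
# In practice these would come from your lineage tooling, not a literal list.
EDGES = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "marts.revenue"),
    ("staging.orders", "sandbox.orders_copy"),
    ("raw.customers", "marts.revenue"),
]

# Hypothetical inventory of all known assets, including one with no lineage.
ASSETS = {
    "raw.orders",
    "raw.customers",
    "staging.orders",
    "marts.revenue",
    "sandbox.orders_copy",
    "raw.legacy_events",
}


def build_adjacency(edges, reverse=False):
    """Map each asset to its direct targets (or direct sources if reverse)."""
    adjacency = defaultdict(set)
    for source, target in edges:
        if reverse:
            adjacency[target].add(source)
        else:
            adjacency[source].add(target)
    return adjacency


def reachable(start, adjacency):
    """Breadth-first search: every asset reachable from start, excluding start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


def downstream(asset, edges):
    """Impact analysis: assets that could be affected by a change to asset."""
    return reachable(asset, build_adjacency(edges))


def upstream(asset, edges):
    """Root-cause tracing: assets whose data can flow into asset."""
    return reachable(asset, build_adjacency(edges, reverse=True))


def unused_as_source(assets, edges):
    """Cost review: assets that never appear as the source of any edge."""
    sources = {source for source, _ in edges}
    return set(assets) - sources


if __name__ == "__main__":
    print(sorted(downstream("staging.orders", EDGES)))
    print(sorted(upstream("sandbox.orders_copy", EDGES)))
    print(sorted(unused_as_source(ASSETS, EDGES)))
```

An asset that is not a source for any process is only a candidate for cleanup. It may still be read directly by dashboards or users that lineage does not capture, so treat the result as a starting list for review.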
Automate data quality with AI
Automate data quality scans
Using natural language queries in Gemini CLI, profile your data and generate quality rules, then deploy those rules as automated scans.