This document answers frequently asked questions about Knowledge Catalog (formerly Dataplex Universal Catalog).
For more information about Knowledge Catalog, see Knowledge Catalog overview.
What is Knowledge Catalog?
Google Knowledge Catalog is an intelligent governance solution for data and AI assets in Google Cloud. It provides a centralized inventory where you can discover, manage, and govern your data across Google Cloud data sources like BigQuery, Cloud Storage, Pub/Sub, and Spanner. It uses AI to automate data discovery, metadata enrichment, and data quality. Through its governed data catalog, Knowledge Catalog provides the essential grounding that AI agents need to generate high-quality content.
What is Data Catalog?
Data Catalog was the original name of Google Cloud's metadata service. It later evolved into Dataplex Universal Catalog, which has now been renamed Knowledge Catalog.
While the term "data catalog" is still used generically to describe this type of data indexing, in the context of Google Cloud it refers to the legacy product. We recommend that all new projects use Knowledge Catalog to take advantage of AI-powered features and enhanced governance.
Is Knowledge Catalog different from Data Catalog?
Yes. Knowledge Catalog is the AI-powered data governance platform that replaces Data Catalog. While the two share similar concepts, Knowledge Catalog provides several enhancements:
AI-powered context: Unlike Data Catalog, Knowledge Catalog uses Gemini to automatically extract business context, generate natural language descriptions, and provide SQL "golden queries" to ground AI agents.
Rich metadata support: Knowledge Catalog supports more complex metadata types, such as nested arrays, maps, and records.
Agentic access: AI agents can discover and adaptively use Knowledge Catalog tools through a local or remote MCP server.
Data discovery: Knowledge Catalog can auto-ingest metadata from a larger set of Google Cloud services and external data sources.
Governance at scale: It offers enhanced capabilities for data profiling, automatic data quality, and centralized governance.
What is Knowledge Catalog used for?
Google Knowledge Catalog solves the "data cold start" problem: the time wasted trying to find, understand, and trust data before you can actually use it. Its primary uses include the following:
Accelerated data discovery: Instead of navigating complex organizational silos to locate data, you can use natural language search (for example, "Show me the most recent customer churn data") to find assets across Google Cloud resources instantly, increasing productivity for data consumers.
Grounding AI agents: It acts as the source of truth for generative AI workloads, including agents built with the Agent Development Kit (ADK). By linking physical data to business definitions, it ensures that AI agents (like those built on Vertex AI) use high-quality data, which significantly reduces AI hallucinations and improves trust in AI-generated insights.
Automated data governance: It automatically scans your data to identify sensitive information (like PII), tracks where data comes from (lineage), and monitors its accuracy (auto data quality). These capabilities help improve data trust, security, and compliance with less manual effort.
Discovering "dark data": It can scan unstructured files (like PDFs or images in Cloud Storage), extract the information inside, and make it searchable and queryable in BigQuery, which helps you unlock insights from previously inaccessible data.
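To make the discovery workflow concrete, here is a toy in-memory sketch in plain Python. It is illustrative only: the asset names and the `search` helper are invented for this example, and the real service interprets full natural-language queries rather than matching keywords.

```python
# Toy keyword search over a mock catalog index; illustrative only.
# The real service uses semantic, natural-language search.
catalog = [
    {"name": "customer_churn_2024", "description": "Monthly customer churn data"},
    {"name": "orders_raw", "description": "Unprocessed order events"},
    {"name": "churn_model_features", "description": "Features for churn prediction"},
]

def search(query: str) -> list[str]:
    """Return asset names whose metadata mentions every query term."""
    terms = query.lower().split()
    return [
        asset["name"]
        for asset in catalog
        if all(t in (asset["name"] + " " + asset["description"]).lower() for t in terms)
    ]

print(search("customer churn"))  # finds only the assets describing customer churn
```

The point of the sketch is the shape of the interaction: the consumer expresses intent once, and the catalog resolves it against metadata from many sources instead of the consumer browsing each silo.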
For hands-on use cases, see Explore Knowledge Catalog.
What types of metadata does Knowledge Catalog store?
Knowledge Catalog stores three types of metadata:
Technical metadata: Automatically harvested schemas, table names, and system properties.
Business metadata: User-defined context such as business descriptions, glossary terms, and ownership.
Runtime metadata: Information about data lineage, data quality scores, and data profiling statistics.
How do I migrate from Data Catalog?
The transition to Knowledge Catalog is designed to be seamless, with no manual data movement required. Depending on your current usage, the process involves two main phases:
Preparatory phase: If you have custom metadata (tags, tag templates, or custom entries), that content is automatically surfaced in Knowledge Catalog as read-only. During this phase, you complete configuration tasks so that your existing Data Catalog content is available in both interfaces at the same time.
Transfer phase: Once prepared, you transfer the active state of your metadata to make it read-write within Knowledge Catalog. This step should be coordinated with updating any programmatic workloads (APIs, client libraries, or Terraform modules) to point to the new Knowledge Catalog endpoints.
If you have no custom metadata or if you are new to the platform, you can complete the transition by setting Knowledge Catalog as your default UI experience in the Google Cloud console.
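The endpoint-coordination step in the transfer phase can be sketched as follows. The legacy hostname shown is the real Data Catalog endpoint, but the Knowledge Catalog hostname is a hypothetical placeholder; consult the transition guide for the actual endpoint your workloads should use.

```python
# Illustrative sketch of coordinating a workload cutover.
# LEGACY_ENDPOINT is the real Data Catalog hostname; NEW_ENDPOINT is a
# hypothetical placeholder, not a documented Knowledge Catalog endpoint.
LEGACY_ENDPOINT = "datacatalog.googleapis.com"
NEW_ENDPOINT = "knowledgecatalog.example.com"  # placeholder only

def api_endpoint(transfer_complete: bool) -> str:
    """Keep workloads on the legacy API during the preparatory phase;
    switch them to the new endpoint only after the transfer phase."""
    return NEW_ENDPOINT if transfer_complete else LEGACY_ENDPOINT

print(api_endpoint(transfer_complete=False))  # prints datacatalog.googleapis.com
```

The key design point is that the endpoint switch is gated on the transfer having completed, so programmatic workloads never write through the new API while the metadata there is still read-only.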
For more information, see Transition from Data Catalog to Knowledge Catalog.