The context that surrounds your data equips your AI applications with a deep understanding of your data assets, improving the accuracy and relevance of LLM-generated responses.
The lookupContext method bridges the context gap using a single API request to retrieve a pre-formatted bundle of data asset metadata optimized for interactive agentic workflows. You can use this compact, LLM-ready context to ground your agents in assessing and using data assets.
You can use the lookupContext method for any data assets stored in Knowledge Catalog, for example, BigQuery tables, datasets, or any other entries.
How to retrieve context for an asset with the lookupContext method
- The agent retrieves data assets that are potentially relevant for context retrieval, for example, by using Knowledge Catalog semantic search.
- The agent uses the lookupContext method to make a single API call or an MCP tool request that retrieves the context for a specific asset. The method returns a response containing a pre-formatted text block. Depending on the format parameter you specify in the request, the document can be in YAML, XML, or JSON format.
  The response contains the following context elements:

  | Context element | Description |
  | --- | --- |
  | Technical metadata | Resource schemas and physical configurations, such as BigQuery partition and clustering strategies. |
  | Operational metadata | Joins and other relationships, based on historical query logs and data insights. For more information, see View data relationships. |
  | Business descriptions | Related business terms, overviews, catalog annotations, descriptions captured in the source system and auto-generated in Knowledge Catalog, and guidelines. |
  | Data profile | Distribution statistics, distinct value counts, null ratios, and sample values. |
  | Data quality | Automated data quality check outputs against predefined rules. |
  | Context on related data assets | Context on related data assets, such as glossary terms or other related assets, like frequently joined tables. The context returned for related assets includes the same range of elements as for the main asset or assets. |

  Note: You can use the guidelines aspect on data assets to capture additional context useful for agents when they discover, inspect, or use data assets.
- The agent uses this response to guide the selection of relevant assets or their usage.
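The steps above can be sketched as a small helper. This is only an illustration of the flow, not part of any client library: `search_assets` and `lookup_context` are hypothetical callables that you would supply, for example wrappers around a semantic search request and a lookupContext request.

```python
from typing import Callable, List

MAX_RESOURCES = 10  # the method accepts up to ten entry names per request


def gather_context(
    question: str,
    search_assets: Callable[[str], List[str]],
    lookup_context: Callable[[List[str]], str],
) -> str:
    """Discover candidate assets, then fetch their context in one lookup.

    Both callables are hypothetical stand-ins supplied by the caller.
    """
    # Step 1: find entry names that might be relevant to the question.
    candidates = search_assets(question)
    if not candidates:
        return ""
    # Step 2: a single lookup for the first candidates, at most ten.
    return lookup_context(candidates[:MAX_RESOURCES])
```

The returned text is what the agent would then place in its prompt to choose among the assets or to decide how to query them (step 3).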
Before you begin
Before using the lookupContext method, ensure you have the necessary roles and enable the required APIs.
Required roles
To get the permissions that you need to call the lookupContext method, ask your administrator to grant you the following IAM roles on your Google Cloud project:
- Read access to catalog resources, including entries, entry groups, and glossaries: Dataplex Catalog Viewer (roles/dataplex.catalogViewer)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Enable APIs
To use the lookupContext method, enable the following APIs in your project:
- Knowledge Catalog API
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Retrieve context for a data asset
To retrieve the context for a data asset, call the lookupContext method directly through the Dataplex API, or use the Knowledge Catalog remote Model Context Protocol (MCP) server or MCP Toolbox for Databases.
The lookupContext method filters the resources based on your permissions. The response contains data only for assets that your identity has the necessary Identity and Access Management (IAM) permissions to access. If you have no permissions on the requested resources, the method returns an empty response.
REST
To retrieve context for a data asset, send the following request:
curl --request POST \
'https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:lookupContext' \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"resources": RESOURCES,
"options": OPTIONS
}' \
--compressed
Replace the following:
- PROJECT_ID: the ID of your Google Cloud project
- LOCATION: the region where the asset exists (for example, us-central1)
- RESOURCES: up to ten entry names to retrieve context for, formatted as projects/{project}/locations/{location}/entryGroups/{entryGroup}/entries/{entry}. For multiple resources, the API establishes relationships between the requested resources, such as frequent schema joins, and returns the relationship information in the context.
- OPTIONS: the options that let you define the context:
  - format is the format of the context file. For example, yaml.
  - context_budget is the number of characters to which the response is limited. If you set the all_schema_fields parameter to true, the API returns all schema fields regardless of the context_budget value.
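As a rough illustration of how these placeholders fit together, the following sketch assembles the JSON request body and checks the documented limits before anything is sent. The helper name `build_lookup_body` is hypothetical, and placing all_schema_fields inside the options object is an assumption based on the parameter list above.

```python
import json
import re
from typing import List

# Entry name shape described above. The entry ID can itself contain slashes
# (as BigQuery entries do), so the last segment is matched greedily.
ENTRY_NAME = re.compile(
    r"^projects/[^/]+/locations/[^/]+/entryGroups/[^/]+/entries/.+$"
)


def build_lookup_body(
    resources: List[str],
    fmt: str = "yaml",
    context_budget: int = 4000,
    all_schema_fields: bool = False,
) -> str:
    """Return a JSON body for a lookupContext request (hypothetical helper)."""
    if not 1 <= len(resources) <= 10:
        raise ValueError("pass between one and ten entry names")
    for name in resources:
        if not ENTRY_NAME.match(name):
            raise ValueError(f"not a full entry name: {name}")
    # The example request passes the budget as a string, so this sketch does too.
    options = {"format": fmt, "context_budget": str(context_budget)}
    if all_schema_fields:
        # Assumption: this flag sits next to the other options.
        options["all_schema_fields"] = True
    return json.dumps({"resources": resources, "options": options})
```

The resulting string can be passed to curl with the -d flag in place of the inline JSON.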
An example request that retrieves context for a BigQuery table looks as follows:
curl --request POST \
'https://dataplex.googleapis.com/v1/projects/test-project/locations/us:lookupContext?key=[YOUR_API_KEY]' \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"resources":
["projects/test-project/locations/us/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/test-project/datasets/test-dataset/tables/test-table"],
"options":
{
"format":"yaml",
"context_budget":"4000"
}
}' \
--compressed
The response is a pre-formatted block of text similar to the following:
{
"context": "resource: \"projects/test-project/locations/us/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/test-project/datasets/sales_data/tables/orders\"\ntechnical_metadata:\n schema:\n - name: order_id\n type: STRING\n description: \"Primary key for the order.\"\n - name: customer_id\n type: STRING\n - name: total_amount\n type: NUMERIC\n partitioning:\n type: TIMESTAMP\n field: order_date\nbusiness_descriptions:\n overview: \"Historical record of all customer transactions.\"\n related_terms:\n - \"Revenue\"\n - \"Sales Transactions\"\n guidelines: \"Always filter by 'order_date' to optimize query costs due to partitioning.\"\ndata_profile:\n columns:\n - name: total_amount\n null_ratio: 0.001\n distinct_values: 52340\n sample_values: [45.99, 120.00, 15.50]\ndata_quality:\n summary:\n - rule: \"positive_amounts\"\n status: PASSED\n description: \"Ensures total_amount is greater than zero.\"\noperational_metadata:\n frequent_joins:\n - table: \"projects/test-project/locations/us/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/test-project/datasets/sales_data/tables/customers\"\n join_key: \"customer_id\"\n"
}
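Because the context arrives as one string inside a JSON object, a client usually unwraps it before handing it to a model. The sketch below shows one way to do that with the standard library only. Both function names are hypothetical, and treating an empty body or a missing context field as "no accessible assets" is an assumption about what an empty response looks like.

```python
import json
from typing import List


def extract_context(response_text: str) -> str:
    """Return the context string, or "" when the response carries none."""
    payload = json.loads(response_text) if response_text.strip() else {}
    return payload.get("context", "")


def top_level_sections(context: str) -> List[str]:
    """List the unindented `key:` lines of a YAML-formatted context.

    This is a rough line scan for quick inspection, not a YAML parser.
    """
    sections = []
    for line in context.splitlines():
        if not line or line[0].isspace() or line[0] in "-#":
            continue  # skip blank, nested, list-item, and comment lines
        if ":" in line:
            sections.append(line.split(":", 1)[0])
    return sections
```

For the example response above, `top_level_sections` would report which context elements are present, which can help an agent decide whether the budget was large enough to include, say, the data profile.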
Python
Before trying this sample, follow the Python setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Python API reference documentation.
To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
The following example shows how to retrieve context for a BigQuery table:
from google.cloud import dataplex_v1
# Initialize the client
client = dataplex_v1.CatalogServiceClient()
# Define the request with a seed resource
request = dataplex_v1.LookupContextRequest(
    name="projects/test-project/locations/us",
    resources=[
        "projects/test-project/locations/us/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/test-project/datasets/test-dataset/tables/test-table"
    ],
    # The budget option is named context_budget, as in the REST request
    options={"format": "yaml", "context_budget": "4000"},
)
# Retrieve the LLM-ready context
response = client.lookup_context(request=request)
context_yaml = response.context
print(f"Retrieved Context: \n{context_yaml}")
Best practices for the lookupContext method
To optimize your results when using the lookupContext method, consider the following best practices:
- Set the desired length of the output context with the context_budget parameter. The lookupContext method tries to fit the most relevant context into the output while staying as close as possible to the limit that the parameter sets.
- You can list up to ten data assets in the resources list. For example, including several tables in the resources list makes the API provide context not only for those tables but also for possible join paths between them, which gives the agent guidance on how to use the tables together.
- Use the format option, such as yaml or json, that aligns best with the LLM or agent's parsing logic to avoid costly transformations.
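When an agent has more than ten candidate assets, one way to respect the limit is to split the list into several requests. The following is a minimal sketch of that idea. `batch_resources` is a hypothetical helper, not part of the client library.

```python
from typing import Iterator, List, Sequence

MAX_RESOURCES_PER_REQUEST = 10  # limit stated for the resources list


def batch_resources(
    names: Sequence[str], size: int = MAX_RESOURCES_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` entry names."""
    for start in range(0, len(names), size):
        yield list(names[start:start + size])
```

Because the API describes relationships between the resources in a single request, it is probably worth placing tables that are likely to be joined in the same batch rather than splitting them arbitrarily.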
What's next
- Learn how to build an agent to discover your data.
- Learn how to build an agent to enrich your metadata.
- Understand search syntax for Knowledge Catalog.
- Learn more about viewing data relationships.