When you import structured data using the Google Cloud console, Gemini Enterprise auto-detects the schema. You can either use this auto-detected schema in your engine or use the API to provide a schema to indicate the structure of the data.
If you provide a schema and later update it with a new schema, the new schema must be backward compatible with the original. Otherwise the schema update fails.
For reference information about the schema, see dataStores.schemas.
Approaches to providing the schema for your data store
There are two approaches to determining the schema for structured data:
Auto-detect and edit. Let Gemini Enterprise auto-detect and suggest an initial schema. Then, you refine the schema through the console interface. Google highly recommends that, after your fields are auto-detected, you map key properties to all the important fields.
This is the approach that you'll use when following the Google Cloud console instructions for structured data in Create a first-party data store.
Provide the schema as a JSON object. Prepare a valid JSON schema object and provide it to Gemini Enterprise. For an example, see Example schema as a JSON object. After creating the schema, you upload your data according to that schema.
This is the approach that you can use when creating a data store through the API with a curl command or a program. For example, see Import once from BigQuery. Also see the instructions in Provide your own schema as a JSON object on this page.
About auto-detect and edit
When you begin importing data, Gemini Enterprise samples the first few documents that are imported. Based on these documents, it proposes a schema for the data, which you can then review or edit.
If fields that you want to map to key properties aren't present in the sampled documents, then you can manually add these fields when you review the schema.
If Gemini Enterprise encounters additional fields later in the data import, it still imports these fields and adds them to the schema. If you want to edit the schema after all the data has been imported, see Update your schema.
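To make the import-time behavior concrete, here is a minimal Python sketch of how a schema could grow as documents arrive. It follows the rules this page gives for the dynamic, datetime_detection, and geolocation_detection settings (described under Example schema as a JSON object). The functions infer_type and import_document are hypothetical and are not part of any Google API. The datetime check is a simplification built on Python's datetime.fromisoformat, not the service's actual parser, and a real auto-detected schema would carry more detail (for example, array item types) than the bare type shown here.

```python
from datetime import datetime
from typing import Any, Dict


def looks_like_datetime(text: str) -> bool:
    """Simplified RFC 3339 / ISO 8601 check (assumption: not the service's parser)."""
    candidate = text.strip()
    # Normalize the two UTC spellings from this page so fromisoformat accepts them.
    if candidate.endswith(" UTC"):
        candidate = candidate[: -len(" UTC")] + "+00:00"
    elif candidate.endswith("Z"):
        candidate = candidate[:-1] + "+00:00"
    try:
        datetime.fromisoformat(candidate)
    except ValueError:
        return False
    return True


def is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def infer_type(value: Any, schema: Dict[str, Any]) -> str:
    """Hypothetical type proposal for a newly seen value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        if schema.get("datetime_detection", True) and looks_like_datetime(value):
            return "datetime"
        return "string"
    if isinstance(value, dict):
        if schema.get("geolocation_detection", True):
            has_coordinates = is_number(value.get("latitude")) and is_number(value.get("longitude"))
            has_address = isinstance(value.get("address"), str)
            if has_coordinates or has_address:
                return "geolocation"
        return "object"
    if isinstance(value, list):
        return "array"
    return "string"  # fallback for null and anything else (assumption)


def import_document(schema: Dict[str, Any], document: Dict[str, Any]) -> Dict[str, Any]:
    """Return the fields that would be imported, extending the schema when dynamic is "true"."""
    properties = schema.setdefault("properties", {})
    dynamic = schema.get("dynamic", "true") == "true"  # the setting is a string, not a boolean
    imported = {}
    for name, value in document.items():
        if name not in properties:
            if not dynamic:
                continue  # "false": neither the property nor its value is imported
            properties[name] = {"type": infer_type(value, schema)}
        imported[name] = value
    return imported
```

With a schema that defines only title and description, importing a document that also has "rating": 4.5 keeps the rating and adds {"type": "number"} to the schema when dynamic is "true", and drops it when dynamic is "false".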
Example schema as a JSON object
You can define your own schema using the JSON Schema format, which is an open source, declarative language to define, annotate, and validate JSON documents. For example, this is a valid JSON schema annotation:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "dynamic": "true",
  "datetime_detection": true,
  "geolocation_detection": true,
  "properties": {
    "title": {
      "type": "string",
      "keyPropertyMapping": "title",
      "retrievable": true,
      "completable": true
    },
    "description": {
      "type": "string",
      "keyPropertyMapping": "description"
    },
    "categories": {
      "type": "array",
      "items": {
        "type": "string",
        "keyPropertyMapping": "category"
      }
    },
    "uri": {
      "type": "string",
      "keyPropertyMapping": "uri"
    },
    "brand": {
      "type": "string",
      "indexable": true,
      "dynamicFacetable": true
    },
    "location": {
      "type": "geolocation",
      "indexable": true,
      "retrievable": true
    },
    "creationDate": {
      "type": "datetime",
      "indexable": true,
      "retrievable": true
    },
    "isCurrent": {
      "type": "boolean",
      "indexable": true,
      "retrievable": true
    }
  }
}
Here are some of the fields in this schema example:
- dynamic. If dynamic is set to the string value "true", then any new properties found in the imported data are added to the schema. If dynamic is set to "false", new properties found in the imported data are ignored: the properties aren't added to the schema, and their values aren't imported.
  For example, a schema has two properties, title and description, and you upload data that contains properties for title, description, and rating. If dynamic is "true", then the rating property and its data are imported. If dynamic is "false", then the rating property isn't imported, although title and description are.
  The default is "true".
- datetime_detection. If datetime_detection is set to the boolean true, then when data in datetime format is imported, the schema type is set to datetime. The supported formats are RFC 3339 and ISO 8601. For example:
    - 2024-08-05 08:30:00 UTC
    - 2024-08-05T08:30:00Z
    - 2024-08-05T01:30:00-07:00
    - 2024-08-05
    - 2024-08-05T08:30:00+00:00
  If datetime_detection is set to the boolean false, then when data in datetime format is imported, the schema type is set to string.
  The default is true.
- geolocation_detection. If geolocation_detection is set to the boolean true, then when data in geolocation format is imported, the schema type is set to geolocation. Data is detected as geolocation if it is an object containing a latitude number and a longitude number, or an object containing an address string. For example:
    - "myLocation": {"latitude": 37.42, "longitude": -122.08}
    - "myLocation": {"address": "1600 Amphitheatre Pkwy, Mountain View, CA 94043"}
  If geolocation_detection is set to the boolean false, then when data in geolocation format is imported, the schema type is set to object.
  The default is true.
- keyPropertyMapping. A field that maps predefined keywords to critical fields in your documents, helping to clarify their semantic meaning. Values include title, description, uri, and category. Your field name doesn't need to match the keyPropertyMapping value. For example, for a field that you named my_title, you can include a keyPropertyMapping field with a value of title.
  Fields marked with keyPropertyMapping are indexable and searchable by default, but not retrievable, completable, or dynamicFacetable. This means that you don't need to include the indexable or searchable fields with a keyPropertyMapping field to get the expected default behavior.
- type. The type of the field. This is a string value that is datetime, geolocation, or one of the primitive types (integer, boolean, object, array, number, or string).
- retrievable. Indicates whether this field can be returned in a search response. This can be set for fields of type number, string, boolean, integer, datetime, and geolocation. A maximum of 50 fields can be set as retrievable. User-defined fields and keyPropertyMapping fields are not retrievable by default. To make a field retrievable, include "retrievable": true with the field.
- indexable. Indicates whether this field can be filtered, faceted, boosted, or sorted in the servingConfigs.search method. This can be set for fields of type number, string, boolean, integer, datetime, and geolocation. A maximum of 50 fields can be set as indexable. User-defined fields are not indexable by default, except for fields containing the keyPropertyMapping field. To make a field indexable, include "indexable": true with the field.
- dynamicFacetable. Indicates that the field can be used as a dynamic facet. This can be set for fields of type number, string, boolean, and integer. To make a field dynamically facetable, it must also be indexable: include "dynamicFacetable": true and "indexable": true with the field.
- searchable. Indicates whether this field can be reverse indexed to match unstructured text queries. This can only be set for fields of type string. A maximum of 50 fields can be set as searchable. User-defined fields are not searchable by default, except for fields containing the keyPropertyMapping field. To make a field searchable, include "searchable": true with the field.
- completable. Indicates whether this field can be returned as an autocomplete suggestion. This can only be set for fields of type string. To make a field completable, include "completable": true with the field.
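The per-field rules in this list can be checked locally before you send a schema. The following Python sketch is a hypothetical pre-flight check: check_schema and iter_fields aren't part of any Google library, and the check encodes only the type restrictions and 50-field limits stated above. The sketch makes two assumptions. First, it counts only fields that explicitly set a flag, not the default indexing and search that keyPropertyMapping fields receive. Second, it treats array items and nested object properties as fields. The service, not this sketch, decides what it accepts.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Restrictions as stated in the field descriptions on this page.
RETRIEVABLE_OR_INDEXABLE_TYPES = {"number", "string", "boolean", "integer", "datetime", "geolocation"}
DYNAMIC_FACETABLE_TYPES = {"number", "string", "boolean", "integer"}
MAX_FIELDS_PER_FLAG = 50  # limit for retrievable, indexable, and searchable


def iter_fields(properties: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (path, definition) pairs, descending into array items and nested objects."""
    for name, definition in properties.items():
        path = prefix + name
        yield path, definition
        items = definition.get("items")
        if isinstance(items, dict):
            yield path + "[]", items
            yield from iter_fields(items.get("properties", {}), path + "[].")
        yield from iter_fields(definition.get("properties", {}), path + ".")


def check_schema(schema: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means no stated rule was violated."""
    problems: List[str] = []
    counts = {"retrievable": 0, "indexable": 0, "searchable": 0}
    for path, field in iter_fields(schema.get("properties", {})):
        field_type = field.get("type")
        for flag in ("retrievable", "indexable"):
            if field.get(flag):
                counts[flag] += 1
                if field_type not in RETRIEVABLE_OR_INDEXABLE_TYPES:
                    problems.append(f"{path}: {flag} is not allowed for type {field_type}")
        for flag in ("searchable", "completable"):
            if field.get(flag) and field_type != "string":
                problems.append(f"{path}: {flag} is only allowed for type string")
        if field.get("searchable"):
            counts["searchable"] += 1
        if field.get("dynamicFacetable"):
            if field_type not in DYNAMIC_FACETABLE_TYPES:
                problems.append(f"{path}: dynamicFacetable is not allowed for type {field_type}")
            if not field.get("indexable"):
                problems.append(f'{path}: dynamicFacetable also requires "indexable": true')
    for flag, count in counts.items():
        if count > MAX_FIELDS_PER_FLAG:
            problems.append(f"{count} fields are {flag}; the maximum is {MAX_FIELDS_PER_FLAG}")
    return problems
```

Run against the example schema on this page, the check reports no problems. A field such as {"type": "number", "searchable": true, "dynamicFacetable": true} reports two: searchable on a non-string type and dynamicFacetable without indexable.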
Provide your own schema as a JSON object
To provide your own schema, you create a data store that contains an empty schema and then you update the schema, supplying your schema as a JSON object. Follow these steps:
Prepare your schema as a JSON object. Use the schema in Example schema as a JSON object as a guide.
Create a data store.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
  -d '{
    "displayName": "DATA_STORE_DISPLAY_NAME",
    "industryVertical": "INDUSTRY_VERTICAL"
  }'
Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
- DATA_STORE_DISPLAY_NAME: the display name of the data store that you want to create.
- INDUSTRY_VERTICAL: GENERIC
Use the schemas.patch API method to provide your schema as a JSON object.
curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/schemas/default_schema" \
  -d '{
    "structSchema": JSON_SCHEMA_OBJECT
  }'
Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store.
- JSON_SCHEMA_OBJECT: your new JSON schema as a JSON object. For example:
    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "type": "object",
      "properties": {
        "title": {
          "type": "string",
          "keyPropertyMapping": "title"
        },
        "categories": {
          "type": "array",
          "items": {
            "type": "string",
            "keyPropertyMapping": "category"
          }
        },
        "uri": {
          "type": "string",
          "keyPropertyMapping": "uri"
        }
      }
    }
Optional: Review the schema by following the procedure View a schema definition.
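If you prefer a program to curl, the two API calls in this procedure can be sketched in Python with only the standard library. This is a minimal sketch, not an official client. The function names are hypothetical. The URLs, headers, and request bodies are copied from the curl commands on this page, and the token comes from the same gcloud auth print-access-token command. The data store ID check encodes only the character rule stated for DATA_STORE_ID.

```python
import json
import re
import subprocess
import urllib.request
from typing import Any, Dict

BASE_URL = "https://discoveryengine.googleapis.com"
DATA_STORES = "locations/global/collections/default_collection/dataStores"


def access_token() -> str:
    """Same token source as the curl examples: gcloud auth print-access-token."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def create_data_store_request(project_id: str, data_store_id: str,
                              display_name: str, token: str) -> urllib.request.Request:
    """Build (but don't send) the POST that creates a data store."""
    # Stated rule: only lowercase letters, digits, underscores, and hyphens.
    if not re.fullmatch(r"[a-z0-9_-]+", data_store_id):
        raise ValueError(f"invalid data store ID: {data_store_id!r}")
    url = f"{BASE_URL}/v1/projects/{project_id}/{DATA_STORES}?dataStoreId={data_store_id}"
    body = {"displayName": display_name, "industryVertical": "GENERIC"}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "X-Goog-User-Project": project_id,
        },
    )


def patch_schema_request(project_id: str, data_store_id: str,
                         json_schema: Dict[str, Any], token: str) -> urllib.request.Request:
    """Build (but don't send) the schemas.patch call that supplies your JSON schema."""
    url = (f"{BASE_URL}/v1beta/projects/{project_id}/{DATA_STORES}/"
           f"{data_store_id}/schemas/default_schema")
    body = {"structSchema": json_schema}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> Any:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, after token = access_token(), call send(create_data_store_request(...)) and then send(patch_schema_request(...)) with the same project and data store IDs. Building the requests separately from sending them lets you inspect the URL and body before anything reaches the API.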
What's next
- Create a search app
- Get the schema definition for structured data
- Update a schema for structured data