The schema defines the output that a processor produces for a processed document.
| JSON representation |
|---|
{ "displayName": string, "description": string, "entityTypes": [ { object ( |
| Fields | |
|---|---|
| displayName | Display name to show users. |
| description | Description of the schema. |
| entityTypes[] | Entity types of the schema. |
| metadata | Metadata of the schema. |
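
As a rough illustration of the four top-level fields, the sketch below builds a schema as a plain Python dictionary and serializes it to JSON. Only the field names come from the table above; every value (including the `account_number` type name) is a hypothetical placeholder, and the nested entity type is deliberately minimal.

```python
import json

# Hypothetical values; only the four top-level field names come from the Fields table.
schema = {
    "displayName": "Bank statement schema",
    "description": "Entities expected in a bank statement.",
    "entityTypes": [
        # Minimal placeholder entity type; a real one carries more fields.
        {"displayName": "Account number", "name": "account_number"},
    ],
    "metadata": {"documentSplitter": False},
}

# Serialize the schema so it can be inspected or sent in a request body.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```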
EntityType
EntityType wraps a label of the corresponding model and adds detailed attributes and limitations for entity-based processors. Multiple types can also form a dependency tree to represent nested types.
| JSON representation |
|---|
{ "displayName": string, "name": string, "baseTypes": [ string ], "properties": [ { object ( |
| Fields | |
|---|---|
| displayName | User defined name for the type. |
| name | Name of the type. It must be unique within the schema file and cannot be a "Common Type". The following naming conventions are used: |
| baseTypes[] | The entity type that this type is derived from. For now, one and only one should be set. |
| properties[] | Describes the nested structure, or composition, of an entity. |
| Union field | |
| enumValues | If specified, lists all the possible values for this entity. This should not be more than a handful of values. If the number of values is >10 or could change frequently use the |
EnumValues
Defines a list of enum values.
| JSON representation |
|---|
{ "values": [ string ] } |
| Fields | |
|---|---|
| values[] | The individual values that this enum values type can include. |
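
A minimal sketch of how a client might use an EnumValues object: check whether a candidate string is one of the listed values, and flag enums that exceed the ten-value guidance given in the `enumValues` field description. The helper names and the sample enum are hypothetical and not part of the API.

```python
from typing import Any

# Threshold taken from the enumValues field description (">10").
MAX_RECOMMENDED_VALUES = 10


def is_allowed(enum_values: dict[str, Any], candidate: str) -> bool:
    """Return True if candidate is one of the values listed in an EnumValues object."""
    return candidate in enum_values.get("values", [])


def is_too_large(enum_values: dict[str, Any]) -> bool:
    """Return True if the enum lists more values than the description recommends."""
    return len(enum_values.get("values", [])) > MAX_RECOMMENDED_VALUES


# Hypothetical enum, for illustration only.
account_kind = {"values": ["checking", "savings"]}
print(is_allowed(account_kind, "savings"))
print(is_too_large(account_kind))
```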
Property
Defines properties that can be part of the entity type.
| JSON representation |
|---|
{ "name": string, "displayName": string, "valueType": string, "occurrenceType": enum ( |
| Fields | |
|---|---|
| name | The name of the property. Follows the same guidelines as the EntityType name. |
| displayName | User defined name for the property. |
| valueType | A reference to the value type of the property. This type is subject to the same conventions as the |
| occurrenceType | Occurrence type limits the number of instances of an entity type that appear in the document. |
| method | Specifies how the entity's value is obtained. |
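
To illustrate how `valueType` lets one entity type nest another, the sketch below defines two hypothetical entity types and checks that every property's `valueType` matches the `name` of a type in the same list. This is an assumption-laden simplification: it only resolves references to types defined locally, so any built-in or common types would need separate handling. The type names and the helper `unresolved_value_types` are invented for the example.

```python
from typing import Any

# Hypothetical schema fragment: a parent type whose property references a nested type.
entity_types: list[dict[str, Any]] = [
    {"name": "account", "displayName": "Account", "properties": []},
    {
        "name": "bank_statement",
        "displayName": "Bank statement",
        "properties": [
            {
                "name": "account",
                "displayName": "Account",
                "valueType": "account",  # refers to the EntityType named "account"
                "occurrenceType": "REQUIRED_MULTIPLE",
                "method": "EXTRACT",
            }
        ],
    },
]


def unresolved_value_types(types: list[dict[str, Any]]) -> list[str]:
    """List valueType references that match no EntityType name in `types`."""
    known = {t["name"] for t in types}
    return [
        prop["valueType"]
        for t in types
        for prop in t.get("properties", [])
        if prop["valueType"] not in known
    ]


print(unresolved_value_types(entity_types))  # [] because "account" is defined above
```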
OccurrenceType
Types of occurrences of the entity type in the document. This represents the number of instances, not mentions, of an entity. For example, a bank statement might have only one account_number, but this account number can be mentioned in several places on the document. In this case, the account_number is considered a REQUIRED_ONCE entity type. If, on the other hand, a bank statement is expected to contain the status of multiple different accounts for a customer, the occurrence type is set to REQUIRED_MULTIPLE.
| Enums | |
|---|---|
| OCCURRENCE_TYPE_UNSPECIFIED | Unspecified occurrence type. |
| OPTIONAL_ONCE | There will be zero or one instance of this entity type. The same entity instance may be mentioned multiple times. |
| OPTIONAL_MULTIPLE | The entity type will appear zero or more times. |
| REQUIRED_ONCE | The entity type will appear exactly once. The same entity instance may be mentioned multiple times. |
| REQUIRED_MULTIPLE | The entity type will appear one or more times. |
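
The enum descriptions translate directly into lower and upper bounds on the number of distinct instances (not mentions). The sketch below encodes those bounds and checks a count against them. The function name is hypothetical, and treating `OCCURRENCE_TYPE_UNSPECIFIED` as unconstrained is an assumption of this example, not documented behavior.

```python
import math

# (minimum, maximum) instance counts implied by each enum description.
BOUNDS = {
    # Assumption: an unspecified occurrence type imposes no constraint.
    "OCCURRENCE_TYPE_UNSPECIFIED": (0, math.inf),
    "OPTIONAL_ONCE": (0, 1),
    "OPTIONAL_MULTIPLE": (0, math.inf),
    "REQUIRED_ONCE": (1, 1),
    "REQUIRED_MULTIPLE": (1, math.inf),
}


def count_is_valid(occurrence_type: str, instance_count: int) -> bool:
    """Check a number of distinct entity instances against an occurrence type."""
    if occurrence_type not in BOUNDS:
        raise ValueError(f"unknown occurrence type: {occurrence_type}")
    low, high = BOUNDS[occurrence_type]
    return low <= instance_count <= high


# One account_number instance satisfies REQUIRED_ONCE, however often it is mentioned.
print(count_is_valid("REQUIRED_ONCE", 1))
print(count_is_valid("REQUIRED_MULTIPLE", 0))
```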
Method
Specifies how the entity's value is obtained from the document.
| Enums | |
|---|---|
| METHOD_UNSPECIFIED | Unspecified method. It defaults to EXTRACT. |
| EXTRACT | The entity's value is directly extracted as-is from the document text. |
| DERIVE | The entity's value is derived through inference and is not necessarily an exact text extraction from the document. |
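
Because `METHOD_UNSPECIFIED` defaults to `EXTRACT`, client code that inspects a property may want to normalize the value before branching on it. A small sketch follows; the helper name is hypothetical, and treating a missing `method` key the same as `METHOD_UNSPECIFIED` is an assumption of this example.

```python
def effective_method(prop: dict) -> str:
    """Resolve the method of a Property dict, applying the documented default."""
    # Assumption: an absent "method" key is equivalent to METHOD_UNSPECIFIED.
    method = prop.get("method", "METHOD_UNSPECIFIED")
    return "EXTRACT" if method == "METHOD_UNSPECIFIED" else method


print(effective_method({"name": "account_number"}))
print(effective_method({"name": "summary", "method": "DERIVE"}))
```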
Metadata
Metadata for global schema behavior.
| JSON representation |
|---|
{ "documentSplitter": boolean, "documentAllowMultipleLabels": boolean, "prefixedNamingOnProperties": boolean, "skipNamingValidation": boolean } |
| Fields | |
|---|---|
| documentSplitter | If true, a |
| documentAllowMultipleLabels | If true, on a given page, there can be multiple |
| prefixedNamingOnProperties | If set, all the nested entities must be prefixed with the parents. |
| skipNamingValidation | If set, this will skip the naming format validation in the schema. So the string values in |
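
One way to keep the four flags typed on the client side is a small dataclass that maps Python attribute names to the camelCase keys of the JSON representation. The class below is a hypothetical helper, not part of any client library, and defaulting every flag to `False` when omitted is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass
class Metadata:
    # Assumption: every flag defaults to False when it is not set explicitly.
    document_splitter: bool = False
    document_allow_multiple_labels: bool = False
    prefixed_naming_on_properties: bool = False
    skip_naming_validation: bool = False

    def to_json_dict(self) -> dict:
        """Return a dict keyed by the field names used in the JSON representation."""
        return {
            "documentSplitter": self.document_splitter,
            "documentAllowMultipleLabels": self.document_allow_multiple_labels,
            "prefixedNamingOnProperties": self.prefixed_naming_on_properties,
            "skipNamingValidation": self.skip_naming_validation,
        }


metadata = Metadata(prefixed_naming_on_properties=True)
print(json.dumps(metadata.to_json_dict(), indent=2))
```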