Reference documentation and code samples for the Google Cloud Document AI V1 Client class Document.
Document represents the canonical document resource in Document AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document AI to iterate and optimize for quality.
Generated from protobuf message google.cloud.documentai.v1.Document
Namespace
Google \ Cloud \ DocumentAI \ V1
Methods
__construct
Constructor.
Parameters | |
---|---|
Name | Description |
data | array. Optional. Data for populating the Message object. |
↳ uri | string. Optional. Currently supports Google Cloud Storage URI of the form gs://bucket_name/object_name. Object versioning is not supported. For more information, refer to Google Cloud Storage Request URIs. |
↳ content | string. Optional. Inline document content, represented as a stream of bytes. Note: As with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64. |
↳ docid | string. Optional. An internal identifier for the document. Should be loggable (no PII). |
↳ mime_type | string. An IANA published media type (MIME type). |
↳ text | string. Optional. UTF-8 encoded text in reading order from the document. |
↳ text_styles | array<Document\Style>. Styles for the Document.text. |
↳ pages | array<Document\Page>. Visual page layout for the Document. |
↳ entities | array<Document\Entity>. A list of entities detected on Document.text. For document shards, entities in this list may cross shard boundaries. |
↳ entity_relations | array<Document\EntityRelation>. Placeholder. Relationship among Document.entities. |
↳ text_changes | array<Document\TextChange>. Placeholder. A list of text corrections made to Document.text. This is usually used for annotating corrections to OCR mistakes. Text changes for a given revision may not overlap with each other. |
↳ shard_info | Document\ShardInfo. Information about the sharding if this document is a shard of a larger document. If the document is not sharded, this message is not specified. |
↳ error | Google\Rpc\Status. Any error that occurred while processing this document. |
↳ revisions | array<Document\Revision>. Placeholder. Revision history of this document. |
↳ document_layout | Document\DocumentLayout. Parsed layout of the document. |
↳ chunked_document | Document\ChunkedDocument. Document chunked based on chunking config. |
↳ entity_validation_output | Document\EntityValidationOutput. The entity validation output for the document. This is the validation output for the document.entities field. |
↳ entities_revisions | array<Document\EntitiesRevision>. A list of entity revisions. The entity revisions are appended to the document in the processing order. This field can be used for comparing the entity extraction results at different stages of the processing. |
↳ entities_revision_id | string. The entity revision ID that the document.entities field is based on. If this field is set and entities_revisions is not empty, the entities in the document.entities field are the entities in the entity revision with this ID, and the document.entity_validation_output field is the entity_validation_output field in this entity revision. |
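As a minimal sketch of the constructor, the snippet below populates a Document from a data array whose keys match the field names in the parameter table. It assumes the google/cloud-document-ai Composer package is installed and autoloaded from vendor/autoload.php; the bucket, object name, and docid are hypothetical values chosen for illustration.

```php
<?php
// Assumption: the client library was installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\DocumentAI\V1\Document;

// Array keys use the field names from the parameter table.
$document = new Document([
    'uri' => 'gs://example-bucket/invoice.pdf', // hypothetical bucket and object
    'mime_type' => 'application/pdf',
    'docid' => 'invoice-0001',                  // hypothetical identifier, no PII
]);

// Each field is then readable through its getter.
echo $document->getUri(), PHP_EOL;
echo $document->getMimeType(), PHP_EOL;
echo $document->getDocid(), PHP_EOL;
```

Fields left out of the array keep their defaults, so the same object can be filled in later with the setters documented below.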
getUri
Optional. Currently supports Google Cloud Storage URI of the form gs://bucket_name/object_name. Object versioning is not supported.
For more information, refer to Google Cloud Storage Request URIs.
Returns | |
---|---|
Type | Description |
string |
hasUri
setUri
Optional. Currently supports Google Cloud Storage URI of the form gs://bucket_name/object_name. Object versioning is not supported.
For more information, refer to Google Cloud Storage Request URIs.
Parameter | |
---|---|
Name | Description |
var | string |
Returns | |
---|---|
Type | Description |
$this |
getContent
Optional. Inline document content, represented as a stream of bytes.
Note: As with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64.
Returns | |
---|---|
Type | Description |
string |
hasContent
setContent
Optional. Inline document content, represented as a stream of bytes.
Note: As with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64.
Parameter | |
---|---|
Name | Description |
var | string |
Returns | |
---|---|
Type | Description |
$this |
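The note about binary versus base64 representations can be illustrated with a short sketch. It assumes the google/cloud-document-ai package is autoloaded and relies on serializeToJsonString() from the base protobuf Message class, which is not documented on this page; the byte string is a placeholder, not a real PDF.

```php
<?php
// Assumption: the client library was installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\DocumentAI\V1\Document;

$rawBytes = "%PDF-1.7 placeholder bytes"; // placeholder content for illustration

$document = new Document();
// On the PHP side the content is set as raw bytes, with no base64 step.
$document->setContent($rawBytes);
$document->setMimeType('application/pdf');

var_dump($document->hasContent());
var_dump($document->hasUri());

// In the JSON form, the same bytes appear base64-encoded under "content".
$json = $document->serializeToJsonString();
$decoded = json_decode($json, true);
$roundTripped = base64_decode($decoded['content']);

var_dump($roundTripped === $rawBytes);
```

Base64 encoding is therefore only a concern when building or reading JSON payloads by hand; the generated setter and getter work with the raw bytes.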
getDocid
Optional. An internal identifier for the document. Should be loggable (no PII).
Returns | |
---|---|
Type | Description |
string |
setDocid
Optional. An internal identifier for the document. Should be loggable (no PII).
Parameter | |
---|---|
Name | Description |
var | string |
Returns | |
---|---|
Type | Description |
$this |
getMimeType
An IANA published media type (MIME type).
Returns | |
---|---|
Type | Description |
string |
setMimeType
An IANA published media type (MIME type).
Parameter | |
---|---|
Name | Description |
var | string |
Returns | |
---|---|
Type | Description |
$this |
getText
Optional. UTF-8 encoded text in reading order from the document.
Returns | |
---|---|
Type | Description |
string |
setText
Optional. UTF-8 encoded text in reading order from the document.
Parameter | |
---|---|
Name | Description |
var | string |
Returns | |
---|---|
Type | Description |
$this |
getTextStyles
Styles for the Document.text.
Returns | |
---|---|
Type | Description |
Google\Protobuf\Internal\RepeatedField |
setTextStyles
Styles for the Document.text.
Parameter | |
---|---|
Name | Description |
var | array<Document\Style> |
Returns | |
---|---|
Type | Description |
$this |
getPages
Visual page layout for the Document.
Returns | |
---|---|
Type | Description |
Google\Protobuf\Internal\RepeatedField |
setPages
Visual page layout for the Document.
Parameter | |
---|---|
Name | Description |
var | array<Document\Page> |
Returns | |
---|---|
Type | Description |
$this |
getEntities
A list of entities detected on Document.text. For document shards, entities in this list may cross shard boundaries.
Returns | |
---|---|
Type | Description |
Google\Protobuf\Internal\RepeatedField |
setEntities
A list of entities detected on Document.text. For document shards, entities in this list may cross shard boundaries.
Parameter | |
---|---|
Name | Description |
var | array<Document\Entity> |
Returns | |
---|---|
Type | Description |
$this |
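The repeated-field accessors follow the same pattern for entities, pages, text styles, and the other list fields: the setter takes a PHP array and the getter returns a RepeatedField that can be counted, indexed, and iterated. The sketch below assumes the google/cloud-document-ai package is autoloaded and uses the Entity accessors setType, setMentionText, getType, and getMentionText, which belong to the Document\Entity class and are not described on this page; the entity values are made up.

```php
<?php
// Assumption: the client library was installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\DocumentAI\V1\Document;
use Google\Cloud\DocumentAI\V1\Document\Entity;

$document = new Document();

// The setter accepts a plain PHP array of Entity messages (illustrative values).
$document->setEntities([
    (new Entity())->setType('invoice_id')->setMentionText('INV-0001'),
    (new Entity())->setType('total_amount')->setMentionText('42.00'),
]);

// The getter returns a RepeatedField, which supports count(), indexing, and foreach.
$entities = $document->getEntities();
echo count($entities), PHP_EOL;

foreach ($entities as $entity) {
    printf("%s: %s\n", $entity->getType(), $entity->getMentionText());
}
```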
getEntityRelations
Placeholder. Relationship among Document.entities.
Returns | |
---|---|
Type | Description |
Google\Protobuf\Internal\RepeatedField |
setEntityRelations
Placeholder. Relationship among Document.entities.
Parameter | |
---|---|
Name | Description |
var | array<Document\EntityRelation> |
Returns | |
---|---|
Type | Description |
$this |
getTextChanges
Placeholder. A list of text corrections made to Document.text. This is usually used for annotating corrections to OCR mistakes. Text changes for a given revision may not overlap with each other.
Returns | |
---|---|
Type | Description |
Google\Protobuf\Internal\RepeatedField |
setTextChanges
Placeholder. A list of text corrections made to Document.text. This is usually used for annotating corrections to OCR mistakes. Text changes for a given revision may not overlap with each other.
Parameter | |
---|---|
Name | Description |
var | array<Document\TextChange> |
Returns | |
---|---|
Type | Description |
$this |
getShardInfo
Information about the sharding if this document is a shard of a larger document. If the document is not sharded, this message is not specified.
Returns | |
---|---|
Type | Description |
Document\ShardInfo|null |
hasShardInfo
clearShardInfo
setShardInfo
Information about the sharding if this document is a shard of a larger document. If the document is not sharded, this message is not specified.
Parameter | |
---|---|
Name | Description |
var | Document\ShardInfo |
Returns | |
---|---|
Type | Description |
$this |
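Because getShardInfo can return null, callers usually check hasShardInfo first. The following sketch shows the has/get/set/clear cycle under the assumption that the google/cloud-document-ai package is autoloaded; setShardIndex, setShardCount, and their getters belong to the Document\ShardInfo class rather than to this page, and the shard numbers are illustrative.

```php
<?php
// Assumption: the client library was installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\DocumentAI\V1\Document;
use Google\Cloud\DocumentAI\V1\Document\ShardInfo;

$document = new Document();

// A fresh document has no shard info, so the getter yields null.
$initiallySet = $document->hasShardInfo();
$initialValue = $document->getShardInfo();

// Attach shard info describing the first of three shards (illustrative numbers).
$document->setShardInfo(
    (new ShardInfo())->setShardIndex(0)->setShardCount(3)
);

$label = 'not sharded';
if ($document->hasShardInfo()) {
    $info = $document->getShardInfo();
    // shard_index is zero-based, so add one for display.
    $label = sprintf('shard %d of %d', $info->getShardIndex() + 1, $info->getShardCount());
}
echo $label, PHP_EOL;

// clearShardInfo returns the field to its unset state.
$document->clearShardInfo();
```

The same has/clear pairing applies to the other message-typed fields on this class, such as error, document_layout, and chunked_document.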
getError
Any error that occurred while processing this document.
Returns | |
---|---|
Type | Description |
Google\Rpc\Status|null |
hasError
clearError
setError
Any error that occurred while processing this document.
Parameter | |
---|---|
Name | Description |
var | Google\Rpc\Status |
Returns | |
---|---|
Type | Description |
$this |
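One way to surface a processing error is to guard on hasError before reading the status. In the sketch below, describeError is a hypothetical helper written for this example, not part of the library; it assumes the google/cloud-document-ai package (which brings in Google\Rpc\Status and Google\Rpc\Code) is autoloaded, and the error message is invented.

```php
<?php
// Assumption: the client library was installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\DocumentAI\V1\Document;
use Google\Rpc\Code;
use Google\Rpc\Status;

/**
 * Hypothetical helper: returns a short description of the document's
 * processing error, or null when no error is attached.
 */
function describeError(Document $document): ?string
{
    if (!$document->hasError()) {
        return null;
    }
    $status = $document->getError();
    return sprintf('code %d: %s', $status->getCode(), $status->getMessage());
}

// Simulate a failed document with an invented status for illustration.
$failed = (new Document())->setError(
    (new Status())
        ->setCode(Code::INVALID_ARGUMENT)
        ->setMessage('unsupported MIME type')
);

var_dump(describeError(new Document()));
var_dump(describeError($failed));
```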
getRevisions
Placeholder. Revision history of this document.
Returns | |
---|---|
Type | Description |
Google\Protobuf\Internal\RepeatedField |
setRevisions
Placeholder. Revision history of this document.
Parameter | |
---|---|
Name | Description |
var | array<Document\Revision> |
Returns | |
---|---|
Type | Description |
$this |
getDocumentLayout
Parsed layout of the document.
Returns | |
---|---|
Type | Description |
Document\DocumentLayout|null |
hasDocumentLayout
clearDocumentLayout
setDocumentLayout
Parsed layout of the document.
Parameter | |
---|---|
Name | Description |
var | Document\DocumentLayout |
Returns | |
---|---|
Type | Description |
$this |
getChunkedDocument
Document chunked based on chunking config.
Returns | |
---|---|
Type | Description |
Document\ChunkedDocument|null |
hasChunkedDocument
clearChunkedDocument
setChunkedDocument
Document chunked based on chunking config.
Parameter | |
---|---|
Name | Description |
var | Document\ChunkedDocument |
Returns | |
---|---|
Type | Description |
$this |
getEntityValidationOutput
The entity validation output for the document. This is the validation output for the document.entities field.
Returns | |
---|---|
Type | Description |
Document\EntityValidationOutput|null |
hasEntityValidationOutput
clearEntityValidationOutput
setEntityValidationOutput
The entity validation output for the document. This is the validation output for the document.entities field.
Parameter | |
---|---|
Name | Description |
var | Document\EntityValidationOutput |
Returns | |
---|---|
Type | Description |
$this |
getEntitiesRevisions
A list of entity revisions. The entity revisions are appended to the document in the processing order. This field can be used for comparing the entity extraction results at different stages of the processing.
Returns | |
---|---|
Type | Description |
Google\Protobuf\Internal\RepeatedField |
setEntitiesRevisions
A list of entity revisions. The entity revisions are appended to the document in the processing order. This field can be used for comparing the entity extraction results at different stages of the processing.
Parameter | |
---|---|
Name | Description |
var | array<Document\EntitiesRevision> |
Returns | |
---|---|
Type | Description |
$this |
getEntitiesRevisionId
The entity revision ID that the document.entities field is based on.
If this field is set and entities_revisions is not empty, the entities in the document.entities field are the entities in the entity revision with this ID, and the document.entity_validation_output field is the entity_validation_output field in this entity revision.
Returns | |
---|---|
Type | Description |
string |
setEntitiesRevisionId
The entity revision ID that the document.entities field is based on.
If this field is set and entities_revisions is not empty, the entities in the document.entities field are the entities in the entity revision with this ID, and the document.entity_validation_output field is the entity_validation_output field in this entity revision.
Parameter | |
---|---|
Name | Description |
var | string |
Returns | |
---|---|
Type | Description |
$this |
getSource
Returns | |
---|---|
Type | Description |
string |
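This page gives no description for getSource. In protobuf-generated PHP classes, a method of this shape usually reports which member of a oneof group is currently set. The sketch below works on the assumption that uri and content form such a group named source, which would also explain why only those two fields have has-methods; it assumes the google/cloud-document-ai package is autoloaded and uses a hypothetical bucket path.

```php
<?php
// Assumption: the client library was installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\DocumentAI\V1\Document;

$document = new Document();

// Nothing set yet: the reported source is an empty string.
$before = $document->getSource();

// Point the document at a Cloud Storage object (hypothetical path).
$document->setUri('gs://example-bucket/invoice.pdf');
$afterUri = $document->getSource();

// Assumption: setting the other member of the group replaces the first.
$document->setContent('raw bytes');
$afterContent = $document->getSource();
$uriStillSet = $document->hasUri();

var_dump($before, $afterUri, $afterContent, $uriStillSet);
```

If that assumption holds, a caller can branch on the returned field name instead of testing hasUri and hasContent separately.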