Method: dataset.importDocuments

Full name: projects.locations.processors.dataset.importDocuments

Import documents into a dataset.

HTTP request

POST https://documentai.googleapis.com/v1beta3/{dataset}:importDocuments

Path parameters

Parameters

`dataset`	`string` Required. The dataset resource name. Format: `projects/{project}/locations/{location}/processors/{processor}/dataset`.
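
As an illustration of how the `dataset` path parameter expands into the request URL, here is a minimal sketch. The helper name `import_documents_url` and the sample IDs are hypothetical; the base URL and API version are copied from the HTTP request line above.

```python
import re

# Base URL and API version as given in the HTTP request section.
BASE_URL = "https://documentai.googleapis.com/v1beta3"


def import_documents_url(project: str, location: str, processor: str) -> str:
    """Return the importDocuments URL for a processor's dataset.

    Hypothetical helper for illustration; not part of any Google library.
    """
    parts = (("project", project), ("location", location), ("processor", processor))
    for label, value in parts:
        # Reject empty values, whitespace, and slashes, which would corrupt the path.
        if not re.fullmatch(r"[^/\s]+", value):
            raise ValueError(f"invalid {label}: {value!r}")
    dataset = f"projects/{project}/locations/{location}/processors/{processor}/dataset"
    return f"{BASE_URL}/{dataset}:importDocuments"


# Sample IDs are placeholders.
print(import_documents_url("my-project", "us", "abc123"))
```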

Request body

The request body contains data with the following structure:

JSON representation
{
  "batchDocumentsImportConfigs": [
    {
      object (BatchDocumentsImportConfig)
    }
  ]
}

Fields

`batchDocumentsImportConfigs[]`	`object (BatchDocumentsImportConfig)` Required. The Cloud Storage URI containing the raw documents to import.
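
A minimal sketch of assembling the request body follows. The helper name is hypothetical, and treating an empty list as an error is an assumption based on the field being marked Required. The inner shape of `batchInputConfig` (`gcsPrefix.gcsUriPrefix`) and the bucket path are also assumptions, since `BatchDocumentsInputConfig` is not defined on this page.

```python
import json


def build_import_request(configs: list) -> str:
    """Serialize the importDocuments request body (hypothetical helper)."""
    # Assumption: a required repeated field should carry at least one entry.
    if not configs:
        raise ValueError("batchDocumentsImportConfigs must contain at least one config")
    return json.dumps({"batchDocumentsImportConfigs": configs}, indent=2)


# Assumed shape of BatchDocumentsInputConfig; the bucket is a placeholder.
example_config = {
    "batchInputConfig": {
        "gcsPrefix": {"gcsUriPrefix": "gs://example-bucket/invoices/"}
    }
}

print(build_import_request([example_config]))
```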

Response body

If successful, the response body contains an instance of Operation.
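
Assuming the returned `Operation` is the standard Google long-running operation resource (with `name`, `done`, and either `error` or `response`), a caller might interpret a decoded response along these lines. The function name and sample values are hypothetical, and this sketch does not perform any polling.

```python
def summarize_operation(op: dict) -> str:
    """Describe the state of a decoded Operation JSON object.

    Assumes the google.longrunning.Operation shape: `done` is false or
    absent while the import runs; once done, `error` is set on failure.
    """
    if not op.get("done", False):
        return f"running: {op.get('name', '<unnamed>')}"
    if "error" in op:
        err = op["error"]
        return f"failed: code={err.get('code')} message={err.get('message')}"
    return "succeeded"


# Placeholder operation payloads for illustration.
print(summarize_operation({"name": "operations/123"}))
print(summarize_operation({"name": "operations/123", "done": True, "response": {}}))
```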

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.
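
The sketch below shows one way to attach an OAuth access token to the request using only the Python standard library. It builds the request object without sending it. The function name, token value, and URL are placeholders; obtaining a token that carries the `cloud-platform` scope (for example with `gcloud auth print-access-token`) is left outside the sketch.

```python
import json
import urllib.request


def make_import_request(url: str, body: dict, access_token: str) -> urllib.request.Request:
    """Build (but do not send) an authorized POST request (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The token is assumed to carry the cloud-platform scope.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


request = make_import_request(
    "https://documentai.googleapis.com/v1beta3/projects/my-project/locations/us/processors/abc123/dataset:importDocuments",
    {"batchDocumentsImportConfigs": []},
    "PLACEHOLDER_TOKEN",
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; that step is omitted here because it needs real credentials.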

IAM Permissions

Requires the following IAM permission on the dataset resource:

documentai.datasets.createDocuments

For more information, see the IAM documentation.

BatchDocumentsImportConfig

Config for importing documents. Each batch can have its own dataset split type.

JSON representation

{
  "batchInputConfig": {
    object (BatchDocumentsInputConfig)
  },
  "documentType": string,

  // Union field split_type_config can be only one of the following:
  "datasetSplit": enum (DatasetSplitType),
  "autoSplitConfig": {
    object (AutoSplitConfig)
  }
  // End of list of possible types for union field split_type_config.
}

Fields
`batchInputConfig`	`object (BatchDocumentsInputConfig)` The common config to specify a set of documents used as input.
`documentType`	`string` Optional. If set, determines the type of the documents to be imported in this batch. It can be used to auto-label the documents with a single entity of the provided type. This field can only be used with a classifier or splitter processor. Providing this field is mutually exclusive with `entities` and `autoLabelingConfig`.
Union field `split_type_config`. `split_type_config` can be only one of the following:
`datasetSplit`	`enum (DatasetSplitType)` Target dataset split where the documents must be stored.
`autoSplitConfig`	`object (AutoSplitConfig)` If set, documents are automatically divided between the training and test splits according to the specified ratio.
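
To make the `split_type_config` union concrete, here is a hedged sketch of a builder that refuses to set both members at once. The function name is hypothetical. The enum string `DATASET_SPLIT_TRAIN`, the `gcsPrefix` shape, and the bucket path are assumptions, since `DatasetSplitType` and `BatchDocumentsInputConfig` are not defined on this page.

```python
from typing import Optional


def make_import_config(
    batch_input_config: dict,
    *,
    dataset_split: Optional[str] = None,
    auto_split_config: Optional[dict] = None,
    document_type: Optional[str] = None,
) -> dict:
    """Build a BatchDocumentsImportConfig dict (hypothetical helper)."""
    # The union field allows at most one of datasetSplit / autoSplitConfig.
    if dataset_split is not None and auto_split_config is not None:
        raise ValueError("datasetSplit and autoSplitConfig are mutually exclusive")
    config = {"batchInputConfig": batch_input_config}
    if document_type is not None:
        config["documentType"] = document_type
    if dataset_split is not None:
        config["datasetSplit"] = dataset_split
    if auto_split_config is not None:
        config["autoSplitConfig"] = auto_split_config
    return config


# Assumed input shape and enum value; both are placeholders.
source = {"gcsPrefix": {"gcsUriPrefix": "gs://example-bucket/train/"}}
print(make_import_config(source, dataset_split="DATASET_SPLIT_TRAIN"))
```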

AutoSplitConfig

The config for auto-split.

JSON representation
{ "trainingSplitRatio": number }
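
As a rough illustration of what a training split ratio implies, the sketch below builds an `AutoSplitConfig` and estimates how many documents would land in each split. Both helper names are hypothetical. The accepted range of 0 to 1 is an assumption, and the rounding used here is only an approximation, since this page does not state how the service assigns individual documents.

```python
def auto_split_config(training_split_ratio: float) -> dict:
    """Build an AutoSplitConfig dict (hypothetical helper)."""
    # Assumption: the ratio is a fraction of documents, so it lies in [0, 1].
    if not 0.0 <= training_split_ratio <= 1.0:
        raise ValueError("trainingSplitRatio must be between 0 and 1")
    return {"trainingSplitRatio": training_split_ratio}


def approximate_split(total_documents: int, training_split_ratio: float) -> tuple:
    """Estimate (training, test) counts; the service's rounding may differ."""
    training = round(total_documents * training_split_ratio)
    return training, total_documents - training


print(auto_split_config(0.8))
print(approximate_split(200, 0.8))
```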