Manage document schemas

This document describes how to manage document schemas in Document AI Warehouse, including create, get, list, update, and delete operations.

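What is a document schema

Every document belongs to a document type, which is specified by a schema.

A document schema defines the structure of a document type (for example, an invoice or a pay slip) in Document AI Warehouse, where admins can specify properties of different data types (Text | Numeric | Date | Enumeration).

Properties represent extracted data, classification tags, or other business tags attached to documents by AI or human users, for example Invoice_Amount (Numeric), Due_Date (Date), or Supplier_Name (Text).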

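  1. Property attributes: Each property can be declared as

    1. Filterable - can be used to filter search results

    2. Searchable - indexed so that it can be found in search queries

    3. Required - required is used to ensure that the property exists in the document (we recommend keeping most properties as required = false unless the property is mandatory).

  2. Extensible schema: In some cases, end users with Edit access need to add new schema properties to a document or delete them from it. This is supported by a "MAP property", which is a list of key-value pairs.

    1. Each key-value pair in a MAP property can be one of the following data types: (Text | Numeric | Date | Enumeration).

    2. For example, an invoice might contain a map property "Invoice_Entities" with the following key-value pairs:

      • Invoice_Amount (Numeric) 1000

      • Due_Date (Date) 12/24/2021

      • Supplier_Name (Text) ABC Corp

  3. Schema immutability: Note that schemas and schema properties can be added, but currently cannot be modified or deleted, so define the schema carefully.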
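
As a rough illustration of the concepts above, the following sketch builds a request body for the invoice example as a plain Python dictionary. The `text_type_options`, `is_searchable`, and `display_name` fields appear in the REST sample on this page; `is_filterable`, `is_required`, `float_type_options`, `date_time_type_options`, and `map_type_options` are assumed from the API's `PropertyDefinition` message, and the function name is hypothetical, so check the fields against the API reference before relying on them.

```python
import json


def build_invoice_schema() -> dict:
    """Return a request body for a hypothetical "Invoice" document schema."""
    return {
        "display_name": "Invoice",
        "property_definitions": [
            {
                # Text property, indexed for search and usable as a filter.
                "name": "supplier_name",
                "display_name": "Supplier_Name",
                "is_searchable": True,
                "is_filterable": True,
                "is_required": False,
                "text_type_options": {},
            },
            {
                # Numeric property (field name assumed: float_type_options).
                "name": "invoice_amount",
                "display_name": "Invoice_Amount",
                "is_filterable": True,
                "is_required": False,
                "float_type_options": {},
            },
            {
                # Date property (field name assumed: date_time_type_options).
                "name": "due_date",
                "display_name": "Due_Date",
                "is_required": False,
                "date_time_type_options": {},
            },
            {
                # MAP property holding user-managed key-value pairs
                # (field name assumed: map_type_options).
                "name": "invoice_entities",
                "display_name": "Invoice_Entities",
                "is_required": False,
                "map_type_options": {},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_invoice_schema(), indent=2))
```

Every property here keeps required set to false, in line with the recommendation above, and each property declares exactly one value type.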

Before you begin

Before you begin, make sure you have completed the steps on the Quickstart page.

Create a schema

Create a document schema.

REST

  curl --location --request POST --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data '{
    "display_name": "Test Doc Schema",
    "property_definitions": [
      {
        "name": "plaintiff",
        "display_name": "Plaintiff",
        "is_searchable": true,
        "is_repeatable": true,
        "text_type_options": {}
      }
    ]
  }'

Python

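For more information, see the Document AI Warehouse Python API reference documentation.

To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.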


from google.cloud import contentwarehouse

# TODO(developer): Uncomment these variables before running the sample.
# project_number = 'YOUR_PROJECT_NUMBER'
# location = 'YOUR_PROJECT_LOCATION' # Format is 'us' or 'eu'


def sample_create_document_schema(
    project_number: str, location: str
) -> contentwarehouse.DocumentSchema:
    """Creates document schema.

    Args:
        project_number: Google Cloud project number.
        location: Google Cloud project location.
    Returns:
        Response object.
    """
    # Create a Schema Service client.
    document_schema_client = contentwarehouse.DocumentSchemaServiceClient()

    property_definition = contentwarehouse.PropertyDefinition(
        name="stock_symbol",  # Must be unique within a document schema (case insensitive)
        display_name="Searchable text",
        is_searchable=True,
        text_type_options=contentwarehouse.TextTypeOptions(),
    )
    # Initialize request argument(s)
    document_schema = contentwarehouse.DocumentSchema(
        display_name="My Test Schema",
        property_definitions=[property_definition],
    )

    request = contentwarehouse.CreateDocumentSchemaRequest(
        # The full resource name of the location, e.g.:
        # projects/{project_number}/locations/{location}
        parent=document_schema_client.common_location_path(project_number, location),
        document_schema=document_schema,
    )

    # Make the request
    response = document_schema_client.create_document_schema(request=request)

    # Print response
    print("Document Schema Created:", response)

    return response

Java

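For more information, see the Document AI Warehouse Java API reference documentation.

To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.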


import com.google.cloud.contentwarehouse.v1.CreateDocumentSchemaRequest;
import com.google.cloud.contentwarehouse.v1.DocumentSchema;
import com.google.cloud.contentwarehouse.v1.DocumentSchemaServiceClient;
import com.google.cloud.contentwarehouse.v1.DocumentSchemaServiceSettings;
import com.google.cloud.contentwarehouse.v1.LocationName;
import com.google.cloud.contentwarehouse.v1.PropertyDefinition;
import com.google.cloud.contentwarehouse.v1.TextTypeOptions;
import com.google.cloud.resourcemanager.v3.Project;
import com.google.cloud.resourcemanager.v3.ProjectName;
import com.google.cloud.resourcemanager.v3.ProjectsClient;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class CreateDocumentSchema {

  public static void createDocumentSchema() throws IOException, 
        InterruptedException, ExecutionException, TimeoutException {
    String projectId = "your-project-id";
    String location = "your-region"; // Format is "us" or "eu".
    createDocumentSchema(projectId, location);
  }

  // Creates a new Document Schema
  public static void createDocumentSchema(String projectId, String location) throws IOException, 
        InterruptedException, ExecutionException, TimeoutException {
    String projectNumber = getProjectNumber(projectId);

    String endpoint = "contentwarehouse.googleapis.com:443";
    if (!"us".equals(location)) {
      endpoint = String.format("%s-%s", location, endpoint);
    }
    DocumentSchemaServiceSettings documentSchemaServiceSettings = 
         DocumentSchemaServiceSettings.newBuilder().setEndpoint(endpoint).build(); 

    // Create a Schema Service client
    try (DocumentSchemaServiceClient documentSchemaServiceClient =
        DocumentSchemaServiceClient.create(documentSchemaServiceSettings)) {
      /*  The full resource name of the location, e.g.:
      projects/{project_number}/locations/{location} */
      String parent = LocationName.format(projectNumber, location);

      /* Create Document Schema with Text Type Property Definition
       * More detail on managing Document Schemas: 
       * https://cloud.google.com/document-warehouse/docs/manage-document-schemas */
      DocumentSchema documentSchema = DocumentSchema.newBuilder()
          .setDisplayName("Test Doc Schema")
          .setDescription("Test Doc Schema's Description")
          .addPropertyDefinitions(
            PropertyDefinition.newBuilder()
              .setName("plaintiff")
              .setDisplayName("Plaintiff")
              .setIsSearchable(true)
              .setIsRepeatable(true)
              .setTextTypeOptions(TextTypeOptions.newBuilder().build())
              .build()).build();

      // Define Document Schema request
      CreateDocumentSchemaRequest createDocumentSchemaRequest =
          CreateDocumentSchemaRequest.newBuilder()
            .setParent(parent)
            .setDocumentSchema(documentSchema).build();

      // Create Document Schema
      DocumentSchema documentSchemaResponse =
          documentSchemaServiceClient.createDocumentSchema(createDocumentSchemaRequest); 

      System.out.println(documentSchemaResponse.getName());
    }
  }

  private static String getProjectNumber(String projectId) throws IOException { 
    /* Initialize client that will be used to send requests. 
    * This client only needs to be created once, and can be reused for multiple requests. */
    try (ProjectsClient projectsClient = ProjectsClient.create()) { 
      ProjectName projectName = ProjectName.of(projectId); 
      Project project = projectsClient.getProject(projectName);
      String projectNumber = project.getName(); // Format returned is projects/xxxxxx
      return projectNumber.substring(projectNumber.lastIndexOf("/") + 1);
    } 
  }
}

Node.js

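For more information, see the Document AI Warehouse Node.js API reference documentation.

To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.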


/**
 * TODO(developer): Uncomment these variables before running the sample.
 * const projectNumber = 'YOUR_PROJECT_NUMBER';
 * const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
 */

// Import from google cloud
const {DocumentSchemaServiceClient} =
  require('@google-cloud/contentwarehouse').v1;

const apiEndpoint =
  location === 'us'
    ? 'contentwarehouse.googleapis.com'
    : `${location}-contentwarehouse.googleapis.com`;

// Create service client
const serviceClient = new DocumentSchemaServiceClient({
  apiEndpoint: apiEndpoint,
});

// Create Document Schema
async function createDocumentSchema() {
  // The full resource name of the location, e.g.:
  // projects/{project_number}/locations/{location}
  const parent = `projects/${projectNumber}/locations/${location}`;
  // Initialize request argument(s)
  const request = {
    parent: parent,
    // Document Schema
    documentSchema: {
      displayName: 'My Test Schema',
      // Property Definition
      propertyDefinitions: [
        {
          name: 'testPropertyDefinitionName', // Must be unique within a document schema (case insensitive)
          displayName: 'searchable text',
          isSearchable: true,
          textTypeOptions: {},
        },
      ],
    },
  };

  // Make Request
  const response = serviceClient.createDocumentSchema(request);

  // Print out response
  response.then(
    result =>
      console.log(`Document Schema Created: ${JSON.stringify(result)}`),
    error => console.log(`${error}`)
  );
}
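
The Java and Node.js samples above select a location-prefixed endpoint for any location other than `us`. A minimal Python sketch of the same rule follows; the function names are hypothetical, and applying the prefixed host to the REST URL is an assumption carried over from those client samples rather than something this page states.

```python
def api_endpoint(location: str) -> str:
    """Return the API host for a location, following the Java and Node.js samples."""
    base = "contentwarehouse.googleapis.com"
    # "us" uses the default host; other locations get a "<location>-" prefix.
    return base if location == "us" else f"{location}-{base}"


def document_schemas_url(project_number: str, location: str) -> str:
    """Build the REST collection URL used to create or list document schemas."""
    return (
        f"https://{api_endpoint(location)}/v1/"
        f"projects/{project_number}/locations/{location}/documentSchemas"
    )


if __name__ == "__main__":
    print(document_schemas_url("123456789", "eu"))
```

With the Python client library, the host returned by `api_endpoint` could likely be supplied through the client's `client_options` argument, for example `client_options={"api_endpoint": api_endpoint(location)}`.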

Get a schema

Get the details of a document schema.

REST

  curl --request GET --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/{document_schema_id} \
  --header "Authorization: Bearer $(gcloud auth print-access-token)" \
  --header "Content-Type: application/json; charset=UTF-8"

Python

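For more information, see the Document AI Warehouse Python API reference documentation.

To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.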


from google.cloud import contentwarehouse

# TODO(developer): Uncomment these variables before running the sample.
# project_number = 'YOUR_PROJECT_NUMBER'
# location = 'YOUR_PROJECT_LOCATION' # Format is 'us' or 'eu'
# document_schema_id = "YOUR_DOCUMENT_SCHEMA_ID"


def sample_get_document_schema(
    project_number: str, location: str, document_schema_id: str
) -> contentwarehouse.DocumentSchema:
    """Gets document schema details.

    Args:
        project_number: Google Cloud project number.
        location: Google Cloud project location.
        document_schema_id: Unique identifier for document schema
    Returns:
        Response object.
    """
    # Create a Schema Service client.
    document_schema_client = contentwarehouse.DocumentSchemaServiceClient()

    # The full resource name of the location, e.g.:
    # projects/{project_number}/locations/{location}/documentSchemas/{document_schema_id}
    document_schema_path = document_schema_client.document_schema_path(
        project=project_number,
        location=location,
        document_schema=document_schema_id,
    )

    # Initialize request argument(s)
    request = contentwarehouse.GetDocumentSchemaRequest(
        name=document_schema_path,
    )

    # Make the request
    response = document_schema_client.get_document_schema(request=request)

    # Handle the response
    print("Document Schema:", response)

    return response

Java

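For more information, see the Document AI Warehouse Java API reference documentation.

To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.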


import com.google.cloud.contentwarehouse.v1.DocumentSchema;
import com.google.cloud.contentwarehouse.v1.DocumentSchemaName;
import com.google.cloud.contentwarehouse.v1.DocumentSchemaServiceClient;
import com.google.cloud.contentwarehouse.v1.DocumentSchemaServiceSettings;
import com.google.cloud.contentwarehouse.v1.GetDocumentSchemaRequest;
import com.google.cloud.resourcemanager.v3.Project;
import com.google.cloud.resourcemanager.v3.ProjectName;
import com.google.cloud.resourcemanager.v3.ProjectsClient;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class GetDocumentSchema {

  public static void getDocumentSchema() throws IOException, 
        InterruptedException, ExecutionException, TimeoutException {
    String projectId = "your-project-id";
    String location = "your-region"; // Format is "us" or "eu".
    String documentSchemaId = "your-document-schema-id";
    getDocumentSchema(projectId, location, documentSchemaId);
  }

  // Retrieves details about existing Document Schema
  public static void getDocumentSchema(String projectId, String location, 
        String documentSchemaId) throws IOException, 
            InterruptedException, ExecutionException, TimeoutException {
    String projectNumber = getProjectNumber(projectId);

    String endpoint = "contentwarehouse.googleapis.com:443";
    if (!"us".equals(location)) {
      endpoint = String.format("%s-%s", location, endpoint);
    }
    DocumentSchemaServiceSettings documentSchemaServiceSettings = 
         DocumentSchemaServiceSettings.newBuilder().setEndpoint(endpoint).build(); 

    // Create a Schema Service client
    try (DocumentSchemaServiceClient documentSchemaServiceClient =
        DocumentSchemaServiceClient.create(documentSchemaServiceSettings)) {
      /* The full resource name of the document schema, e.g.:
       projects/{project_number}/locations/{location}/documentSchemas/{document_schema_id} */
      DocumentSchemaName documentSchemaName = 
          DocumentSchemaName.of(projectNumber, location, documentSchemaId);

      // Define request to get details of a specific Document Schema
      GetDocumentSchemaRequest getDocumentSchemaRequest = 
          GetDocumentSchemaRequest.newBuilder().setName(documentSchemaName.toString()).build();

      // Get details of Document Schema
      DocumentSchema documentSchema = 
          documentSchemaServiceClient.getDocumentSchema(getDocumentSchemaRequest);

      System.out.println(documentSchema.getName());
    }
  }

  private static String getProjectNumber(String projectId) throws IOException { 
    /* Initialize client that will be used to send requests. 
    * This client only needs to be created once, and can be reused for multiple requests. */
    try (ProjectsClient projectsClient = ProjectsClient.create()) { 
      ProjectName projectName = ProjectName.of(projectId); 
      Project project = projectsClient.getProject(projectName);
      String projectNumber = project.getName(); // Format returned is projects/xxxxxx
      return projectNumber.substring(projectNumber.lastIndexOf("/") + 1);
    } 
  }
}

Node.js

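For more information, see the Document AI Warehouse Node.js API reference documentation.

To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.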


/**
 * TODO(developer): Uncomment these variables before running the sample.
 * const projectNumber = 'YOUR_PROJECT_NUMBER';
 * const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
 * const documentSchemaId = 'YOUR_DOCUMENT_SCHEMA_ID';
 */

// Import from google cloud
const {DocumentSchemaServiceClient} =
  require('@google-cloud/contentwarehouse').v1;

const apiEndpoint =
  location === 'us'
    ? 'contentwarehouse.googleapis.com'
    : `${location}-contentwarehouse.googleapis.com`;

// Create service client
const serviceClient = new DocumentSchemaServiceClient({
  apiEndpoint: apiEndpoint,
});

// Get Document Schema
async function getDocumentSchema() {
  // Initialize request argument(s)
  const request = {};

  // The full resource name of the location, e.g.:
  // projects/{project_number}/locations/{location}/documentSchemas/{document_schema_id}
  const name = serviceClient.documentSchemaPath(
    projectNumber,
    location,
    documentSchemaId
  );
  request.name = name;

  // Make Request
  const response = serviceClient.getDocumentSchema(request);

  // Print out response
  response.then(
    result => console.log(`Schema Found: ${JSON.stringify(result)}`),
    error => console.log(`${error}`)
  );
}
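
The samples print the schema's full resource name, in the format shown in the code comments, while the get, update, and delete calls take the trailing schema ID. A small sketch for splitting a resource name into its parts follows; the helper name and the regular expression are illustrative and are not part of the client library.

```python
import re

# Matches projects/{project_number}/locations/{location}/documentSchemas/{document_schema_id}
_SCHEMA_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/documentSchemas/(?P<document_schema>[^/]+)$"
)


def parse_document_schema_name(name: str) -> dict:
    """Split a document schema resource name into project, location, and schema ID."""
    match = _SCHEMA_NAME.match(name)
    if match is None:
        raise ValueError(f"Not a document schema resource name: {name!r}")
    return match.groupdict()


if __name__ == "__main__":
    parts = parse_document_schema_name(
        "projects/123456789/locations/us/documentSchemas/example-schema-id"
    )
    print(parts["document_schema"])
```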

List schemas

List document schemas.

REST

  curl --request GET --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas \
  --header "Authorization: Bearer $(gcloud auth print-access-token)" \
  --header "Content-Type: application/json; charset=UTF-8"

Python

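For more information, see the Document AI Warehouse Python API reference documentation.

To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.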


from google.cloud import contentwarehouse

# TODO(developer): Uncomment these variables before running the sample.
# project_number = 'YOUR_PROJECT_NUMBER'
# location = 'YOUR_PROJECT_LOCATION' # Format is 'us' or 'eu'


def sample_list_document_schemas(
    project_number: str, location: str
) -> list[contentwarehouse.DocumentSchema]:
    """Lists document schemas.

    Args:
        project_number: Google Cloud project number.
        location: Google Cloud project location.
    """
    # Create a client
    document_schema_client = contentwarehouse.DocumentSchemaServiceClient()

    # The full resource name of the location, e.g.:
    # projects/{project_number}/locations/{location}
    parent = document_schema_client.common_location_path(
        project=project_number, location=location
    )

    # Initialize request argument(s)
    request = contentwarehouse.ListDocumentSchemasRequest(
        parent=parent,
    )

    # Make the request
    page_result = document_schema_client.list_document_schemas(request=request)

    # Print response
    responses = []
    print("Document Schemas:")
    for response in page_result:
        print(response)
        responses.append(response)

    return responses

Java

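For more information, see the Document AI Warehouse Java API reference documentation.

To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.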


import com.google.cloud.contentwarehouse.v1.DocumentSchema;
import com.google.cloud.contentwarehouse.v1.DocumentSchemaServiceClient;
import com.google.cloud.contentwarehouse.v1.DocumentSchemaServiceSettings;
import com.google.cloud.contentwarehouse.v1.ListDocumentSchemasRequest;
import com.google.cloud.contentwarehouse.v1.LocationName;
import com.google.cloud.resourcemanager.v3.Project;
import com.google.cloud.resourcemanager.v3.ProjectName;
import com.google.cloud.resourcemanager.v3.ProjectsClient;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class ListDocumentSchema {
  public static void listDocumentSchemas() throws IOException, 
        InterruptedException, ExecutionException, TimeoutException {
    String projectId = "your-project-id";
    String location = "your-region"; // Format is "us" or "eu".
    listDocumentSchemas(projectId, location);
  }

  // Retrieves all Document Schemas associated with a specified project
  public static void listDocumentSchemas(String projectId, String location) throws IOException, 
        InterruptedException, ExecutionException, TimeoutException {
    String projectNumber = getProjectNumber(projectId);

    String endpoint = "contentwarehouse.googleapis.com:443";
    if (!"us".equals(location)) {
      endpoint = String.format("%s-%s", location, endpoint);
    }
    DocumentSchemaServiceSettings documentSchemaServiceSettings = 
         DocumentSchemaServiceSettings.newBuilder().setEndpoint(endpoint).build(); 

    // Create a Schema Service client
    try (DocumentSchemaServiceClient documentSchemaServiceClient =
        DocumentSchemaServiceClient.create(documentSchemaServiceSettings)) {
      /*  The full resource name of the location, e.g.:
      projects/{project_number}/locations/{location} */
      String parent = LocationName.format(projectNumber, location);

      // Define request to list all Document Schemas
      ListDocumentSchemasRequest listDocumentSchemasRequest = 
          ListDocumentSchemasRequest.newBuilder().setParent(parent).build();

      // Print each schema ID  
      for (DocumentSchema schema :
          documentSchemaServiceClient.listDocumentSchemas(listDocumentSchemasRequest)
            .iterateAll()) {
        System.out.println(schema.getName());
      }
    }
  }

  private static String getProjectNumber(String projectId) throws IOException { 
    /* Initialize client that will be used to send requests. 
    * This client only needs to be created once, and can be reused for multiple requests. */
    try (ProjectsClient projectsClient = ProjectsClient.create()) { 
      ProjectName projectName = ProjectName.of(projectId); 
      Project project = projectsClient.getProject(projectName);
      String projectNumber = project.getName(); // Format returned is projects/xxxxxx
      return projectNumber.substring(projectNumber.lastIndexOf("/") + 1);
    } 
  }
}

Delete a schema

Delete a document schema.

REST

  curl --request DELETE --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/{document_schema_id} \
  --header "Authorization: Bearer $(gcloud auth print-access-token)" \
  --header "Content-Type: application/json; charset=UTF-8"

Python

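For more information, see the Document AI Warehouse Python API reference documentation.

To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.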


from google.cloud import contentwarehouse

# TODO(developer): Uncomment these variables before running the sample.
# project_number = 'YOUR_PROJECT_NUMBER'
# location = 'YOUR_PROJECT_LOCATION' # Format is 'us' or 'eu'
# document_schema_id = "YOUR_DOCUMENT_SCHEMA_ID"


def sample_delete_document_schema(
    project_number: str, location: str, document_schema_id: str
) -> None:
    """Deletes document schema.

    Args:
        project_number: Google Cloud project number.
        location: Google Cloud project location.
        document_schema_id: Unique identifier for document schema
    Returns:
        None, if operation is successful
    """
    # Create a client
    document_schema_client = contentwarehouse.DocumentSchemaServiceClient()

    # The full resource name of the location, e.g.:
    # projects/{project_number}/locations/{location}/documentSchemas/{document_schema_id}
    document_schema_path = document_schema_client.document_schema_path(
        project=project_number,
        location=location,
        document_schema=document_schema_id,
    )

    # Initialize request argument(s)
    request = contentwarehouse.DeleteDocumentSchemaRequest(
        name=document_schema_path,
    )

    # Make the request
    response = document_schema_client.delete_document_schema(request=request)

    return response

Java

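For more information, see the Document AI Warehouse Java API reference documentation.

To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.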


import com.google.cloud.contentwarehouse.v1.DeleteDocumentSchemaRequest;
import com.google.cloud.contentwarehouse.v1.DocumentSchemaName;
import com.google.cloud.contentwarehouse.v1.DocumentSchemaServiceClient;
import com.google.cloud.contentwarehouse.v1.DocumentSchemaServiceSettings;
import com.google.cloud.resourcemanager.v3.Project;
import com.google.cloud.resourcemanager.v3.ProjectName;
import com.google.cloud.resourcemanager.v3.ProjectsClient;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class DeleteDocumentSchema {

  public static void deleteDocumentSchema() throws IOException,
        InterruptedException, ExecutionException, TimeoutException {
    String projectId = "your-project-id";
    String location = "your-region"; // Format is "us" or "eu".
    String documentSchemaId = "your-schema-id";
    deleteDocumentSchema(projectId, location, documentSchemaId);
  }

  // Deletes an existing Document Schema
  public static void deleteDocumentSchema(String projectId, String location,
      String documentSchemaId) throws IOException,
          InterruptedException, ExecutionException, TimeoutException {
    String projectNumber = getProjectNumber(projectId);

    String endpoint = "contentwarehouse.googleapis.com:443";
    if (!"us".equals(location)) {
      endpoint = String.format("%s-%s", location, endpoint);
    }
    DocumentSchemaServiceSettings documentSchemaServiceSettings = 
         DocumentSchemaServiceSettings.newBuilder().setEndpoint(endpoint).build(); 

    // Create a Schema Service client
    try (DocumentSchemaServiceClient documentSchemaServiceClient =
        DocumentSchemaServiceClient.create(documentSchemaServiceSettings)) {

      /* The full resource name of the document schema, e.g.:
       projects/{project_number}/locations/{location}/documentSchemas/{document_schema_id} */
      DocumentSchemaName documentSchemaName = 
          DocumentSchemaName.of(projectNumber, location, documentSchemaId);

      /* Create request to delete Document Schema from provided schema ID.
       * More detail on managing Document Schemas: 
       * https://cloud.google.com/document-warehouse/docs/manage-document-schemas */
      DeleteDocumentSchemaRequest deleteDocumentSchemaRequest = 
          DeleteDocumentSchemaRequest.newBuilder()
            .setName(documentSchemaName.toString()).build();

      // Delete Document Schema
      documentSchemaServiceClient.deleteDocumentSchema(deleteDocumentSchemaRequest);

      System.out.println("Document Schema ID " + documentSchemaId + " has been deleted.");

    }
  }

  private static String getProjectNumber(String projectId) throws IOException { 
    /* Initialize client that will be used to send requests. 
    * This client only needs to be created once, and can be reused for multiple requests. */
    try (ProjectsClient projectsClient = ProjectsClient.create()) { 
      ProjectName projectName = ProjectName.of(projectId); 
      Project project = projectsClient.getProject(projectName);
      String projectNumber = project.getName(); // Format returned is projects/xxxxxx
      return projectNumber.substring(projectNumber.lastIndexOf("/") + 1);
    } 
  }
}

Node.js

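For more information, see the Document AI Warehouse Node.js API reference documentation.

To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.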


/**
 * TODO(developer): Uncomment these variables before running the sample.
 * const projectId = 'YOUR_PROJECT_ID';
 * const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
 * const documentSchemaId = 'YOUR_DOCUMENT_SCHEMA_ID';
 */

// Import from google cloud

const {DocumentSchemaServiceClient} =
  require('@google-cloud/contentwarehouse').v1;

const apiEndpoint =
  location === 'us'
    ? 'contentwarehouse.googleapis.com'
    : `${location}-contentwarehouse.googleapis.com`;

// Create service client
const serviceClient = new DocumentSchemaServiceClient({
  apiEndpoint: apiEndpoint,
});

// Delete Document Schema
async function deleteDocumentSchema() {
  // Initialize request argument(s)
  const request = {
    // The full resource name of the location, e.g.:
    // projects/{project_number}/locations/{location}/documentSchemas/{document_schema_id}
    name: `projects/${projectId}/locations/${location}/documentSchemas/${documentSchemaId}`,
  };

  // Make Request
  const response = await serviceClient.deleteDocumentSchema(request);

  // Print out response
  console.log(`Document Schema Deleted: ${JSON.stringify(response)}`);
}

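Update a schema

Update a document schema. Currently, the update logic only supports adding new property definitions. The new document schema should include all property definitions present in the existing schema.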

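  • Supported changes:

    • For existing properties, users can change the following metadata settings: is_repeatable, is_metadata, is_required.
    • For existing ENUM properties, users can add new ENUM possible values or delete existing ENUM possible values. They can update the EnumTypeOptions.validation_check_disabled flag to disable the validation check. The validation check ensures that the ENUM values specified in a document are within the range of possible ENUM values defined in the property definition when the CreateDocument API is called.
    • Adding new property definitions is supported.
  • Unsupported changes:

    • For existing schemas, updates to display_name and document_is_folder are not allowed.
    • For existing properties, updates to name, display_name, and value_type_options are not allowed.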
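
The rules above can be approximated with a local pre-check before sending an update. The sketch below compares two schemas expressed as plain dictionaries in the shape of the REST request body; the function names are hypothetical, and it simplifies "value_type_options" to a check of which `*_type_options` key a property carries. The service remains the authority on what it accepts.

```python
from typing import List


def value_type_keys(prop: dict) -> List[str]:
    """Return the *_type_options keys that identify a property's value type."""
    return sorted(key for key in prop if key.endswith("_type_options"))


def find_update_violations(existing: dict, proposed: dict) -> List[str]:
    """List the ways a proposed schema breaks the update rules described above."""
    violations = []

    # Schema-level fields that must stay the same.
    if existing.get("display_name") != proposed.get("display_name"):
        violations.append("schema display_name cannot be changed")
    if bool(existing.get("document_is_folder")) != bool(
        proposed.get("document_is_folder")
    ):
        violations.append("schema document_is_folder cannot be changed")

    proposed_by_name = {
        p["name"]: p for p in proposed.get("property_definitions", [])
    }
    for old in existing.get("property_definitions", []):
        new = proposed_by_name.get(old["name"])
        if new is None:
            # A renamed property also lands here: it no longer matches by name.
            violations.append(f"property '{old['name']}' is missing or was renamed")
            continue
        if old.get("display_name") != new.get("display_name"):
            violations.append(
                f"property '{old['name']}': display_name cannot be changed"
            )
        if value_type_keys(old) != value_type_keys(new):
            violations.append(
                f"property '{old['name']}': value type cannot be changed"
            )
    return violations
```

An empty list means the proposed schema keeps every existing property definition and changes only settings that the rules permit, such as is_repeatable or is_required, or adds new properties.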

REST

curl --request PATCH --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/{document_schema_id} \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=UTF-8" \
--data '{
  "document_schema": {
    "display_name": "Test Doc Schema",
    "property_definitions": [
      {
        "name": "plaintiff",
        "display_name": "Plaintiff",
        "is_repeatable": true,
        "text_type_options": {}
      }
    ]
  }
}'

Python

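For more information, see the Document AI Warehouse Python API reference documentation.

To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.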


from google.cloud import contentwarehouse

# TODO(developer): Uncomment these variables before running the sample.
# project_number = "YOUR_PROJECT_NUMBER"
# location = "us" # Format is 'us' or 'eu'
# document_schema_id = "YOUR_SCHEMA_ID"


def update_document_schema(
    project_number: str, location: str, document_schema_id: str
) -> None:
    # Create a Schema Service client
    document_schema_client = contentwarehouse.DocumentSchemaServiceClient()

    # The full resource name of the location, e.g.:
    # projects/{project_number}/locations/{location}/documentSchemas/{document_schema_id}
    document_schema_path = document_schema_client.document_schema_path(
        project=project_number,
        location=location,
        document_schema=document_schema_id,
    )

    # Define Schema Property of Text Type with updated values
    updated_property_definition = contentwarehouse.PropertyDefinition(
        name="stock_symbol",  # Must be unique within a document schema (case insensitive)
        display_name="Searchable text",
        is_searchable=True,
        is_repeatable=False,
        is_required=True,
        text_type_options=contentwarehouse.TextTypeOptions(),
    )

    # Define Update Document Schema Request
    update_document_schema_request = contentwarehouse.UpdateDocumentSchemaRequest(
        name=document_schema_path,
        document_schema=contentwarehouse.DocumentSchema(
            display_name="My Test Schema",
            property_definitions=[updated_property_definition],
        ),
    )

    # Update Document schema
    updated_document_schema = document_schema_client.update_document_schema(
        request=update_document_schema_request
    )

    # Read the output
    print(f"Updated Document Schema: {updated_document_schema}")

Java

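For more information, see the Document AI Warehouse Java API reference documentation.

To authenticate to Document AI Warehouse, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.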


import com.google.cloud.contentwarehouse.v1.DocumentSchema;
import com.google.cloud.contentwarehouse.v1.DocumentSchemaName;
import com.google.cloud.contentwarehouse.v1.DocumentSchemaServiceClient;
import com.google.cloud.contentwarehouse.v1.DocumentSchemaServiceSettings;
import com.google.cloud.contentwarehouse.v1.PropertyDefinition;
import com.google.cloud.contentwarehouse.v1.TextTypeOptions;
import com.google.cloud.contentwarehouse.v1.UpdateDocumentSchemaRequest;
import com.google.cloud.resourcemanager.v3.Project;
import com.google.cloud.resourcemanager.v3.ProjectName;
import com.google.cloud.resourcemanager.v3.ProjectsClient;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class UpdateDocumentSchema {
  public static void updateDocumentSchema() throws IOException, 
        InterruptedException, ExecutionException, TimeoutException { 
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String location = "your-region"; // Format is "us" or "eu".
    String documentSchemaId = "your-document-schema-id";
    /* The below method call retrieves details about the schema you are about to update.
     * It is important to note that some properties cannot be edited or removed. 
     * For more information on managing document schemas, please see the below documentation.
     * https://cloud.google.com/document-warehouse/docs/manage-document-schemas */
    GetDocumentSchema.getDocumentSchema(projectId, location, documentSchemaId);
    updateDocumentSchema(projectId, location, documentSchemaId);
  }

  // Updates an existing Document Schema
  public static void updateDocumentSchema(String projectId, String location, 
        String documentSchemaId) throws IOException, InterruptedException,
          ExecutionException, TimeoutException { 
    String projectNumber = getProjectNumber(projectId);

    String endpoint = "contentwarehouse.googleapis.com:443";
    if (!"us".equals(location)) {
      endpoint = String.format("%s-%s", location, endpoint);
    }

    DocumentSchemaServiceSettings documentSchemaServiceSettings = 
             DocumentSchemaServiceSettings.newBuilder().setEndpoint(endpoint).build(); 

    /* Create the Schema Service Client 
     * Initialize client that will be used to send requests. 
     * This client only needs to be created once, and can be reused for multiple requests. */
    try (DocumentSchemaServiceClient documentSchemaServiceClient = 
            DocumentSchemaServiceClient.create(documentSchemaServiceSettings)) {

      /* The full resource name of the document schema, e.g.:
       projects/{project_number}/locations/{location}/documentSchemas/{document_schema_id} */
      DocumentSchemaName documentSchemaName = 
          DocumentSchemaName.of(projectNumber, location, documentSchemaId);

      // Define the new Schema Property with updated values
      PropertyDefinition propertyDefinition = PropertyDefinition.newBuilder()
          .setName("plaintiff")
          .setDisplayName("Plaintiff")
          .setIsSearchable(true)
          .setIsRepeatable(true)
          .setIsRequired(false)
          .setTextTypeOptions(TextTypeOptions.newBuilder()
          .build())
          .build();

      DocumentSchema updatedDocumentSchema = DocumentSchema.newBuilder()
                    .setDisplayName("Test Doc Schema") 
                    .addPropertyDefinitions(0, propertyDefinition).build();

      // Create the Request to Update the Document Schema
      UpdateDocumentSchemaRequest updateDocumentSchemaRequest = 
            UpdateDocumentSchemaRequest.newBuilder()
            .setName(documentSchemaName.toString())
            .setDocumentSchema(updatedDocumentSchema)
            .build();

      // Update Document Schema
      updatedDocumentSchema = 
        documentSchemaServiceClient.updateDocumentSchema(updateDocumentSchemaRequest);

      // Read the output of Updated Document Schema Name
      System.out.println(updatedDocumentSchema.getName());
    }
  }

  private static String getProjectNumber(String projectId) throws IOException { 
    /* Initialize client that will be used to send requests. 
    * This client only needs to be created once, and can be reused for multiple requests. */
    try (ProjectsClient projectsClient = ProjectsClient.create()) { 
      ProjectName projectName = ProjectName.of(projectId); 
      Project project = projectsClient.getProject(projectName);
      String projectNumber = project.getName(); // Format returned is projects/xxxxxx
      return projectNumber.substring(projectNumber.lastIndexOf("/") + 1);
    } 
  }
}

What's next