문서 관리

이 문서에서는 생성, 가져오기, 업데이트, 삭제 작업을 비롯하여 Document AI Warehouse의 문서를 관리하는 방법을 설명합니다.

문서란 무엇인가요?

문서는 Document AI Warehouse에서 실제 문서 (예: PDF 또는 TXT)와 연결된 속성을 정리하는 데 사용되는 데이터 모델입니다. 문서에 대한 작업을 통해 Document AI Warehouse와 상호작용합니다.

지원되는 파일 형식

Document AI Warehouse는 문서에 중점을 두지만 보험, 엔지니어링, 건설, 연구와 같은 업종에서 관련 이미지를 관리하는 데도 사용됩니다.

  • Ingest API는 PDF, TIFF, JPEG, PNG 이미지와 속성 또는 사전 추출된 텍스트를 지원합니다.
  • 업로드 UI는 Document AI OCR 및 맞춤 프로세서를 사용한 PDF 추출을 지원합니다.
  • 뷰어 UI는 PDF, 텍스트, Microsoft Office 파일의 렌더링을 지원합니다.

시작하기 전에

시작하기 전에 빠른 시작 페이지를 완료했는지 확인하세요.

문서 생성의 경우 데이터가 자체 Cloud Storage 버킷에 있는 경우 Document AI Warehouse 서비스 계정에 데이터를 읽을 수 있는 스토리지 객체 뷰어 권한을 부여해야 합니다.

각 문서는 스키마로 지정되며 문서 유형에 속합니다. 문서 스키마는 Document AI Warehouse의 문서 구조를 정의합니다. 문서를 만들려면 먼저 문서 스키마를 만들어야 합니다.

문서 만들기

문서를 만들려면 Document AI Warehouse에 원시 문서 콘텐츠를 제공해야 합니다. 원시 문서 바이트 콘텐츠를 제공하는 두 가지 방법은 Document.inline_raw_document 또는 Document.raw_document_path을 설정하는 것입니다.

차이점은 다음과 같습니다.

  • Document.raw_document_path: 이 방법을 사용하는 것이 좋습니다. 수집할 파일의 Cloud Storage 경로 (gs://bucket/object)를 사용합니다. 호출자가 호출에 성공하려면 이 객체에 대한 읽기 권한이 있어야 합니다.
  • Document.inline_raw_document: 엔드포인트에 직접 제공되는 파일의 바이트/텍스트 표현입니다.

문서를 만들려면 다음 단계를 따르세요.

Cloud Storage에서 문서 업로드

필수사항 섹션에 설명된 대로 Document AI Warehouse 서비스 계정에 Cloud Storage 버킷에 대한 액세스 권한을 부여해야 합니다.

안내에 따라 파일을 Cloud Storage 버킷에 업로드해야 합니다.

REST

요청:

curl --location --request POST --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=utf-8" \
--data '{
"document": {
  "display_name": "TestDoc3",
  "document_schema_name": "projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/DOCUMENT_SCHEMA_ID",
  "raw_document_path": "gs://BUCKET_URI/FILE_URI",
  "properties": [
    {
      "name": "supplier_name",
      "text_values": {
        "values": "Stanford Plumbing & Heating"
      }
    },
    {
      "name": "total_amount",
      "float_values": {
        "values": "1091.81"
      }
    },
  ]
}, 
"requestMetadata":{
  "userInfo":{
    "id": "user:USER_EMAIL_ID"
  }
}
}'

로컬 머신에서 업로드

REST

요청:

curl --location --request POST --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents/ \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=utf-8" \
--data '{
"document": {
  "display_name": "TestDoc3",
  "document_schema_name": "projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/DOCUMENT_SCHEMA_ID",
  "inline_raw_document": "<bytes>",
  "properties": [
    {
      "name": "supplier_name",
      "text_values": {
        "values": "Stanford Plumbing & Heating"
      }
    },
    {
      "name": "total_amount",
      "float_values": {
        "values": "1091.81"
      }
    },
  ]
},
"requestMetadata": {
  "userInfo": {
    "id": "user:USER_EMAIL_ID"
  }
}
}'

문서 가져오기

document_id의 경우:

REST

curl --request POST \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=UTF-8" -d '{
  "requestMetadata":{
    "userInfo":{
      "id": "user:USER_EMAIL"
    }
  }
}' \
"https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT_ID:get"

Python

자세한 내용은 Document AI Warehouse Python API 참고 문서를 참고하세요.

Document AI Warehouse에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.


from google.cloud import contentwarehouse


def sample_get_document(document_name: str, user_id: str) -> contentwarehouse.Document:
    """Gets a document.

    Args:
        document_name: The resource name of the document.
                Format: 'projects/{project_number}/
                locations/{location}/documents/{document_id}'.
        user_id: user:YOUR_SERVICE_ACCOUNT_ID or user:USER_EMAIL.
    Returns:
        Response object
    """
    # Create a client
    client = contentwarehouse.DocumentServiceClient()

    request_metadata = contentwarehouse.RequestMetadata(
        user_info=contentwarehouse.UserInfo(id=user_id)
    )

    # Initialize request argument(s)
    request = contentwarehouse.GetDocumentRequest(
        name=document_name, request_metadata=request_metadata
    )

    # Make the request
    response = client.get_document(request=request)

    return response

Java

자세한 내용은 Document AI Warehouse Java API 참고 문서를 참고하세요.

Document AI Warehouse에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.


import com.google.cloud.contentwarehouse.v1.Document;
import com.google.cloud.contentwarehouse.v1.DocumentName;
import com.google.cloud.contentwarehouse.v1.DocumentServiceClient;
import com.google.cloud.contentwarehouse.v1.DocumentServiceSettings;
import com.google.cloud.contentwarehouse.v1.GetDocumentRequest;
import com.google.cloud.contentwarehouse.v1.RequestMetadata;
import com.google.cloud.contentwarehouse.v1.UserInfo;
import com.google.cloud.resourcemanager.v3.Project;
import com.google.cloud.resourcemanager.v3.ProjectName;
import com.google.cloud.resourcemanager.v3.ProjectsClient;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class GetDocument {

  public static void getDocument() throws IOException, 
        InterruptedException, ExecutionException, TimeoutException {
    String projectId = "your-project-id";
    String location = "your-region"; // Format is "us" or "eu".
    String documentId = "your-document-id";
    String userId = "your-user-id"; // Format is user:<user-id>
    getDocument(projectId, location, documentId, userId);
  }

  // Retrieves details about existing Document using the document Id
  public static void getDocument(String projectId, String location, 
        String documentId, String userId) throws IOException, 
            InterruptedException, ExecutionException, TimeoutException {
    String projectNumber = getProjectNumber(projectId);

    String endpoint = "contentwarehouse.googleapis.com:443";
    if (!"us".equals(location)) {
      endpoint = String.format("%s-%s", location, endpoint);
    }
    DocumentServiceSettings documentServiceSettings = 
         DocumentServiceSettings.newBuilder().setEndpoint(endpoint).build(); 

    // Create a Document Service client
    try (DocumentServiceClient documentServiceClient =
        DocumentServiceClient.create(documentServiceSettings)) {
      /* The full resource name of the location, e.g.: 
       projects/{project_number}/location/{location}/documents/{document_id} */
      DocumentName documentName = 
          DocumentName.of(projectNumber, location, documentId);

      // Define Request Metadata for enforcing access control
      RequestMetadata requestMetadata = RequestMetadata.newBuilder()
            .setUserInfo(
            UserInfo.newBuilder()
              .setId(userId).build()).build();

      // Define request to get details of a specific Document Schema
      GetDocumentRequest getDocumentRequest = 
          GetDocumentRequest.newBuilder()
          .setName(documentName.toString())
          .setRequestMetadata(requestMetadata).build();

      // Get details of the Document 
      Document document = documentServiceClient.getDocument(getDocumentRequest);

      System.out.println(document.getName());
    }
  }

  private static String getProjectNumber(String projectId) throws IOException { 
    /* Initialize client that will be used to send requests. 
    * This client only needs to be created once, and can be reused for multiple requests. */
    try (ProjectsClient projectsClient = ProjectsClient.create()) { 
      ProjectName projectName = ProjectName.of(projectId); 
      Project project = projectsClient.getProject(projectName);
      String projectNumber = project.getName(); // Format returned is projects/xxxxxx
      return projectNumber.substring(projectNumber.lastIndexOf("/") + 1);
    } 
  }
}

Node.js

자세한 내용은 Document AI Warehouse Node.js API 참고 문서를 참고하세요.

Document AI Warehouse에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.


/**
 * TODO(developer): Uncomment these variables before running the sample.
 * const projectNumber = 'YOUR_PROJECT_NUMBER';
 * const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
 * documentId = 'YOUR_DOCUMENT_ID';
 * const userId = 'user:xxx@example.com'; // Format is "user:xxx@example.com"
 */

// Import from google cloud
const {DocumentServiceClient} = require('@google-cloud/contentwarehouse').v1;

const apiEndpoint =
  location === 'us'
    ? 'contentwarehouse.googleapis.com'
    : `${location}-contentwarehouse.googleapis.com`;

// Create service client
const serviceClient = new DocumentServiceClient({
  apiEndpoint: apiEndpoint,
});

// Get Document Schema
async function getDocument() {
  // Initialize request argument(s)
  const documentRequest = {
    // The full resource name of the document, e.g.:
    // projects/{project_number}/locations/{location}/documents/{document_id}
    name: serviceClient.projectLocationDocumentPath(
      projectNumber,
      location,
      documentId
    ),
    requestMetadata: {userInfo: {id: userId}},
  };

  // Make Request
  const response = serviceClient.getDocument(documentRequest);

  // Print out response
  response.then(
    result => console.log(`Document Found: ${JSON.stringify(result)}`),
    error => console.log(`${error}`)
  );
}

reference_id의 경우:

  curl --request POST \
    --header "Authorization: Bearer $(gcloud auth print-access-token)" \
    --header "Content-Type: application/json; charset=UTF-8" -d '{
      "requestMetadata":{
        "userInfo":{
          "id": "user:USER_EMAIL"
        }
      }
    }' \
    "https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents/referenceId/REFERENCE_ID:get"

문서 업데이트

document_id의 경우:

REST

posix-terminal curl --location --request POST --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents \ --header "Authorization: Bearer $(gcloud auth print-access-token)" \ --header "Content-Type: application/json; charset=utf-8" \ --data '{ "document": { "display_name": "TestDoc3", "document_schema_name": "projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/DOCUMENT_SCHEMA_ID", "raw_document_path": "gs://BUCKET_URI/FILE_URI", "properties": [ { "name": "supplier_name", "text_values": { "values": "Stanford Plumbing & Heating" } }, { "name": "total_amount", "float_values": { "values": "1091.81" } }, { "name": "invoice_id", "text_values": { "values": "invoiceid" } }, ] }, "requestMetadata": { "userInfo": { "id": "user:USER_EMAIL" } } }'

Python

자세한 내용은 Document AI Warehouse Python API 참고 문서를 참고하세요.

Document AI Warehouse에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.


from google.cloud import contentwarehouse


def sample_update_document(
    document_name: str, document: contentwarehouse.Document, user_id: str
) -> contentwarehouse.CreateDocumentResponse:
    """Updates a document.

    Args:
        document_name: The resource name of the document.
                    Format: 'projects/{project_number}/
                    locations/{location}/documents/{document_id}'.
        document: Document AI Warehouse Document object..
        user_id: user_id: user:YOUR_SERVICE_ACCOUNT_ID or user:USER_EMAIL.
    Returns:
        Response object.
    """
    # Create a client
    client = contentwarehouse.DocumentServiceClient()

    # Update document fields
    # For fields which can be updated, refer
    # https://cloud.google.com/python/docs/reference/contentwarehouse/
    # latest/google.cloud.contentwarehouse_v1.types.Document
    document.display_name = "Updated Order Invoice"

    request_metadata = contentwarehouse.RequestMetadata(
        user_info=contentwarehouse.UserInfo(id=user_id)
    )

    request = contentwarehouse.UpdateDocumentRequest(
        name=document_name, document=document, request_metadata=request_metadata
    )

    # Make the request
    response = client.update_document(request=request)

    return response

Java

자세한 내용은 Document AI Warehouse Java API 참고 문서를 참고하세요.

Document AI Warehouse에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.

import com.google.cloud.contentwarehouse.v1.Document;
import com.google.cloud.contentwarehouse.v1.DocumentName;
import com.google.cloud.contentwarehouse.v1.DocumentServiceClient;
import com.google.cloud.contentwarehouse.v1.DocumentServiceSettings;
import com.google.cloud.contentwarehouse.v1.GetDocumentRequest;
import com.google.cloud.contentwarehouse.v1.RequestMetadata;
import com.google.cloud.contentwarehouse.v1.UpdateDocumentRequest;
import com.google.cloud.contentwarehouse.v1.UpdateDocumentResponse;
import com.google.cloud.contentwarehouse.v1.UserInfo;
import com.google.cloud.resourcemanager.v3.Project;
import com.google.cloud.resourcemanager.v3.ProjectName;
import com.google.cloud.resourcemanager.v3.ProjectsClient;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class UpdateDocument {
  public static void updateDocument() throws IOException, 
        InterruptedException, ExecutionException, TimeoutException { 
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String location = "your-region"; // Format is "us" or "eu".
    String documentId = "your-document-id";
    String userId = "your-user-id"; // Format is user:<user-id>
    /* The below method call retrieves details about the document you are about to update.
    * It is important to note that some properties cannot be edited or removed. 
    * For more information on managing documents, please see the below documentation.
    * https://cloud.google.com/document-warehouse/docs/manage-documents */
    GetDocument.getDocument(projectId, location, documentId, userId);
    updateDocument(projectId, location, documentId, userId);
  }

  // Updates an existing Document
  public static void updateDocument(String projectId, String location,
        String documentId, String userId) throws IOException, InterruptedException,
          ExecutionException, TimeoutException { 
    String projectNumber = getProjectNumber(projectId);

    String endpoint = "contentwarehouse.googleapis.com:443";
    if (!"us".equals(location)) {
      endpoint = String.format("%s-%s", location, endpoint);
    }

    DocumentServiceSettings documentServiceSettings = 
             DocumentServiceSettings.newBuilder().setEndpoint(endpoint).build(); 

    /* Create the Document Service Client 
     * Initialize client that will be used to send requests. 
     * This client only needs to be created once, and can be reused for multiple requests. */
    try (DocumentServiceClient documentServiceClient = 
            DocumentServiceClient.create(documentServiceSettings)) {

      /* The full resource name of the location, e.g.: 
       projects/{project_number}/location/{location}/documentSchemas/{document_schema_id} */
      DocumentName documentName = 
          DocumentName.of(projectNumber, location, documentId);

      // Define RequestMetadata object for context of the user making the API call
      RequestMetadata requestMetadata = RequestMetadata.newBuilder()
          .setUserInfo(
          UserInfo.newBuilder()
            .setId(userId).build()).build();

      // Get the document to retreive the document schema associated with the object
      GetDocumentRequest getDocumentRequest = GetDocumentRequest.newBuilder()
          .setName(documentName.toString())
          .setRequestMetadata(requestMetadata)
          .build(); 

      // Execute the request and store response in a document object
      Document document = documentServiceClient.getDocument(getDocumentRequest);

      // Define the updates to the document that will be passed in the request
      Document updatedDocument = Document.newBuilder()
          .setDisplayName("Updated Document Display Name")
          .setDocumentSchemaName(document.getDocumentSchemaName()).build();

      // Create the request to Update the Document
      UpdateDocumentRequest updateDocumentRequest = 
            UpdateDocumentRequest.newBuilder()
              .setName(documentName.toString())
              .setDocument(updatedDocument)
              .setRequestMetadata(requestMetadata)
              .build();

      // Update Document and receive response
      UpdateDocumentResponse updateDocumentResponse = 
          documentServiceClient.updateDocument(updateDocumentRequest);

      // Read the output of Updated Document Name
      System.out.println(updateDocumentResponse.getDocument().getDisplayName());
    }
  }

  private static String getProjectNumber(String projectId) throws IOException { 
    /* Initialize client that will be used to send requests. 
    * This client only needs to be created once, and can be reused for multiple requests. */
    try (ProjectsClient projectsClient = ProjectsClient.create()) { 
      ProjectName projectName = ProjectName.of(projectId); 
      Project project = projectsClient.getProject(projectName);
      String projectNumber = project.getName(); // Format returned is projects/xxxxxx
      return projectNumber.substring(projectNumber.lastIndexOf("/") + 1);
    } 
  }
}

reference_id의 경우:

  curl --location --request POST --url https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json; charset=utf-8" \
--data '{
  "document": {
    "display_name": "TestDoc3",
    "document_schema_name": "projects/PROJECT_NUMBER/locations/LOCATION/documentSchemas/referenceId/REFERENCE_ID",
    "raw_document_path": "gs://BUCKET_URI/FILE_URI",
    "properties": [
      {
        "name": "supplier_name",
        "text_values": {
          "values": "Stanford Plumbing & Heating"
        }
      },
      {
        "name": "total_amount",
        "float_values": {
          "values": "1091.81"
        }
      },
      {
        "name": "invoice_id",
        "text_values": {
          "values": "invoiceid"
        }
      },
    ]
  },
  "requestMetadata": {
    "userInfo": {
      "id": "user:USER_EMAIL"
    }
  }
}'

문서 삭제

REST

document_id의 경우:

curl --request POST \
  --header "Authorization: Bearer $(gcloud auth print-access-token)" \
  --header "Content-Type: application/json; charset=UTF-8" -d '{
    "requestMetadata":{
      "userInfo":{
        "id": "user:USER_EMAIL"
      }
    }
  }' \
  "https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT_ID:delete"

reference_id의 경우:

curl --request POST \
  --header "Authorization: Bearer $(gcloud auth print-access-token)" \
  --header "Content-Type: application/json; charset=UTF-8" -d '{
    "requestMetadata":{
      "userInfo":{
        "id": "user:USER_EMAIL"
      }
    }
  }' \
  "https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/documents/referenceId/REFERENCE_ID":delete"

다음 단계