Handle the processing response

The response to a processing request contains a Document object that holds everything known about the processed document, including all of the structured information that Document AI was able to extract.

This page explains the layout of the Document object using sample documents, and then maps aspects of OCR results to the specific elements in the Document object JSON. It also provides client library code samples and Document AI Toolbox SDK code samples. These code samples use online processing, but parsing Document objects works the same way for batch processing.

handle-response-1

The orange and blue boxes and arrows indicate that at least one field of the connected objects is .layout or detectedLanguage, respectively. The diagram uses crow's foot notation.

Use a JSON viewer or editing utility that is specifically designed to expand and collapse elements. Reviewing raw JSON in a plain text utility is inefficient.

Text, layout, and quality scores

The following is a sample text document:

handle-response-2

The following is the full document object returned by the Enterprise Document OCR processor:

Download JSON

This OCR output is also always included in Document AI processor output, because the processors run OCR. The processors use the existing OCR data, so you can pass this JSON data to Document AI processors through the inline document option.

  image=None, # all our samples pass this var
  mime_type="application/json",
  inline_document=document_response # pass OCR output to CDE input - undocumented

The following are some of the important fields:

Raw text

The text field contains the text recognized by Document AI. This text doesn't contain any layout structure other than spaces, tabs, and line feeds. This is the only field that stores a document's textual information, and it serves as the source of truth for the document's text. Other fields can refer to parts of the text field by position (startIndex and endIndex).

  {
    "text": "Sample Document\nHeading 1\nLorem ipsum dolor sit amet, ..."
  }
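
As a minimal sketch of how other fields point back into the text field, the following Python snippet slices the text with a single text segment. It assumes the Document JSON has been loaded into plain dictionaries; the helper name segment_text and the sample segment are illustrative and not part of the API. In the JSON output the indices are serialized as strings, and startIndex can be absent for a segment that starts at the beginning of the text.

```python
def segment_text(full_text: str, segment: dict) -> str:
    """Return the slice of full_text that a text segment points to."""
    # Indices arrive as strings in JSON; a missing startIndex is treated as 0.
    start = int(segment.get("startIndex", 0))
    end = int(segment["endIndex"])
    return full_text[start:end]


document = {"text": "Sample Document\nHeading 1\nLorem ipsum dolor sit amet, ..."}
first_segment = {"endIndex": "16"}  # illustrative segment without startIndex

print(repr(segment_text(document["text"], first_segment)))
```

Keeping the text in one place and referring to it by offsets means that every layout element can be resolved with the same small helper.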

Page size and languages

Each page in the document object corresponds to a physical page of the sample document. The sample JSON output contains one page because it is a single PNG image.

  {
    "pages": [
      {
        "pageNumber": 1,
        "dimension": {
          "width": 679.0,
          "height": 460.0,
          "unit": "pixels"
        }
      }
    ]
  }
  • The pages[].detectedLanguages[] field contains the languages found on a given page, along with the confidence score.
{
  "pages": [
    {
      "detectedLanguages": [
        {
          "confidence": 0.98009938,
          "languageCode": "en"
        },
        {
          "confidence": 0.01990064,
          "languageCode": "und"
        }
      ]
    }
  ]
}
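
The following Python sketch picks the most confident language for a page from the structure shown above. It assumes the page has been loaded as a plain dictionary; the function name top_language is hypothetical, and returning None for a page without detected languages is a choice made for this example.

```python
from typing import Optional, Tuple


def top_language(page: dict) -> Optional[Tuple[str, float]]:
    """Return (languageCode, confidence) for the most confident language, if any."""
    languages = page.get("detectedLanguages", [])
    if not languages:
        return None
    best = max(languages, key=lambda lang: lang.get("confidence", 0.0))
    return best["languageCode"], best.get("confidence", 0.0)


page = {
    "detectedLanguages": [
        {"confidence": 0.98009938, "languageCode": "en"},
        {"confidence": 0.01990064, "languageCode": "und"},
    ]
}
print(top_language(page))
```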

OCR data

Document AI OCR detects text at various levels of granularity or organization on the page, such as text blocks, paragraphs, tokens, and symbols (the symbol level is optional and is only output when symbol-level data is enabled). These are all members of the page object.

Every element has a corresponding layout that describes its position and text. Non-text visual elements (such as checkboxes) are also at the page level.

{
  "pages": [
    {
      "paragraphs": [
        {
          "layout": {
            "textAnchor": {
              "textSegments": [
                {
                  "endIndex": "16"
                }
              ]
            },
            "confidence": 0.9939527,
            "boundingPoly": {
              "vertices": [ ... ],
              "normalizedVertices": [ ... ]
            },
            "orientation": "PAGE_UP"
          }
        }
      ]
    }
  ]
}

The raw text is referenced through the textAnchor object, which indexes into the main text string with startIndex and endIndex.

  • For boundingPoly, the top-left corner of the page is the origin (0,0). Positive X values are to the right, and positive Y values are down.

  • The vertices object uses the same coordinates as the original image, whereas normalizedVertices are in the range [0,1]. A transformation matrix indicates the deskewing measures and other attributes of the image normalization.

  • To draw the boundingPoly, draw line segments from one vertex to the next. Then close the polygon by drawing a line segment from the last vertex back to the first. The orientation element of the layout indicates whether the text has been rotated relative to the page.
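
The drawing steps above can be sketched in Python as follows. The snippet scales normalizedVertices by the page dimension and repeats the first vertex so that a line drawn through the points forms a closed polygon. The function name to_closed_polygon and the sample vertices are hypothetical, and treating a missing x or y as 0 is an assumption made for this example.

```python
def to_closed_polygon(normalized_vertices, width, height):
    """Scale normalized vertices to page units and close the polygon."""
    points = [
        # Assumption: a coordinate that is absent from the JSON is read as 0.
        (v.get("x", 0.0) * width, v.get("y", 0.0) * height)
        for v in normalized_vertices
    ]
    # Repeat the first vertex so the last segment returns to the start.
    return points + points[:1]


# Hypothetical vertices, listed clockwise from the top-left corner.
vertices = [
    {"x": 0.25, "y": 0.25},
    {"x": 0.5, "y": 0.25},
    {"x": 0.5, "y": 0.5},
    {"x": 0.25, "y": 0.5},
]
polygon = to_closed_polygon(vertices, width=679.0, height=460.0)

# Each consecutive pair of points is one line segment to draw.
for start, end in zip(polygon, polygon[1:]):
    print(start, "->", end)
```

Because the origin is the top-left corner with Y increasing downward, these points can be passed to most raster drawing libraries without flipping the axis.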

To help visualize the document's structure, the following images draw bounding polygons for page.paragraphs, page.lines, page.tokens, and page.blocks.

Paragraphs

handle-response-3

Lines

handle-response-4

Tokens

handle-response-5

Blocks

handle-response-6

The Enterprise Document OCR processor can assess the quality of a document based on its readability.

This quality assessment is a quality score in [0, 1], where 1 means perfect quality. The quality score is returned in the Page.imageQualityScores field. All detected defects are listed as quality/defect_* and sorted in descending order by confidence value.

The following is a PDF that is too dark and blurry to read comfortably:

Download PDF

The following is the document quality information returned by the Enterprise Document OCR processor:

  {
    "pages": [
      {
        "imageQualityScores": {
          "qualityScore": 0.7811847,
          "detectedDefects": [
            {
              "type": "quality/defect_document_cutoff",
              "confidence": 1.0
            },
            {
              "type": "quality/defect_glare",
              "confidence": 0.97849524
            },
            {
              "type": "quality/defect_text_cutoff",
              "confidence": 0.5
            }
          ]
        }
      }
    ]
  }
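
As a sketch of how the quality information might be consumed, the following Python snippet reads imageQualityScores from a page dictionary and keeps only the defects above a confidence threshold. The function name summarize_quality and the min_confidence parameter are hypothetical; choose a threshold that suits your own documents.

```python
def summarize_quality(page: dict, min_confidence: float = 0.5) -> dict:
    """Return the quality score and the defects at or above min_confidence."""
    scores = page.get("imageQualityScores", {})
    # Sort explicitly so the result does not depend on the input order.
    defects = sorted(
        scores.get("detectedDefects", []),
        key=lambda d: d.get("confidence", 0.0),
        reverse=True,
    )
    return {
        "quality_score": scores.get("qualityScore"),
        "defects": [
            (d["type"], d.get("confidence", 0.0))
            for d in defects
            if d.get("confidence", 0.0) >= min_confidence
        ],
    }


page = {
    "imageQualityScores": {
        "qualityScore": 0.7811847,
        "detectedDefects": [
            {"type": "quality/defect_document_cutoff", "confidence": 1.0},
            {"type": "quality/defect_glare", "confidence": 0.97849524},
            {"type": "quality/defect_text_cutoff", "confidence": 0.5},
        ],
    }
}
print(summarize_quality(page, min_confidence=0.9))
```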

Code samples

The following code samples demonstrate how to send a processing request and then read and print the fields to the terminal:

Java

For more information, see the Document AI Java API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.documentai.v1beta3.Document;
import com.google.cloud.documentai.v1beta3.DocumentProcessorServiceClient;
import com.google.cloud.documentai.v1beta3.DocumentProcessorServiceSettings;
import com.google.cloud.documentai.v1beta3.ProcessRequest;
import com.google.cloud.documentai.v1beta3.ProcessResponse;
import com.google.cloud.documentai.v1beta3.RawDocument;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class ProcessOcrDocument {
  public static void processOcrDocument()
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String location = "your-project-location"; // Format is "us" or "eu".
    String processorId = "your-processor-id";
    String filePath = "path/to/input/file.pdf";
    processOcrDocument(projectId, location, processorId, filePath);
  }

  public static void processOcrDocument(
      String projectId, String location, String processorId, String filePath)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs
    // to be created once, and can be reused for multiple requests. After completing
    // all of your requests, call the "close" method on the client to safely clean up
    // any remaining background resources.
    String endpoint = String.format("%s-documentai.googleapis.com:443", location);
    DocumentProcessorServiceSettings settings =
        DocumentProcessorServiceSettings.newBuilder().setEndpoint(endpoint).build();
    try (DocumentProcessorServiceClient client = DocumentProcessorServiceClient.create(settings)) {
      // The full resource name of the processor, e.g.:
      // projects/project-id/locations/location/processors/processor-id
      // You must create new processors in the Cloud Console first
      String name =
          String.format("projects/%s/locations/%s/processors/%s", projectId, location, processorId);

      // Read the file.
      byte[] imageFileData = Files.readAllBytes(Paths.get(filePath));

      // Wrap the raw file bytes in a ByteString for the request.
      ByteString content = ByteString.copyFrom(imageFileData);

      RawDocument document =
          RawDocument.newBuilder().setContent(content).setMimeType("application/pdf").build();

      // Configure the process request.
      ProcessRequest request =
          ProcessRequest.newBuilder().setName(name).setRawDocument(document).build();

      // Recognizes text entities in the PDF document
      ProcessResponse result = client.processDocument(request);
      Document documentResponse = result.getDocument();

      System.out.println("Document processing complete.");

      // Read the text recognition output from the processor
      // For a full list of Document object attributes,
      // please reference this page:
      // https://googleapis.dev/java/google-cloud-document-ai/latest/index.html

      // Get all of the document text as one big string
      String text = documentResponse.getText();
      System.out.printf("Full document text: '%s'\n", escapeNewlines(text));

      // Read the text recognition output from the processor
      List<Document.Page> pages = documentResponse.getPagesList();
      System.out.printf("There are %s page(s) in this document.\n", pages.size());

      for (Document.Page page : pages) {
        System.out.printf("Page %d:\n", page.getPageNumber());
        printPageDimensions(page.getDimension());
        printDetectedLanguages(page.getDetectedLanguagesList());
        printParagraphs(page.getParagraphsList(), text);
        printBlocks(page.getBlocksList(), text);
        printLines(page.getLinesList(), text);
        printTokens(page.getTokensList(), text);
      }
    }
  }

  private static void printPageDimensions(Document.Page.Dimension dimension) {
    String unit = dimension.getUnit();
    System.out.printf("    Width: %.1f %s\n", dimension.getWidth(), unit);
    System.out.printf("    Height: %.1f %s\n", dimension.getHeight(), unit);
  }

  private static void printDetectedLanguages(
      List<Document.Page.DetectedLanguage> detectedLanguages) {
    System.out.println("    Detected languages:");
    for (Document.Page.DetectedLanguage detectedLanguage : detectedLanguages) {
      String languageCode = detectedLanguage.getLanguageCode();
      float confidence = detectedLanguage.getConfidence();
      System.out.printf("        %s (%.2f%%)\n", languageCode, confidence * 100.0);
    }
  }

  private static void printParagraphs(List<Document.Page.Paragraph> paragraphs, String text) {
    System.out.printf("    %d paragraphs detected:\n", paragraphs.size());
    Document.Page.Paragraph firstParagraph = paragraphs.get(0);
    String firstParagraphText = getLayoutText(firstParagraph.getLayout().getTextAnchor(), text);
    System.out.printf("        First paragraph text: %s\n", escapeNewlines(firstParagraphText));
    Document.Page.Paragraph lastParagraph = paragraphs.get(paragraphs.size() - 1);
    String lastParagraphText = getLayoutText(lastParagraph.getLayout().getTextAnchor(), text);
    System.out.printf("        Last paragraph text: %s\n", escapeNewlines(lastParagraphText));
  }

  private static void printBlocks(List<Document.Page.Block> blocks, String text) {
    System.out.printf("    %d blocks detected:\n", blocks.size());
    Document.Page.Block firstBlock = blocks.get(0);
    String firstBlockText = getLayoutText(firstBlock.getLayout().getTextAnchor(), text);
    System.out.printf("        First block text: %s\n", escapeNewlines(firstBlockText));
    Document.Page.Block lastBlock = blocks.get(blocks.size() - 1);
    String lastBlockText = getLayoutText(lastBlock.getLayout().getTextAnchor(), text);
    System.out.printf("        Last block text: %s\n", escapeNewlines(lastBlockText));
  }

  private static void printLines(List<Document.Page.Line> lines, String text) {
    System.out.printf("    %d lines detected:\n", lines.size());
    Document.Page.Line firstLine = lines.get(0);
    String firstLineText = getLayoutText(firstLine.getLayout().getTextAnchor(), text);
    System.out.printf("        First line text: %s\n", escapeNewlines(firstLineText));
    Document.Page.Line lastLine = lines.get(lines.size() - 1);
    String lastLineText = getLayoutText(lastLine.getLayout().getTextAnchor(), text);
    System.out.printf("        Last line text: %s\n", escapeNewlines(lastLineText));
  }

  private static void printTokens(List<Document.Page.Token> tokens, String text) {
    System.out.printf("    %d tokens detected:\n", tokens.size());
    Document.Page.Token firstToken = tokens.get(0);
    String firstTokenText = getLayoutText(firstToken.getLayout().getTextAnchor(), text);
    System.out.printf("        First token text: %s\n", escapeNewlines(firstTokenText));
    Document.Page.Token lastToken = tokens.get(tokens.size() - 1);
    String lastTokenText = getLayoutText(lastToken.getLayout().getTextAnchor(), text);
    System.out.printf("        Last token text: %s\n", escapeNewlines(lastTokenText));
  }

  // Extract shards from the text field
  private static String getLayoutText(Document.TextAnchor textAnchor, String text) {
    if (textAnchor.getTextSegmentsList().size() > 0) {
      int startIdx = (int) textAnchor.getTextSegments(0).getStartIndex();
      int endIdx = (int) textAnchor.getTextSegments(0).getEndIndex();
      return text.substring(startIdx, endIdx);
    }
    return "[NO TEXT]";
  }

  private static String escapeNewlines(String s) {
    return s.replace("\n", "\\n").replace("\r", "\\r");
  }
}

Node.js

For more information, see the Document AI Node.js API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
// const processorId = 'YOUR_PROCESSOR_ID'; // Create processor in Cloud Console
// const filePath = '/path/to/local/pdf';

const {DocumentProcessorServiceClient} =
  require('@google-cloud/documentai').v1beta3;

// Instantiates a client
const client = new DocumentProcessorServiceClient();

async function processDocument() {
  // The full resource name of the processor, e.g.:
  // projects/project-id/locations/location/processors/processor-id
  // You must create new processors in the Cloud Console first
  const name = `projects/${projectId}/locations/${location}/processors/${processorId}`;

  // Read the file into memory.
  const fs = require('fs').promises;
  const imageFile = await fs.readFile(filePath);

  // Convert the image data to a Buffer and base64 encode it.
  const encodedImage = Buffer.from(imageFile).toString('base64');

  const request = {
    name,
    rawDocument: {
      content: encodedImage,
      mimeType: 'application/pdf',
    },
  };

  // Recognizes text entities in the PDF document
  const [result] = await client.processDocument(request);

  console.log('Document processing complete.');

  // Read the text recognition output from the processor
  // For a full list of Document object attributes,
  // please reference this page: https://googleapis.dev/nodejs/documentai/latest/index.html
  const {document} = result;
  const {text} = document;

  // Read the text recognition output from the processor
  console.log(`Full document text: ${JSON.stringify(text)}`);
  console.log(`There are ${document.pages.length} page(s) in this document.`);
  for (const page of document.pages) {
    console.log(`Page ${page.pageNumber}`);
    printPageDimensions(page.dimension);
    printDetectedLanguages(page.detectedLanguages);
    printParagraphs(page.paragraphs, text);
    printBlocks(page.blocks, text);
    printLines(page.lines, text);
    printTokens(page.tokens, text);
  }
}

const printPageDimensions = dimension => {
  console.log(`    Width: ${dimension.width}`);
  console.log(`    Height: ${dimension.height}`);
};

const printDetectedLanguages = detectedLanguages => {
  console.log('    Detected languages:');
  for (const lang of detectedLanguages) {
    const code = lang.languageCode;
    const confPercent = lang.confidence * 100;
    console.log(`        ${code} (${confPercent.toFixed(2)}% confidence)`);
  }
};

const printParagraphs = (paragraphs, text) => {
  console.log(`    ${paragraphs.length} paragraphs detected:`);
  const firstParagraphText = getText(paragraphs[0].layout.textAnchor, text);
  console.log(
    `        First paragraph text: ${JSON.stringify(firstParagraphText)}`
  );
  const lastParagraphText = getText(
    paragraphs[paragraphs.length - 1].layout.textAnchor,
    text
  );
  console.log(
    `        Last paragraph text: ${JSON.stringify(lastParagraphText)}`
  );
};

const printBlocks = (blocks, text) => {
  console.log(`    ${blocks.length} blocks detected:`);
  const firstBlockText = getText(blocks[0].layout.textAnchor, text);
  console.log(`        First block text: ${JSON.stringify(firstBlockText)}`);
  const lastBlockText = getText(
    blocks[blocks.length - 1].layout.textAnchor,
    text
  );
  console.log(`        Last block text: ${JSON.stringify(lastBlockText)}`);
};

const printLines = (lines, text) => {
  console.log(`    ${lines.length} lines detected:`);
  const firstLineText = getText(lines[0].layout.textAnchor, text);
  console.log(`        First line text: ${JSON.stringify(firstLineText)}`);
  const lastLineText = getText(
    lines[lines.length - 1].layout.textAnchor,
    text
  );
  console.log(`        Last line text: ${JSON.stringify(lastLineText)}`);
};

const printTokens = (tokens, text) => {
  console.log(`    ${tokens.length} tokens detected:`);
  const firstTokenText = getText(tokens[0].layout.textAnchor, text);
  console.log(`        First token text: ${JSON.stringify(firstTokenText)}`);
  const firstTokenBreakType = tokens[0].detectedBreak.type;
  console.log(`        First token break type: ${firstTokenBreakType}`);
  const lastTokenText = getText(
    tokens[tokens.length - 1].layout.textAnchor,
    text
  );
  console.log(`        Last token text: ${JSON.stringify(lastTokenText)}`);
  const lastTokenBreakType = tokens[tokens.length - 1].detectedBreak.type;
  console.log(`        Last token break type: ${lastTokenBreakType}`);
};

// Extract shards from the text field
const getText = (textAnchor, text) => {
  if (!textAnchor.textSegments || textAnchor.textSegments.length === 0) {
    return '';
  }

  // First shard in document doesn't have startIndex property
  const startIndex = textAnchor.textSegments[0].startIndex || 0;
  const endIndex = textAnchor.textSegments[0].endIndex;

  return text.substring(startIndex, endIndex);
};

Python

For more information, see the Document AI Python API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


from typing import Optional, Sequence

from google.api_core.client_options import ClientOptions
from google.cloud import documentai

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_PROCESSOR_LOCATION" # Format is "us" or "eu"
# processor_id = "YOUR_PROCESSOR_ID" # Create processor before running sample
# processor_version = "rc" # Refer to https://cloud.google.com/document-ai/docs/manage-processor-versions for more information
# file_path = "/path/to/local/pdf"
# mime_type = "application/pdf" # Refer to https://cloud.google.com/document-ai/docs/file-types for supported file types


def process_document_ocr_sample(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
) -> None:
    # Optional: Additional configurations for Document OCR Processor.
    # For more information: https://cloud.google.com/document-ai/docs/enterprise-document-ocr
    process_options = documentai.ProcessOptions(
        ocr_config=documentai.OcrConfig(
            enable_native_pdf_parsing=True,
            enable_image_quality_scores=True,
            enable_symbol=True,
            # OCR Add Ons https://cloud.google.com/document-ai/docs/ocr-add-ons
            premium_features=documentai.OcrConfig.PremiumFeatures(
                compute_style_info=True,
                enable_math_ocr=False,  # Enable to use Math OCR Model
                enable_selection_mark_detection=True,
            ),
        )
    )
    # Online processing request to Document AI
    document = process_document(
        project_id,
        location,
        processor_id,
        processor_version,
        file_path,
        mime_type,
        process_options=process_options,
    )

    text = document.text
    print(f"Full document text: {text}\n")
    print(f"There are {len(document.pages)} page(s) in this document.\n")

    for page in document.pages:
        print(f"Page {page.page_number}:")
        print_page_dimensions(page.dimension)
        print_detected_languages(page.detected_languages)

        print_blocks(page.blocks, text)
        print_paragraphs(page.paragraphs, text)
        print_lines(page.lines, text)
        print_tokens(page.tokens, text)

        if page.symbols:
            print_symbols(page.symbols, text)

        if page.image_quality_scores:
            print_image_quality_scores(page.image_quality_scores)

        if page.visual_elements:
            print_visual_elements(page.visual_elements, text)


def print_page_dimensions(dimension: documentai.Document.Page.Dimension) -> None:
    print(f"    Width: {str(dimension.width)}")
    print(f"    Height: {str(dimension.height)}")


def print_detected_languages(
    detected_languages: Sequence[documentai.Document.Page.DetectedLanguage],
) -> None:
    print("    Detected languages:")
    for lang in detected_languages:
        print(f"        {lang.language_code} ({lang.confidence:.1%} confidence)")


def print_blocks(blocks: Sequence[documentai.Document.Page.Block], text: str) -> None:
    print(f"    {len(blocks)} blocks detected:")
    first_block_text = layout_to_text(blocks[0].layout, text)
    print(f"        First text block: {repr(first_block_text)}")
    last_block_text = layout_to_text(blocks[-1].layout, text)
    print(f"        Last text block: {repr(last_block_text)}")


def print_paragraphs(
    paragraphs: Sequence[documentai.Document.Page.Paragraph], text: str
) -> None:
    print(f"    {len(paragraphs)} paragraphs detected:")
    first_paragraph_text = layout_to_text(paragraphs[0].layout, text)
    print(f"        First paragraph text: {repr(first_paragraph_text)}")
    last_paragraph_text = layout_to_text(paragraphs[-1].layout, text)
    print(f"        Last paragraph text: {repr(last_paragraph_text)}")


def print_lines(lines: Sequence[documentai.Document.Page.Line], text: str) -> None:
    print(f"    {len(lines)} lines detected:")
    first_line_text = layout_to_text(lines[0].layout, text)
    print(f"        First line text: {repr(first_line_text)}")
    last_line_text = layout_to_text(lines[-1].layout, text)
    print(f"        Last line text: {repr(last_line_text)}")


def print_tokens(tokens: Sequence[documentai.Document.Page.Token], text: str) -> None:
    print(f"    {len(tokens)} tokens detected:")
    first_token_text = layout_to_text(tokens[0].layout, text)
    first_token_break_type = tokens[0].detected_break.type_.name
    print(f"        First token text: {repr(first_token_text)}")
    print(f"        First token break type: {repr(first_token_break_type)}")
    if tokens[0].style_info:
        print_style_info(tokens[0].style_info)

    last_token_text = layout_to_text(tokens[-1].layout, text)
    last_token_break_type = tokens[-1].detected_break.type_.name
    print(f"        Last token text: {repr(last_token_text)}")
    print(f"        Last token break type: {repr(last_token_break_type)}")
    if tokens[-1].style_info:
        print_style_info(tokens[-1].style_info)


def print_symbols(
    symbols: Sequence[documentai.Document.Page.Symbol], text: str
) -> None:
    print(f"    {len(symbols)} symbols detected:")
    first_symbol_text = layout_to_text(symbols[0].layout, text)
    print(f"        First symbol text: {repr(first_symbol_text)}")
    last_symbol_text = layout_to_text(symbols[-1].layout, text)
    print(f"        Last symbol text: {repr(last_symbol_text)}")


def print_image_quality_scores(
    image_quality_scores: documentai.Document.Page.ImageQualityScores,
) -> None:
    print(f"    Quality score: {image_quality_scores.quality_score:.1%}")
    print("    Detected defects:")

    for detected_defect in image_quality_scores.detected_defects:
        print(f"        {detected_defect.type_}: {detected_defect.confidence:.1%}")


def print_style_info(style_info: documentai.Document.Page.Token.StyleInfo) -> None:
    """
    Only supported in version `pretrained-ocr-v2.0-2023-06-02`
    """
    print(f"           Font Size: {style_info.font_size}pt")
    print(f"           Font Type: {style_info.font_type}")
    print(f"           Bold: {style_info.bold}")
    print(f"           Italic: {style_info.italic}")
    print(f"           Underlined: {style_info.underlined}")
    print(f"           Handwritten: {style_info.handwritten}")
    print(
        f"           Text Color (RGBa): {style_info.text_color.red}, {style_info.text_color.green}, {style_info.text_color.blue}, {style_info.text_color.alpha}"
    )


def print_visual_elements(
    visual_elements: Sequence[documentai.Document.Page.VisualElement], text: str
) -> None:
    """
    Only supported in version `pretrained-ocr-v2.0-2023-06-02`
    """
    checkboxes = [x for x in visual_elements if "checkbox" in x.type]
    math_symbols = [x for x in visual_elements if x.type == "math_formula"]

    if checkboxes:
        print(f"    {len(checkboxes)} checkboxes detected:")
        print(f"        First checkbox: {repr(checkboxes[0].type)}")
        print(f"        Last checkbox: {repr(checkboxes[-1].type)}")

    if math_symbols:
        print(f"    {len(math_symbols)} math symbols detected:")
        first_math_symbol_text = layout_to_text(math_symbols[0].layout, text)
        print(f"        First math symbol: {repr(first_math_symbol_text)}")




def process_document(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
    process_options: Optional[documentai.ProcessOptions] = None,
) -> documentai.Document:
    # You must set the `api_endpoint` if you use a location other than "us".
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )

    # The full resource name of the processor version, e.g.:
    # `projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id}`
    # You must create a processor before running this sample.
    name = client.processor_version_path(
        project_id, location, processor_id, processor_version
    )

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    # Configure the process request
    request = documentai.ProcessRequest(
        name=name,
        raw_document=documentai.RawDocument(content=image_content, mime_type=mime_type),
        # Only supported for Document OCR processor
        process_options=process_options,
    )

    result = client.process_document(request=request)

    # For a full list of `Document` object attributes, reference this page:
    # https://cloud.google.com/document-ai/docs/reference/rest/v1/Document
    return result.document




def layout_to_text(layout: documentai.Document.Page.Layout, text: str) -> str:
    """
    Document AI identifies text in different parts of the document by their
    offsets in the entirety of the document's text. This function converts
    offsets to a string.
    """
    # If a text segment spans several lines, it will
    # be stored in different text segments.
    return "".join(
        text[int(segment.start_index) : int(segment.end_index)]
        for segment in layout.text_anchor.text_segments
    )

Forms and tables

The following is our sample form:

handle-response-7

The following is the full document object returned by Form Parser:

Download JSON

The following are some of the important fields:

Form Parser can detect FormFields on the page. Each form field has a name and a value. These are also called key-value pairs (KVPs). Note that KVPs are different from (schema) entities in other extractors:

Entity names are configured. The keys in KVPs are literally the key text on the document.

{
  "pages": [
    {
      "formFields": [
        {
          "fieldName": { ... },
          "fieldValue": { ... }
        }
      ]
    }
  ]
}
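
As a minimal sketch of reading KVPs, the following Python snippet resolves fieldName and fieldValue against the document text and returns them as pairs. It assumes the Document JSON has been loaded into plain dictionaries; the helper names anchor_text and form_fields_to_pairs and the sample document are hypothetical and exist only for this example.

```python
def anchor_text(full_text: str, layout: dict) -> str:
    """Join the text of every segment referenced by a layout's textAnchor."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    return "".join(
        full_text[int(s.get("startIndex", 0)) : int(s["endIndex"])]
        for s in segments
    )


def form_fields_to_pairs(document: dict) -> list:
    """Return (key text, value text) tuples for every form field."""
    text = document.get("text", "")
    pairs = []
    for page in document.get("pages", []):
        for field in page.get("formFields", []):
            name = anchor_text(text, field.get("fieldName", {})).strip()
            value = anchor_text(text, field.get("fieldValue", {})).strip()
            pairs.append((name, value))
    return pairs


# Hypothetical one-field document used only to exercise the helpers.
document = {
    "text": "Name: Jane Doe\n",
    "pages": [
        {
            "formFields": [
                {
                    "fieldName": {
                        "textAnchor": {"textSegments": [{"endIndex": "5"}]}
                    },
                    "fieldValue": {
                        "textAnchor": {
                            "textSegments": [{"startIndex": "6", "endIndex": "15"}]
                        }
                    },
                }
            ]
        }
    ],
}
print(form_fields_to_pairs(document))
```

The keys come back exactly as they are printed on the document, including punctuation such as a trailing colon, which is what distinguishes KVPs from configured entity names.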
  • Document AI can also detect Tables on the page.
{
  "pages": [
    {
      "tables": [
        {
          "layout": { ... },
          "headerRows": [
            {
              "cells": [
                {
                  "layout": { ... },
                  "rowSpan": 1,
                  "colSpan": 1
                },
                {
                  "layout": { ... },
                  "rowSpan": 1,
                  "colSpan": 1
                }
              ]
            }
          ],
          "bodyRows": [
            {
              "cells": [
                {
                  "layout": { ... },
                  "rowSpan": 1,
                  "colSpan": 1
                },
                {
                  "layout": { ... },
                  "rowSpan": 1,
                  "colSpan": 1
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Table extraction in Form Parser only recognizes simple tables, that is, tables without cells that span rows or columns. As a result, rowSpan and colSpan are always 1.
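
Because every cell spans exactly one row and one column, a table can be flattened into a simple list of rows. The following Python sketch does this for one table dictionary. The helper names cell_text and table_to_rows and the sample data are hypothetical, and the snippet assumes the Document JSON has been loaded into plain dictionaries.

```python
def cell_text(full_text: str, layout: dict) -> str:
    """Resolve a cell layout's text segments against the document text."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    return "".join(
        full_text[int(s.get("startIndex", 0)) : int(s["endIndex"])]
        for s in segments
    ).strip()


def table_to_rows(full_text: str, table: dict) -> list:
    """Return header rows followed by body rows, each as a list of cell strings."""
    rows = []
    for row in table.get("headerRows", []) + table.get("bodyRows", []):
        rows.append(
            [cell_text(full_text, cell.get("layout", {})) for cell in row.get("cells", [])]
        )
    return rows


def _cell(start: int, end: int) -> dict:
    """Build a hypothetical cell whose layout points at text[start:end]."""
    segment = {"startIndex": str(start), "endIndex": str(end)}
    return {"layout": {"textAnchor": {"textSegments": [segment]}}, "rowSpan": 1, "colSpan": 1}


# Hypothetical two-column table used only to exercise the helpers.
text = "Item\nPrice\nTea\n3.50\n"
table = {
    "headerRows": [{"cells": [_cell(0, 5), _cell(5, 11)]}],
    "bodyRows": [{"cells": [_cell(11, 15), _cell(15, 20)]}],
}
print(table_to_rows(text, table))
```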

  • Starting with processor version pretrained-form-parser-v2.0-2022-11-10, Form Parser can also recognize generic entities. For more information, see Form Parser.

  • Checkboxes in tables. Form Parser can digitize checkboxes from images and PDFs as KVPs. The following image shows an example of a checkbox digitized as a key-value pair.

handle-response-8

Outside of tables, checkboxes are represented as visual elements in Form Parser. The following image highlights the square boxes with checkmarks in the UI and the corresponding Unicode character in the JSON.

handle-response-9

{
  "pages": [
    {
      "tables": [
        {
          "layout": { ... },
          "headerRows": [
            {
              "cells": [
                {
                  "layout": { ... },
                  "rowSpan": 1,
                  "colSpan": 1
                },
                {
                  "layout": { ... },
                  "rowSpan": 1,
                  "colSpan": 1
                }
              ]
            }
          ],
          "bodyRows": [
            {
              "cells": [
                {
                  "layout": { ... },
                  "rowSpan": 1,
                  "colSpan": 1
                },
                {
                  "layout": { ... },
                  "rowSpan": 1,
                  "colSpan": 1
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

In tables, checkboxes appear as Unicode characters that represent a checked or an unchecked box.

Filled checkboxes have the value filled_checkbox under pages > x > formFields > x > fieldValue > valueType. Unchecked checkboxes have the value unfilled_checkbox.

handle-response-10

The content fields show the checkbox's content value, highlighted at the path pages > formFields > x > fieldValue > textAnchor > content.
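As an illustration of reading checkbox state, the following sketch lists every checkbox form field on a page together with whether it is filled. It works on the page parsed as a plain Python dictionary. The function name and the sample data are hypothetical, and two assumptions are worth verifying against your own JSON: the sketch looks for valueType both on the form field itself and inside fieldValue, and it reads the label from fieldName > textAnchor > content.

```python
from typing import Any, Dict, List, Tuple

# Maps the valueType strings described above to a boolean state.
CHECKBOX_STATES = {"filled_checkbox": True, "unfilled_checkbox": False}


def checkbox_states(page: Dict[str, Any]) -> List[Tuple[str, bool]]:
    """Return (label, is_filled) for every checkbox form field on a page."""
    results = []
    for field in page.get("formFields", []):
        # Assumption: accept valueType at either nesting level.
        value_type = field.get("valueType") or field.get("fieldValue", {}).get(
            "valueType"
        )
        if value_type not in CHECKBOX_STATES:
            continue  # Regular text field, not a checkbox.
        label = field.get("fieldName", {}).get("textAnchor", {}).get("content", "")
        results.append((label.strip(), CHECKBOX_STATES[value_type]))
    return results


# Hand-written sample data, not real processor output.
sample_page = {
    "formFields": [
        {
            "fieldName": {"textAnchor": {"content": "Subscribe\n"}},
            "fieldValue": {"textAnchor": {"content": ""}},
            "valueType": "filled_checkbox",
        },
        {
            "fieldName": {"textAnchor": {"content": "Paperless "}},
            "fieldValue": {"valueType": "unfilled_checkbox"},
        },
        {
            "fieldName": {"textAnchor": {"content": "Name:"}},
            "fieldValue": {"textAnchor": {"content": "Sally Walker"}},
        },
    ]
}

states = checkbox_states(sample_page)
print(states)
```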

To help visualize the structure of the document, the following images draw bounding polygons for page.formFields and page.tables.

Form fields

handle-response-11

Tables

handle-response-12

Code samples

The following code samples demonstrate how to send a processing request and then read and print the fields to the terminal:

Java

For more information, see the Document AI Java API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.documentai.v1beta3.Document;
import com.google.cloud.documentai.v1beta3.DocumentProcessorServiceClient;
import com.google.cloud.documentai.v1beta3.DocumentProcessorServiceSettings;
import com.google.cloud.documentai.v1beta3.ProcessRequest;
import com.google.cloud.documentai.v1beta3.ProcessResponse;
import com.google.cloud.documentai.v1beta3.RawDocument;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class ProcessFormDocument {
  public static void processFormDocument()
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String location = "your-project-location"; // Format is "us" or "eu".
    String processorId = "your-processor-id";
    String filePath = "path/to/input/file.pdf";
    processFormDocument(projectId, location, processorId, filePath);
  }

  public static void processFormDocument(
      String projectId, String location, String processorId, String filePath)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs
    // to be created once, and can be reused for multiple requests. After completing
    // all of your requests, call the "close" method on the client to safely clean up
    // any remaining background resources.
    String endpoint = String.format("%s-documentai.googleapis.com:443", location);
    DocumentProcessorServiceSettings settings =
        DocumentProcessorServiceSettings.newBuilder().setEndpoint(endpoint).build();
    try (DocumentProcessorServiceClient client = DocumentProcessorServiceClient.create(settings)) {
      // The full resource name of the processor, e.g.:
      // projects/project-id/locations/location/processors/processor-id
      // You must create new processors in the Cloud Console first
      String name =
          String.format("projects/%s/locations/%s/processors/%s", projectId, location, processorId);

      // Read the file.
      byte[] imageFileData = Files.readAllBytes(Paths.get(filePath));

      // Wrap the raw file bytes in a ByteString. No manual base64 encoding is
      // needed here.
      ByteString content = ByteString.copyFrom(imageFileData);

      RawDocument document =
          RawDocument.newBuilder().setContent(content).setMimeType("application/pdf").build();

      // Configure the process request.
      ProcessRequest request =
          ProcessRequest.newBuilder().setName(name).setRawDocument(document).build();

      // Recognizes text entities in the PDF document
      ProcessResponse result = client.processDocument(request);
      Document documentResponse = result.getDocument();

      System.out.println("Document processing complete.");

      // Read the text recognition output from the processor
      // For a full list of Document object attributes,
      // please reference this page:
      // https://googleapis.dev/java/google-cloud-document-ai/latest/index.html

      // Get all of the document text as one big string
      String text = documentResponse.getText();
      System.out.printf("Full document text: '%s'\n", removeNewlines(text));

      // Read the text recognition output from the processor
      List<Document.Page> pages = documentResponse.getPagesList();
      System.out.printf("There are %s page(s) in this document.\n", pages.size());

      for (Document.Page page : pages) {
        System.out.printf("\n\n**** Page %d ****\n", page.getPageNumber());

        List<Document.Page.Table> tables = page.getTablesList();
        System.out.printf("Found %d table(s):\n", tables.size());
        for (Document.Page.Table table : tables) {
          printTableInfo(table, text);
        }

        List<Document.Page.FormField> formFields = page.getFormFieldsList();
        System.out.printf("Found %d form fields:\n", formFields.size());
        for (Document.Page.FormField formField : formFields) {
          String fieldName = getLayoutText(formField.getFieldName().getTextAnchor(), text);
          String fieldValue = getLayoutText(formField.getFieldValue().getTextAnchor(), text);
          System.out.printf(
              "    * '%s': '%s'\n", removeNewlines(fieldName), removeNewlines(fieldValue));
        }
      }
    }
  }

  private static void printTableInfo(Document.Page.Table table, String text) {
    // getHeaderRows(0) and getBodyRows(0) throw if the table has no such rows,
    // so skip tables that are missing either.
    if (table.getHeaderRowsCount() == 0 || table.getBodyRowsCount() == 0) {
      System.out.println("    Table without header or body rows; skipping.");
      return;
    }
    Document.Page.Table.TableRow firstBodyRow = table.getBodyRows(0);
    int columnCount = firstBodyRow.getCellsCount();
    System.out.printf(
        "    Table with %d columns and %d rows:\n", columnCount, table.getBodyRowsCount());

    Document.Page.Table.TableRow headerRow = table.getHeaderRows(0);
    System.out.printf("        Columns: %s\n", joinCells(headerRow, text));
    System.out.printf("        First row data: %s\n", joinCells(firstBodyRow, text));
  }

  // Joins the text of every cell in a row with " | " between cells.
  private static String joinCells(Document.Page.Table.TableRow row, String text) {
    StringBuilder rowText = new StringBuilder();
    for (Document.Page.Table.TableCell cell : row.getCellsList()) {
      if (rowText.length() > 0) {
        rowText.append(" | ");
      }
      rowText.append(removeNewlines(getLayoutText(cell.getLayout().getTextAnchor(), text)));
    }
    return rowText.toString();
  }

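  // Extract the text that a TextAnchor points to in the document's text field.
  private static String getLayoutText(Document.TextAnchor textAnchor, String text) {
    if (textAnchor.getTextSegmentsCount() == 0) {
      return "[NO TEXT]";
    }
    // An anchor can reference several segments (for example, text that spans
    // multiple lines), so concatenate all of them.
    StringBuilder result = new StringBuilder();
    for (Document.TextAnchor.TextSegment segment : textAnchor.getTextSegmentsList()) {
      result.append(text, (int) segment.getStartIndex(), (int) segment.getEndIndex());
    }
    return result.toString();
  }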

  private static String removeNewlines(String s) {
    return s.replace("\n", "").replace("\r", "");
  }
}

Node.js

For more information, see the Document AI Node.js API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
// const processorId = 'YOUR_PROCESSOR_ID'; // Create processor in Cloud Console
// const filePath = '/path/to/local/pdf';

const {DocumentProcessorServiceClient} =
  require('@google-cloud/documentai').v1beta3;

// Instantiates a client
const client = new DocumentProcessorServiceClient();

async function processDocument() {
  // The full resource name of the processor, e.g.:
  // projects/project-id/locations/location/processors/processor-id
  // You must create new processors in the Cloud Console first
  const name = `projects/${projectId}/locations/${location}/processors/${processorId}`;

  // Read the file into memory.
  const fs = require('fs').promises;
  const imageFile = await fs.readFile(filePath);

  // Convert the image data to a Buffer and base64 encode it.
  const encodedImage = Buffer.from(imageFile).toString('base64');

  const request = {
    name,
    rawDocument: {
      content: encodedImage,
      mimeType: 'application/pdf',
    },
  };

  // Recognizes text entities in the PDF document
  const [result] = await client.processDocument(request);

  console.log('Document processing complete.');

  // Read the table and form fields output from the processor
  // The form processor also contains OCR data. For more information
  // on how to parse OCR data please see the OCR sample.
  // For a full list of Document object attributes,
  // please reference this page: https://googleapis.dev/nodejs/documentai/latest/index.html
  const {document} = result;
  const {text} = document;
  console.log(`Full document text: ${JSON.stringify(text)}`);
  console.log(`There are ${document.pages.length} page(s) in this document.`);

  for (const page of document.pages) {
    console.log(`\n\n**** Page ${page.pageNumber} ****`);

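    console.log(`Found ${page.tables.length} table(s):`);
    for (const table of page.tables) {
      // Skip tables without a header row or body rows to avoid reading
      // cells from an undefined row.
      if (!table.headerRows.length || !table.bodyRows.length) {
        continue;
      }
      const numColumns = table.headerRows[0].cells.length;
      const numRows = table.bodyRows.length;
      console.log(`Table with ${numColumns} columns and ${numRows} rows:`);
      printTableInfo(table, text);
    }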
    console.log(`Found ${page.formFields.length} form field(s):`);
    for (const field of page.formFields) {
      const fieldName = getText(field.fieldName.textAnchor, text);
      const fieldValue = getText(field.fieldValue.textAnchor, text);
      console.log(
        `\t* ${JSON.stringify(fieldName)}: ${JSON.stringify(fieldValue)}`
      );
    }
  }
}

const printTableInfo = (table, text) => {
  // Print header row
  let headerRowText = '';
  for (const headerCell of table.headerRows[0].cells) {
    const headerCellText = getText(headerCell.layout.textAnchor, text);
    headerRowText += `${JSON.stringify(headerCellText.trim())} | `;
  }
  console.log(
    `Columns: ${headerRowText.substring(0, headerRowText.length - 3)}`
  );
  // Print first body row
  let bodyRowText = '';
  for (const bodyCell of table.bodyRows[0].cells) {
    const bodyCellText = getText(bodyCell.layout.textAnchor, text);
    bodyRowText += `${JSON.stringify(bodyCellText.trim())} | `;
  }
  console.log(
    `First row data: ${bodyRowText.substring(0, bodyRowText.length - 3)}`
  );
};

// Extract shards from the text field
const getText = (textAnchor, text) => {
  if (!textAnchor.textSegments || textAnchor.textSegments.length === 0) {
    return '';
  }

  // First shard in document doesn't have startIndex property
  const startIndex = textAnchor.textSegments[0].startIndex || 0;
  const endIndex = textAnchor.textSegments[0].endIndex;

  return text.substring(startIndex, endIndex);
};

Python

For more information, see the Document AI Python API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


from typing import Optional, Sequence

from google.api_core.client_options import ClientOptions
from google.cloud import documentai

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_PROCESSOR_LOCATION" # Format is "us" or "eu"
# processor_id = "YOUR_PROCESSOR_ID" # Create processor before running sample
# processor_version = "rc" # Refer to https://cloud.google.com/document-ai/docs/manage-processor-versions for more information
# file_path = "/path/to/local/pdf"
# mime_type = "application/pdf" # Refer to https://cloud.google.com/document-ai/docs/file-types for supported file types


def process_document_form_sample(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
) -> documentai.Document:
    # Online processing request to Document AI
    document = process_document(
        project_id, location, processor_id, processor_version, file_path, mime_type
    )

    # Read the table and form fields output from the processor
    # The form processor also contains OCR data. For more information
    # on how to parse OCR data please see the OCR sample.

    text = document.text
    print(f"Full document text: {repr(text)}\n")
    print(f"There are {len(document.pages)} page(s) in this document.")

    # Read the form fields and tables output from the processor
    for page in document.pages:
        print(f"\n\n**** Page {page.page_number} ****")

        print(f"\nFound {len(page.tables)} table(s):")
        for table in page.tables:
            # Guard against tables that have no header rows.
            num_columns = len(table.header_rows[0].cells) if table.header_rows else 0
            num_rows = len(table.body_rows)
            print(f"Table with {num_columns} columns and {num_rows} rows:")

            # Print header rows
            print("Columns:")
            print_table_rows(table.header_rows, text)
            # Print body rows
            print("Table body data:")
            print_table_rows(table.body_rows, text)

        print(f"\nFound {len(page.form_fields)} form field(s):")
        for field in page.form_fields:
            name = layout_to_text(field.field_name, text)
            value = layout_to_text(field.field_value, text)
            print(f"    * {repr(name.strip())}: {repr(value.strip())}")

    # Supported in version `pretrained-form-parser-v2.0-2022-11-10` and later.
    # For more information: https://cloud.google.com/document-ai/docs/form-parser
    if document.entities:
        print(f"Found {len(document.entities)} generic entities:")
        for entity in document.entities:
            print_entity(entity)
            # Print Nested Entities
            for prop in entity.properties:
                print_entity(prop)

    return document


def print_table_rows(
    table_rows: Sequence[documentai.Document.Page.Table.TableRow], text: str
) -> None:
    for table_row in table_rows:
        row_text = ""
        for cell in table_row.cells:
            cell_text = layout_to_text(cell.layout, text)
            row_text += f"{repr(cell_text.strip())} | "
        print(row_text)




def print_entity(entity: documentai.Document.Entity) -> None:
    # Fields detected. For a full list of fields for each processor see
    # the processor documentation:
    # https://cloud.google.com/document-ai/docs/processors-list
    key = entity.type_

    # Some other value formats in addition to text are available
    # e.g. dates: `entity.normalized_value.date_value.year`
    text_value = entity.text_anchor.content or entity.mention_text
    confidence = entity.confidence
    normalized_value = entity.normalized_value.text
    print(f"    * {repr(key)}: {repr(text_value)} ({confidence:.1%} confident)")

    if normalized_value:
        print(f"    * Normalized Value: {repr(normalized_value)}")




def process_document(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
    process_options: Optional[documentai.ProcessOptions] = None,
) -> documentai.Document:
    # You must set the `api_endpoint` if you use a location other than "us".
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )

    # The full resource name of the processor version, e.g.:
    # `projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id}`
    # You must create a processor before running this sample.
    name = client.processor_version_path(
        project_id, location, processor_id, processor_version
    )

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    # Configure the process request
    request = documentai.ProcessRequest(
        name=name,
        raw_document=documentai.RawDocument(content=image_content, mime_type=mime_type),
        # Only supported for Document OCR processor
        process_options=process_options,
    )

    result = client.process_document(request=request)

    # For a full list of `Document` object attributes, reference this page:
    # https://cloud.google.com/document-ai/docs/reference/rest/v1/Document
    return result.document




def layout_to_text(layout: documentai.Document.Page.Layout, text: str) -> str:
    """
    Document AI identifies text in different parts of the document by their
    offsets in the entirety of the document's text. This function converts
    offsets to a string.
    """
    # If a text segment spans several lines, it will
    # be stored in different text segments.
    return "".join(
        text[int(segment.start_index) : int(segment.end_index)]
        for segment in layout.text_anchor.text_segments
    )

Entities, nested entities, and normalized values

Many of the specialized processors extract structured data based on a well-defined schema. For example, the Invoice parser detects specific fields such as invoice_date and supplier_name. The following is a sample invoice:

handle-response-13

The following is the full document object returned by the Invoice parser:

Download JSON

The following are some of the important parts of the document object:

  • Detected fields: Entities contains the fields that the processor was able to detect, for example, the invoice_date field:

    {
     "entities": [
        {
          "textAnchor": {
            "textSegments": [
              {
                "startIndex": "14",
                "endIndex": "24"
              }
            ],
            "content": "2020/01/01"
          },
          "type": "invoice_date",
          "confidence": 0.9938466,
          "pageAnchor": { ... },
          "id": "2",
          "normalizedValue": {
            "text": "2020-01-01",
            "dateValue": {
              "year": 2020,
              "month": 1,
              "day": 1
            }
          }
        }
      ]
    }
    

    For certain fields, the processor also normalizes the value. In this example, the date has been normalized from 2020/01/01 to 2020-01-01.

  • Normalization: for many specific supported fields, the processor also normalizes the value and returns it as part of the entity. The normalizedValue field is added alongside the raw extracted text, which is available through each entity's textAnchor. Normalization standardizes the literal text and often splits the text value into subfields. For example, a date such as September 1, 2024 is represented as follows:

  "normalizedValue": {
    "text": "2024-09-01",
    "dateValue": {
      "year": 2024,
      "month": 9,
      "day": 1
    }
  }

In the invoice_date example, the date was normalized from 2020/01/01 to 2020-01-01, a standardized format that reduces post-processing and allows conversion to the format of your choice.

Addresses are also often normalized, which breaks the address elements down into individual fields. Numbers are normalized with an integer or floating-point number as the normalizedValue.
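To illustrate how a normalized date reduces post-processing, the following sketch converts an entity's dateValue into a Python datetime.date. It works on the entity parsed as a plain Python dictionary. The function name is hypothetical, and the sketch assumes that a usable date carries year, month, and day; it returns None for anything else.

```python
import datetime
from typing import Any, Dict, Optional


def normalized_date(entity: Dict[str, Any]) -> Optional[datetime.date]:
    """Return the entity's normalized date, or None if it has no full date."""
    date_value = entity.get("normalizedValue", {}).get("dateValue")
    if not date_value:
        return None
    try:
        return datetime.date(
            date_value["year"], date_value["month"], date_value["day"]
        )
    except (KeyError, ValueError):
        # Assumption: treat partial or out-of-range dates as unavailable.
        return None


# Entity copied from the invoice_date example, trimmed to the relevant fields.
invoice_date_entity = {
    "textAnchor": {"content": "2020/01/01"},
    "type": "invoice_date",
    "normalizedValue": {
        "text": "2020-01-01",
        "dateValue": {"year": 2020, "month": 1, "day": 1},
    },
}

parsed = normalized_date(invoice_date_entity)
print(parsed)
```

Once the value is a datetime.date, it can be reformatted with strftime or compared with other dates without parsing the raw text.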

  • Enrichment: certain processors and fields also support enrichment. For example, the original supplier_name in the document, Google Singapore, was normalized against the Enterprise Knowledge Graph to Google Asia Pacific, Singapore. Also note that because the Enterprise Knowledge Graph contains information about Google, Document AI infers the supplier_address even though it wasn't present in the sample document.
  {
    "entities": [
      {
        "textAnchor": {
          "textSegments": [ ... ],
          "content": "Google Singapore"
        },
        "type": "supplier_name",
        "confidence": 0.39170802,
        "pageAnchor": { ... },
        "id": "12",
        "normalizedValue": {
          "text": "Google Asia Pacific, Singapore"
        }
      },
      {
        "type": "supplier_address",
        "id": "17",
        "normalizedValue": {
          "text": "70 Pasir Panjang Rd #03-71 Mapletree Business City II Singapore 117371",
          "addressValue": {
            "regionCode": "SG",
            "languageCode": "en-US",
            "postalCode": "117371",
            "addressLines": [
              "70 Pasir Panjang Rd",
              "#03-71 Mapletree Business City II"
            ]
          }
        }
      }
    ]
  }
  • Nested fields: you can create a nested schema (fields) by first declaring an entity as a parent and then creating child entities under the parent. The parsing response for the parent includes the child fields in the properties element of the parent field. In the following example, line_item is a parent field that has two child fields: line_item/description and line_item/quantity.

    {
      "entities": [
        {
          "textAnchor": { ... },
          "type": "line_item",
          "confidence": 1.0,
          "pageAnchor": { ... },
          "id": "19",
          "properties": [
            {
              "textAnchor": {
                "textSegments": [ ... ],
                "content": "Tool A"
              },
              "type": "line_item/description",
              "confidence": 0.3461604,
              "pageAnchor": { ... },
              "id": "20"
            },
            {
              "textAnchor": {
                "textSegments": [ ... ],
                "content": "500"
              },
              "type": "line_item/quantity",
              "confidence": 0.8077843,
              "pageAnchor": { ... },
              "id": "21",
              "normalizedValue": {
                "text": "500"
              }
            }
          ]
        }
      ]
    }
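    The parent and child layout above lends itself to a small recursive walk. The following sketch flattens entities and their properties into (depth, type, content) tuples. It works on the response parsed as a plain Python dictionary; the function name is hypothetical, and the parent's content value in the sample is invented because the example above elides it.

```python
from typing import Any, Dict, Iterator, List, Tuple


def walk_entities(
    entities: List[Dict[str, Any]], depth: int = 0
) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, type, content) for each entity, then for its properties."""
    for entity in entities:
        content = entity.get("textAnchor", {}).get("content", "")
        yield depth, entity.get("type", ""), content
        # Child entities live in the parent's "properties" list.
        yield from walk_entities(entity.get("properties", []), depth + 1)


# Trimmed version of the line_item example; the parent content is hypothetical.
sample_entities = [
    {
        "textAnchor": {"content": "Tool A 500"},
        "type": "line_item",
        "properties": [
            {"textAnchor": {"content": "Tool A"}, "type": "line_item/description"},
            {"textAnchor": {"content": "500"}, "type": "line_item/quantity"},
        ],
    }
]

flattened = list(walk_entities(sample_entities))
for depth, entity_type, content in flattened:
    # Indent child entities under their parent.
    print(f"{'  ' * depth}{entity_type}: {content!r}")
```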
    

The following parsers follow this schema:

  • Extract (Custom extractor)
  • Legacy
    • Bank statement parser
    • Expense parser
    • Invoice parser
    • Pay slip parser
    • W2 parser

Code samples

The following code samples demonstrate how to send a processing request and then read and print the fields from a specialized processor to the terminal:

Java

For more information, see the Document AI Java API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.documentai.v1beta3.Document;
import com.google.cloud.documentai.v1beta3.DocumentProcessorServiceClient;
import com.google.cloud.documentai.v1beta3.DocumentProcessorServiceSettings;
import com.google.cloud.documentai.v1beta3.ProcessRequest;
import com.google.cloud.documentai.v1beta3.ProcessResponse;
import com.google.cloud.documentai.v1beta3.RawDocument;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class ProcessSpecializedDocument {
  public static void processSpecializedDocument()
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String location = "your-project-location"; // Format is "us" or "eu".
    String processorId = "your-processor-id";
    String filePath = "path/to/input/file.pdf";
    processSpecializedDocument(projectId, location, processorId, filePath);
  }

  public static void processSpecializedDocument(
      String projectId, String location, String processorId, String filePath)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs
    // to be created once, and can be reused for multiple requests. After completing
    // all of your requests, call the "close" method on the client to safely clean up
    // any remaining background resources.
    String endpoint = String.format("%s-documentai.googleapis.com:443", location);
    DocumentProcessorServiceSettings settings =
        DocumentProcessorServiceSettings.newBuilder().setEndpoint(endpoint).build();
    try (DocumentProcessorServiceClient client = DocumentProcessorServiceClient.create(settings)) {
      // The full resource name of the processor, e.g.:
      // projects/project-id/locations/location/processors/processor-id
      // You must create new processors in the Cloud Console first
      String name =
          String.format("projects/%s/locations/%s/processors/%s", projectId, location, processorId);

      // Read the file.
      byte[] imageFileData = Files.readAllBytes(Paths.get(filePath));

      // Wrap the raw file bytes in a ByteString. No manual base64 encoding is
      // needed here.
      ByteString content = ByteString.copyFrom(imageFileData);

      RawDocument document =
          RawDocument.newBuilder().setContent(content).setMimeType("application/pdf").build();

      // Configure the process request.
      ProcessRequest request =
          ProcessRequest.newBuilder().setName(name).setRawDocument(document).build();

      // Recognizes text entities in the PDF document
      ProcessResponse result = client.processDocument(request);
      Document documentResponse = result.getDocument();

      System.out.println("Document processing complete.");

      // Read fields specifically from the specialized US driver license processor:
      // https://cloud.google.com/document-ai/docs/processors-list#processor_us-driver-license-parser
      // Retrieving data from other specialized processors follows a similar pattern.
      // For a complete list of processors see:
      // https://cloud.google.com/document-ai/docs/processors-list
      //
      // OCR and other data is also present in the processor's response.
      // Please see the OCR and other samples for how to parse other data in the
      // response.
      for (Document.Entity entity : documentResponse.getEntitiesList()) {
        // Fields detected. For a full list of fields for each processor see
        // the processor documentation:
        // https://cloud.google.com/document-ai/docs/processors-list
        String entityType = entity.getType();
        // Some other value formats in addition to text are available,
        // e.g. dates: `entity.getNormalizedValue().getDateValue().getYear()`
        // Check for a normalized value with `entity.hasNormalizedValue()`
        String entityTextValue = escapeNewlines(entity.getTextAnchor().getContent());
        float entityConfidence = entity.getConfidence();
        System.out.printf(
            "    * %s: %s (%.2f%% confident)\n",
            entityType, entityTextValue, entityConfidence * 100.0);
      }
    }
  }

  private static String escapeNewlines(String s) {
    return s.replace("\n", "\\n").replace("\r", "\\r");
  }
}

Node.js

For more information, see the Document AI Node.js API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
// const processorId = 'YOUR_PROCESSOR_ID'; // Create processor in Cloud Console
// const filePath = '/path/to/local/pdf';

const {DocumentProcessorServiceClient} =
  require('@google-cloud/documentai').v1beta3;

// Instantiates a client
const client = new DocumentProcessorServiceClient();

async function processDocument() {
  // The full resource name of the processor, e.g.:
  // projects/project-id/locations/location/processors/processor-id
  // You must create new processors in the Cloud Console first
  const name = `projects/${projectId}/locations/${location}/processors/${processorId}`;

  // Read the file into memory.
  const fs = require('fs').promises;
  const imageFile = await fs.readFile(filePath);

  // Convert the image data to a Buffer and base64 encode it.
  const encodedImage = Buffer.from(imageFile).toString('base64');

  const request = {
    name,
    rawDocument: {
      content: encodedImage,
      mimeType: 'application/pdf',
    },
  };

  // Recognizes text entities in the PDF document
  const [result] = await client.processDocument(request);

  console.log('Document processing complete.');

  // Read fields specifically from the specialized US driver license processor:
  // https://cloud.google.com/document-ai/docs/processors-list#processor_us-driver-license-parser
  // Retrieving data from other specialized processors follows a similar pattern.
  // For a complete list of processors see:
  // https://cloud.google.com/document-ai/docs/processors-list
  //
  // OCR and other data is also present in the processor's response.
  // Please see the OCR and other samples for how to parse other data in the
  // response.
  const {document} = result;
  for (const entity of document.entities) {
    // Fields detected. For a full list of fields for each processor see
    // the processor documentation:
    // https://cloud.google.com/document-ai/docs/processors-list
    const key = entity.type;
    // Some other value formats in addition to text are available,
    // e.g. dates: `entity.normalizedValue.dateValue.year`
    // textAnchor can be null or undefined when the entity has no source text.
    const textValue = entity.textAnchor ? entity.textAnchor.content : '';
    const conf = entity.confidence * 100;
    console.log(
      `* ${JSON.stringify(key)}: ${JSON.stringify(textValue)}(${conf.toFixed(
        2
      )}% confident)`
    );
  }
}

Python

For more information, see the Document AI Python API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


from typing import Optional, Sequence

from google.api_core.client_options import ClientOptions
from google.cloud import documentai

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_PROCESSOR_LOCATION" # Format is "us" or "eu"
# processor_id = "YOUR_PROCESSOR_ID" # Create processor before running sample
# processor_version = "rc" # Refer to https://cloud.google.com/document-ai/docs/manage-processor-versions for more information
# file_path = "/path/to/local/pdf"
# mime_type = "application/pdf" # Refer to https://cloud.google.com/document-ai/docs/file-types for supported file types


def process_document_entity_extraction_sample(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
) -> None:
    # Online processing request to Document AI
    document = process_document(
        project_id, location, processor_id, processor_version, file_path, mime_type
    )

    # Print extracted entities from entity extraction processor output.
    # For a complete list of processors see:
    # https://cloud.google.com/document-ai/docs/processors-list
    #
    # OCR and other data is also present in the processor's response.
    # Refer to the OCR samples for how to parse other data in the response.

    print(f"Found {len(document.entities)} entities:")
    for entity in document.entities:
        print_entity(entity)
        # Print Nested Entities (if any)
        for prop in entity.properties:
            print_entity(prop)




def print_entity(entity: documentai.Document.Entity) -> None:
    # Fields detected. For a full list of fields for each processor see
    # the processor documentation:
    # https://cloud.google.com/document-ai/docs/processors-list
    key = entity.type_

    # Some other value formats in addition to text are available
    # e.g. dates: `entity.normalized_value.date_value.year`
    text_value = entity.text_anchor.content or entity.mention_text
    confidence = entity.confidence
    normalized_value = entity.normalized_value.text
    print(f"    * {repr(key)}: {repr(text_value)} ({confidence:.1%} confident)")

    if normalized_value:
        print(f"    * Normalized Value: {repr(normalized_value)}")




def process_document(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
    process_options: Optional[documentai.ProcessOptions] = None,
) -> documentai.Document:
    # You must set the `api_endpoint` if you use a location other than "us".
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )

    # The full resource name of the processor version, e.g.:
    # `projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id}`
    # You must create a processor before running this sample.
    name = client.processor_version_path(
        project_id, location, processor_id, processor_version
    )

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    # Configure the process request
    request = documentai.ProcessRequest(
        name=name,
        raw_document=documentai.RawDocument(content=image_content, mime_type=mime_type),
        # Only supported for Document OCR processor
        process_options=process_options,
    )

    result = client.process_document(request=request)

    # For a full list of `Document` object attributes, reference this page:
    # https://cloud.google.com/document-ai/docs/reference/rest/v1/Document
    return result.document

Custom Document Extractor

The Custom Document Extractor processor can extract custom entities from documents that don't have a pretrained processor available. You can do this either by training a custom model or by using generative AI foundation models to extract named entities without any training. For more information, see Create a Custom Document Extractor in the console.

  • If you train a custom model, the processor can be used in exactly the same way as a pretrained entity extraction processor.
  • If you use a foundation model, you can create a processor version that extracts specific entities for every request, or you can configure the entities on a per-request basis.

For information about the output structure, see Entities, nested entities, and normalized values.
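As a rough illustration of that output structure, the following sketch walks a Document JSON response loaded as a plain Python dict and flattens the entities and their nested properties into rows. The helper name flatten_entities and all sample values are hypothetical; only the field names (entities, properties, type, mentionText, normalizedValue.text) are taken from the Document JSON format.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[int, str, str, str]


def flatten_entities(entities: List[Dict[str, Any]], depth: int = 0) -> List[Row]:
    """Return (depth, type, mention text, normalized text) for each entity.

    Nested entities are stored under "properties" and are visited recursively.
    """
    rows: List[Row] = []
    for entity in entities:
        normalized = entity.get("normalizedValue", {}).get("text", "")
        rows.append(
            (depth, entity.get("type", ""), entity.get("mentionText", ""), normalized)
        )
        rows.extend(flatten_entities(entity.get("properties", []), depth + 1))
    return rows


# Hypothetical response fragment; the values are invented for illustration.
sample_document = {
    "entities": [
        {"type": "invoice_id", "mentionText": "INV-001", "confidence": 0.98},
        {
            "type": "line_item",
            "mentionText": "2 Widget 10.00",
            "properties": [
                {"type": "line_item/quantity", "mentionText": "2"},
                {
                    "type": "line_item/amount",
                    "mentionText": "10.00",
                    "normalizedValue": {"text": "10"},
                },
            ],
        },
    ]
}

for depth, type_, mention, normalized in flatten_entities(sample_document["entities"]):
    indent = "    " * depth
    suffix = f" (normalized: {normalized!r})" if normalized else ""
    print(f"{indent}* {type_!r}: {mention!r}{suffix}")
```

Working on the parsed JSON keeps the sketch independent of any client library; with the Python client, the same traversal applies to entity.properties on documentai.Document.Entity objects.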

Code samples

If you are using a custom model or have created a processor version with a foundation model, use the entity extraction code samples.

The following code sample demonstrates how to configure specific entities for a foundation model Custom Document Extractor on a per-request basis and print the extracted entities:

Python

For more information, see the Document AI Python API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


from typing import Optional, Sequence

from google.api_core.client_options import ClientOptions
from google.cloud import documentai

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_PROCESSOR_LOCATION" # Format is "us" or "eu"
# processor_id = "YOUR_PROCESSOR_ID" # Create processor before running sample
# processor_version = "rc" # Refer to https://cloud.google.com/document-ai/docs/manage-processor-versions for more information
# file_path = "/path/to/local/pdf"
# mime_type = "application/pdf" # Refer to https://cloud.google.com/document-ai/docs/file-types for supported file types




def process_document_custom_extractor_sample(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
) -> None:
    # Entities to extract from Foundation Model CDE
    properties = [
        documentai.DocumentSchema.EntityType.Property(
            name="invoice_id",
            value_type="string",
            occurrence_type=documentai.DocumentSchema.EntityType.Property.OccurrenceType.REQUIRED_ONCE,
        ),
        documentai.DocumentSchema.EntityType.Property(
            name="notes",
            value_type="string",
            occurrence_type=documentai.DocumentSchema.EntityType.Property.OccurrenceType.OPTIONAL_MULTIPLE,
        ),
        documentai.DocumentSchema.EntityType.Property(
            name="terms",
            value_type="string",
            occurrence_type=documentai.DocumentSchema.EntityType.Property.OccurrenceType.OPTIONAL_MULTIPLE,
        ),
    ]
    # Optional: For Generative AI processors, request different fields than the
    # schema for a processor version
    process_options = documentai.ProcessOptions(
        schema_override=documentai.DocumentSchema(
            display_name="CDE Schema",
            description="Document Schema for the CDE Processor",
            entity_types=[
                documentai.DocumentSchema.EntityType(
                    name="custom_extraction_document_type",
                    base_types=["document"],
                    properties=properties,
                )
            ],
        )
    )

    # Online processing request to Document AI
    document = process_document(
        project_id,
        location,
        processor_id,
        processor_version,
        file_path,
        mime_type,
        process_options=process_options,
    )

    for entity in document.entities:
        print_entity(entity)
        # Print Nested Entities (if any)
        for prop in entity.properties:
            print_entity(prop)




def print_entity(entity: documentai.Document.Entity) -> None:
    # Fields detected. For a full list of fields for each processor see
    # the processor documentation:
    # https://cloud.google.com/document-ai/docs/processors-list
    key = entity.type_

    # Some other value formats in addition to text are available
    # e.g. dates: `entity.normalized_value.date_value.year`
    text_value = entity.text_anchor.content or entity.mention_text
    confidence = entity.confidence
    normalized_value = entity.normalized_value.text
    print(f"    * {repr(key)}: {repr(text_value)} ({confidence:.1%} confident)")

    if normalized_value:
        print(f"    * Normalized Value: {repr(normalized_value)}")




def process_document(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
    process_options: Optional[documentai.ProcessOptions] = None,
) -> documentai.Document:
    # You must set the `api_endpoint` if you use a location other than "us".
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )

    # The full resource name of the processor version, e.g.:
    # `projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id}`
    # You must create a processor before running this sample.
    name = client.processor_version_path(
        project_id, location, processor_id, processor_version
    )

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    # Configure the process request
    request = documentai.ProcessRequest(
        name=name,
        raw_document=documentai.RawDocument(content=image_content, mime_type=mime_type),
        # Optional: carries the per-request schema override when one is provided
        process_options=process_options,
    )

    result = client.process_document(request=request)

    # For a full list of `Document` object attributes, reference this page:
    # https://cloud.google.com/document-ai/docs/reference/rest/v1/Document
    return result.document

Summarization

The Summarizer processor uses generative AI foundation models to summarize the text extracted from a document. You can customize the length and format of the response in the following ways:

  • Length
    • BRIEF: a short summary of one or two sentences
    • MODERATE: a paragraph-length summary
    • COMPREHENSIVE: the longest option available
  • Format
    • PARAGRAPH: output in paragraphs
    • BULLETS: output in bullet points

You can create a processor version for a specific length and format, or you can configure them on a per-request basis.

The summarized text appears in Document.entities.normalizedValue.text. You can find a full sample output JSON file in Sample processor output.
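As a minimal sketch of where the summary sits in the response, the following snippet reads normalizedValue.text from each top-level entity of a Document JSON loaded as a Python dict. The helper name summary_texts and the sample values are hypothetical, and the sketch assumes the summary is the only entity that carries a normalized text value.

```python
from typing import Any, Dict, List


def summary_texts(document: Dict[str, Any]) -> List[str]:
    """Collect the non-empty normalizedValue.text of every top-level entity."""
    texts: List[str] = []
    for entity in document.get("entities", []):
        text = entity.get("normalizedValue", {}).get("text", "")
        if text:
            texts.append(text)
    return texts


# Hypothetical response fragment; the values are invented for illustration.
sample_document = {
    "entities": [
        {"type": "summary", "normalizedValue": {"text": "- Point one\n- Point two"}}
    ]
}

for text in summary_texts(sample_document):
    print(text)
```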

For more information, see Build a document summarizer in the console.

Code samples

The following code sample demonstrates how to configure a specific length and format in a processing request and print the summarized text:

Python

For more information, see the Document AI Python API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from typing import Optional

from google.api_core.client_options import ClientOptions
from google.cloud import documentai_v1beta3 as documentai


# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_PROCESSOR_LOCATION" # Format is "us" or "eu"
# processor_id = "YOUR_PROCESSOR_ID" # Create processor before running sample
# processor_version = "rc" # Refer to https://cloud.google.com/document-ai/docs/manage-processor-versions for more information
# file_path = "/path/to/local/pdf"
# mime_type = "application/pdf" # Refer to https://cloud.google.com/document-ai/docs/file-types for supported file types

def process_document_summarizer_sample(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
) -> None:
    # For supported options, refer to:
    # https://cloud.google.com/document-ai/docs/reference/rest/v1beta3/projects.locations.processors.processorVersions#summaryoptions
    summary_options = documentai.SummaryOptions(
        length=documentai.SummaryOptions.Length.BRIEF,
        format=documentai.SummaryOptions.Format.BULLETS,
    )

    properties = [
        documentai.DocumentSchema.EntityType.Property(
            name="summary",
            value_type="string",
            occurrence_type=documentai.DocumentSchema.EntityType.Property.OccurrenceType.REQUIRED_ONCE,
            property_metadata=documentai.PropertyMetadata(
                field_extraction_metadata=documentai.FieldExtractionMetadata(
                    summary_options=summary_options
                )
            ),
        )
    ]

    # Optional: Request specific summarization format other than the default
    # for the processor version.
    process_options = documentai.ProcessOptions(
        schema_override=documentai.DocumentSchema(
            entity_types=[
                documentai.DocumentSchema.EntityType(
                    name="summary_document_type",
                    base_types=["document"],
                    properties=properties,
                )
            ]
        )
    )

    # Online processing request to Document AI
    document = process_document(
        project_id,
        location,
        processor_id,
        processor_version,
        file_path,
        mime_type,
        process_options=process_options,
    )

    for entity in document.entities:
        print_entity(entity)
        # Print Nested Entities (if any)
        for prop in entity.properties:
            print_entity(prop)


def print_entity(entity: documentai.Document.Entity) -> None:
    # Fields detected. For a full list of fields for each processor see
    # the processor documentation:
    # https://cloud.google.com/document-ai/docs/processors-list
    key = entity.type_

    # Some other value formats in addition to text are available
    # e.g. dates: `entity.normalized_value.date_value.year`
    text_value = entity.text_anchor.content
    confidence = entity.confidence
    normalized_value = entity.normalized_value.text
    print(f"    * {repr(key)}: {repr(text_value)} ({confidence:.1%} confident)")

    if normalized_value:
        print(f"    * Normalized Value: {repr(normalized_value)}")


def process_document(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
    process_options: Optional[documentai.ProcessOptions] = None,
) -> documentai.Document:
    # You must set the `api_endpoint` if you use a location other than "us".
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )

    # The full resource name of the processor version, e.g.:
    # `projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id}`
    # You must create a processor before running this sample.
    name = client.processor_version_path(
        project_id, location, processor_id, processor_version
    )

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    # Configure the process request
    request = documentai.ProcessRequest(
        name=name,
        raw_document=documentai.RawDocument(content=image_content, mime_type=mime_type),
        # Optional: carries the per-request summary options when they are provided
        process_options=process_options,
    )

    result = client.process_document(request=request)

    # For a full list of `Document` object attributes, reference this page:
    # https://cloud.google.com/document-ai/docs/reference/rest/v1/Document
    return result.document

Splitting and classification

Here is a composite 10-page PDF that contains different types of documents and forms:

Download PDF

Here is the full document object as returned by the lending document splitter and classifier:

Download JSON

Each document that the splitter detects is represented by an entity. For example:

  {
    "entities": [
      {
        "textAnchor": {
          "textSegments": [
            {
              "startIndex": "13936",
              "endIndex": "21108"
            }
          ]
        },
        "type": "1040se_2020",
        "confidence": 0.76257163,
        "pageAnchor": {
          "pageRefs": [
            {
              "page": "6"
            },
            {
              "page": "7"
            }
          ]
        }
      }
    ]
  }
  • Entity.pageAnchor indicates that this document is 2 pages long. Note that pageRefs[].page is zero-based and is the index into the document.pages[] field.

  • Entity.type specifies that this document is a 1040 Schedule SE form. For a full list of document types that can be identified, see Document types identified in the processor documentation.

For more information, see Document splitters behavior.
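To make the indexing concrete, the following sketch resolves a splitter entity from a Document JSON (loaded as a Python dict) into zero-based page indexes and the matching span of document.text. The helper names and sample values are hypothetical. The sketch assumes that 64-bit integers arrive as JSON strings, as in the example above, and that a field omitted from the JSON (page, startIndex) means 0.

```python
from typing import Any, Dict, List


def subdocument_pages(entity: Dict[str, Any]) -> List[int]:
    """Return the zero-based indexes into document.pages[] for one entity."""
    page_refs = entity.get("pageAnchor", {}).get("pageRefs", [])
    # Assumption: a pageRef without "page" refers to the first page (index 0).
    return [int(ref.get("page", 0)) for ref in page_refs]


def subdocument_text(document: Dict[str, Any], entity: Dict[str, Any]) -> str:
    """Join the spans of document.text referenced by the entity's textAnchor."""
    full_text = document.get("text", "")
    segments = entity.get("textAnchor", {}).get("textSegments", [])
    return "".join(
        full_text[int(seg.get("startIndex", 0)) : int(seg.get("endIndex", 0))]
        for seg in segments
    )


# Hypothetical response fragment; the values are invented for illustration.
sample_document = {
    "text": "Form A text. Form B text.",
    "entities": [
        {
            "type": "form_a",
            "textAnchor": {"textSegments": [{"endIndex": "13"}]},
            "pageAnchor": {"pageRefs": [{}]},
        },
        {
            "type": "form_b",
            "textAnchor": {"textSegments": [{"startIndex": "13", "endIndex": "25"}]},
            "pageAnchor": {"pageRefs": [{"page": "1"}, {"page": "2"}]},
        },
    ],
}

for entity in sample_document["entities"]:
    pages = subdocument_pages(entity)
    # Add 1 only when showing page numbers to people.
    human_pages = [page + 1 for page in pages]
    text = subdocument_text(sample_document, entity)
    print(entity["type"], human_pages, repr(text))
```

Keeping the indexes zero-based until display time avoids off-by-one errors when the same values are used to index document.pages[].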

Code samples

Splitters identify page boundaries, but they don't split the input document for you. You can use Document AI Toolbox to physically split a PDF file by using the page boundaries. The following code samples print the page ranges without splitting the PDF:

Java

For more information, see the Document AI Java API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.documentai.v1beta3.Document;
import com.google.cloud.documentai.v1beta3.DocumentProcessorServiceClient;
import com.google.cloud.documentai.v1beta3.DocumentProcessorServiceSettings;
import com.google.cloud.documentai.v1beta3.ProcessRequest;
import com.google.cloud.documentai.v1beta3.ProcessResponse;
import com.google.cloud.documentai.v1beta3.RawDocument;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class ProcessSplitterDocument {
  public static void processSplitterDocument()
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String location = "your-project-location"; // Format is "us" or "eu".
    String processorId = "your-processor-id";
    String filePath = "path/to/input/file.pdf";
    processSplitterDocument(projectId, location, processorId, filePath);
  }

  public static void processSplitterDocument(
      String projectId, String location, String processorId, String filePath)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs
    // to be created once, and can be reused for multiple requests. After completing
    // all of your requests, call the "close" method on the client to safely clean up
    // any remaining background resources.
    String endpoint = String.format("%s-documentai.googleapis.com:443", location);
    DocumentProcessorServiceSettings settings =
        DocumentProcessorServiceSettings.newBuilder().setEndpoint(endpoint).build();
    try (DocumentProcessorServiceClient client = DocumentProcessorServiceClient.create(settings)) {
      // The full resource name of the processor, e.g.:
      // projects/project-id/locations/location/processors/processor-id
      // You must create new processors in the Cloud Console first
      String name =
          String.format("projects/%s/locations/%s/processors/%s", projectId, location, processorId);

      // Read the file.
      byte[] imageFileData = Files.readAllBytes(Paths.get(filePath));

      // Wrap the raw file bytes in a ByteString for the request.
      ByteString content = ByteString.copyFrom(imageFileData);

      RawDocument document =
          RawDocument.newBuilder().setContent(content).setMimeType("application/pdf").build();

      // Configure the process request.
      ProcessRequest request =
          ProcessRequest.newBuilder().setName(name).setRawDocument(document).build();

      // Recognizes text entities in the PDF document
      ProcessResponse result = client.processDocument(request);
      Document documentResponse = result.getDocument();

      System.out.println("Document processing complete.");

      // Read the splitter output from the document splitter processor:
      // https://cloud.google.com/document-ai/docs/processors-list#processor_doc-splitter
      // This processor only provides text for the document and information on how
      // to split the document on logical boundaries. To identify and extract text,
      // form elements, and entities please see other processors like the OCR, form,
      // and specialized processors.
      List<Document.Entity> entities = documentResponse.getEntitiesList();
      System.out.printf("Found %d subdocuments:\n", entities.size());
      for (Document.Entity entity : entities) {
        float entityConfidence = entity.getConfidence();
        String pagesRangeText = pageRefsToString(entity.getPageAnchor().getPageRefsList());
        String subdocumentType = entity.getType();
        if (subdocumentType.isEmpty()) {
          System.out.printf(
              "%.2f%% confident that %s a subdocument.\n", entityConfidence * 100, pagesRangeText);
        } else {
          System.out.printf(
              "%.2f%% confident that %s a '%s' subdocument.\n",
              entityConfidence * 100, pagesRangeText, subdocumentType);
        }
      }
    }
  }

  // Converts page reference(s) to a string describing the page or page range.
  private static String pageRefsToString(List<Document.PageAnchor.PageRef> pageRefs) {
    if (pageRefs.size() == 1) {
      return String.format("page %d is", pageRefs.get(0).getPage() + 1);
    } else {
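      // Use the first and last refs so that ranges longer than two pages print correctly.
      long start = pageRefs.get(0).getPage() + 1;
      long end = pageRefs.get(pageRefs.size() - 1).getPage() + 1;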
      return String.format("pages %d to %d are", start, end);
    }
  }
}

Node.js

For more information, see the Document AI Node.js API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
// const processorId = 'YOUR_PROCESSOR_ID'; // Create processor in Cloud Console
// const filePath = '/path/to/local/pdf';

const {DocumentProcessorServiceClient} =
  require('@google-cloud/documentai').v1beta3;

// Instantiates a client
const client = new DocumentProcessorServiceClient();

async function processDocument() {
  // The full resource name of the processor, e.g.:
  // projects/project-id/locations/location/processors/processor-id
  // You must create new processors in the Cloud Console first
  const name = `projects/${projectId}/locations/${location}/processors/${processorId}`;

  // Read the file into memory.
  const fs = require('fs').promises;
  const imageFile = await fs.readFile(filePath);

  // Convert the image data to a Buffer and base64 encode it.
  const encodedImage = Buffer.from(imageFile).toString('base64');

  const request = {
    name,
    rawDocument: {
      content: encodedImage,
      mimeType: 'application/pdf',
    },
  };

  // Recognizes text entities in the PDF document
  const [result] = await client.processDocument(request);

  console.log('Document processing complete.');

  // Read the splitter output from the document splitter processor:
  // https://cloud.google.com/document-ai/docs/processors-list#processor_doc-splitter
  // This processor only provides text for the document and information on how
  // to split the document on logical boundaries. To identify and extract text,
  // form elements, and entities, see other processors like the OCR, form,
  // and specialized processors.
  const {document} = result;
  console.log(`Found ${document.entities.length} subdocuments:`);
  for (const entity of document.entities) {
    const conf = entity.confidence * 100;
    const pagesRange = pageRefsToRange(entity.pageAnchor.pageRefs);
    if (entity.type !== '') {
      console.log(
        `${conf.toFixed(2)}% confident that ${pagesRange} a "${
          entity.type
        }" subdocument.`
      );
    } else {
      console.log(
        `${conf.toFixed(2)}% confident that ${pagesRange} a subdocument.`
      );
    }
  }
}

// Converts a page ref to a string describing the page or page range.
const pageRefsToRange = pageRefs => {
  if (pageRefs.length === 1) {
    const num = parseInt(pageRefs[0].page) + 1 || 1;
    return `page ${num} is`;
  } else {
    const start = parseInt(pageRefs[0].page) + 1 || 1;
    // Use the last ref so that ranges longer than two pages print correctly.
    const end = parseInt(pageRefs[pageRefs.length - 1].page) + 1;
    return `pages ${start} to ${end} are`;
  }
};

Python

For more information, see the Document AI Python API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


from typing import Optional, Sequence

from google.api_core.client_options import ClientOptions
from google.cloud import documentai

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_PROCESSOR_LOCATION" # Format is "us" or "eu"
# processor_id = "YOUR_PROCESSOR_ID" # Create processor before running sample
# processor_version = "rc" # Refer to https://cloud.google.com/document-ai/docs/manage-processor-versions for more information
# file_path = "/path/to/local/pdf"
# mime_type = "application/pdf" # Refer to https://cloud.google.com/document-ai/docs/file-types for supported file types


def process_document_splitter_sample(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
) -> None:
    # Online processing request to Document AI
    document = process_document(
        project_id, location, processor_id, processor_version, file_path, mime_type
    )

    # Read the splitter output from a document splitter/classifier processor:
    # e.g. https://cloud.google.com/document-ai/docs/processors-list#processor_procurement-document-splitter
    # This processor only provides text for the document and information on how
    # to split the document on logical boundaries. To identify and extract text,
    # form elements, and entities please see other processors like the OCR, form,
    # and specialized processors.

    print(f"Found {len(document.entities)} subdocuments:")
    for entity in document.entities:
        conf_percent = f"{entity.confidence:.1%}"
        pages_range = page_refs_to_string(entity.page_anchor.page_refs)

        # Print subdocument type information, if available
        if entity.type_:
            print(
                f"{conf_percent} confident that {pages_range} a '{entity.type_}' subdocument."
            )
        else:
            print(f"{conf_percent} confident that {pages_range} a subdocument.")


def page_refs_to_string(
    page_refs: Sequence[documentai.Document.PageAnchor.PageRef],
) -> str:
    """Converts a page ref to a string describing the page or page range."""
    pages = [str(int(page_ref.page) + 1) for page_ref in page_refs]
    if len(pages) == 1:
        return f"page {pages[0]} is"
    else:
        return f"pages {', '.join(pages)} are"




def process_document(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
    process_options: Optional[documentai.ProcessOptions] = None,
) -> documentai.Document:
    # You must set the `api_endpoint` if you use a location other than "us".
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )

    # The full resource name of the processor version, e.g.:
    # `projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id}`
    # You must create a processor before running this sample.
    name = client.processor_version_path(
        project_id, location, processor_id, processor_version
    )

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    # Configure the process request
    request = documentai.ProcessRequest(
        name=name,
        raw_document=documentai.RawDocument(content=image_content, mime_type=mime_type),
        # Only supported for Document OCR processor
        process_options=process_options,
    )

    result = client.process_document(request=request)

    # For a full list of `Document` object attributes, reference this page:
    # https://cloud.google.com/document-ai/docs/reference/rest/v1/Document
    return result.document

The following code sample uses Document AI Toolbox to split a PDF file by using the page boundaries from a processed Document.

Python

For more information, see the Document AI Python API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


from google.cloud.documentai_toolbox import document

# TODO(developer): Uncomment these variables before running the sample.
# Given a local document.proto or sharded document.proto from a splitter/classifier in path
# document_path = "path/to/local/document.json"
# pdf_path = "path/to/local/document.pdf"
# output_path = "resources/output/"


def split_pdf_sample(document_path: str, pdf_path: str, output_path: str) -> None:
    wrapped_document = document.Document.from_document_path(document_path=document_path)

    output_files = wrapped_document.split_pdf(
        pdf_path=pdf_path, output_path=output_path
    )

    print("Document Successfully Split")
    for output_file in output_files:
        print(output_file)

Document AI Toolbox

Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object from a processed document response, which can come from JSON files in Cloud Storage, local JSON files, or output directly from the process_document() method.

You can perform the following actions:

Code samples

The following code samples demonstrate how to use Document AI Toolbox.

Quickstart

from typing import Optional

from google.cloud import documentai
from google.cloud.documentai_toolbox import document, gcs_utilities

# TODO(developer): Uncomment these variables before running the sample.
# Given a Document JSON or sharded Document JSON in path gs://bucket/path/to/folder
# gcs_bucket_name = "bucket"
# gcs_prefix = "path/to/folder"

# Or, given a Document JSON in path gs://bucket/path/to/folder/document.json
# gcs_uri = "gs://bucket/path/to/folder/document.json"

# Or, given a Document JSON in path local/path/to/folder/document.json
# document_path = "local/path/to/folder/document.json"

# Or, given a Document object from Document AI
# documentai_document = documentai.Document()

# Or, given a BatchProcessMetadata object from Document AI
# operation = client.batch_process_documents(request)
# operation.result(timeout=timeout)
# batch_process_metadata = documentai.BatchProcessMetadata(operation.metadata)

# Or, given a BatchProcessOperation name from Document AI
# batch_process_operation = "projects/project_id/locations/location/operations/operation_id"


def quickstart_sample(
    gcs_bucket_name: Optional[str] = None,
    gcs_prefix: Optional[str] = None,
    gcs_uri: Optional[str] = None,
    document_path: Optional[str] = None,
    documentai_document: Optional[documentai.Document] = None,
    batch_process_metadata: Optional[documentai.BatchProcessMetadata] = None,
    batch_process_operation: Optional[str] = None,
) -> document.Document:
    if gcs_bucket_name and gcs_prefix:
        # Load from Google Cloud Storage Directory
        print("Document structure in Cloud Storage")
        gcs_utilities.print_gcs_document_tree(
            gcs_bucket_name=gcs_bucket_name, gcs_prefix=gcs_prefix
        )

        wrapped_document = document.Document.from_gcs(
            gcs_bucket_name=gcs_bucket_name, gcs_prefix=gcs_prefix
        )
    elif gcs_uri:
        # Load a single Document from a Google Cloud Storage URI
        wrapped_document = document.Document.from_gcs_uri(gcs_uri=gcs_uri)
    elif document_path:
        # Load from local `Document` JSON file
        wrapped_document = document.Document.from_document_path(document_path)
    elif documentai_document:
        # Load from `documentai.Document` object
        wrapped_document = document.Document.from_documentai_document(
            documentai_document
        )
    elif batch_process_metadata:
        # Load Documents from `BatchProcessMetadata` object
        wrapped_documents = document.Document.from_batch_process_metadata(
            metadata=batch_process_metadata
        )
        wrapped_document = wrapped_documents[0]
    elif batch_process_operation:
        # Load Documents from a `BatchProcessOperation` name
        wrapped_documents = document.Document.from_batch_process_operation(
            location="us", operation_name=batch_process_operation
        )
        wrapped_document = wrapped_documents[0]
    else:
        raise ValueError("No document source provided.")

    # For all properties and methods, refer to:
    # https://cloud.google.com/python/docs/reference/documentai-toolbox/latest/google.cloud.documentai_toolbox.wrappers.document.Document

    print("Document Successfully Loaded!")
    print(f"\t Number of Pages: {len(wrapped_document.pages)}")
    print(f"\t Number of Entities: {len(wrapped_document.entities)}")

    for page in wrapped_document.pages:
        print(f"Page {page.page_number}")
        for block in page.blocks:
            print(block.text)
        for paragraph in page.paragraphs:
            print(paragraph.text)
        for line in page.lines:
            print(line.text)
        for token in page.tokens:
            print(token.text)

        # Only supported with Form Parser processor
        # https://cloud.google.com/document-ai/docs/form-parser
        for form_field in page.form_fields:
            print(f"{form_field.field_name} : {form_field.field_value}")

        # Only supported with Enterprise Document OCR version `pretrained-ocr-v2.0-2023-06-02`
        # https://cloud.google.com/document-ai/docs/process-documents-ocr#enable_symbols
        for symbol in page.symbols:
            print(symbol.text)

        # Only supported with Enterprise Document OCR version `pretrained-ocr-v2.0-2023-06-02`
        # https://cloud.google.com/document-ai/docs/process-documents-ocr#math_ocr
        for math_formula in page.math_formulas:
            print(math_formula.text)

    # Only supported with Entity Extraction processors
    # https://cloud.google.com/document-ai/docs/processors-list
    for entity in wrapped_document.entities:
        print(f"{entity.type_} : {entity.mention_text}")
        if entity.normalized_text:
            print(f"\tNormalized Text: {entity.normalized_text}")

    # Only supported with Layout Parser
    for chunk in wrapped_document.chunks:
        print(f"Chunk {chunk.chunk_id}: {chunk.content}")

    for block in wrapped_document.document_layout_blocks:
        print(f"Document Layout Block {block.block_id}")

        if block.text_block:
            print(f"{block.text_block.type_}: {block.text_block.text}")
        if block.list_block:
            print(f"{block.list_block.type_}: {block.list_block.list_entries}")
        if block.table_block:
            print(block.table_block.header_rows, block.table_block.body_rows)

Tables


from google.cloud.documentai_toolbox import document

# TODO(developer): Uncomment these variables before running the sample.
# Given a local document.proto or sharded document.proto in path
# document_path = "path/to/local/document.json"
# output_file_prefix = "output/table"


def table_sample(document_path: str, output_file_prefix: str) -> None:
    wrapped_document = document.Document.from_document_path(document_path=document_path)

    print("Tables in Document")
    for page in wrapped_document.pages:
        for table_index, table in enumerate(page.tables):
            # Convert table to Pandas Dataframe
            # Refer to https://pandas.pydata.org/docs/reference/frame.html for all supported methods
            df = table.to_dataframe()
            print(df)

            output_filename = f"{output_file_prefix}-{page.page_number}-{table_index}"

            # Write Dataframe to CSV file
            df.to_csv(f"{output_filename}.csv", index=False)

            # Write Dataframe to HTML file
            df.to_html(f"{output_filename}.html", index=False)

            # Write Dataframe to Markdown file (requires the optional `tabulate` package)
            df.to_markdown(f"{output_filename}.md", index=False)
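
If pandas isn't available, a similar CSV export can be sketched with the standard library. This minimal example assumes that the table content is already available as plain lists of strings, with header rows first. The rows_to_csv helper and the sample rows are hypothetical and are not part of the toolbox.

```python
import csv
import io
from typing import List


def rows_to_csv(header_rows: List[List[str]], body_rows: List[List[str]]) -> str:
    """Serialize header and body rows to CSV text, headers first."""
    buffer = io.StringIO()
    # The csv module quotes cells that contain commas or quotes.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(header_rows)
    writer.writerows(body_rows)
    return buffer.getvalue()


# Hypothetical table contents, for illustration only.
csv_text = rows_to_csv(
    header_rows=[["Item", "Price"]],
    body_rows=[["Tool A", "10"], ["Tool B, large", "25"]],
)
print(csv_text)
```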

BigQuery Export


from google.cloud.documentai_toolbox import document

# TODO(developer): Uncomment these variables before running the sample.
# Given a document.proto or sharded document.proto in path gs://bucket/path/to/folder
# gcs_bucket_name = "bucket"
# gcs_prefix = "path/to/folder"
# dataset_name = "test_dataset"
# table_name = "test_table"
# project_id = "YOUR_PROJECT_ID"


def entities_to_bigquery_sample(
    gcs_bucket_name: str,
    gcs_prefix: str,
    dataset_name: str,
    table_name: str,
    project_id: str,
) -> None:
    wrapped_document = document.Document.from_gcs(
        gcs_bucket_name=gcs_bucket_name, gcs_prefix=gcs_prefix
    )

    job = wrapped_document.entities_to_bigquery(
        dataset_name=dataset_name, table_name=table_name, project_id=project_id
    )

    # Also supported:
    # job = wrapped_document.form_fields_to_bigquery(
    #     dataset_name=dataset_name, table_name=table_name, project_id=project_id
    # )

    print("Document entities loaded into BigQuery")
    print(f"Job ID: {job.job_id}")
    print(f"Table: {job.destination.path}")

Split PDF


from google.cloud.documentai_toolbox import document

# TODO(developer): Uncomment these variables before running the sample.
# Given a local document.proto or sharded document.proto from a splitter/classifier in path
# document_path = "path/to/local/document.json"
# pdf_path = "path/to/local/document.pdf"
# output_path = "resources/output/"


def split_pdf_sample(document_path: str, pdf_path: str, output_path: str) -> None:
    wrapped_document = document.Document.from_document_path(document_path=document_path)

    output_files = wrapped_document.split_pdf(
        pdf_path=pdf_path, output_path=output_path
    )

    print("Document Successfully Split")
    for output_file in output_files:
        print(output_file)
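
To see which pages end up in each output file, you can inspect the page anchors of the entities that a splitter or classifier returns. The following minimal sketch reads them from a Document JSON dictionary. It assumes that each entity lists its pages under pageAnchor.pageRefs with zero-based indexes; the split_ranges helper and the sample data are hypothetical, and this is not the toolbox's own implementation.

```python
from typing import Any, Dict, List, Tuple


def split_ranges(doc: Dict[str, Any]) -> List[Tuple[str, int, int]]:
    """Return (entity type, first page index, last page index) per entity."""
    ranges: List[Tuple[str, int, int]] = []
    for entity in doc.get("entities", []):
        page_refs = entity.get("pageAnchor", {}).get("pageRefs", [])
        if not page_refs:
            # Entities without page references cannot define a split.
            continue
        # In JSON, page indexes are strings and an index of 0 is omitted.
        pages = [int(ref.get("page", 0)) for ref in page_refs]
        ranges.append((entity.get("type", ""), min(pages), max(pages)))
    return ranges


# Hypothetical splitter output, for illustration only.
SAMPLE_SPLIT = {
    "entities": [
        {"type": "invoice", "pageAnchor": {"pageRefs": [{}, {"page": "1"}]}},
        {"type": "receipt", "pageAnchor": {"pageRefs": [{"page": "2"}]}},
    ]
}

page_ranges = split_ranges(SAMPLE_SPLIT)
print(page_ranges)
```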

Image extraction


from google.cloud.documentai_toolbox import document

# TODO(developer): Uncomment these variables before running the sample.
# Given a local document.proto or sharded document.proto from an identity processor in path
# document_path = "path/to/local/document.json"
# output_path = "resources/output/"
# output_file_prefix = "exported_photo"
# output_file_extension = "png"


def export_images_sample(
    document_path: str,
    output_path: str,
    output_file_prefix: str,
    output_file_extension: str,
) -> None:
    wrapped_document = document.Document.from_document_path(document_path=document_path)

    output_files = wrapped_document.export_images(
        output_path=output_path,
        output_file_prefix=output_file_prefix,
        output_file_extension=output_file_extension,
    )
    print("Images Successfully Exported")
    for output_file in output_files:
        print(output_file)

Vision conversion


from google.cloud.documentai_toolbox import document

# TODO(developer): Uncomment these variables before running the sample.
# Given a document.proto or sharded document.proto in path gs://bucket/path/to/folder
# gcs_bucket_name = "bucket"
# gcs_prefix = "path/to/folder"


def convert_document_to_vision_sample(
    gcs_bucket_name: str,
    gcs_prefix: str,
) -> None:
    wrapped_document = document.Document.from_gcs(
        gcs_bucket_name=gcs_bucket_name, gcs_prefix=gcs_prefix
    )

    # Converting wrapped_document to vision AnnotateFileResponse
    annotate_file_response = (
        wrapped_document.convert_document_to_annotate_file_response()
    )

    print("Document converted to AnnotateFileResponse!")
    print(
        f"Number of Pages : {len(annotate_file_response.responses[0].full_text_annotation.pages)}"
    )

hOCR conversion


from google.cloud.documentai_toolbox import document

# TODO(developer): Uncomment these variables before running the sample.
# Given a local document.proto or sharded document.proto in path
# document_path = "path/to/local/document.json"
# document_title = "your-document-title"


def convert_document_to_hocr_sample(document_path: str, document_title: str) -> str:
    wrapped_document = document.Document.from_document_path(document_path=document_path)

    # Converting wrapped_document to hOCR format
    hocr_string = wrapped_document.export_hocr_str(title=document_title)

    print("Document converted to hOCR!")
    return hocr_string

Third-party conversion


from google.cloud.documentai_toolbox import converter

# TODO(developer): Uncomment these variables before running the sample.
# This sample will convert external annotations to the Document.json format used by Document AI Workbench for training.
# For the conversion to work, each external annotation must contain these types of objects:
#       1) Type
#       2) Text
#       3) Bounding Box (bounding boxes must be 1 of the 3 optional types)
#
# This is the bare minimum requirement to convert the annotations but for better accuracy you will need to also have:
#       1) Document width & height
#
# Bounding Box Types:
#   Type 1:
#       bounding_box:[{"x":1,"y":2},{"x":2,"y":2},{"x":2,"y":3},{"x":1,"y":3}]
#   Type 2:
#       bounding_box:{ "Width": 1, "Height": 1, "Left": 1, "Top": 1}
#   Type 3:
#       bounding_box: [1,2,2,2,2,3,1,3]
#
#   Note: If these types are not sufficient you can propose a feature request or contribute the new type and conversion functionality.
#
# Given a folder at gcs_input_path with the following structure:
#
# gs://path/to/input/folder
#   ├──test_annotations.json
#   ├──test_config.json
#   └──test.pdf
#
# An example of the config is in sample-converter-configs/Azure/form-config.json
#
# project_id = "YOUR_PROJECT_ID"
# location = "us"
# processor_id = "my_processor_id"
# gcs_input_path = "gs://path/to/input/folder"
# gcs_output_path = "gs://path/to/input/folder"


def convert_external_annotations_sample(
    location: str,
    processor_id: str,
    project_id: str,
    gcs_input_path: str,
    gcs_output_path: str,
) -> None:
    converter.convert_from_config(
        project_id=project_id,
        location=location,
        processor_id=processor_id,
        gcs_input_path=gcs_input_path,
        gcs_output_path=gcs_output_path,
    )
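
The three bounding box types listed in the sample's comments describe the same kind of rectangle in different shapes. The following minimal sketch normalizes them into one list of (x, y) vertices. It assumes that Type 2 measures Left and Top from the top-left corner and that Type 3 interleaves x and y values; the to_vertices helper is hypothetical and is not part of the toolbox's converter.

```python
from typing import Any, List, Tuple

Vertex = Tuple[float, float]


def to_vertices(box: Any) -> List[Vertex]:
    """Normalize one of the three bounding box types to a list of vertices."""
    if isinstance(box, dict):
        # Type 2: width, height, left, and top.
        left, top = box["Left"], box["Top"]
        right, bottom = left + box["Width"], top + box["Height"]
        return [(left, top), (right, top), (right, bottom), (left, bottom)]
    if box and isinstance(box[0], dict):
        # Type 1: a list of {"x": ..., "y": ...} vertices.
        return [(vertex["x"], vertex["y"]) for vertex in box]
    # Type 3: a flat list, assumed to be x1, y1, x2, y2, and so on.
    if len(box) % 2 != 0:
        raise ValueError("A flat bounding box needs an even number of values.")
    return [(box[i], box[i + 1]) for i in range(0, len(box), 2)]


# The example values from the comments in the sample above.
type_1 = to_vertices([{"x": 1, "y": 2}, {"x": 2, "y": 2}, {"x": 2, "y": 3}, {"x": 1, "y": 3}])
type_2 = to_vertices({"Width": 1, "Height": 1, "Left": 1, "Top": 1})
type_3 = to_vertices([1, 2, 2, 2, 2, 3, 1, 3])
print(type_1, type_2, type_3)
```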

Batch documents


from google.cloud import documentai
from google.cloud.documentai_toolbox import gcs_utilities

# TODO(developer): Uncomment these variables before running the sample.
# Given unprocessed documents in path gs://bucket/path/to/folder
# gcs_bucket_name = "bucket"
# gcs_prefix = "path/to/folder"
# batch_size = 50


def create_batches_sample(
    gcs_bucket_name: str,
    gcs_prefix: str,
    batch_size: int = 50,
) -> None:
    # Creating batches of documents for processing
    batches = gcs_utilities.create_batches(
        gcs_bucket_name=gcs_bucket_name, gcs_prefix=gcs_prefix, batch_size=batch_size
    )

    print(f"{len(batches)} batch(es) created.")
    for batch in batches:
        print(f"{len(batch.gcs_documents.documents)} files in batch.")
        print(batch.gcs_documents.documents)

        # Use as input for batch_process_documents()
        # Refer to https://cloud.google.com/document-ai/docs/send-request
        # for how to send a batch processing request
        request = documentai.BatchProcessRequest(
            name="processor_name", input_documents=batch
        )
        print(request)
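
The effect of batch_size can be illustrated without Cloud Storage. The following minimal sketch groups a list of file URIs into batches of at most batch_size items. The chunk helper and the URIs are hypothetical; the sketch shows only the grouping idea, not how create_batches() is implemented.

```python
from typing import List, Sequence


def chunk(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(items[start : start + batch_size])
        for start in range(0, len(items), batch_size)
    ]


# Hypothetical input: 120 files in one folder.
uris = [f"gs://bucket/path/to/folder/doc-{n}.pdf" for n in range(120)]
batch_sizes = [len(batch) for batch in chunk(uris, 50)]
print(batch_sizes)  # [50, 50, 20]
```

With 120 files and a batch size of 50, the last batch holds the remaining 20 files, so three batch processing requests are needed.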

Merge document shards


from google.cloud import documentai
from google.cloud.documentai_toolbox import document

# TODO(developer): Uncomment these variables before running the sample.
# Given a document.proto or sharded document.proto in path gs://bucket/path/to/folder
# gcs_bucket_name = "bucket"
# gcs_prefix = "path/to/folder"
# output_file_name = "path/to/folder/file.json"


def merge_document_shards_sample(
    gcs_bucket_name: str, gcs_prefix: str, output_file_name: str
) -> None:
    wrapped_document = document.Document.from_gcs(
        gcs_bucket_name=gcs_bucket_name, gcs_prefix=gcs_prefix
    )

    merged_document = wrapped_document.to_merged_documentai_document()

    with open(output_file_name, "w") as f:
        f.write(documentai.Document.to_json(merged_document))

    print(f"Document with {len(wrapped_document.shards)} shards successfully merged.")