Google uses AI technology to translate content into your preferred language. AI translations can contain errors.

布局解析器快速入门

使用布局解析器从文档中提取元素，例如文本、表格和列表。

如需在 Google Cloud 控制台中直接遵循有关此任务的分步指导，请点击操作演示：

操作演示

准备工作

登录您的 Google Cloud 账号。如果您是 Google Cloud新手，请创建一个账号来评估我们的产品在实际场景中的表现。新客户还可获享 $300 赠金，用于运行、测试和部署工作负载。

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Document AI, Cloud Storage APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Document AI, Cloud Storage APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

创建处理器

在 Google Cloud 控制台的 Document AI 部分，选择处理器库。

处理器库
在处理器库中，搜索布局解析器，然后选择创建。
在侧边窗口中，输入处理器名称，例如 quickstart-layout-processor。
选择离您最近的区域。
点击创建。

系统会将您转到新表单解析器处理器的处理器详情页面。
可选：点击管理版本，然后从版本表中选择一个处理器，以选择默认处理器。接下来，点击标记为默认，并通过输入处理器名称进行确认。

测试处理器

创建处理器后，您可以向该处理器发送注解请求。

下载示例文档。
点击上传测试文档按钮，然后选择您刚刚下载的文档。
您现在看到的应该是布局解析器分析页面。您可以查看从文档中解析出的块，这些块按检测到的类型进行整理。

注意：文档视图仅适用于 PDF 文件类型。
可选：选择 修改布局配置 以启用图片或表格注解数据。

处理文档

REST

此示例展示了如何将存储在 Cloud Storage 中的文档发送到布局解析器进行处理。此流程默认启用图片和表格注解。

REST

在使用任何请求数据之前，请先进行以下替换：

PROJECT_ID：您的 Google Cloud 项目 ID。
LOCATION：处理器的位置，例如：
- us - 美国
- eu - 欧盟
PROCESSOR_ID：自定义处理器的 ID。
MIME_TYPE：布局解析器支持 application/pdf 和 text/html。
GCS_FILE_PATH：包含文档的 Cloud Storage 存储桶的文件路径。
CHUNK_SIZE：可选。拆分文档时使用的块大小（以 token 为单位）。
INCLUDE_ANCESTOR_HEADINGS：可选。布尔值。在拆分文档时是否包含祖先标题。

HTTP 方法和网址：

POST https://LOCATION-documentai.googleapis.com/v1beta3/projects/PROJECT_ID/locations/LOCATION/processors/PROCESSOR_ID/processorVersions/pretrained-layout-parser-v1.5-2025-08-25:process

请求 JSON 正文：

{
  "gcsDocument": {
    "gcsUri": "GCS_FILE_PATH",
    "mimeType": "MIME_TYPE"
  },
  "processOptions": {
    "layoutConfig": {
      "enableTableAnnotation": "true",
      "enableImageAnnotation": "true",
      "chunkingConfig": {
        "chunkSize": "CHUNK_SIZE",
        "includeAncestorHeadings": "INCLUDE_ANCESTOR_HEADINGS",
      }
    }
  }
}

如需发送请求，请选择以下方式之一：

curl

注意：以下命令假定您已使用您的用户账号通过运行 gcloud init 或 gcloud auth login 登录 gcloud CLI，或者使用了 Cloud Shell，这会使您自动登录 gcloud CLI。您可以运行 gcloud auth list 来检查当前活跃的账号。

将请求正文保存在名为 request.json 的文件中，然后执行以下命令：

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-documentai.googleapis.com/v1beta3/projects/PROJECT_ID/locations/LOCATION/processors/PROCESSOR_ID/processorVersions/pretrained-layout-parser-v1.5-2025-08-25:process"

PowerShell

注意：以下命令假定您已使用您的用户账号通过运行 gcloud init 或 gcloud auth login 登录 gcloud CLI。您可以运行 gcloud auth list 来检查当前活跃的账号。

将请求正文保存在名为 request.json 的文件中，然后执行以下命令：

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-documentai.googleapis.com/v1beta3/projects/PROJECT_ID/locations/LOCATION/processors/PROCESSOR_ID/processorVersions/pretrained-layout-parser-v1.5-2025-08-25:process" | Select-Object -Expand Content

您应该会收到一个成功的状态代码 (2xx) 和一个空响应。

查看输出

如果请求成功，则会返回 JSON 格式的文档对象。对于检索增强生成 (RAG)，最重要的字段是 document.chunked_document.chunks。

以下是解析 A.A. 米尔恩的《小熊维尼》第三页后的输出。

{
  "document": {
  document_layout {
    blocks {
      block_id: "1"
      text_block {
        text: "WE ARE INTRODUCED 3"
        type_: "header"
      }
      page_span {
        page_start: 1
        page_end: 1
      }
    }
    blocks {
      block_id: "2"
      page_span {
        page_start: 1
        page_end: 1
      }
      image_block {
        mime_type: "image/png"
        annotations {
          description: "This is an ink drawing depicting Winnie-the-Pooh sitting outside his house.\n\nHere are the facts and conclusions that can be derived from the image:\n\n*   **Character:** The central figure is a bear, identifiable as Winnie-the-Pooh, sitting on a log.\n*   **Location:** He is positioned outside what appears to be a small, rustic shelter or house.\n*   **Signage:** Above the doorway of the shelter, there is a sign that reads \"MR SANDERZ\". Below this sign, there is another partial sign visible, where the letters \"RNIG\" and \"ALSO\" can be seen.\n*   **Doorbell:** To the left of the doorway, a bell is hanging, indicating a doorbell mechanism.\n*   **Setting:** The dwelling is surrounded by what looks like brush, trees, and general wilderness, suggested by the lines representing foliage and twigs.\n*   **Log:** Pooh is seated on a cut log or tree trunk. To the left of this log, there are other smaller logs or branches piled up.\n*   **Style:** The image is a black and white line drawing, characteristic of classic book illustrations."
        }
        blob_asset_id: "blob_1"
      }
    }
    blocks {
      block_id: "3"
      text_block {
        text: ""Winnie-the-Pooh wasn't quite sure," said Christopher Robin. "Now I am," said a growly voice. "Then I will go on,"said I.) One day when he was out walking, he came to an open place in the middle of the forest, and in the middle of this place was a large oak-tree, and, from the top of the tree, there came a loud buzzing-noise. Winnie-the-Pooh sat down at the foot of the tree,put his head between his paws and began to think."
        type_: "paragraph"
      }
      page_span {
        page_start: 1
        page_end: 1
      }
    }
    blocks {
      block_id: "4"
      text_block {
        text: "Digitized by Google"
        type_: "footer"
      }
      page_span {
        page_start: 1
        page_end: 1
      }
    }
  }
  chunked_document {
    chunks {
      chunk_id: "c1"
      source_block_ids: "2"
      source_block_ids: "3"
      content: "__START_OF_ANNOTATION__This is an ink drawing depicting Winnie-the-Pooh sitting outside his house.\n\nHere are the facts and conclusions that can be derived from the image:\n\n*   **Character:** The central figure is a bear, identifiable as Winnie-the-Pooh, sitting on a log.\n*   **Location:** He is positioned outside what appears to be a small, rustic shelter or house.\n*   **Signage:** Above the doorway of the shelter, there is a sign that reads \"MR SANDERZ\". Below this sign, there is another partial sign visible, where the letters \"RNIG\" and \"ALSO\" can be seen.\n*   **Doorbell:** To the left of the doorway, a bell is hanging, indicating a doorbell mechanism.\n*   **Setting:** The dwelling is surrounded by what looks like brush, trees, and general wilderness, suggested by the lines representing foliage and twigs.\n*   **Log:** Pooh is seated on a cut log or tree trunk. To the left of this log, there are other smaller logs or branches piled up.\n*   **Style:** The image is a black and white line drawing, characteristic of classic book illustrations.__END_OF_ANNOTATION__"Winnie-the-Pooh wasn't quite sure," said Christopher Robin. "Now I am," said a growly voice. "Then I will go on," said I.) One day when he was out walking, he came to an open place in the middle of the forest, and in the middle of this place was a large oak-tree, and, from the top of the tree, there came a loud buzzing-noise. Winnie-the-Pooh sat down at the foot of the tree,put his head between his paws and began to think."
      page_span {
        page_start: 1
        page_end: 1
      }
      page_headers {
        text: "WE ARE INTRODUCED 3"
        page_span {
          page_start: 1
          page_end: 1
        }
      }
      page_footers {
        text: "Digitized by Google"
        page_span {
          page_start: 1
          page_end: 1
        }
      }
      chunk_fields {
        image_chunk_field {
          blob_asset_id: "blob_1"
          annotations {
            description: "This is an ink drawing depicting Winnie-the-Pooh sitting outside his house.\n\nHere are the facts and conclusions that can be derived from the image:\n\n*   **Character:** The central figure is a bear, identifiable as Winnie-the-Pooh, sitting on a log.\n*   **Location:** He is positioned outside what appears to be a small, rustic shelter or house.\n*   **Signage:** Above the doorway of the shelter, there is a sign that reads \"MR SANDERZ\". Below this sign, there is another partial sign visible, where the letters \"RNIG\" and \"ALSO\" can be seen.\n*   **Doorbell:** To the left of the doorway, a bell is hanging, indicating a doorbell mechanism.\n*   **Setting:** The dwelling is surrounded by what looks like brush, trees, and general wilderness, suggested by the lines representing foliage and twigs.\n*   **Log:** Pooh is seated on a cut log or tree trunk. To the left of this log, there are other smaller logs or branches piled up.\n*   **Style:** The image is a black and white line drawing, characteristic of classic book illustrations."
          }
        }
      }
    }
  }
  blob_assets {
    asset_id: "blob_1"
    content: "image_bytes"
    mime_type: "image/png"
  }
}

Python

本指南介绍了如何使用 Python 客户端库处理文档。使用此代码可默认启用图片和表格注解。

安装客户端库。

! pip install --upgrade --quiet google-cloud-documentai

运行处理器。


def process_layout_parser(
    project_id: str, location: str, processor_id: str, gcs_uri: str, mime_type: str
):
    """
    Processes a document with the layout parser and prints chunk text.
    """
    client = documentai.DocumentProcessorServiceClient()

    # The full resource name of the processor
    processor_version_id = 'pretrained-layout-parser-v1.5-2025-08-25'
    name = client.processor_path(project_id, location, processor_id, processor_version_id)

    # Configure the Cloud Storage document
    gcs_document = documentai.GcsDocument(gcs_uri=gcs_uri, mime_type=mime_type)

    # Configure processing options for RAG
    # This enables annotation and context-aware chunking.
    process_options = documentai.ProcessOptions(
      # Process only specific pages
      layout_config=documentai.ProcessOptions.LayoutConfig(
          enable_table_annotation=True,
          enable_image_annotation=True,
          chunking_config=documentai.ProcessOptions.LayoutConfig.ChunkingConfig(
        chunk_size=1024,
        include_ancestor_headings=True,
          ),
      ),
  )

    # Build the request
    request = documentai.ProcessRequest(
        name=name,
        gcs_document=gcs_document,
        process_options=process_options,
    )

    # Process the document
    result = client.process_document(request=request)
    document = result.document

    print(f"Document processing complete.\n")

    print("--- RAG-Ready Chunks (with context) ---")
    for i, chunk in enumerate(document.chunked_document.chunks):
        print(f"\n--- Chunk {i} ---")

        # Print the chunk's content
        print(f"Text: {chunk.content}")
    return result

使用布局解析器批处理文档

使用以下过程在单个请求中解析多个文档并将其分块。

向布局解析器输入文档以进行解析和分块。
请按照发送处理请求中的批处理请求说明操作。
在发出 batchProcess 请求时，在 ProcessOptions.layoutConfig 中配置字段。
输入
以下 JSON 示例配置了 ProcessOptions.layoutConfig。
"processOptions": { "layoutConfig": { "enableTableAnnotation": "true", "enableImageAnnotation": "true", "chunkingConfig": { "chunkSize": "CHUNK_SIZE", "includeAncestorHeadings": "INCLUDE_ANCESTOR_HEADINGS_BOOLEAN" } } }
替换以下内容：
- CHUNK_SIZE：拆分文档时使用的最大块大小（以令牌数量表示）。
- INCLUDE_ANCESTOR_HEADINGS_BOOLEAN：在拆分文档时是否包含祖先标题。祖先标题是原始文档中子标题的父级。它们可以提供一个块，其中包含有关其在原始文档中的位置的其他上下文。一个块最多可包含两个级别的标题。

清理

为避免因本页中使用的资源导致您的 Google Cloud 账号产生费用，请按照以下步骤操作。

为避免产生不必要的 Google Cloud 费用，请使用Google Cloud console 删除您不需要的处理器和项目。

如果您为了解 Document AI 创建了一个新项目，但现在不再需要该项目，[请删除项目][delete-project]。

如果您使用的是现有 Google Cloud 项目，请删除您创建的资源，以避免您的账号产生费用：

在 Google Cloud 控制台导航菜单中，依次选择 Document AI 和我的处理器。
选择要删除的处理器所在行中的更多操作。
选择删除处理器，输入处理器名称，然后再次选择删除进行确认。

后续步骤

如需了解详情，请参阅指南。

Gemini 布局解析器

Enterprise Document OCR

布局解析器快速入门

准备工作

创建处理器

测试处理器

处理文档

REST

REST

curl

PowerShell

查看输出

Python

使用布局解析器批处理文档

输入

清理

后续步骤