总结内容

ai.summarize 函数利用 Gemini 等大语言模型 (LLM) 将输入文本改写并重组为最精简的形式。此函数可捕获文档的核心论点,同时保留原始作者的语气和细微差别。借助此函数,您可以从海量数据集中提取关键发现和结论,而无需手动审核,并且可以通过在提示中指定字数或句子限制等约束条件来控制输出长度。您还可以使用批处理或基于游标的快照来异步处理数百万行数据。

ai.summarize

AlloyDB AI 内容摘要功能支持各种使用场景,包括但不限于以下场景:

  • 会议转写:总结对话文本,重点关注决策和待办事项。
  • 技术文档:将复杂的方法、结果和影响浓缩为概要介绍。
  • 客户支持日志:将多条用户评价或支持工单汇总为关于常见问题的统一摘要。

准备工作

在使用 ai.summarize 函数之前,请确保满足以下要求。

启用扩展程序

确保您已安装最新版本的 google_ml_integration.enable_preview_ai_functions 扩展程序(版本 1.5.7 或更高版本),并且已启用预览版功能。

如需在 AlloyDB 中启用 google_ml_integration.enable_preview_ai_functions 标志,请使用 SET 命令。此标志控制对 ai.summarize 等预览版 AI 函数的访问权限。

  1. 确保您的 google_ml_integration extension 为 1.5.7 或更高版本。 您可以通过运行以下命令来查看版本:

    SELECT extversion FROM pg_extension WHERE extname = 'google_ml_integration';
    

    如果您需要升级到包含这些预览版函数的版本,请调用以下命令:

    CALL google_ml.upgrade_to_preview_version();
    
  2. 为当前会话或整个数据库启用标志。如需为当前会话启用标志,请执行以下命令:

    SET google_ml_integration.enable_preview_ai_functions = 'on';
    

    此更改不需要重启数据库。此标志的默认值为 off

创建示例表

如需按照本文档中的摘要函数示例进行操作,请创建一个表并使用以下电影评价填充该表。

CREATE TABLE movie_reviews (
    id INT PRIMARY KEY,
    movie_id INT,
    review TEXT
);

INSERT INTO movie_reviews (id, movie_id, review) VALUES
(1, 1, 'The Midnight Echo represents a masterclass in atmospheric tension. The cinematography captures the isolation of the frozen tundra perfectly while the lead actor delivers a superb performance. Although the pacing slows down significantly in the second act, the final twist remains unexpected. This is a truly remarkable experience.'),
(2, 1, 'Neon Velocity delivers high octane action but lacks the narrative depth required to make the stakes feel real. The visual effects are undeniably stunning, creating a vibrant cyberpunk world that pops off the screen. However, the dialogue feels cliched and the secondary characters are thin. This film is a shallow blockbuster.'),

-- more rows are present in this table

总结单个字符串

如需处理单个文本块,请使用 ai.summarize 的标量版本。

SELECT ai.summarize(
  prompt => 'TEXT_CONTENT',
  model_id => 'MODEL_ID' -- Optional
);

运行以下示例查询,以提供电影评价的简洁摘要:

SELECT ai.summarize(review) FROM movie_reviews;

以下是示例输出:

 id |                         summary
----+-----------------------------------------------------------
  1 | "Atmospheric thriller with a haunting performance and shocking twist."
  2 | "Visually stunning cyberpunk action film lacks narrative depth."
...

批量总结

如需高效处理多条记录,请使用基于数组的版本。

SELECT ai.summarize(
  prompts => TEXT[],
  batch_size => INT, -- Optional. The default is 10.
  model_id => VARCHAR(100) -- Optional. The default value is gemini-2.5-flash-lite.
);

以下示例展示了如何使用 ai.summarize 函数的基于数组的版本,以便在一次调用中高效处理多条文本记录。

WITH summarized_results AS (
SELECT
  ARRAY_AGG(id ORDER BY id) as ids,
  ai.summarize(
    prompts => array_agg( 'Please summarize this in max 10 words, review : ' || review
      ORDER BY id),
    batch_size => 15) as summaries
FROM movie_reviews
),
correlated_results AS (
    SELECT ids[i] as ID, summaries[i] as summary
    FROM summarized_results,
    generate_series(1, array_length(ids, 1)) AS i
)
SELECT movie_reviews.id, correlated_results.summary as summary
FROM movie_reviews
JOIN correlated_results ON movie_reviews.id = correlated_results.id
ORDER BY movie_reviews.id DESC ;

以下是示例输出:

 id | summary
---+----------------------------------------------------------------------------
  1 | "Masterclass in tension, haunting performance with a twist."
  2 | "High octane action, stunning visuals, shallow blockbuster."
  3 | "Gentle, moving exploration of  emotional honesty."

使用游标进行总结

您可以使用游标高效处理可能包含大量评价的数据集,将这些评价馈送到 AI 函数中,然后逐个处理结果,将其存储回数据库中。

以下示例展示了如何将游标与 ai.summarize 函数搭配使用,以便高效处理 movie_reviews 表中的行,为每条评价生成摘要,并将这些摘要存储在名为 review_summaries 的新表中。这种基于游标的方法对于处理大型数据集非常有用,这些数据集可能太大,无法在单个批次中处理或一次性加载到内存中。

input_cursor 参数接受 REFCURSOR。这意味着您需要提供游标名称,该名称就像是指向 SQL 查询结果的指针。然后,ai.summarize 函数会从此游标中提取数据,以用作摘要的输入。

CREATE OR REPLACE FUNCTION ai.summarize(
  prompt TEXT,
  input_cursor REFCURSOR,
  batch_size INT DEFAULT NULL,
  model_id VARCHAR(100) DEFAULT NULL)
RETURNS REFCURSOR

以下示例会创建一个表 review_summaries(如果需要),然后遍历电影评价,使用 AI 函数为每条评价生成简洁摘要,并将摘要与其原始评价 ID 相关联后存储在该表中。

-- Create a table to store the results
CREATE TABLE IF NOT EXISTS review_summaries (
    review_id INT, 
    summary_text TEXT
);

DO $$
DECLARE
    -- Use descriptive cursor and variable names
    review_cursor   REFCURSOR;
    result_record   RECORD;
    cursor_response  REFCURSOR;
    id_array        INT[];
    idx             INT := 1; 
BEGIN
    -- 1. Open cursor for the input text
    OPEN review_cursor FOR 
        SELECT review AS prompt FROM movie_reviews ORDER BY id;

    -- 2. Call the AI summarize function
    cursor_response := ai.summarize(
        prompt       => 'Please summarize the following review in max 10 words: ',
        input_cursor => review_cursor
    );

    -- 3. Map IDs into an array to maintain relational integrity during the loop
    SELECT ARRAY_AGG(id ORDER BY id) INTO id_array FROM movie_reviews;

    -- 4. Iterate through AI results and insert into the results table
    LOOP
        FETCH cursor_response INTO result_record;
        EXIT WHEN NOT FOUND;

        INSERT INTO review_summaries (review_id, summary_text)
        VALUES (id_array[idx], result_record.output);

        idx := idx + 1;
    END LOOP;

    -- 5. Clean up cursors
    CLOSE review_cursor;
    CLOSE cursor_response;
END;
$$;

-- Verify results
SELECT * FROM review_summaries;

以下是示例输出:

 review_id | summary_text
-----------+---------------------------------------------------------------------------
         1 | "Masterclass in atmospheric tension with haunting performance, devastating twist."
         2 | "High octane action, stunning visuals, but shallow blockbuster."
         3 | "Gentle, moving exploration of grief with poetic screenplay."

跨行汇总摘要

ai.agg_summarize 函数可跨列中的多行运行。它将值汇总到统一的提示中,以便为整个组生成单个摘要。

以下 SQL 查询使用 ai.agg_summarize 聚合函数,根据 movie_reviews 表中的所有评价,为每个 movie_id 生成单个统一的摘要。

select ai.agg_summarize(review) from movie_reviews group by movie_id;

与单独总结每一行的 ai.summarize 不同,ai.agg_summarize 会将多行中的文本合并为单个输入,以便为整个行组生成一个整合的摘要。

以下是示例输出:

agg_summarize
"The Midnight Echo is a masterclass in atmospheric tension, featuring haunting performances and stunning cinematography, though its pacing falters in the second act before a final twist. Neon Velocity offers stunning visual effects and high-octane action within a vibrant cyberpunk world, but suffers from shallow narrative depth, cliched dialogue, and underdeveloped characters. Garden of Whispers is a gentle, moving, and emotionally honest exploration of and life, characterized by poetic screenwriting and natural performances. Shadow Protocol ambitiously attempts to reinvent the spy thriller with a non-linear narrative and precise action, but ultimately confuses the audience and presents a derivative central mystery. The Last Alchemist uniquely blends historical drama with subtle fantasy, boasting meticulous production design and electric chemistry between its protagonists."

后续步骤