Migrate to BigQuery DataFrames version 2.0

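BigQuery DataFrames version 2.0 improves the security and performance of the BigQuery DataFrames API, adds new features, and introduces breaking changes. This document describes these changes and provides migration guidance. You can apply these recommendations with the latest 1.x version of BigQuery DataFrames before you install version 2.0.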

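BigQuery DataFrames version 2.0 has the following benefits:

Queries that return results to the client run faster and create fewer tables, because allow_large_results defaults to False. This can reduce storage costs, especially if you use physical bytes billing.
Remote functions that BigQuery DataFrames deploys are more secure by default.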

Install BigQuery DataFrames version 2.0

To avoid breaking changes, pin BigQuery DataFrames to a specific version in your requirements.txt file (for example, bigframes==1.42.0) or your pyproject.toml file (for example, dependencies = ["bigframes==1.42.0"]). When you're ready to try the latest version, run pip install --upgrade bigframes to install it.
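Before relying on any 2.0 default, it can help to confirm which major version an environment actually has installed. The following is a minimal sketch that uses only the Python standard library; the helper names major_version and installed_bigframes_major are hypothetical, and the parsing assumes the first dot-separated component of the version string is the major version.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the major component of a version string such as "1.42.0"."""
    return int(version_string.split(".", 1)[0])


def installed_bigframes_major() -> Optional[int]:
    """Return the installed bigframes major version, or None if it is not installed."""
    try:
        return major_version(version("bigframes"))
    except PackageNotFoundError:
        return None


major = installed_bigframes_major()
if major is None:
    print("bigframes is not installed")
elif major < 2:
    print(f"bigframes {major}.x is installed; the 2.0 defaults do not apply yet")
else:
    print(f"bigframes {major}.x is installed; version 2.0 or later is in use")
```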

Use the `allow_large_results` option

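BigQuery has a maximum response size limit for query jobs. Starting in BigQuery DataFrames version 2.0, BigQuery DataFrames enforces this limit by default in methods that return results to the client, such as peek(), to_pandas(), and to_pandas_batches(). If your job returns large results, you can set allow_large_results to True in your BigQueryOptions object to avoid breaking changes. In BigQuery DataFrames version 2.0, this option defaults to False.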

import bigframes.pandas as bpd

bpd.options.bigquery.allow_large_results = True

You can override the allow_large_results option for a single call by using the allow_large_results parameter of to_pandas() and other methods. For example:

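import bigframes.pandas as bpd

query = "SELECT name, number FROM `bigquery-public-data.usa_names.usa_1910_2013`"
bf_df = bpd.read_gbq(query)
# ... other operations on bf_df ...
pandas_df = bf_df.to_pandas(allow_large_results=True)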
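The precedence between the global option and the per-call argument can be modeled in a few lines. This is an illustrative model of the documented behavior, not BigQuery DataFrames' implementation; the function name effective_allow_large_results is hypothetical, and the sketch assumes that omitting the per-call argument (None) means "use the global option".

```python
from typing import Optional

# Global default of bpd.options.bigquery.allow_large_results in version 2.0.
DEFAULT_ALLOW_LARGE_RESULTS = False


def effective_allow_large_results(
    per_call: Optional[bool] = None,
    global_option: bool = DEFAULT_ALLOW_LARGE_RESULTS,
) -> bool:
    """Model of the documented precedence: an explicit per-call argument,
    such as to_pandas(allow_large_results=True), wins over the global option."""
    if per_call is None:
        return global_option
    return per_call
```

Treating None as "not specified" is what allows an explicit False on one call to override a global True, and an explicit True to override the 2.0 default of False.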

Use the `@remote_function` decorator

BigQuery DataFrames version 2.0 changes some of the default behavior of the @remote_function decorator.

Keyword arguments are enforced for ambiguous parameters

To prevent values from being passed to an unintended parameter, BigQuery DataFrames version 2.0 and later require keyword arguments for the following parameters:

bigquery_connection
reuse
name
packages
cloud_function_service_account
cloud_function_kms_key_name
cloud_function_docker_repository
max_batching_rows
cloud_function_timeout
cloud_function_max_instances
cloud_function_vpc_connector
cloud_function_memory_mib
cloud_function_ingress_settings

When you use these parameters, pass them by name. For example:

from bigframes.pandas import remote_function

@remote_function(
  name="my_remote_function",
  # ... other keyword arguments ...
)
def my_remote_function(parameter: int) -> str:
  return str(parameter)
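Python enforces keyword arguments with a bare * in a function signature: every parameter after it can only be passed by name. The sketch below uses a hypothetical stand-in, deploy_function, rather than the real remote_function signature, to show the kind of mistake this prevents: a call that passes an option positionally fails immediately with a TypeError instead of silently binding the value to whichever parameter happens to occupy that position.

```python
def deploy_function(func=None, *, name=None, reuse=True, max_batching_rows=None):
    """Hypothetical stand-in: every parameter after the bare * is keyword-only."""
    return {
        "func": func,
        "name": name,
        "reuse": reuse,
        "max_batching_rows": max_batching_rows,
    }


def to_str(parameter: int) -> str:
    return str(parameter)


# Accepted: options are passed by name.
config = deploy_function(to_str, name="my_remote_function", reuse=False)

# Rejected: passing the name positionally raises TypeError.
try:
    deploy_function(to_str, "my_remote_function")
except TypeError as exc:
    print(f"TypeError: {exc}")
```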

Set the service account

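As of version 2.0, BigQuery DataFrames no longer uses the Compute Engine service account by default when it deploys Cloud Run functions. To limit the permissions of the function that you deploy, do the following:

Create a service account with minimal permissions.
Pass the service account email to the cloud_function_service_account parameter of the @remote_function decorator.

For example: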

@remote_function(
  cloud_function_service_account="my-service-account@my-project.iam.gserviceaccount.com",
  ...
)
def my_remote_function(parameter: int) -> str:
  return str(parameter)

If you want to use the Compute Engine service account, set the cloud_function_service_account parameter of the @remote_function decorator to "default". For example:

# This usage is discouraged. Use only if you have a specific reason to use the
# default Compute Engine service account.
@remote_function(
  cloud_function_service_account="default",
  # ... other keyword arguments ...
)
def my_remote_function(parameter: int) -> str:
  return str(parameter)
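To keep the service account email out of source code and to catch an accidental fallback to the Compute Engine account, you might resolve the value from configuration before passing it to cloud_function_service_account. This is a sketch under stated assumptions: the environment variable name BIGFRAMES_FUNCTION_SERVICE_ACCOUNT and the helper resolve_function_service_account are hypothetical, and the helper is deliberately stricter than BigQuery DataFrames, rejecting both "default" and addresses of the form PROJECT_NUMBER-compute@developer.gserviceaccount.com.

```python
import os

# Compute Engine default service accounts look like
# PROJECT_NUMBER-compute@developer.gserviceaccount.com.
COMPUTE_ENGINE_DEFAULT_SUFFIX = "-compute@developer.gserviceaccount.com"


def resolve_function_service_account(
    env_var: str = "BIGFRAMES_FUNCTION_SERVICE_ACCOUNT",  # hypothetical variable name
) -> str:
    """Return an email to pass as cloud_function_service_account.

    Stricter than BigQuery DataFrames itself: rejects "default" and the
    Compute Engine default account so that fallback can't happen by accident.
    """
    email = os.environ.get(env_var, "").strip()
    if not email:
        raise ValueError(f"Set {env_var} to the email of a least-privilege service account")
    if email == "default" or email.endswith(COMPUTE_ENGINE_DEFAULT_SUFFIX):
        raise ValueError("Refusing to deploy with the Compute Engine default service account")
    return email
```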

Set the ingress settings

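As of version 2.0, BigQuery DataFrames sets the ingress settings of the Cloud Run functions it deploys to "internal-only". Previously, the ingress settings defaulted to "all". You can change the ingress settings with the cloud_function_ingress_settings parameter of the @remote_function decorator. For example: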

@remote_function(
  cloud_function_ingress_settings="internal-and-gclb",
  # ... other keyword arguments ...
)
def my_remote_function(parameter: int) -> str:
  return str(parameter)
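Because both the service account and ingress defaults changed in 2.0, one option is to pin them explicitly in a single place so that your decorators don't depend on the defaults. The helper remote_function_options is hypothetical; it assumes that the three ingress values named in this guide ("all", "internal-only", and "internal-and-gclb") are the accepted ones, and it builds a keyword-argument dictionary to unpack into @remote_function.

```python
from typing import Any, Dict

# Ingress values named in this guide; assumed to be the accepted set.
INGRESS_SETTINGS = frozenset({"all", "internal-only", "internal-and-gclb"})


def remote_function_options(
    service_account: str, ingress: str = "internal-only", **extra: Any
) -> Dict[str, Any]:
    """Build @remote_function keyword arguments with the 2.0-affected settings pinned."""
    if ingress not in INGRESS_SETTINGS:
        raise ValueError(f"Unsupported ingress setting: {ingress!r}")
    # Pinned values come last so entries in **extra cannot override them.
    return {
        **extra,
        "cloud_function_service_account": service_account,
        "cloud_function_ingress_settings": ingress,
    }
```

You could then write @remote_function(**remote_function_options("my-service-account@my-project.iam.gserviceaccount.com", name="my_remote_function")).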

Use a custom endpoint

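In BigQuery DataFrames versions earlier than 2.0, if a region didn't support regional service endpoints and bigframes.pandas.options.bigquery.use_regional_endpoints = True, BigQuery DataFrames fell back to locational endpoints. BigQuery DataFrames version 2.0 removes this fallback behavior. To connect to locational endpoints in version 2.0, set the bigframes.pandas.options.bigquery.client_endpoints_override option. For example: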

import bigframes.pandas as bpd

bpd.options.bigquery.client_endpoints_override = {
  "bqclient": "https://LOCATION-bigquery.googleapis.com",
  "bqconnectionclient": "LOCATION-bigqueryconnection.googleapis.com",
  "bqstoragereadclient": "LOCATION-bigquerystorage.googleapis.com",
}

Replace LOCATION with the name of the BigQuery location that you want to connect to.
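If you connect to more than one location, you can build the override mapping from the template above instead of editing the strings by hand. The helper locational_endpoints is hypothetical; it only substitutes the location into the three endpoint strings from the template and does not check whether a locational endpoint exists for that location.

```python
from typing import Dict


def locational_endpoints(location: str) -> Dict[str, str]:
    """Fill a BigQuery location into the client_endpoints_override template."""
    loc = location.strip().lower()
    if not loc:
        raise ValueError("location must be a non-empty BigQuery location name")
    return {
        "bqclient": f"https://{loc}-bigquery.googleapis.com",
        "bqconnectionclient": f"{loc}-bigqueryconnection.googleapis.com",
        "bqstoragereadclient": f"{loc}-bigquerystorage.googleapis.com",
    }
```

You would then assign the result, for example bpd.options.bigquery.client_endpoints_override = locational_endpoints(location), with location set to the BigQuery location you want.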

Use the `bigframes.ml.llm` module

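In BigQuery DataFrames version 2.0, the default model_name for GeminiTextGenerator is "gemini-2.0-flash-001". We recommend that you pass model_name explicitly so that your code doesn't break if the default model changes in the future.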

import bigframes.ml.llm

model = bigframes.ml.llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")

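What's next

Learn how to visualize graphs with BigQuery DataFrames.
Learn how to generate BigQuery DataFrames code with Gemini.
Learn how to analyze package downloads from PyPI with BigQuery DataFrames.
View the BigQuery DataFrames source code, sample notebooks, and samples on GitHub.
Explore the BigQuery DataFrames API reference.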