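Explore query results in notebooks

You can explore BigQuery query results by using Colab Enterprise notebooks in BigQuery.

In this tutorial, you query data from a BigQuery public dataset and explore the query results in a notebook.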

Objectives

Create and run a query in BigQuery.
Explore query results in a notebook.

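Costs

This tutorial uses a dataset available through the Google Cloud Public Datasets Program. Google pays for the storage of these datasets and provides public access to the data. You incur charges for the queries that you run on the data. For more information, see BigQuery pricing.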
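
Because query charges depend on how much data a query processes, it can help to check a query's size before running it. The following is a minimal sketch rather than part of the tutorial: it assumes the google-cloud-bigquery client library is installed and that default credentials are configured, and it uses a dry run to read the number of bytes a query would process. The helper names `bytes_to_tib` and `estimate_bytes_processed` are hypothetical, and no price is hardcoded; look up the current rate on the BigQuery pricing page.

```python
def bytes_to_tib(num_bytes: int) -> float:
    """Convert a byte count to tebibytes (1 TiB = 2**40 bytes)."""
    return num_bytes / 2**40


def estimate_bytes_processed(sql: str) -> int:
    """Dry-run a query and return the bytes it would process.

    Assumption: google-cloud-bigquery is installed and default
    credentials are available. A dry run does not execute the query.
    """
    # Imported lazily so that bytes_to_tib works without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed
```

Under on-demand pricing, multiplying `bytes_to_tib(estimate_bytes_processed(sql))` by the per-TiB rate that applies to your project gives a rough upper estimate for a single query.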

Before you begin

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector
Verify that billing is enabled for your Google Cloud project.
Enable the BigQuery API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Enable the API

For new projects, BigQuery is automatically enabled.

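Set the default region for code assets

If this is the first time that you are creating a code asset, set the default region for code assets. You can't change the region of a code asset after it is created.

All code assets in BigQuery Studio use the same default region. To set the default region for code assets, follow these steps: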

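Go to the BigQuery page.

Go to BigQuery
In the Explorer pane, find the project in which you have enabled code assets.
Click View actions next to the project, and then click Change my default code region.
For Region, select the region that you want to use for code assets.
Click Select.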

For a list of supported regions, see BigQuery Studio locations.

Required permissions

To create and run notebooks, you need the following Identity and Access Management (IAM) roles:

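Open query results in a notebook

You can run a SQL query and then use a notebook to explore the data. This approach is useful if you want to modify the data in BigQuery before working with it, or if you need only a subset of the fields in the table.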

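In the Google Cloud console, go to the BigQuery page.

Go to BigQuery
In the Type to search field, enter bigquery-public-data.

If the project isn't shown, enter bigquery in the search field, and then click Search across all projects to match the search string against existing projects.
Select bigquery-public-data > ml_datasets > penguins.
For the penguins table, click View actions, and then click Query.
Add an asterisk (*) to the generated query to select all fields, as in the following example: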
```
SELECT * FROM `bigquery-public-data.ml_datasets.penguins` LIMIT 1000;
```
Click Run.
In the Query results section, click Open in, and then click Notebook.
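
The console steps above have a rough programmatic counterpart. As a sketch only, assuming the google-cloud-bigquery client library (with its pandas support) is installed and default credentials are configured, the same query could be built and run from Python. The helper names `build_penguins_query` and `run_query` are hypothetical and do not appear in the generated notebook.

```python
PENGUINS_TABLE = "bigquery-public-data.ml_datasets.penguins"


def build_penguins_query(limit: int = 1000) -> str:
    """Return the tutorial's SELECT * query with a validated row limit."""
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM `{PENGUINS_TABLE}` LIMIT {limit};"


def run_query(sql: str):
    """Run the query and return the result as a pandas DataFrame.

    Assumption: google-cloud-bigquery and pandas are installed and
    default credentials are available.
    """
    # Imported lazily so that the query builder works without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    return client.query(sql).to_dataframe()
```

Calling `run_query(build_penguins_query())` would return up to 1,000 rows of the penguins table, which is the same result set that the console hands to the notebook.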

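Prepare the notebook for use

Prepare the notebook for use by connecting to a runtime and setting application default values.

In the notebook header, click Connect to connect to the default runtime.
In the Setup code block, click Run cell.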

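Explore the data

To load the penguins data into a BigQuery DataFrame and show the results, click Run cell in the code block in the Result set loaded from BigQuery job as a DataFrame section.
To get descriptive metrics for the data, click Run cell in the code block in the Show descriptive statistics using describe() section.
Optional: Use other Python functions or packages to explore and analyze the data.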
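
To make the descriptive statistics concrete, the following sketch computes the kind of summary that a pandas-style describe() reports for a numeric column (count, mean, standard deviation, minimum, quartiles, and maximum), using only the Python standard library. The values are made-up illustration data, not rows from the penguins table, and the `describe_column` helper is hypothetical. BigQuery DataFrames may compute some of these values differently (for example, approximate quartiles), so its output can differ.

```python
import statistics


def describe_column(values):
    """Summarize a numeric column, skipping missing (None) entries."""
    data = sorted(v for v in values if v is not None)
    if len(data) < 2:
        raise ValueError("need at least two non-null values")
    # "inclusive" interpolates linearly between the observed data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation
        "min": data[0],
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": data[-1],
    }


# Made-up body masses in grams, for illustration only.
sample_body_mass_g = [3500, 3800, None, 4200, 4600, 5000]
summary = describe_column(sample_body_mass_g)
for name, value in summary.items():
    print(f"{name}: {value}")
```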

The following code sample uses bigframes.pandas to analyze data, and bigframes.ml to create a linear regression model from the penguins data in a BigQuery DataFrame:

```
import bigframes.pandas as bpd

# Load data from BigQuery
query_or_table = "bigquery-public-data.ml_datasets.penguins"
bq_df = bpd.read_gbq(query_or_table)

# Inspect one of the columns (or series) of the DataFrame:
bq_df["body_mass_g"]

# Compute the mean of this series:
average_body_mass = bq_df["body_mass_g"].mean()
print(f"average_body_mass: {average_body_mass}")

# Find the heaviest species using the groupby operation to calculate the
# mean body_mass_g:
(
    bq_df["body_mass_g"]
    .groupby(by=bq_df["species"])
    .mean()
    .sort_values(ascending=False)
    .head(10)
)

# Create the Linear Regression model
from bigframes.ml.linear_model import LinearRegression

# Filter down to the data we want to analyze
adelie_data = bq_df[bq_df.species == "Adelie Penguin (Pygoscelis adeliae)"]

# Drop the columns we don't care about
adelie_data = adelie_data.drop(columns=["species"])

# Drop rows with nulls to get our training data
training_data = adelie_data.dropna()

# Pick feature columns and label column
X = training_data[
    [
        "island",
        "culmen_length_mm",
        "culmen_depth_mm",
        "flipper_length_mm",
        "sex",
    ]
]
y = training_data[["body_mass_g"]]

model = LinearRegression(fit_intercept=False)
model.fit(X, y)
model.score(X, y)
```
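
The final call in the sample, model.score(X, y), evaluates the fitted model on the data it is given. As a rough illustration of what regression evaluation metrics measure, the following standard-library sketch computes the mean absolute error and the coefficient of determination (R²) from actual and predicted values. The numbers are made up for illustration, the helper names are hypothetical, and this is not a description of how bigframes.ml computes or reports its metrics.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """R² = 1 - SS_res / SS_tot; 1.0 means a perfect fit."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up body masses in grams, for illustration only.
actual_mass_g = [3500.0, 3800.0, 4200.0, 4600.0]
predicted_mass_g = [3600.0, 3700.0, 4300.0, 4500.0]

mae = mean_absolute_error(actual_mass_g, predicted_mass_g)
r2 = r2_score(actual_mass_g, predicted_mass_g)
print(f"MAE: {mae:.1f} g, R^2: {r2:.3f}")
```

Because the tutorial scores the model on the same rows that it was trained on, the result describes how well the model fits that data, not how well it would predict unseen penguins.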

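Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

The easiest way to avoid charges is to delete the Google Cloud project that you created for this tutorial.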

In the Google Cloud console, go to the Manage resources page.
Go to Manage resources
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.

What's next

Learn more about creating notebooks in BigQuery.
Learn more about exploring your data with BigQuery DataFrames.