Try BigQuery DataFrames
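In this quickstart, you use the BigQuery DataFrames API in a BigQuery notebook to perform the following analysis and machine learning (ML) tasks:
- Create a DataFrame over the bigquery-public-data.ml_datasets.penguins public dataset.
- Calculate the average body mass of a penguin.
- Create a linear regression model:
  - Create a DataFrame over a subset of the penguin data to use as training data.
  - Clean up the training data.
  - Set the model parameters.
  - Fit the model.
  - Score the model.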
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project:
  - Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Verify that the BigQuery API is enabled.
  If you created a new project, the BigQuery API is automatically enabled.
Required permissions
To create and run notebooks, you need the following Identity and Access Management (IAM) roles:
- BigQuery User (roles/bigquery.user)
- Notebook Runtime User (roles/aiplatform.notebookRuntimeUser)
- Code Creator (roles/dataform.codeCreator)
Create a notebook
Follow the instructions in "Create a notebook from the BigQuery editor" to create a new notebook.
Try BigQuery DataFrames
To try BigQuery DataFrames, follow these steps:
- Create a new code cell in the notebook.
- Add the following code to the code cell:

      import bigframes.pandas as bpd

      # Set BigQuery DataFrames options
      # Note: The project option is not required in all environments.
      # On BigQuery Studio, the project ID is automatically detected.
      bpd.options.bigquery.project = your_gcp_project_id

      # Use "partial" ordering mode to generate more efficient queries, but the
      # order of the rows in DataFrames may not be deterministic if you have not
      # explicitly sorted it. Some operations that depend on the order, such as
      # head(), will not function until you explicitly order the DataFrame. Set the
      # ordering mode to "strict" (default) for more pandas compatibility.
      bpd.options.bigquery.ordering_mode = "partial"

      # Create a DataFrame from a BigQuery table
      query_or_table = "bigquery-public-data.ml_datasets.penguins"
      df = bpd.read_gbq(query_or_table)

      # Efficiently preview the results using the .peek() method.
      df.peek()

- Modify the bpd.options.bigquery.project = your_gcp_project_id line to specify your Google Cloud project ID. For example: bpd.options.bigquery.project = "myProjectID".
- Run the code cell.
  The code returns a DataFrame object that contains data about penguins.
- Create a new code cell in the notebook and add the following code:

      # Use the DataFrame just as you would a pandas DataFrame, but calculations
      # happen in the BigQuery query engine instead of the local system.
      average_body_mass = df["body_mass_g"].mean()
      print(f"average_body_mass: {average_body_mass}")

- Run the code cell.
  The code calculates the average body mass of the penguins and prints it to the Google Cloud console.
- Create a new code cell in the notebook and add the following code:

      # Create the Linear Regression model
      from bigframes.ml.linear_model import LinearRegression

      # Filter down to the data we want to analyze
      adelie_data = df[df.species == "Adelie Penguin (Pygoscelis adeliae)"]

      # Drop the columns we don't care about
      adelie_data = adelie_data.drop(columns=["species"])

      # Drop rows with nulls to get our training data
      training_data = adelie_data.dropna()

      # Pick feature columns and label column
      X = training_data[
          [
              "island",
              "culmen_length_mm",
              "culmen_depth_mm",
              "flipper_length_mm",
              "sex",
          ]
      ]
      y = training_data[["body_mass_g"]]

      model = LinearRegression(fit_intercept=False)
      model.fit(X, y)
      model.score(X, y)

- Run the code cell.
  The code returns the model's evaluation metrics.
Clean up
The easiest way to avoid incurring charges is to delete the project that you created for this tutorial.
To delete the project:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- Continue learning how to use BigQuery DataFrames.
- Learn how to visualize graphs using BigQuery DataFrames.
- Learn how to use a BigQuery DataFrames notebook.
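As a supplement to the model.score(X, y) call in the walkthrough: this page doesn't list which evaluation metrics that call returns. Assuming they include common regression metrics such as mean absolute error, mean squared error, and R² (an assumption, not confirmed here), the following standard-library sketch shows how such metrics are typically computed. The regression_metrics helper and the sample body masses are hypothetical and exist only for illustration; they aren't part of BigQuery DataFrames or the penguins dataset.

```python
from statistics import fmean


def regression_metrics(actual, predicted):
    """Return MAE, MSE, and R^2 for two equal-length sequences of numbers.

    Hypothetical helper for illustration; not a BigQuery DataFrames API.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")

    # Per-row prediction errors (residuals).
    errors = [a - p for a, p in zip(actual, predicted)]

    mean_absolute_error = fmean([abs(e) for e in errors])
    mean_squared_error = fmean([e * e for e in errors])

    # R^2 compares the residual error with the spread of the actual values.
    mean_actual = fmean(actual)
    total_ss = sum((a - mean_actual) ** 2 for a in actual)
    if total_ss == 0:
        raise ValueError("R^2 is undefined when all actual values are identical")
    residual_ss = sum(e * e for e in errors)
    r2_score = 1.0 - residual_ss / total_ss

    return {
        "mean_absolute_error": mean_absolute_error,
        "mean_squared_error": mean_squared_error,
        "r2_score": r2_score,
    }


# Made-up body masses in grams; not taken from the penguins dataset.
actual_body_mass_g = [3700.0, 3800.0, 3900.0, 4000.0]
predicted_body_mass_g = [3650.0, 3850.0, 3900.0, 4100.0]

metrics = regression_metrics(actual_body_mass_g, predicted_body_mass_g)
print(metrics)
```

An R² close to 1 means the predictions track the actual values closely. Because the walkthrough scores the model on the same rows it was trained on, the reported metrics likely look better than they would on data the model hasn't seen.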