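This page shows you how to use the Google Cloud CLI gcloud command-line tool to create a Dataproc cluster, run an Apache Spark job in the cluster, and then modify the number of workers in the cluster.
To learn how to perform the same or similar tasks, see "Quickstarts using the API Explorer", "Create a Dataproc cluster by using the Google Cloud console", and "Create a Dataproc cluster by using client libraries".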
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- Install the Google Cloud CLI.
- If you use an external identity provider (IdP), first sign in to the gcloud CLI with your federated identity.
- To initialize the gcloud CLI, run the following command:
  gcloud init
- Create or select a Google Cloud project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role; you can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
  - Create a Google Cloud project:
    gcloud projects create PROJECT_ID
    Replace PROJECT_ID with a name for the Google Cloud project you are creating.
  - Select the Google Cloud project that you created:
    gcloud config set project PROJECT_ID
    Replace PROJECT_ID with your Google Cloud project name.
- Verify that billing is enabled for your Google Cloud project.
- Enable the Dataproc API:
  gcloud services enable dataproc.googleapis.com
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
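Notes on the example job (these apply to the command in the job submission section later on this page):
- The job runs on example-cluster.
- The class contains the main method for SparkPi, which calculates an approximate value of pi.
- The JAR file contains the job code.
- 1000 is a job argument. It specifies the number of tasks (iterations) the job performs to calculate the value of pi.
Command for the cleanup section at the end of this page:
- To delete example-cluster, run the clusters delete command:
  gcloud dataproc clusters delete example-cluster \
      --region=REGION
What's next:
- Learn how to write and run a Spark Scala job.
Required roles
Certain IAM roles are required to run the examples on this page. Depending on organization policies, these roles may have already been granted. To check role grants, see "Do you need to grant roles?".
For more information about granting roles, see "Manage access to projects, folders, and organizations".
User roles
To get the permissions that you need to create a Dataproc cluster, ask your administrator to grant you the following IAM roles:
- Dataproc Editor (roles/dataproc.editor) on the project
- Service Account User (roles/iam.serviceAccountUser) on the Compute Engine default service account
Service account roles
To ensure that the Compute Engine default service account has the permissions needed to create a Dataproc cluster, ask your administrator to grant the Dataproc Worker (roles/dataproc.worker) IAM role to the Compute Engine default service account on the project.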
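For an administrator, granting the Dataproc Worker role to the Compute Engine default service account might be scripted roughly as follows. This is a sketch under assumptions, not the documented procedure: PROJECT_ID and PROJECT_NUMBER are placeholders, the default service account is assumed to use the usual PROJECT_NUMBER-compute@developer.gserviceaccount.com address, and the hypothetical run helper only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: grant roles/dataproc.worker to the Compute Engine default
# service account. PROJECT_ID and PROJECT_NUMBER are placeholders (assumptions).
PROJECT_ID="${PROJECT_ID:-PROJECT_ID}"
PROJECT_NUMBER="${PROJECT_NUMBER:-PROJECT_NUMBER}"
DRY_RUN="${DRY_RUN:-1}"

# run (hypothetical helper): print the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# default_compute_sa: build the default service account address from a
# project number (assumes the usual NUMBER-compute@developer format).
default_compute_sa() {
  echo "$1-compute@developer.gserviceaccount.com"
}

# The project number can be looked up from the project ID.
run gcloud projects describe "$PROJECT_ID" --format="value(projectNumber)"

# Bind the Dataproc Worker role to the default service account.
run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:$(default_compute_sa "$PROJECT_NUMBER")" \
  --role="roles/dataproc.worker"
```

Leaving DRY_RUN at its default lets you review the exact commands before running them with DRY_RUN=0.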
Create a cluster
To create a cluster named example-cluster, run the following gcloud dataproc clusters create command:
gcloud dataproc clusters create example-cluster --region=REGION
Replace the following:
REGION: the region where the cluster will be located.
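As an illustration of substituting REGION, the create step could be wrapped in a small script that also checks the cluster state afterward. This is a sketch under assumptions: us-central1 is only an example default, the run helper is hypothetical and prints commands unless DRY_RUN=0 is set, and the status.state field path is assumed from the cluster resource layout.

```shell
#!/bin/sh
# Sketch: create the cluster, then query its state.
# Assumptions: us-central1 is an illustrative default region;
# DRY_RUN=1 (the default here) prints each command instead of running it.
REGION="${REGION:-us-central1}"
CLUSTER="${CLUSTER:-example-cluster}"
DRY_RUN="${DRY_RUN:-1}"

# run (hypothetical helper): print the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the cluster in the chosen region.
run gcloud dataproc clusters create "$CLUSTER" --region="$REGION"

# Ask for the cluster state only (for example RUNNING once it is ready).
run gcloud dataproc clusters describe "$CLUSTER" \
  --region="$REGION" --format="value(status.state)"
```

Setting REGION in the environment before running the script overrides the example default.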
Submit a job
To submit a sample Spark job that calculates a rough value for pi, run the following gcloud dataproc jobs submit spark command:
gcloud dataproc jobs submit spark --cluster example-cluster \
    --region=REGION \
    --class org.apache.spark.examples.SparkPi \
    --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000
Replace the following:
REGION: the region of the cluster.
The terminal window displays the job's progress and its final output:
Waiting for job output...
...
Pi is roughly 3.14118528
...
Job finished successfully.
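If you want to capture the computed value from a saved copy of this output, a small filter along the following lines may help. It is a sketch: extract_pi is a hypothetical helper name, and it assumes the result line keeps the "Pi is roughly" wording shown above.

```shell
#!/bin/sh
# extract_pi (hypothetical helper): read job output on stdin and print
# the number that follows "Pi is roughly".
extract_pi() {
  sed -n 's/.*Pi is roughly \([0-9.]*\).*/\1/p'
}

# Example using the output lines shown on this page.
printf '%s\n' \
  'Waiting for job output...' \
  'Pi is roughly 3.14118528' \
  'Job finished successfully.' | extract_pi
# prints 3.14118528
```

When piping the live command into the filter, the job output may arrive on a different stream than expected, so appending 2>&1 to the gcloud command may be necessary.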
Update a cluster
To change the number of workers in the cluster to 5, run the following command:
gcloud dataproc clusters update example-cluster \
    --region=REGION \
    --num-workers 5
The command output displays the cluster details:
workerConfig:
...
  instanceNames:
  - example-cluster-w-0
  - example-cluster-w-1
  - example-cluster-w-2
  - example-cluster-w-3
  - example-cluster-w-4
  numInstances: 5
statusHistory:
...
- detail: Add 3 workers.
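The "Add 3 workers" detail reflects the difference between the new worker count (5) and the original one (2). A rough sketch of that arithmetic, together with a query for the current worker count, is shown here. The worker_delta and run helpers are hypothetical, us-central1 is only an example region, and the config.workerConfig.numInstances field path is assumed from the output above.

```shell
#!/bin/sh
# Sketch: compute the change in worker count and query the current count.
# Assumptions: us-central1 is an illustrative region; DRY_RUN=1 (default)
# prints the gcloud command instead of running it.
REGION="${REGION:-us-central1}"
DRY_RUN="${DRY_RUN:-1}"

# run (hypothetical helper): print the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# worker_delta FROM TO: workers added (positive) or removed (negative).
worker_delta() {
  echo "$(( $2 - $1 ))"
}

echo "Scaling from 2 to 5 workers changes the count by $(worker_delta 2 5)"

# Print only the number of worker instances in the cluster.
run gcloud dataproc clusters describe example-cluster \
  --region="$REGION" --format="value(config.workerConfig.numInstances)"
```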
To reduce the number of worker nodes to the original value of 2, run the following command:
gcloud dataproc clusters update example-cluster \
    --region=REGION \
    --num-workers 2
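Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, delete example-cluster with the clusters delete command given earlier on this page.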
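For scripted cleanup, the delete step could be run without the confirmation prompt and followed by a listing to confirm the cluster is gone. This is a sketch under assumptions: us-central1 is only an example region, the run helper is hypothetical and prints commands unless DRY_RUN=0 is set, and --quiet is used to suppress interactive prompts.

```shell
#!/bin/sh
# Sketch: delete the example cluster non-interactively, then list what remains.
# Assumptions: us-central1 is an illustrative region; DRY_RUN=1 (default)
# prints each command instead of running it.
REGION="${REGION:-us-central1}"
DRY_RUN="${DRY_RUN:-1}"

# run (hypothetical helper): print the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Delete the cluster; --quiet skips the confirmation prompt.
run gcloud dataproc clusters delete example-cluster --region="$REGION" --quiet

# List the clusters still present in the region.
run gcloud dataproc clusters list --region="$REGION"
```

Because deletion is irreversible, reviewing the dry-run output before setting DRY_RUN=0 is a reasonable precaution.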