Train Llama2 with Megatron-LM on A3 Mega VMs
Overview
This quickstart shows you how to run a container-based Megatron-LM PyTorch workload on A3 Mega. The code is available in this GitHub repository: megatron-gke.
Before you begin
Follow these steps to enable the Google Kubernetes Engine (GKE) API:
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the GKE API.
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Make sure that you have the following role or roles on the project: roles/container.admin, roles/compute.networkAdmin, roles/iam.serviceAccountUser
Check for the roles
- In the Google Cloud console, go to the IAM page.
  Go to IAM
- Select the project.
- In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
- For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
Grant the roles
- In the Google Cloud console, go to the IAM page.
  Go to IAM
- Select the project.
- Click Grant access.
- In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
- In the Select a role list, select a role.
- To grant additional roles, click Add another role, and then add each additional role.
- Click Save.
Create environment variables for some common parameters
export CLUSTER_NAME=CLUSTER_NAME
export CONTROL_PLANE_LOCATION=CONTROL_PLANE_LOCATION
export PROJECT_ID=PROJECT_ID
Replace the following:
- CLUSTER_NAME: the name of your A3 Mega GKE cluster that has GPUDirect-TCPXO and multi-networking enabled.
- CONTROL_PLANE_LOCATION: the Compute Engine location of your cluster's control plane. Provide a region for regional clusters, or a zone for zonal clusters.
- PROJECT_ID: your Google Cloud project ID.
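For example, with hypothetical placeholder values filled in (these names are illustrative, not real resources), the variables could be set like this:

```shell
# Hypothetical example values -- substitute your own cluster, location, and project.
export CLUSTER_NAME=a3mega-demo
export CONTROL_PLANE_LOCATION=us-east4
export PROJECT_ID=my-sample-project

# Later commands in this guide expand these variables, for example:
echo "gcloud container clusters get-credentials ${CLUSTER_NAME} --location=${CONTROL_PLANE_LOCATION} --project=${PROJECT_ID}"
```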
Configure the Google Cloud CLI to use your Google Cloud credentials for authentication:
gcloud auth login
For more information, see Authenticate for using the Google Cloud CLI.
Install kubectl and the GKE gcloud CLI plugin:
sudo apt-get install kubectl
sudo apt-get install google-cloud-sdk-gke-gcloud-auth-plugin
Fetch credentials for your GKE cluster:
gcloud container clusters get-credentials ${CLUSTER_NAME} \
    --location=${CONTROL_PLANE_LOCATION} \
    --project=${PROJECT_ID}
If it isn't already installed, install Helm:
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh && rm get_helm.sh
sudo chmod +x /usr/local/bin/helm
Set up the service account:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/gpudirect-tcpxo/topology-scheduler/service-account.yaml
Install the topology scheduler scripts in a configmap:
curl -OL https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/gpudirect-tcpxo/topology-scheduler/schedule-daemon.py
curl -OL https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/gpudirect-tcpxo/topology-scheduler/label-nodes-daemon.py
kubectl -n kube-system create configmap topology-scheduler-scripts \
    --from-file=schedule-daemon.py=schedule-daemon.py \
    --from-file=label-nodes-daemon.py=label-nodes-daemon.py
Install the topology label DaemonSet and the topology scheduler Pod:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/gpudirect-tcpxo/topology-scheduler/label-nodes-daemon.yaml
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/gpudirect-tcpxo/topology-scheduler/schedule-daemon.yaml
Observe the actions of the topology scheduler:
kubectl -n kube-system logs topology-scheduler-pod
Create a Cloud Storage bucket and a Docker repository. In the scripts/setup-and-configure-resources.sh script, replace the bucket and repository names with the ones you created, and then run the script:
bash scripts/setup-and-configure-resources.sh
Build the pytorch-megatron:23.11-py3 image and push it to your repository. Make sure that the Docker repository name in the scripts/build-and-push-docker-image.sh file matches the repository name that you used in the scripts/setup-and-configure-resources.sh script. You can also edit the Docker image tag name before pushing.
bash scripts/build-and-push-docker-image.sh
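The repository's actual Dockerfile isn't reproduced here. As a rough sketch only, the 23.11-py3 tag suggests an image derived from the correspondingly versioned NGC PyTorch container; the base image and clone step below are assumptions, not the repository's real contents:

```dockerfile
# Assumption: base image inferred from the 23.11-py3 tag; the real Dockerfile may differ.
FROM nvcr.io/nvidia/pytorch:23.11-py3

# Hypothetical step: add Megatron-LM sources to the image.
RUN git clone https://github.com/NVIDIA/Megatron-LM.git /workspace/megatron-lm
WORKDIR /workspace/megatron-lm
```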
Edit the helm/values.yaml file to specify the Cloud Storage bucket and Docker image that you created in the previous sections. For some sample configurations, see sample-configurations.
Optional: You can also edit the selected-configuration.sh file to specify any changes that you made to the default Helm configuration.
helm install HELM_EXPERIMENT_NAME helm/ --values helm/values.yaml
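As an illustration only, the values you set in helm/values.yaml might look roughly like the following; the field names here are hypothetical, so consult the repository's sample-configurations for the real schema:

```yaml
# Hypothetical field names -- see the repository's sample-configurations for the real schema.
workload:
  image: us-docker.pkg.dev/PROJECT_ID/REPOSITORY/pytorch-megatron:23.11-py3
  gcsBucket: gs://BUCKET_NAME
```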
Replace HELM_EXPERIMENT_NAME with a name of your choice for the experiment.
- Select the checkbox for CLUSTER_NAME.
- Click Delete.
- To confirm the deletion, type CLUSTER_NAME, and then click Delete.
Select the checkbox for the Cloud Storage bucket that you created for this quickstart.
Click Delete.
To confirm the deletion, type DELETE, and then click Delete.
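If you prefer the command line to the console, the same cleanup can be sketched with standard gcloud commands. This sketch assumes the environment variables defined earlier are still set, and BUCKET_NAME is a placeholder for your bucket:

```shell
# Delete the GKE cluster.
gcloud container clusters delete ${CLUSTER_NAME} \
    --location=${CONTROL_PLANE_LOCATION} \
    --project=${PROJECT_ID} \
    --quiet

# Delete the Cloud Storage bucket and everything in it.
gcloud storage rm --recursive gs://BUCKET_NAME
```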
Create an A3 Mega cluster
Create an A3 Mega GKE cluster with GPUDirect-TCPXO and multi-networking. For more information, see Maximize GPU network bandwidth with GPUDirect and multi-networking.
Set up your environment
Deploy Pods by using the topology-aware scheduler
You can use the topology-aware scheduler to deploy GKE Pods to nodes that have a specified GPU topology.
In the following kubectl commands, you use the files directly from a repository. Alternatively, you can clone the repository locally, and the kubectl commands can reference the local files instead. For more information, see Topology scheduler.
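For example, cloning the repository locally and applying the same manifests from the checkout might look like the following sketch; kubectl must already be configured for your cluster:

```shell
# Clone the repository used throughout this section.
git clone https://github.com/GoogleCloudPlatform/container-engine-accelerators.git
cd container-engine-accelerators/gpudirect-tcpxo/topology-scheduler

# Apply the same manifests from local files instead of raw URLs.
kubectl apply -f service-account.yaml
kubectl apply -f label-nodes-daemon.yaml
kubectl apply -f schedule-daemon.yaml
```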
Run workloads
Build the Dockerfile and push it to Google Cloud Artifact Registry
Launch the Megatron-LM Llama2 benchmark
The experiment writes metrics from the Nsight Systems profiling tool to the specified Cloud Storage bucket, under the megatron-experiments directory.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Delete the GKE cluster:
Go to the Clusters page:
Delete the Cloud Storage bucket
Go to the Buckets page:
What's next