Train ResNet-50 on Cloud TPU with PyTorch

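This tutorial shows how to train the ResNet-50 model on a Cloud TPU device using PyTorch. You can apply the same pattern to other TPU-optimized image classification models that use PyTorch and the ImageNet dataset.

The model in this tutorial is based on Deep Residual Learning for Image Recognition, the paper that first introduced the residual network (ResNet) architecture. The tutorial uses the 50-layer variant, ResNet-50, and demonstrates training the model with PyTorch/XLA.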
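
For readers new to the architecture, the core idea of residual learning is that a block outputs F(x) + x rather than F(x) alone, so its layers only need to learn a correction to the identity mapping. The sketch below illustrates that idea in plain Python on lists of floats. The names `residual_block` and `zero_branch` are hypothetical and chosen for illustration; this is not how ResNet-50 or torchvision implements its blocks.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, branch: Callable[[Vector], Vector]) -> Vector:
    """Return branch(x) + x, the skip-connection form of a residual block."""
    fx = branch(x)
    if len(fx) != len(x):
        raise ValueError("branch must preserve the input length")
    # Element-wise sum of the learned branch output and the unchanged input.
    return [f + xi for f, xi in zip(fx, x)]


def zero_branch(x: Vector) -> Vector:
    """A branch that contributes nothing: it outputs all zeros."""
    return [0.0 for _ in x]


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0]
    # With a zero branch, the block reduces to the identity mapping.
    print(residual_block(x, zero_branch))
```

When the branch outputs zeros the block passes its input through unchanged, which is one common explanation for why very deep residual networks remain trainable.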

Objectives

Prepare the dataset.
Run the training job.
Verify the output results.

Costs

This document uses the following billable components of Google Cloud:

Compute Engine
Cloud TPU

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

Before you begin

Before starting this tutorial, check that your Google Cloud project is correctly set up.

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Verify that billing is enabled for your Google Cloud project.

This walkthrough uses billable components of Google Cloud. To estimate your costs, see the Cloud TPU pricing page. When you finish using the resources, delete them to avoid unnecessary charges.

Create a TPU VM

Open a Cloud Shell window.

Create a TPU VM.
```
gcloud compute tpus tpu-vm create your-tpu-name \
--accelerator-type=v3-8 \
--version=tpu-ubuntu2204-base \
--zone=us-central1-a \
--project=your-project
```
Note: The first time you run a command in a new Cloud Shell VM, an Authorize Cloud Shell page is displayed. Click Authorize at the bottom of the page to allow gcloud to make Google Cloud API calls with your credentials.
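
If you create TPU VMs repeatedly, it can help to keep the placeholder values (`your-tpu-name`, `us-central1-a`, `your-project`) in one place. The following is a minimal sketch that only assembles the same `gcloud` command shown above as an argument list; the helper name `tpu_vm_create_command` is hypothetical, and the flags are exactly the ones used in this tutorial.

```python
import shlex
from typing import List


def tpu_vm_create_command(
    name: str,
    zone: str,
    project: str,
    accelerator_type: str = "v3-8",
    version: str = "tpu-ubuntu2204-base",
) -> List[str]:
    """Build the argument list for the TPU VM create command in this tutorial."""
    return [
        "gcloud", "compute", "tpus", "tpu-vm", "create", name,
        f"--accelerator-type={accelerator_type}",
        f"--version={version}",
        f"--zone={zone}",
        f"--project={project}",
    ]


if __name__ == "__main__":
    cmd = tpu_vm_create_command("your-tpu-name", "us-central1-a", "your-project")
    # Print the command for review; nothing is executed here.
    print(shlex.join(cmd))
```

To actually run the command from Python, you could pass the list to `subprocess.run(cmd, check=True)` on a machine where `gcloud` is installed and authorized.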

Connect to the TPU VM using SSH.

gcloud compute tpus tpu-vm ssh your-tpu-name --zone=us-central1-a

Install PyTorch/XLA on the TPU VM.

(vm)$ pip install torch torch_xla[tpu] torchvision -f https://storage.googleapis.com/libtpu-releases/index.html -f https://storage.googleapis.com/libtpu-wheels/index.html
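
Before moving on, you may want to confirm that the three packages are importable on the VM. The sketch below uses only the Python standard library to check whether each module can be located; it does not import them or touch the TPU. The helper name `find_missing` is hypothetical.

```python
import importlib.util
from typing import Iterable, List

# Top-level module names installed by the pip command above.
REQUIRED_MODULES = ("torch", "torch_xla", "torchvision")


def find_missing(modules: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

If any module is reported missing, rerun the `pip install` command and check its output for errors.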

Clone the PyTorch/XLA GitHub repository.

(vm)$ git clone --depth=1 https://github.com/pytorch/xla.git

Run the training script with fake data.

(vm)$ PJRT_DEVICE=TPU python3 xla/test/test_train_mp_imagenet.py --fake_data --batch_size=256 --num_epochs=1
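
To get a feel for what `--batch_size=256` means at the scale of the whole TPU, the following sketch works through the arithmetic. It rests on two assumptions that you should verify against the script and your hardware: that `--batch_size` is applied per device rather than globally, and that a v3-8 exposes 8 devices. The dataset size of 1,200,000 samples is likewise an illustrative value, and the function names are hypothetical.

```python
def global_batch_size(per_device_batch: int, num_devices: int) -> int:
    """Samples processed per training step across all devices."""
    return per_device_batch * num_devices


def steps_per_epoch(num_samples: int, global_batch: int) -> int:
    """Full steps in one epoch, dropping any final partial batch."""
    return num_samples // global_batch


if __name__ == "__main__":
    per_device = 256          # value passed as --batch_size in this tutorial
    devices = 8               # assumption: a v3-8 exposes 8 devices
    samples = 1_200_000       # assumption: illustrative dataset size

    total = global_batch_size(per_device, devices)
    print(total)                            # 2048
    print(steps_per_epoch(samples, total))  # 585
```

Under these assumptions, a single epoch amounts to a few hundred steps, which is why a one-epoch run on fake data works as a quick smoke test.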

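Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Disconnect from the TPU VM.
```
(vm)$ exit
```
Your prompt should now be username@projectname, which shows that you are in the Cloud Shell.

Delete the TPU VM.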

$ gcloud compute tpus tpu-vm delete your-tpu-name \
   --zone=us-central1-a
