Serve open LLMs on GKE with a pre-configured architecture

Autopilot Standard

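This page shows you how to quickly deploy and serve popular open large language models (LLMs) on GKE for inference by using a pre-configured, production-ready reference architecture. This approach uses Infrastructure as Code (IaC), with Terraform wrapped in CLI scripts, to create a standardized, secure, and scalable GKE environment designed for AI inference workloads.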

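In this guide, you deploy and serve LLMs on GKE by using single-host GPU nodes with the vLLM serving framework. This guide provides instructions and configurations for deploying the following open models: Gemma 3 27B-it, Llama 4 Scout 17B-16E-Instruct, Qwen3 32B, and gpt-oss 20B.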

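This guide is intended for machine learning (ML) engineers and data and AI specialists who want to explore Kubernetes container orchestration capabilities for serving open models for inference. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.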

For a detailed analysis of model serving performance and costs for these open models, you can also use the GKE Inference Quickstart tool. To learn more, see the GKE Inference Quickstart guide and the accompanying Colab notebook.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the required APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

Make sure that you have the following roles on the project: roles/artifactregistry.admin, roles/browser, roles/compute.networkAdmin, roles/container.clusterAdmin, roles/iam.serviceAccountAdmin, roles/resourcemanager.projectIamAdmin, and roles/serviceusage.serviceUsageAdmin.
Check for the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
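The same check can be sketched from the command line. This is a minimal example, not part of the original guide: `PROJECT_ID` and `USER_EMAIL` are placeholder values, and `list_granted_roles` and `missing_roles` are hypothetical helper names. It only sees roles bound directly to the user on the project, so roles inherited through groups, folders, or the organization, or covered by a broader role such as Owner, are reported as missing.

```shell
#!/usr/bin/env bash
# Sketch: report which of the required roles are not bound directly to a user.
# PROJECT_ID and USER_EMAIL are placeholders (assumptions), not values from this guide.
PROJECT_ID="${PROJECT_ID:-my-project}"
USER_EMAIL="${USER_EMAIL:-user@example.com}"

REQUIRED_ROLES=(
  roles/artifactregistry.admin
  roles/browser
  roles/compute.networkAdmin
  roles/container.clusterAdmin
  roles/iam.serviceAccountAdmin
  roles/resourcemanager.projectIamAdmin
  roles/serviceusage.serviceUsageAdmin
)

# Prints one role per line for bindings that name the user directly.
list_granted_roles() {
  gcloud projects get-iam-policy "$PROJECT_ID" \
    --flatten="bindings[].members" \
    --filter="bindings.members:user:${USER_EMAIL}" \
    --format="value(bindings.role)"
}

# Reads granted roles on stdin (one per line) and prints the required
# roles that do not appear in that list.
missing_roles() {
  local granted role
  granted="$(cat)"
  for role in "${REQUIRED_ROLES[@]}"; do
    grep -qxF "$role" <<<"$granted" || echo "$role"
  done
}

# Usage (requires an authenticated gcloud CLI):
#   list_granted_roles | missing_roles
```

If the pipeline prints nothing, every required role is bound directly to the user. Any printed role still needs to be checked in the console before you conclude that it is missing.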
Grant the roles
1. In the Google Cloud console, go to the IAM page.
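  Go to IAM
2. Select the project.
3. Click Grant access.
4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
5. In the Select a role list, select a role.
6. To grant additional roles, click Add another role and add each additional role.
7. Click Save.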
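The console steps above can also be approximated with the gcloud CLI. The following is a sketch under stated assumptions, not the guide's own procedure: `PROJECT_ID` and `USER_EMAIL` are placeholders, `grant_roles` is a hypothetical helper name, and the script defaults to a dry run that only prints each `gcloud projects add-iam-policy-binding` command.

```shell
#!/usr/bin/env bash
# Sketch: grant the required roles to one user, one binding per role.
# PROJECT_ID and USER_EMAIL are placeholders (assumptions), not values from this guide.
PROJECT_ID="${PROJECT_ID:-my-project}"
USER_EMAIL="${USER_EMAIL:-user@example.com}"
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

REQUIRED_ROLES=(
  roles/artifactregistry.admin
  roles/browser
  roles/compute.networkAdmin
  roles/container.clusterAdmin
  roles/iam.serviceAccountAdmin
  roles/resourcemanager.projectIamAdmin
  roles/serviceusage.serviceUsageAdmin
)

grant_roles() {
  local role cmd
  for role in "${REQUIRED_ROLES[@]}"; do
    cmd=(gcloud projects add-iam-policy-binding "$PROJECT_ID"
         --member="user:${USER_EMAIL}" --role="$role")
    if [ "$DRY_RUN" = "1" ]; then
      # Print the command instead of changing the IAM policy.
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

grant_roles
```

Review the printed commands first, then rerun with `DRY_RUN=0` from an account that is allowed to change the project's IAM policy.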
