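Deploy models with custom weights

Deploying models with custom weights is a Preview offering. You can fine-tune models based on a predefined set of base models and deploy your customized models on Vertex AI Model Garden. To deploy a custom model with the custom weights import, upload your model artifacts to a Cloud Storage bucket in your project; deployment from there is a one-click experience in Vertex AI.

VPC Service Controls support for custom weights is available.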

Supported models

The public preview of Deploy models with custom weights is supported by the following base models:

Model name	Versions
Llama	Llama-2: 7B, 13B; Llama-3.1: 8B, 70B; Llama-3.2: 1B, 3B; Llama-4: Scout-17B, Maverick-17B; CodeLlama-13B
Gemma	Gemma-2: 9B, 27B; Gemma-3: 1B, 4B, 3-12B, 27B; Medgemma: 4B, 27B-text
Qwen	Qwen2: 1.5B; Qwen2.5: 0.5B, 1.5B, 7B, 32B; Qwen3: 0.6B, 1.7B, 4B, 8B, 32B; Qwen3-Coder-480B-A35B-Instruct; Qwen3-Next-80B-A3B-Instruct; Qwen3-Next-80B-A3B-Thinking
DeepSeek	DeepSeek-R1; DeepSeek-V3; DeepSeek-V3.1
Mistral and Mixtral	Mistral-7B-v0.1; Mixtral-8x7B-v0.1; Mistral-Nemo-Base-2407
Phi-4	Phi-4-reasoning
OpenAI OSS	gpt-oss: 20B, 120B

Limitations

Custom weights don't support the import of quantized models.
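
One way to catch a quantized checkpoint before you upload it is to inspect its `config.json`. The following is a minimal sketch, assuming the Hugging Face convention that quantized checkpoints record their settings under a `quantization_config` key. The helper name `looks_quantized` is hypothetical, and the check is a local heuristic, not the validation that the service itself performs.

```python
import json


def looks_quantized(config_text: str) -> bool:
    """Return True if a config.json payload declares quantization settings."""
    config = json.loads(config_text)
    # Assumption: quantized Hugging Face checkpoints store their settings
    # under the "quantization_config" key of config.json.
    return bool(config.get("quantization_config"))


if __name__ == "__main__":
    plain = '{"model_type": "llama"}'
    quantized = '{"model_type": "llama", "quantization_config": {"bits": 4}}'
    print(looks_quantized(plain))
    print(looks_quantized(quantized))
```

A checkpoint that passes this check can still be rejected for other reasons, so treat a `False` result as "no obvious quantization marker" rather than a guarantee.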

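Model files

You must supply the model files in the Hugging Face weights format. For more information about the Hugging Face weights format, see Use Hugging Face models.

If the required files aren't provided, the model deployment might fail.

The following table lists the types of model files, which depend on the model's architecture:

Model file content	File type
Model configuration	`config.json`
Model weights	`.safetensors` `.bin`
Weights index	`*.index.json`
Tokenizer files	`tokenizer.model` `tokenizer.json` `tokenizer_config.json`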
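
Because a missing file can make the deployment fail, it can help to check the file list before uploading. The sketch below is one possible pre-upload check under stated assumptions: it treats the model configuration, at least one weights file, and at least one tokenizer file as required, and it ignores the weights index because that file typically accompanies only sharded checkpoints. The helper name `missing_model_files` is hypothetical, and the exact set of required files depends on the model architecture.

```python
from pathlib import PurePosixPath

# File names taken from the table above.
TOKENIZER_FILES = {"tokenizer.model", "tokenizer.json", "tokenizer_config.json"}
WEIGHT_SUFFIXES = (".safetensors", ".bin")


def missing_model_files(filenames):
    """Return the table categories that have no matching file in `filenames`."""
    names = [PurePosixPath(name).name for name in filenames]
    missing = []
    if "config.json" not in names:
        missing.append("model configuration")
    if not any(name.endswith(WEIGHT_SUFFIXES) for name in names):
        missing.append("model weights")
    if not any(name in TOKENIZER_FILES for name in names):
        missing.append("tokenizer files")
    return missing


if __name__ == "__main__":
    files = ["my-model/config.json", "my-model/model.safetensors"]
    print(missing_model_files(files))
```

You can feed the function the object names from a bucket listing or the entries of a local directory; an empty result means that every assumed category is present.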

Locations

You can deploy custom models in all regions from Model Garden services.

Prerequisites

Complete the following setup steps before you deploy your custom model.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Vertex AI API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the API

In the Google Cloud console, activate Cloud Shell.

Activate Cloud Shell

At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

This tutorial assumes that you use Cloud Shell to interact with Google Cloud. To use a different shell instead of Cloud Shell, perform the following additional configuration:

Install the Google Cloud CLI.
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
To initialize the gcloud CLI, run the following command:
```
gcloud init
```

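Deploy the custom model

This section shows how to deploy your custom model.

If you're using the command-line interface (CLI), Python, or JavaScript, replace the following variables with values that work for your code samples:

REGION: Your region. For example, us-central1.
MODEL_GCS: The Cloud Storage URI of your model. For example, gs://custom-weights-fishfooding/meta-llama/Llama-3.2-1B-Instruct.
PROJECT_ID: Your project ID.
MODEL_ID: Your model ID.
MACHINE_TYPE: Your machine type. For example, g2-standard-12.
ACCELERATOR_TYPE: Your accelerator type. For example, NVIDIA_L4.
ACCELERATOR_COUNT: Your accelerator count.
PROMPT: Your text prompt.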

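Console

The following steps show how to use the Google Cloud console to deploy your model with custom weights.

In the Google Cloud console, go to the Model Garden page.

Go to Model Garden
Click Deploy model with custom weights. The Deploy model with custom weights pane appears.
In the Model source section, do the following:
1. Click Browse, choose the bucket where your model is stored, and then click Select.
2. Optional: Enter your model's name in the Model name field.
In the Deployment settings section, do the following:
1. In the Region field, select your region, and then click OK.
2. In the Machine spec field, select the machine specification used to deploy your model.
3. Optional: The Endpoint name field shows your model's endpoint by default. You can enter a different endpoint name in the field.
4. If your project enforces VPC-SC, or if you prefer private access, select Private (Private Service Connect) in the Endpoint access field. Otherwise, select Public.
5. If you use Private Service Connect, enter the ID of the project where your query clients run in the Project IDs field, or click Select project IDs to display a dialog that lists project IDs.
  
  If you click Select project IDs, follow these steps:
  1. Find the project that contains the code that needs to access the model.
  2. Select the checkbox for that project.
  3. Click Select.
Click Deploy.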

gcloud CLI

This command shows how to deploy the model to a specific region.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --region ${REGION}

This command shows how to deploy the model to a specific region with a chosen machine type, accelerator type, and accelerator count. To select a specific machine configuration, you must set all three fields.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --machine-type=${MACHINE_TYPE} --accelerator-type=${ACCELERATOR_TYPE} --accelerator-count=${ACCELERATOR_COUNT} --region ${REGION}
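
If you script the deployment, the "all three fields" rule is easy to enforce before the command runs. The following sketch builds the argument list for the gcloud command shown above and rejects a partial machine configuration. The function name `build_deploy_command` is hypothetical; the flags are the ones used in this section.

```python
def build_deploy_command(model_gcs, region, machine_type=None,
                         accelerator_type=None, accelerator_count=None):
    """Build the gcloud argv for deploying a model with custom weights."""
    machine_fields = (machine_type, accelerator_type, accelerator_count)
    provided = [field is not None for field in machine_fields]
    # The machine configuration is all-or-nothing: reject partial input.
    if any(provided) and not all(provided):
        raise ValueError(
            "Set machine type, accelerator type, and accelerator count together."
        )
    command = [
        "gcloud", "ai", "model-garden", "models", "deploy",
        f"--model={model_gcs}",
        f"--region={region}",
    ]
    if all(provided):
        command += [
            f"--machine-type={machine_type}",
            f"--accelerator-type={accelerator_type}",
            f"--accelerator-count={accelerator_count}",
        ]
    return command


if __name__ == "__main__":
    print(build_deploy_command("gs://my-bucket/my-model", "us-central1"))
```

Passing the returned list to `subprocess.run(command, check=True)` runs the deployment without going through shell quoting.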

Python

import vertexai
from vertexai.preview import model_garden

# Replace each "${...}" placeholder with your own value before running.
vertexai.init(project="${PROJECT_ID}", location="${REGION}")
custom_model = model_garden.CustomModel(
  gcs_uri="${MODEL_GCS}",
)
endpoint = custom_model.deploy(
  machine_type="${MACHINE_TYPE}",
  accelerator_type="${ACCELERATOR_TYPE}",
  # The accelerator count must be an integer, not a string.
  accelerator_count=int("${ACCELERATOR_COUNT}"),
  model_display_name="custom-model",
  endpoint_display_name="custom-model-endpoint")

endpoint.predict(instances=[{"prompt": "${PROMPT}"}], use_dedicated_endpoint=True)

Alternatively, you can call the custom_model.deploy() method without passing any parameters.

import vertexai
from vertexai.preview import model_garden

# Replace each "${...}" placeholder with your own value before running.
vertexai.init(project="${PROJECT_ID}", location="${REGION}")
custom_model = model_garden.CustomModel(
  gcs_uri="${MODEL_GCS}",
)
endpoint = custom_model.deploy()

endpoint.predict(instances=[{"prompt": "${PROMPT}"}], use_dedicated_endpoint=True)

curl

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    }
  }'

Alternatively, you can use the API to set the machine type explicitly.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    },
    "deploy_config": {
      "dedicated_resources": {
        "machine_spec": {
          "machine_type": "'"${MACHINE_TYPE}"'",
          "accelerator_type": "'"${ACCELERATOR_TYPE}"'",
          "accelerator_count": '"${ACCELERATOR_COUNT}"'
        },
        "min_replica_count": 1
      }
    }
  }'
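
Splicing shell variables into a quoted JSON string is error-prone, so you might prefer to build the request body programmatically. The sketch below assembles the same body as the curl samples in this section and serializes it with `json.dumps`, which always produces valid JSON. The function name `build_deploy_body` is hypothetical; the field names are the ones used in the curl samples.

```python
import json


def build_deploy_body(model_gcs, project_id, region, model_id, machine_spec=None):
    """Assemble the JSON body for the :deploy request as a Python dict."""
    body = {
        "custom_model": {"gcs_uri": model_gcs},
        "destination": f"projects/{project_id}/locations/{region}",
        "model_config": {"model_user_id": model_id},
    }
    # Only add deploy_config when a machine configuration was chosen.
    if machine_spec is not None:
        body["deploy_config"] = {
            "dedicated_resources": {
                "machine_spec": machine_spec,
                "min_replica_count": 1,
            }
        }
    return body


if __name__ == "__main__":
    spec = {
        "machine_type": "g2-standard-12",
        "accelerator_type": "NVIDIA_L4",
        "accelerator_count": 1,
    }
    body = build_deploy_body(
        "gs://my-bucket/my-model", "my-project", "us-central1", "my-model", spec
    )
    print(json.dumps(body, indent=2))
```

You can write the printed JSON to a file and pass it to curl with `-d @body.json`.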

Deploy using the API

VPC Service Controls works only with private dedicated endpoints. Therefore, you must set private_service_connect_config in the following code sample, which shows how to deploy by using the API.

curl

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/YOUR_PROJECT/locations/us-central1:deploy" \
    -d '{
      "custom_model": {
        "model_id": "test-mg-deploy-092301",
        "gcs_uri": "gs://YOUR_GCS_BUCKET"
      },
      "destination": "projects/YOUR_PROJECT/locations/us-central1",
      "endpoint_config": {
        "endpoint_display_name": "psc-ep1",
        "private_service_connect_config": {
          "enablePrivateServiceConnect": true,
          "projectAllowlist": ["YOUR_PROJECT"]
        }
      },
      "deploy_config": {
        "dedicated_resources": {
          "machine_spec": {
            "machine_type": "g2-standard-24",
            "accelerator_type": "NVIDIA_L4",
            "accelerator_count": 2
          },
          "min_replica_count": 1,
          "max_replica_count": 1
        }
      }
    }'
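
When the request body is built in code, the Private Service Connect settings can be added in one place. The following sketch adds an `endpoint_config` block with the same field names as the curl sample above to an existing body dict. The helper name `with_private_service_connect` is hypothetical, and rejecting an empty allowlist is an assumption of this sketch (an empty list would leave no project able to query the endpoint), not a documented API rule.

```python
import copy


def with_private_service_connect(body, allowed_projects, endpoint_display_name=None):
    """Return a copy of `body` with Private Service Connect enabled."""
    # Assumption of this sketch: at least one allowed project is required.
    if not allowed_projects:
        raise ValueError("Provide at least one project ID for the allowlist.")
    updated = copy.deepcopy(body)
    endpoint_config = updated.setdefault("endpoint_config", {})
    if endpoint_display_name:
        endpoint_config["endpoint_display_name"] = endpoint_display_name
    endpoint_config["private_service_connect_config"] = {
        "enablePrivateServiceConnect": True,
        "projectAllowlist": list(allowed_projects),
    }
    return updated


if __name__ == "__main__":
    base = {
        "custom_model": {"gcs_uri": "gs://YOUR_GCS_BUCKET"},
        "destination": "projects/YOUR_PROJECT/locations/us-central1",
    }
    print(with_private_service_connect(base, ["YOUR_PROJECT"], "psc-ep1"))
```

The function copies the input, so the same base body can be reused for a public deployment.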

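Get the endpoint ID and model ID by using the Google Cloud console

After the deployment completes, follow these steps:

In the Google Cloud console, go to the Model Garden page.

Go to Model Garden
Click View my endpoints and models.
In the My endpoints table, click the endpoint that you just deployed in the Name column. The Details page appears.
In the Deployed models table, click the model.
Select the Version details page. The model ID is displayed in the Environment variables row of the table.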

Set up Private Service Connect

You're adding a new endpoint to access Google APIs. This endpoint can be used in all regions of the VPC network that you select. Also consider the following:

Clients in networks that are connected to the endpoint's VPC network through hybrid connectivity can access the endpoint. For more information, see Access Google APIs through endpoints.
Clients in peered VPC networks can't access the endpoint.

List the endpoint to get the service attachment

This code sample shows how to list the endpoint to get its service attachment.

curl

curl \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/YOUR_PROJECT/locations/us-central1/endpoints/YOUR_ENDPOINT_ID"

The following is the list endpoint response.

  {
  "name": "projects/440968033208/locations/us-central1/endpoints/mg-endpoint-2c6ae2be-1491-43fe-b179-cb5a63e2c955",
  "displayName": "psc-ep1",
  "deployedModels": [
    {
      "id": "4026753529031950336",
      "model": "projects/440968033208/locations/us-central1/models/mg-custom-1758645924",
      "displayName": "null-null-null-1758645933",
      "createTime": "2025-09-23T16:45:45.169195Z",
      "dedicatedResources": {
        "machineSpec": {
          "machineType": "g2-standard-24",
          "acceleratorType": "NVIDIA_L4",
          "acceleratorCount": 2
        },
        "minReplicaCount": 1,
        "maxReplicaCount": 1
      },
      "enableContainerLogging": true,
      "privateEndpoints": {
        "serviceAttachment": "projects/qdb392d34e2a11149p-tp/regions/us-central1/serviceAttachments/gkedpm-fbbc4061323c91c14ab4d961a2f8b0"
      },
      "modelVersionId": "1",
      "status": {
        "lastUpdateTime": "2025-09-23T17:26:10.031652Z",
        "availableReplicaCount": 1
      }
    }
  ],
  "trafficSplit": {
    "4026753529031950336": 100
  },
  "etag": "AMEw9yPIWQYdbpHu6g6Mhpu1_10J062_oR9Jw9txrp8dFFbel7odLgSK8CGIogAUkR_r",
  "createTime": "2025-09-23T16:45:45.169195Z",
  "updateTime": "2025-09-23T17:13:36.320873Z",
  "privateServiceConnectConfig": {
    "enablePrivateServiceConnect": true,
    "projectAllowlist": [
      "ucaip-vpc-s-1605069239-dut-24"
    ]
  }
}
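
The service attachment is nested under each deployed model in the response. As a minimal sketch, the following function pulls every `serviceAttachment` value out of a parsed response with the shape shown above. The function name `service_attachments` is hypothetical, and the sample input is a trimmed copy of the response in this section.

```python
import json


def service_attachments(endpoint_response):
    """Collect the serviceAttachment of every deployed model in the response."""
    attachments = []
    for deployed in endpoint_response.get("deployedModels", []):
        attachment = deployed.get("privateEndpoints", {}).get("serviceAttachment")
        if attachment:
            attachments.append(attachment)
    return attachments


if __name__ == "__main__":
    # Trimmed copy of the sample response above.
    response_text = """
    {
      "displayName": "psc-ep1",
      "deployedModels": [
        {
          "id": "4026753529031950336",
          "privateEndpoints": {
            "serviceAttachment": "projects/qdb392d34e2a11149p-tp/regions/us-central1/serviceAttachments/gkedpm-fbbc4061323c91c14ab4d961a2f8b0"
          }
        }
      ]
    }
    """
    print(service_attachments(json.loads(response_text)))
```

An empty result usually means that the endpoint has no deployed model yet or isn't a Private Service Connect endpoint.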

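Create Private Service Connect

To create a Private Service Connect (PSC) endpoint, follow these steps:

In the Google Cloud console, go to the Private Service Connect page. The Connected endpoints page appears.

Go to Private Service Connect
Click + Connect endpoint. The Connect endpoint page appears.
In the Target field, select an option. You can choose Google APIs, which provides access to most Google APIs and services, or Published service, which provides access to a published service.
In the Target details section, select a value from the Scope list and a value from the Bundle type list.
In the Endpoint details section, do the following:
1. Enter a name in the Endpoint name field.
2. Select a value from the Network list. Select a VPC network that is located in your project. If you must create a PSC endpoint in a service project that uses a Shared VPC network in a host project, use the Google Cloud CLI or send an API request.
3. Select a value from the IP address list.
Expand the Service directory section.
Select a region from the Region list.
Select a namespace from the Namespace list.
Click Add endpoint. The Endpoints table is updated with a row for your new endpoint.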

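Make a query

This section explains how to send queries to public and private endpoints.

Make a query to a public endpoint

After your model is deployed, custom weights support the public dedicated endpoint. You can send queries by using the API or the SDK.

Before you send queries, you must get your endpoint URL, endpoint ID, and model ID, which are available in the Google Cloud console.

To get the information, follow these steps:

In the Google Cloud console, go to the Model Garden page.

Go to Model Garden
Click View my endpoints and models.
Select your region from the Region list.
To get the endpoint ID and endpoint URL, click your endpoint in the My endpoints section.

Your endpoint ID is displayed in the Endpoint ID field.

Your public endpoint URL is displayed in the Dedicated endpoint field.
To get the model ID, find your model in the Deployed models section, and then follow these steps:
1. In the Model field, click the name of your deployed model.
2. Click Version details. Your model ID is displayed in the Model ID field.

After you get your endpoint and deployed model information, see the following code samples to learn how to send an inference request, or see Send an online inference request to a dedicated public endpoint.

API

The following code samples show different ways to use the API, depending on your use case.

Chat completion (unary)

This sample request sends a complete chat message to the model and gets the response in a single chunk after the entire response is generated. This is similar to sending a text message and getting one full reply.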

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://${ENDPOINT_URL}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}/chat/completions" \
    -d '{
    "model": "'"${MODEL_ID}"'",
    "temperature": 0,
    "top_p": 1,
    "max_tokens": 154,
    "ignore_eos": true,
    "messages": [
      {
        "role": "user",
        "content": "How to tell the time by looking at the sky?"
      }
    ]
  }'

Chat completion (streaming)

This request is the streaming version of the unary chat completion request. When you add "stream": true to the request, the model sends its response piece by piece as it's generated. This is useful for creating a real-time, typewriter-like effect in a chat application.

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://${ENDPOINT_URL}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}/chat/completions" \
    -d '{
    "model": "'"${MODEL_ID}"'",
    "stream": true,
    "temperature": 0,
    "top_p": 1,
    "max_tokens": 154,
    "ignore_eos": true,
    "messages": [
      {
        "role": "user",
        "content": "How to tell the time by looking at the sky?"
      }
    ]
  }'
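
A client that consumes the streamed response has to stitch the pieces back together. The sketch below shows one way to do that, and it rests on an assumption that this page doesn't confirm: that the endpoint emits OpenAI-style server-sent events, where each line has the form `data: {json}`, the text sits in `choices[].delta.content`, and the stream ends with `data: [DONE]`. The function name `collect_stream_text` and the sample lines are hypothetical; verify the actual wire format against your endpoint before relying on it.

```python
import json


def collect_stream_text(lines):
    """Join the text deltas from OpenAI-style streaming lines (assumed format)."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # Skip blank keep-alive lines and anything that isn't data.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # Assumed end-of-stream marker.
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical stream contents, for illustration only.
    sample = [
        'data: {"choices": [{"delta": {"content": "Look "}}]}',
        "",
        'data: {"choices": [{"delta": {"content": "up."}}]}',
        "data: [DONE]",
    ]
    print(collect_stream_text(sample))
```

In a real client, you would pass the function an iterator over the response lines rather than a list.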

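Predict

This request sends a direct prompt to get an inference from the model. It's often used for tasks that aren't necessarily conversational, such as text summarization or classification.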

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
  "https://${ENDPOINT_URL}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}:predict" \
    -d '{
    "instances": [
      {
        "prompt": "How to tell the time by looking at the sky?",
        "temperature": 0,
        "top_p": 1,
        "max_tokens": 154,
        "ignore_eos": true
      }
    ]
  }'
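
The same predict call can be prepared from Python with only the standard library. The following sketch builds, but doesn't send, a `urllib.request.Request` with the URL pattern, headers, and body of the curl sample above. The function name `build_predict_request` is hypothetical, and the access token is passed in as a plain string that you obtain separately, for example from `gcloud auth print-access-token`.

```python
import json
import urllib.request


def build_predict_request(endpoint_url, project_id, location, endpoint_id,
                          prompt, access_token):
    """Build (without sending) the POST request for the :predict method."""
    url = (
        f"https://{endpoint_url}/v1beta1/projects/{project_id}"
        f"/locations/{location}/endpoints/{endpoint_id}:predict"
    )
    body = {
        "instances": [
            {
                "prompt": prompt,
                "temperature": 0,
                "top_p": 1,
                "max_tokens": 154,
                "ignore_eos": True,
            }
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_predict_request(
        "ENDPOINT_URL", "PROJECT_ID", "LOCATION", "ENDPOINT_ID",
        "How to tell the time by looking at the sky?", "ACCESS_TOKEN",
    )
    print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen(request)` and decode the JSON response body.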

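Raw predict

This request is the streaming version of the predict request. By using the :streamRawPredict endpoint and including "stream": true, this request sends a direct prompt and receives the model's output as a continuous stream of data as it's generated. This is similar to the streaming chat completion request.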

  curl -X POST \
    -N \
    --output - \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://${ENDPOINT_URL}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}:streamRawPredict" \
    -d '{
    "instances": [
      {
        "prompt": "How to tell the time by looking at the sky?",
        "temperature": 0,
        "top_p": 1,
        "max_tokens": 154,
        "ignore_eos": true,
        "stream": true
      }
    ]
  }'

SDK

This code sample uses the SDK to send a query to the model and get a response back.

  from google.cloud import aiplatform

  project_id = ""
  location = ""
  endpoint_id = "" # Use the short ID here

  aiplatform.init(project=project_id, location=location)

  endpoint = aiplatform.Endpoint(endpoint_id)

  prompt = "How to tell the time by looking at the sky?"
  instances=[{"text": prompt}]
  response = endpoint.predict(instances=instances, use_dedicated_endpoint=True)
  print(response.predictions)

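Make a query to a private endpoint

You can test your query by using a notebook or a VM in an allowed project.

In the following sample queries, replace the variables with your IP address, project ID, endpoint ID, and model ID (obtained from the deployment steps above).

curl

Chat completion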

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" 'https://YOUR_IP/v1beta1/projects/YOUR_PROJECT_ID/locations/YOUR_LOCATION/endpoints/YOUR_ENDPOINT_ID/chat/completions' -d '{ "model": "YOUR_MODEL_ID", "max_tokens": 300, "messages": [{ "role": "user", "content": "how to tell the time by looking at sky?" }]}'

Predict

curl -k -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" 'https://YOUR_IP/v1beta1/projects/YOUR_PROJECT_ID/locations/YOUR_LOCATION/endpoints/YOUR_ENDPOINT_ID:predict' -d '{
  "instances": [
    {
      "prompt": "Summarize Goog stock performance",
      "temperature": 0,
      "top_p": 1,
      "max_tokens": 154,
      "ignore_eos": true
    }
  ]
}'

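For another example of how to use the API, see the Import Custom Weights notebook.

Learn more about self-deployed models in Vertex AI

For more information about Vertex AI online prediction dedicated private endpoints, see Use dedicated private endpoints based on Private Service Connect for online inference.
For more information about self-deployed models, see Overview of self-deployed models.
For more information about Model Garden, see Overview of Model Garden.
For more information about deploying models, see Use models in Model Garden.
Use Gemma open models
Use Llama open models
Use Hugging Face open models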
