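Supervised fine-tuning and distillation fine-tuning for open models

This page describes how to perform supervised fine-tuning and distillation fine-tuning on open models such as Llama 3.1. Unless otherwise noted, the instructions on this page apply to both supervised fine-tuning and distillation fine-tuning. Distillation lets you use the output of a large teacher model to tune a smaller student model.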

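Supported tuning modes

Supervised fine-tuning:
- Full fine-tuning
- Low-Rank Adaptation (LoRA): LoRA is a parameter-efficient tuning mode that adjusts only a subset of parameters. It is more cost-effective than full fine-tuning and requires less training data. Full fine-tuning, by contrast, adjusts all parameters and can therefore achieve higher quality.
Distillation fine-tuning: Distillation fine-tuning uses the GenAI SDK. You specify a teacher model to generate responses, and those responses are then used to tune a smaller student model.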
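
To make the cost difference between full fine-tuning and LoRA concrete, the following sketch counts trainable weights for a single weight matrix. It relies on the standard LoRA construction, in which a rank-r adapter adds two small matrices of shape d_out x r and r x d_in. The layer size and rank are hypothetical values chosen for illustration; they are not settings used by the managed tuning service.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when a d_out x d_in matrix is updated directly."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a rank-r adapter: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)


# Hypothetical layer size and adapter rank, for illustration only.
d_in = d_out = 4096
rank = 16

full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
ratio = lora / full

print(f"full: {full:,} weights, LoRA: {lora:,} weights ({ratio:.2%} of full)")
```

Under these assumptions the adapter trains well under 1% of the weights of the matrix it modifies, which is why parameter-efficient tuning is cheaper than full fine-tuning.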

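Recommended use cases for distillation fine-tuning

Distillation fine-tuning is most effective when the teacher model significantly outperforms the student model on the target task. It is recommended for transferring complex, multi-step reasoning capabilities from a large teacher to a smaller student, for example:

Math and quantitative reasoning
Science, medicine, and other domain-specific question answering that requires step-by-step reasoning
Other tasks where a strong teacher model with "thinking" or chain-of-thought behavior consistently produces higher-quality answers than the student

Distillation yields smaller gains on tasks where the student model already performs close to the teacher, and on short-form retrieval tasks where the teacher's reasoning traces add no value.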

Supported models

Models that support supervised fine-tuning

Gemma 3 1B IT (google/gemma3@gemma-3-1b-it)
Gemma 3 4B IT (google/gemma3@gemma-3-4b-it)
Gemma 3 12B IT (google/gemma3@gemma-3-12b-it)
Gemma 3 27B IT (google/gemma3@gemma-3-27b-it)
Medgemma 1.5 4B IT (google/medgemma@medgemma-4b-it)
Llama 3.1 8B (meta/llama3_1@llama-3.1-8b)
Llama 3.1 8B Instruct (meta/llama3_1@llama-3.1-8b-instruct)
Llama 3.2 1B Instruct (meta/llama3-2@llama-3.2-1b-instruct)
Llama 3.2 3B Instruct (meta/llama3-2@llama-3.2-3b-instruct)
Llama 3.3 70B Instruct (meta/llama3-3@llama-3.3-70b-instruct)
Qwen 3 4B (qwen/qwen3@qwen3-4b)
Qwen 3 8B (qwen/qwen3@qwen3-8b)
Qwen 3 14B (qwen/qwen3@qwen3-14b)
Qwen 3 32B (qwen/qwen3@qwen3-32b)
Llama 4 Scout 17B 16E Instruct (meta/llama4@llama-4-scout-17b-16e-instruct)

Models that support distillation tuning

Supported teacher models:

DeepSeek R1 0528 MaaS (deepseek-ai/deepseek-r1-0528-maas)
DeepSeek V3.2 MaaS (deepseek-ai/deepseek-v3.2-maas)
Qwen 3 Next 80B A3B Thinking MaaS (qwen/qwen3-next-80b-a3b-thinking-maas)

Supported student models:

Qwen 3 4B (qwen/qwen3@qwen3-4b)
Qwen 3 8B (qwen/qwen3@qwen3-8b)
Qwen 3 14B (qwen/qwen3@qwen3-14b)
Qwen 3 32B (qwen/qwen3@qwen3-32b)
Gemma 3 1B IT (google/gemma3@gemma-3-1b-it)
Gemma 3 4B IT (google/gemma3@gemma-3-4b-it)
Gemma 3 12B IT (google/gemma3@gemma-3-12b-it)
Gemma 3 27B IT (google/gemma3@gemma-3-27b-it)

Supported regions

Iowa (us-central1)
Netherlands (europe-west4)
Oregon (us-west1)
Columbus (us-east5)
Singapore (asia-southeast1)

Limitations

Model	Specification	Value
Gemma 3 1B IT	Tuning modes	Full fine-tuning
	Maximum sequence length	8192
	Modality	Text
Gemma 3 4B IT	Tuning modes	Full fine-tuning
	Maximum sequence length	8192
	Modality	Text
Gemma 3 12B IT	Tuning modes	Full fine-tuning
	Maximum sequence length	8192
	Modality	Text
Gemma 3 27B IT	Tuning modes	Parameter-efficient fine-tuning, Full fine-tuning
	Maximum sequence length	8192
	Modality	Text
Medgemma 1.5 4B IT	Tuning modes	Full fine-tuning
	Maximum sequence length	8192
	Modality	Text
Llama 3.1 8B	Tuning modes	Parameter-efficient fine-tuning, Full fine-tuning
	Maximum sequence length	8192
	Modality	Text
Llama 3.1 8B Instruct	Tuning modes	Parameter-efficient fine-tuning, Full fine-tuning
	Maximum sequence length	8192
	Modality	Text
Llama 3.2 1B Instruct	Tuning modes	Full fine-tuning
	Maximum sequence length	8192
	Modality	Text
Llama 3.2 3B Instruct	Tuning modes	Full fine-tuning
	Maximum sequence length	8192
	Modality	Text
Llama 3.3 70B Instruct	Tuning modes	Parameter-efficient fine-tuning, Full fine-tuning
	Maximum sequence length	8192
	Modality	Text
Llama 4 Scout 17B 16E Instruct	Tuning modes	Parameter-efficient fine-tuning
	Maximum sequence length	2048
	Modality	Text, Images*
Qwen 3 4B	Tuning modes	Full fine-tuning
	Maximum sequence length	8192
	Modality	Text
Qwen 3 8B	Tuning modes	Full fine-tuning
	Maximum sequence length	8192
	Modality	Text
Qwen 3 14B	Tuning modes	Full fine-tuning
	Maximum sequence length	8192
	Modality	Text
Qwen 3 32B	Tuning modes	Parameter-efficient fine-tuning, Full fine-tuning
	Maximum sequence length	8192
	Modality	Text

* Datasets that mix text-only examples and image examples aren't supported. If the dataset contains at least one image example, all text-only examples are filtered out.
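
The limits above can be checked before a job is submitted. The following is a minimal sketch that encodes the table as a dictionary and reports conflicts for a planned job. The helper name `check_job` is hypothetical, and mapping "parameter-efficient fine-tuning" to the `PEFT_ADAPTER` tuning mode string is an assumption based on the SDK sample later on this page.

```python
FULL = "FULL"
PEFT = "PEFT_ADAPTER"  # assumed to correspond to parameter-efficient fine-tuning

# Model ID -> (allowed tuning modes, maximum sequence length), copied from the table.
LIMITS = {
    "google/gemma3@gemma-3-1b-it": ({FULL}, 8192),
    "google/gemma3@gemma-3-4b-it": ({FULL}, 8192),
    "google/gemma3@gemma-3-12b-it": ({FULL}, 8192),
    "google/gemma3@gemma-3-27b-it": ({FULL, PEFT}, 8192),
    "google/medgemma@medgemma-4b-it": ({FULL}, 8192),
    "meta/llama3_1@llama-3.1-8b": ({FULL, PEFT}, 8192),
    "meta/llama3_1@llama-3.1-8b-instruct": ({FULL, PEFT}, 8192),
    "meta/llama3-2@llama-3.2-1b-instruct": ({FULL}, 8192),
    "meta/llama3-2@llama-3.2-3b-instruct": ({FULL}, 8192),
    "meta/llama3-3@llama-3.3-70b-instruct": ({FULL, PEFT}, 8192),
    "meta/llama4@llama-4-scout-17b-16e-instruct": ({PEFT}, 2048),
    "qwen/qwen3@qwen3-4b": ({FULL}, 8192),
    "qwen/qwen3@qwen3-8b": ({FULL}, 8192),
    "qwen/qwen3@qwen3-14b": ({FULL}, 8192),
    "qwen/qwen3@qwen3-32b": ({FULL, PEFT}, 8192),
}


def check_job(model: str, tuning_mode: str, longest_example_tokens: int) -> list:
    """Return a list of conflicts with the limits table (empty if none)."""
    if model not in LIMITS:
        return [f"unsupported model: {model}"]
    modes, max_len = LIMITS[model]
    problems = []
    if tuning_mode not in modes:
        problems.append(f"{tuning_mode} is not listed for {model}; listed: {sorted(modes)}")
    if longest_example_tokens > max_len:
        problems.append(
            f"longest example has {longest_example_tokens} tokens; maximum is {max_len}"
        )
    return problems


print(check_job("meta/llama4@llama-4-scout-17b-16e-instruct", FULL, 4096))
```

An empty list means the planned job is consistent with the table. The check doesn't cover modality or dataset content.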

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Gemini Enterprise Agent Platform and Cloud Storage APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

Install the SDK and import the libraries for your tuning method.

Supervised fine-tuning

Install and initialize the Vertex AI SDK for Python, and then import the following libraries:

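import os
import time
import uuid

import vertexai
from google.cloud import aiplatform
from vertexai.tuning import sft, SourceModel

# Replace with your project ID and one of the supported regions.
PROJECT_ID = "your-project-id"
REGION = "us-central1"

vertexai.init(project=PROJECT_ID, location=REGION)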

Distillation tuning

Install the following SDK:

pip install google-genai

Then import the following libraries:

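import os
import time
import uuid

from google import genai
from google.genai import types

# Replace with your project ID and one of the supported regions.
PROJECT_ID = "your-project-id"
REGION = "us-central1"

client = genai.Client(vertexai=True, project=PROJECT_ID, location=REGION)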

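Prepare a dataset for tuning

Tuning requires a training dataset. If you want to evaluate the performance of the tuned model, we recommend that you also prepare an optional validation dataset.

The dataset must be in one of the following supported JSON Lines (JSONL) formats, where each line contains a single tuning example.

Upload the JSONL files to Cloud Storage.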
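
The upload step can be scripted. The following is a minimal sketch that assumes the `google-cloud-storage` client library is installed and that Application Default Credentials are configured. The helper names `split_gcs_uri` and `upload_jsonl`, and the bucket and object names in the usage note, are hypothetical.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/file' into (bucket, object_name)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"Not a Cloud Storage URI: {uri!r}")
    bucket, _, object_name = uri[len("gs://"):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must include a bucket and an object name: {uri!r}")
    return bucket, object_name


def upload_jsonl(local_path: str, gcs_uri: str) -> None:
    """Upload a local JSONL file to the given gs:// URI."""
    # Imported lazily so the URI helper works without the client library installed.
    from google.cloud import storage

    bucket_name, object_name = split_gcs_uri(gcs_uri)
    client = storage.Client()  # uses Application Default Credentials
    client.bucket(bucket_name).blob(object_name).upload_from_filename(local_path)
```

For example, `upload_jsonl("train.jsonl", "gs://my-bucket/tuning/train.jsonl")` would copy a local file to the URI that you later pass as the training dataset.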

Text-only datasets

Prompt completion

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
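
A prompt-completion dataset can be generated with the standard `json` module. The sketch below writes one JSON object per line and checks that each example has both keys. The helper names and the example records are hypothetical and only illustrate the format.

```python
import json


def to_jsonl(examples) -> str:
    """Serialize examples as JSON Lines: one JSON object per line."""
    lines = []
    for i, example in enumerate(examples):
        missing = {"prompt", "completion"} - set(example)
        if missing:
            raise ValueError(f"Example {i} is missing keys: {sorted(missing)}")
        # json.dumps escapes embedded newlines, so each example stays on one line.
        lines.append(json.dumps(example, ensure_ascii=False))
    return "\n".join(lines) + "\n"


def write_jsonl(path: str, examples) -> None:
    """Write the examples to a UTF-8 encoded JSONL file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(examples))


# Hypothetical examples, for illustration only.
examples = [
    {"prompt": "Translate to French: Good morning", "completion": "Bonjour"},
    {"prompt": "List two primes.\nAnswer briefly.", "completion": "2 and 3"},
]

jsonl_text = to_jsonl(examples)
print(jsonl_text, end="")
```

Calling `write_jsonl("train.jsonl", examples)` produces a file that is ready to upload to Cloud Storage.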

Turn-based chat format

{"messages": [
  {"content": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles.",
    "role": "system"},
  {"content": "Summarize the paper in one paragraph.",
    "role": "user"},
  {"content": " Here is a one paragraph summary of the paper:\n\nThe paper describes PaLM, ...",
    "role": "assistant"}
]}

GenerateContent

{
"systemInstruction": {
  "parts": [{ "text": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles." }]},
"contents": [
  {"role": "user",
    "parts": [{ "text": "Summarize the paper in one paragraph." }]},
  {"role": "assistant",
    "parts": [{ "text": "Here is a one paragraph summary of the paper:\n\nThe paper describes PaLM, ..." }]}
]}
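
The two text-only chat layouts carry the same information, so one can be derived from the other. The following sketch converts a turn-based chat example into the GenerateContent layout shown above. The function name is hypothetical, and role names are passed through unchanged to mirror the samples on this page.

```python
import json


def messages_to_generate_content(example: dict) -> dict:
    """Convert {"messages": [...]} into the GenerateContent layout."""
    system_parts = []
    contents = []
    for message in example["messages"]:
        part = {"text": message["content"]}
        if message["role"] == "system":
            # System turns move into the top-level systemInstruction field.
            system_parts.append(part)
        else:
            contents.append({"role": message["role"], "parts": [part]})
    converted = {}
    if system_parts:
        converted["systemInstruction"] = {"parts": system_parts}
    converted["contents"] = contents
    return converted


chat_example = {"messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the paper in one paragraph."},
    {"role": "assistant", "content": "Here is a one paragraph summary of the paper: ..."},
]}

# Each converted example occupies one line of the JSONL file.
print(json.dumps(messages_to_generate_content(chat_example)))
```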

Multimodal datasets

Turn-based chat format

{"messages": [
  {"role": "user", "content": [
    {"type": "text", "text": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles."},
    {"type": "image_url", "image_url": {
      "url": "gs://your-gcs-bucket/your-image.jpeg",
      "detail": "low"}}]
  },
  {"role": "assistant", "content": [
    {"type": "text", "text": "Here is a one paragraph summary of the paper:\n\nThe paper describes PaLM, ..."}]
  },
  {"role": "user", "content": [
    {"type": "text", "text": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles."},
    {"type": "image_url", "image_url": {
      "url": "data:image/jpeg;base64,<base64 image>",
      "detail": "low"}}]
  },
  {"role": "assistant", "content": [
    {"type": "text", "text": "Here is a one paragraph summary of the paper:\n\nThe paper describes PaLM, ..."}]
  }
]}

GenerateContent

{
"systemInstruction": {
  "parts": [{ "text": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles." }]},
"contents": [
  {"role": "user",
    "parts": [
      {"text": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles." },
      {"file_data": {
        "mime_type": "image/jpeg", "file_uri": "gs://your-gcs-bucket/your-image.jpeg"}}]
  },
  {"role": "assistant",
    "parts": [{ "text": "Here is a one paragraph summary of the paper:\n\nThe paper describes PaLM, ..." }]}
]}

Supported formats include JPEG, PNG, WEBP, and Base64-encoded images.
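
An inline image can be embedded as a Base64 data URL, as in the second user turn of the turn-based chat sample. The sketch below builds such a URL and wraps it in a user turn. The helper names are hypothetical, and the MIME type list is an assumption derived from the formats named above.

```python
import base64

# Assumed MIME types for the JPEG, PNG, and WEBP formats named above.
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png", "image/webp"}


def image_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported image type: {mime_type}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def user_turn(text: str, image_url: str) -> dict:
    """Build a user message with one text part and one image part."""
    return {"role": "user", "content": [
        {"type": "text", "text": text},
        {"type": "image_url", "image_url": {"url": image_url, "detail": "low"}},
    ]}


# Placeholder bytes stand in for the contents of a real image file.
turn = user_turn("Describe this figure.", image_data_url(b"abc", "image/png"))
print(turn["content"][1]["image_url"]["url"])
```

The same `user_turn` helper accepts a `gs://` URI in place of the data URL when the image is stored in Cloud Storage.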

If your images are stored in a different Cloud Storage bucket than your JSONL file, make sure that the following two service accounts are granted the Storage Object User (roles/storage.objectUser) IAM role on both buckets:

service-PROJECT_NUMBER@gcp-sa-vertex-moss-ft.iam.gserviceaccount.com
service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com
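
When you grant the role with a script, the two account names can be derived from the project number. This is a small sketch; the helper names and the project number are hypothetical, and the `serviceAccount:` prefix is the usual form for a service account principal in an IAM policy binding.

```python
ROLE = "roles/storage.objectUser"


def tuning_service_accounts(project_number: str) -> list:
    """Return the two service account emails for the given project number."""
    return [
        f"service-{project_number}@gcp-sa-vertex-moss-ft.iam.gserviceaccount.com",
        f"service-{project_number}@gcp-sa-aiplatform.iam.gserviceaccount.com",
    ]


def iam_members(project_number: str) -> list:
    """Format the accounts as IAM policy members."""
    return [f"serviceAccount:{email}" for email in tuning_service_accounts(project_number)]


# Hypothetical project number, for illustration only.
for member in iam_members("123456789012"):
    print(f"{member} needs {ROLE} on both buckets")
```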

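Create a tuning job

You can tune the following:

A supported base model, such as Llama 3.1
A model that has the same architecture as one of the supported base models. This can be either a custom model checkpoint from a repository such as Hugging Face, or a model that was previously tuned by a Gemini Enterprise Agent Platform tuning job. The latter lets you continue tuning a model that has already been tuned.

Cloud Console (managed)

You can start fine-tuning in either of the following ways:
- Go to the model card, click [Fine-tune], and then select [Managed tuning].
Go to the Llama 3.1 model card

or
- Go to the [Tuning] page and click [Create tuned model].
Go to Tuning
Enter the parameters, and then click [Start tuning].

This starts a tuning job, which you can monitor on the [Tuning] page under the [Managed tuning] tab.

After the tuning job finishes, you can view information about the tuned model on the [Details] tab.

Agent Platform SDK (supervised)

Replace the parameter values with your own, and then run the following code to create a tuning job: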

sft_tuning_job = sft.train(
    source_model=SourceModel(
      base_model="meta/llama3_1@llama-3.1-8b",
      # Optional, folder that is either a custom model checkpoint or previously tuned model
      custom_base_model="gs://{STORAGE-URI}",
    ),
    tuning_mode="FULL", # FULL or PEFT_ADAPTER
    epochs=3,
    train_dataset="gs://{STORAGE-URI}", # JSONL file
    validation_dataset="gs://{STORAGE-URI}", # JSONL file
    output_uri="gs://{STORAGE-URI}",
)
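
Tuning jobs run asynchronously, so a script usually waits for the job to finish before it reads the output directory. The following is a minimal polling sketch. It assumes that the returned job object exposes a `has_ended` property and a `refresh()` method, as tuning job objects in the Vertex AI SDK for Python do; verify this against the SDK version that you use. The `wait_for_job` helper is hypothetical.

```python
import time


def wait_for_job(job, poll_seconds: float = 60, sleep=time.sleep):
    """Poll a tuning job until it reports that it has ended.

    Assumes `job` has a `has_ended` property and a `refresh()` method.
    The `sleep` parameter exists so that the wait can be replaced in tests.
    """
    while not job.has_ended:
        sleep(poll_seconds)
        job.refresh()  # fetch the latest job state from the service
    return job
```

For example, `wait_for_job(sft_tuning_job)` blocks until the job created above reaches a terminal state. The helper doesn't distinguish success from failure, so check the job state afterward.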

GenAI SDK (distillation)

Replace the parameter values with your own, and then run the following code to create a distillation tuning job:

tuning_job = client.tunings.tune(
    base_model="qwen/qwen3@qwen3-4b",
    training_dataset=types.TuningDataset(
        gcs_uri="gs://{STORAGE-URI}"
    ),
    config=types.CreateTuningJobConfig(
        method="DISTILLATION",
        base_teacher_model="qwen/qwen3-next-80b-a3b-thinking-maas",
        epoch_count=3,
        validation_dataset=types.TuningValidationDataset(
            gcs_uri="gs://{STORAGE-URI}"
        ),
        output_uri="gs://{STORAGE-URI}",
    ),
)

Tuned model artifacts

After the tuning job finishes, the model artifacts for the tuned model are stored in the Cloud Storage output directory.

gs://<output_dir>/
    # (Distillation tuning only) The labeled dataset from teacher model's inference
    -> distillation_labelled_dataset.jsonl

gs://<output_dir>/postprocess/node-0/checkpoints/
    # Final checkpoint
    -> final/
        -> model-00001-of-000xx.safetensors
        -> model-000yy-of-000xx.safetensors

    # Intermediate checkpoints
    -> checkpoint-M/
        -> model-00001-of-000xx.safetensors
        -> model-000yy-of-000xx.safetensors
    …
    -> checkpoint-N/
        -> model-00001-of-000xx.safetensors
        -> model-000yy-of-000xx.safetensors

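At most 10 checkpoints are saved.
If the number of epochs (E) is less than 10, E checkpoints are saved, one per epoch.
Intermediate checkpoints in the range M to N are ordered, but they aren't necessarily numbered consecutively. For example, checkpoints might be numbered 1, 3, 5, 10 instead of 1, 2, 3, 4.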
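
The directory layout and the checkpoint rules above can be expressed as two small helpers. This is a sketch: the function names are hypothetical, and for 10 or more epochs the count is only an upper bound, because the page states a maximum without saying which checkpoints are kept.

```python
MAX_CHECKPOINTS = 10
CHECKPOINT_ROOT = "postprocess/node-0/checkpoints"


def max_checkpoint_count(epochs: int) -> int:
    """Upper bound on saved checkpoints: one per epoch, capped at 10."""
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    return min(epochs, MAX_CHECKPOINTS)


def final_checkpoint_uri(output_uri: str) -> str:
    """Location of the final checkpoint under the job's output directory."""
    return f"{output_uri.rstrip('/')}/{CHECKPOINT_ROOT}/final"


# Hypothetical output directory, for illustration only.
print(max_checkpoint_count(3))
print(final_checkpoint_uri("gs://my-bucket/tuning-output/"))
```

The URI returned by `final_checkpoint_uri` is the directory to use when you deploy the tuned model.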

Deploy the tuned model

You can deploy the tuned model to a Gemini Enterprise Agent Platform endpoint. You can also export the tuned model from Cloud Storage and deploy it elsewhere.

To deploy the tuned model to a Gemini Enterprise Agent Platform endpoint:

Cloud Console

Go to the [Model Garden] page and click [Deploy model with custom weights].

Go to Model Garden
Enter the parameters, and then click [Deploy].

Agent Platform SDK for Python

Deploy the model on a G2 machine by using a prebuilt container.

from vertexai.preview import model_garden

MODEL_ARTIFACTS_STORAGE_URI = "gs://{STORAGE-URI}/postprocess/node-0/checkpoints/final"

model = model_garden.CustomModel(
    gcs_uri=MODEL_ARTIFACTS_STORAGE_URI,
)

# Deploy the model to a GPU-backed endpoint. The deployment incurs charges.
endpoint = model.deploy(
  machine_type="g2-standard-12",
  accelerator_type="NVIDIA_L4",
  accelerator_count=1,
)

Get inferences

After the deployment succeeds, you can send requests to the endpoint with text prompts. The first few prompts take longer to run.

# Loads the deployed endpoint
endpoint = aiplatform.Endpoint("projects/{PROJECT_ID}/locations/{REGION}/endpoints/{endpoint_name}")

prompt = "Summarize the following article. Article: Preparing a perfect risotto requires patience and attention to detail. Begin by heating butter in a large, heavy-bottomed pot over medium heat. Add finely chopped onions and minced garlic to the pot, and cook until they're soft and translucent, about 5 minutes. Next, add Arborio rice to the pot and cook, stirring constantly, until the grains are coated with the butter and begin to toast slightly. Pour in a splash of white wine and cook until it's absorbed. From there, gradually add hot chicken or vegetable broth to the rice, stirring frequently, until the risotto is creamy and the rice is tender with a slight bite. Summary:"

# Define input to the prediction call
instances = [
    {
        "prompt": prompt,
        "max_tokens": 200,
        "temperature": 1.0,
        "top_p": 1.0,
        "top_k": 1,
        "raw_response": True,
    },
]

# Request the prediction
response = endpoint.predict(
    instances=instances
)

for prediction in response.predictions:
    print(prediction)

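For more information about getting inferences from a deployed model, see Get online inferences.

Managed open models use the chat.completions method rather than the predict method that deployed models use. For information about getting inferences from managed models, see Call a Llama model.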

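Limits and quotas

Quota is enforced on the number of concurrent tuning jobs. Every project comes with a default quota to run at least one tuning job. This is a global quota, shared across all available regions and supported models. To run more jobs concurrently, you must request additional quota for Global concurrent managed OSS model fine-tuning jobs per project.

In addition to the tuning job quota, distillation fine-tuning also uses the teacher model. Your project must have sufficient quota for the specified teacher model. Open models that are offered as a service (MaaS) use dynamic shared quota. When a tuning job calls the teacher model, it consumes your project's shared quota for that model. For more information about quotas for managed open models, see Gemini Enterprise Agent Platform managed models for MaaS.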

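Pricing

Tuning is billed according to model tuning pricing. The number of training tokens is calculated by multiplying the number of tokens in the training dataset by the number of epochs. For distillation tuning, you're also billed, at managed model pricing, for the API calls made to the teacher model to generate responses.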
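
The training token calculation is a single multiplication. The sketch below applies it to a hypothetical dataset size; it doesn't include any prices, which you should take from the pricing page.

```python
def billed_training_tokens(dataset_tokens: int, epochs: int) -> int:
    """Training tokens = tokens in the training dataset x number of epochs."""
    if dataset_tokens < 0 or epochs < 1:
        raise ValueError("dataset_tokens must be >= 0 and epochs must be >= 1")
    return dataset_tokens * epochs


# Hypothetical dataset of 2,000,000 tokens tuned for 3 epochs.
total = billed_training_tokens(2_000_000, 3)
print(f"{total:,} training tokens")
```

For distillation tuning, the tokens consumed by calls to the teacher model are billed separately and aren't part of this figure.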

You're also charged for related services, such as Cloud Storage and Gemini Enterprise Agent Platform Prediction.

See Gemini Enterprise Agent Platform pricing and Cloud Storage pricing. You can also use the pricing calculator to estimate costs based on your projected usage.

What's next

Evaluate a tuned model
