Tune open models

This page describes how to perform supervised fine-tuning on open models such as Llama 3.1.

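Supported tuning modes

  • Full fine-tuning: Updates all of the model's parameters, which gives it a higher quality potential.

  • Low-Rank Adaptation (LoRA): A parameter-efficient tuning mode that updates only a subset of parameters. Compared with full fine-tuning, it is more cost-efficient and requires less training data.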
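
To make the cost difference concrete, the following sketch compares trainable parameter counts for a single weight matrix. It assumes the standard LoRA formulation, in which a frozen d_out x d_in matrix W receives a trainable low-rank update B @ A of rank r. The matrix sizes and the rank are illustrative values, not settings taken from this service or from any supported model.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a low-rank update B @ A,
    where B is d_out x rank and A is rank x d_in."""
    return rank * (d_out + d_in)


# Illustrative sizes only (assumption, not a property of a supported model).
d_out, d_in, rank = 4096, 4096, 16
full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}): {lora:,} trainable parameters ({lora / full:.2%} of full)")
```

Because the LoRA count grows with the rank times the sum of the two dimensions rather than with their product, the trainable share stays small for large matrices.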

Supported models

  • Gemma 3 27B IT (google/gemma-3-27b-it)
  • Llama 3.1 8B (meta/llama3_1@llama-3.1-8b)
  • Llama 3.1 8B Instruct (meta/llama3_1@llama-3.1-8b-instruct)
  • Llama 3.2 1B Instruct (meta/llama3-2@llama-3.2-1b-instruct)
  • Llama 3.2 3B Instruct (meta/llama3-2@llama-3.2-3b-instruct)
  • Llama 3.3 70B Instruct (meta/llama3-3@llama-3.3-70b-instruct)
  • Qwen 3 32B (qwen/qwen3@qwen3-32b)
  • Llama 4 Scout 17B 16E Instruct (meta/llama4@llama-4-scout-17b-16e-instruct)

Supported regions

  • Iowa (us-central1)
  • Netherlands (europe-west4)

Limitations

Model                            Tuning modes                Maximum sequence length                   Modalities
Gemma 3 27B IT                   Parameter-efficient, full   8192                                      Text
Llama 3.1 8B                     Parameter-efficient, full   4096 (parameter-efficient), 8192 (full)   Text
Llama 3.1 8B Instruct            Parameter-efficient, full   4096 (parameter-efficient), 8192 (full)   Text
Llama 3.2 1B Instruct            Full                        8192                                      Text
Llama 3.2 3B Instruct            Full                        8192                                      Text
Llama 3.3 70B Instruct           Parameter-efficient, full   4096 (parameter-efficient), 8192 (full)   Text
Llama 4 Scout 17B 16E Instruct   Parameter-efficient         2048                                      Text, images*
Qwen 3 32B                       Parameter-efficient, full   8192                                      Text

*Mixed datasets of text-only and image examples are not supported. If a dataset contains at least one image example, all text-only examples are filtered out.
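
The limits listed above can be checked before a job is submitted. The sketch below transcribes them into a lookup table. The dictionary layout and the function name are illustrative and are not part of the Vertex AI SDK. It also assumes that parameter-efficient fine-tuning corresponds to the PEFT_ADAPTER tuning mode and full fine-tuning to FULL, the two values that appear in the SDK sample later on this page.

```python
# Limits transcribed from this section. The structure is illustrative only.
MAX_SEQUENCE_LENGTH = {
    "Gemma 3 27B IT": {"PEFT_ADAPTER": 8192, "FULL": 8192},
    "Llama 3.1 8B": {"PEFT_ADAPTER": 4096, "FULL": 8192},
    "Llama 3.1 8B Instruct": {"PEFT_ADAPTER": 4096, "FULL": 8192},
    "Llama 3.2 1B Instruct": {"FULL": 8192},
    "Llama 3.2 3B Instruct": {"FULL": 8192},
    "Llama 3.3 70B Instruct": {"PEFT_ADAPTER": 4096, "FULL": 8192},
    "Llama 4 Scout 17B 16E Instruct": {"PEFT_ADAPTER": 2048},
    "Qwen 3 32B": {"PEFT_ADAPTER": 8192, "FULL": 8192},
}


def max_sequence_length(model: str, tuning_mode: str) -> int:
    """Return the sequence length limit, or raise if the mode is unsupported."""
    modes = MAX_SEQUENCE_LENGTH[model]
    if tuning_mode not in modes:
        supported = ", ".join(sorted(modes))
        raise ValueError(f"{model} supports only: {supported}")
    return modes[tuning_mode]


print(max_sequence_length("Llama 3.1 8B", "PEFT_ADAPTER"))
```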

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI and Cloud Storage APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the APIs

  5. Install and initialize the Vertex AI SDK for Python.
  6. Import the following libraries and initialize the SDK:
    import os
    import time
    import uuid

    import vertexai
    from google.cloud import aiplatform
    from vertexai.tuning import sft, SourceModel

    # Replace these placeholder values with your project ID and a supported region.
    PROJECT_ID = "your-project-id"
    REGION = "us-central1"

    vertexai.init(project=PROJECT_ID, location=REGION)

Prepare a dataset for tuning

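    To fine-tune a model, you must provide a training dataset. We also recommend that you prepare an optional validation dataset to evaluate the performance of the tuned model.

    Your dataset must be in one of the following supported JSON Lines (JSONL) formats, where each line contains a single tuning example. The multi-line examples below are wrapped for readability; in the actual file, each example occupies one line.

    Upload your JSONL files to Cloud Storage.

    Text-only dataset

    Prompt completion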

    {"prompt": "<prompt text>", "completion": "<ideal generated text>"}
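
    As a minimal sketch of producing a file in the prompt completion format, the following code writes one JSON object per line using only the Python standard library. The example records, the file name, and the write_jsonl helper are hypothetical. Uploading the file to Cloud Storage is a separate step that is not shown.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical examples; replace them with your own prompt/completion pairs.
examples = [
    {"prompt": "Translate to French: Hello", "completion": "Bonjour"},
    {"prompt": "Translate to French: Thank you", "completion": "Merci"},
]


def write_jsonl(records, path):
    """Write one JSON object per line, as the JSONL format requires."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # json.dumps escapes embedded newlines, so each record stays on one line.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


output_path = Path(tempfile.gettempdir()) / "train.jsonl"
write_jsonl(examples, output_path)

lines = output_path.read_text(encoding="utf-8").splitlines()
print(f"wrote {len(lines)} examples to {output_path}")
```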
    

    Turn-based chat format

    {"messages": [
      {"content": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles.",
        "role": "system"},
      {"content": "Summarize the paper in one paragraph.",
        "role": "user"},
      {"content": " Here is a one paragraph summary of the paper:\n\nThe paper describes PaLM, ...",
        "role": "assistant"}
    ]}
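
    Before uploading a chat-format dataset, it can help to check each line locally. The sketch below is one possible validator, not the service's own validation logic. The rules it applies (allowed roles, a required "content" field, and a final assistant turn) are assumptions inferred from the example above.

```python
import json

# Roles seen in the example above; treating this set as exhaustive is an assumption.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_example(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        example = json.loads(line)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]

    messages = example.get("messages") if isinstance(example, dict) else None
    if not isinstance(messages, list) or not messages:
        return ['missing or empty "messages" list']

    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i}: not a JSON object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {message.get('role')!r}")
        if "content" not in message:
            problems.append(f'message {i}: missing "content"')

    last = messages[-1]
    if not isinstance(last, dict) or last.get("role") != "assistant":
        problems.append("last message should come from the assistant")
    return problems


sample = json.dumps({"messages": [
    {"role": "user", "content": "Summarize the paper in one paragraph."},
    {"role": "assistant", "content": "Here is a one paragraph summary ..."},
]})
print(validate_chat_example(sample))
```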
    

    GenerateContent

    {
    "systemInstruction": {
      "parts": [{ "text": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles." }]},
    "contents": [
      {"role": "user",
        "parts": [{ "text": "Summarize the paper in one paragraph." }]},
      {"role": "assistant",
        "parts": [{ "text": "Here is a one paragraph summary of the paper:\n\nThe paper describes PaLM, ..." }]}
    ]}
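
    The two text-only layouts carry the same information, so one can be derived from the other. The following sketch converts a turn-based chat example into the GenerateContent layout shown above. The function name is hypothetical, the role names are copied through unchanged as in the examples on this page, and only plain-string message content is handled.

```python
import json


def messages_to_generate_content(example: dict) -> dict:
    """Convert a text-only chat example into the GenerateContent layout."""
    system_instruction = None
    contents = []
    for message in example["messages"]:
        part = {"text": message["content"]}
        if message["role"] == "system":
            # The system message becomes the top-level systemInstruction.
            system_instruction = {"parts": [part]}
        else:
            contents.append({"role": message["role"], "parts": [part]})

    converted = {}
    if system_instruction is not None:
        converted["systemInstruction"] = system_instruction
    converted["contents"] = contents
    return converted


chat_example = {"messages": [
    {"role": "system", "content": "You are a helpful chatbot."},
    {"role": "user", "content": "Summarize the paper in one paragraph."},
    {"role": "assistant", "content": "Here is a one paragraph summary ..."},
]}
converted_example = messages_to_generate_content(chat_example)
print(json.dumps(converted_example))
```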
    

    Multimodal dataset

    Turn-based chat format

    {"messages": [
      {"role": "user", "content": [
        {"type": "text", "text": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles."},
        {"type": "image_url", "image_url": {
          "url": "gs://your-gcs-bucket/your-image.jpeg",
          "detail": "low"}}]
      },
      {"role": "assistant", "content": [
        {"type": "text", "text": "Here is a one paragraph summary of the paper:\n\nThe paper describes PaLM, ..."}]
      },
      {"role": "user", "content": [
        {"type": "text", "text": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles."},
        {"type": "image_url", "image_url": {
          "url": "data:image/jpeg;base64,<base64 image>",
          "detail": "low"}}]
      },
      {"role": "assistant", "content": [
        {"type": "text", "text": "Here is a one paragraph summary of the paper:\n\nThe paper describes PaLM, ..."}]
      }
    ]}
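
    The second user turn above embeds the image inline as a Base64 data URL. As a sketch of how such a value might be built, the following code encodes raw image bytes with the standard library and wraps them in the image_url structure from the example. The helper names are hypothetical, and the bytes used here are a stand-in rather than a real image.

```python
import base64


def image_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL of the form used in the example."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_part(url: str, detail: str = "low") -> dict:
    """Build an image content part for a chat message."""
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


# Stand-in bytes for illustration; in practice, read them from an image file,
# for example with pathlib.Path("your-image.jpeg").read_bytes().
fake_image_bytes = b"abc"
part = image_part(image_data_url(fake_image_bytes))
print(part)
```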
    

    GenerateContent

    {
    "systemInstruction": {
      "parts": [{ "text": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles." }]},
    "contents": [
      {"role": "user",
        "parts": [
          {"text": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles." },
          {"file_data": {
            "mime_type": "image/jpeg", "file_uri": "gs://your-gcs-bucket/your-image.jpeg"}}]
      },
      {"role": "assistant",
        "parts": [{ "text": "Here is a one paragraph summary of the paper:\n\nThe paper describes PaLM, ..." }]}
    ]}
    

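    Supported image formats include JPEG, PNG, and WEBP, as well as Base64-encoded images.

    Note that if your images are stored in a different Cloud Storage bucket than your JSONL file, you must grant the Storage Object User (roles/storage.objectUser) IAM role on both buckets to the following two service accounts: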

    • service-PROJECT_NUMBER@gcp-sa-vertex-moss-ft.iam.gserviceaccount.com
    • service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com
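
    Two buckets and two service accounts mean four role bindings in total. The sketch below generates the corresponding commands as text so they can be reviewed before being run. It assumes the gcloud CLI is installed and that the gcloud storage buckets add-iam-policy-binding command is used to grant the role. The project number and bucket names are placeholders.

```python
ROLE = "roles/storage.objectUser"


def tuning_service_accounts(project_number: str) -> list[str]:
    """Return the two service accounts named above for a given project number."""
    return [
        f"service-{project_number}@gcp-sa-vertex-moss-ft.iam.gserviceaccount.com",
        f"service-{project_number}@gcp-sa-aiplatform.iam.gserviceaccount.com",
    ]


def grant_commands(project_number: str, buckets: list[str]) -> list[str]:
    """Build one gcloud command per (bucket, service account) pair."""
    commands = []
    for bucket in buckets:
        for account in tuning_service_accounts(project_number):
            commands.append(
                f"gcloud storage buckets add-iam-policy-binding gs://{bucket} "
                f"--member=serviceAccount:{account} --role={ROLE}"
            )
    return commands


# Placeholder project number and bucket names (assumptions for illustration).
commands = grant_commands("123456789012", ["my-jsonl-bucket", "my-image-bucket"])
for command in commands:
    print(command)
```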

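Create a tuning job

    You can tune from either of the following:

    • A supported base model, such as Llama 3.1.
    • A model that has the same architecture as one of the supported base models. This can be either a custom model checkpoint from a repository such as Hugging Face, or a previously tuned model from a Vertex AI tuning job. This lets you continue tuning a model that has already been tuned.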

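    Cloud Console

    1. You can start tuning in either of the following ways:

      • Go to the model card, click Fine tune, and then choose Managed tuning.

      Go to the Llama 3.1 model card

      • Go to the Tuning page and click Create tuned model.

      Go to Tuning

    2. Fill out the parameters and click Start tuning.

    This starts a tuning job, which you can see on the Tuning page under the Managed tuning tab.

    After the tuning job finishes, you can view information about the tuned model in the Details tab.

    Vertex AI SDK for Python

    Replace the parameter values with your own, and then run the following code to create a tuning job: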

    sft_tuning_job = sft.train(
        source_model=SourceModel(
            base_model="meta/llama3_1@llama-3.1-8b",
            # Optional: folder containing either a custom model checkpoint or a previously tuned model.
            custom_base_model="gs://{STORAGE-URI}",
        ),
        tuning_mode="FULL",  # "FULL" or "PEFT_ADAPTER"
        epochs=3,
        train_dataset="gs://{STORAGE-URI}",  # JSONL file
        validation_dataset="gs://{STORAGE-URI}",  # JSONL file
        output_uri="gs://{STORAGE-URI}",
    )
    

    After the job finishes, the model artifacts for the tuned model are stored in the <output_uri>/postprocess/node-0/checkpoints/final folder.
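
    Because the artifact location is derived from the output URI, it can be computed rather than typed by hand. The helper below is a hypothetical convenience function, not part of the SDK. It appends the folder suffix stated above, and the bucket path is a placeholder.

```python
# Suffix taken from the artifact location described on this page.
FINAL_CHECKPOINT_SUFFIX = "postprocess/node-0/checkpoints/final"


def final_checkpoint_uri(output_uri: str) -> str:
    """Return the folder that holds the tuned model artifacts."""
    # Strip any trailing slash so the result never contains "//".
    return f"{output_uri.rstrip('/')}/{FINAL_CHECKPOINT_SUFFIX}"


# Placeholder output location (assumption for illustration).
artifacts_uri = final_checkpoint_uri("gs://my-bucket/tuning-output/")
print(artifacts_uri)
```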

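Deploy the tuned model

    You can deploy the tuned model to a Vertex AI endpoint. You can also export the tuned model from Cloud Storage and deploy it elsewhere.

    To deploy the tuned model to a Vertex AI endpoint, do the following:

    Cloud Console

    1. Go to the Model Garden page and click Deploy model with custom weights.

    Go to Model Garden

    2. Fill out the parameters and click Deploy.

    Vertex AI SDK for Python

    Deploy to a G2 machine using a prebuilt container: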

    from vertexai.preview import model_garden
    
    MODEL_ARTIFACTS_STORAGE_URI = "gs://{STORAGE-URI}/postprocess/node-0/checkpoints/final"
    
    model = model_garden.CustomModel(
        gcs_uri=MODEL_ARTIFACTS_STORAGE_URI,
    )
    
    # Deploy the model to a GPU-backed endpoint. The deployment incurs costs.
    endpoint = model.deploy(
        machine_type="g2-standard-12",
        accelerator_type="NVIDIA_L4",
        accelerator_count=1,
    )
    

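Get an inference

    After the deployment succeeds, you can send requests to the endpoint with text prompts. Note that the first few prompts take longer to run.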

    # Loads the deployed endpoint
    endpoint = aiplatform.Endpoint("projects/{PROJECT_ID}/locations/{REGION}/endpoints/{endpoint_name}")
    
    prompt = "Summarize the following article. Article: Preparing a perfect risotto requires patience and attention to detail. Begin by heating butter in a large, heavy-bottomed pot over medium heat. Add finely chopped onions and minced garlic to the pot, and cook until they're soft and translucent, about 5 minutes. Next, add Arborio rice to the pot and cook, stirring constantly, until the grains are coated with the butter and begin to toast slightly. Pour in a splash of white wine and cook until it's absorbed. From there, gradually add hot chicken or vegetable broth to the rice, stirring frequently, until the risotto is creamy and the rice is tender with a slight bite. Summary:"
    
    # Define input to the prediction call
    instances = [
        {
            "prompt": prompt,
            "max_tokens": 200,
            "temperature": 1.0,
            "top_p": 1.0,
            "top_k": 1,
            "raw_response": True,
        },
    ]
    
    # Request the prediction
    response = endpoint.predict(
        instances=instances
    )
    
    for prediction in response.predictions:
        print(prediction)
    

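    To learn more about getting inferences from a deployed model, see Get online inferences.

    Note that managed open models use the chat.completions method rather than the predict method that deployed models use. To learn more about getting inferences from managed models, see Call a Llama model.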
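
    The difference between the two methods is mostly one of request shape. The sketch below reshapes a predict-style instance, like the one in the sample above, into a chat-style request body. It assumes that chat.completions accepts OpenAI-style fields (messages, max_tokens, temperature, top_p). The function name is hypothetical, and the model identifier that such a request also needs is left out because this page does not specify it.

```python
def to_chat_request(instance: dict) -> dict:
    """Reshape a predict-style instance into a chat-style request body.

    Assumes OpenAI-style field names; fields without an assumed equivalent
    (top_k, raw_response) are dropped.
    """
    body = {"messages": [{"role": "user", "content": instance["prompt"]}]}
    for field in ("max_tokens", "temperature", "top_p"):
        if field in instance:
            body[field] = instance[field]
    return body


predict_instance = {
    "prompt": "What is a car?",
    "max_tokens": 200,
    "temperature": 1.0,
    "top_p": 1.0,
    "top_k": 1,
    "raw_response": True,
}
chat_body = to_chat_request(predict_instance)
print(chat_body)
```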

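Limits and quota

    Quota is enforced on the number of concurrent tuning jobs. Every project comes with a default quota to run at least one tuning job. This is a global quota, shared across all available regions and supported models. To run more jobs concurrently, request additional quota for Global concurrent managed OSS model fine-tuning jobs per project.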

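Pricing

    You are billed for tuning based on the pricing for model tuning.

    You are also billed for related services, such as Cloud Storage and Vertex AI Prediction.

    Learn about Vertex AI pricing and Cloud Storage pricing, and use the Pricing Calculator to generate a cost estimate based on your projected usage.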

What's next