Cloud TPU v5e 上的 SAX
SAX 叢集 (SAX 單元)
SAX 管理伺服器和 SAX 模型伺服器是執行 SAX 叢集的兩個必要元件。
SAX 管理伺服器
SAX 管理伺服器會監控並協調 SAX 叢集中的所有 SAX 模型伺服器。在 SAX 叢集中,您可以啟動多個 SAX 管理伺服器,其中只有一個 SAX 管理伺服器會透過領導者選舉啟用,其他則為待命伺服器。當主管理伺服器發生故障時,備用管理伺服器就會啟用。主動 SAX 管理伺服器會將模型副本和推論要求指派給可用的 SAX 模型伺服器。
SAX 管理員儲存空間值區
每個 SAX 叢集都需要一個 Cloud Storage 值區,用於儲存 SAX 叢集中 SAX 管理伺服器和 SAX 模型伺服器的設定和位置。
SAX 模型伺服器
SAX 模型伺服器會載入模型查核點,並透過 GSPMD 執行推論。SAX 模型伺服器會在單一 TPU VM 工作站上執行。單主機 TPU 模型服務需要單一主機 TPU VM 上的單一 SAX 模型伺服器。多主機 TPU 模型服務需要在多主機 TPU 區段上建立一組 SAX 模型伺服器。目前無法提供多主機模型服務,但本文件提供使用 175B 測試模型的示例供您預覽。
SAX 模型提供
以下章節將逐步說明使用 SAX 提供語言模型的工作流程。它使用 GPT-J 6B 模型做為單一主機模型服務的範例。
開始之前,請在 TPU VM 上安裝 Cloud TPU SAX Docker 映像檔:
sudo usermod -a -G docker ${USER} newgrp docker gcloud auth configure-docker us-docker.pkg.dev SAX_ADMIN_SERVER_IMAGE_NAME="us-docker.pkg.dev/cloud-tpu-images/inference/sax-admin-server" SAX_MODEL_SERVER_IMAGE_NAME="us-docker.pkg.dev/cloud-tpu-images/inference/sax-model-server" SAX_UTIL_IMAGE_NAME="us-docker.pkg.dev/cloud-tpu-images/inference/sax-util" SAX_VERSION=v1.0.0 export SAX_ADMIN_SERVER_IMAGE_URL=${SAX_ADMIN_SERVER_IMAGE_NAME}:${SAX_VERSION} export SAX_MODEL_SERVER_IMAGE_URL=${SAX_MODEL_SERVER_IMAGE_NAME}:${SAX_VERSION} export SAX_UTIL_IMAGE_URL="${SAX_UTIL_IMAGE_NAME}:${sax_version}" docker pull ${SAX_ADMIN_SERVER_IMAGE_URL} docker pull ${SAX_MODEL_SERVER_IMAGE_URL} docker pull ${SAX_UTIL_IMAGE_URL}
設定稍後要使用的其他變數:
export SAX_ADMIN_SERVER_DOCKER_NAME="sax-admin-server" export SAX_MODEL_SERVER_DOCKER_NAME="sax-model-server" export SAX_CELL="/sax/test"
GPT-J 6B 單主機模型服務範例
單主機模型服務適用於單主機 TPU 區段,也就是 v5litepod-1、v5litepod-4 和 v5litepod-8。
建立 SAX 叢集
為 SAX 叢集建立 Cloud Storage 儲存值區:
SAX_ADMIN_STORAGE_BUCKET=${your_admin_storage_bucket} gcloud storage buckets create gs://${SAX_ADMIN_STORAGE_BUCKET} \ --project=${PROJECT_ID}
您可能需要另一個 Cloud Storage 儲存值區來儲存查核點。
SAX_DATA_STORAGE_BUCKET=${your_data_storage_bucket}
在終端機中透過 SSH 連線至 TPU VM,啟動 SAX 管理伺服器:
docker run \ --name ${SAX_ADMIN_SERVER_DOCKER_NAME} \ -it \ -d \ --rm \ --network host \ --env GSBUCKET=${SAX_ADMIN_STORAGE_BUCKET} \ ${SAX_ADMIN_SERVER_IMAGE_URL}
您可以透過下列方式檢查 Docker 記錄:
docker logs -f ${SAX_ADMIN_SERVER_DOCKER_NAME}
記錄中的輸出內容會類似以下內容:
I0829 01:22:31.184198 7 config.go:111] Creating config fs_root: "gs://test_sax_admin/sax-fs-root" I0829 01:22:31.347883 7 config.go:115] Created config fs_root: "gs://test_sax_admin/sax-fs-root" I0829 01:22:31.360837 24 admin_server.go:44] Starting the server I0829 01:22:31.361420 24 ipaddr.go:39] Skipping non-global IP address 127.0.0.1/8. I0829 01:22:31.361455 24 ipaddr.go:39] Skipping non-global IP address ::1/128. I0829 01:22:31.361462 24 ipaddr.go:39] Skipping non-global IP address fe80::4001:aff:fe8e:fc8/64. I0829 01:22:31.361469 24 ipaddr.go:39] Skipping non-global IP address fe80::42:bfff:fef9:1bd3/64. I0829 01:22:31.361474 24 ipaddr.go:39] Skipping non-global IP address fe80::20fb:c3ff:fe5b:baac/64. I0829 01:22:31.361482 24 ipaddr.go:56] IPNet address 10.142.15.200 I0829 01:22:31.361488 24 ipaddr.go:56] IPNet address 172.17.0.1 I0829 01:22:31.456952 24 admin.go:305] Loaded config: fs_root: "gs://test_sax_admin/sax-fs-root" I0829 01:22:31.609323 24 addr.go:105] SetAddr /gcs/test_sax_admin/sax-root/sax/test/location.proto "10.142.15.200:10000" I0829 01:22:31.656021 24 admin.go:325] Updated config: fs_root: "gs://test_sax_admin/sax-fs-root" I0829 01:22:31.773245 24 mgr.go:781] Loaded manager state I0829 01:22:31.773260 24 mgr.go:784] Refreshing manager state every 10s I0829 01:22:31.773285 24 admin.go:350] Starting the server on port 10000 I0829 01:22:31.773292 24 cloud.go:506] Starting the HTTP server on port 8080
將單主機 SAX 模型伺服器啟動至 SAX 叢集:
此時,SAX 叢集只包含 SAX 管理員伺服器。您可以透過第二個終端機的 SSH 連線至 TPU VM,在 SAX 叢集中啟動 SAX 模型伺服器:
docker run \ --privileged \ -it \ -d \ --rm \ --network host \ --name ${SAX_MODEL_SERVER_DOCKER_NAME} \ --env SAX_ROOT=gs://${SAX_ADMIN_STORAGE_BUCKET}/sax-root \ ${SAX_MODEL_SERVER_IMAGE_URL} \ --sax_cell=${SAX_CELL} \ --port=10001 \ --platform_chip=tpuv4 \ --platform_topology=1x1
轉換模型查核點:
您必須安裝 PyTorch 和 Transformers,才能從 EleutherAI 下載 GPT-J 檢查點:
pip3 install accelerate pip3 install torch pip3 install transformers
如要將檢查點轉換為 SAX 檢查點,您必須安裝
paxml:pip3 install paxml==1.1.0
下列指令碼可將 GPT-J 檢查點轉換為 SAX 檢查點:
python3 -m convert_gptj_ckpt --base EleutherAI/gpt-j-6b --pax pax_6b
轉換完成後:
ls checkpoint_00000000/您需要建立 commit_success 檔案並放入子目錄:
gcloud storage cp checkpoint_00000000 ${CHECKPOINT_PATH} --recursive touch commit_success.txt gcloud storage cp commit_success.txt ${CHECKPOINT_PATH}/ gcloud storage cp commit_success.txt ${CHECKPOINT_PATH}/metadata/ gcloud storage cp commit_success.txt ${CHECKPOINT_PATH}/state/將模型發布至 SAX 叢集
您現在可以使用在上一個步驟中轉換的檢查點,發布 GPT-J。
MODEL_NAME=gptjtokenizedbf16bs32 MODEL_CONFIG_PATH=saxml.server.pax.lm.params.gptj.GPTJ4TokenizedBF16BS32 REPLICA=1
如要發布 GPT-J (以及後續步驟),請使用 SSH 在第三個終端機中連線至 TPU VM:
docker run \ ${SAX_UTIL_IMAGE_URL} \ --sax_root=gs://${SAX_ADMIN_STORAGE_BUCKET}/sax-root \ publish \ ${SAX_CELL}/${MODEL_NAME} \ ${MODEL_CONFIG_PATH} \ ${CHECKPOINT_PATH} \ ${REPLICA}
您會在模型伺服器 Docker 記錄中看到許多活動,直到您看到類似以下的內容,表示模型已成功載入:
I0829 01:33:49.287459 139865140229696 servable_model.py:697] loading completed.
產生推論結果
對於 GPT-J,輸入內容和輸出內容必須以半形逗號分隔的符記 ID 字串格式輸入。您需要將文字輸入內容切割成子字串。
TEXT = "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction\:\nSummarize the following news article\:\n\n### Input\:\nMarch 10, 2015 . We're truly international in scope on Tuesday. We're visiting Italy, Russia, the United Arab Emirates, and the Himalayan Mountains. Find out who's attempting to circumnavigate the globe in a plane powered partially by the sun, and explore the mysterious appearance of craters in northern Asia. You'll also get a view of Mount Everest that was previously reserved for climbers. On this page you will find today's show Transcript and a place for you to request to be on the CNN Student News Roll Call. TRANSCRIPT . Click here to access the transcript of today's CNN Student News program. Please note that there may be a delay between the time when the video is available and when the transcript is published. CNN Student News is created by a team of journalists who consider the Common Core State Standards, national standards in different subject areas, and state standards when producing the show. ROLL CALL . For a chance to be mentioned on the next CNN Student News, comment on the bottom of this page with your school name, mascot, city and state. We will be selecting schools from the comments of the previous show. You must be a teacher or a student age 13 or older to request a mention on the CNN Student News Roll Call! Thank you for using CNN Student News!\n\n### Response\:
您可以透過 EleutherAI/gpt-j-6b 分詞器取得符記 ID 字串:
from transformers import GPT2Tokenizer tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-j-6b") :
將輸入文字切割成符號:
encoded_example = tokenizer(TEXT) input_ids = encoded_example.input_ids INPUT_STR = ",".join([str(input_id) for input_id in input_ids])
您可以預期符記 ID 字串類似以下內容:
>>> INPUT_STR '21106,318,281,12064,326,8477,257,4876,11,20312,351,281,5128,326,3769,2252,4732,13,19430,257,2882,326,20431,32543,262,2581,13,198,198,21017,46486,25,198,13065,3876,1096,262,1708,1705,2708,25,198,198,21017,23412,25,198,16192,838,11,1853,764,775,821,4988,3230,287,8354,319,3431,13,775,821,10013,8031,11,3284,11,262,1578,4498,24880,11,290,262,42438,22931,21124,13,9938,503,508,338,9361,284,2498,4182,615,10055,262,13342,287,257,6614,13232,12387,416,262,4252,11,290,7301,262,11428,5585,286,1067,8605,287,7840,7229,13,921,1183,635,651,257,1570,286,5628,41336,326,373,4271,10395,329,39311,13,1550,428,2443,345,481,1064,1909,338,905,42978,290,257,1295,329,345,284,2581,284,307,319,262,8100,13613,3000,8299,4889,13,48213,6173,46023,764,6914,994,284,1895,262,14687,286,1909,338,8100,13613,3000,1430,13,4222,3465,326,612,743,307,257,5711,1022,262,640,618,262,2008,318,1695,290,618,262,14687,318,3199,13,8100,13613,3000,318,2727,416,257,1074,286,9046,508,2074,262,8070,7231,1812,20130,11,2260,5423,287,1180,2426,3006,11,290,1181,5423,618,9194,262,905,13,15107,3069,42815,764,1114,257,2863,284,307,4750,319,262,1306,8100,13613,3000,11,2912,319,262,4220,286,428,2443,351,534,1524,1438,11,37358,11,1748,290,1181,13,775,481,307,17246,4266,422,262,3651,286,262,2180,905,13,921,1276,307,257,4701,393,257,3710,2479,1511,393,4697,284,2581,257,3068,319,262,8100,13613,3000,8299,4889,0,6952,345,329,1262,8100,13613,3000,0,198,198,21017,18261,25'
如要產生文章摘要,請按照下列步驟操作:
docker run \ ${SAX_UTIL_IMAGE_URL} \ --sax_root=gs://${SAX_ADMIN_STORAGE_BUCKET}/sax-root \ lm.generate \ ${SAX_CELL}/${MODEL_NAME} \ ${INPUT_STR}您可能會看到類似以下的內容:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+ | GENERATE | SCORE | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+ | 1212,2443,3407,262,905,42978,764,198,11041,262,42978,284,1037,2444,351,3555,35915,290,25818,764,198,2953,262,4220,286,262,2443,11,2912,329,257,2863,284,307,4750,319,8100,13613,3000,13,220,921,1276,307,257,4701,393,257,3710,2479,1511,393,4697,284,2581,257,3068,319,262,8100,13613,3000,8299,4889,13,50256 | -0.023136413 | | 1212,2443,3407,262,905,42978,764,198,11041,262,42978,284,1037,2444,351,3555,35915,290,25818,764,198,2953,262,4220,286,262,2443,11,2912,329,257,2863,284,307,4750,319,8100,13613,3000,13,220,921,1276,307,257,4701,393,257,3710,2479,1511,393,4697,284,2581,257,3068,319,262,8100,13613,3000,8299,4889,0,50256 | -0.91842502 | | 1212,2443,3407,262,905,42978,764,198,11041,262,42978,284,1037,2444,351,3555,35915,290,25818,764,198,2953,262,4220,286,262,2443,11,2912,329,257,2863,284,307,4750,319,8100,13613,3000,13,921,1276,307,257,4701,393,257,3710,2479,1511,393,4697,284,2581,257,3068,319,262,8100,13613,3000,8299,4889,13,50256 | -1.1726116 | | 1212,2443,3407,262,905,42978,764,198,11041,262,42978,284,1037,2444,351,3555,35915,290,25818,764,198,2953,262,4220,286,262,2443,11,2912,329,257,2863,284,307,4750,319,8100,13613,3000,13,220,921,1276,307,1511,393,4697,284,2581,257,3068,319,262,8100,13613,3000,8299,4889,13,50256 | -1.2472695 | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+
如要解析輸出符號 ID 字串,請按照下列步驟操作:
output_token_ids = [int(token_id) for token_id in OUTPUT_STR.split(',')] OUTPUT_TEXT = tokenizer.decode(output_token_ids, skip_special_tokens=True)您可以預期解析符號的文字如下:
>>> OUTPUT_TEXT 'This page includes the show Transcript.\nUse the Transcript to help students with reading comprehension and vocabulary.\nAt the bottom of the page, comment for a chance to be mentioned on CNN Student News. You must be a teacher or a student age 13 or older to request a mention on the CNN Student News Roll Call.'
清理 Docker 容器和 Cloud Storage 儲存空間值區。
175B 多主機模型服務預覽
部分大型語言模型需要多主機 TPU 值區,也就是 v5litepod-16 以上版本。在這種情況下,所有多主機 TPU 主機都需要有 SAX 模型伺服器的副本,且所有模型伺服器都會以 SAX 模型伺服器群組的功能,在多主機 TPU 區塊上提供大型模型。
建立新的 SAX 叢集
您可以按照 GPT-J 操作說明中的步驟建立 SAX 叢集,並建立 SAX 管理伺服器。
或者,如果您已有 SAX 叢集,可以將多主機模型伺服器啟動至 SAX 叢集。
將多主機 SAX 模型伺服器啟動至 SAX 叢集
如要建立多主機 TPU 配量,請使用與單主機 TPU 配量相同的指令,只需指定適當的多主機加速器類型:
ACCELERATOR_TYPE=v5litepod-32 ZONE=us-east1-c gcloud alpha compute tpus queued-resources create ${QUEUED_RESOURCE_ID} \ --node-id ${TPU_NAME} \ --project ${PROJECT_ID} \ --zone ${ZONE} \ --accelerator-type ${ACCELERATOR_TYPE} \ --runtime-version ${RUNTIME_VERSION} \ --service-account ${SERVICE_ACCOUNT} \ --reserved如要將 SAX 模型伺服器映像檔提取至所有 TPU 主機/工作站並啟動,請按照下列步驟操作:
gcloud compute tpus tpu-vm ssh ${TPU_NAME} \ --project ${PROJECT_ID} \ --zone ${ZONE} \ --worker=all \ --command=" gcloud auth configure-docker \ us-docker.pkg.dev # Pull sax model server image docker pull ${SAX_MODEL_SERVER_IMAGE_URL} # Run model server docker run \ --privileged \ -it \ -d \ --rm \ --network host \ --name ${SAX_MODEL_SERVER_DOCKER_NAME} \ --env SAX_ROOT=gs://${SAX_ADMIN_STORAGE_BUCKET}/sax-root \ ${SAX_MODEL_SERVER_IMAGE_URL} \ --sax_cell=${SAX_CELL} \ --port=10001 \ --platform_chip=tpuv4 \ --platform_topology=1x1"將模型發布至 SAX 叢集
本範例使用 LmCloudSpmd175B32Test 模型:
MODEL_NAME=lmcloudspmd175b32test MODEL_CONFIG_PATH=saxml.server.pax.lm.params.lm_cloud.LmCloudSpmd175B32Test CHECKPOINT_PATH=None REPLICA=1
如要發布測試模型,請按照下列步驟操作:
docker run \ ${SAX_UTIL_IMAGE_URL} \ --sax_root=gs://${SAX_ADMIN_STORAGE_BUCKET}/sax-root \ publish \ ${SAX_CELL}/${MODEL_NAME} \ ${MODEL_CONFIG_PATH} \ ${CHECKPOINT_PATH} \ ${REPLICA}
產生推論結果
docker run \ ${SAX_UTIL_IMAGE_URL} \ --sax_root=gs://${SAX_ADMIN_STORAGE_BUCKET}/sax-root \ lm.generate \ ${SAX_CELL}/${MODEL_NAME} \ "Q: Who is Harry Porter's mother? A\: "請注意,由於這個範例使用隨機權重的測試模型,因此輸出結果可能沒有意義。
清除
停止 Docker 容器:
docker stop ${SAX_ADMIN_SERVER_DOCKER_NAME} docker stop ${SAX_MODEL_SERVER_DOCKER_NAME}
使用 gcloud CLI 刪除 Cloud Storage 管理員儲存空間值區和任何資料儲存空間值區,如以下所示。
gcloud storage rm gs://${SAX_ADMIN_STORAGE_BUCKET} --recursive --continue-on-error gcloud storage rm gs://${SAX_DATA_STORAGE_BUCKET} --recursive --continue-on-error