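If you're interested in Vertex AI training clusters, contact your sales representative for access.
After a cluster is deployed, you can manage its entire lifecycle with the following REST API endpoints.
List: view all active clusters in a project.
Get: retrieve detailed information about a specific cluster.
Update: modify the configuration of an existing cluster.
Delete: permanently remove a cluster and its resources.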
Authentication
alias gcurl='curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json"'
List clusters:
gcurl -X GET https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters
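The list method supports the following optional query parameters to control pagination.
pageSize (integer, optional): The maximum number of clusters to return in the response. The service may return fewer items than this value, even if more exist. If unspecified, a default page size is used.
pageToken (string, optional): A token received from a previous list call. Provide this token to retrieve the subsequent page of results.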
gcurl "https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters?pageSize=5"
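The pagination parameters above can be combined into a loop that follows nextPageToken until the service stops returning one. The following is a minimal sketch, not an official client. The gcurl function mirrors the alias from the Authentication section (aliases are not expanded in non-interactive scripts), and list_all_clusters and next_page_token are hypothetical helper names. It assumes the token appears as a plain "nextPageToken": "..." pair in the response and extracts it with sed; a JSON-aware tool would be more robust.

```shell
#!/usr/bin/env bash
# Sketch: print every page of the cluster list by following nextPageToken.

# Function form of the gcurl alias, so it can be used inside a script.
gcurl() {
  curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
       -H "Content-Type: application/json" "$@"
}

# Read a JSON response on stdin and print the nextPageToken value, if any.
# Assumption: a simple pattern match is good enough for this sketch.
next_page_token() {
  sed -n 's/.*"nextPageToken"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# list_all_clusters LIST_URL [PAGE_SIZE]
# Requests pages until a response carries no nextPageToken.
list_all_clusters() {
  local url="$1" page_size="${2:-5}" token="" page
  local -a query
  while :; do
    query=(--data-urlencode "pageSize=${page_size}")
    if [ -n "$token" ]; then
      # --data-urlencode keeps characters such as '=' or '+' in the token intact.
      query+=(--data-urlencode "pageToken=${token}")
    fi
    # -G makes curl send the data as a GET query string.
    page="$(gcurl -G "$url" "${query[@]}")" || return 1
    printf '%s\n' "$page"
    token="$(printf '%s\n' "$page" | next_page_token)"
    [ -n "$token" ] || break
  done
}
```

For example, list_all_clusters "https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters" 5 prints each page of results in turn.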
If more results are available, the response includes a nextPageToken string.
Get a cluster:
gcurl -X GET https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters/CLUSTER_ID
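Update a cluster:
UPDATE_PAYLOAD is the local path to a JSON file that defines the complete ModelDevelopmentCluster you want to update to.
For example, to update the node count of a node pool in a CPU-only cluster, use the following JSON payload: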
{
  "display_name": "DISPLAY_NAME",
  "network": {
    "network": "projects/PROJECT_ID/global/networks/NETWORK",
    "subnetwork": "projects/PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK"
  },
  "node_pools": [
    {
      "id": "cpu",
      "machine_spec": { "machine_type": "n2-standard-8" },
      "scaling_spec": {
        "min_node_count": UPDATED_MIN_NODE_COUNT,
        "max_node_count": UPDATED_MAX_NODE_COUNT
      },
      "zone": "ZONE",
      "enable_public_ips": true,
      "boot_disk": { "boot_disk_type": "pd-standard", "boot_disk_size_gb": 120 }
    },
    {
      "id": "login",
      "machine_spec": { "machine_type": "n2-standard-8" },
      "scaling_spec": { "min_node_count": 1, "max_node_count": 1 },
      "zone": "ZONE",
      "enable_public_ips": true,
      "boot_disk": { "boot_disk_type": "pd-standard", "boot_disk_size_gb": 120 }
    }
  ],
  "orchestrator_spec": {
    "slurm_spec": {
      "home_directory_storage": "projects/PROJECT_ID/locations/ZONE/instances/FILESTORE",
      "partitions": [
        { "id": "cpu", "node_pool_ids": [ "cpu" ] }
      ],
      "login_node_pool_id": "login"
    }
  }
}
gcurl -X PATCH -d @UPDATE_PAYLOAD https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters/CLUSTER_ID
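The update method supports the following optional query parameter.
updateMask (string, optional): A FieldMask that specifies which fields of the model development cluster resource to update. Only the fields listed in updateMask are changed. The following fields of the ModelDevelopmentCluster resource can be specified in updateMask:
node_pools
orchestrator_spec.slurm_spec.partitions
orchestrator_spec.slurm_spec.login_node_pool_id
orchestrator_spec.slurm_spec.prolog_bash_scripts
orchestrator_spec.slurm_spec.epilog_bash_scripts
The following command updates both the node pool configuration and the Slurm partitions.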
gcurl -X PATCH -d @update-payload.json "https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters/CLUSTER_ID?updateMask=orchestrator_spec.slurm_spec.partitions,node_pools"
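For repeated fields such as node_pools, prolog_bash_scripts, and epilog_bash_scripts, the API supports only full replacement. You must provide the complete intended list of items in the request payload, which then replaces the existing list entirely.
If the request succeeds, it returns a long-running operation (LRO). You can then monitor the status of this operation with the following command: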
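Because only the fields listed above are accepted in updateMask, it can help to check a mask locally before sending the PATCH request. The following is a small sketch under that assumption. build_update_mask and ALLOWED_MASK_FIELDS are hypothetical names introduced for illustration, and the allowed list is copied from this page rather than queried from the API, so it may drift from what the service accepts.

```shell
#!/usr/bin/env bash
# Sketch: build a comma-separated updateMask and reject fields not listed on this page.

# Fields this page lists as valid in updateMask (copied by hand; an assumption).
ALLOWED_MASK_FIELDS="node_pools
orchestrator_spec.slurm_spec.partitions
orchestrator_spec.slurm_spec.login_node_pool_id
orchestrator_spec.slurm_spec.prolog_bash_scripts
orchestrator_spec.slurm_spec.epilog_bash_scripts"

# build_update_mask FIELD...
# Prints FIELD values joined by commas, or fails on the first unlisted field.
build_update_mask() {
  local field mask=""
  for field in "$@"; do
    # -F: fixed string, -x: match the whole line, -q: quiet.
    if ! printf '%s\n' "$ALLOWED_MASK_FIELDS" | grep -Fxq -- "$field"; then
      echo "unsupported updateMask field: $field" >&2
      return 1
    fi
    mask="${mask:+${mask},}${field}"
  done
  printf '%s\n' "$mask"
}
```

The result can be substituted into the request URL, for example "...modelDevelopmentClusters/CLUSTER_ID?updateMask=$(build_update_mask node_pools orchestrator_spec.slurm_spec.partitions)". The check covers only the mask: the payload must still carry the complete list for every repeated field it names.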
gcurl https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID
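The status check above can be repeated until the operation finishes. The following is a minimal polling sketch rather than an official tool. wait_for_operation and operation_done are hypothetical helper names, the gcurl function mirrors the alias from the Authentication section, and the default interval and attempt limit are arbitrary. It relies on the standard long-running operation shape, in which a finished operation reports "done": true.

```shell
#!/usr/bin/env bash
# Sketch: poll a long-running operation until it reports "done": true.

# Function form of the gcurl alias, so it can be used inside a script.
gcurl() {
  curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
       -H "Content-Type: application/json" "$@"
}

# Succeeds if the operation JSON on stdin contains "done": true.
operation_done() {
  grep -Eq '"done"[[:space:]]*:[[:space:]]*true'
}

# wait_for_operation OPERATION_URL [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Prints the final operation JSON on success; returns 1 on timeout or request failure.
wait_for_operation() {
  local url="$1" interval="${2:-10}" max_attempts="${3:-60}" attempt=1 body
  while [ "$attempt" -le "$max_attempts" ]; do
    body="$(gcurl "$url")" || return 1
    if printf '%s\n' "$body" | operation_done; then
      printf '%s\n' "$body"
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "operation not done after ${max_attempts} attempts" >&2
  return 1
}
```

A finished operation is not necessarily a successful one, so inspect the printed JSON for an error field before relying on the result.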
Delete a cluster:
gcurl -X DELETE https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters/CLUSTER_ID
If successful, this command returns a long-running operation, which you can then monitor with the operations describe command.
gcurl https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID