Once a cluster is deployed, you can manage its full lifecycle using the following REST API endpoints.
- List: Lists all active clusters in your project.
- Get: Retrieves detailed information about a specific cluster.
- Update: Modifies the configuration of an existing cluster.
- Delete: Permanently removes a cluster and its resources.
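Authentication
The commands on this page use the following gcurl alias, which adds an access token from the gcloud CLI and a JSON Content-Type header to every curl request:
alias gcurl='curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json"'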
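Bash does not expand aliases in non-interactive shells unless shopt -s expand_aliases is set, so the alias above may not work inside a script. A minimal sketch of an equivalent shell function follows. It assumes the gcloud CLI is installed and authenticated, and, like the alias, it requests a fresh access token on every call.

```shell
# gcurl as a function: unlike an alias, it also works in non-interactive scripts.
# Assumes the gcloud CLI is installed and authenticated.
gcurl() {
  local token
  # Request a fresh access token on every call; access tokens are short-lived.
  token="$(gcloud auth print-access-token)" || return 1
  curl -H "Authorization: Bearer ${token}" \
       -H "Content-Type: application/json" \
       "$@"
}
```

Calls look the same as with the alias, so the commands on this page work unchanged.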
List clusters:
gcurl -X GET https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters
The list method supports the following optional query parameters to control pagination.
- pageSize (integer, optional): The maximum number of clusters to return in the response. The service may return fewer than this value, even if more items exist. If unspecified, the service uses a default page size.
- pageToken (string, optional): A token received from a previous list call. Provide this token to retrieve the next page of results.
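For example, the following request returns at most five clusters:
gcurl "https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters?pageSize=5"
If more clusters remain, the response includes a nextPageToken string. Pass that value as the pageToken query parameter in the next request to retrieve the following page.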
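To walk through every page rather than a single one, a script can repeat the list request until the response no longer contains nextPageToken. The sketch below assumes REGION and PROJECT_ID are set as shell variables and that gcloud is authenticated; fetch_cluster_page, next_page_token, and list_all_clusters are hypothetical helper names. It uses curl -G with --data-urlencode so that page tokens are URL-encoded, and a simple grep/sed match instead of a full JSON parser.

```shell
# Sketch: list every cluster by following nextPageToken across pages.
# Assumes REGION and PROJECT_ID are set and the gcloud CLI is authenticated.
# All function names here are hypothetical helpers for illustration.

# Fetch one page. $1 = pageSize, $2 = pageToken (empty for the first page).
fetch_cluster_page() {
  local page_size="$1" page_token="${2:-}"
  local url="https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}/modelDevelopmentClusters"
  # -G turns each --data-urlencode value into a URL-encoded query parameter.
  set -- -s -G -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    --data-urlencode "pageSize=${page_size}"
  if [ -n "$page_token" ]; then
    set -- "$@" --data-urlencode "pageToken=${page_token}"
  fi
  curl "$@" "$url"
}

# Print the nextPageToken value from a JSON response on stdin, or nothing if absent.
next_page_token() {
  { grep -o '"nextPageToken": *"[^"]*"' || true; } | head -n 1 |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# Print each page of results until a response arrives without nextPageToken.
list_all_clusters() {
  local page_size="${1:-5}" token="" response
  while :; do
    response="$(fetch_cluster_page "$page_size" "$token")" || return 1
    printf '%s\n' "$response"
    token="$(printf '%s\n' "$response" | next_page_token)"
    if [ -z "$token" ]; then
      break
    fi
  done
}
```

For example, after export REGION=us-central1 PROJECT_ID=my-project (illustrative values), running list_all_clusters 5 prints each page in turn. If jq is installed, jq -r '.nextPageToken // empty' is a more robust way to read the token.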
Get a cluster:
gcurl -X GET https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters/CLUSTER_ID
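The Get, Update, and Delete requests all target the same cluster resource URL. A small helper that builds it from shell variables, and stops with an error if one is unset, keeps those calls consistent. This is a sketch: REGION and PROJECT_ID as shell variables and the function name cluster_url are assumptions, not part of the API.

```shell
# Build the resource URL for one cluster. REGION and PROJECT_ID must be set;
# ${VAR:?message} stops with an error if either is unset or empty.
cluster_url() {
  local cluster_id="${1:?usage: cluster_url CLUSTER_ID}"
  printf 'https://%s-aiplatform.googleapis.com/v1beta1/projects/%s/locations/%s/modelDevelopmentClusters/%s\n' \
    "${REGION:?REGION is not set}" "${PROJECT_ID:?PROJECT_ID is not set}" "${REGION}" "$cluster_id"
}
```

For example, gcurl -X GET "$(cluster_url CLUSTER_ID)" sends the same request as the command above, and the same URL serves the PATCH and DELETE requests.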
Update a cluster:
UPDATE_PAYLOAD is the local path to a JSON file that defines the complete ModelDevelopmentCluster resource as it should look after the update.
For example, to change the node count of the cpu node pool in a CPU-only cluster, use the following JSON payload, replacing UPDATED_MIN_NODE_COUNT and UPDATED_MAX_NODE_COUNT with integer values:
{
  "display_name": "DISPLAY_NAME",
  "network": {
    "network": "projects/PROJECT_ID/global/networks/NETWORK",
    "subnetwork": "projects/PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK"
  },
  "node_pools": [
    {
      "id": "cpu",
      "machine_spec": { "machine_type": "n2-standard-8" },
      "scaling_spec": { "min_node_count": UPDATED_MIN_NODE_COUNT, "max_node_count": UPDATED_MAX_NODE_COUNT },
      "zone": "ZONE",
      "enable_public_ips": true,
      "boot_disk": { "boot_disk_type": "pd-standard", "boot_disk_size_gb": 120 }
    },
    {
      "id": "login",
      "machine_spec": { "machine_type": "n2-standard-8" },
      "scaling_spec": { "min_node_count": 1, "max_node_count": 1 },
      "zone": "ZONE",
      "enable_public_ips": true,
      "boot_disk": { "boot_disk_type": "pd-standard", "boot_disk_size_gb": 120 }
    }
  ],
  "orchestrator_spec": {
    "slurm_spec": {
      "home_directory_storage": "projects/PROJECT_ID/locations/ZONE/instances/FILESTORE",
      "partitions": [
        { "id": "cpu", "node_pool_ids": [ "cpu" ] }
      ],
      "login_node_pool_id": "login"
    }
  }
}
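The payload must be valid JSON: no trailing commas, and UPDATED_MIN_NODE_COUNT and UPDATED_MAX_NODE_COUNT replaced with numbers. Checking the file locally before sending it can catch these mistakes early. A minimal sketch, assuming python3 is available; its standard json.tool module exits with a non-zero status when it cannot parse the input. validate_payload is a hypothetical helper name.

```shell
# Check that a request payload file is well-formed JSON before sending it.
# Python's standard json.tool module exits non-zero on a parse error, so
# trailing commas and unreplaced unquoted placeholders both fail this check.
validate_payload() {
  local file="${1:?usage: validate_payload FILE}"
  if python3 -m json.tool "$file" > /dev/null; then
    echo "OK: $file is valid JSON"
  else
    echo "ERROR: $file is not valid JSON" >&2
    return 1
  fi
}
```

For example, validate_payload UPDATE_PAYLOAD && gcurl -X PATCH -d @UPDATE_PAYLOAD ... sends the request only if the file parses.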
gcurl -X PATCH -d @UPDATE_PAYLOAD https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters/CLUSTER_ID
updateMask (string, optional): A FieldMask that specifies which fields of the ModelDevelopmentCluster resource to update. Only the fields listed in updateMask are changed. You can specify the following fields in updateMask:
- node_pools
- orchestrator_spec.slurm_spec.partitions
- orchestrator_spec.slurm_spec.login_node_pool_id
- orchestrator_spec.slurm_spec.prolog_bash_scripts
- orchestrator_spec.slurm_spec.epilog_bash_scripts
For example, the following command updates both the node pool configuration and the Slurm partitions. The URL is quoted because ? is a wildcard character in most shells:
gcurl -X PATCH -d @UPDATE_PAYLOAD "https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters/CLUSTER_ID?updateMask=orchestrator_spec.slurm_spec.partitions,node_pools"
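Because updateMask is a comma-separated list restricted to the updatable fields listed above, a script can build it from individual field paths and reject anything outside that list before calling the API. A minimal sketch; build_update_mask is a hypothetical helper, and its allow-list simply copies the documented fields.

```shell
# Join field paths into a comma-separated updateMask, rejecting any path that is
# not in the list of updatable ModelDevelopmentCluster fields.
build_update_mask() {
  local allowed="node_pools
orchestrator_spec.slurm_spec.partitions
orchestrator_spec.slurm_spec.login_node_pool_id
orchestrator_spec.slurm_spec.prolog_bash_scripts
orchestrator_spec.slurm_spec.epilog_bash_scripts"
  local mask="" field
  for field in "$@"; do
    # -x requires a whole-line match; -F treats the dots in field paths literally.
    if ! printf '%s\n' "$allowed" | grep -qxF -- "$field"; then
      echo "ERROR: '$field' cannot be set in updateMask" >&2
      return 1
    fi
    # Append a comma only when the mask already has an entry.
    mask="${mask:+${mask},}${field}"
  done
  [ -n "$mask" ] || { echo "ERROR: no fields given" >&2; return 1; }
  printf '%s\n' "$mask"
}
```

For example, build_update_mask orchestrator_spec.slurm_spec.partitions node_pools prints the same mask used in the command above, while a field such as display_name is rejected.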
For repeated fields such as node_pools, prolog_bash_scripts, and epilog_bash_scripts, the API supports only full replacement. You must provide the complete list of items you want in the request payload, and that list replaces the existing one entirely. For example, to add a node pool, include every existing node pool as well as the new one in node_pools.
A successful request returns a long-running operation (LRO). You can monitor its status with the following command:
gcurl https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID
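Operations follow the google.longrunning.Operation format, in which the done field becomes true when the operation finishes and an error field is present if it failed. Rather than rerunning the command by hand, a script can poll until done is true. This is a sketch under stated assumptions: REGION and PROJECT_ID are set, gcloud is authenticated, the 30-second default interval is arbitrary, and fetch_operation and wait_for_operation are hypothetical helper names that rely on simple text matching.

```shell
# Sketch: poll a long-running operation until it reports "done": true.
# Assumes REGION and PROJECT_ID are set and the gcloud CLI is authenticated.
# fetch_operation and wait_for_operation are hypothetical helper names.

# Fetch the current state of one operation as JSON.
fetch_operation() {
  curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/operations/$1"
}

# Poll every INTERVAL seconds (default 30). Prints the final JSON and returns 1
# if the finished operation contains an "error" field.
wait_for_operation() {
  local op_id="${1:?usage: wait_for_operation OPERATION_ID [INTERVAL]}"
  local interval="${2:-30}" response
  while :; do
    response="$(fetch_operation "$op_id")" || return 1
    # Until the operation finishes, "done" is false or absent from the JSON.
    if printf '%s\n' "$response" | grep -q '"done": *true'; then
      printf '%s\n' "$response"
      # Rough text check; a JSON parser such as jq would be more precise.
      if printf '%s\n' "$response" | grep -q '"error":'; then
        return 1
      fi
      return 0
    fi
    sleep "$interval"
  done
}
```

For example, wait_for_operation OPERATION_ID 60 checks once a minute and exits non-zero if the update failed.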
Delete a cluster:
gcurl -X DELETE https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters/CLUSTER_ID
On success, this command returns a long-running operation, which you can monitor with the following command:
gcurl https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID
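To script this step, you need OPERATION_ID from the delete response. Assuming the operation's name field ends in /operations/OPERATION_ID, so that the ID is its last path segment, a minimal text-based extractor could look like the following; operation_id is a hypothetical helper name.

```shell
# Print OPERATION_ID from operation JSON on stdin, assuming its "name" field
# ends in ".../operations/OPERATION_ID". Prints nothing if no such name exists.
operation_id() {
  { grep -o '"name": *"[^"]*/operations/[^"]*"' || true; } | head -n 1 |
    sed 's|.*/operations/\([^"]*\)"$|\1|'
}
```

For example, OPERATION_ID="$(gcurl -s -X DELETE https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/modelDevelopmentClusters/CLUSTER_ID | operation_id)" captures the ID for the polling command above.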