Tool: create_cluster
Create a Dataproc cluster in a Google Cloud project
The following sample demonstrates how to use curl to invoke the create_cluster MCP tool.
| Curl Request |
|---|
curl --location 'https://dataproc.googleapis.com/mcp' \ --header 'content-type: application/json' \ --header 'accept: application/json, text/event-stream' \ --data '{ "method": "tools/call", "params": { "name": "create_cluster", "arguments": { // provide these details according to the tool's MCP specification } }, "jsonrpc": "2.0", "id": 1 }' |
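The request body in the curl sample is a JSON-RPC 2.0 envelope around the tool arguments. Because the sample contains a `//` comment, it is not valid JSON as written. The following is a minimal sketch of how the same body could be assembled in Python; the helper name `build_create_cluster_call` and the placeholder project, region, and cluster values are assumptions made for illustration. Authentication does not appear in the sample and is not covered here.

```python
import json

# Endpoint and headers copied from the curl sample above.
MCP_URL = "https://dataproc.googleapis.com/mcp"
HEADERS = {
    "content-type": "application/json",
    "accept": "application/json, text/event-stream",
}


def build_create_cluster_call(arguments, request_id=1):
    """Wrap tool arguments in the JSON-RPC 2.0 envelope used by tools/call.

    Hypothetical helper: the name and signature are not part of the tool.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "create_cluster", "arguments": arguments},
    }


# Placeholder values (assumptions, not real resources).
example_arguments = {
    "projectId": "example-project",
    "region": "us-central1",
    "clusterName": "example-cluster",
}

# Serialize the envelope; this string is what --data would carry.
body = json.dumps(build_create_cluster_call(example_arguments), indent=2)
print(body)
```

The resulting string can be sent with any HTTP client using `MCP_URL` and `HEADERS`.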
Input Schema
A request to create a Dataproc cluster.
CreateClusterRequest
| JSON representation |
|---|
{ "projectId": string, "region": string, "clusterName": string, "masterConfig": { object ( |
| Fields | |
|---|---|
projectId |
Required. The ID of the Google Cloud Platform project that the cluster belongs to. |
region |
Required. The Dataproc region in which to handle the request. |
clusterName |
Required. The cluster name. Cluster names within a project must be unique. Names of deleted clusters can be reused. |
masterConfig |
Optional. Configuration for master instances. |
workerConfig |
Optional. Configuration for worker instances. |
secondaryWorkerConfig |
Optional. Configuration for secondary worker instances. |
imageVersion |
Optional. The version of software inside the cluster. It must be one of the supported Dataproc Versions, such as "1.2" (including a subminor version, such as "1.2.29"), or the "preview" version. If unspecified, it defaults to the latest Debian version. E.g. "2.2-debian12" |
image |
Optional. The Compute Engine image resource used for cluster instances. The URI can represent an image or image family. Image examples:
Image family examples. Dataproc will use the most recent image from the family:
If the URI is unspecified, it will be inferred from |
zone |
Optional. The Compute Engine zone where the cluster will be located. On a get request, zone will always be present. A full URL, partial URI, or short name is valid. Examples:
|
labels |
Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters, and must conform to RFC 1035. Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035. No more than 32 labels can be associated with a cluster. An object containing a list of |
properties |
Optional. The properties to set on daemon config files. Property keys are specified in
For more information, see Cluster properties. An object containing a list of |
bucket |
Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Dataproc staging and temp buckets). This field requires a Cloud Storage bucket name, not a |
tempBucket |
Optional. A Cloud Storage bucket used to store ephemeral cluster and jobs data, such as Spark and MapReduce history files. If you do not specify a temp bucket, Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's temp bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket. The default bucket has a TTL of 90 days, but you can use any TTL (or none) if you specify a bucket (see Dataproc staging and temp buckets). This field requires a Cloud Storage bucket name, not a |
enableComponentGateway |
Optional. If true, enable HTTP access to specific ports on the cluster from external sources. Defaults to false. |
serviceAccount |
Optional. The Dataproc service account (also see VM Data Plane identity) used by Dataproc cluster VM instances to access Google Cloud Platform services. If not specified, the Compute Engine default service account is used. |
network |
Optional. The Compute Engine network to be used for machine communications. Cannot be specified with subnetwork_uri. If neither A full URL, partial URI, or short name are valid. Examples:
|
subnetwork |
Optional. The Compute Engine subnetwork to be used for machine communications. Cannot be specified with network_uri. A full URL, partial URI, or short name is valid. Examples:
|
optionalComponents[] |
Optional. The set of components to activate on the cluster. |
tier |
Optional. The cluster tier. |
initializationActions[] |
Optional. Commands to execute on each node after config is completed. By default, executables are run on master and all worker nodes. |
autoscalingPolicy |
Optional. The autoscaling policy used by the cluster. You can specify either the short name (e.g., |
deleteMaxIdle |
Optional. The duration to keep the cluster alive while idling (when no jobs are running). Passing this threshold will cause the cluster to be deleted. Minimum value is 5 minutes; maximum value is 14 days. |
deleteMaxAge |
Optional. The lifetime duration of the cluster. The cluster will be auto-deleted at the end of this period. Minimum value is 10 minutes; maximum value is 14 days. |
stopMaxIdle |
Optional. The duration to keep the cluster alive while idling (when no jobs are running). Passing this threshold will cause the cluster to be stopped. Minimum value is 5 minutes; maximum value is 14 days. |
stopMaxAge |
Optional. The lifetime duration of the cluster. The cluster will be auto-stopped at the end of this period. Minimum value is 10 minutes; maximum value is 14 days. |
tags[] |
Optional. The Compute Engine tags to add to all instances (see Tagging instances). |
resourceManagerTags |
Optional. The Resource Manager tags associated with this cluster. An object containing a list of |
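Several constraints in the field table above can be checked on the client before the tool is called. The following is a rough sketch under stated assumptions: `labels` is treated as a plain key-to-value mapping, the `subnetwork_uri` and `network_uri` names in the descriptions are taken to refer to the `subnetwork` and `network` fields, and the `ttls` parameter is a hypothetical helper input that carries `timedelta` values so the range check does not depend on the wire format. RFC 1035 conformance of labels is not checked.

```python
from datetime import timedelta

REQUIRED_FIELDS = ("projectId", "region", "clusterName")
MAX_LABELS = 32
MAX_LABEL_LENGTH = 63

# (minimum, maximum) taken from the field descriptions above.
TTL_LIMITS = {
    "deleteMaxIdle": (timedelta(minutes=5), timedelta(days=14)),
    "deleteMaxAge": (timedelta(minutes=10), timedelta(days=14)),
    "stopMaxIdle": (timedelta(minutes=5), timedelta(days=14)),
    "stopMaxAge": (timedelta(minutes=10), timedelta(days=14)),
}


def check_create_cluster_arguments(arguments, ttls=None):
    """Return a list of problems; an empty list means none were found.

    Hypothetical helper, not part of the tool. `ttls` maps a TTL field
    name to a timedelta (an assumption made for this sketch).
    """
    problems = []

    # The three fields marked "Required" must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not arguments.get(field):
            problems.append(f"missing required field: {field}")

    # network and subnetwork are documented as mutually exclusive.
    if "network" in arguments and "subnetwork" in arguments:
        problems.append("network and subnetwork cannot both be specified")

    # Label count and length limits (values may be empty).
    labels = arguments.get("labels", {})
    if len(labels) > MAX_LABELS:
        problems.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for key, value in labels.items():
        if not 1 <= len(key) <= MAX_LABEL_LENGTH:
            problems.append(f"label key length out of range: {key!r}")
        if value and len(value) > MAX_LABEL_LENGTH:
            problems.append(f"label value too long for key {key!r}")

    # Idle and age durations must fall inside the documented ranges.
    for name, value in (ttls or {}).items():
        low, high = TTL_LIMITS[name]
        if not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")

    return problems


# Example: a missing clusterName and a too-short idle timeout.
print(check_create_cluster_arguments(
    {"projectId": "example-project", "region": "us-central1"},
    ttls={"deleteMaxIdle": timedelta(minutes=1)},
))
```

Returning a list rather than raising on the first failure lets a caller report every problem at once; the server remains the authority on what is accepted.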
InstanceGroupConfig
| JSON representation |
|---|
{ "numInstances": integer, "machineType": string, "bootDiskSizeGb": integer, "bootDiskType": string, "preemptibility": enum ( |
| Fields | |
|---|---|
numInstances |
Optional. The number of VM instances in the instance group. For HA cluster master_config groups, must be set to 3. For standard cluster master_config groups, must be set to 1. |
machineType |
Optional. The Compute Engine machine type used for cluster instances. A full URL, partial URI, or short name is valid. Examples:
Auto Zone Exception: If you are using the Dataproc Auto Zone Placement feature, you must use the short name of the machine type resource, for example, |
bootDiskSizeGb |
Optional. Size in GB of the boot disk (default is 500 GB). |
bootDiskType |
Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-balanced" (Persistent Disk Balanced Solid State Drive), "pd-ssd" (Persistent Disk Solid State Drive), or "pd-standard" (Persistent Disk Hard Disk Drive). See Disk types. |
preemptibility |
Optional. Specifies the preemptibility of the instance group. The default value for master and worker groups is The default value for secondary instances is |
accelerators[] |
Optional. The Compute Engine accelerator configuration for these instances. |
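The `numInstances` and `bootDiskType` rules above are easy to get wrong for master groups, so a small pre-flight check may help. This is only a sketch: the function name and the `high_availability` flag are hypothetical, since the schema shown here has no explicit HA switch and the mode is implied by the instance count.

```python
# Valid values listed in the bootDiskType description above.
VALID_BOOT_DISK_TYPES = {"pd-balanced", "pd-ssd", "pd-standard"}


def check_master_config(master_config, high_availability=False):
    """Return a list of problems found in a masterConfig object.

    `high_availability` is an assumption of this sketch: the caller says
    which cluster mode is intended, and the count is checked against it.
    """
    problems = []

    # HA master groups need 3 instances; standard master groups need 1.
    expected = 3 if high_availability else 1
    num_instances = master_config.get("numInstances")
    if num_instances is not None and num_instances != expected:
        problems.append(
            f"numInstances must be {expected} for this mode, got {num_instances}"
        )

    # bootDiskType is optional, but if set it must be a listed value.
    disk_type = master_config.get("bootDiskType")
    if disk_type is not None and disk_type not in VALID_BOOT_DISK_TYPES:
        problems.append(f"unsupported bootDiskType: {disk_type!r}")

    return problems


print(check_master_config({"numInstances": 2, "bootDiskType": "pd-ssd"}))
```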
AcceleratorConfig
| JSON representation |
|---|
{ "acceleratorTypeUri": string, "acceleratorCount": integer } |
| Fields | |
|---|---|
acceleratorTypeUri |
Full URL, partial URI, or short name of the accelerator type resource to expose to this instance. See Compute Engine AcceleratorTypes. Examples:
Auto Zone Exception: If you are using the Dataproc Auto Zone Placement feature, you must use the short name of the accelerator type resource, for example, |
acceleratorCount |
The number of the accelerator cards of this type exposed to this instance. |
LabelsEntry
| JSON representation |
|---|
{ "key": string, "value": string } |
| Fields | |
|---|---|
key |
|
value |
|
PropertiesEntry
| JSON representation |
|---|
{ "key": string, "value": string } |
| Fields | |
|---|---|
key |
|
value |
|
NodeInitializationAction
| JSON representation |
|---|
{ "executableFile": string, "executionTimeout": string } |
| Fields | |
|---|---|
executableFile |
Required. Cloud Storage URI of executable file. |
executionTimeout |
Optional. Amount of time the executable has to complete. Default is 10 minutes (see JSON representation of Duration). If the executable has not completed by the end of the timeout period, cluster creation fails with an explanatory error message that names the executable that caused the error and the exceeded timeout period. |
Duration
| JSON representation |
|---|
{ "seconds": string, "nanos": integer } |
| Fields | |
|---|---|
seconds |
Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years |
nanos |
Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 |
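As a worked example of the Duration representation, the sketch below converts a Python `timedelta` into the `seconds`/`nanos` shape shown above and reproduces the bound arithmetic (60 × 60 × 24 × 365.25 × 10000 = 315,576,000,000). The function name is hypothetical, and giving both fields the same sign is an assumption based on both being described as signed.

```python
from datetime import timedelta

# 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years
MAX_DURATION_SECONDS = int(60 * 60 * 24 * 365.25 * 10000)


def to_duration_json(delta):
    """Convert a timedelta to the {"seconds": string, "nanos": integer} shape."""
    # Work in whole microseconds, the finest resolution timedelta offers.
    total_microseconds = delta // timedelta(microseconds=1)
    sign = -1 if total_microseconds < 0 else 1
    seconds, microseconds = divmod(abs(total_microseconds), 1_000_000)

    if seconds > MAX_DURATION_SECONDS:
        raise ValueError("duration is outside the documented bounds")

    return {
        # The JSON representation above types seconds as a string.
        "seconds": str(sign * seconds),
        "nanos": sign * microseconds * 1000,
    }


print(MAX_DURATION_SECONDS)
print(to_duration_json(timedelta(minutes=10)))
print(to_duration_json(timedelta(milliseconds=-500)))
```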
ResourceManagerTagsEntry
| JSON representation |
|---|
{ "key": string, "value": string } |
| Fields | |
|---|---|
key |
|
value |
|
Output Schema
This resource represents a long-running operation that is the result of a network API call.
Operation
| JSON representation |
|---|
{ "name": string, "metadata": { "@type": string, field1: ..., ... }, "done": boolean, // Union field |
| Fields | |
|---|---|
name |
The server-assigned name, which is only unique within the same service that originally returns it. If you use the default HTTP mapping, the |
metadata |
Service-specific metadata associated with the operation. It typically contains progress information and common metadata such as create time. Some services might not provide such metadata. Any method that returns a long-running operation should document the metadata type, if any. An object containing fields of an arbitrary type. An additional field |
done |
If the value is |
Union field result. The operation result, which can be either an error or a valid response. If done == false, neither error nor response is set. If done == true, exactly one of error or response can be set. Some services might not provide the result. result can be only one of the following: |
|
error |
The error result of the operation in case of failure or cancellation. |
response |
The normal, successful response of the operation. If the original method returns no data on success, such as An object containing fields of an arbitrary type. An additional field |
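The `done` flag and the `result` union described above lead to four cases a client has to distinguish. A minimal sketch of that branching follows; the function name and the status strings are illustrative choices, not part of the API.

```python
def summarize_operation(operation):
    """Classify an Operation object as (status, payload).

    The status strings are hypothetical labels chosen for this sketch.
    """
    # done == false: neither error nor response is set yet.
    if not operation.get("done", False):
        return ("pending", None)

    # done == true: at most one of error or response is set.
    if "error" in operation:
        return ("failed", operation["error"])
    if "response" in operation:
        return ("succeeded", operation["response"])

    # Some services might not provide a result at all.
    return ("done_without_result", None)


print(summarize_operation({"name": "example-operation", "done": False}))
print(summarize_operation({
    "name": "example-operation",
    "done": True,
    "error": {"code": 3, "message": "example failure"},
}))
```

Checking `error` before `response` keeps failures from being masked if a payload ever carried both, although the schema says only one can be set.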
Any
| JSON representation |
|---|
{ "typeUrl": string, "value": string } |
| Fields | |
|---|---|
typeUrl |
Identifies the type of the serialized Protobuf message with a URI reference consisting of a prefix ending in a slash and the fully-qualified type name. Example: type.googleapis.com/google.protobuf.StringValue This string must contain at least one The prefix is arbitrary and Protobuf implementations are expected to simply strip off everything up to and including the last All type URL strings must be legal URI references with the additional restriction (for the text format) that the content of the reference must consist only of alphanumeric characters, percent-encoded escapes, and characters in the following set (not including the outer backticks): In the original design of |
value |
Holds a Protobuf serialization of the type described by type_url. A base64-encoded string. |
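To read an `Any` value, a client separates the type name from the URL prefix and base64-decodes the payload. The sketch below does only that; it assumes the separator is the slash mentioned in the `typeUrl` description, uses a hypothetical function name, and leaves parsing of the decoded Protobuf bytes to a Protobuf library.

```python
import base64


def unpack_any(any_json):
    """Return (fully_qualified_type_name, raw_bytes) for an Any object."""
    type_url = any_json["typeUrl"]
    if "/" not in type_url:
        raise ValueError("typeUrl must contain a prefix ending in a slash")

    # Strip everything up to and including the last slash.
    type_name = type_url.rsplit("/", 1)[1]

    # The value field is a base64-encoded Protobuf serialization.
    payload = base64.b64decode(any_json["value"])
    return type_name, payload


example = {
    "typeUrl": "type.googleapis.com/google.protobuf.StringValue",
    # Arbitrary example bytes, encoded here only for illustration.
    "value": base64.b64encode(b"\n\x02hi").decode("ascii"),
}
print(unpack_any(example))
```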
Status
| JSON representation |
|---|
{ "code": integer, "message": string, "details": [ { "@type": string, field1: ..., ... } ] } |
| Fields | |
|---|---|
code |
The status code, which should be an enum value of |
message |
A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the |
details[] |
A list of messages that carry the error details. There is a common set of message types for APIs to use. An object containing fields of an arbitrary type. An additional field |
Tool Annotations
Destructive Hint: ❌ | Idempotent Hint: ✅ | Read Only Hint: ❌ | Open World Hint: ❌