"Managed Service for Apache Spark" is the new name for the product formerly known as "Dataproc on Compute Engine" (cluster deployment) and "Google Cloud Serverless for Apache Spark" (serverless deployment).

MCP Tools Reference: dataproc.googleapis.com

Tool: `create_cluster`

Create a Dataproc cluster in a Google Cloud project

The following sample demonstrates how to use curl to invoke the `create_cluster` MCP tool.

Curl Request

curl --location 'https://dataproc.googleapis.com/mcp' \
--header 'content-type: application/json' \
--header 'accept: application/json, text/event-stream' \
--data '{
  "method": "tools/call",
  "params": {
    "name": "create_cluster",
    "arguments": {
      "projectId": "PROJECT_ID",
      "region": "REGION",
      "clusterName": "CLUSTER_NAME"
    }
  },
  "jsonrpc": "2.0",
  "id": 1
}'
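
The same `tools/call` request can be assembled programmatically. The Python sketch below builds the JSON-RPC 2.0 envelope used in the curl sample and can optionally post it with the standard library. The helper names `build_tool_call` and `post_tool_call` are hypothetical, the uppercase argument values are placeholders, and the `authorization: Bearer` header is an assumption: the curl sample shows no credentials, but Google Cloud APIs normally expect an OAuth 2.0 access token (for example, one printed by `gcloud auth print-access-token`).

```python
import json
import urllib.request

MCP_ENDPOINT = "https://dataproc.googleapis.com/mcp"


def build_tool_call(name, arguments, request_id=1):
    """Return a JSON-RPC 2.0 `tools/call` message for the named MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def post_tool_call(message, access_token, timeout=60):
    """POST the message to the MCP endpoint and return the response body as text.

    Assumption: the endpoint accepts an OAuth 2.0 bearer token. The server may
    answer with plain JSON or with a text/event-stream body, so the body is
    returned as text for the caller to parse.
    """
    request = urllib.request.Request(
        MCP_ENDPOINT,
        data=json.dumps(message).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "accept": "application/json, text/event-stream",
            "authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder values: replace with a real project, region, and cluster name.
message = build_tool_call(
    "create_cluster",
    {"projectId": "PROJECT_ID", "region": "REGION", "clusterName": "CLUSTER_NAME"},
)
print(json.dumps(message, indent=2))
```

Only the payload is built at module level; `post_tool_call` performs network I/O and is called explicitly when you are ready to send the request.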

Input Schema

A request to create a Dataproc cluster.

CreateClusterRequest

JSON representation

{
  "projectId": string,
  "region": string,
  "clusterName": string,
  "masterConfig": {
    object (InstanceGroupConfig)
  },
  "workerConfig": {
    object (InstanceGroupConfig)
  },
  "secondaryWorkerConfig": {
    object (InstanceGroupConfig)
  },
  "imageVersion": string,
  "image": string,
  "zone": string,
  "labels": {
    string: string,
    ...
  },
  "properties": {
    string: string,
    ...
  },
  "bucket": string,
  "tempBucket": string,
  "enableComponentGateway": boolean,
  "serviceAccount": string,
  "network": string,
  "subnetwork": string,
  "optionalComponents": [
    enum (Component)
  ],
  "tier": enum (ClusterTier),
  "initializationActions": [
    {
      object (NodeInitializationAction)
    }
  ],
  "autoscalingPolicy": string,
  "deleteMaxIdle": string,
  "deleteMaxAge": string,
  "stopMaxIdle": string,
  "stopMaxAge": string,
  "tags": [
    string
  ],
  "resourceManagerTags": {
    string: string,
    ...
  }
}

Fields
`projectId`	`string` Required. The ID of the Google Cloud Platform project that the cluster belongs to.
`region`	`string` Required. The Dataproc region in which to handle the request.
`clusterName`	`string` Required. The cluster name. Cluster names within a project must be unique. Names of deleted clusters can be reused.
`masterConfig`	`object (InstanceGroupConfig)` Optional. Configuration for master instances.
`workerConfig`	`object (InstanceGroupConfig)` Optional. Configuration for worker instances.
`secondaryWorkerConfig`	`object (InstanceGroupConfig)` Optional. Configuration for secondary worker instances.
`imageVersion`	`string` Optional. The version of software inside the cluster. It must be one of the supported Dataproc versions, such as "1.2" (optionally with a subminor version, such as "1.2.29"), or the "preview" version. If unspecified, it defaults to the latest Debian version, for example "2.2-debian12".
`image`	`string` Optional. The Compute Engine image resource used for cluster instances. The URI can represent an image or an image family. Image examples: `https://www.googleapis.com/compute/v1/projects/[project_id]/global/images/[image-id]`, `projects/[project_id]/global/images/[image-id]`, `image-id`. Image family examples (Dataproc uses the most recent image from the family): `https://www.googleapis.com/compute/v1/projects/[project_id]/global/images/family/[custom-image-family-name]`, `projects/[project_id]/global/images/family/[custom-image-family-name]`. If the URI is unspecified, it is inferred from `SoftwareConfig.image_version` or the system default.
`zone`	`string` Optional. The Compute Engine zone where the cluster is located. On a get request, `zone` is always present. A full URL, partial URI, or short name is valid. Examples: `https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone]`, `projects/[project_id]/zones/[zone]`, `[zone]`.
`labels`	`map (key: string, value: string)` Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters, and must conform to RFC 1035. Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035. No more than 32 labels can be associated with a cluster. An object containing a list of `"key": value` pairs. Example: `{ "name": "wrench", "mass": "1.3kg", "count": "3" }`.
`properties`	`map (key: string, value: string)` Optional. The properties to set on daemon config files. Property keys are specified in `prefix:property` format, for example `core:hadoop.tmp.dir`. The supported prefixes and the files they map to are: `capacity-scheduler` (`capacity-scheduler.xml`), `core` (`core-site.xml`), `distcp` (`distcp-default.xml`), `hdfs` (`hdfs-site.xml`), `hive` (`hive-site.xml`), `mapred` (`mapred-site.xml`), `pig` (`pig.properties`), `spark` (`spark-defaults.conf`), and `yarn` (`yarn-site.xml`). For more information, see Cluster properties. An object containing a list of `"key": value` pairs. Example: `{ "name": "wrench", "mass": "1.3kg", "count": "3" }`.
`bucket`	`string` Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Dataproc determines a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then creates and manages this project-level, per-location bucket (see Dataproc staging and temp buckets). This field requires a Cloud Storage bucket name, not a `gs://...` URI.
`tempBucket`	`string` Optional. A Cloud Storage bucket used to store ephemeral cluster and job data, such as Spark and MapReduce history files. If you do not specify a temp bucket, Dataproc determines a Cloud Storage location (US, ASIA, or EU) for your cluster's temp bucket according to the Compute Engine zone where your cluster is deployed, and then creates and manages this project-level, per-location bucket. The default bucket has a TTL of 90 days, but you can use any TTL (or none) if you specify your own bucket (see Dataproc staging and temp buckets). This field requires a Cloud Storage bucket name, not a `gs://...` URI.
`enableComponentGateway`	`boolean` Optional. If true, enables HTTP access to specific ports on the cluster from external sources. Defaults to false.
`serviceAccount`	`string` Optional. The Dataproc service account (also see VM Data Plane identity) used by Dataproc cluster VM instances to access Google Cloud Platform services. If not specified, the Compute Engine default service account is used.
`network`	`string` Optional. The Compute Engine network to be used for machine communications. Cannot be specified with `subnetwork`. If neither `network` nor `subnetwork` is specified, the project's "default" network is used, if it exists. Cannot be a "Custom Subnet Network" (see Using Subnetworks for more information). A full URL, partial URI, or short name is valid. Examples: `https://www.googleapis.com/compute/v1/projects/[project_id]/global/networks/default`, `projects/[project_id]/global/networks/default`, `default`.
`subnetwork`	`string` Optional. The Compute Engine subnetwork to be used for machine communications. Cannot be specified with `network`. A full URL, partial URI, or short name is valid. Examples: `https://www.googleapis.com/compute/v1/projects/[project_id]/regions/[region]/subnetworks/sub0`, `projects/[project_id]/regions/[region]/subnetworks/sub0`, `sub0`.
`optionalComponents[]`	`enum (Component)` Optional. The set of components to activate on the cluster.
`tier`	`enum (ClusterTier)` Optional. The cluster tier.
`initializationActions[]`	`object (NodeInitializationAction)` Optional. Commands to execute on each node after configuration is completed. By default, executables run on the master and all worker nodes.
`autoscalingPolicy`	`string` Optional. The autoscaling policy used by the cluster. You can specify either the short name (e.g., `my-policy`) or the full resource name (e.g., `projects/[project_id]/locations/[region]/autoscalingPolicies/[policy_id]`).
`deleteMaxIdle`	`string (Duration format)` Optional. The duration to keep the cluster alive while idling (when no jobs are running). Exceeding this threshold causes the cluster to be deleted. Minimum value is 5 minutes; maximum value is 14 days.
`deleteMaxAge`	`string (Duration format)` Optional. The lifetime duration of the cluster. The cluster is auto-deleted at the end of this period. Minimum value is 10 minutes; maximum value is 14 days.
`stopMaxIdle`	`string (Duration format)` Optional. The duration to keep the cluster alive while idling (when no jobs are running). Exceeding this threshold causes the cluster to be stopped. Minimum value is 5 minutes; maximum value is 14 days.
`stopMaxAge`	`string (Duration format)` Optional. The lifetime duration of the cluster. The cluster is auto-stopped at the end of this period. Minimum value is 10 minutes; maximum value is 14 days.
`tags[]`	`string` Optional. The Compute Engine tags to add to all instances (see Tagging instances).
`resourceManagerTags`	`map (key: string, value: string)` Optional. The Resource Manager tags associated with this cluster. An object containing a list of `"key": value` pairs. Example: `{ "name": "wrench", "mass": "1.3kg", "count": "3" }`.
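
Several constraints in the table above can be checked on the client before calling the tool. The Python sketch below assembles an `arguments` object, enforces the three required fields, rejects setting both `network` and `subnetwork`, and range-checks the four TTL fields. The function name `cluster_arguments` is hypothetical, it encodes only the rules stated in this table, and the service remains the authority on validation.

```python
from datetime import timedelta

REQUIRED_FIELDS = ("projectId", "region", "clusterName")

# (minimum, maximum) for each scheduled delete/stop field, per the field table.
TTL_BOUNDS = {
    "deleteMaxIdle": (timedelta(minutes=5), timedelta(days=14)),
    "deleteMaxAge": (timedelta(minutes=10), timedelta(days=14)),
    "stopMaxIdle": (timedelta(minutes=5), timedelta(days=14)),
    "stopMaxAge": (timedelta(minutes=10), timedelta(days=14)),
}


def cluster_arguments(project_id, region, cluster_name, **optional):
    """Build create_cluster arguments, checking a few documented constraints.

    TTL fields are passed as timedelta objects and serialized as
    whole-second Duration strings such as "1800s".
    """
    args = {"projectId": project_id, "region": region, "clusterName": cluster_name}
    missing = [field for field in REQUIRED_FIELDS if not args[field]]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if "network" in optional and "subnetwork" in optional:
        raise ValueError("specify either network or subnetwork, not both")
    for field, value in optional.items():
        if field in TTL_BOUNDS:
            low, high = TTL_BOUNDS[field]
            if not low <= value <= high:
                raise ValueError(f"{field} must be between {low} and {high}")
            value = f"{int(value.total_seconds())}s"
        args[field] = value
    return args


# Placeholder identifiers; the optional values are illustrative only.
args = cluster_arguments(
    "PROJECT_ID", "REGION", "CLUSTER_NAME",
    imageVersion="2.2-debian12",
    subnetwork="sub0",
    deleteMaxIdle=timedelta(minutes=30),
)
print(args)
```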

InstanceGroupConfig

JSON representation
{ "numInstances": integer, "machineType": string, "bootDiskSizeGb": integer, "bootDiskType": string, "preemptibility": enum (`Preemptibility`), "accelerators": [ { object (`AcceleratorConfig`) } ] }

Fields
`numInstances`	`integer` Optional. The number of VM instances in the instance group. For `masterConfig` groups, this must be 3 for high-availability (HA) clusters and 1 for standard clusters.
`machineType`	`string` Optional. The Compute Engine machine type used for cluster instances. A full URL, partial URI, or short name is valid. Examples: `https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone]/machineTypes/n1-standard-2`, `projects/[project_id]/zones/[zone]/machineTypes/n1-standard-2`, `n1-standard-2`. Auto Zone Exception: If you are using the Dataproc Auto Zone Placement feature, you must use the short name of the machine type resource, for example, `n1-standard-2`.
`bootDiskSizeGb`	`integer` Optional. Size in GB of the boot disk (default is 500GB).
`bootDiskType`	`string` Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-balanced" (Persistent Disk Balanced Solid State Drive), "pd-ssd" (Persistent Disk Solid State Drive), or "pd-standard" (Persistent Disk Hard Disk Drive). See Disk types.
`preemptibility`	`enum (Preemptibility)` Optional. Specifies the preemptibility of the instance group. The default value for master and worker groups is `NON_PREEMPTIBLE`. This default cannot be changed. The default value for secondary instances is `PREEMPTIBLE`.
`accelerators[]`	`object (AcceleratorConfig)` Optional. The Compute Engine accelerator configuration for these instances.
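
The following Python sketch shows one way to assemble `InstanceGroupConfig` objects while respecting the master-count rule (3 for HA clusters, 1 for standard clusters) and the three documented boot disk types. The helper names `instance_group` and `master_config` are hypothetical, and the machine types and sizes are illustrative values, not recommendations.

```python
def instance_group(num_instances, machine_type, boot_disk_size_gb=None,
                   boot_disk_type=None, preemptibility=None):
    """Return an InstanceGroupConfig dict, omitting unset optional fields."""
    config = {"numInstances": num_instances, "machineType": machine_type}
    if boot_disk_size_gb is not None:
        config["bootDiskSizeGb"] = boot_disk_size_gb
    if boot_disk_type is not None:
        if boot_disk_type not in ("pd-balanced", "pd-ssd", "pd-standard"):
            raise ValueError(f"unsupported boot disk type: {boot_disk_type}")
        config["bootDiskType"] = boot_disk_type
    if preemptibility is not None:
        config["preemptibility"] = preemptibility
    return config


def master_config(high_availability, machine_type):
    """Master groups need 3 instances for HA clusters and 1 otherwise."""
    return instance_group(3 if high_availability else 1, machine_type)


# Illustrative cluster shape: one master, two workers on balanced disks.
cluster_shape = {
    "masterConfig": master_config(False, "n1-standard-4"),
    "workerConfig": instance_group(2, "n1-standard-4",
                                   boot_disk_size_gb=100,
                                   boot_disk_type="pd-balanced"),
}
print(cluster_shape)
```

Unset fields are left out rather than sent as `null`, so the service applies its documented defaults (for example, a 500 GB `pd-standard` boot disk).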

AcceleratorConfig

JSON representation
{ "acceleratorTypeUri": string, "acceleratorCount": integer }

Fields
`acceleratorTypeUri`	`string` Full URL, partial URI, or short name of the accelerator type resource to expose to this instance. See Compute Engine AcceleratorTypes. Examples: `https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone]/acceleratorTypes/nvidia-tesla-t4` `projects/[project_id]/zones/[zone]/acceleratorTypes/nvidia-tesla-t4` `nvidia-tesla-t4` Auto Zone Exception: If you are using the Dataproc Auto Zone Placement feature, you must use the short name of the accelerator type resource, for example, `nvidia-tesla-t4`.
`acceleratorCount`	`integer` The number of accelerator cards of this type exposed to this instance.
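
Because Auto Zone Placement requires short names for both `acceleratorTypeUri` and `machineType`, a client may need to reduce a full URL or partial URI to its last path segment. This Python sketch does that; the helper names `short_name` and `accelerator` are hypothetical, and the project and zone in the sample URI are placeholders.

```python
def short_name(resource):
    """Return the last path segment of a full URL, partial URI, or short name."""
    return resource.rstrip("/").rsplit("/", 1)[-1]


def accelerator(type_uri, count, auto_zone=False):
    """Build an AcceleratorConfig; Auto Zone Placement requires the short name."""
    uri = short_name(type_uri) if auto_zone else type_uri
    return {"acceleratorTypeUri": uri, "acceleratorCount": count}


full = "projects/PROJECT_ID/zones/ZONE/acceleratorTypes/nvidia-tesla-t4"
print(accelerator(full, 1, auto_zone=True))
```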

LabelsEntry

JSON representation
{ "key": string, "value": string }

Fields
`key`	`string`
`value`	`string`
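
In the request JSON, `labels` is written as a plain object; `LabelsEntry` is the key/value pair view of the same map. The Python sketch below checks a labels map against the rules in the `labels` field description (1 to 63 characters, RFC 1035 syntax, empty values allowed, at most 32 labels) and converts it to entry form. The function names are hypothetical, and restricting labels to lowercase letters is an assumption that reflects general Google Cloud label conventions rather than anything stated on this page.

```python
import re

# RFC 1035 label: starts with a letter, ends with a letter or digit, with
# letters, digits, or hyphens in between; at most 63 characters in total.
# Assumption: lowercase letters only.
_LABEL = re.compile(r"[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?")
MAX_LABELS = 32


def check_labels(labels):
    """Raise ValueError if the labels map breaks the documented rules."""
    if len(labels) > MAX_LABELS:
        raise ValueError(f"at most {MAX_LABELS} labels are allowed")
    for key, value in labels.items():
        if not _LABEL.fullmatch(key):
            raise ValueError(f"invalid label key: {key!r}")
        # Values may be empty; non-empty values follow the same rule as keys.
        if value and not _LABEL.fullmatch(value):
            raise ValueError(f"invalid label value for {key!r}: {value!r}")
    return labels


def to_entries(labels):
    """Convert the JSON map form into a list of LabelsEntry-style objects."""
    return [{"key": k, "value": v} for k, v in labels.items()]


labels = check_labels({"env": "dev", "team": "data-eng", "owner": ""})
print(to_entries(labels))
```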

PropertiesEntry

JSON representation
{ "key": string, "value": string }

Fields
`key`	`string`
`value`	`string`
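
Each `properties` key uses the `prefix:property` form, and the prefix selects the daemon config file that receives the setting. The Python sketch below encodes the prefix-to-file mapping from the `properties` field description; the function name `target_file` and the sample values are hypothetical, and the table mirrors only the prefixes listed on this page, so it may not be exhaustive.

```python
# Supported prefixes and the config files they map to, as listed in the
# `properties` field description.
PROPERTY_FILES = {
    "capacity-scheduler": "capacity-scheduler.xml",
    "core": "core-site.xml",
    "distcp": "distcp-default.xml",
    "hdfs": "hdfs-site.xml",
    "hive": "hive-site.xml",
    "mapred": "mapred-site.xml",
    "pig": "pig.properties",
    "spark": "spark-defaults.conf",
    "yarn": "yarn-site.xml",
}


def target_file(key):
    """Return the config file that a `prefix:property` key is written to."""
    prefix, sep, name = key.partition(":")
    if not sep or not name:
        raise ValueError(f"expected prefix:property, got {key!r}")
    if prefix not in PROPERTY_FILES:
        raise ValueError(f"unsupported prefix: {prefix!r}")
    return PROPERTY_FILES[prefix]


# Illustrative values only.
properties = {"core:hadoop.tmp.dir": "/tmp/hadoop", "spark:spark.executor.memory": "4g"}
for key in properties:
    print(key, "->", target_file(key))
```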

NodeInitializationAction

JSON representation
{ "executableFile": string, "executionTimeout": string }

Fields
`executableFile`	`string` Required. Cloud Storage URI of executable file.
`executionTimeout`	`string (Duration format)` Optional. Amount of time the executable has to complete. Default is 10 minutes (see JSON representation of Duration). If the executable does not complete by the end of the timeout period, cluster creation fails with an explanatory error message that names the executable and the exceeded timeout period.
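
A minimal Python sketch for building `initializationActions` entries follows. It assumes that a Cloud Storage URI takes the `gs://bucket/object` form and that timeouts are expressed in whole seconds; the helper name `init_action`, the `BUCKET` placeholder, and the script names are hypothetical.

```python
from datetime import timedelta


def init_action(executable_file, timeout=None):
    """Build a NodeInitializationAction; executableFile must be a gs:// URI.

    Omitting the timeout leaves the documented 10-minute default in effect.
    """
    if not executable_file.startswith("gs://"):
        raise ValueError("executableFile must be a Cloud Storage (gs://) URI")
    action = {"executableFile": executable_file}
    if timeout is not None:
        action["executionTimeout"] = f"{int(timeout.total_seconds())}s"
    return action


actions = [
    init_action("gs://BUCKET/scripts/install-deps.sh"),
    init_action("gs://BUCKET/scripts/warm-cache.sh", timeout=timedelta(minutes=20)),
]
print(actions)
```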

Duration

JSON representation
{ "seconds": string, "nanos": integer }

Fields
`seconds`	`string (int64 format)` Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years
`nanos`	`integer` Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 `seconds` field and a positive or negative `nanos` field. For durations of one second or more, a non-zero value for the `nanos` field must be of the same sign as the `seconds` field. Must be from -999,999,999 to +999,999,999 inclusive.
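
In the proto3 JSON mapping, a Duration such as `deleteMaxIdle` or `executionTimeout` is written as a decimal number of seconds with an `s` suffix (for example, `"600s"`), while the `seconds`/`nanos` pair above is its structured form. The Python sketch below splits a `timedelta` into `seconds` and `nanos` with matching signs, checks the ±10,000-year bound, and renders the JSON string. The function names are hypothetical; the bound is computed with integers (365.25 × 10,000 = 3,652,500 days) to avoid floating-point rounding.

```python
from datetime import timedelta

# 60 s/min * 60 min/h * 24 h/day * 365.25 days/year * 10000 years,
# written with integers: 365.25 * 10000 = 3652500 days.
MAX_SECONDS = 60 * 60 * 24 * 3652500
NANOS_PER_SECOND = 10**9


def to_seconds_nanos(td):
    """Split a timedelta into Duration (seconds, nanos) with matching signs."""
    total_nanos = (td // timedelta(microseconds=1)) * 1000
    sign = -1 if total_nanos < 0 else 1
    seconds, nanos = divmod(abs(total_nanos), NANOS_PER_SECOND)
    seconds, nanos = sign * seconds, sign * nanos
    if abs(seconds) > MAX_SECONDS:
        raise ValueError("duration outside the +/-10000-year range")
    return seconds, nanos


def duration_to_json(td):
    """Render a Duration in JSON string form, such as "600s" or "1.500000000s"."""
    seconds, nanos = to_seconds_nanos(td)
    if nanos == 0:
        return f"{seconds}s"
    sign = "-" if seconds < 0 or nanos < 0 else ""
    return f"{sign}{abs(seconds)}.{abs(nanos):09d}s"


print(MAX_SECONDS)  # 315576000000
print(to_seconds_nanos(timedelta(milliseconds=-1500)))
print(duration_to_json(timedelta(minutes=10)))
```

A duration of -1.5 seconds becomes `seconds = -1` and `nanos = -500000000`, which satisfies the rule that a non-zero `nanos` must share the sign of `seconds`.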

ResourceManagerTagsEntry

JSON representation
{ "key": string, "value": string }

Fields
`key`	`string`
`value`	`string`

Output Schema

This resource represents a long-running operation that is the result of a network API call.

Operation

JSON representation

{
  "name": string,
  "metadata": {
    "@type": string,
    field1: ...,
    ...
  },
  "done": boolean,

  // Union field result can be only one of the following:
  "error": {
    object (Status)
  },
  "response": {
    "@type": string,
    field1: ...,
    ...
  }
  // End of list of possible types for union field result.
}

Fields
`name`	`string` The server-assigned name, which is only unique within the same service that originally returns it. If you use the default HTTP mapping, the `name` should be a resource name ending with `operations/{unique_id}`.
`metadata`	`object` Service-specific metadata associated with the operation. It typically contains progress information and common metadata such as create time. Some services might not provide such metadata. Any method that returns a long-running operation should document the metadata type, if any. An object containing fields of an arbitrary type. An additional field `"@type"` contains a URI identifying the type. Example: `{ "id": 1234, "@type": "types.example.com/standard/id" }`.
`done`	`boolean` If the value is `false`, it means the operation is still in progress. If `true`, the operation is completed, and either `error` or `response` is available.
Union field `result`. The operation result, which can be either an `error` or a valid `response`. If `done` == `false`, neither `error` nor `response` is set. If `done` == `true`, exactly one of `error` or `response` can be set. Some services might not provide the result. `result` can be only one of the following:
`error`	`object (Status)` The error result of the operation in case of failure or cancellation.
`response`	`object` The normal, successful response of the operation. If the original method returns no data on success, such as `Delete`, the response is `google.protobuf.Empty`. If the original method is standard `Get`/`Create`/`Update`, the response should be the resource. For other methods, the response should have the type `XxxResponse`, where `Xxx` is the original method name. For example, if the original method name is `TakeSnapshot()`, the inferred response type is `TakeSnapshotResponse`. An object containing fields of an arbitrary type. An additional field `"@type"` contains a URI identifying the type. Example: `{ "id": 1234, "@type": "types.example.com/standard/id" }`.
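
The `done`, `error`, and `response` rules above translate directly into client logic. The Python sketch below interprets a single `Operation` object returned by `create_cluster`; it does not poll, since this page does not describe how to fetch updated operation status. The function name `operation_result` is hypothetical, and the operation name in the sample is a placeholder. Treating a missing `done` as `false` reflects the proto3 JSON convention of omitting default-valued fields.

```python
def operation_result(operation):
    """Interpret a long-running Operation returned by create_cluster.

    Returns None while the operation is still running, the `response` object
    on success, and raises RuntimeError when `error` is set.
    """
    # `done` may be omitted from the JSON when it is false.
    if not operation.get("done", False):
        return None  # neither error nor response is set yet
    if "error" in operation:
        status = operation["error"]
        raise RuntimeError(
            f"operation {operation.get('name')} failed: "
            f"code={status.get('code')} message={status.get('message')}"
        )
    # Some services might not provide a result even when done.
    return operation.get("response")


running = {"name": "projects/PROJECT_ID/regions/REGION/operations/OPERATION_ID"}
print(operation_result(running))
```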

Any

JSON representation
{ "typeUrl": string, "value": string }

Fields
`typeUrl`	`string` Identifies the type of the serialized Protobuf message with a URI reference consisting of a prefix ending in a slash and the fully-qualified type name. Example: type.googleapis.com/google.protobuf.StringValue This string must contain at least one `/` character, and the content after the last `/` must be the fully-qualified name of the type in canonical form, without a leading dot. Do not write a scheme on these URI references so that clients do not attempt to contact them. The prefix is arbitrary and Protobuf implementations are expected to simply strip off everything up to and including the last `/` to identify the type. `type.googleapis.com/` is a common default prefix that some legacy implementations require. This prefix does not indicate the origin of the type, and URIs containing it are not expected to respond to any requests. All type URL strings must be legal URI references with the additional restriction (for the text format) that the content of the reference must consist only of alphanumeric characters, percent-encoded escapes, and characters in the following set (not including the outer backticks): `/-.~_!$&()*+,;=`. Despite our allowing percent encodings, implementations should not unescape them to prevent confusion with existing parsers. For example, `type.googleapis.com%2FFoo` should be rejected. In the original design of `Any`, the possibility of launching a type resolution service at these type URLs was considered but Protobuf never implemented one and considers contacting these URLs to be problematic and a potential security issue. Do not attempt to contact type URLs.
`value`	`string (bytes format)` Holds a Protobuf serialization of the type described by type_url. A base64-encoded string.
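
The `typeUrl` rules above (strip everything up to and including the last `/`, require at least one `/`, disallow a leading dot, never unescape percent encodings) and the base64 `value` can be handled as in this Python sketch. The function names are hypothetical, and the decoded bytes still need a Protobuf library and the matching message type to be parsed, which is not shown.

```python
import base64


def type_name(type_url):
    """Return the fully qualified type name from an Any typeUrl.

    Everything up to and including the last '/' is stripped. Percent-encoded
    slashes are deliberately not unescaped, so 'type.googleapis.com%2FFoo'
    contains no '/' and is rejected.
    """
    if "/" not in type_url:
        raise ValueError(f"type URL must contain '/': {type_url!r}")
    name = type_url.rsplit("/", 1)[1]
    if not name or name.startswith("."):
        raise ValueError(f"invalid type name in {type_url!r}")
    return name


def any_payload(any_message):
    """Decode the base64 `value` of an Any into raw Protobuf bytes."""
    return base64.b64decode(any_message["value"])


print(type_name("type.googleapis.com/google.protobuf.StringValue"))
```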

Status

JSON representation
{ "code": integer, "message": string, "details": [ { "@type": string, field1: ..., ... } ] }

Fields
`code`	`integer` The status code, which should be an enum value of `google.rpc.Code`.
`message`	`string` A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the `google.rpc.Status.details` field, or localized by the client.
`details[]`	`object` A list of messages that carry the error details. There is a common set of message types for APIs to use. An object containing fields of an arbitrary type. An additional field `"@type"` contains a URI identifying the type. Example: `{ "id": 1234, "@type": "types.example.com/standard/id" }`.
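
Since `code` is a `google.rpc.Code` value, a client can map it to its symbolic name when reporting a failed operation. The Python sketch below does this and lists the `@type` of each detail message; the function name `describe_status` and the sample message are hypothetical.

```python
# Numeric values of the google.rpc.Code enum.
RPC_CODES = {
    0: "OK", 1: "CANCELLED", 2: "UNKNOWN", 3: "INVALID_ARGUMENT",
    4: "DEADLINE_EXCEEDED", 5: "NOT_FOUND", 6: "ALREADY_EXISTS",
    7: "PERMISSION_DENIED", 8: "RESOURCE_EXHAUSTED", 9: "FAILED_PRECONDITION",
    10: "ABORTED", 11: "OUT_OF_RANGE", 12: "UNIMPLEMENTED", 13: "INTERNAL",
    14: "UNAVAILABLE", 15: "DATA_LOSS", 16: "UNAUTHENTICATED",
}


def describe_status(status):
    """Format a Status object as a one-line, developer-facing summary."""
    code = status.get("code", 0)
    name = RPC_CODES.get(code, f"CODE_{code}")
    detail_types = [d.get("@type", "?") for d in status.get("details", [])]
    summary = f"{name} ({code}): {status.get('message', '')}"
    if detail_types:
        summary += f" [details: {', '.join(detail_types)}]"
    return summary


print(describe_status({"code": 6, "message": "hypothetical error message"}))
```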

Tool Annotations

Destructive Hint: ❌ | Idempotent Hint: ✅ | Read Only Hint: ❌ | Open World Hint: ❌
