- JSON representation
- DiskConfig
- Preemptibility
- ManagedGroupConfig
- AcceleratorConfig
- InstanceFlexibilityPolicy
- ProvisioningModelMix
- InstanceSelection
- InstanceSelectionResult
- StartupConfig
The config settings for Compute Engine resources in an instance group, such as a master or worker group.
| JSON representation | 
|---|
| { "numInstances": integer, "instanceNames": [ string ], "imageUri": string, "machineTypeUri": string, "diskConfig": { object ( | 
| Fields | |
|---|---|
| numInstances | 
 Optional. The number of VM instances in the instance group. For HA cluster masterConfig groups, must be set to 3. For standard cluster masterConfig groups, must be set to 1. | 
| instanceNames[] | 
 Output only. The list of instance names. Dataproc derives the names from  | 
| imageUri | 
 Optional. The Compute Engine image resource used for cluster instances. The URI can represent an image or image family. Image examples: 
 Image family examples. Dataproc will use the most recent image from the family: 
 If the URI is unspecified, it will be inferred from  | 
| machineTypeUri | 
 Optional. The Compute Engine machine type used for cluster instances. A full URL, partial URI, or short name are valid. Examples: 
 Auto Zone Exception: If you are using the Dataproc Auto Zone Placement feature, you must use the short name of the machine type resource, for example,  | 
| diskConfig | 
 Optional. Disk option config settings. | 
| isPreemptible | 
 Output only. Specifies that this instance group contains preemptible instances. | 
| preemptibility | 
 Optional. Specifies the preemptibility of the instance group. The default value for master and worker groups is  The default value for secondary instances is  | 
| managedGroupConfig | 
 Output only. The config for the Compute Engine Instance Group Manager that manages this group. This is only used for preemptible instance groups. | 
| accelerators[] | 
 Optional. The Compute Engine accelerator configuration for these instances. | 
| minCpuPlatform | 
 Optional. Specifies the minimum CPU platform for the instance group. See Dataproc -> Minimum CPU Platform. | 
| minNumInstances | 
 Optional. The minimum number of primary worker instances to create. If  Example: Cluster creation request with  
 | 
| instanceFlexibilityPolicy | 
 Optional. Instance flexibility policy that allows a mixture of VM shapes and provisioning models. | 
| startupConfig | 
 Optional. Configuration to handle the startup of instances during the cluster create and update processes. | 
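
As a rough illustration of the fields above, the following sketch assembles a master-group config as a plain Python dict and serializes it to JSON. The helper name `master_group_config` is hypothetical, and the disk values are simply the documented defaults; only optional (writable) fields are set, since output-only fields such as `instanceNames` are populated by the service.

```python
import json


def master_group_config(high_availability, machine_type):
    """Build a minimal InstanceGroupConfig-shaped dict for a master group.

    Hypothetical helper for illustration only; it is not part of any client library.
    """
    return {
        # Per the numInstances description: 3 for HA clusters, 1 for standard clusters.
        "numInstances": 3 if high_availability else 1,
        # A short machine type name; full URLs and partial URIs are also described as valid.
        "machineTypeUri": machine_type,
        # Documented defaults, written out explicitly for readability.
        "diskConfig": {"bootDiskType": "pd-standard", "bootDiskSizeGb": 500},
    }


config = master_group_config(high_availability=True, machine_type="n1-standard-16")
print(json.dumps(config, indent=2))
```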
DiskConfig
Specifies the boot disk and attached disk options for a group of VM instances.
| JSON representation | 
|---|
| { "bootDiskType": string, "bootDiskSizeGb": integer, "numLocalSsds": integer, "localSsdInterface": string, "bootDiskProvisionedIops": string, "bootDiskProvisionedThroughput": string } | 
| Fields | |
|---|---|
| bootDiskType | 
 Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-balanced" (Persistent Disk Balanced Solid State Drive), "pd-ssd" (Persistent Disk Solid State Drive), or "pd-standard" (Persistent Disk Hard Disk Drive). See Disk types. | 
| bootDiskSizeGb | 
 Optional. Size in GB of the boot disk (default is 500 GB). | 
| numLocalSsds | 
 Optional. Number of attached SSDs, from 0 to 8 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries. Note: Local SSD options may vary by machine type and number of vCPUs selected. | 
| localSsdInterface | 
 Optional. Interface type of local SSDs (default is "scsi"). Valid values: "scsi" (Small Computer System Interface), "nvme" (Non-Volatile Memory Express). See local SSD performance. | 
| bootDiskProvisionedIops | 
 Optional. Indicates how many IOPS to provision for the disk. This sets the number of I/O operations per second that the disk can handle. This field is supported only if  | 
| bootDiskProvisionedThroughput | 
 Optional. Indicates how much throughput to provision for the disk. This sets the throughput, in MB per second, that the disk can handle. Values must be greater than or equal to 1. This field is supported only if  | 
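
The constraints stated in the table above can be checked on the client side before a request is sent. The sketch below is one possible way to do that; the function name `validate_disk_config` is hypothetical, and it deliberately checks only the limits spelled out above (local SSD count, SSD interface, minimum throughput). It does not validate `bootDiskType`, because the list of valid values above may not be exhaustive.

```python
VALID_LOCAL_SSD_INTERFACES = {"scsi", "nvme"}


def validate_disk_config(disk_config):
    """Return a list of human-readable problems found in a DiskConfig-shaped dict."""
    errors = []

    # numLocalSsds: documented range is 0 to 8, default 0.
    num_ssds = disk_config.get("numLocalSsds", 0)
    if not 0 <= num_ssds <= 8:
        errors.append("numLocalSsds must be between 0 and 8, got %d" % num_ssds)

    # localSsdInterface: documented values are "scsi" (default) and "nvme".
    interface = disk_config.get("localSsdInterface", "scsi")
    if interface not in VALID_LOCAL_SSD_INTERFACES:
        errors.append("localSsdInterface must be 'scsi' or 'nvme', got %r" % interface)

    # bootDiskProvisionedThroughput: a string-encoded integer that must be >= 1.
    throughput = disk_config.get("bootDiskProvisionedThroughput")
    if throughput is not None and int(throughput) < 1:
        errors.append("bootDiskProvisionedThroughput must be >= 1")

    return errors


print(validate_disk_config({"numLocalSsds": 9, "localSsdInterface": "sata"}))
```

Note that local SSD options may vary by machine type and vCPU count, so passing this check does not guarantee the service accepts the config.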
Preemptibility
Controls the use of preemptible instances within the group.
| Enums | |
|---|---|
| PREEMPTIBILITY_UNSPECIFIED | Preemptibility is unspecified; the system chooses the appropriate setting for each instance group. | 
| NON_PREEMPTIBLE | Instances are non-preemptible. This option is allowed for all instance groups and is the only valid value for Master and Worker instance groups. | 
| PREEMPTIBLE | Instances are preemptible. This option is allowed only for secondary worker groups. | 
| SPOT | Instances are Spot VMs. This option is allowed only for secondary worker groups. Spot VMs are the latest version of preemptible VMs, and provide additional features. | 
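
The rules in the enum table can be expressed as a small lookup. This is a sketch only: the group role labels (`"master"`, `"worker"`, `"secondary_worker"`) are hypothetical names chosen for the example, not API values, and it assumes that `PREEMPTIBILITY_UNSPECIFIED` is accepted for every group because the system then picks the setting itself.

```python
ALL_VALUES = {"PREEMPTIBILITY_UNSPECIFIED", "NON_PREEMPTIBLE", "PREEMPTIBLE", "SPOT"}
# Values the table allows only for secondary worker groups.
SECONDARY_ONLY = {"PREEMPTIBLE", "SPOT"}


def is_preemptibility_allowed(group_role, preemptibility):
    """Check a Preemptibility value against the per-group rules in the table above.

    group_role is a hypothetical label: "master", "worker", or "secondary_worker".
    """
    if preemptibility not in ALL_VALUES:
        raise ValueError("unknown Preemptibility value: %r" % preemptibility)
    if group_role == "secondary_worker":
        return True
    # Master and primary worker groups cannot use PREEMPTIBLE or SPOT.
    return preemptibility not in SECONDARY_ONLY


print(is_preemptibility_allowed("worker", "SPOT"))            # False
print(is_preemptibility_allowed("secondary_worker", "SPOT"))  # True
```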
ManagedGroupConfig
Specifies the resources used to actively manage an instance group.
| JSON representation | 
|---|
| { "instanceTemplateName": string, "instanceGroupManagerName": string, "instanceGroupManagerUri": string } | 
| Fields | |
|---|---|
| instanceTemplateName | 
 Output only. The name of the Instance Template used for the Managed Instance Group. | 
| instanceGroupManagerName | 
 Output only. The name of the Instance Group Manager for this group. | 
| instanceGroupManagerUri | 
 Output only. The partial URI to the instance group manager for this group. E.g. projects/my-project/regions/us-central1/instanceGroupManagers/my-igm. | 
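
A partial URI like the example above alternates collection names and resource IDs, so it can be split into its parts with plain string handling. The helper `parse_partial_uri` below is a hypothetical sketch that assumes this alternating layout; it is not a library function.

```python
def parse_partial_uri(uri):
    """Split a partial resource URI into a {collection: id} dict.

    Assumes the URI alternates collection names and IDs, as in the example above.
    """
    parts = uri.strip("/").split("/")
    if len(parts) % 2 != 0:
        raise ValueError("expected alternating collection/id segments: %r" % uri)
    return dict(zip(parts[0::2], parts[1::2]))


igm = parse_partial_uri(
    "projects/my-project/regions/us-central1/instanceGroupManagers/my-igm"
)
print(igm["instanceGroupManagers"])  # my-igm
```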
AcceleratorConfig
Specifies the type and number of accelerator cards attached to the instances of an instance group. See GPUs on Compute Engine.
| JSON representation | 
|---|
| { "acceleratorTypeUri": string, "acceleratorCount": integer } | 
| Fields | |
|---|---|
| acceleratorTypeUri | 
 Full URL, partial URI, or short name of the accelerator type resource to expose to this instance. See Compute Engine AcceleratorTypes. Examples: 
 Auto Zone Exception: If you are using the Dataproc Auto Zone Placement feature, you must use the short name of the accelerator type resource, for example,  | 
| acceleratorCount | 
 The number of accelerator cards of this type exposed to this instance. | 
InstanceFlexibilityPolicy
Instance flexibility policy that allows a mixture of VM shapes and provisioning models.
| JSON representation | 
|---|
| { "provisioningModelMix": { object ( | 
| Fields | |
|---|---|
| provisioningModelMix | 
 Optional. Defines how the group selects the provisioning model to ensure the required reliability. | 
| instanceSelectionList[] | 
 Optional. List of instance selection options that the group will use when creating new VMs. | 
| instanceSelectionResults[] | 
 Output only. A list of instance selection results in the group. | 
ProvisioningModelMix
Defines how Dataproc should create VMs with a mixture of provisioning models.
| JSON representation | 
|---|
| { "standardCapacityBase": integer, "standardCapacityPercentAboveBase": integer } | 
| Fields | |
|---|---|
| standardCapacityBase | 
 Optional. The base capacity that always uses Standard VMs, so that preemption cannot reduce capacity below the minimum you need. Dataproc creates only Standard VMs until it reaches standardCapacityBase, then uses standardCapacityPercentAboveBase to mix Spot with Standard VMs. For example, if 15 instances are requested and standardCapacityBase is 5, Dataproc creates 5 Standard VMs and then mixes Spot and Standard VMs for the remaining 10 instances. | 
| standardCapacityPercentAboveBase | 
 Optional. The percentage of target capacity that should use Standard VMs. The remaining percentage uses Spot VMs. The percentage applies only to the capacity above standardCapacityBase. For example, if 15 instances are requested, standardCapacityBase is 5, and standardCapacityPercentAboveBase is 30, Dataproc creates 5 Standard VMs and then mixes Spot and Standard VMs for the remaining 10 instances, at 30% Standard and 70% Spot. | 
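
The arithmetic in the two field descriptions can be worked through as follows. This is a sketch of the stated rule, not Dataproc's actual allocation logic: the function name `split_capacity` is hypothetical, and rounding the Standard share up when the percentage does not divide evenly is an assumption, since the text above does not say how fractions are handled.

```python
def split_capacity(requested, standard_capacity_base, standard_percent_above_base):
    """Return (standard_vms, spot_vms) for a requested instance count.

    Rounding up the Standard share above the base is an assumption of this sketch.
    """
    # The base is filled with Standard VMs first, capped at the request size.
    base = min(requested, standard_capacity_base)
    above_base = requested - base

    # Integer ceiling of above_base * percent / 100, avoiding float rounding.
    standard_above = (above_base * standard_percent_above_base + 99) // 100
    spot = above_base - standard_above

    return base + standard_above, spot


# The example from the table: 15 requested, base 5, 30% Standard above the base.
print(split_capacity(15, 5, 30))  # (8, 7)
```

Here the 10 instances above the base split into 3 Standard and 7 Spot, giving 8 Standard VMs in total.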
InstanceSelection
Defines machine types and the rank to which they belong.
| JSON representation | 
|---|
| { "machineTypes": [ string ], "rank": integer } | 
| Fields | |
|---|---|
| machineTypes[] | 
 Optional. Full machine-type names, e.g. "n1-standard-16". | 
| rank | 
 Optional. Preference of this instance selection. A lower number means higher preference. Dataproc first tries to create a VM using a machine type from the highest-preference rank, then falls back to the next rank based on availability. Machine types and instance selections with the same rank have the same preference. | 
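
The rank-based fallback described above can be sketched as a sort followed by a first-match search. Everything here is illustrative: `pick_machine_type` is a hypothetical helper, the `available` set stands in for whatever capacity signal the service actually uses, and treating a missing rank as 0 is an assumption of the sketch.

```python
def pick_machine_type(instance_selection_list, available):
    """Return the first available machine type, trying lower ranks first.

    `available` is a hypothetical set of machine types that currently have capacity.
    """
    # Lower rank means higher preference; sorted() is stable, so equal ranks
    # keep their original relative order.
    ranked = sorted(instance_selection_list, key=lambda sel: sel.get("rank", 0))
    for selection in ranked:
        for machine_type in selection["machineTypes"]:
            if machine_type in available:
                return machine_type
    return None


selections = [
    {"machineTypes": ["n1-standard-16"], "rank": 1},
    {"machineTypes": ["n2-standard-16"], "rank": 0},
]
print(pick_machine_type(selections, {"n1-standard-16", "n2-standard-16"}))  # n2-standard-16
print(pick_machine_type(selections, {"n1-standard-16"}))                    # n1-standard-16
```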
InstanceSelectionResult
Defines a mapping from machine types to the number of VMs that are created with each machine type.
| JSON representation | 
|---|
| { "machineType": string, "vmCount": integer } | 
| Fields | |
|---|---|
| machineType | 
 Output only. Full machine-type name, e.g. "n1-standard-16". | 
| vmCount | 
 Output only. Number of VMs provisioned with the machineType. | 
StartupConfig
Configuration to handle the startup of instances during the cluster create and update processes.
| JSON representation | 
|---|
| { "requiredRegistrationFraction": number } | 
| Fields | |
|---|---|
| requiredRegistrationFraction | 
 Optional. The fraction of instances that must be up and running before cluster creation or update is considered successful. This setting currently applies only to secondary workers. The cluster will fail if requiredRegistrationFraction of instances are not available. This includes instance creation, agent registration, and service registration (if enabled). |
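
As a minimal sketch of the threshold described above, the check below compares the number of registered secondary workers against the required fraction. The function name and its parameters are hypothetical, and treating the fraction as an inclusive lower bound is an assumption; the description does not state how boundary cases are evaluated.

```python
def meets_registration_fraction(registered, requested, required_registration_fraction):
    """Return True if enough instances have registered for the operation to succeed.

    Assumes the fraction is an inclusive lower bound on registered / requested.
    """
    return registered >= required_registration_fraction * requested


# With a fraction of 0.5 and 10 requested secondary workers, 5 are enough.
print(meets_registration_fraction(5, 10, 0.5))  # True
print(meets_registration_fraction(4, 10, 0.5))  # False
```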