Service options are a type of pipeline option that let you specify additional job modes and configurations for a Dataflow job. To set a service option, use the Dataflow service options pipeline option.
Java
--dataflowServiceOptions=SERVICE_OPTION
Replace SERVICE_OPTION with the service option that you want to use.
Python
--dataflow_service_options=SERVICE_OPTION
Replace SERVICE_OPTION with the service option that you want to use.
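A job often needs more than one service option, or an option that takes a value. In that case you can repeat the flag once per option. The following Python sketch builds that argument list. It makes two assumptions: repeated --dataflow_service_options flags are collected into a list instead of overwriting each other, and options with values use the OPTION=VALUE form. The helper name service_option_args is hypothetical, and the code only builds arguments. It does not launch a job.

```python
from typing import List, Mapping, Optional


def service_option_args(options: Mapping[str, Optional[str]]) -> List[str]:
    """Build one --dataflow_service_options flag per service option.

    Hypothetical helper. A value of None means the option is a plain switch
    (for example, enable_prime); any other value is appended as OPTION=VALUE,
    which is an assumption about the syntax for parameterized options.
    """
    args = []
    for name, value in options.items():
        flag_value = name if value is None else f"{name}={value}"
        args.append(f"--dataflow_service_options={flag_value}")
    return args


if __name__ == "__main__":
    print(service_option_args({
        "enable_prime": None,
        "max_workflow_runtime_walltime_seconds": "3600",
    }))
```

You can then pass the returned list to apache_beam.options.pipeline_options.PipelineOptions, which accepts a list of command-line flags.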
Go
--dataflow_service_options=SERVICE_OPTION
Replace SERVICE_OPTION with the service option that you want to use.
gcloud
Use the gcloud dataflow jobs run command with the additional-experiments option. If you're using Flex Templates, use the gcloud dataflow flex-template run command.
--additional-experiments=SERVICE_OPTION
For example:
gcloud dataflow jobs run JOB_NAME --additional-experiments=SERVICE_OPTION
Replace the following values:
- JOB_NAME: the name of your Dataflow job
- SERVICE_OPTION: the service option that you want to use
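If you script the gcloud call, you can combine several service options in one --additional-experiments value. The following Python sketch assembles the command from the example above as an argument list. It assumes the flag accepts a comma-separated list. The helper name gcloud_run_command is hypothetical, and the sketch omits any other flags your job needs, such as the template location.

```python
import shlex
from typing import List


def gcloud_run_command(job_name: str, service_options: List[str]) -> List[str]:
    """Return argv for `gcloud dataflow jobs run` carrying service options.

    Hypothetical helper. Assumes several service options can be joined into
    one comma-separated --additional-experiments value. Other flags the job
    needs are intentionally left out.
    """
    return [
        "gcloud", "dataflow", "jobs", "run", job_name,
        "--additional-experiments=" + ",".join(service_options),
    ]


if __name__ == "__main__":
    argv = gcloud_run_command("my-job", ["enable_prime", "block_project_ssh_keys"])
    # shlex.join quotes arguments only where needed, so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Building an argument list instead of a single string lets you pass the result directly to subprocess.run without shell quoting issues.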
REST
Use the additionalExperiments field in the RuntimeEnvironment object. If you're using Flex Templates, use the additionalExperiments field in the FlexTemplateRuntimeEnvironment object.
{
  "additionalExperiments": ["SERVICE_OPTION"],
  ...
}
Replace SERVICE_OPTION with the service option that you want to use.
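When you call the REST API from code, you usually build the request body as a dictionary and serialize it to JSON. The following Python sketch assumes that the RuntimeEnvironment object is sent in an environment field next to jobName, as in a classic template launch request. The helper runtime_environment is hypothetical. maxWorkers appears only as an example of another RuntimeEnvironment field.

```python
import json
from typing import Dict, List


def runtime_environment(service_options: List[str], **extra_fields: object) -> Dict[str, object]:
    """Build a RuntimeEnvironment-style dict that carries service options.

    Hypothetical helper. Service options go into additionalExperiments; any
    other RuntimeEnvironment fields are copied through unchanged.
    """
    env: Dict[str, object] = dict(extra_fields)
    env["additionalExperiments"] = list(service_options)
    return env


if __name__ == "__main__":
    body = {
        "jobName": "my-job",
        "environment": runtime_environment(["enable_prime"], maxWorkers=10),
    }
    print(json.dumps(body, indent=2))
```

For Flex Templates, the same dictionary shape would go into the FlexTemplateRuntimeEnvironment object instead.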
For more information, see Set Dataflow pipeline options.
Dataflow supports the following service options.
| Option | Description | 
|---|---|
| automatically_use_created_reservation | Use Compute Engine reservations for the Dataflow workers. For more information, see Use Compute Engine reservations with Dataflow. | 
| block_project_ssh_keys | Prevents VMs from accepting SSH keys that are stored in project metadata. For more information, see Restrict SSH keys from VMs. | 
| disable_image_streaming | Fully download containers up front instead of downloading their content as needed. This option is the opposite of enable_image_streaming. | 
| enable_confidential_compute | Enables Confidential VM with AMD Secure Encrypted Virtualization (SEV) on Dataflow worker VMs. For more information, see Confidential Computing concepts. This service option is not compatible with Dataflow Prime or worker accelerators. You must specify a supported machine type. When this option is enabled, the job incurs additional flat per-vCPU and per-GB costs. For more information, see Dataflow pricing. | 
| enable_lineage | Enable data lineage for your Dataflow jobs. For more information, see Use data lineage in Dataflow. | 
| enable_dynamic_thread_scaling | Enable dynamic thread scaling on Dataflow worker VMs. For more information, see Dynamic thread scaling. | 
| enable_google_cloud_heap_sampling | Enable heap profiling. For more information, see Monitoring pipeline performance using Cloud Profiler. | 
| enable_google_cloud_profiler | Enable performance profiling. For more information, see Monitoring pipeline performance using Cloud Profiler. | 
| enable_image_streaming | Download container content as needed instead of downloading the full content up front. This option improves startup time and autoscaling latency for pipelines that use custom containers and can start processing data before the full container content is available. To benefit from this option, you must enable the Container File System API. For more information, see Dataflow container image streaming. | 
| enable_preflight_validation | When you run your pipeline on Dataflow, before the job launches, Dataflow performs validation checks on the pipeline. This option is enabled by default. To disable pipeline validation, set this option to false. For more information, see Pipeline validation. | 
| enable_prime | Enable Dataflow Prime for this job. For more information, see Use Dataflow Prime. | 
| enable_streaming_engine_resource_based_billing | Enable resource-based billing for this job. For more information, see Pricing in "Use Streaming Engine for streaming jobs." | 
| graph_validate_only | Runs a job graph validation check to verify whether a replacement job is valid. For more information, see Validate a replacement job. | 
| max_workflow_runtime_walltime_seconds | The maximum number of seconds the job can run. If the job exceeds this limit, Dataflow cancels the job. This service option is supported for batch jobs only. Batch jobs can't run for more than 10 days. After 10 days, the job is cancelled. Specify the number of seconds as a parameter to the flag, in the form max_workflow_runtime_walltime_seconds=SECONDS. | 
| min_num_workers | The minimum number of workers the job uses. Guarantees that a Dataflow job always has at least the number of workers specified. Horizontal autoscaling doesn't scale below the number of workers specified. | 
| parallel_replace_job_id | When performing an automated parallel pipeline update, identifies the job to replace by job ID. Use this option with parallel_replace_job_min_parallel_pipelines_duration. | 
| parallel_replace_job_min_parallel_pipelines_duration | When performing an automated parallel pipeline update, specifies the minimum amount of time the two pipelines run in parallel. After this duration passes, Dataflow sends a drain signal to the old job. The duration must be between 0 seconds and … Specify the duration as a parameter to the flag. | 
| parallel_replace_job_name | When performing an automated parallel pipeline update, identifies the job to replace by job name. Use this option with parallel_replace_job_min_parallel_pipelines_duration. | 
| sdf_checkpoint_after_duration | The maximum duration each worker buffers splittable DoFn output before checkpointing. This service option is supported for Streaming Engine jobs that use Runner v2. Specify the duration as a parameter to the flag. The default is 5 seconds. | 
| sdf_checkpoint_after_output_bytes | The maximum number of bytes of splittable DoFn output that each worker produces before checkpointing. This service option is supported for Streaming Engine jobs that use Runner v2. Specify the number of bytes as a parameter to the flag. The default is 5 MiB. | 
| streaming_mode_at_least_once | Enables at-least-once streaming mode. For more information, see Set the pipeline streaming mode. | 
| streaming_enable_pubsub_direct_output | Enables at-least-once streaming mode for Pub/Sub output. This option does not affect the streaming mode for the rest of the pipeline. For more information, see Set the pipeline streaming mode. | 
| use_network_tags | Apply network tags to a Dataflow job. For more information, see Use network tags with Dataflow. | 
| use_vm_tags | Apply secure tags to a Dataflow job. For more information, see Use secure tags with Dataflow. | 
| worker_accelerator | Enable GPUs or TPUs for this job. If you use right fitting, don't use this service option. GPUs: Specify the type and number of GPUs to attach to Dataflow workers as parameters to the flag. For a list of GPU types that are supported with Dataflow, see Dataflow support for GPUs. For example: --dataflow_service_options "worker_accelerator=type:GPU_TYPE;count:GPU_COUNT;install-nvidia-driver". If you're using NVIDIA Multi-Process Service (MPS), append the … parameter to the parameter list. For more information about using GPUs, see GPUs with Dataflow. TPUs: Specify the type and topology of TPUs to attach to Dataflow workers as parameters to the flag. For a list of TPU types that are supported with Dataflow, see Supported TPU accelerators. For example, for a tpu-v5-lite-podslice TPU with a 1x1 topology, use: --dataflow_service_options "worker_accelerator=type:tpu-v5-lite-podslice;topology:1x1" | 
| worker_utilization_hint | Specifies the target CPU utilization. For more information, see Set the worker utilization hint. |
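Several options in the table take structured parameters. The following Python sketch formats worker_accelerator values with the type/count and type/topology syntax from the worker_accelerator row. It also formats max_workflow_runtime_walltime_seconds, assuming the OPTION=VALUE form. The 10-day batch limit equals 10 × 24 × 60 × 60 = 864,000 seconds. The range check in walltime_option is a local sanity check chosen for this sketch, not documented service validation. All function names are hypothetical, and GPU_TYPE is a placeholder.

```python
# 10 days expressed in seconds: 10 * 24 * 60 * 60 = 864000.
BATCH_RUNTIME_LIMIT_SECONDS = 10 * 24 * 60 * 60


def gpu_accelerator_option(gpu_type: str, count: int, install_driver: bool = True) -> str:
    """Format a worker_accelerator value for GPUs (hypothetical helper)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    parts = [f"type:{gpu_type}", f"count:{count}"]
    if install_driver:
        # Matches the install-nvidia-driver parameter in the GPU example.
        parts.append("install-nvidia-driver")
    return "worker_accelerator=" + ";".join(parts)


def tpu_accelerator_option(tpu_type: str, topology: str) -> str:
    """Format a worker_accelerator value for TPUs (hypothetical helper)."""
    return f"worker_accelerator=type:{tpu_type};topology:{topology}"


def walltime_option(seconds: int) -> str:
    """Format max_workflow_runtime_walltime_seconds (hypothetical helper).

    Rejects values outside (0, 10 days]. This is a local check for the
    sketch: a batch job is cancelled after 10 days regardless of the value.
    """
    if not 0 < seconds <= BATCH_RUNTIME_LIMIT_SECONDS:
        raise ValueError(f"seconds must be in (0, {BATCH_RUNTIME_LIMIT_SECONDS}]")
    return f"max_workflow_runtime_walltime_seconds={seconds}"


if __name__ == "__main__":
    print(tpu_accelerator_option("tpu-v5-lite-podslice", "1x1"))
    print(gpu_accelerator_option("GPU_TYPE", 1))  # GPU_TYPE is a placeholder
    print(walltime_option(2 * 60 * 60))
```

Each returned string is meant to be used as the value of one --dataflow_service_options flag.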