"Managed Service for Apache Spark" is the new name for the product formerly known as "Dataproc on Compute Engine" (cluster deployment) and "Google Cloud Serverless for Apache Spark" (serverless deployment).

Use YAML files with workflows

You can define a workflow template in a YAML file, then instantiate the template to run the workflow. You can also import and export a workflow template YAML file to create and update a Managed Service for Apache Spark workflow template resource.

Run a workflow using a YAML file

To run a workflow without first creating a workflow template resource, use the gcloud dataproc workflow-templates instantiate-from-file command.

Define your workflow template in a YAML file. The YAML file must include all required WorkflowTemplate fields except the id field, and it must not contain the version field or any output-only fields. In the following example workflow, the prerequisiteStepIds list in the terasort step ensures that the terasort step begins only after the teragen step completes successfully.

```
jobs:
- hadoopJob:
    args:
    - teragen
    - '1000'
    - hdfs:///gen/
    mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
  stepId: teragen
- hadoopJob:
    args:
    - terasort
    - hdfs:///gen/
    - hdfs:///sort/
    mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
  stepId: terasort
  prerequisiteStepIds:
    - teragen
placement:
  managedCluster:
    clusterName: my-managed-cluster
    config:
      gceClusterConfig:
        zoneUri: us-central1-a
```

Run the workflow:

```
gcloud dataproc workflow-templates instantiate-from-file \
    --file=TEMPLATE_YAML \
    --region=REGION
```
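
Because the file passed to instantiate-from-file must not contain the id or version field, a quick pre-flight check can catch an unsuitable file before the request is sent. The following is a minimal sketch, not part of the gcloud CLI: check_template_yaml is a hypothetical helper name, and it assumes that those fields, if present, appear as top-level keys at the start of a line. It does not parse YAML and does not detect output-only fields.

```shell
# Hypothetical helper: returns non-zero if FILE contains a top-level
# "id:" or "version:" key. This is a heuristic line match, not a YAML parser.
check_template_yaml() {
  file="$1"
  if grep -Eq '^(id|version):' "$file"; then
    echo "error: $file contains a top-level id or version field" >&2
    return 1
  fi
  return 0
}

# Usage (TEMPLATE_YAML and REGION are placeholders):
#   check_template_yaml TEMPLATE_YAML && \
#     gcloud dataproc workflow-templates instantiate-from-file \
#       --file=TEMPLATE_YAML --region=REGION
```

Nested keys such as stepId are not matched, because the pattern is anchored to the start of the line.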

Instantiate a workflow using a YAML file with Managed Service for Apache Spark Auto Zone Placement

Define your workflow template in a YAML file. This YAML file is the same as the previous YAML file, except the zoneUri field is set to the empty string ('') to allow Managed Service for Apache Spark Auto Zone Placement to select the zone for the cluster.

```
jobs:
- hadoopJob:
    args:
    - teragen
    - '1000'
    - hdfs:///gen/
    mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
  stepId: teragen
- hadoopJob:
    args:
    - terasort
    - hdfs:///gen/
    - hdfs:///sort/
    mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
  stepId: terasort
  prerequisiteStepIds:
    - teragen
placement:
  managedCluster:
    clusterName: my-managed-cluster
    config:
      gceClusterConfig:
        zoneUri: ''
```

Run the workflow. When using Auto Zone Placement, you must pass a region to the gcloud command with the --region flag.

```
gcloud dataproc workflow-templates instantiate-from-file \
    --file=TEMPLATE_YAML \
    --region=REGION
```
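
If a template file with a fixed zone already exists, the Auto Zone Placement variant can be derived from it instead of being maintained by hand. The sketch below is only an illustration: clear_zone_uri is a hypothetical helper, and it assumes that zoneUri appears on its own line, as in the YAML examples in this section.

```shell
# Hypothetical helper: prints FILE with every zoneUri value replaced by
# the empty string (''), leaving indentation and all other lines untouched.
clear_zone_uri() {
  sed -E "s/^([[:space:]]*zoneUri:).*/\1 ''/" "$1"
}

# Usage (file names are placeholders):
#   clear_zone_uri template.yaml > template-autozone.yaml
```

The helper writes to stdout, so the original file is left unchanged and both variants can be kept side by side.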

Import and export a workflow template YAML file

You can import and export workflow template YAML files. Typically, you export a workflow template to a YAML file, edit the file, and then import the edited file to update the template.

Export the workflow template to a YAML file. During the export operation, the id and version fields, and all output-only fields are filtered from the output and do not appear in the exported YAML file.
```
gcloud dataproc workflow-templates export TEMPLATE_ID \
    --destination=TEMPLATE_YAML \
    --region=REGION
```
You can pass either the WorkflowTemplate id or the fully qualified template resource name ("projects/PROJECT_ID/regions/REGION/workflowTemplates/TEMPLATE_ID") to the command.
If you omit the --destination flag, the output is directed to stdout, so the following command will also export the template to a YAML file:
```
gcloud dataproc workflow-templates export TEMPLATE_ID \
    --region=REGION > TEMPLATE_YAML
```
Edit the YAML file locally. Note that the id, version, and output-only fields, which were filtered from the YAML file when the template was exported, are disallowed in the imported YAML file.
Import the updated workflow template YAML file:
```
gcloud dataproc workflow-templates import TEMPLATE_ID \
    --source=TEMPLATE_YAML \
    --region=REGION
```
You can pass either the WorkflowTemplate id or the fully qualified template resource name ("projects/PROJECT_ID/regions/REGION/workflowTemplates/TEMPLATE_ID") to the command. If a template with the same name exists, it is overwritten (updated) and its version number is incremented. If no template with that name exists, it is created.
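
The export, edit, and import steps above can be chained in a small script. The following is a sketch under stated assumptions rather than an official tool: update_template is a hypothetical helper, it uses only the export and import commands and flags shown in this section, and it assumes the editing command accepts the file path as its last argument.

```shell
# Hypothetical helper: export a template, run an editing command on the
# exported file, then import the result. Usage:
#   update_template TEMPLATE_ID REGION EDIT_COMMAND [ARGS...]
# EDIT_COMMAND receives the path of the exported YAML file as its last argument.
update_template() {
  template_id="$1"
  region="$2"
  shift 2
  tmp="$(mktemp)" || return 1

  # Each step runs only if the previous one succeeded, so a failed export
  # or a failed edit never reaches the import.
  gcloud dataproc workflow-templates export "$template_id" \
      --destination="$tmp" \
      --region="$region" &&
    "$@" "$tmp" &&
    gcloud dataproc workflow-templates import "$template_id" \
      --source="$tmp" \
      --region="$region"
  rc=$?

  rm -f "$tmp"
  return "$rc"
}
```

For example, `update_template TEMPLATE_ID REGION "${EDITOR:-vi}"` would open the exported file in an editor before importing it. The gcloud command may ask for confirmation before it overwrites an existing template.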