Managed Service for Apache Spark serverless deployment overview

Managed Service for Apache Spark serverless deployment lets you run Spark workloads without provisioning and managing your own Managed Service for Apache Spark cluster. You can run workloads in two ways: as batch workloads or in interactive sessions.

Batch workloads

Submit a batch workload using the Google Cloud console, the Google Cloud CLI, or the REST API. Managed Service for Apache Spark runs the workload on a managed compute infrastructure, autoscaling resources as needed. Charges apply only while the workload is executing.
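As a sketch of programmatic submission, the `google-cloud-dataproc` Python client library can create a batch workload. The project, region, batch ID, and Cloud Storage URI below are placeholder values, not part of this document:

```python
# Sketch: submit a PySpark batch workload with the google-cloud-dataproc
# client library. Project, region, batch ID, and file URIs are placeholders.
from google.cloud import dataproc_v1

region = "us-central1"  # placeholder region
client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch(
    pyspark_batch=dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://my-bucket/word_count.py"  # placeholder script
    )
)

# create_batch returns a long-running operation; result() blocks until
# the batch workload finishes.
operation = client.create_batch(
    parent=f"projects/my-project/locations/{region}",
    batch=batch,
    batch_id="my-batch-0001",
)
response = operation.result()
print(response.state)
```

The same batch can be submitted with `gcloud dataproc batches submit` or the REST API; the client library is shown here only as one option.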

Batch workload capabilities

You can run the following batch workload types:

  • PySpark
  • Spark SQL
  • SparkR
  • Spark (Java or Scala)

You can specify Spark properties when you submit a batch workload.
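As a minimal sketch, Spark properties are key-value pairs such as `spark.executor.cores`; when submitting with the gcloud CLI they are passed as a single comma-separated flag value. The property values below are illustrative examples:

```python
# Sketch: Spark properties for a batch workload, serialized into the
# comma-separated form used by the gcloud CLI --properties flag.
# The specific values here are example settings, not recommendations.
spark_properties = {
    "spark.executor.cores": "4",               # cores per executor
    "spark.executor.memory": "8g",             # memory per executor
    "spark.dynamicAllocation.enabled": "true", # let Spark scale executors
}

# Join the pairs into "key=value,key=value" form (sorted for stable output).
flag_value = ",".join(f"{k}={v}" for k, v in sorted(spark_properties.items()))
print(flag_value)
```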

Schedule batch workloads

You can schedule a Spark batch workload as part of an Airflow or Cloud Composer workflow using an Airflow batch operator. For more information, see Run Managed Service for Apache Spark workloads with Cloud Composer.
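As a hedged sketch of such a workflow, the Google provider package for Airflow (`apache-airflow-providers-google`) includes `DataprocCreateBatchOperator`, which submits a batch workload on a schedule. The DAG name, project, region, and file URI below are placeholders:

```python
# Sketch: an Airflow DAG that submits a PySpark batch workload daily using
# DataprocCreateBatchOperator. Project, region, and URIs are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateBatchOperator,
)

with DAG(
    dag_id="spark_batch_daily",        # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    submit_batch = DataprocCreateBatchOperator(
        task_id="submit_spark_batch",
        project_id="my-project",       # placeholder project
        region="us-central1",          # placeholder region
        batch_id="spark-batch-{{ ds_nodash }}",  # unique ID per run date
        batch={
            "pyspark_batch": {
                "main_python_file_uri": "gs://my-bucket/word_count.py",
            },
        },
    )
```

In a Cloud Composer environment this file goes in the environment's DAGs folder; the operator then creates one batch workload per scheduled run.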

Get started

To get started, see Run an Apache Spark batch workload.

Interactive sessions

Write and run code in Jupyter notebooks during an interactive session. You can create a notebook session in the following ways:

  • Run PySpark code in BigQuery Studio notebooks. Open a BigQuery Python notebook to create a Spark Connect-based Managed Service for Apache Spark interactive session. Each BigQuery notebook can have only one active Managed Service for Apache Spark session associated with it.

  • Use the JupyterLab plugin to create multiple Jupyter notebook sessions from templates that you create and manage. When you install the plugin on a local machine or Compute Engine VM, different cards that correspond to different Spark kernel configurations appear on the JupyterLab launcher page. Click a card to create a Managed Service for Apache Spark notebook session, then start writing and testing your code in the notebook.

    The JupyterLab plugin also lets you use the JupyterLab launcher page to take the following actions:

    • Create Managed Service for Apache Spark clusters.
    • Submit jobs to clusters.
    • View Google Cloud and Spark logs.
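As a sketch of what notebook code in a session can look like, the snippet below assumes the Dataproc Spark Connect client (`dataproc-spark-connect` package) to obtain a session; the import path and the DataFrame contents are assumptions for illustration:

```python
# Sketch: PySpark code run inside an interactive notebook session.
# Assumes the dataproc-spark-connect client provides the session builder;
# in a BigQuery Studio notebook the session may be created for you.
from google.cloud.dataproc_spark_connect import DataprocSparkSession

spark = DataprocSparkSession.builder.getOrCreate()

# Ordinary PySpark DataFrame code runs against the remote session.
df = spark.createDataFrame([("spark", 1), ("connect", 2)], ["word", "count"])
df.show()
```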

Security compliance

Managed Service for Apache Spark serverless deployment adheres to the same data residency, CMEK (customer-managed encryption key), VPC Service Controls, and other security requirements that apply to Managed Service for Apache Spark clusters.