"Managed Service for Apache Spark" is the new name for the product formerly known as "Dataproc on Compute Engine" (cluster deployment) and "Google Cloud Serverless for Apache Spark" (serverless deployment).

Managed Service for Apache Spark on clusters overview

Managed Service for Apache Spark on clusters lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Its automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don't need them. With less time and money spent on administration, you can focus on your jobs and your data.

Advantages of Managed Service for Apache Spark on clusters

Compared to traditional on-premises products and competing cloud services, Managed Service for Apache Spark provides several advantages for clusters ranging from three to hundreds of nodes:

Low cost — Managed Service for Apache Spark on clusters costs 1 cent per virtual CPU (vCPU) in your cluster per hour, in addition to the other Google Cloud resources you use. Clusters can also include preemptible instances, which have lower compute prices and reduce your costs further. Instead of rounding your usage up to the nearest hour, Managed Service for Apache Spark on clusters charges you only for what you actually use, with per-second billing and a one-minute minimum billing period.
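To make the billing rules above concrete, here is a minimal sketch that estimates only the service fee, assuming the 1-cent-per-vCPU-hour rate, per-second billing on whole seconds, and the one-minute minimum described in this section. The function name service_fee is hypothetical, and the estimate deliberately excludes Compute Engine, disk, and network charges, which are billed separately.

```python
from decimal import Decimal

# Assumed list price from the text above: USD 0.01 per vCPU per hour.
FEE_PER_VCPU_HOUR = Decimal("0.01")
# One-minute minimum billing period.
MIN_BILLED_SECONDS = 60
SECONDS_PER_HOUR = Decimal(3600)


def service_fee(vcpus: int, runtime_seconds: int) -> Decimal:
    """Estimate the service fee (USD) for a cluster run.

    Hypothetical helper: covers only the per-vCPU service fee, not the
    underlying Compute Engine or storage resources.
    """
    if vcpus < 0 or runtime_seconds < 0:
        raise ValueError("vcpus and runtime_seconds must be non-negative")
    # Per-second billing, but never less than the one-minute minimum.
    billed_seconds = max(runtime_seconds, MIN_BILLED_SECONDS)
    return FEE_PER_VCPU_HOUR * vcpus * billed_seconds / SECONDS_PER_HOUR


if __name__ == "__main__":
    # A 24-vCPU cluster running for 90 minutes is billed for exactly 90 minutes.
    print(service_fee(24, 90 * 60))
    # A 30-second run is billed as one full minute.
    print(service_fee(24, 30))
```

With hourly rounding, the 90-minute run above would be billed as two hours (USD 0.48); per-second billing charges USD 0.36 instead.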
Super fast — Creating Spark and Hadoop clusters on-premises or through IaaS providers can take 5 to 30 minutes. By comparison, Managed Service for Apache Spark clusters are quick to start, scale, and shut down, with each of these operations taking 90 seconds or less, on average. This means you can spend less time waiting for clusters and more time working with your data.
Integrated — Managed Service for Apache Spark on clusters has built-in integration with other Google Cloud services, such as BigQuery, Cloud Storage, Bigtable, Cloud Logging, and Cloud Monitoring, so you have more than just a Spark or Hadoop cluster: you have a complete data platform. For example, you can use Managed Service for Apache Spark on clusters to extract, transform, and load (ETL) terabytes of raw log data directly into BigQuery for business reporting.
Managed — Use Spark and Hadoop clusters without the assistance of an administrator or special software. You can interact with clusters and Spark or Hadoop jobs through the Google Cloud console, the Cloud SDK, or the Managed Service for Apache Spark on clusters REST API. When you're done with a cluster, you can turn it off, so you don't spend money on an idle cluster. Because Managed Service for Apache Spark is integrated with Cloud Storage, BigQuery, and Bigtable, data you keep in those services persists after the cluster is turned off.
Simple and familiar — You don't need to learn new tools or APIs to use Managed Service for Apache Spark on clusters, which lets you move existing projects into Managed Service for Apache Spark on clusters without redevelopment. Spark, Hadoop, Pig, and Hive are frequently updated, so you can be productive faster.

What's included in Managed Service for Apache Spark on clusters?

For a list of the open source (Hadoop, Spark, Hive, and Pig) and Google Cloud connector versions supported by Managed Service for Apache Spark on clusters, see the Managed Service for Apache Spark cluster image version lists.

Getting started

To get started, see the Managed Service for Apache Spark on clusters quickstarts. You can access Managed Service for Apache Spark on clusters in the following ways:

Through the REST API
Using the Cloud SDK
Using the Google Cloud console
Using Cloud Client Libraries
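As an illustration of the Cloud Client Libraries option, the following minimal sketch creates a small cluster and deletes it when you are done. It assumes the Python client library is still published as google-cloud-dataproc (module google.cloud.dataproc_v1, reflecting the product's former name) and that Application Default Credentials are configured. The helper names, machine type, and worker count are illustrative assumptions, not recommended settings.

```python
def build_cluster(
    project_id: str,
    cluster_name: str,
    num_workers: int = 2,
    machine_type: str = "n1-standard-2",  # illustrative machine type
) -> dict:
    """Build a cluster definition as a plain dict (hypothetical helper)."""
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            # One master node plus a small pool of primary workers.
            "master_config": {"num_instances": 1, "machine_type_uri": machine_type},
            "worker_config": {"num_instances": num_workers, "machine_type_uri": machine_type},
        },
    }


def _client(region: str):
    # Imported lazily so build_cluster works without the library installed.
    from google.cloud import dataproc_v1

    # Regional endpoint for the cluster controller API.
    return dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )


def create_cluster(project_id: str, region: str, cluster_name: str):
    """Create a cluster and block until the long-running operation finishes."""
    operation = _client(region).create_cluster(
        request={
            "project_id": project_id,
            "region": region,
            "cluster": build_cluster(project_id, cluster_name),
        }
    )
    return operation.result()


def delete_cluster(project_id: str, region: str, cluster_name: str) -> None:
    """Delete the cluster so you stop paying for idle resources."""
    operation = _client(region).delete_cluster(
        request={"project_id": project_id, "region": region, "cluster_name": cluster_name}
    )
    operation.result()
```

For example, create_cluster("my-project", "us-central1", "example-cluster") followed later by delete_cluster with the same arguments would cover the create-then-turn-off lifecycle described above; "my-project", "us-central1", and "example-cluster" are placeholders.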
