Use Dataproc Hub
Dataproc Hub and
Vertex AI Workbench user-managed notebooks are
deprecated. On January 30, 2025, support for user-managed notebooks
will end and the ability to create user-managed notebooks instances
will be removed. For alternative notebook solutions
on Google Cloud, see:
Create a Dataproc JupyterLab cluster from Dataproc Hub
Select the User-Managed Notebooks tab on the
Dataproc→Workbench
page in the Google Cloud console.
Click Open JupyterLab in the row that
lists the Dataproc Hub instance created by the administrator.
If you do not have access to the Google Cloud console, enter the
Dataproc Hub instance URL that an
administrator shared with you in your web browser.
On the Jupyterhub→Dataproc Options page, select
a cluster configuration and zone. If enabled, specify any customizations, then
click Create.
After the Dataproc cluster is created, you are redirected
to the JupyterLab interface running on the cluster.
Create a notebook and run a Spark job
On the left panel of the JupyterLab interface, click on GCS
(Cloud Storage).
Create a PySpark notebook from the JupyterLab launcher.
The PySpark kernel initializes a SparkContext and makes it available
as the sc variable.
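One way to examine the context is to collect a few of its properties in one place. The following is a minimal sketch, not part of the original procedure: describe_context is a hypothetical helper name, and it assumes it is called in a PySpark notebook where the kernel has already created sc. The attributes it reads (version, master, appName, defaultParallelism) are standard SparkContext attributes.

```python
def describe_context(sc):
    """Return a few identifying properties of a SparkContext-like object.

    `sc` is assumed to be the context that the PySpark kernel created;
    any object exposing the same attributes also works.
    """
    return {
        "version": sc.version,                          # Spark version on the cluster
        "master": sc.master,                            # cluster manager the context uses
        "app_name": sc.appName,                         # application name of this session
        "default_parallelism": sc.defaultParallelism,   # default partition count for RDDs
    }

# In a notebook cell, where `sc` already exists:
#     describe_context(sc)
```

The helper only reads attributes, so running it does not start a Spark job.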
You can examine the SparkContext and run a Spark job from the notebook.
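# Build (word, 1) pairs, then sum the counts for each word.
rdd = (sc.parallelize(['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'lorem'])
       .map(lambda word: (word, 1))
       .reduceByKey(lambda a, b: a + b))
# collect() returns the (word, count) pairs to the notebook.
print(rdd.collect())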
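To check the result of the job above, the same word count can be reproduced in plain Python. This is a sketch for comparison only: expected_counts and matches_expected are hypothetical helper names, not part of Dataproc or Spark. Because the order of the pairs returned by collect() is not guaranteed, the comparison converts them to a dictionary first.

```python
from collections import Counter

# Same input as the Spark job.
WORDS = ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'lorem']

def expected_counts(words):
    """Count occurrences per word, mirroring map to (word, 1) plus reduceByKey."""
    return dict(Counter(words))

def matches_expected(pairs, words=WORDS):
    """Compare (word, count) pairs, such as rdd.collect() output, ignoring order."""
    return dict(pairs) == expected_counts(words)

print(expected_counts(WORDS))
# prints: {'lorem': 2, 'ipsum': 1, 'dolor': 1, 'sit': 1, 'amet': 1}
```

In the notebook, matches_expected(rdd.collect()) should return True if the job produced the same counts.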
Name and save the notebook. The notebook is saved and remains in
Cloud Storage after the Dataproc cluster is deleted.
Shut down the Dataproc cluster
From the JupyterLab interface, select File→Hub Control Panel to
open the Jupyterhub page.
When using Dataproc image versions 1.4 or earlier,
navigate to /hub/home to access the Jupyterhub page.
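If you need to work out the full address for the /hub/home path, it can be derived from the URL currently shown in the browser. The sketch below is illustrative only: hub_home_url is a hypothetical helper, the example host is a placeholder, and it assumes the hub is served from the root of the instance host.

```python
from urllib.parse import urlsplit, urlunsplit

def hub_home_url(current_url):
    """Return the /hub/home URL on the same scheme and host as current_url.

    Assumes the hub is served at the root of the host (an assumption,
    not something the instance guarantees).
    """
    parts = urlsplit(current_url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/hub/home", "", ""))

# Placeholder URL for illustration; use your own instance URL.
print(hub_home_url("https://hub.example.invalid/user/alice/lab?reset"))
# prints: https://hub.example.invalid/hub/home
```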
Click Stop My Cluster to shut down (delete) the JupyterLab server, which
deletes the Dataproc cluster.
Stopping the server and deleting the cluster
does not delete the Dataproc Hub instance.
To configure and create another Dataproc JupyterLab cluster, click
Start my server on the Jupyterhub (Hub Control Panel) page, or select
the Open JupyterLab link for your Dataproc Hub instance on the
Dataproc→Workbench→User-Managed Notebooks
page in the Google Cloud console.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-10-18 UTC.