Virtual Private Cloud subnetwork requirements
This document describes Virtual Private Cloud network requirements for Google Cloud Serverless for Apache Spark batch workloads and interactive sessions.
Private Google Access
Serverless for Apache Spark batch workloads and interactive sessions run on VMs with internal IP addresses only, on a regional subnet with Private Google Access (PGA) automatically enabled.
If you don't specify a subnet, Serverless for Apache Spark uses the default subnet in the batch workload or session region for the batch workload or session.
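As a sketch, you can pin a batch workload to a specific subnet instead of the region's default subnet by passing the `--subnet` flag when submitting; the bucket, script, project, and subnet names below are placeholders:

```shell
# Submit a PySpark batch on an explicit subnet rather than letting
# Serverless for Apache Spark pick the region's default subnet.
# All resource names here are placeholders.
gcloud dataproc batches submit pyspark gs://my-bucket/my_job.py \
    --region=us-central1 \
    --subnet=projects/my-project/regions/us-central1/subnetworks/my-subnet
```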
If your workload requires external network or internet access, for example, to download resources such as ML models from PyTorch Hub or Hugging Face, you can set up Cloud NAT to allow outbound traffic using internal IPs on your VPC network.
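A minimal Cloud NAT setup for this case might look like the following sketch; the router and NAT gateway names are placeholders, and `NETWORK_NAME` and `REGION` are your VPC network and workload region:

```shell
# Sketch: give internal-IP-only Spark VMs outbound internet access
# through Cloud NAT. Router and NAT config names are placeholders.
gcloud compute routers create spark-nat-router \
    --network=NETWORK_NAME \
    --region=REGION

gcloud compute routers nats create spark-nat-config \
    --router=spark-nat-router \
    --region=REGION \
    --auto-allocate-nat-external-ips \
    --nat-all-subnet-ip-ranges
```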
Open subnet connectivity
The VPC subnet for the region selected for the Serverless for Apache Spark batch workload or interactive session must allow internal communication on all ports between VM instances within the subnet.
To prevent malicious scripts in one workload from affecting other workloads, Serverless for Apache Spark deploys default security measures.
The following Google Cloud CLI command attaches a network firewall to a subnet that allows internal ingress communications among VMs using all protocols on all ports:
gcloud compute firewall-rules create allow-internal-ingress \
    --network=NETWORK_NAME \
    --source-ranges=SUBNET_RANGES \
    --destination-ranges=SUBNET_RANGES \
    --direction=ingress \
    --action=allow \
    --rules=all
Notes:
SUBNET_RANGES: See Allow internal ingress connections between VMs. The default VPC network in a project with the default-allow-internal firewall rule, which allows ingress communication on all ports (tcp:0-65535, udp:0-65535, and icmp), meets the open-subnet-connectivity requirement. However, this rule also allows ingress by any VM instance on the network.
Automatically created regional system firewall policy
To satisfy the open subnet connectivity requirement, Serverless for Apache Spark batch workloads and interactive sessions that use runtime version 3.0 or later automatically create a regional system firewall policy, dataproc-firewall-policy-[network-id]-region or dataproc-fw-[network-id]-region, on the batch or session VPC subnet. This policy contains the following ingress and egress rules.
| Name | Purpose | Priority | Direction | Action | Source and destination | Protocol and ports |
|---|---|---|---|---|---|---|
| dataproc-allow-internal-ingress-rule-[subnetworkId] | Allows all necessary internal communication only from other tagged Serverless for Apache Spark VMs within the same subnet. | 4 | INGRESS | ALLOW | srcSecureTag: secure tag value for this subnet. targetSecureTags: secure tag value for this subnet. | tcp:0-65535, udp:0-65535, icmp |
| dataproc-allow-internal-egress-rule-[subnetworkId] | Allows Serverless for Apache Spark VMs to download packages (for example, with pip and apt-get) and access Google APIs using Private Google Access. | 5 | EGRESS | ALLOW | destIpRanges: 0.0.0.0/0. targetSecureTags: secure tag value for this subnet. | tcp:0-65535, udp:0-65535, icmp |
Notes:
Serverless for Apache Spark provisions a tenant project associated with the user project to store secure tags. It creates a secure tag for the subnet in the tenant project and attaches the tag to Serverless for Apache Spark VMs, which ensures that the created system firewall policy applies only to Serverless for Apache Spark VMs.
The automatically created system firewall policy is not supported for Shared VPC.
Serverless for Apache Spark and VPC-SC networks
With VPC Service Controls, network administrators can define a security perimeter around resources of Google-managed services to control communication to and between those services.
Consider the following strategies when you use VPC-SC networks with Serverless for Apache Spark:
Create a custom container image that pre-installs dependencies outside the VPC-SC perimeter, and then submit a Spark batch workload that uses your custom container image.
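A submission using such a pre-built image might look like the following sketch; the script path, region, and Artifact Registry image path are placeholders:

```shell
# Sketch: submit a batch workload that runs on a custom container image
# with dependencies pre-installed, so the workload doesn't need to fetch
# packages from inside the VPC-SC perimeter. All paths are placeholders.
gcloud dataproc batches submit pyspark gs://my-bucket/my_job.py \
    --region=REGION \
    --container-image=REGION-docker.pkg.dev/my-project/my-repo/spark-custom:1.0
```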
For more information, see the VPC Service Controls documentation for Serverless for Apache Spark.