Dataflow controls for generative AI use cases

This document describes best practices and guidelines for Dataflow when you run generative AI workloads on Google Cloud. Use Dataflow with Vertex AI to build complex pipelines that ingest data from various sources and aggregate that data as appropriate.

Optional Dataflow controls

We recommend that you implement the following security controls, depending on your data source.

Turn off external IP addresses for Dataflow jobs

Google control ID: DF-CO-6.1
Category: Optional
Description

Turn off external IP addresses for administrative and monitoring tasks that are related to Dataflow jobs. Instead, configure access to your Dataflow worker VMs using SSH.

Enable Private Google Access and specify one of the following options in your Dataflow job:

  • --usePublicIps=false and --network=NETWORK-NAME
  • --subnetwork=SUBNETWORK-NAME

Where:

  • NETWORK-NAME: The name of your Compute Engine network.
  • SUBNETWORK-NAME: The name of your Compute Engine subnetwork.
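
As a minimal sketch of how the options above could be assembled into job arguments, the following hypothetical helper (`private_worker_flags` is not part of any Dataflow SDK) builds the flag list for either choice. It assumes the flag spellings shown above, which other SDKs may spell differently, and it assumes that passing `--usePublicIps=false` together with `--subnetwork` is acceptable. It does not enable Private Google Access or launch a job.

```python
from typing import List, Optional


def private_worker_flags(network: Optional[str] = None,
                         subnetwork: Optional[str] = None) -> List[str]:
    """Build job flags that keep Dataflow workers off external IPs.

    Hypothetical helper for illustration: it only assembles flag strings.
    """
    # Mirror the either/or choice in the text: a network name or a subnetwork name.
    if bool(network) == bool(subnetwork):
        raise ValueError("Specify exactly one of network or subnetwork.")

    # Assumption: the public-IP flag is included for both choices.
    flags = ["--usePublicIps=false"]
    if network:
        flags.append(f"--network={network}")
    else:
        flags.append(f"--subnetwork={subnetwork}")
    return flags


if __name__ == "__main__":
    # "my-network" is a placeholder for NETWORK-NAME.
    print(" ".join(private_worker_flags(network="my-network")))
```

The returned list can be appended to the other arguments that you pass when you launch the job.
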
Applicable products
  • Compute Engine
  • Dataflow
Related NIST-800-53 controls
  • SC-7
  • SC-8
Related CRI profile controls
  • PR.AC-5.1
  • PR.AC-5.2
  • PR.DS-2.1
  • PR.DS-2.2
  • PR.DS-5.1
  • PR.PT-4.1
  • DE.CM-1.1
  • DE.CM-1.2
  • DE.CM-1.3
  • DE.CM-1.4

Use network tags for firewall rules

Google control ID: DF-CO-6.2
Category: Optional
Description

Network tags are text attributes that you attach to Compute Engine VMs, such as Dataflow worker VMs. With network tags, you can apply VPC network firewall rules and some custom static routes to specific VM instances. Dataflow supports adding network tags to all worker VMs that run a particular Dataflow job.
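
The text above does not say how tags are passed to a job. The following minimal sketch assumes they are supplied through a `use_network_tags` experiment with multiple tags separated by semicolons, and it assumes the usual Compute Engine tag format (lowercase letters, digits, and hyphens, up to 63 characters). The helper name `network_tags_flag` and the example tag names are hypothetical. Check the flag against the current Dataflow documentation before relying on it.

```python
import re
from typing import Iterable

# Assumed tag format: starts with a lowercase letter, ends with a lowercase
# letter or digit, contains only lowercase letters, digits, and hyphens,
# and is 1 to 63 characters long.
_TAG_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,61}[a-z0-9])?")


def network_tags_flag(tags: Iterable[str]) -> str:
    """Build one job flag that requests network tags for all worker VMs.

    Hypothetical helper for illustration: it only assembles the flag string.
    """
    tag_list = list(tags)
    if not tag_list:
        raise ValueError("Provide at least one network tag.")
    for tag in tag_list:
        if not _TAG_PATTERN.fullmatch(tag):
            raise ValueError(f"Invalid network tag: {tag!r}")
    # Assumption: multiple tags are joined with semicolons.
    return "--experiments=use_network_tags=" + ";".join(tag_list)


if __name__ == "__main__":
    # Placeholder tag names that a firewall rule could target.
    print(network_tags_flag(["dataflow-worker", "genai-pipeline"]))
```

If you pass the resulting flag on a shell command line, quote it so that the shell does not treat the semicolon as a command separator.
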

Applicable products
  • Compute Engine
  • Dataflow
Related NIST-800-53 controls
  • SC-7
  • SC-8
Related CRI profile controls
  • PR.AC-5.1
  • PR.AC-5.2
  • PR.DS-2.1
  • PR.DS-2.2
  • PR.DS-5.1
  • PR.PT-4.1
  • DE.CM-1.1
  • DE.CM-1.2
  • DE.CM-1.3
  • DE.CM-1.4

What's next