This document includes best practices and guidelines for using Dataflow to run generative AI workloads on Google Cloud. Use Dataflow with Vertex AI to build complex pipelines that ingest data from various sources and aggregate the data as appropriate.
Optional Dataflow controls
We recommend that you implement the following security controls, depending on your data source.
Turn off external IP addresses for Dataflow jobs
| Google control ID | DF-CO-6.1 |
|---|---|
| Category | Optional |
| Description | Turn off external IP addresses for administrative and monitoring tasks that are related to Dataflow jobs. Instead, configure access to your Dataflow worker VMs using SSH. Enable Private Google Access and specify one of the following options in your Dataflow job: |
| Applicable products | |
| Related NIST-800-53 controls | |
| Related CRI profile controls | |
| Related information | |
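The following is a minimal sketch, assuming the job is launched with the Apache Beam Python SDK. It assembles pipeline flags that keep Dataflow worker VMs off external IP addresses. `--no_use_public_ips` and `--subnetwork` are Beam Python pipeline options; the helper name and the project, region, and subnetwork values are hypothetical. The flags do not turn on Private Google Access; that must still be enabled on the subnetwork itself.

```python
from typing import List, Optional


def dataflow_private_ip_args(
    project: str,
    region: str,
    subnetwork: Optional[str] = None,
) -> List[str]:
    """Build Beam Python SDK flags for a Dataflow job whose workers have no
    external IP addresses (hypothetical helper).

    Assumes Private Google Access is already enabled on the subnetwork the
    workers use; these flags do not enable it.
    """
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        # Sets WorkerOptions.use_public_ips to False in the Beam Python SDK.
        "--no_use_public_ips",
    ]
    if subnetwork:
        # Short form accepted by Dataflow: regions/REGION/subnetworks/SUBNETWORK
        args.append(f"--subnetwork={subnetwork}")
    return args


if __name__ == "__main__":
    # Hypothetical project, region, and subnetwork names for illustration.
    flags = dataflow_private_ip_args(
        "example-project",
        "us-central1",
        "regions/us-central1/subnetworks/example-subnet",
    )
    print(" ".join(flags))
```

You could pass the resulting list to `apache_beam.options.pipeline_options.PipelineOptions(flags)` when you construct the pipeline. Other Beam SDKs spell these options differently (the Java SDK, for example, uses camelCase option names), so check the pipeline options reference for your SDK.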
Use network tags for firewall rules
| Google control ID | DF-CO-6.2 |
|---|---|
| Category | Optional |
| Description | Network tags are text attributes that you attach to Compute Engine VMs, such as Dataflow worker VMs. Network tags let you apply VPC firewall rules and some custom static routes to specific VM instances. Dataflow supports adding network tags to all worker VMs that run a particular Dataflow job. |
| Applicable products | |
| Related NIST-800-53 controls | |
| Related CRI profile controls | |
| Related information | |
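The sketch below, again assuming the Beam Python SDK, shows two things: the `--experiments=use_network_tags=...` flag format that attaches network tags to every worker VM of a job (multiple tags separated by semicolons), and a simplified model of how a tag-scoped firewall rule selects VMs. The helper names and tag values are hypothetical, and the model deliberately ignores service-account targets, rule priorities, and allow or deny actions.

```python
from typing import Iterable, List, Set


def network_tags_flag(tags: Iterable[str]) -> str:
    """Return the Dataflow experiment flag that attaches network tags to every
    worker VM of a job (hypothetical helper). Multiple tags are joined with ';'."""
    tag_list: List[str] = [t for t in tags if t]
    if not tag_list:
        raise ValueError("at least one network tag is required")
    return "--experiments=use_network_tags=" + ";".join(tag_list)


def firewall_rule_applies(rule_target_tags: Set[str], vm_tags: Set[str]) -> bool:
    """Simplified model of VPC firewall targeting by network tag.

    A rule with no target tags applies to every VM in the network; otherwise it
    applies only to VMs that carry at least one of the rule's target tags.
    Service-account targets, priorities, and actions are not modeled.
    """
    if not rule_target_tags:
        return True
    return bool(rule_target_tags & vm_tags)


if __name__ == "__main__":
    # Hypothetical tag names for illustration.
    print(network_tags_flag(["df-workers", "df-egress-restricted"]))
    worker_tags = {"df-workers", "df-egress-restricted"}
    print(firewall_rule_applies({"df-workers"}, worker_tags))    # True
    print(firewall_rule_applies({"web-frontend"}, worker_tags))  # False
```

When you pass the experiment flag on a shell command line instead of from Python, quote it, because the shell treats `;` as a command separator.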