This document provides a high-level architecture for securing massive datasets that contain sensitive data, including personally identifiable information (PII), in Google Cloud. The architecture is designed to help secure sensitive data against accidental exposure and malicious exfiltration. It's intended for Data Compliance Officers and Cloud Security Engineers who are familiar with foundational cloud networking and identity concepts. The architecture highlights the use of network perimeters that explicitly override permissive Identity and Access Management (IAM) settings to prevent unauthorized public access, even when resources are misconfigured.
The deployment section of this document provides a Terraform code sample to self-deploy this perimeter and simulate a public access block.
Architecture
The following architecture diagram illustrates a multi-layered data protection strategy on Google Cloud. It shows how data moves from an unstructured state into a secured, governed environment.
The preceding diagram shows how each Google service provides security for the data from the initial data upload through data access:
- A user with appropriate IAM permissions uploads data to the BigQuery or Cloud Storage data storage service. The storage services are configured to use customer-managed encryption keys (CMEK) and Cloud KMS Autokey.
- The service encrypts the data by using a CMEK that Cloud KMS Autokey provisions on demand.
- Sensitive Data Protection continually inspects, classifies, and de‑identifies sensitive data in the repository. To de‑identify sensitive data, Sensitive Data Protection masks the data by using templates and options that you configure.
- The storage services are inside a VPC Service Controls perimeter, which blocks data access from outside the perimeter. To grant access permission to designated users and systems (IAM principals), you use an Access Context Manager access level template. The designated principals can then access data that's inside the perimeter.
- VPC Service Controls denies access to anyone who isn't specified in the access level template.
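The interaction between the perimeter and IAM in the preceding steps can be sketched as a small model. This is a minimal illustration, not the VPC Service Controls or IAM API: the `Request` and `Resource` classes, the helper functions, and the principal names are all hypothetical, and the access level is simplified to a set of principals (real access levels can also use attributes such as IP address or device state). The point it shows is that a request must pass both checks, so a public IAM grant alone doesn't expose data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    principal: str          # hypothetical principal, e.g. "user:analyst@example.com"
    inside_perimeter: bool  # True if the call originates inside the perimeter


@dataclass
class Resource:
    name: str
    iam_members: set = field(default_factory=set)  # principals granted read access


def iam_allows(resource: Resource, request: Request) -> bool:
    """IAM check. The member "allUsers" models a misconfigured public grant."""
    return "allUsers" in resource.iam_members or request.principal in resource.iam_members


def perimeter_allows(request: Request, access_level: set) -> bool:
    """Perimeter check: inside the perimeter, or named in the access level."""
    return request.inside_perimeter or request.principal in access_level


def is_allowed(resource: Resource, request: Request, access_level: set) -> bool:
    # Both layers must agree. A permissive IAM grant cannot bypass the perimeter.
    return perimeter_allows(request, access_level) and iam_allows(resource, request)


# Usage: a bucket that was accidentally made public (assumed scenario).
public_bucket = Resource("sensitive-bucket", {"allUsers"})
access_level = {"user:analyst@example.com"}

analyst = Request("user:analyst@example.com", inside_perimeter=False)
stranger = Request("user:stranger@example.net", inside_perimeter=False)

analyst_ok = is_allowed(public_bucket, analyst, access_level)    # allowed by both layers
stranger_ok = is_allowed(public_bucket, stranger, access_level)  # blocked by the perimeter
```

In this model, the stranger is denied even though IAM grants access to `allUsers`, which mirrors the public access block that the architecture is designed to enforce.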
Products used
- VPC Service Controls: A managed networking feature that minimizes data exfiltration risks for your Google Cloud resources.
- Sensitive Data Protection: A fully managed service that's designed to help you discover, classify, and protect your valuable data assets, including personally identifiable information (PII).
- Cloud Key Management Service (Cloud KMS): A service that lets you create, import, and manage cryptographic keys and perform cryptographic operations in a single centralized cloud service.
- Access Context Manager: A service that lets you define fine-grained, attribute-based access control policies for your projects and resources in Google Cloud.
- BigQuery: An enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence.
- Cloud Storage: A low-cost, no-limit object store for diverse data types. Data can be accessed from within and outside Google Cloud, and it's replicated across locations for redundancy.
Use cases
This architecture provides a robust security framework for handling sensitive data in Google Cloud. It focuses on data protection, access control, and exfiltration prevention. The architecture helps to establish a secure environment for sensitive data by combining strong encryption with automated data de‑identification and network perimeter controls. This implementation helps to ensure that data is encrypted with customer-managed encryption keys (CMEKs), that sensitive information is automatically detected and masked, and that access is strictly confined and contextually managed. These measures help to significantly mitigate data exfiltration risks.
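To make the "detected and masked" step concrete, the following is a minimal sketch of character masking in plain Python. It doesn't call Sensitive Data Protection: the email regular expression is a hypothetical stand-in for a managed detector, and the function names and the `masking_character` parameter are assumptions chosen for illustration.

```python
import re

# Hypothetical detector: a simple email pattern that stands in for a managed
# sensitive-data detector. Real detection covers far more data types.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_match(match: re.Match, masking_character: str = "#") -> str:
    """Replace every character of a detected value with the masking character."""
    return masking_character * len(match.group(0))


def deidentify(text: str, masking_character: str = "#") -> str:
    """Detect email-like values and mask them, leaving other text unchanged."""
    return EMAIL_PATTERN.sub(lambda m: mask_match(m, masking_character), text)


# Usage: the detected address is replaced by masking characters of equal length.
masked = deidentify("Contact jo@example.com today")
```

Masking with a fixed character preserves the length and position of the original value, which can be useful when downstream systems expect a stable record layout.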
The following are examples of use cases for the architecture that's described in this document:
- Regulated industries (finance, healthcare, public sector): Heavily regulated industries require security frameworks that help achieve compliance and data security. To achieve compliance and protect sensitive customer data, it's crucial to ensure that data is secured at rest, that sensitive data is discovered, and that access controls are in place. We recommend that regulated industries implement this design to help achieve data compliance.
- Unregulated industries: Industries that aren't subject to data regulations can benefit from implementing a robust security framework. It's important to prevent sensitive data from being exposed, to store data securely, and to have policies in place that control access to data. The architecture in this document can help an unregulated industry to achieve the same level of security as a heavily regulated industry. We recommend that unregulated industries implement this design as a best practice.
Design considerations
- To analyze logged violations and prevent accidental lockouts of production workloads, always implement VPC Service Controls in dry-run mode first.
- To ensure that trusted corporate networks and administrative devices can interact with protected resources, define precise Access Context Manager policies and levels.
- To quickly and cost-effectively identify where sensitive data might reside, run Sensitive Data Protection with sampling enabled on large datasets. For more information, see the following resources:
- To help ensure complete coverage of those datasets, run a full scan after you inspect data with sampling. For more information, see Overview of sensitive data discovery.
- To help ensure automated encryption enforcement, before you create resources, enable Cloud KMS Autokey at the folder or project level.
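The value of starting in dry-run mode can be illustrated with a small simulation. This is a sketch under stated assumptions, not the VPC Service Controls API or its log format: the `evaluate` function, the tuple layout of each request, and the principal names are hypothetical. It models the behavior that dry-run mode records violations without denying the request, while enforced mode denies it.

```python
from collections import Counter


def evaluate(requests, allowed_principals, dry_run):
    """Return (served, violations) for (principal, inside_perimeter) requests.

    In dry-run mode, violations are recorded but the request is still served.
    In enforced mode, violating requests are denied.
    """
    served, violations = [], Counter()
    for principal, inside_perimeter in requests:
        violates = not inside_perimeter and principal not in allowed_principals
        if violates:
            violations[principal] += 1
        if dry_run or not violates:
            served.append(principal)
    return served, violations


# Usage: hypothetical production traffic, including a forgotten batch job.
traffic = [
    ("sa:etl", True),             # runs inside the perimeter
    ("user:ops", False),          # outside, but named in the access level
    ("sa:legacy-batch", False),   # outside and not named anywhere
    ("sa:legacy-batch", False),
]
allowed = {"user:ops"}

dry_served, dry_violations = evaluate(traffic, allowed, dry_run=True)
enforced_served, _ = evaluate(traffic, allowed, dry_run=False)
```

In this example, the dry run serves all four requests and reports two violations for `sa:legacy-batch`, so you can decide whether to add that workload to an access level before enforcement would deny it.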
Deployment
To deploy a sample implementation of this architecture, use the data-security code sample that's available on GitHub.
What's next
- To gain experience with Cloud KMS Autokey, see the Encrypt Resources Easily with Cloud KMS Autokey Codelab.
- Learn how to use VPC Service Controls with Access Context Manager.
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.
Contributors
Authors:
- Manish Gaur | Security Architect
- James Meyer | Security Architect
Other contributors:
- Osvaldo Costa | Networking Specialist Customer Engineer
- Susan Wu | Outbound Product Manager
- Mark Schlagenhauf | Technical Writer, Networking
- Biodun Awojobi | Head of Customer Engineering, Security and Compliance