Model Armor is a Google Cloud service designed to enhance the security and safety of your AI applications. It works by proactively screening LLM prompts and responses, protecting against various risks and ensuring responsible AI practices. Whether you are deploying AI in Google Cloud or other cloud providers, Model Armor can help you prevent malicious input, verify content safety, protect sensitive data, maintain compliance, and enforce your AI safety and security policies consistently across your AI applications.
Architecture
This architecture diagram shows an application using Model Armor to protect an LLM and a user. The following steps explain the data flow:
- A user provides a prompt to the application.
- Model Armor inspects the incoming prompt for potentially sensitive content.
- The prompt (or sanitized prompt) is sent to the LLM.
- The LLM generates a response.
- Model Armor inspects the generated response for potentially sensitive content.
- The response (or sanitized response) is sent to the user. Model Armor sends a detailed description of triggered and untriggered filters in the response.
Model Armor filters both input (prompts) and output (responses) to prevent the LLM from exposure to or generation of malicious or sensitive content.
Use cases
Model Armor has several use cases, which include the following:
Security
- Mitigate the risk of leaking sensitive intellectual property (IP) and personally identifiable information (PII) in LLM prompts or responses.
- Protect against prompt injection and jailbreak attacks, preventing malicious actors from manipulating AI systems to perform unintended actions.
- Scan text in PDFs for sensitive or malicious content.
Safety and responsible AI
- Prevent your chatbot from recommending competitor solutions, maintaining brand integrity and customer loyalty.
- Filter social media posts generated by AI applications that contain harmful messaging, such as dangerous or hateful content.
Model Armor templates
Model Armor templates let you configure how Model Armor screens prompts and responses. They function as sets of customized filters and thresholds for different safety and security confidence levels, allowing control over what content is flagged.
The thresholds represent confidence levels—how confident
Model Armor is that the prompt or response includes offending
content. For example, you can create a template that filters prompts for hateful
content with a HIGH threshold, meaning Model Armor reports high
confidence that the prompt contains hateful content. A LOW_AND_ABOVE threshold
indicates any level of confidence (LOW, MEDIUM, and HIGH) in making that
claim.
For more information, see Model Armor templates.
Model Armor confidence levels
You can set confidence levels for responsible AI safety categories (sexually explicit, dangerous, harassment, and hate speech), prompt injection and jailbreak detection, and sensitive data protection (including topicality).
For confidence levels that support granular thresholds, Model Armor interprets them as follows:
- High: Identify if the message has content with a high likelihood.
- Medium and above: Identify if the message has content with a medium or high likelihood.
- Low and above: Identify if the message has content with a low, medium, or high likelihood.
Model Armor filters
Model Armor offers a variety of filters to help you provide safe and secure AI models. The following filter categories are available.
Responsible AI safety filter
You can screen prompts and responses at the specified confidence levels for the following categories:
| Category | Definition |
|---|---|
| Hate Speech | Negative or harmful comments targeting identity and/or protected attributes. |
| Harassment | Threatening, intimidating, bullying, or abusive comments targeting another individual. |
| Sexually Explicit | Contains references to sexual acts or other lewd content. |
| Dangerous Content | Promotes or enables access to harmful goods, services, and activities. |
| CSAM | Contains references to child sexual abuse material (CSAM). This filter is applied by default and cannot be turned off. |
Prompt injection and jailbreak detection
Prompt injection is a security vulnerability where attackers craft special commands within the text input (the prompt) to trick an AI model. This can make the AI ignore its usual instructions, reveal sensitive information, or perform actions it wasn't designed to do. Jailbreaking in the context of LLMs refers to the act of bypassing the safety protocols and ethical guidelines that are built into the model. This lets the LLM generate responses that it was originally designed to avoid, such as harmful, unethical, and dangerous content.
When prompt injection and jailbreak detection is enabled, Model Armor scans prompts and responses for malicious content. If detected, Model Armor blocks the prompt or response.
Sensitive Data Protection
Sensitive Data Protection is a Google Cloud service that helps you discover, classify, and de-identify sensitive data. Sensitive Data Protection can identify sensitive elements, context, and documents to help you reduce the risk of data leakage going into and out of AI workloads. You can use Sensitive Data Protection directly within Model Armor to transform, tokenize, and redact sensitive elements while retaining non-sensitive context. Model Armor can accept existing inspection templates, which function as blueprints to streamline the process of scanning and identifying sensitive data specific to your business and compliance needs. This ensures consistency and interoperability between other workloads that use Sensitive Data Protection.
Model Armor offers two modes for Sensitive Data Protection configuration:
Basic configuration: In this mode, you configure Sensitive Data Protection by specifying the types of sensitive data to scan for. This mode supports the following categories:
- Credit card number
- US social security number (SSN)
- Financial account number
- US individual taxpayer identification number (ITIN)
- Google Cloud credentials
- Google Cloud API key
Basic configuration only supports inspection operations and doesn't support the use of Sensitive Data Protection templates. For more information, see Basic Sensitive Data Protection configuration.
Advanced configuration: This mode offers more flexibility and customization through Sensitive Data Protection templates. Sensitive Data Protection templates are predefined configurations that let you specify more granular detection rules and de-identification techniques. Advanced configuration supports both inspection and de-identification operations.
Confidence levels for Sensitive Data Protection operate differently than confidence levels for other filters. For more information about confidence levels for Sensitive Data Protection, see Sensitive Data Protection match likelihood. For more information about Sensitive Data Protection in general, see Sensitive Data Protection overview.
Malicious URL detection
Malicious URLs are often disguised to look legitimate, making them a potent tool for phishing attacks, malware distribution, and other online threats. For example, if a PDF contains an embedded malicious URL, it can be used to compromise any downstream systems processing LLM outputs.
When malicious URL detection is enabled, Model Armor scans URLs to identify whether they're malicious. This lets you take action and prevent malicious URLs from being returned.
Define the enforcement type
Enforcement defines what happens after a violation is detected. To configure how Model Armor handles detections, you set the enforcement type. Model Armor offers the following enforcement types:
- Inspect only: Model Armor inspects requests that violate the configured settings, but it doesn't block them.
- Inspect and block: Model Armor blocks requests that violate the configured settings.
For more information, see Define the enforcement type for templates and Define the enforcement type for floor settings.
To effectively use Inspect only and gain valuable insights, enable
Cloud Logging. Without Cloud Logging enabled, Inspect only won't yield
any useful information.
Access your logs through Cloud Logging. Filter by the service name
modelarmor.googleapis.com. Look for entries related to the operations that you
enabled in your template. For more information, see View logs by using the Logs
Explorer.
Model Armor floor settings
Although Model Armor templates provide flexibility for individual applications, organizations often need to establish a baseline level of protection across all their AI applications. You use Model Armor floor settings to establish this baseline. They define minimum requirements for all templates created at the project level in the Google Cloud resource hierarchy.
For more information, see Model Armor floor settings.
Language support
Model Armor filters support sanitizing prompts and responses across multiple languages.
- The Sensitive Data Protection filter supports English and other languages depending on the infoTypes that you selected.
The responsible AI and prompt injection and jailbreak detection filters are tested on the following languages:
- Chinese (Mandarin)
- English
- French
- German
- Italian
- Japanese
- Korean
- Portuguese
- Spanish
These filters can work in many other languages, but the quality of results might vary. For language codes, see Supported languages.
There are two ways to enable multi-language detection:
Enable on each request: For granular control, enable multi-language detection on a per-request basis when sanitizing a user prompt and sanitizing a model response.
Enable one-time: If you prefer a simpler setup, you can enable multi-language detection as a one-time configuration at the Model Armor template level using the REST API. For more information, see Create a Model Armor template.
Document screening
Text in documents can include malicious and sensitive content. Model Armor can screen the following types of documents for safety, prompt injection and jailbreak attempts, sensitive data, and malicious URLs:
- PDFs
- CSV
- Text files: TXT
- Microsoft Word documents: DOCX, DOCM, DOTX, DOTM
- Microsoft PowerPoint slides: PPTX, PPTM, POTX, POTM, POT
- Microsoft Excel sheets: XLSX, XLSM, XLTX, XLTM
Data handling and storage
Model Armor is designed with privacy and data minimization principles in mind. This section describes how Model Armor handles your data:
- Stateless processing and content disposal: Model Armor operates as a stateless service, processing all prompts and model responses entirely in memory. It does not log, store, or durably retain any content analyzed during its standard operation; all data is immediately discarded once the analysis is complete.
- Customer-controlled logging: The only circumstance under which data related to the content being processed is stored is through Cloud Logging. If you choose to enable Cloud Logging for the Model Armor service, event details—which might include metadata or snippets of the analyzed content as configured—are sent to your designated Cloud Logging destination. The scope of the data logged and its retention are determined by your Cloud Logging configuration.
- Secure storage and encryption: All data handled by Model Armor is protected by industry-standard encryption. This includes data in transit using TLS 1.2 and later and any data residing briefly in memory during analysis.
- Regional data residency: While Model Armor processing is
stateless, the service supports strict data residency controls. This makes
sure that all transient processing occurs exclusively within your defined
geographic boundaries, such as the
USorEU. - Selective processing: To ensure operational efficiency and regional compliance, Model Armor only transmits and processes data for active filters. If a specific filter is disabled (for example, due to regional availability or user preference), no data is sent to or processed by the underlying service associated with that filter.
- Global compliance standards: As part of the Google Cloud ecosystem, Model Armor benefits from a foundation of rigorous security. The infrastructure undergoes regular independent audits to maintain certifications including SOC 1/2/3 and ISO/IEC 27001.
In summary, Model Armor doesn't store the content of your AI interactions unless you explicitly configure and enable platform logging, giving you control over data retention.
Pricing
Model Armor can be purchased as an integrated part of Security Command Center or as a standalone service. For pricing information, see Security Command Center pricing.
Tokens
Generative AI models break down text and other data into units called tokens. Model Armor uses the total number of tokens in AI prompts and responses for pricing purposes. Model Armor limits the number of tokens processed in each prompt and response. For token limits, see token limits.
What's next
- Learn about Model Armor templates.
- Learn about Model Armor floor settings.
- Learn about Model Armor endpoints.
- Sanitize prompts and responses.
- Learn about Model Armor audit logging.
- Troubleshoot Model Armor issues.