Guardrails

Guardrails are a collection of checks that help keep agent applications safe and secure. They restrict content in both model input and model output. Each agent application comes with default guardrails, which you can modify to suit your needs. To access the guardrails for your agent application, click the guardrails button on the right side of the agent builder.

Prompt guard

Prompt guard provides basic protection against prompt-based attacks like "ignore your instructions and ...". The following settings are available:

  • Enable prompt guard: Enable or disable prompt guard.
  • Outcomes: Controls what happens when prompt guard is triggered. The following outcome controls are available:
    • Say exactly: Provide the exact agent response.
    • Handoff to an agent: Transition control to a specific agent.
    • Generate a response: Provide instructions to generate a response.
  • Custom: Provide a custom security prompt for screening queries.
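
The same three outcome controls appear on every guardrail type on this page. As a rough mental model only, they can be pictured as a small tagged value that is turned into the agent's next action. Every name below (OutcomeKind, Outcome, apply_outcome, and the returned dictionary keys) is hypothetical and chosen for illustration; none of it is the product's API.

```python
from dataclasses import dataclass
from enum import Enum


class OutcomeKind(Enum):
    """The three outcome controls described above (hypothetical names)."""
    SAY_EXACTLY = "say_exactly"
    HANDOFF = "handoff_to_agent"
    GENERATE = "generate_response"


@dataclass(frozen=True)
class Outcome:
    kind: OutcomeKind
    # Exact reply text, target agent name, or generation instructions,
    # depending on the kind.
    value: str


def apply_outcome(outcome: Outcome) -> dict:
    """Return an illustrative action record for a triggered guardrail."""
    if outcome.kind is OutcomeKind.SAY_EXACTLY:
        return {"action": "reply", "text": outcome.value}
    if outcome.kind is OutcomeKind.HANDOFF:
        return {"action": "transfer", "agent": outcome.value}
    # Remaining case: generate a response from the given instructions.
    return {"action": "generate", "instructions": outcome.value}
```

In this sketch, a "Say exactly" outcome carries the literal reply, a handoff carries the target agent's name, and a generated response carries the instructions that would be passed to the model.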

Blocklist

Blocklists prevent users and your agent from using certain words and phrases. When you create a list, the following settings are available:

  • How should your agent match blocked content?: Select the matching method:
    • Whole words: Matches whole words only.
    • Any mention: Matches any content that contains the blocked words or phrases.
    • Regex pattern: Matches content against regular expressions.
  • Block words and phrases: The list of blocked words and phrases, where each entry is separated with a comma.
  • Blocked content from: Block content from one or both of the following sources:
    • On user input
    • On agent response
  • Outcome: Controls what happens when a blocklist is triggered. The following outcome controls are available:
    • Say exactly: Provide the exact agent response.
    • Handoff to an agent: Transition control to a specific agent.
    • Generate a response: Provide instructions to generate a response.
  • Details: Provide an optional name and description for this blocklist.
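
To make the difference between the three matching methods concrete, here is a minimal sketch in Python. It is not the product's implementation: the function names, the method identifiers ("whole_words", "any_mention", "regex"), and the choice of case-insensitive matching for the first two methods are all assumptions made for illustration.

```python
import re


def parse_entries(raw: str) -> list[str]:
    """Split a comma-separated blocklist into trimmed, non-empty entries."""
    return [entry.strip() for entry in raw.split(",") if entry.strip()]


def build_pattern(entry: str, method: str) -> re.Pattern:
    """Compile one blocklist entry according to the chosen matching method."""
    if method == "whole_words":
        # The entry must not be preceded or followed by a word character.
        return re.compile(r"(?<!\w)" + re.escape(entry) + r"(?!\w)", re.IGNORECASE)
    if method == "any_mention":
        # The entry may appear anywhere, including inside a longer word.
        return re.compile(re.escape(entry), re.IGNORECASE)
    if method == "regex":
        # The entry is used as a regular expression as written.
        return re.compile(entry)
    raise ValueError(f"unknown matching method: {method}")


def is_blocked(text: str, entries: list[str], method: str) -> bool:
    """Return True if any entry matches the text under the given method."""
    return any(build_pattern(entry, method).search(text) for entry in entries)
```

With the entry "ass", the text "I love classic cars" is blocked under "any_mention" (the entry occurs inside "classic") but not under "whole_words". This is the usual reason to prefer whole-word matching for short entries.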

Safety

Safety guardrails enforce Responsible AI practices. The following settings are available:

  • Safety level: Select the level of safety:
    • Relaxed: Prioritize flexible generation and low latency. Never engage with illicit or harmful prompts. Always block explicitly harmful content.
    • Balanced: Prioritize safe and natural interactions with customers. Always stop unsafe content. Never engage with harmful prompts.
    • Strict: Prioritize deep harm filtering. Never allow generated content with sensitive elements. Always push back against harmful prompts.
  • Outcomes: Controls what happens when a safety guardrail is triggered. The following outcome controls are available:
    • Say exactly: Provide the exact agent response.
    • Handoff to an agent: Transition control to a specific agent.
    • Generate a response: Provide instructions to generate a response.
  • Custom: Individually adjust the safety level of specific safety guardrails, or disable them.

Rules

Build your own guardrails using rules. The following settings are available:

  • Behavior: Select one of the following to define your rule:
    • Natural: Provide natural language instructions.
    • Code: Provide the code for an after_model_callback callback.
  • Outcomes: Controls what happens when a rule is triggered. The following outcome controls are available:
    • Say exactly: Provide the exact agent response.
    • Handoff to an agent: Transition control to a specific agent.
    • Generate a response: Provide instructions to generate a response.
  • Details: Provide an optional name and description for this rule.
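
As an illustration of the Code behavior, the sketch below shows what an after_model_callback might look like for a rule that redacts email addresses from model output. Only the callback name comes from the text above. The parameter list, the stand-in types ModelResponse and CallbackContext, and the convention that returning None keeps the original response while returning a new response replaces it are assumptions; check the platform's callback reference for the actual contract.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelResponse:
    """Hypothetical stand-in for the platform's model response type."""
    text: str


@dataclass
class CallbackContext:
    """Hypothetical stand-in for the platform's callback context type."""
    agent_name: str


# Deliberately simple pattern for things that look like email addresses.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def after_model_callback(
    callback_context: CallbackContext, llm_response: ModelResponse
) -> Optional[ModelResponse]:
    """Redact email-like strings from the model's response.

    Assumed convention: return None to leave the response unchanged,
    or return a new response to replace it.
    """
    redacted = EMAIL_PATTERN.sub("[redacted]", llm_response.text)
    if redacted == llm_response.text:
        return None  # Nothing matched, so the rule is not triggered.
    return ModelResponse(text=redacted)
```

A natural-language rule could express the same intent ("never reveal email addresses"), while a code rule makes the check deterministic and easy to test.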