Design patterns

This guide provides design patterns to handle common scenarios for agent applications.

Dynamic prompts

You can build agents that send dynamic prompts to the model by combining variables, instructions that reference those variables as placeholders, and callbacks or tools that update the variable values at runtime.

For example, you can alter the agent instructions based on whether the user is a lawyer or a pirate:

Variables:

Variable name         Default value
current_instructions  You are Gemini and you work for Google.
lawyer_instructions   You are a lawyer and your job is to tell dad joke style jokes but with a lawyer edge.
pirate_instructions   You are a pirate and your job is to tell a joke as a pirate.
username              Unknown

Instructions:

The current user is: {username}
You can use the 'update_username' tool to update the user's name if they provide
it.

Follow the current instruction set below exactly.

{current_instructions}
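
The `{variable}` placeholders in the instructions above are filled in from the variables table at prompt time. The platform performs this substitution for you; as an illustrative sketch of what it amounts to, using Python's `str.format_map` and the default values from the table:

```python
# Illustrative stand-in for the agent's variable store, seeded with the
# default values from the variables table above.
variables = {
    "username": "Unknown",
    "current_instructions": "You are Gemini and you work for Google.",
}

INSTRUCTIONS = (
    "The current user is: {username}\n"
    "Follow the current instruction set below exactly.\n\n"
    "{current_instructions}"
)

# Substitute each {placeholder} with the variable's current value.
rendered = INSTRUCTIONS.format_map(variables)
```

When a tool or callback updates a variable, the next prompt is rendered with the new value, which is what makes the instructions dynamic.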

Python tool:

from typing import Optional

def update_username(username: str) -> Optional[str]:
  """Updates the current user's name."""
  # `context` is provided by the runtime and exposes the agent's variables.
  context.variables["username"] = username

Callback:

def before_model_callback(
  callback_context: CallbackContext,
  llm_request: LlmRequest
) -> Optional[LlmResponse]:
  username = callback_context.variables['username']

  if username == "Jenn":
    callback_context.variables['current_instructions'] = (
      callback_context.variables['pirate_instructions']
    )
  elif username == "Gary":
    callback_context.variables['current_instructions'] = (
      callback_context.variables['lawyer_instructions']
    )
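
You can exercise the routing logic outside the platform by driving the callback with a minimal stand-in for the context object. The stub class below is illustrative (the real `CallbackContext` is provided by the runtime); only the `variables` dict matters here:

```python
from typing import Optional


class StubCallbackContext:
    """Illustrative stand-in for the runtime's CallbackContext."""

    def __init__(self, variables: dict):
        self.variables = variables


def before_model_callback(callback_context, llm_request=None) -> Optional[object]:
    # Same routing logic as the callback above: pick the instruction
    # set based on the current username.
    username = callback_context.variables["username"]
    if username == "Jenn":
        callback_context.variables["current_instructions"] = (
            callback_context.variables["pirate_instructions"]
        )
    elif username == "Gary":
        callback_context.variables["current_instructions"] = (
            callback_context.variables["lawyer_instructions"]
        )


ctx = StubCallbackContext({
    "username": "Jenn",
    "current_instructions": "You are Gemini and you work for Google.",
    "pirate_instructions": "You are a pirate and your job is to tell a joke as a pirate.",
    "lawyer_instructions": (
        "You are a lawyer and your job is to tell dad joke style jokes "
        "but with a lawyer edge."
    ),
})
before_model_callback(ctx)
# ctx.variables["current_instructions"] now holds the pirate instructions.
```

Because the callback runs before every model request, the instruction swap takes effect on the very next turn after `username` changes.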

Always read a disclaimer as the first message

Some agent applications may need to read a disclaimer at the start of the conversation. You can add this requirement to the instructions, but instructions alone don't guarantee that the model includes it. A more reliable approach is an after model callback that checks the model response for the disclaimer and prepends it when it is missing.

Example variables:

Variable name  Default value
first_turn     True

Example callback:

def get_all_text(contents: List[Content]) -> List[str]:
  """Collate all text from the most recent set of contents."""
  all_text = []
  for content in contents:
    for part in content.parts:
      all_text.append(part.text)

  return all_text

def get_model_text(contents: List[Content]) -> str:
  """Get the most recent response from the model."""
  all_text = get_all_text(contents)

  return all_text[-1]

def respond(text: str) -> LlmResponse:
  """Help method to format the LlmResponse class."""
  return LlmResponse(content=Content(parts=[Part(text=text)], role="model"))

DISCLAIMER = "THIS CONVERSATION MAY BE RECORDED FOR LEGAL PURPOSES."

def after_model_callback(
  callback_context: CallbackContext,
  llm_response: LlmResponse
) -> Optional[LlmResponse]:

  if callback_context.variables["first_turn"]:
    model_text = get_model_text(llm_response.contents)

    # If we have a response, check for the existence of the
    # DISCLAIMER in the model response. If it doesn't exist,
    # force the disclaimer to be read before the model text.
    if model_text and DISCLAIMER not in model_text:
      callback_context.variables["first_turn"] = False
      return respond(f"{DISCLAIMER}\n\n{model_text}")

    # The model accurately included the disclaimer already,
    # so we can reset the first turn variable and let the response
    # pass through.
    else:
      callback_context.variables["first_turn"] = False
      return llm_response

  # This wasn't the first turn, so let llm_response pass through
  else:
    return llm_response
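
Stripped of the LlmResponse plumbing, the callback reduces to a simple rule: on the first turn, prepend the disclaimer unless the model already said it, and clear the flag either way. A self-contained sketch of just that decision logic:

```python
DISCLAIMER = "THIS CONVERSATION MAY BE RECORDED FOR LEGAL PURPOSES."


def enforce_disclaimer(variables: dict, model_text: str) -> str:
    """Return the text that should reach the user for this turn."""
    if not variables["first_turn"]:
        return model_text  # later turns pass through untouched

    variables["first_turn"] = False  # the first turn happens exactly once
    if DISCLAIMER in model_text:
        return model_text  # the model already included the disclaimer

    return f"{DISCLAIMER}\n\n{model_text}"


variables = {"first_turn": True}
first = enforce_disclaimer(variables, "Hello! How can I help?")
second = enforce_disclaimer(variables, "Sure, here are the details.")
```

Note that the flag is cleared on both branches, so the disclaimer check runs exactly once per conversation regardless of whether the model cooperated.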

Custom response for no-input

When an agent times out waiting for input (see Wait for input in agent application settings), a generative response is used by default. However, you can detect in a before model callback whether input was received from the user and conditionally provide a static response.

Example callback:

def get_all_text(contents: List[Content]) -> List[str]:
  """Collate all text from the most recent set of contents."""
  all_text = []
  for content in contents:
    for part in content.parts:
      all_text.append(part.text)

  return all_text

def get_last_utterance(contents: List[Content]) -> str:
  """Extract the last user utterance from the list of texts."""
  all_text = get_all_text(contents)

  return all_text[-1]

def get_user_activity_signal(contents: List[Content]) -> bool:
  """Check the second to last message for the no-input signal.

  The current format of the no-input signal comes across three text messages
  like so:
  <context>
  no user activity detected in last X seconds
  </context>

  This method grabs the recent list of texts from the contents and checks for
  the existence of the 'no user activity detected' statement.
  """
  all_text = get_all_text(contents)
  if "no user activity detected" in all_text[-2]:
    return True

  return False

def respond(text: str) -> LlmResponse:
  """Help method to format the LlmResponse class."""
  return LlmResponse(content=Content(parts=[Part(text=text)], role="model"))

def before_model_callback(
  callback_context: CallbackContext,
  llm_request: LlmRequest
) -> Optional[LlmResponse]:
  """Checks for user inactivity signal and last user input text."""
  user_inactive = get_user_activity_signal(llm_request.contents)
  user_text = get_last_utterance(llm_request.contents)

  if user_inactive:
    return respond("THIS IS A STATIC MESSAGE")

  if "apple" in user_text.lower():
    return respond("NO APPLES HERE MATEY!")

Displaying Markdown and HTML

If your conversational interface supports Markdown and HTML for agent responses, you can use the simulator to test these responses, because the simulator also supports Markdown and HTML.

Example instructions:

<role>
    You are a "Markdown Display Assistant," an AI agent designed to demonstrate
    various rich content formatting options like images, videos, and deep links
    using HTML-style markdown. Your purpose is to generate and display this
    content directly to the user based on their requests.
</role>
<persona>
    Your primary goal is to showcase the rich content rendering capabilities of
    the platform by generating HTML markdown for elements like images, videos,
    and hyperlinks. You are a helpful and direct assistant. When asked to show
    something, you generate the markdown for it and present it.
    You should not engage in conversations outside the scope of generating and
    displaying markdown. If the user asks for something unrelated, politely
    state that you can only help with displaying rich content. Adhere strictly
    to the defined constraints and task flow.
</persona>
<constraints>
    1.  **Scope Limitation:** Only handle requests related to displaying
        markdown content (images, videos, links, etc.). Do not answer general
        knowledge questions or perform other tasks.
    2.  **Tool Interaction Protocol:** You must use the `display_markdown`
        tool to generate the formatted content string.
    3.  **Direct Output:** Your final response to the user must be the raw
        markdown string returned by the `display_markdown` tool. Do not add
        any conversational text around it unless the tool returns an error.
        For example, if the tool returns `"<img src='...'>"`, your response
        should be exactly `"<img src='...'>"`.
    4.  **Clarity and Defaults:** If a user's request is vague (e.g., "show me
        an image"), use the tool's default values to generate a response. There
        is no need to ask for clarification.
    5.  **Error Handling:** If the tool call fails or returns an error, inform
        the user about the issue in a conversational manner.
</constraints>
<taskflow>
    These define the conversational subtasks that you can take. Each subtask
    has a sequence of steps that should be taken in order.
    <subtask name="Generate and Display Markdown">
        <step name="Parse Request and Call Tool">
            <trigger>
                User initiates a request to see any form of rich content (image,
                video, link, etc.).
            </trigger>
            <action>
                1.  Identify the types of content the user wants to see (e.g.,
                    image, video, deep link).
                2.  Call the `display_markdown` tool. Set the corresponding
                    boolean arguments to `True` based on the user's request.
                    For example, if the user asks for a video and a link, call
                    `display_markdown(show_video=True, show_deep_link=True)`.
                3.  If the user makes a general request like "show me something
                    cool", you can enable all flags.
            </action>
        </step>
        <step name="Output Tool Response">
            <trigger>
                The `display_markdown` tool returns a successful response
                containing a `markdown_string`.
            </trigger>
            <action>
                1.  Extract the value of the `markdown_string` key from the
                    tool's output.
                2.  Use this value as your direct and final response to the
                    user, without any additional text or formatting.
            </action>
        </step>
    </subtask>
</taskflow>

Example python tool:

from typing import Any

def display_markdown(show_image: bool, show_video: bool, show_deep_link: bool) -> dict[str, Any]:
    """
    Constructs a markdown string containing HTML for various rich media elements.

    This function generates an HTML-formatted string based on the boolean flags provided.
    It can include an image, a video, and a hyperlink (deep link). The content for
    these elements is pre-defined.

    Args:
        show_image (bool): If True, an <img> tag will be included in the output.
        show_video (bool): If True, a <video> tag will be included in the output.
        show_deep_link (bool): If True, an <a> tag will be included in the output.

    Returns:
        dict[str, Any]: A dictionary with a single key 'markdown_string' containing the
                        generated HTML markdown. If no flags are set, it returns a
                        message indicating nothing was requested.
    """
    # MOCK: This is a mock implementation. It does not fetch any dynamic content.
    # It assembles a markdown string from hardcoded HTML snippets to demonstrate
    # the agent's ability to render rich content.

    markdown_parts = []

    if show_image:
        image_html = "This is a sample image:\n<img src='https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png' alt='Google Logo' width='272' height='92' />"
        markdown_parts.append(image_html)

    if show_video:
        video_html = "This is a sample video:\n<video controls width='320' height='240'><source src='https://www.w3schools.com/html/mov_bbb.mp4' type='video/mp4'>Sorry, your browser does not support embedded videos.</video>"
        markdown_parts.append(video_html)

    if show_deep_link:
        link_html = "This is a sample deep link:\n<a href='https://www.google.com'>Click here to go to Google</a>"
        markdown_parts.append(link_html)

    if not markdown_parts:
        return {"markdown_string": "You did not request any content to be displayed. Please specify if you want to see an image, video, or link."}

    return {"markdown_string": "\n\n".join(markdown_parts)}