Best practices

This guide provides overall best practices for designing reliable agent applications.

Start simple

When you first build your agent application, begin with simple use cases. Once those are working, build toward more complicated use cases.

Instructions should be specific

Agent instructions should be specific and unambiguous. Organize them well and group them by topic; avoid scattering instructions for a given topic throughout the document. Instructions should be easy for a human to follow as well.

Use structured instructions

Once you have finished writing your instructions, use the restructure instructions feature to format them. Your agent follows structured instructions more reliably.

Use evaluations

Evaluations help keep your agents reliable. Use them to set expectations for your agents and for the APIs your agents call.
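As an illustration, an evaluation case can pair a user input with the tool call you expect the agent to make. The following is a hypothetical sketch; the structure of the predicted and expected calls is an assumption, not a real evaluation API:

```python
# Hypothetical sketch of an evaluation check. The dict structure of the
# predicted and expected calls is an assumption, not a real API.
def tool_call_matches(predicted: dict, expected: dict) -> bool:
    """Pass the eval only when the tool name and arguments both match."""
    return (predicted["tool"] == expected["tool"]
            and predicted["args"] == expected["args"])
```

Running checks like this against a fixed set of conversations lets you catch regressions when instructions or tools change.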

Wrap APIs with Python tools

External API schemas may define many input and output parameters that are not relevant to your agent. If you use OpenAPI tools for cases like this, you might be providing unnecessary context to the model, which can reduce reliability.

It is a best practice to use Python tools to wrap API calls. This gives you complete control over the input and output parameters defined by the tool, which are shared with the model.

For example:

def python_wrapper(arg_a: str, arg_b: str) -> dict:
  """Call the scheduling service to schedule an appointment."""
  # Only the parameters relevant to the agent are exposed to the model;
  # the external API's full schema stays hidden inside this wrapper.
  res = complicated_external_api_call(...)
  return res.json()

Use tools and callbacks for deterministic behavior

In certain conversational scenarios, you may require more deterministic behavior from your agent application. In these cases, you should use tools or callbacks.

Callbacks are usually the best option for full deterministic control. Callbacks run outside the purview of the agent, so the agent is not involved in their execution.

The internals of a tool are fully deterministic, but a tool call orchestrated by an agent is not. The agent decides when to call a tool, prepares the tool's input arguments, and interprets the tool's results. It's possible for the agent to hallucinate any part of this orchestration.

Chaining tool calls

Similar to wrapping API calls with tools, if multiple tools need to be executed during a conversational turn, you should instruct the agent to call one tool and implement that tool to call the others. Alternatively, you could instruct the agent to call the first tool and define an after_tool_callback callback to call the remaining tools.
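The callback variant can be sketched as follows. Here tool_2 and tool_3 are placeholder functions, and the callback signature is an assumption rather than a specific framework's API:

```python
# Placeholder tools standing in for real API calls.
def tool_2(arg_c: str) -> dict:
    return {"arg_d": arg_c + "-d"}

def tool_3(arg_d: str) -> dict:
    return {"result": arg_d + "-e"}

# Hypothetical after-tool callback: once the agent calls tool_1, the
# remaining tools are chained here deterministically, so the agent only
# ever sees the final result.
def after_tool_callback(tool_name: str, tool_response: dict) -> dict:
    if tool_name == "tool_1":
        res2 = tool_2(tool_response["arg_c"])
        return tool_3(res2["arg_d"])
    return tool_response
```

Either way, the model predicts one tool call and the rest of the chain is plain code.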

Bad pattern for chaining tool calls

It is considered a bad pattern to instruct the agent to call multiple tools during a conversational turn in order to accomplish a common goal.

The model has to predict every tool call and every parameter in that tool call. It also has to predict the tool calls in the correct order. This means you are relying heavily on the model (which is inherently non-deterministic) to perform a deterministic task.

For example, consider the following tool sequence:

  • tool_1(arg_a, arg_b) -> output c
  • tool_2(arg_c) -> output d
  • tool_3(arg_d) -> output e

If you define instructions for these three tool calls, you end up with a runtime sequence of events like the following:

  • User Input
  • Model -> Agent predicts tool_1(arg_a, arg_b)
  • tool_1_response.json() is returned
  • Agent interprets tool_1_response.json() and extracts arg_c
  • Model -> Agent predicts tool_2(arg_c)
  • tool_2_response.json() is returned
  • Agent interprets tool_2_response.json() and extracts arg_d
  • Model -> Agent predicts tool_3(arg_d)
  • tool_3_response.json() is returned
  • Model -> Agent provides final response

There are 4 model calls, 3 tool predictions, and 4 input arguments.

Good pattern for chaining tool calls

When you need to call multiple tools, it is considered a good pattern to instruct the agent to call a single tool, and to implement that tool to call the others.

The following tool calls three other tools:

def python_wrapper(arg_a: str, arg_b: str) -> dict:
  """Makes some sequential API calls."""
  # Chain the three tools inside one deterministic function, so the
  # model only has to predict a single tool call.
  res1 = tools.tool_1(arg_a, arg_b)
  res2 = tools.tool_2(res1.json())
  res3 = tools.tool_3(res2.json())

  return res3.json()

Consider the sequence of events for a single tool call:

  • User Input
  • Model -> Agent predicts python_wrapper(arg_a, arg_b)
  • python_wrapper_response.json() is returned
  • Model -> Agent provides final response

This approach reduces token usage and the probability of hallucination.

Define a development process for agent collaboration

When collaborating with a team on agent application development, you should define a development process. The following are examples of possible collaboration practices:

  • Use third-party version control: Use import and restore to synchronize changes with your third-party version control system. Agree on the process for synchronizing, reviewing, and merging. Define clear owners and clear steps to accept changes (for example, having evaluation results).
  • Use built-in version control: Set up a process to use the built-in version control. Agree on how to use snapshots for versioning. For example, you could require a snapshot when a milestone is reached (a set of evals passes), or before new feature development begins. Agree on a process for synchronizing, reviewing, and merging changes.

Clear and distinct tool definitions

For tool definitions, the following best practices should be applied:

  • Different tools shouldn't have similar names. Make your tool names noticeably distinct from one another.
  • Tools that are not used should be removed from the agent node.
  • For parameter names, use snake case, use descriptive names, and avoid uncommon abbreviations.

    Good examples: first_name, phone_number, url.

    Bad examples: i, arg1, fn, pnum, rqst.

  • Parameters should use flattened structures rather than nested structures. The more nested a structure is, the more you are relying on the model to predict key/value pairs and their proper typing.
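To illustrate the last point, compare a nested signature with a flattened one. The function and parameter names are hypothetical:

```python
# Harder for the model: one untyped nested dict parameter. The model
# must predict the keys, the nesting, and the value types.
def schedule_appointment_nested(appointment: dict) -> str:
    customer = appointment["customer"]
    return f'{customer["first_name"]}: {appointment["time_slot"]}'

# Easier for the model: flat, descriptively named, typed parameters.
def schedule_appointment_flat(first_name: str, phone_number: str,
                              time_slot: str) -> str:
    return f"{first_name} ({phone_number}): {time_slot}"
```

With the flat signature, each parameter is a simple top-level value the model can predict independently.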

Perform end-to-end testing

Your agent application development process should include end-to-end testing to verify integrations with external systems.