This guide provides instructions and best practices for engineers building food ordering experiences with the FoodOrderingService.BidiProcessOrder RPC method. This real-time, bidirectional streaming API is the core of the Food Ordering AI Agent, enabling dynamic, conversational order-taking in various applications such as mobile apps, voice assistants, drive-thrus, and kiosks.
Overview of BidiProcessOrder
The BidiProcessOrder method establishes a persistent, two-way communication channel between your client application and the Food Ordering AI Agent. Unlike standard unary request-response RPCs, this streaming approach allows for:
- Low-latency interaction: Continuous exchange of information without the overhead of repeated HTTP requests.
- Multimodal input: Handling of audio streams (for voice ordering), text inputs, and client-side events.
- Real-time responses: The agent can send back audio, text, order updates, and other signals as the conversation unfolds.
BidiProcessOrder cannot be invoked using REST. Integrations must use a connection-oriented protocol:
- gRPC (Recommended): Provides a robust and efficient framework for bidirectional streaming.
- WebSocket: Suitable for clients or environments where gRPC isn't a fit due to programming language or network constraints.
Refer to the BidiProcessOrder API Reference for detailed type definitions. WebSocket integrations use JSON representations of these types, as described in the WebSocket section.
Prerequisites
Before integrating with BidiProcessOrder:
- Enable the API: Ensure the Food Ordering AI Agent API is enabled in your Google Cloud project.
    gcloud services enable foodorderingaiagent.googleapis.com --project=PROJECT_ID
- Authentication: Decide on your authentication approach and set up any necessary service accounts and IAM roles, as described in Authentication.
- Menu Ingestion: A valid Menu must be ingested and associated with a Store. See Integrating Menu Data for details.
Authentication
To securely connect to the BidiProcessOrder RPC, your application must authenticate using a Google Cloud Service Account.
1. Configure a Service Account
- Create a Service Account: In your Google Cloud project, create a Service Account that your application will use to authenticate to the Food Ordering AI Agent API. See Creating and managing service accounts.
- Grant IAM Roles: Grant the necessary IAM roles to this service account. The primary role required to call BidiProcessOrder is:
  - Food Ordering Agent User (roles/foodorderingaiagent.agentUser): Allows the service account to connect to the ordering service and process sessions.
  You can grant this role using the Google Cloud console or gcloud:
      gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
        --role="roles/foodorderingaiagent.agentUser"
2. Application Authentication Flow
The exact authentication flow depends on your application architecture, especially whether the client application (e.g., mobile app, kiosk software) connects directly or through your own backend.
Common Scenario: Authenticating a consumer-facing client application
This is a typical pattern for mobile or web applications:
- Client-to-YourAuth: The end-user client app (mobile, web) authenticates with your existing user authentication system (this could be Firebase Authentication, your own OAuth server, etc.).
- Token Exchange: The client app, after authenticating the user, requests a short-lived token from a secure backend service you control (e.g., an "API Token Service").
- Access Token Generation: Your backend service, using the credentials of the Google Cloud Service Account principal configured in Step 1, generates a standard OAuth 2.0 access token for the https://www.googleapis.com/auth/cloud-platform scope. This can be done using the Google Cloud Authentication client libraries.
  - Security: Service account keys or credentials used to generate these tokens must be securely stored and managed on your backend. Never expose service account private keys directly to end-user client applications. See Best practices for managing service account keys.
- Token to Client: Your backend service returns the generated Google access token to the client app.
- API Call: The client app uses this Google access token to authenticate its gRPC or WebSocket connection to the BidiProcessOrder RPC.
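As a rough illustration of the Access Token Generation step, the sketch below shows how a backend token service might mint a token and reuse it until shortly before expiry. It assumes the google-auth package is installed and that Application Default Credentials resolve to the service account from Step 1. `TokenCache`, `fetch_google_token`, and the 300-second refresh margin are hypothetical names and values chosen for this example, not part of the API.

```python
import time
from datetime import timezone
from typing import Callable, Optional, Tuple

# Scope named in this guide.
SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def fetch_google_token() -> Tuple[str, float]:
    """Mint an access token; returns (token, expiry as a Unix timestamp).

    Assumption: google-auth is installed and Application Default Credentials
    point at the service account configured in Step 1.
    """
    import google.auth
    from google.auth.transport.requests import Request

    credentials, _ = google.auth.default(scopes=[SCOPE])
    credentials.refresh(Request())
    # google-auth reports expiry as a naive UTC datetime; it is assumed to be
    # populated after a successful refresh.
    expires_at = credentials.expiry.replace(tzinfo=timezone.utc).timestamp()
    return credentials.token, expires_at


class TokenCache:
    """Hypothetical helper: reuse a token until it is close to expiry."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]] = fetch_google_token,
        refresh_margin_s: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin_s
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Fetch a new token on first use or once inside the refresh margin.
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self._token, self._expires_at = self._fetch()
        return self._token
```

Your token endpoint would call `TokenCache().get()` only after it has verified the end user's own session, and return the result to the client over TLS.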
3. Using the Token
- gRPC: The Google gRPC client libraries typically handle token refreshing and inclusion in the call metadata when provided with service account credentials.
- WebSocket (Non-Browser): Include the token in the Authorization: Bearer TOKEN header.
- WebSocket (Browser): The browser WebSocket API does not allow setting an Authorization header, so direct browser connections cannot supply the header that the WebSocket section requires. A server-side streaming proxy is needed to authenticate your client's connection to Google Cloud.
Connecting to the API
You can establish a stream using gRPC client libraries or a WebSocket connection.
gRPC
Using gRPC is the recommended approach. You'll use the client libraries for your language of choice (e.g., Java, Go, Python, Node.js) which are based on the BidiProcessOrder API Reference.
The basic steps involve:
- Create a gRPC channel to the Food Ordering AI Agent API endpoint (e.g., foodorderingaiagent.googleapis.com).
- Obtain a client stub for FoodOrderingService.
- Invoke the BidiProcessOrder method, which returns a stream object for both sending requests and receiving responses.
- Implement business logic according to your use case which concurrently:
  - Sends audio, text, and event input from the end user.
  - Handles messages from the agent including audio, text, and events.
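The concurrent send and receive step can be sketched independently of any particular client library. The example below uses asyncio and stands in for the real stream with two hypothetical hooks: `send`, a coroutine that writes one request, and `responses`, an async iterator of agent messages. With a generated gRPC stub or a WebSocket library you would supply those hooks yourself; their names and shapes are assumptions made for this sketch.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable


async def run_session(
    outgoing: "asyncio.Queue[Any]",
    send: Callable[[Any], Awaitable[None]],
    responses: AsyncIterator[Any],
    on_response: Callable[[Any], None],
) -> None:
    """Send queued client inputs while handling agent responses concurrently."""

    async def sender() -> None:
        while True:
            item = await outgoing.get()
            if item is None:  # sentinel: the client has no more input
                return
            await send(item)

    async def receiver() -> None:
        async for response in responses:
            on_response(response)

    send_task = asyncio.create_task(sender())
    try:
        # Returns once the server side of the stream ends.
        await receiver()
    finally:
        # Stop sending when the stream is over, whatever the reason.
        send_task.cancel()
        await asyncio.gather(send_task, return_exceptions=True)
```

Putting inputs on a queue keeps audio capture decoupled from the network: the capture code only enqueues chunks, and a slow handler on the receive side never blocks sending.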
WebSocket
For WebSocket connections, the URL path is:
wss://foodorderingaiagent.googleapis.com/ws/google.cloud.foodorderingaiagent.v1beta.FoodOrderingService/BidiProcessOrder/locations/LOCATION
- LOCATION: e.g., us
Required Headers:
- Authorization: Bearer TOKEN, where TOKEN is an OAuth 2.0 access token obtained for your service account.
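A minimal sketch of assembling the connection URL and the required header is shown below. The helper name `websocket_target` and its validation rule are illustrative assumptions. How the headers are passed to the connect call depends on the WebSocket library you use.

```python
from typing import Dict, Tuple

# URL path from this guide, without the trailing /locations/LOCATION segment.
WS_BASE = (
    "wss://foodorderingaiagent.googleapis.com/ws/"
    "google.cloud.foodorderingaiagent.v1beta.FoodOrderingService/BidiProcessOrder"
)


def websocket_target(location: str, token: str) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a non-browser WebSocket client."""
    if not location or "/" in location:
        raise ValueError("location must be a single path segment such as 'us'")
    url = f"{WS_BASE}/locations/{location}"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers
```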
Message Format:
- Client to Server: Messages sent to the API (e.g., Config, AudioInput, TextInput, EventInput) must be JSON representations of the BidiProcessOrderRequest proto, sent as websocket.TextMessage.
- Server to Client: Messages received from the API (BidiProcessOrderResponse) will be sent as websocket.BinaryMessage, but the content of these binary messages is a JSON payload.
- Binary Data: Binary data within the JSON payloads (e.g., customerAudio in AudioInput, agentAudio in AgentAudio) must be base64 encoded.
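The message format rules above might be implemented along these lines. The inner field names `customerAudio` and `agentAudio` come from this guide. The outer keys `audioInput` and `agentAudio`, which select the member of the request and response, are assumptions based on the usual lowerCamelCase proto-to-JSON mapping, so confirm them against the API reference.

```python
import base64
import json
from typing import Any, Dict, Union


def encode_audio_input(pcm_chunk: bytes) -> str:
    """Build the JSON text frame for one AudioInput request."""
    payload = {
        # Assumed outer key for the AudioInput member of the request.
        "audioInput": {
            "customerAudio": base64.b64encode(pcm_chunk).decode("ascii"),
        }
    }
    return json.dumps(payload)


def decode_response(frame: Union[bytes, str]) -> Dict[str, Any]:
    """Parse a server frame; binary frames still carry a JSON payload."""
    text = frame.decode("utf-8") if isinstance(frame, (bytes, bytearray)) else frame
    message = json.loads(text)
    # Assumed outer key for the AgentAudio member of the response.
    agent_audio = message.get("agentAudio")
    if isinstance(agent_audio, dict) and "agentAudio" in agent_audio:
        # Turn the base64 string back into raw audio bytes for playback.
        agent_audio["agentAudio"] = base64.b64decode(agent_audio["agentAudio"])
    return message
```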
Session Lifecycle
Each call to BidiProcessOrder initiates a session. The session remains active as long as the stream is open.
Initiation (Config Message):
- Upon establishing the connection, the first message sent by the client must be a BidiProcessOrderRequest containing the Config message.
- Required Fields in Config:
  - session: A unique client-generated session identifier. Format: projects/PROJECT/locations/LOCATION/sessions/SESSION_ID.
  - store: The resource name of the Store. Format: projects/PROJECT/locations/LOCATION/brands/BRAND/stores/STORE.
- The agent uses the store to load the appropriate menu and configuration.
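As an illustration, the first message for a WebSocket client could be assembled as follows. The `session` and `store` fields and their resource name formats are taken from this guide. The outer `config` key, the helper names, and the use of a random UUID for the session ID are assumptions made for this sketch.

```python
import json
import uuid
from typing import Optional


def session_name(project: str, location: str, session_id: Optional[str] = None) -> str:
    """Build a session resource name, generating a unique ID if none is given."""
    session_id = session_id or uuid.uuid4().hex
    return f"projects/{project}/locations/{location}/sessions/{session_id}"


def store_name(project: str, location: str, brand: str, store: str) -> str:
    """Build the Store resource name in the format this guide describes."""
    return f"projects/{project}/locations/{location}/brands/{brand}/stores/{store}"


def config_request(session: str, store: str) -> str:
    """JSON text for the initial BidiProcessOrderRequest."""
    # "config" is an assumed key for the Config member of the request.
    return json.dumps({"config": {"session": session, "store": store}})
```

Keep the generated session name for the lifetime of the order so that the same value can be sent again if the connection has to be re-established.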
Sending Inputs:
- After the initial Config, the client can send a stream of BidiProcessOrderRequest messages containing one of the following inputs:
  - AudioInput: Raw audio data (typically 16-bit linear PCM at 16000 Hz, no headers). Used for voice interactions.
  - TextInput: Text messages from the user.
  - EventInput: Signals for events such as DriveOffEvent (for drive-thru use cases when the vehicle departs), CrewInterjectionEvent (for any situation wherein a human takes over the order taking role mid-conversation), or OrderStateUpdateEvent (if the order is modified on the client-side, e.g., using a touch interface).
Receiving Responses:
- Concurrently, the agent sends back a stream of BidiProcessOrderResponse messages. Your client must be prepared to handle various response types within the oneof response field:
  - AgentAudio: Synthesized audio bytes to be played to the user, used for voice interactions.
  - AgentText: Text version of the agent's response.
  - SpeechRecognition: Transcript of the recognized user speech.
  - UpdatedOrderState: Contains the complete current state of the customer's Order whenever it's updated by the agent. Use this to update your application's order representation. This should typically result in an update to a user interface or a system of record for order state information, such as a point of sale system.
  - InterruptionSignal: Indicates the user interrupted the agent's speech. The client should immediately stop playing any outgoing AgentAudio.
  - AgentEvent: Special events, such as RestartOrder, requiring client action.
  - SuggestedOptions: Provides contextually relevant options a user might select next, useful for display on a screen.
  - EndSession: Signals the session has been terminated by the agent (e.g., order complete, user drive-off, or agent escalation).
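One way to handle the response types listed above is a small dispatcher keyed on whichever member of the response is set. The sketch below works on JSON-decoded responses. The key names used with it (for example `agentText` or `interruptionSignal`) are assumed lowerCamelCase forms of the type names, and `ResponseDispatcher` is a hypothetical helper.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class ResponseDispatcher:
    """Route each response to the handler registered for its populated member."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self.unhandled: List[Dict[str, Any]] = []

    def on(self, kind: str, handler: Handler) -> None:
        self._handlers[kind] = handler

    def dispatch(self, response: Dict[str, Any]) -> bool:
        # Only one response member is expected per message, so the first
        # key with a registered handler wins.
        for kind, body in response.items():
            handler = self._handlers.get(kind)
            if handler is not None:
                handler(body)
                return True
        # Keep unknown messages for logging instead of failing the stream.
        self.unhandled.append(response)
        return False
```

Recording unrecognized messages rather than raising keeps the client tolerant of response types added to the API later.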
Closing the Stream:
- The stream can be closed by the client or the server. Typically, the server signals the end of a conversation using an EndSession message. The client should close the stream when this message is received.
Handling Specific Message Types
The following sections describe how to handle specific message types that your client sends or receives when calling BidiProcessOrder.
AudioInput
- Stream audio in chunks as it becomes available.
- Format: 16-bit linear PCM, 16000 Hz sample rate.
- Audio chunks do not include the audio headers that typically prefix a WAV file.
- For drive-thru scenarios with echo cancellation enabled (enable_echo_cancellation in Config), provide both customer_audio and crew_audio.
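For testing with recorded audio, the format requirements above can be met by reading frames out of a WAV file, which yields PCM data without the file header. The sketch below uses Python's standard wave module. The 100 ms chunk duration and the mono check are assumptions made for this example, since this guide specifies only the sample format and rate.

```python
import io
import wave
from typing import Iterator

SAMPLE_RATE_HZ = 16000  # from this guide
BYTES_PER_SAMPLE = 2    # 16-bit linear PCM


def chunk_size_bytes(chunk_ms: int) -> int:
    """Bytes of mono 16-bit, 16 kHz audio in chunk_ms milliseconds."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000


def pcm_chunks_from_wav(wav_bytes: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield headerless PCM chunks from an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        actual = (wav.getframerate(), wav.getsampwidth(), wav.getnchannels())
        # Mono is an assumption of this sketch.
        if actual != (SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, 1):
            raise ValueError("expected 16 kHz, 16-bit, mono PCM")
        frames_per_chunk = SAMPLE_RATE_HZ * chunk_ms // 1000
        while True:
            # readframes returns audio samples only, never the WAV header.
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                return
            yield chunk
```

At these settings one second of audio is 32000 bytes, so a 100 ms chunk is 3200 bytes. Smaller chunks lower latency at the cost of more messages.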
UpdatedOrderState
- This message provides the full state of the order each time it's sent. Replace any local cache of the order with the contents of the Order message received.
- Use the custom_integration_attributes within the Order items and modifiers to map the Order content into equivalent entities within your application's system of record.
InterruptionSignal
- Upon receiving this signal, immediately halt playback of any AgentAudio and clear any buffered agent audio. This ensures a natural conversational flow when the user interrupts the agent's speech.
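A minimal sketch of the buffering side of this behavior is shown below. `AgentAudioPlayer` is a hypothetical class that holds AgentAudio chunks waiting to be played. Stopping the audio device itself is platform-specific and outside the scope of the example.

```python
import threading
from collections import deque
from typing import Deque, Optional


class AgentAudioPlayer:
    """Queue of agent audio chunks that can be flushed on interruption."""

    def __init__(self) -> None:
        self._buffer: Deque[bytes] = deque()
        # The receive loop and the playback loop may run on different threads.
        self._lock = threading.Lock()

    def enqueue(self, audio: bytes) -> None:
        with self._lock:
            self._buffer.append(audio)

    def next_chunk(self) -> Optional[bytes]:
        """Called by the playback loop; returns None when nothing is queued."""
        with self._lock:
            return self._buffer.popleft() if self._buffer else None

    def interrupt(self) -> int:
        """Drop all pending audio; returns how many chunks were discarded."""
        with self._lock:
            dropped = len(self._buffer)
            self._buffer.clear()
            return dropped
```

Call `interrupt()` from the InterruptionSignal handler, then also stop whatever chunk the audio device is currently playing.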
EndSession
- Check the EndType (e.g., DRIVE_OFF, AGENT_ESCALATION).
- Your application should gracefully close the connection and transition the user appropriately (e.g., notify a human supervisor in the case of AGENT_ESCALATION, or transition to an order confirmation state).
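The transition logic might look like the sketch below. Only the DRIVE_OFF and AGENT_ESCALATION values come from this guide. The `NextStep` names, the choice to abandon the order on drive-off, and treating every other end type as a completed order are assumptions you should adapt to your own policy and to the full EndType list in the API reference.

```python
from enum import Enum


class NextStep(Enum):
    """Hypothetical application states to move to after EndSession."""

    NOTIFY_CREW = "notify_crew"
    ABANDON_ORDER = "abandon_order"
    CONFIRM_ORDER = "confirm_order"


def next_step_for(end_type: str) -> NextStep:
    """Map an EndType name to the follow-up action for the application."""
    if end_type == "AGENT_ESCALATION":
        # A human supervisor or crew member should take over.
        return NextStep.NOTIFY_CREW
    if end_type == "DRIVE_OFF":
        # Assumed policy: the customer left, so the order is not submitted.
        return NextStep.ABANDON_ORDER
    # Assumption: any other end type means the order can be confirmed.
    return NextStep.CONFIRM_ORDER
```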
Best Practices
- Handle Messages Asynchronously: Minimize latency by using threads or non-blocking I/O to concurrently send requests and process incoming responses.
- Reconnection Logic: Implement robust reconnection logic in case of network issues, remembering to send the initial Config message with the same session ID to attempt resumption.
- Error Handling: Monitor the stream for errors. gRPC and WebSocket libraries provide mechanisms to detect stream closure or transport errors. Log these events and handle them gracefully.
- Audio Buffering: Buffer audio where necessary to ensure smooth playback of AgentAudio and timely delivery of AudioInput. Carefully consider the tradeoff between latency and playback quality when deciding on your buffering scheme.
- Session ID Management: Ensure session IDs are unique for each distinct order/conversation.
- Resource Management: Close streams and release resources when the session is complete or if unrecoverable errors occur.
- Timeouts: While the stream itself can be long-lived (up to 15 minutes by default), consider application-level timeouts for specific states if needed.
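The reconnection practice above can be sketched as a retry loop with exponential backoff that reuses the same Config, and therefore the same session ID, on every attempt. `open_and_run` is a hypothetical callable that opens the stream, sends the Config, and runs the session. ConnectionError stands in for whatever transport error your gRPC or WebSocket library raises, and the delay values are illustrative.

```python
import time
from typing import Any, Callable, Dict


def run_with_reconnect(
    open_and_run: Callable[[Dict[str, Any]], Any],
    config: Dict[str, Any],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Run a session, reconnecting with the same config after transport errors."""
    for attempt in range(max_attempts):
        try:
            # The same config is passed each time so that the session ID is
            # unchanged and the service can attempt resumption.
            return open_and_run(config)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff: 0.5 s, 1 s, 2 s, ... capped at max_delay_s.
            sleep(min(max_delay_s, base_delay_s * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Adding random jitter to each delay is a common refinement when many clients, such as a fleet of kiosks, may reconnect at the same moment.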
Example Integration Flow (Conceptual)
- Client App (e.g., Mobile App) initiates an order.
- Establish gRPC/WebSocket connection to BidiProcessOrder.
- Send BidiProcessOrderRequest with Config (session ID, store ID).
- Receive initial AgentAudio (e.g., welcome message) and play it.
- User speaks: Capture audio, stream it in AudioInput messages.
- Receive SpeechRecognition (display transcript), AgentAudio (play response), and potentially UpdatedOrderState (update UI cart).
- If user interrupts, receive InterruptionSignal, stop playback.
- Continue exchange of audio or text inputs and agent responses.
- User confirms order: Agent sends final UpdatedOrderState.
- Agent sends EndSession: Client closes the stream and finalizes the order in the POS system using data from the last UpdatedOrderState.
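The last two steps of this flow can be illustrated with a small loop over JSON-decoded responses: keep only the most recent order state, then hand it to the POS when the session ends. The response keys `updatedOrderState`, `endSession`, and `endType` are assumed lowerCamelCase JSON names, and `finalize` is a hypothetical callback into your POS integration.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def run_order_flow(
    responses: Iterable[Dict[str, Any]],
    finalize: Callable[[Dict[str, Any]], None],
) -> Optional[str]:
    """Track the latest order state and finalize it when the session ends.

    Returns the end type reported by EndSession, or None if the stream
    finished without one.
    """
    latest_order: Optional[Dict[str, Any]] = None
    for response in responses:
        if "updatedOrderState" in response:
            # Each update carries the full order, so replace rather than merge.
            latest_order = response["updatedOrderState"]
        elif "endSession" in response:
            if latest_order is not None:
                finalize(latest_order)
            # Stop reading: the client closes the stream after EndSession.
            return response["endSession"].get("endType")
    return None
```

In a real client you would likely consult the end type before finalizing, for example to skip submission after a drive-off.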