Region ID
The REGION_ID is an abbreviated code that Google assigns
based on the region you select when you create your app. The code does not
correspond to a country or province, even though some region IDs may appear
similar to commonly used country and province codes. For apps created after
February 2020, REGION_ID.r is included in App Engine URLs. For existing apps
created before this date, the region ID is optional in the URL.
Learn more about region IDs.
Software development is all about tradeoffs, and microservices are no exception. What you gain in code deployment and operation independence, you pay for in performance overhead. This section provides some recommendations for steps that you can take to minimize this impact.
Turn CRUD operations into microservices
Microservices are particularly well-suited to entities that are accessed with the create, retrieve, update, delete (CRUD) pattern. When working with such entities, you typically use only one entity at a time, such as a user, and you typically perform only one of the CRUD actions at a time. Therefore, you only need a single microservice call for the operation. Look for entities that have CRUD operations plus a set of business methods that could be utilized in many parts of your application. These entities make good candidates for microservices.
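To make the CRUD shape concrete, here is a minimal in-memory sketch of a hypothetical user microservice. UserService, its handle method, and the generated U1-style IDs are illustrative assumptions, not an App Engine API. Each request carries one HTTP method and at most one user ID, which is why a single call to the service completes the operation.

```python
import itertools


class UserService(object):
    """Hypothetical in-memory user microservice exposing CRUD operations.

    Each request names one HTTP method and at most one user, so a single
    call to the service is enough to complete the operation.
    """

    def __init__(self):
        self._users = {}
        self._ids = itertools.count(1)

    def handle(self, method, user_id=None, body=None):
        """Dispatches one request and returns (status_code, payload)."""
        if method not in ('POST', 'GET', 'PUT', 'DELETE'):
            return 405, None
        if method == 'POST':  # create
            new_id = 'U%d' % next(self._ids)
            self._users[new_id] = dict(body or {}, userId=new_id)
            return 201, self._users[new_id]
        if user_id not in self._users:
            return 404, None
        if method == 'GET':  # retrieve
            return 200, self._users[user_id]
        if method == 'PUT':  # update
            self._users[user_id].update(body or {})
            self._users[user_id]['userId'] = user_id  # the ID itself is not updatable
            return 200, self._users[user_id]
        del self._users[user_id]  # delete
        return 204, None
```

Business methods that many parts of the application share would sit alongside these four actions in the same service.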
Provide batch APIs
In addition to CRUD-style APIs, you can still provide good microservice performance for groups of entities by providing batch APIs. For example, rather than only exposing a GET API method that retrieves a single user, provide an API that takes a set of user IDs and returns a dictionary of corresponding users:
Request:
/user-service/v1/?userId=ABC123&userId=DEF456&userId=GHI789
Response:
{
  "ABC123": {
    "userId": "ABC123",
    "firstName": "Jake",
    … },
  "DEF456": {
    "userId": "DEF456",
    "firstName": "Sue",
    … },
  "GHI789": {
    "userId": "GHI789",
    "firstName": "Ted",
    … }
}
The App Engine SDK supports many batch APIs, such as the ability to fetch many entities from Cloud Datastore through a single RPC, so servicing these types of batch APIs can be very efficient.
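The batch endpoint shown above can be sketched as a single function that collects every repeated userId parameter and answers with one dictionary. This is a minimal sketch under assumptions: batch_get_users and the USERS dictionary are hypothetical, the code uses Python 3's urllib.parse, and on App Engine the dictionary lookup would typically be replaced by one batch Datastore read (for example, NDB's get_multi) rather than one read per ID.

```python
from urllib.parse import parse_qs

# Hypothetical in-memory stand-in for the user store.
USERS = {
    'ABC123': {'userId': 'ABC123', 'firstName': 'Jake'},
    'DEF456': {'userId': 'DEF456', 'firstName': 'Sue'},
    'GHI789': {'userId': 'GHI789', 'firstName': 'Ted'},
}


def batch_get_users(query_string, store=USERS):
    """Returns a dict of userId -> user for every userId in the query string.

    Unknown IDs are left out of the response instead of causing an error.
    """
    # parse_qs keeps repeated parameters as a list, e.g. ['ABC123', 'DEF456'].
    user_ids = parse_qs(query_string).get('userId', [])
    return {uid: store[uid] for uid in user_ids if uid in store}
```

For example, batch_get_users('userId=ABC123&userId=DEF456&userId=GHI789') returns all three users in one response, so the caller makes one microservice call instead of three.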
Use asynchronous requests
Often, you will need to interact with many microservices to compose a response.
For example, you might need to fetch the logged-in user's preferences as well as
their company details. Frequently, these pieces of information are not dependent
on one another and you could fetch them in parallel. The Urlfetch library in
the App Engine SDK supports asynchronous requests,
allowing you to call microservices in parallel.
import json

from google.appengine.api import urlfetch

# Start both fetches without waiting for either one to finish.
preferences_rpc = urlfetch.create_rpc()
urlfetch.make_fetch_call(preferences_rpc,
                         'https://preferences-service-dot-my-app.uc.r.appspot.com/preferences-service/v1/?userId=ABC123')
company_rpc = urlfetch.create_rpc()
urlfetch.make_fetch_call(company_rpc,
                         'https://company-service-dot-my-app.uc.r.appspot.com/company-service/v3/?companyId=ACME')

### microservice requests are now occurring in parallel

preferences = None
try:
  preferences_response = preferences_rpc.get_result()  # blocks until response
  if preferences_response.status_code == 200:
    # deserialize JSON, or whatever is appropriate
    preferences = json.loads(preferences_response.content)
  else:
    pass  # handle error
except urlfetch.DownloadError:
  pass  # timeout, or other transient error

company = None
try:
  company_response = company_rpc.get_result()  # blocks until response
  if company_response.status_code == 200:
    # deserialize JSON, or whatever is appropriate
    company = json.loads(company_response.content)
  else:
    pass  # handle error
except urlfetch.DownloadError:
  pass  # timeout, or other transient error
Doing work in parallel often runs counter to good code structure because, in a
real-world scenario, you often use one class to encapsulate preferences methods
and another class to encapsulate company methods. It's difficult to leverage
asynchronous Urlfetch calls without breaking this encapsulation. A good
solution exists in the App Engine Python SDK's NDB package: Tasklets.
Tasklets enable you to keep good encapsulation in your code while still offering
a mechanism to achieve parallel microservice calls. Note that tasklets use
futures instead of RPCs, but the idea is similar.
Use the shortest route
Depending on how you invoke Urlfetch, you can cause different infrastructure
and routes to be used. In order to use the best-performing route, consider the
following recommendations:
- Use REGION_ID.r.appspot.com, not a custom domain
  A custom domain causes a different route to be used when routing through the
  Google infrastructure. Because your microservice calls are internal, it's easy
  to call https://PROJECT_ID.REGION_ID.r.appspot.com instead, and that route
  performs better.
- Set follow_redirects to False
  Explicitly set follow_redirects=False when calling Urlfetch, because it avoids a heavier-weight service designed to follow redirects. Your API endpoints should not need to redirect clients, because they are your own microservices, and endpoints should only return HTTP 200-, 400-, and 500-series responses.
- Prefer services within a project over multiple projects
  There are good reasons to use multiple projects when building a microservices-based application, but if performance is your primary goal, use services within a single project. Services of a project are hosted in the same datacenter, and even though throughput on Google's inter-datacenter network is excellent, local calls are faster.
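The first two recommendations can be folded into a small helper, sketched below under assumptions. service_url and fetch_internal are hypothetical names. The URL follows the SERVICE-dot-PROJECT_ID.REGION_ID.r.appspot.com pattern used in the Urlfetch example earlier in this section. The fetch argument defaults to App Engine's urlfetch.fetch but can be swapped out so that the helper can run outside App Engine.

```python
def service_url(service_id, project_id, region_id, path):
    """Builds an appspot.com URL for one service of an App Engine app.

    Uses the SERVICE-dot-PROJECT.REGION_ID.r.appspot.com form rather than
    a custom domain.
    """
    if not path.startswith('/'):
        path = '/' + path
    return 'https://%s-dot-%s.%s.r.appspot.com%s' % (
        service_id, project_id, region_id, path)


def fetch_internal(url, fetch=None, **kwargs):
    """Calls another microservice without following redirects.

    `fetch` defaults to App Engine's urlfetch.fetch; it is injectable only
    so that this sketch can be exercised outside App Engine.
    """
    if fetch is None:
        from google.appengine.api import urlfetch  # available on App Engine only
        fetch = urlfetch.fetch
    kwargs['follow_redirects'] = False  # always override any caller setting
    return fetch(url, **kwargs)
```

For example, fetch_internal(service_url('preferences-service', 'my-app', 'uc', '/preferences-service/v1/?userId=ABC123'), deadline=5) targets the appspot.com route and never asks Urlfetch to follow redirects.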
Avoid chatter during security enforcement
It's bad for performance to use security mechanisms that involve lots of back-and-forth communication to authenticate the calling API. For example, if your microservice needs to validate a ticket from your application by calling back to the application, you've incurred a number of round trips to get your data.
An OAuth2 implementation can amortize this cost over time by using refresh
tokens and caching an access token between Urlfetch invocations. However, if
the cached access token is stored in memcache, you will need to incur memcache
overhead to fetch it. To avoid this overhead, you might cache the access token
in instance memory, but you will still experience the OAuth2 activity
frequently, as each new instance negotiates an access token; remember that App
Engine instances spin up and down frequently. Some hybrid of memcache and
instance cache will help mitigate this issue, but your solution starts to become
more complex.
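The hybrid of memcache and instance cache described above can be sketched as a two-tier token cache. This is a minimal sketch under assumptions. HybridTokenCache is a hypothetical class. shared_cache is any object with get(key) and set(key, value, time=...), which is the shape of App Engine's memcache module. fetch_token is a hypothetical callable that performs the OAuth2 exchange and returns (access_token, expires_in_seconds). Tokens are treated as stale margin seconds before they actually expire.

```python
import time


class HybridTokenCache(object):
    """Caches an OAuth2 access token in instance memory, backed by a shared cache."""

    def __init__(self, shared_cache, fetch_token, key='oauth2-access-token', margin=60):
        self._shared = shared_cache      # e.g. memcache: shared by all instances
        self._fetch_token = fetch_token  # hypothetical OAuth2 exchange
        self._key = key
        self._margin = margin            # refresh this many seconds early
        self._local = None               # (token, expires_at) in this instance

    def _fresh(self, entry):
        return entry is not None and entry[1] - self._margin > time.time()

    def get(self):
        # 1. Instance memory: no RPC at all.
        if self._fresh(self._local):
            return self._local[0]
        # 2. Shared cache: one memcache RPC, but no OAuth2 round trip. This is
        #    what spares a newly started instance from negotiating its own token.
        entry = self._shared.get(self._key)
        if self._fresh(entry):
            self._local = entry
            return entry[0]
        # 3. Negotiate a new token and publish it to both tiers.
        token, expires_in = self._fetch_token()
        entry = (token, time.time() + expires_in)
        self._shared.set(self._key, entry, time=int(expires_in))
        self._local = entry
        return token
```

Most calls are served from instance memory. A new instance pays one memcache read instead of a full OAuth2 negotiation, and the OAuth2 exchange only happens once the shared token is close to expiring.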
Another approach that performs well is to share a secret token between microservices, for example, transmitted as a custom HTTP header. In this approach, each microservice could have a unique token for each caller. Typically, shared secrets are a questionable choice for security implementations, but since all the microservices are in the same application, it becomes less of an issue, given the performance gains. With a shared secret, the microservice only needs to perform a string comparison of the incoming secret against a presumably in-memory dictionary, and the security enforcement is very light.
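A shared-secret check of this kind can be sketched as a dictionary lookup plus one comparison. The header names X-Caller-Id and X-Caller-Secret, the CALLER_SECRETS values, and the authenticate function are all hypothetical; the document only says that the secret might travel in a custom HTTP header. The sketch uses hmac.compare_digest instead of == so that the comparison runs in constant time. It assumes headers is a plain mapping; framework header objects such as webapp2's are case-insensitive, but a plain dict requires exact casing.

```python
import hmac

# Hypothetical header names carrying the caller's identity and secret.
CALLER_HEADER = 'X-Caller-Id'
SECRET_HEADER = 'X-Caller-Secret'

# In-memory map of caller ID -> secret issued to that caller. In practice
# these would be loaded from configuration rather than hard-coded.
CALLER_SECRETS = {
    'frontend': 's3cr3t-frontend',
    'billing-service': 's3cr3t-billing',
}


def authenticate(headers, secrets=CALLER_SECRETS):
    """Returns the caller ID if the presented secret matches, otherwise None."""
    caller = headers.get(CALLER_HEADER)
    presented = headers.get(SECRET_HEADER)
    expected = secrets.get(caller)
    if expected is None or presented is None:
        return None
    # Constant-time string comparison; no network round trip is needed.
    if hmac.compare_digest(presented.encode('utf-8'), expected.encode('utf-8')):
        return caller
    return None
```

Because each caller has its own secret, a leaked secret can be rotated for one caller without touching the others.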
If all of your microservices are on App Engine, you can also inspect the
incoming X-Appengine-Inbound-Appid header.
This header is added by the Urlfetch infrastructure when making a request to
another App Engine project and cannot be set by an external party. Depending on
your security requirement, your microservices could inspect this incoming header
to enforce your security policy.
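A policy based on this header can be as small as an allowlist check. In the sketch below, only the header name X-Appengine-Inbound-Appid comes from App Engine. ALLOWED_CALLER_APPS and caller_app_is_allowed are hypothetical, and headers is assumed to be a mapping of request headers (a plain dict needs the exact casing shown).

```python
# The header name comes from App Engine; the allowlist values are hypothetical.
INBOUND_APPID_HEADER = 'X-Appengine-Inbound-Appid'
ALLOWED_CALLER_APPS = frozenset(['my-app', 'my-billing-app'])


def caller_app_is_allowed(headers, allowed=ALLOWED_CALLER_APPS):
    """Returns True only if the request came from an allowlisted App Engine app."""
    app_id = headers.get(INBOUND_APPID_HEADER)
    # A missing header means the request did not come from another App Engine
    # app through Urlfetch, so it is rejected.
    return app_id is not None and app_id in allowed
```

A handler would typically reject the request with an HTTP 403 response when this returns False.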
Trace microservice requests
As you build your microservices-based application, you begin to accumulate
overhead from successive Urlfetch calls. When this happens, you can use
Cloud Trace to understand what calls are being made and where the overhead is.
Importantly, Cloud Trace can also help identify where independent microservices
are being serially invoked, so you can refactor your code to perform these
fetches in parallel.
A helpful feature of Cloud Trace kicks in when you use multiple services within a single project. As calls are made between services in your project, Cloud Trace collapses all the calls together into a single call graph, so you can visualize the entire end-to-end request as a single trace.

Note that in the above example, the calls to the pref-service and the
user-service are performed in parallel by using asynchronous Urlfetch,
so the RPCs appear scrambled in the visualization.
However, this is still a valuable tool for diagnosing latency.
What's next
- Get an overview of microservice architecture on App Engine.
- Understand how to create and name dev, test, qa, staging, and production environments with microservices in App Engine.
- Learn the best practices for designing APIs to communicate between microservices.
- Learn how to migrate an existing monolithic application to one with microservices.