Best practices for cost-optimized Cloud Run services

Optimizing your Cloud Run services reduces expenses by aligning resource allocation with actual demand. Implementing cost-effective configurations prevents overprovisioning while maintaining service reliability and performance. There is no one-size-fits-all solution for cost optimization. It is important to monitor your needs, budget, and resources to determine what works best for you.

The best practices outlined in this document are specific to Cloud Run. They don't cover other Google Cloud products.

Resource configurations

Optimizing your services for cost involves consideration of many different configurations. Tailor these configurations to your needs to create services that are reliable and cost efficient.

Select the appropriate region

Your service's deployment location impacts your total cost. Cloud Run uses a two-tier regional pricing model. Tier 1 regions offer lower per-unit prices for vCPU and memory than Tier 2 regions, so consider deploying to a Tier 1 region.

Require authentication

When configuring a Cloud Run service, you can choose one of two authentication options:

Allow public access: Authentication checks are not required.
Require authentication: Only authenticated users can access your Cloud Run service.

We recommend requiring authentication, unless you have a specific need to allow public access. This will prevent unwanted requests that could incur costs.

If you manage users with Identity-Aware Proxy (IAP), IAP might have its own associated costs.

Compare instance-based versus request-based billing

Cloud Run services have two billing settings:

Request-based billing (default): You are charged per request, plus a higher per-second rate for vCPU and memory consumed during request processing.
Instance-based billing: You are charged for the entire lifetime of an instance. There is no per-request fee, and the per-second rates for vCPU and memory are lower.

For services with steady, slowly varying traffic, consider using instance-based billing. The savings from lower compute rates and no per-request fee outweigh the cost of paying for idle time between requests. For services with sporadic, bursty, or spiky traffic, consider using request-based billing. If you are still unsure about which billing setting to use, see Recommender. The Recommender looks at the traffic received by your Cloud Run service over the past month and provides recommendations for switching from request-based billing to instance-based billing, if it is cheaper to do so.
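The trade-off between the two billing settings can be sketched as a small cost model. This is a minimal illustration, not a pricing tool: the `Rates` class, the function names, and every price below are hypothetical placeholders, so substitute values from the current price list before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices; substitute values from the current price list."""
    per_instance_second: float   # combined vCPU + memory price for one instance
    per_million_requests: float  # request fee; zero under instance-based billing


# Placeholder numbers for illustration only, NOT real Cloud Run prices.
REQUEST_BASED = Rates(per_instance_second=0.000030, per_million_requests=0.40)
INSTANCE_BASED = Rates(per_instance_second=0.000020, per_million_requests=0.0)


def request_based_cost(requests: int, busy_seconds: float,
                       rates: Rates = REQUEST_BASED) -> float:
    """Request fee plus compute billed only while requests are processed."""
    request_fee = requests / 1_000_000 * rates.per_million_requests
    return request_fee + busy_seconds * rates.per_instance_second


def instance_based_cost(lifetime_seconds: float,
                        rates: Rates = INSTANCE_BASED) -> float:
    """Compute billed for the whole instance lifetime, idle time included."""
    return lifetime_seconds * rates.per_instance_second


def cheaper_billing(requests: int, busy_seconds: float,
                    lifetime_seconds: float) -> str:
    """Return the billing setting with the lower estimated cost."""
    instance_cost = instance_based_cost(lifetime_seconds)
    request_cost = request_based_cost(requests, busy_seconds)
    return "instance-based" if instance_cost < request_cost else "request-based"
```

With these placeholder rates, an instance that is busy for 90% of a 30-day month favors instance-based billing, while an instance that is busy for only a small fraction of its lifetime favors request-based billing.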

Configure service scaling at the service level

To establish a cost-safety baseline, configure maximum instances for your service. Setting a higher maximum number prioritizes availability, but introduces potential billing risks from unexpected traffic spikes or misconfigurations. Configure this setting at the service level when you initially deploy your service. For additional cost control tools, see resource allocation quotas or billing budgets and alerts.
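One way to reason about the maximum instances setting is to compute the worst-case compute cost if every allowed instance ran for a whole month. The sketch below assumes a 30-day month and ignores request fees, GPUs, and networking; the function names and the rates passed in are hypothetical, not real Cloud Run prices.

```python
import math

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def instance_month_cost(vcpu: float, memory_gib: float,
                        vcpu_second_rate: float, gib_second_rate: float) -> float:
    """Compute cost of one instance that runs for the whole month."""
    per_second = vcpu * vcpu_second_rate + memory_gib * gib_second_rate
    return per_second * SECONDS_PER_MONTH


def monthly_cost_ceiling(max_instances: int, vcpu: float, memory_gib: float,
                         vcpu_second_rate: float, gib_second_rate: float) -> float:
    """Worst case: all max_instances instances are active all month."""
    return max_instances * instance_month_cost(
        vcpu, memory_gib, vcpu_second_rate, gib_second_rate)


def max_instances_for_budget(budget: float, vcpu: float, memory_gib: float,
                             vcpu_second_rate: float, gib_second_rate: float) -> int:
    """Largest maximum instances value whose worst case stays within budget."""
    return math.floor(budget / instance_month_cost(
        vcpu, memory_gib, vcpu_second_rate, gib_second_rate))
```

For example, with the placeholder rates `0.00002` per vCPU-second and `0.000002` per GiB-second, ten instances of 1 vCPU and 0.5 GiB have a worst-case ceiling of about 544.32 per month in this model, and a budget of 200 allows at most three such instances.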

Optimize CPU and memory utilization

The cost of your Cloud Run service is impacted by its CPU/memory configuration and how long your service is active, among other factors. Overprovisioning your resources can increase your costs. To determine which configuration might be best for your service:

Establish a baseline configuration.
While testing, monitor the CPU and memory utilization metrics in Cloud Monitoring.
Adjust your configuration as necessary.

If CPU utilization is consistently low under peak load, consider reducing vCPU allocation. If latency is high, consider increasing vCPU allocation.

If memory utilization is consistently low, consider reducing the allocated memory. If latency is high and memory utilization is near 100%, consider increasing the allocated memory. If you are experiencing Out of Memory (OOM) errors, you should increase the allocated memory or modify your application to prevent memory leaks or use less memory. See the Cloud Monitoring dashboard to better understand your memory utilization.
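The CPU and memory guidance above can be written down as a simple decision helper. This is only a sketch: the function name and the `low` and `near_full` thresholds are assumptions chosen for illustration, and the inputs are expected to come from your own observations in Cloud Monitoring under peak load.

```python
def rightsizing_advice(cpu_util: float, mem_util: float, high_latency: bool,
                       oom_errors: bool, low: float = 0.30,
                       near_full: float = 0.95) -> list:
    """Map observed peak-load metrics (utilization as 0.0-1.0) to suggestions.

    The 'low' and 'near_full' thresholds are hypothetical defaults.
    """
    advice = []

    # CPU: high latency takes priority over low utilization.
    if high_latency:
        advice.append("consider increasing vCPU allocation")
    elif cpu_util < low:
        advice.append("consider reducing vCPU allocation")

    # Memory: OOM errors take priority over every other memory signal.
    if oom_errors:
        advice.append("increase memory, or fix leaks and reduce memory use")
    elif high_latency and mem_util >= near_full:
        advice.append("consider increasing allocated memory")
    elif mem_util < low:
        advice.append("consider reducing allocated memory")

    return advice
```

A call such as `rightsizing_advice(0.15, 0.20, False, False)` suggests reducing both vCPU and memory, while a service with moderate utilization and no latency or OOM problems gets an empty list, meaning the current configuration can stay as it is.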

Configure GPU

All Cloud Run services using GPUs must have instance-based billing configured. This means that Cloud Run instances are charged for the entire lifecycle of instances, even when there are no incoming requests. The minimum CPU and memory configurations required for GPUs also impact the cost of your Cloud Run service. By default, GPU zonal redundancy is turned on. Turning off GPU zonal redundancy results in a lower cost per GPU second, but does not guarantee reserved capacity for failover scenarios.

Optimize networking costs

When configuring networking options for your service, consider the following:

Co-locate your resources: Try to deploy your Cloud Run services in the same region as your backend databases (like Cloud SQL or Firestore) and Cloud Storage buckets. Data transfer between Google Cloud resources within the same region is free.
Switch to Direct VPC egress: If you are securely routing traffic to internal VPC network resources, consider switching to Direct VPC egress from Serverless VPC Access connectors. Direct VPC egress scales to zero, eliminating the baseline compute overhead and idle costs associated with connector instances.
Use Cloud CDN: Offload static assets and highly cacheable content by placing Cloud CDN in front of your Cloud Run services. Serving data from the edge is significantly cheaper than paying for standard internet egress directly from Cloud Run.
Monitor internet egress: Inbound traffic (ingress) is always free, and you receive 1 GiB of free outbound internet data transfer per month within North America. Focus your monitoring efforts on outbound traffic that crosses region boundaries or exceeds the free tier.
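To see how the free tier affects the internet egress line item, the following sketch subtracts the free allowance before applying a price. The function names and the per-GiB price are hypothetical, and the model assumes all traffic is eligible for the 1 GiB free tier mentioned above; real charges depend on destination and the current price list.

```python
def billable_egress_gib(total_gib: float, free_gib: float = 1.0) -> float:
    """Outbound data that remains billable after the monthly free allowance."""
    return max(0.0, total_gib - free_gib)


def egress_cost(total_gib: float, price_per_gib: float,
                free_gib: float = 1.0) -> float:
    """Estimated monthly egress cost; price_per_gib is a placeholder value."""
    return billable_egress_gib(total_gib, free_gib) * price_per_gib
```

For instance, 11 GiB of outbound traffic leaves 10 GiB billable, while 0.5 GiB stays entirely within the free allowance.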

Configure concurrency settings

Each additional instance that processes requests consumes more billable CPU and memory. A higher concurrency setting lets fewer instances handle the same request volume, which can reduce costs. However, the application code must be able to handle parallel requests efficiently. For more information, see Tuning concurrency for autoscaling and resource utilization.
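The effect of the concurrency setting on instance count can be approximated with Little's law (requests in flight = arrival rate × average latency). The sketch below is an idealized estimate with a hypothetical function name: it assumes latency does not degrade as concurrency rises and that the autoscaler packs instances perfectly, neither of which is guaranteed in practice.

```python
import math


def instances_needed(requests_per_second: float, avg_latency_seconds: float,
                     concurrency: int) -> int:
    """Idealized number of instances needed to serve a steady request rate."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # Little's law: average number of requests being processed at once.
    in_flight = requests_per_second * avg_latency_seconds
    return math.ceil(in_flight / concurrency)
```

Under these assumptions, 200 requests per second at 0.5 seconds each need 100 instances with a concurrency of 1, 10 instances with a concurrency of 10, and 2 instances with a concurrency of 80.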

Committed use discounts

Committed use discounts (CUDs) provide discounted prices in exchange for committing to continuously using Cloud Run for a specified period of time. CUDs apply at a Cloud Billing-account level. You can purchase Compute flexible CUDs for Cloud Run resources. Compute flexible CUDs don't apply to GPUs or networking. See Compute flexible committed use discount for more details.
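Whether a commitment pays off depends on how much of it you actually use. The following is a deliberately simplified model, not the actual CUD billing logic: it assumes the commitment is expressed as an hourly on-demand-equivalent amount, that the discounted fee is owed whether or not the capacity is used, and that usage above the commitment is billed at on-demand prices. The function names and the discount value are hypothetical.

```python
def hourly_cost_with_commitment(usage: float, commitment: float,
                                discount: float) -> float:
    """Hourly cost under the simplified model described above.

    usage and commitment are on-demand-equivalent amounts per hour;
    discount is a fraction such as 0.2 (a placeholder, not a real rate).
    """
    committed_fee = commitment * (1 - discount)  # owed even when unused
    overflow = max(0.0, usage - commitment)      # billed at on-demand price
    return committed_fee + overflow


def commitment_saves_money(usage: float, commitment: float,
                           discount: float) -> bool:
    """True when the commitment costs less than paying on demand for usage."""
    return hourly_cost_with_commitment(usage, commitment, discount) < usage
```

In this model a commitment of 10 with a 20% discount saves money at a usage of 12, but costs more than on-demand pricing at a usage of 6, which is why steady baseline usage matters before committing.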

Helpful tools

You can use the following tools to better understand your costs and to help avoid cost overruns.

Cloud Run overview: Billing panel

The Cloud Run overview page shows costs per resource name in the Billing panel. The numbers reflect the gross costs for selected time ranges per resource. This tool helps you better understand how much your resources cost.

Budget alerts

Create budget alerts in Cloud Billing to track your actual costs against your planned costs. A budget is an alerting mechanism that triggers notifications when spending thresholds are crossed, not a hard spending cap. There is a billing data delay that might impact when you receive alerts.
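The difference between an alert and a cap can be illustrated with a small helper that reports which thresholds a given spend has crossed. The function name and the default threshold fractions are assumptions for illustration; nothing in this sketch stops spending, which mirrors how a budget only notifies.

```python
def crossed_thresholds(actual_spend: float, budget_amount: float,
                       thresholds=(0.5, 0.9, 1.0)) -> list:
    """Return the threshold fractions that actual spend has reached.

    Thresholds are fractions of the budget (hypothetical defaults). Spend
    above 100% is still possible: this function only reports, it never caps.
    """
    if budget_amount <= 0:
        raise ValueError("budget_amount must be positive")
    ratio = actual_spend / budget_amount
    return [t for t in sorted(thresholds) if ratio >= t]
```

For a budget of 100, a spend of 95 crosses the 0.5 and 0.9 thresholds, and a spend of 130 crosses all three while the service keeps running and accruing cost.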

Cloud Billing

Cloud Billing is a collection of tools that help you track and understand your Google Cloud spending. These tools help you monitor your usage costs, forecast your spending, and identify opportunities to save on costs.

Cost Explorer

The Cost Explorer lets you understand the cost and utilization of your resources. Use Cost Explorer to:

Filter your resources by cost to see which resources are the most costly.
Understand what proportion of costs are driven by configurations such as vCPU, GPU, networking, and more.
Track impacts of changes to your resource configuration on your monthly bill.

Google Cloud pricing calculator

The Google Cloud pricing overview contains information for better understanding the Google Cloud pricing model. This is also where you can find the Detailed price list. You can estimate your costs by adding and configuring products by using the pricing calculator.

Recommender

Recommender is a tool that provides usage recommendations and insights for Cloud products.

Recommender automatically looks at traffic received by your Cloud Run service over the past month and recommends switching from request-based billing to instance-based billing if that is cheaper.

Cloud Hub Optimization

You can view summary cost data, utilization data, and cost optimization recommendations for Google Cloud services on Cloud Hub's Optimization page.
