Google Distributed Cloud connected shared responsibility

Monitoring and maintaining Google Distributed Cloud connected is a shared responsibility between Google and the customer. Use the information in this document to determine how to best deploy and manage your on-premises workloads.

Google's responsibilities

Because Distributed Cloud connected is a managed hardware and software service, Google is responsible for managing and monitoring the infrastructure that you use to deploy your business applications.

Google is responsible for the following aspects of the Distributed Cloud connected system:

  • The Google Cloud control plane
  • The Kubernetes control plane, worker nodes, and built-in system services
  • Google-supplied software add-ons and products
  • Supplied hardware, including servers

Google monitors the functionality that we are responsible for and alerts Google engineers when issues are found so that they can investigate.

Customer responsibilities

You're responsible for the following aspects of the Distributed Cloud connected system:

  • The local network, including any customer-supplied switches
  • Internet connectivity
  • Power
  • The environment, such as cooling
  • Customer applications and any customer-installed Google Distributed Cloud or Kubernetes add-ons
  • Customer-owned bastion host instances and boundary proxy deployments, if using these features

Google doesn't directly monitor issues that are your responsibility. For example, Google doesn't monitor whether a customer VM boots correctly or whether a customer application is running. If you believe that such behavior is caused by a platform issue, you must open a Google Cloud support ticket so that Google can investigate.

Shared responsibility

In some cases, Google detects a site failure but believes that the cause is a site-specific issue that you're responsible for. For example, we might see rising temperatures over time across all nodes on a site followed by disconnection, indicating that a local cooling failure is the likely issue. In these scenarios, Google initiates collaborative troubleshooting with you to confirm whether the issue is caused by site-specific customer responsibilities and to verify any hardware failures.

To resolve issues and determine a root cause, Google might need to request and receive information from you. For example, Google might need to know when power was lost and when power or network connectivity was restored. If you can't provide this information, Google might not be able to conduct a detailed root cause analysis.

Connectivity failures

In the specific case of an internet connectivity failure, the product supports survivability mode for up to seven days. During this period, local access to the service is available. However, Google can't monitor, mitigate, or diagnose on-site system issues until network connectivity is restored.
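As a rough illustration of the seven-day window, the following Python sketch computes when survivability mode would lapse for a given disconnection time. The function names are hypothetical, and the sketch assumes that the window is measured from the moment connectivity was lost; it isn't part of the product.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the window is seven days, per the text above, and starts
# at the moment internet connectivity was lost.
SURVIVABILITY_WINDOW = timedelta(days=7)


def survivability_deadline(disconnected_at: datetime) -> datetime:
    """Return the latest time by which connectivity should be restored."""
    return disconnected_at + SURVIVABILITY_WINDOW


def time_remaining(disconnected_at: datetime, now: datetime) -> timedelta:
    """Return the time left in the window, never less than zero."""
    remaining = survivability_deadline(disconnected_at) - now
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    lost = datetime(2024, 3, 1, 8, 30, tzinfo=timezone.utc)
    print("Restore connectivity by:", survivability_deadline(lost).isoformat())
```

Using timezone-aware timestamps keeps the calculation unambiguous when the site and the people responding to the outage are in different time zones.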

Although Google monitors site disconnects from the Google telemetry systems, we can't determine remotely whether the root cause is power, ISP connectivity, or a catastrophic site failure, such as a fire or flood.

If all hardware at a site stops reporting data simultaneously, the likely cause is a local power or network issue. To avoid false alarms, Google might not notify you of the issue until we confirm that it won't resolve on its own (for example, when ISP maintenance ends) and that it can't be resolved remotely. In that case, further troubleshooting is required.

When your deployment is configured for bastion host and boundary proxy (BH/BP), Google uses periodic test requests to monitor connectivity to your BH/BP instances and, through BH/BP, to the Distributed Cloud connected devices. Google expects you to monitor the overall health of your BH/BP instances, for example by tracking their resource usage. If we detect connectivity issues to BH/BP or to Distributed Cloud connected devices and suspect that the issue might originate from the customer-owned components, we might ask you to diagnose and debug the issue.
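One way to track resource usage on a customer-owned BH/BP instance is a small periodic check like the following Python sketch. It uses only the standard library and assumes a Unix-like host. The `ResourceSnapshot` type, the function names, and the thresholds are all hypothetical; choose values that suit your own instances.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass
class ResourceSnapshot:
    load_per_cpu: float        # 1-minute load average divided by CPU count
    disk_used_fraction: float  # 0.0 (empty) to 1.0 (full)


def collect_snapshot(path: str = "/") -> ResourceSnapshot:
    """Read load average and disk usage from the local host (Unix only)."""
    load_1min, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    usage = shutil.disk_usage(path)
    return ResourceSnapshot(load_1min / cpus, usage.used / usage.total)


def warnings_for(snapshot: ResourceSnapshot,
                 max_load_per_cpu: float = 0.9,
                 max_disk_fraction: float = 0.85) -> list[str]:
    """Return human-readable warnings; an empty list means no threshold was exceeded."""
    warnings = []
    if snapshot.load_per_cpu > max_load_per_cpu:
        warnings.append(f"high CPU load: {snapshot.load_per_cpu:.2f} per CPU")
    if snapshot.disk_used_fraction > max_disk_fraction:
        warnings.append(f"disk {snapshot.disk_used_fraction:.0%} full")
    return warnings


if __name__ == "__main__":
    for message in warnings_for(collect_snapshot()):
        print("WARNING:", message)
```

Separating collection from evaluation makes it possible to test the thresholds without touching the host, and to feed the same check from whatever monitoring agent you already run.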

Debugging

To aid in debugging, Google might ask for the following data:

  • Any configuration changes applied to non-Google managed networking equipment, such as the switch, router, or firewall, including the timestamp to the nearest second
  • Firewall deny logs, including the timestamp and details
  • The time of and reasons for any device reboots, such as a software upgrade, power failure, or software error
  • The time of any power failures, which might be known, such as from building or data center management, or inferred from the last log message of other equipment
  • The time of any network outages, based on either the network provider or log messages on the router or firewall
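Keeping these details in a structured form can make them easier to hand over when requested. The following Python sketch records site events with second-precision UTC timestamps and serializes them as a sorted JSON timeline. The `SiteEvent` type, the event kinds, and the field names are hypothetical; Google doesn't prescribe this format.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical event kinds that mirror the list above.
EVENT_KINDS = {
    "config_change", "firewall_deny", "device_reboot",
    "power_failure", "network_outage",
}


@dataclass
class SiteEvent:
    kind: str
    occurred_at: datetime  # must be timezone-aware
    device: str
    details: str
    source: str = "known"  # "known" or "inferred" (for example, from logs)


def to_record(event: SiteEvent) -> dict:
    """Validate an event and convert it to a JSON-friendly dict."""
    if event.kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {event.kind}")
    if event.occurred_at.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Normalize to UTC and truncate to whole seconds.
    ts = event.occurred_at.astimezone(timezone.utc).replace(microsecond=0)
    return {
        "kind": event.kind,
        "occurred_at": ts.isoformat(),
        "device": event.device,
        "details": event.details,
        "source": event.source,
    }


def build_timeline(events: list[SiteEvent]) -> str:
    """Return the events as a JSON array sorted by time."""
    records = sorted((to_record(e) for e in events),
                     key=lambda r: r["occurred_at"])
    return json.dumps(records, indent=2)
```

Marking each entry as known or inferred preserves the distinction the list makes between times reported by building or data center management and times reconstructed from other equipment.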

For interoperability issues, Google might also require joint debugging with a vendor, including the sharing of device log files and the enabling of debug options. When possible, we try to reproduce the issue in a customer lab environment.

In some cases, Google can get information from our managed equipment, but the information might be incomplete. For example, after a power outage, the ISP connection can take longer to boot than the Distributed Cloud connected servers.
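When the time of an outage has to be inferred from the logs of other on-site equipment, one simple approach is to look for the longest silence in the log timestamps. The following Python sketch does this for a list of already-parsed timestamps. The function name and the five-minute minimum gap are assumptions for illustration, and parsing the log lines is left to whatever format your equipment uses.

```python
from datetime import datetime, timedelta
from typing import Optional


def find_outage_window(
    timestamps: list[datetime],
    min_gap: timedelta = timedelta(minutes=5),
) -> Optional[tuple[datetime, datetime]]:
    """Return (last message before, first message after) the longest gap.

    Returns None if no gap between consecutive messages reaches min_gap.
    """
    ordered = sorted(timestamps)
    best = None
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap >= min_gap and (best is None or gap > best[1] - best[0]):
            best = (earlier, later)
    return best


if __name__ == "__main__":
    log_times = [
        datetime(2024, 3, 1, 8, 29, 58),
        datetime(2024, 3, 1, 8, 30, 15),  # last message before the silence
        datetime(2024, 3, 1, 9, 12, 40),  # first message after the device came back
        datetime(2024, 3, 1, 9, 12, 45),
    ]
    print(find_outage_window(log_times))
```

The first timestamp in the result is only a lower bound on when power was lost, and the second reflects when that particular device finished booting, so comparing windows from several devices gives a better estimate.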

Responsibility division

Use the following table to determine who is responsible for common tasks.

| Task | Customer | Google |
|---|---|---|
| Identify deployment disconnect issues and provide customer notifications for investigation | | X |
| Resolve power issues | X | |
| Resolve network issues, including any customer-supplied switches | X | X |
| Resolve environment issues, such as cooling | X | |
| Resolve issues with customer-owned bastion host instances and boundary proxy deployments, if deployed | X | |
| Monitor the API management plane | | X |
| Monitor the Kubernetes control plane, worker nodes, and built-in system services | | X |
| Monitor Google-supplied software add-ons and products, such as Symcloud Storage | | X |
| Monitor supplied hardware, such as servers and, for some deployments, network equipment | | X |
| Monitor customer-supplied networking equipment | X | |
| Monitor upstream network connectivity | X | |
| Provide joint debugging support for network or environmental issues | | X |
| Provide platform observability, including metrics and logs | | X |
| Provide application observability, including metrics and logs | X | |
| Respond to requests to investigate issues that are believed to be the customer's responsibility | X | |