Configure accurate time for U4 Compute Engine instances

This page describes how to configure accurate time for a U4 Compute Engine instance and how to configure monitoring so that you can view the accuracy of the time synchronization.

How it works

Google Cloud Ultra Low Latency (ULL) Solution uses the Firefly clock synchronization protocol to provide nanosecond-level synchronization. Firefly automatically does the following:

Performs internal synchronization to synchronize the physical network interface (NIC) clocks of all U4 instance host servers to each other.
Performs external synchronization to synchronize the physical NIC clocks of all U4 instance host servers to Coordinated Universal Time (UTC).

Because the physical NIC clocks of host servers that run U4 instances are automatically synchronized by Firefly, you can configure accurate time for your instance by synchronizing the system clock of the instance to the physical NIC clock of its host server.

For more information about Firefly, see the following Google Cloud Blog post: Understanding the Firefly clock synchronization protocol.

Before you begin

Before you configure accurate time for U4 Compute Engine instances, see the following sections.

Create a U4 instance with the required image

Create a U4 Compute Engine instance by using the image provided by Google for testing. See the procedure that corresponds to your use case:

To create a U4P or U4C instance, see Create ULL Compute Engine instances.
To create a U4S instance, see Create non-ULL Compute Engine instances for auxiliary workloads.

Ensure that no other clock synchronization services are running

The procedures on this page use chrony as the recommended clock synchronization client. Before you begin, ensure that there are no other clock synchronization services running on your instance, such as ntpd, systemd-timesyncd, or phc2sys. Unexpected interactions with these services can cause errors in your chrony configuration.

If you use chrony version 4.7 or later, you can check the chronyd log for warnings about other clock synchronization services by running the following command:

```
journalctl -u chronyd
```

If another clock synchronization service is running, the output includes a warning message such as the following: System clock interference detected (another NTP client?).
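The journal check above depends on chrony 4.7 or later. On any version, a complementary check is to ask systemd whether a competing service is active. The following is a minimal sketch, assuming systemd and the unit names ntpd, systemd-timesyncd, and phc2sys; unit names vary by distribution, so adjust the list. The function name find_other_time_services is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prints every listed clock synchronization unit that
# systemd reports as active. The unit names are assumptions; adjust them to
# match your distribution.
find_other_time_services() {
    local svc
    for svc in ntpd systemd-timesyncd phc2sys; do
        # `systemctl is-active --quiet` exits 0 only if the unit is active.
        if systemctl is-active --quiet "$svc"; then
            echo "$svc"
        fi
    done
}

# Example usage on the instance:
#   active=$(find_other_time_services)
#   [ -n "$active" ] && echo "Other clock sync services running: $active"
```

If the function prints a unit name, one way to stop that unit and keep it from starting at boot is `systemctl disable --now UNIT`.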

Configure `chrony` to load only after network drivers are stable

In some cases, systemd might load chrony before the network interface drivers finish initializing, which can result in chrony failing to start because it can't initialize the PTP hardware clock (PHC) device.

To avoid the preceding issue, override chrony's systemd unit file to wait for PHC devices to be ready:

Run the edit command:
```
systemctl edit chronyd
```

Add the override that corresponds to your instance type:

For U4P and U4C instances:

```
[Unit]
After=dev-ptp0.device dev-ptp1.device dev-ptp2.device
Requires=dev-ptp0.device dev-ptp1.device dev-ptp2.device
```

For U4S instances:

```
[Unit]
After=dev-ptp0.device
Requires=dev-ptp0.device
```

Restart the service. Although the systemctl edit command that you ran previously automatically reloaded the systemd configuration, we recommend that you run the following command to ensure that chrony is running with your changes:
```
systemctl restart chronyd
```
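Because the two overrides differ only in which PHC devices chronyd waits for, they can also be generated without an interactive editor, for example when you provision several instances. The following is a minimal sketch. The function name chrony_override is hypothetical, and the drop-in path in the usage comment assumes that the unit is named chronyd.service (that path is the file that `systemctl edit chronyd` creates for such a unit).

```shell
#!/bin/bash
# Hypothetical helper: prints the [Unit] override for the given instance type.
chrony_override() {
    local devices
    case "$1" in
        U4P|U4C) devices="dev-ptp0.device dev-ptp1.device dev-ptp2.device" ;;
        U4S)     devices="dev-ptp0.device" ;;
        *)       echo "unknown instance type: $1" >&2; return 1 ;;
    esac
    # Order chronyd after the PHC devices and require them to exist.
    printf '[Unit]\nAfter=%s\nRequires=%s\n' "$devices" "$devices"
}

# Example usage (as root), assuming the unit is chronyd.service:
#   mkdir -p /etc/systemd/system/chronyd.service.d
#   chrony_override U4P > /etc/systemd/system/chronyd.service.d/override.conf
#   systemctl daemon-reload
#   systemctl restart chronyd
```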

Configure `chrony` to use the Firefly-synced physical NIC clock

This section describes how to configure chrony to synchronize the system clock of your instance to the physical NIC clock on your instance's host server that is already synchronized by Firefly.

The virtual network interfaces (vNICs) that the guest OS shows for your U4 instance, such as eth0, map to physical NICs on the instance's host server. Each vNIC accesses its physical NIC clock through a corresponding PTP hardware clock (PHC) device:

PHC device names in Linux have the format /dev/ptpNUMBER, where the Linux kernel assigns NUMBER according to the device initialization order, for example /dev/ptp0, /dev/ptp1, and /dev/ptp2.
To specify a physical NIC clock as the source for synchronization, the chrony configuration must use or resolve to the corresponding PHC device.
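To find the /dev/ptpNUMBER device behind a vNIC, you can read the PHC index from `ethtool -T`. The following is a minimal sketch of that lookup, assuming that the output contains a `PTP Hardware Clock: NUMBER` line like the example in the chrony 4.6.1 and earlier section of this page. The function name phc_device_for is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads `ethtool -T IFACE` output on stdin and prints the
# matching PHC device path, for example /dev/ptp1 for "PTP Hardware Clock: 1".
phc_device_for() {
    local index
    index=$(sed -nE 's/^[[:space:]]*PTP Hardware Clock:[[:space:]]*([0-9]+)[[:space:]]*$/\1/p' | head -n 1)
    if [ -z "$index" ]; then
        echo "no PTP Hardware Clock index found" >&2
        return 1
    fi
    echo "/dev/ptp${index}"
}

# Example usage on the instance: print the PHC device behind each vNIC.
#   for nic in eth0 eth1 eth2; do
#       printf '%s -> %s\n' "$nic" "$(ethtool -T "$nic" | phc_device_for)"
#   done
```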

Each of the following sections provides an example of how to configure chrony according to the preceding requirements. See the section that corresponds to your instance type and chrony version:

Configure chrony 4.7 and later on U4P and U4C instances
Configure chrony 4.6.1 and earlier on U4P and U4C instances
Configure chrony 4.7 and later on U4S instances
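To decide which section applies, you need the installed chrony version, which `chronyd -v` prints. The following is a minimal sketch that takes the token after the first `version` word (from that output, or from a journal line such as `chronyd version 4.6.1 starting`) and compares it with 4.7. The function name chrony_at_least_4_7 is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads text that contains "version X.Y[.Z]" on stdin,
# for example the output of `chronyd -v`, and succeeds if X.Y is 4.7 or later.
# Returns 2 if no version is found.
chrony_at_least_4_7() {
    local version major minor rest
    version=$(awk '{ for (i = 1; i < NF; i++) if ($i == "version") { print $(i + 1); exit } }')
    [ -n "$version" ] || return 2
    # Split "4.6.1" into major=4, minor=6 (rest holds any patch level).
    IFS=. read -r major minor rest <<<"$version"
    minor=${minor:-0}
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 7 ]; }
}

# Example usage on the instance:
#   if chronyd -v | chrony_at_least_4_7; then echo "use a 4.7 and later section"; fi
```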

Configure `chrony` 4.7 and later on U4P and U4C instances

chrony version 4.7 and later supports specifying a vNIC name (such as eth0) as the clock source and automatically resolves it to the corresponding PTP hardware clock (PHC) device that represents the physical NIC clock.

To configure chrony version 4.7 and later on a U4P or U4C instance, do the following:

Replace the contents of the chrony configuration file, /etc/chrony.conf, with the following configuration. The file must contain only this configuration, so remove or overwrite any pre-existing contents.

```
# Record the rate at which the system clock gains/loses time.
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 100 nanoseconds (0.0000001 seconds).
makestep 0.0000001 3

# Specify directory for log files.
logdir /var/log/chrony

# Select which information is logged.
log measurements statistics tracking refclocks

# U4 Compute Engine instance clocks are 200 ppb accurate (maxclockerror is in ppm).
maxclockerror 0.2

# Configure all clocks for tracking, but select only one of them as source.
refclock PHC eth0:nocrossts poll -1 noselect
refclock PHC eth1:nocrossts poll -1
refclock PHC eth2:nocrossts poll -1 noselect

# The following lines opportunistically enable Precision Time Measurement (PTM) based clock synchronization.
# Note that PTM can potentially result in a (constant) clock skew of up to 700 nanoseconds
# which is not accounted for in chrony's accuracy metrics.
refclock PHC eth0 poll -1 noselect
refclock PHC eth1 poll -1 noselect
refclock PHC eth2 poll -1 noselect
```

To apply the configuration, restart chrony by running the following command:
```
systemctl restart chronyd
```
chrony logs clock synchronization statistics to /var/log/chrony/tracking.log using the PTP hardware clock of eth1 as the time source.
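The refclock lines in this configuration follow a pattern: every vNIC's PHC is tracked, and noselect is left off exactly one `:nocrossts` line. The following is a minimal sketch that generates that part of the file for a chosen vNIC, assuming that the vNICs are named eth0, eth1, and eth2 as in the example. The function name u4_refclock_lines is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prints the refclock section of the U4P/U4C example
# configuration (chrony 4.7 or later), selecting only the PHC behind the given vNIC.
u4_refclock_lines() {
    local selected="$1" nic
    case "$selected" in
        eth0|eth1|eth2) ;;
        *) echo "expected eth0, eth1, or eth2, got: $selected" >&2; return 1 ;;
    esac
    # Track every NIC clock; only the selected one omits noselect.
    for nic in eth0 eth1 eth2; do
        if [ "$nic" = "$selected" ]; then
            echo "refclock PHC ${nic}:nocrossts poll -1"
        else
            echo "refclock PHC ${nic}:nocrossts poll -1 noselect"
        fi
    done
    # The PTM-based lines stay noselect, as in the example configuration.
    for nic in eth0 eth1 eth2; do
        echo "refclock PHC ${nic} poll -1 noselect"
    done
}

# Example usage: print the lines that select the PHC behind eth2.
#   u4_refclock_lines eth2
```

With the argument eth1, the function prints the same six refclock lines as the example configuration.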

Configure `chrony` 4.6.1 and earlier on U4P and U4C instances

chrony versions 4.6.1 and earlier require that you specify the PTP hardware clock (PHC) device name manually in your configuration file.

To configure chrony versions 4.6.1 and earlier on a U4P or U4C instance, do the following:

Get the index number of the PHC device associated with a vNIC. The following example uses eth0.

```
ethtool -T eth0
```

In the output, find the line PTP Hardware Clock: NUMBER.

This example output shows PTP Hardware Clock: 1, which corresponds to /dev/ptp1.

```
Time stamping parameters for eth0:
Capabilities:
        hardware-receive
        software-receive
        software-system-clock
        hardware-raw-clock
PTP Hardware Clock: 1
Hardware Transmit Timestamp Modes:
        off
Hardware Receive Filter Modes:
        none
        all
```

Replace the contents of the chrony configuration file, /etc/chrony.conf, with a configuration like the following. The ethtool output in the preceding step showed that eth0 uses /dev/ptp1, so this example synchronizes the system clock to the physical NIC clock for eth0 by omitting noselect from the line refclock PHC /dev/ptp1:nocrossts poll -1.

```
# Record the rate at which the system clock gains/loses time.
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 100 nanoseconds (0.0000001 seconds).
makestep 0.0000001 3

# Enable kernel synchronization of the real-time clock (RTC).
rtcsync

# Save NTS keys and cookies.
ntsdumpdir /var/lib/chrony

# Specify directory for log files.
logdir /var/log/chrony

# Select which information is logged.
log measurements statistics tracking refclocks

# U4 Compute Engine instance clocks are 200 ppb accurate (maxclockerror is in ppm).
maxclockerror 0.2

# Configure all clocks for tracking, but select only one of them as source.
refclock PHC /dev/ptp0:nocrossts poll -1 noselect
refclock PHC /dev/ptp1:nocrossts poll -1
refclock PHC /dev/ptp2:nocrossts poll -1 noselect

# The following lines opportunistically enable Precision Time Measurement (PTM) based clock synchronization.
# Note that PTM can potentially result in a (constant) clock skew of up to 700 nanoseconds
# which is not accounted for in chrony's accuracy metrics.
refclock PHC /dev/ptp0 poll -1 noselect
refclock PHC /dev/ptp1 poll -1 noselect
refclock PHC /dev/ptp2 poll -1 noselect
```

To apply the configuration, restart chrony by running the following command:
```
systemctl restart chronyd
```
chrony logs clock synchronization statistics to /var/log/chrony/tracking.log using the PTP hardware clock of eth0 as the time source.
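The example configurations on this page track every PHC but leave noselect off exactly one line. After editing the file by hand, and before restarting chronyd, you can run a quick sanity check. The following is a minimal sketch that counts refclock lines that don't contain the word noselect. The function name count_selected_refclocks is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: counts refclock lines in a chrony configuration file
# that do not contain "noselect", that is, lines chrony may select as a source.
count_selected_refclocks() {
    grep -E '^[[:space:]]*refclock[[:space:]]' "$1" | grep -vc 'noselect'
}

# Example usage:
#   n=$(count_selected_refclocks /etc/chrony.conf)
#   [ "$n" -eq 1 ] || echo "expected exactly one selectable refclock, found $n"
```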

Configure `chrony` 4.7 and later on U4S instances

We recommend that you use chrony versions 4.7 and later for U4S instances. Using earlier versions might result in frequent errors because the Compute Engine clock synchronization device for virtual machine (VM) instances (ptp_kvm) can cause changes to the index numbers of PTP hardware clock (PHC) devices.
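Because index numbers can shift, it can help to see which /dev/ptpNUMBER devices currently exist and which driver registered each one; the ptp_kvm device appears in the same list as the NIC clocks. The Linux kernel exposes a human-readable name for each PHC in /sys/class/ptp/ptpNUMBER/clock_name. The following is a minimal sketch that lists them. The function name list_phc_devices is hypothetical, and its optional directory argument exists only to make it testable.

```shell
#!/bin/bash
# Hypothetical helper: prints each PHC device with the name that its driver
# registered, read from sysfs. The optional argument overrides the sysfs
# directory (default /sys/class/ptp).
list_phc_devices() {
    local root="${1:-/sys/class/ptp}" dir
    for dir in "$root"/ptp*; do
        # Skip the unexpanded glob when no devices exist.
        [ -r "$dir/clock_name" ] || continue
        printf '/dev/%s\t%s\n' "$(basename "$dir")" "$(cat "$dir/clock_name")"
    done
}

# Example usage on the instance:
#   list_phc_devices
```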

This example configuration for U4S instances is similar to the one used for U4P and U4C instances, but has the following differences:

This example includes a single vNIC. A U4S instance can have multiple vNICs, but all of the vNICs are backed by the same physical NIC and access the same physical NIC clock.
Precision Time Measurement (PTM) isn't available.

To configure chrony version 4.7 or later on a U4S instance, replace the contents of the chrony configuration file, /etc/chrony.conf, with the following configuration:

```
# Record the rate at which the system clock gains/loses time.
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 100 nanoseconds (0.0000001 seconds).
makestep 0.0000001 3

# Specify directory for log files.
logdir /var/log/chrony

# Select which information is logged.
log measurements statistics tracking refclocks

# U4 Compute Engine instance clocks are 200 ppb accurate (maxclockerror is in ppm).
maxclockerror 0.2

# Use the physical NIC clock behind eth0 as the time source.
refclock PHC eth0:nocrossts poll -1
```

To apply the configuration, restart chrony by running the following command:
```
systemctl restart chronyd
```
chrony logs clock synchronization statistics to /var/log/chrony/tracking.log using the PTP hardware clock of eth0 as the time source.

Verify the `chrony` configuration

To verify that chrony is configured correctly, run the following command:

```
chronyc sourcestats
```

A successful configuration results in output similar to the following:

```
Name/IP Address             NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
PHC0                        5   3     2     -0.002      0.014     +9ns     2ns
PHC1                        5   3     2     -0.003      0.007     -0ns     1ns
PHC2                        5   3     2     -0.004      0.016    +33ns     2ns
PHC3                        5   5     2     +0.002      0.078   +135ns    10ns
PHC4                        5   3     2     -0.005      0.077   +130ns     9ns
PHC5                        5   5     2     -0.006      0.131   +123ns    16ns
```

If the command returns unexpected output, see Troubleshoot.
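To check this output in a script, for example at instance startup, the following minimal sketch counts the PHC rows in `chronyc sourcestats` and reports the largest absolute Offset in nanoseconds. It assumes the column layout shown above (Offset is the seventh field) and offset units of ns, us, ms, or s. The function name summarize_phc_sourcestats is hypothetical. With the U4P and U4C example configurations you would expect six PHC rows, as in the sample output; with the U4S configuration, one.

```shell
#!/bin/bash
# Hypothetical helper: reads `chronyc sourcestats` output on stdin and prints
# "<number of PHC rows> <largest absolute offset in ns>".
# Assumes Offset is the 7th whitespace-separated field and uses ns, us, ms, or s.
summarize_phc_sourcestats() {
    awk '
        # Convert a value such as "+135ns" or "-2us" to nanoseconds.
        function to_ns(v,    num, unit, mult) {
            num = v;  sub(/[a-z]+$/, "", num)
            unit = v; sub(/^[-+]?[0-9.]+/, "", unit)
            if (unit == "ns")      mult = 1
            else if (unit == "us") mult = 1000
            else if (unit == "ms") mult = 1000000
            else if (unit == "s")  mult = 1000000000
            else                   mult = 0   # unknown unit: ignored
            return (num + 0) * mult
        }
        $1 ~ /^PHC[0-9]+$/ {
            rows++
            off = to_ns($7)
            if (off < 0) off = -off
            if (off > max) max = off
        }
        END { printf "%d %d\n", rows, max }
    '
}

# Example usage on the instance:
#   chronyc sourcestats | summarize_phc_sourcestats
```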

Modify the `chrony` configuration

To change which physical NIC clock the system clock of your instance synchronizes to, update /etc/chrony.conf as follows:

Remove noselect from the line for the vNIC (or, for chrony 4.6.1 and earlier, the PHC device) whose physical NIC clock you want to use.
Add noselect to the line for the vNIC (or PHC device) whose physical NIC clock you want to stop using.
Apply the new configuration by restarting chronyd: systemctl restart chronyd.
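These edits can also be scripted. The following is a minimal sketch that uses GNU sed in-place editing (`sed -i`). It assumes that the configuration uses `refclock PHC NAME:nocrossts poll -1` lines like the examples on this page: it adds noselect to every such line and then removes it from the line that you name, leaving the PTM lines untouched. The function name select_phc_source is hypothetical. Back up /etc/chrony.conf before you try it.

```shell
#!/bin/bash
# Hypothetical helper: makes chrony select only the given PHC source.
# SOURCE is a vNIC name (chrony 4.7 or later, for example eth2) or a PHC
# device path (chrony 4.6.1 and earlier, for example /dev/ptp2).
# Requires GNU sed for in-place editing (-i).
select_phc_source() {
    local conf="$1" source="$2"
    # Refuse to edit the file if no ":nocrossts" line exists for SOURCE.
    if ! grep -q "^refclock PHC ${source}:nocrossts " "$conf"; then
        echo "no 'refclock PHC ${source}:nocrossts' line in $conf" >&2
        return 1
    fi
    # 1. Add noselect to every ":nocrossts" refclock line that lacks it.
    sed -i -E '/^refclock PHC [^ ]+:nocrossts/{/noselect/!s/$/ noselect/;}' "$conf"
    # 2. Remove noselect from the chosen line. "#" is the delimiter so that
    #    device paths such as /dev/ptp2 need no escaping.
    sed -i -E "\#^refclock PHC ${source}:nocrossts #s# noselect\$##" "$conf"
}

# Example usage (as root):
#   cp /etc/chrony.conf /etc/chrony.conf.bak
#   select_phc_source /etc/chrony.conf eth2
#   systemctl restart chronyd
```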

Monitor time synchronization

This section describes the metrics that are available for time synchronization and how to use them to monitor time synchronization accuracy.

Available metrics for time synchronization

You can use the following metrics to monitor time synchronization:

Instance system clock to physical NIC clock

`logging.googleapis.com/user/phc-clock-max-error`

This metric measures the accuracy of the synchronization of the instance system clock to the physical NIC clock on its host server.

You must configure this metric by collecting it with Ops Agent and creating a log-based metric as described in Configure a custom metric for the instance system clock, which also creates a custom dashboard. You can also use this metric in the procedures described in Use Cloud Monitoring metrics.

Physical NIC clock to UTC

`compute.googleapis.com/instance/time/firefly_utc_traceable_uncertainty`

This metric represents the maximum error bound of the physical NIC clock relative to UTC. It is automatically reported to Cloud Monitoring.

You can view this metric, define alerting policies, and create custom dashboards as described in Use Cloud Monitoring metrics.

General health status of the physical NIC clock

`compute.googleapis.com/instance/time/firefly_nic_sync_healthy`

This boolean metric indicates the general health status of the physical NIC clock, including both NIC-to-NIC and NIC-to-UTC synchronization. It is automatically reported to Cloud Monitoring.

You can view this metric, define alerting policies, and create custom dashboards as described in Use Cloud Monitoring metrics.

For information about the length of time that Cloud Monitoring retains metric data, see Data retention in Cloud Monitoring quotas and limits. For information about exporting metrics for long-term analysis, see Cloud Monitoring metric export in the Cloud Architecture Center documentation.

Configure a custom metric for the instance system clock

This section provides an example monitoring configuration that does the following:

Configures Ops Agent to collect chrony's log for synchronization accuracy from your instance
Configures Cloud Monitoring to ingest the corresponding log from all instances in your project as a log-based metric

Configure Google Cloud Ops Agent on your instance

To configure Ops Agent to collect the metric needed for monitoring, do the following:

If you haven't already, install Ops Agent on your instance.

Add the following configuration to the /etc/google-cloud-ops-agent/config.yaml file:

```
logging:
  receivers:
    chrony_tracking_receiver:
      type: files
      include_paths:
        - /var/log/chrony/tracking.log
  processors:
    chrony_tracking_processor:
      type: parse_regex
      # Single quotes keep \d and \. literal. In a double-quoted YAML string,
      # they are invalid escape sequences and can make the file fail to parse.
      regex: '^.*PHC.*  (?<max_error>[-\d\.eE]+)$'
  service:
    pipelines:
      chrony_tracking_pipeline:
        receivers: [chrony_tracking_receiver]
        processors: [chrony_tracking_processor]
```

Restart Ops Agent by running the following command:
```
systemctl restart google-cloud-ops-agent
```
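Before relying on the log-based metric, you can preview what the parse_regex processor would extract from tracking.log. The following is a minimal sketch that applies a roughly equivalent POSIX extended regular expression with sed. Because sed has no \d, the character class is written [-0-9.eE]. The function name extract_max_error is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads chrony tracking.log lines on stdin and prints the
# value that the Ops Agent parse_regex processor would capture as max_error:
# the trailing number, preceded by two spaces, on lines that mention PHC.
extract_max_error() {
    sed -nE 's/^.*PHC.*  ([-0-9.eE]+)$/\1/p'
}

# Example usage on the instance:
#   tail -n 5 /var/log/chrony/tracking.log | extract_max_error
```

If this prints nothing for recent lines, the log-based metric is likely to receive no values either.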

Configure a log-based metric and dashboard in your project

To configure time synchronization monitoring for the instances in your project, run the following logging and dashboard setup script. The script does the following:

Grants the required IAM roles to the service account that your instances use. The script assumes that your instances use the project's default Compute Engine service account. If they use a different service account, change the value of SERVICE_ACCOUNT_EMAIL in the script.
Creates a log-based metric that measures the accuracy of the time synchronization between the instance's system clock and the physical NIC clock on the instance's host server.
Creates a dashboard that displays the accuracy of the time synchronization based on the metric.

Run the script with your project ID as its only argument. After the script completes, use the dashboard it created to view clock accuracy data for your project's instances.

```
#!/bin/bash
# Usage: setup_logging.sh PROJECT_ID

if [ -z "$1" ]; then
    echo "Usage: setup_logging.sh <project_id>" >&2
    exit 1
fi

PROJECT_ID="$1"
PROJECT_NUMBER=$(gcloud projects describe "${PROJECT_ID}" --format="value(projectNumber)")
# Default Compute Engine service account. Replace this value if your
# instances use a different service account.
SERVICE_ACCOUNT_EMAIL="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

# Grant permissions:

gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
    --role="roles/compute.instanceAdmin"

gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
    --role="roles/monitoring.metricWriter"

gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
    --role="roles/logging.logWriter"

# Configure log-based metric
METRIC_CONF='
{
  "name": "phc-clock-max-error",
  "description": "Maximum error of the VM clock from the host clock exposed by ptp_kvm",
  "filter": "logName=~\".*/logs/chrony_tracking_receiver\"",
  "metricDescriptor": {
    "metricKind": "DELTA",
    "valueType": "DISTRIBUTION",
    "unit": "s",
    "labels": [ { "key": "instance_id", "valueType": "STRING",
          "description": "Instance ID for the source instance" } ]
  },
  "valueExtractor": "REGEXP_EXTRACT(jsonPayload.max_error, \"(.*)\")",
  "bucketOptions": {
    "explicitBuckets": {
      "bounds": [
        0.0, 1.0E-6, 5.0E-6, 1.0E-5, 1.0E-4, 0.001, 0.01, 0.1, 1.0
      ]
    }
  },
  "labelExtractors": {
    "instance_id": "REGEXP_EXTRACT(resource.labels.instance_id, \"(.*)\")"
  }
}
'
echo "$METRIC_CONF" > /tmp/clock-error-metric.json
gcloud logging metrics create --project="${PROJECT_ID}" phc-clock-max-error --config-from-file=/tmp/clock-error-metric.json

# Create a dashboard plotting the clock accuracy

DASHBOARD_CONF='
{
  "displayName": "Chrony Accuracy",
  "dashboardFilters": [],
  "labels": {},
  "mosaicLayout": {
    "columns": 48,
    "tiles": [
      {
        "height": 28,
        "width": 28,
        "widget": {
          "xyChart": {
            "chartOptions": {
              "displayHorizontal": false,
              "mode": "COLOR"
            },
            "dataSets": [
              {
                "plotType": "LINE",
                "targetAxis": "Y1",
                "timeSeriesQuery": {
                  "prometheusQuery": "(\n    histogram_quantile(\n        1,\n        sum by (le, instance_id, monitored_resource) (\n            increase(\n                logging_googleapis_com:user_phc_clock_max_error_bucket{monitored_resource=\"gce_instance\"}[1m]\n            )\n        )\n    ) * 1000000000\n)",
                  "unitOverride": "ns"
                }
              }
            ],
            "thresholds": [],
            "yAxis": {
              "label": "Clock Accuracy",
              "scale": "LINEAR"
            }
          }
        }
      }
    ]
  }
}
'

echo "$DASHBOARD_CONF" > /tmp/metrics-dashboard.json

gcloud monitoring dashboards create --project="${PROJECT_ID}" --config-from-file=/tmp/metrics-dashboard.json
```

Use Cloud Monitoring metrics

The following sections describe how to view each of the available metrics for time synchronization in Cloud Monitoring, define alerting policies on them, and add them to custom dashboards.

In addition to the Google Cloud console, you can create custom dashboards, set up alerts, and query the metrics through the Monitoring API.
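As an alternative to the console, the Monitoring API v3 projects.timeSeries.list REST method returns the raw data points for a metric. The following is a minimal sketch with curl, assuming an authenticated gcloud CLI (used only for `gcloud auth print-access-token`) and GNU date. The function name list_time_series is hypothetical, and its default look-back window of 60 minutes is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: lists time series for one metric type through the
# Monitoring API v3 projects.timeSeries.list method.
# Arguments: PROJECT_ID METRIC_TYPE [MINUTES]  (MINUTES defaults to 60)
list_time_series() {
    local project="$1" metric_type="$2" minutes="${3:-60}"
    local start end token
    # RFC 3339 timestamps in UTC; `date -d` is a GNU date feature.
    end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    start=$(date -u -d "-${minutes} minutes" +%Y-%m-%dT%H:%M:%SZ)
    token=$(gcloud auth print-access-token)
    # -G turns the --data-urlencode pairs into URL query parameters.
    curl -sS -G \
        -H "Authorization: Bearer ${token}" \
        --data-urlencode "filter=metric.type=\"${metric_type}\"" \
        --data-urlencode "interval.startTime=${start}" \
        --data-urlencode "interval.endTime=${end}" \
        "https://monitoring.googleapis.com/v3/projects/${project}/timeSeries"
}

# Example usage (the project ID is a placeholder):
#   list_time_series my-project compute.googleapis.com/instance/time/firefly_nic_sync_healthy 30
```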

View metrics in Monitoring

This section describes how to view metrics in Monitoring.

Console

To view the metrics for a monitored resource by using the Metrics Explorer, do the following:

In the Google Cloud console, go to the Metrics explorer page:

If you use the search bar to find this page, then select the result whose subheading is Monitoring.
In the toolbar of the Google Cloud console, select your Google Cloud project. For App Hub configurations, select the App Hub host project or the app-enabled folder's management project.
In the Metric element, expand the Select a metric menu, enter VM instance in the filter bar, and then use the submenus to select a specific resource type and metric:
1. In the Active resources menu, select VM instance.
2. To select a metric, use the Active metric categories and Active metrics menus. For a list of available metrics, see Available metrics for time synchronization.
3. Click Apply.
To add filters, which remove time series from the query results, use the Filter element.
Configure how the data is viewed. By default, the display aggregates the metrics from all instances and physical NICs. To show metrics per instance and per NIC, select Unaggregated in the Aggregation element.
For more information about configuring a chart, see Select metrics when using Metrics Explorer.

Define alerting policies

This section describes how to define alerting policies.

When configuring how Monitoring evaluates a condition when data stops arriving, we recommend the option Missing data points treated as values that violate the policy condition, which helps capture silent data loss. However, this setting causes false positive alerts when an instance is deleted.

Console

You can create alerting policies to monitor the values of metrics and to notify you when those metrics violate a condition.

In the Google Cloud console, go to the Alerting page:

If you use the search bar to find this page, then select the result whose subheading is Monitoring.
If you haven't created your notification channels and if you want to be notified, then click Edit Notification Channels and add your notification channels. Return to the Alerting page after you add your channels.
From the Alerting page, select Create policy.
To select the metric, expand the Select a metric menu and then do the following:
1. To limit the menu to relevant entries, enter VM Instance into the filter bar. If there are no results after you filter the menu, then disable the Show only active resources & metrics toggle.
2. For the Resource type, select VM Instance.
3. For the Metric category, select Instance.
4. For the Metric, select a metric from the list in Available metrics for time synchronization.
5. Select Apply.
Click Next.
The settings in the Configure alert trigger page determine when the alert is triggered. Select a condition type and, if necessary, specify a threshold. For more information, see Create metric-threshold alerting policies.
Click Next.
Optional: To add notifications to your alerting policy, click Notification channels. In the dialog, select one or more notification channels from the menu, and then click OK.
Optional: Update the Incident autoclose duration. This field determines when Monitoring closes incidents in the absence of metric data.
Optional: Click Documentation, and then add any information that you want included in a notification message.
Click Alert name and enter a name for the alerting policy.
Click Create Policy.

For more information, see Alerting overview.

Create custom Monitoring dashboards

This section describes how to create custom dashboards.

Console

In the Google Cloud console, go to the Dashboards page:

If you use the search bar to find this page, then select the result whose subheading is Monitoring.
Click Create dashboard.
Optional: Update the dashboard title with a descriptive name for your dashboard.
For each widget that you want to add to your dashboard, click Add widget, complete the dialog, and then select Apply.

For more information about adding widgets, see the following pages:

Troubleshoot

You might receive unexpected output when verifying your chrony configuration, such as the following output that indicates chrony failed to start:

```
506 Cannot talk to daemon
```

To help troubleshoot, check the journald logs for chrony:

```
journalctl -u chronyd.service
```

The following example output shows an error that occurs if you apply a configuration intended for chrony 4.7 and later while an earlier version of chrony is installed on your instance.

```
Feb 19 06:19:42 host-name systemd[1]: Starting chronyd.service - NTP client/server...
Feb 19 06:19:42 host-name chronyd[35160]: chronyd version 4.6.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +NTS +SECHASH +IPV6 +DEBUG)
Feb 19 06:19:42 host-name chronyd[35160]: Setting filter length for PHC0 to 1
Feb 19 06:19:42 host-name chronyd[35160]: Could not open eth0 : No such file or directory
Feb 19 06:19:42 host-name chronyd[35160]: Fatal error : Could not open PHC
Feb 19 06:19:42 host-name chronyd[35157]: Could not open PHC
Feb 19 06:19:42 host-name systemd[1]: chronyd.service: Control process exited, code=exited, status=1/FAILURE
Feb 19 06:19:42 host-name systemd[1]: chronyd.service: Failed with result 'exit-code'.
Feb 19 06:19:42 host-name systemd[1]: Failed to start chronyd.service - NTP client/server.
```
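When the journal is long, a small filter can surface the two messages discussed on this page: the PHC open failure in the preceding example and the interference warning described in Ensure that no other clock synchronization services are running. The following is a minimal sketch. The function name diagnose_chronyd_journal is hypothetical, and the hints that it prints paraphrase this page rather than chrony's own documentation.

```shell
#!/bin/bash
# Hypothetical helper: reads chronyd journal output on stdin and prints hints
# for the failure patterns described on this page.
diagnose_chronyd_journal() {
    local log version
    log=$(cat)
    # Version from a line such as "chronyd version 4.6.1 starting (...)".
    version=$(sed -nE 's/.*chronyd version ([0-9][0-9.]*) starting.*/\1/p' <<<"$log" | tail -n 1)
    if [ -n "$version" ]; then
        echo "chronyd version: $version"
    fi
    if grep -q "Could not open PHC" <<<"$log"; then
        echo "chronyd could not open a PHC device. A configuration that names vNICs (such as eth0) requires chrony 4.7 or later; also check that the /dev/ptp* devices exist."
    fi
    if grep -q "System clock interference detected" <<<"$log"; then
        echo "Another clock synchronization service might be running."
    fi
}

# Example usage on the instance:
#   journalctl -u chronyd.service | diagnose_chronyd_journal
```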

Limitations

See the following limitations:

The firefly_utc_traceable_uncertainty and firefly_nic_sync_healthy metrics are available only for U4P and U4C instances. These metrics aren't available for U4S instances.
