Use a custom OS image
You can use a custom OS image for your TPU VMs to pre-load software, use a specific OS distribution, or apply custom kernel modifications. Creating a custom image involves making specific system modifications during the image creation process and configuring the image to handle boot-time tasks required for TPU functionality.
Keep the following disclaimers in mind if you use a custom OS image with TPUs:
- Google provides default TPU-optimized Ubuntu long-term support (LTS) images. The OS changes listed on this page are only validated for the Google-supported, TPU-optimized Ubuntu LTS images.
- You are responsible for extrapolating the required OS changes for any other OS distribution or custom images. Google doesn't guarantee that the modifications for Ubuntu listed on this page work with other OS distributions or another Ubuntu image with a custom kernel.
- Google doesn't build or provide testing for any OS images other than the default TPU-optimized Ubuntu LTS images. You must build and test your custom OS image.
For more information about the default TPU-optimized Ubuntu LTS images, see TPU OS images.
Prerequisites
Your base image must have the following components installed:
- Python 3
- gcloud CLI
Make modifications during image creation
Apply the following modifications while building your custom Ubuntu image.
Bind TPU devices to VFIO
To allow the guest OS to access TPU hardware, you must bind
TPU devices to the vfio-pci driver.
Create a udev rules file named 99-tpu-vfiopci.rules in /etc/udev/rules.d/:
# Rules for binding vfio-enabled TPU devices to vfio-pci.
# v5p
SUBSYSTEM=="pci", ACTION=="add", ATTRS{vendor}=="0x1ae0", ATTRS{device}=="0x0062", ATTRS{subsystem_vendor}=="0x1ae0", ATTRS{subsystem_device}=="0x00ad", DRIVER!="vfio-pci", TAG+="bind_to_vfio_pci"
# v6e
SUBSYSTEM=="pci", ACTION=="add", ATTRS{vendor}=="0x1ae0", ATTRS{device}=="0x006f", ATTRS{subsystem_vendor}=="0x1ae0", ATTRS{subsystem_device}=="0x00d1", DRIVER!="vfio-pci", TAG+="bind_to_vfio_pci"
# TPU7x
SUBSYSTEM=="pci", ACTION=="add", ATTRS{vendor}=="0x1ae0", ATTRS{device}=="0x0076", ATTRS{subsystem_vendor}=="0x1ae0", ATTRS{subsystem_device}=="0x00f2", DRIVER!="vfio-pci", TAG+="bind_to_vfio_pci"
# Bind all 'bind_to_vfio_pci' tagged devices to vfio-pci.
TAG=="bind_to_vfio_pci", RUN+="/lib/udev/bind_to_vfio_pci.sh $kernel"
Create a script named bind_to_vfio_pci.sh in /lib/udev/:
#!/bin/bash
# Run ./bind_to_vfio_pci.sh <DBDF>
# Binds the device at <DBDF> to vfio-pci.
# If the device is already bound to a driver, unbinds it first.
# Load the vfio-pci module into the kernel. No-op if already loaded.
modprobe vfio-pci
# The dot before the function digit is escaped so that it matches only a literal ".".
DBDF_REGEX="^[[:xdigit:]]{4}:[[:xdigit:]]{2}:[[:xdigit:]]{2}\.[[:xdigit:]]$"
unset BDF
if [[ $1 =~ $DBDF_REGEX ]]; then
  BDF=$1
else
  echo "Error: BDF arg ($1) is not in form dddd:bb:dd.f"
  exit 1
fi
PCI_PATH="/sys/bus/pci/devices/$BDF"
echo "vfio-pci" > "$PCI_PATH/driver_override"
PCI_DRIVER_PATH="$PCI_PATH/driver"
if [[ -d "$PCI_DRIVER_PATH" ]]; then
  curr_driver=$(readlink "$PCI_DRIVER_PATH")
  curr_driver=${curr_driver##*/}
  if [[ $curr_driver == "vfio-pci" ]]; then
    echo "$BDF already bound to vfio-pci"
    exit 0
  else
    echo "$BDF" > "$PCI_DRIVER_PATH/unbind"
    if [[ -d "$PCI_DRIVER_PATH" ]]; then
      echo "Error: Unable to unbind $PCI_DRIVER_PATH"
      exit 1
    fi
    echo "Unbound $BDF from driver $curr_driver"
  fi
fi
echo "$BDF" > /sys/bus/pci/drivers_probe
echo "Bound $BDF to vfio-pci"
# Grant read/write access on VFIO device to all users
IOMMU_GROUP=$(readlink "$PCI_PATH/iommu_group" | xargs basename)
VFIO_DEV="/dev/vfio/$IOMMU_GROUP"
if [[ -c "$VFIO_DEV" ]]; then
  chmod 0666 "$VFIO_DEV"
else
  echo "$VFIO_DEV not found"
  exit 1
fi
# Set allow_unsafe_interrupts for x86 platforms.
(uname -a | grep -q x86_64) && echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts
# This is only needed to avoid non-zero exit code from previous command.
echo "All Done!"
Make the script executable:
chmod +x /lib/udev/bind_to_vfio_pci.sh
Grant all users on the system access to the TPU device:
echo 'KERNEL=="accel*" MODE="0666"' >> /etc/udev/rules.d/99-tpu.rules
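After the rules run, it can help to confirm which driver a device actually ended up with. The following is a minimal sketch, not part of the supported tooling: it reads the same sysfs driver symlink that the bind script inspects. The function name current_driver and the sysfs_root parameter are illustrative assumptions, and the demo uses a throwaway directory that imitates the sysfs layout rather than real hardware.

```python
import os
import re
import tempfile

# Same address shape the bind script validates: domain:bus:device.function.
DBDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F]$")


def current_driver(dbdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI device, or None if unbound."""
    if not DBDF_RE.match(dbdf):
        raise ValueError("not a dddd:bb:dd.f address: %r" % dbdf)
    driver_link = os.path.join(sysfs_root, dbdf, "driver")
    if not os.path.islink(driver_link):
        return None
    # The symlink target ends in the driver name, for example ".../drivers/vfio-pci".
    return os.path.basename(os.readlink(driver_link))


# Demo against a temporary directory shaped like /sys/bus/pci/devices.
with tempfile.TemporaryDirectory() as root:
    dev = os.path.join(root, "0000:00:04.0")
    os.mkdir(dev)
    os.symlink("../../drivers/vfio-pci", os.path.join(dev, "driver"))
    print(current_driver("0000:00:04.0", sysfs_root=root))
```

On a real VM you would call current_driver with the address of a TPU device and the default sysfs_root; a result other than vfio-pci suggests the udev rule did not match or the script exited early.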
Modify the image to enhance performance
To ensure optimal performance, adjust the following system limits and parameters.
Memory limits
Allow a single process to lock unlimited memory by updating
/etc/security/limits.conf:
echo '* hard memlock unlimited' >> /etc/security/limits.conf
echo '* soft memlock unlimited' >> /etc/security/limits.conf
File limits
Increase the maximum number of open files per process by updating /etc/security/limits.conf:
echo "* soft nofile 100000" >> /etc/security/limits.conf
echo "* hard nofile 100000" >> /etc/security/limits.conf
echo "root soft nofile 100000" >> /etc/security/limits.conf
echo "root hard nofile 100000" >> /etc/security/limits.conf
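Entries in limits.conf are applied when a new login session starts, so it is worth checking the limits from inside the process that will run your workload. The sketch below uses Python's standard resource module to report the soft and hard values for the two limits configured above. The helper names describe and read_limits are illustrative, not part of any TPU tooling.

```python
import resource


def describe(limit):
    """Render a raw rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def read_limits():
    """Return {name: (soft, hard)} for the locked-memory and open-file limits."""
    limits = {}
    for name, res in (("memlock", resource.RLIMIT_MEMLOCK),
                      ("nofile", resource.RLIMIT_NOFILE)):
        soft, hard = resource.getrlimit(res)
        limits[name] = (describe(soft), describe(hard))
    return limits


for name, (soft, hard) in read_limits().items():
    print("{}: soft={} hard={}".format(name, soft, hard))
```

If the image changes took effect for the session, memlock should report unlimited and nofile should report 100000 for both the soft and hard values.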
Kernel parameters
Update your GRUB configuration (typically in /etc/default/grub) to include the
following parameters in GRUB_CMDLINE_LINUX:
- idle=poll: Prevents the CPU from entering low-power idle states.
- intel_iommu=on,sm_on: Enables Intel Input-Output Memory Management Unit (IOMMU). Required for TPU7x and v5p architectures.
- transparent_hugepage=always: Enables Transparent Huge Pages (THP).
The following steps show how to update these kernel parameters:
Prevent the CPU from moving into a low-power idle state by setting the following variable, which you use in the next step:
kernel_cmdline="idle=poll"
Enable the Intel Input-Output Memory Management Unit (IOMMU). This step is required for TPU7x and TPU v5p. The sed expression only matches when GRUB_CMDLINE_LINUX is currently empty.
kernel_cmdline="${kernel_cmdline} intel_iommu=on,sm_on"
sed -i "s/GRUB_CMDLINE_LINUX=\"\"/GRUB_CMDLINE_LINUX=\"${kernel_cmdline}\"/" /etc/default/grub
echo "Status: New kernel cmdline: $(cat /etc/default/grub | grep -e '^GRUB_CMDLINE_LINUX=')"
update-grub
Enable Transparent Huge Pages (THP). The character class includes a comma so that the new parameter is appended after intel_iommu=on,sm_on instead of being inserted in the middle of it:
echo "Status: Enabling THP"
sed -i -r 's/GRUB_CMDLINE_LINUX="[a-zA-Z0-9_=, ]*/& transparent_hugepage=always/' /etc/default/grub
update-grub
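Because the GRUB edits are applied with sed, a small check after reboot can catch a command line that did not come out as intended. The following sketch compares a kernel command line against the three parameters listed in this section. The names missing_params and check_running_kernel are illustrative assumptions; /proc/cmdline is the standard Linux location for the running kernel's command line.

```python
REQUIRED = ["idle=poll", "intel_iommu=on,sm_on", "transparent_hugepage=always"]


def missing_params(cmdline, required=REQUIRED):
    """Return the required parameters that do not appear as whole tokens."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


def check_running_kernel(path="/proc/cmdline"):
    """Check the command line of the currently running kernel."""
    with open(path) as f:
        return missing_params(f.read())


# Example with a literal string instead of the live kernel command line.
print(missing_params("BOOT_IMAGE=/vmlinuz ro idle=poll transparent_hugepage=always"))
```

Matching whole tokens, rather than substrings, means a malformed value such as a parameter fused onto another one is reported as missing.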
Install vBar agent
The vBar agent is required for the inter-chip interconnect (ICI) network to function.
To install the vBar agent, run the following commands:
Authenticate Docker with Artifact Registry:
gcloud auth configure-docker us-docker.pkg.dev
Pull the Docker image from Artifact Registry:
docker pull gcr.io/cloud-tpu-v2-images/vbar_control_agent:0.0.1
Run a container using the vBar agent image. Use the same fully qualified image name that you pulled:
docker run --privileged --net=host gcr.io/cloud-tpu-v2-images/vbar_control_agent:0.0.1
Optional: Install and run AI Telemetry Collector
The AI Telemetry Collector runs inside the TPU VM and lets you access runtime
and infrastructure metrics through Cloud Monitoring or through your own
Prometheus-based monitoring pipeline. You can use the AI Telemetry Collector
with a custom OS by using the ai-telemetry-collector Docker image. You can
install the image onto your custom OS and use a config.yaml file to dictate
the collection intervals, enable or disable specific metrics, or change the
export destinations.
To install the AI Telemetry Collector, run the following commands:
Authenticate Docker with Artifact Registry:
gcloud auth configure-docker us-docker.pkg.dev
Pull the Docker image from Artifact Registry:
docker pull gcr.io/cloud-tpu-v2-images/ai-telemetry-collector:latest
Run a container using the AI Telemetry Collector image with the default configuration. Use the same fully qualified image name that you pulled:
docker run --privileged --net=host gcr.io/cloud-tpu-v2-images/ai-telemetry-collector:latest
For information about using a custom configuration file or adding additional configuration files, see AI Telemetry Collector.
Make boot time modifications
Configure your image to perform the tasks in the following sections every time a
VM boots. You can use the cloud-init tool to configure boot-time tasks by
passing metadata to your instances. The configurations in the following
sections use modules such as write_files and runcmd. Include snippets that
define files to be written under the write_files: key, and include commands
that run at boot time under the runcmd: key in your cloud-init configuration.
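To show how the two keys fit together, here is a minimal sketch that assembles a cloud-config document in Python. It relies on the assumption that cloud-init parses user data as YAML and that a JSON document is acceptable YAML for this purpose; verify that with your cloud-init version. The function name build_cloud_config, the path /var/scripts/example.sh, and the runcmd entry are hypothetical placeholders, not values from this page.

```python
import json


def build_cloud_config(files, commands):
    """Return cloud-config user data with write_files and runcmd sections.

    files: list of dicts with path, permissions, owner, and content keys.
    commands: list of shell command strings to run at boot.
    """
    body = {"write_files": files, "runcmd": commands}
    # The first line must be the #cloud-config header.
    return "#cloud-config\n" + json.dumps(body, indent=2) + "\n"


user_data = build_cloud_config(
    files=[{
        "path": "/var/scripts/example.sh",  # hypothetical script path
        "permissions": "0444",
        "owner": "root",
        "content": "echo hello\n",
    }],
    # The file is not executable (0444), so it is run through bash explicitly.
    commands=["bash /var/scripts/example.sh"],
)
print(user_data)
```

Quoting the permissions value as a string avoids any ambiguity about how a leading-zero number is interpreted by the parser.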
Start the vBar agent
Initiate the vBar control agent with the appropriate user and group IDs:
vbar_control_agent --logtostderr --gid= --uid= --chroot= --census_enabled=false --loas_pwd_fallback_in_corp
Configure environment variables
To ensure your environment is correctly initialized for TPU workloads, you must
retrieve runtime configuration variables from the Compute Engine metadata
server during the system boot process. To do this, add the following snippet to
the write_files: section of your cloud-init configuration, which creates a
script named /var/scripts/configure-env-vars.sh. This script automates
retrieval of attributes from the tpu-env metadata key and saves them in
/${HOME}/tpu-env to be used by the TPU software stack.
- path: /var/scripts/configure-env-vars.sh
  permissions: 0444
  owner: root
  content: |
    grep -q CLOUDSDK_PYTHON /etc/environment || echo "CLOUDSDK_PYTHON=/usr/bin/python3" >> /etc/environment
    export HOME=/home/tpu-runtime
    curl -s 'http://metadata.google.internal/computeMetadata/v1/instance/attributes/tpu-env' -H 'Metadata-Flavor: Google' > /tmp/tpu-env.yaml
    eval $(python3 -c '''
    import yaml
    stream_in=open("/tmp/tpu-env.yaml", "r")
    for k,v in yaml.safe_load(stream_in).items():
      print("{var}=\"{value}\"".format(var = k, value = str(v)))
    ''' > "/${HOME}/tpu-env"
    )
    rm -f "/tmp/tpu-env.yaml"
    printenv
    cat ${HOME}/tpu-env
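The embedded Python in the script turns each key in the tpu-env metadata into a KEY="value" line. The sketch below isolates that formatting step so it can be tried without the metadata server or the yaml package. The function name format_env_lines and the sample keys are hypothetical; real tpu-env keys depend on your TPU configuration.

```python
def format_env_lines(tpu_env):
    """Render a mapping as KEY="value" lines, one per entry, in mapping order."""
    return ['{}="{}"'.format(key, str(value)) for key, value in tpu_env.items()]


# Hypothetical sample data standing in for the parsed tpu-env metadata.
sample = {"EXAMPLE_KEY": "abc", "EXAMPLE_COUNT": 4}
for line in format_env_lines(sample):
    print(line)
```

As in the script, values are wrapped in double quotes without escaping, so a value that itself contains a double quote would produce a malformed line.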
Get VM metadata
The following snippet creates a script named /var/scripts/get-vm-metadata.py,
a Python utility to programmatically query the metadata server for specific
instance attributes and custom metadata tags. Add the following to the
write_files: section of your cloud-init configuration:
- path: /var/scripts/get-vm-metadata.py
  permissions: 0444
  owner: root
  content: |
    # Usage: get-vm-metadata.py KEY [DEFAULT] [ATTRIBUTE_TYPE]
    import sys, requests
    if len(sys.argv) < 2:
        sys.stderr.write('Must provide key\n')
        sys.exit(1)
    key = sys.argv[1]
    default = None
    if len(sys.argv) > 2:
        default = sys.argv[2]
    attribute_type = 'attributes'
    if len(sys.argv) > 3:
        attribute_type = sys.argv[3]
    request = requests.get("http://metadata.google.internal/computeMetadata/v1/instance/{}/{}".format(attribute_type, key), headers={'Metadata-Flavor': 'Google'})
    if request.status_code == 200:
        # Print the decoded text rather than the raw bytes.
        print(request.text)
    elif request.status_code == 404 or request.status_code == 403:
        sys.stderr.write('Metadata key: {} does not exist\n'.format(key))
        if default:
            print(default)
    else:
        sys.stderr.write('Lookup failed with: {}\n'.format(request))
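If the requests package is not available in your image, the same lookup can be sketched with the standard library only. This is an illustrative alternative, not the script that the page defines: the names metadata_url and get_metadata and the timeout parameter are assumptions, while the endpoint and the Metadata-Flavor header are the ones used in the script.

```python
import urllib.error
import urllib.request

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1/instance"


def metadata_url(key, attribute_type="attributes"):
    """Build the metadata server URL for one instance attribute."""
    return "{}/{}/{}".format(METADATA_ROOT, attribute_type, key)


def get_metadata(key, default=None, attribute_type="attributes", timeout=5):
    """Return the attribute value as text, or default if the key is absent."""
    req = urllib.request.Request(
        metadata_url(key, attribute_type),
        headers={"Metadata-Flavor": "Google"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Treat 403 and 404 as "key does not exist", as the script does.
        if err.code in (403, 404):
            return default
        raise
```

For example, get_metadata("tpu-env") would request the same tpu-env key that the environment script reads; the call only succeeds from inside a Compute Engine VM.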
Increase Cloud Storage timeouts
If your workload interacts with Cloud Storage, increase timeout durations by
adding timeout values to /etc/environment. To do this, add the following
snippet to the write_files: section of your cloud-init configuration, which
creates a script named /var/scripts/configure-gcs-timeouts.sh.
- path: /var/scripts/configure-gcs-timeouts.sh
  permissions: 0444
  owner: root
  content: |
    echo "GCS_RESOLVE_REFRESH_SECS=60" >> /etc/environment
    echo "GCS_REQUEST_CONNECTION_TIMEOUT_SECS=300" >> /etc/environment
    echo "GCS_METADATA_REQUEST_TIMEOUT_SECS=300" >> /etc/environment
    echo "GCS_READ_REQUEST_TIMEOUT_SECS=300" >> /etc/environment
    echo "GCS_WRITE_REQUEST_TIMEOUT_SECS=600" >> /etc/environment
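Because the script appends with >>, running it on every boot adds the same lines to /etc/environment again each time. If that matters for your image, one option is to rewrite the file so that each key appears once. The sketch below shows the idea on a string; the function name upsert_env is an illustrative assumption, and writing the result back to /etc/environment is left to the caller.

```python
def upsert_env(text, values):
    """Return text with each KEY=value from values set exactly once.

    Lines for other keys are kept as-is; repeated lines for a managed key
    are collapsed into a single line with the new value.
    """
    remaining = dict(values)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in values:
            if key in remaining:
                out.append("{}={}".format(key, remaining.pop(key)))
            # Later duplicates of a managed key are dropped.
        else:
            out.append(line)
    # Append managed keys that were not present at all.
    for key, value in remaining.items():
        out.append("{}={}".format(key, value))
    return "\n".join(out) + "\n"


timeouts = {
    "GCS_RESOLVE_REFRESH_SECS": 60,
    "GCS_REQUEST_CONNECTION_TIMEOUT_SECS": 300,
    "GCS_METADATA_REQUEST_TIMEOUT_SECS": 300,
    "GCS_READ_REQUEST_TIMEOUT_SECS": 300,
    "GCS_WRITE_REQUEST_TIMEOUT_SECS": 600,
}
print(upsert_env('PATH="/usr/bin"\nGCS_RESOLVE_REFRESH_SECS=60\n', timeouts))
```

Applying the function twice gives the same result as applying it once, which is the property the append-only script lacks.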
What's next
- Review available TPU OS images.
- Learn how to Manage TPU VMs.