This page covers troubleshooting the virtual machine (VM) for the Application Operator (AO) in Google Distributed Cloud (GDC) air-gapped appliance.
Recover a full VM boot disk
If a VM runs out of space on the boot disk, for example, when an application
fills the boot disk partition with logs, critical capabilities on the VMs fail
to work. You might not have the ability to add a new SSH key through the
VirtualMachineAccessRequest resource, or establish an SSH connection into the
VM using existing keys.
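While a VM is still reachable over SSH, a filling boot disk can be caught before logins start failing. The following is a small illustrative sketch using standard GNU coreutils (not GDC-specific tooling; the 90% threshold is an arbitrary example):

```shell
#!/bin/bash
# Report how full the root filesystem is and warn near capacity.
# df --output=pcent is GNU coreutils; adjust for other systems.
usage=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
echo "root filesystem usage: ${usage}%"
if [ "$usage" -ge 90 ]; then
  echo "WARNING: boot disk nearly full"
fi
```

Running a check like this periodically, for example from cron, gives you a chance to clear space before the disk fills completely.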
This page describes how to create a new VM and attach the original disk to it as an additional disk to recover its contents. These steps demonstrate the following:
- Establishing an SSH connection to the new VM.
- Increasing the available space by mounting the disk to recover and deleting unnecessary data.
- Deleting the new VM and reattaching the original disk to the original VM.
Before you begin
Before continuing, ensure that you request project-level VM access. Follow
the steps to assign the Project VirtualMachine Admin (project-vm-admin) role.
For VM operations using the gdcloud CLI,
request your Project IAM Admin to assign you both the
Project VirtualMachine Admin role and the Project Viewer (project-viewer)
role.
To use gdcloud command-line interface (CLI) commands, ensure that you have downloaded, installed,
and configured the gdcloud CLI.
All commands for GDC air-gapped appliance use the gdcloud or
kubectl CLI, and require an operating system (OS) environment.
Get the kubeconfig file path
To run commands against the Management API server, ensure you have the following resources:

- Locate the Management API server name, or ask your Platform Administrator (PA) for the server name.
- Sign in and generate the kubeconfig file for the Management API server if you don't have one.
- Use the path to the kubeconfig file to replace MANAGEMENT_API_SERVER in these instructions.
Recover a VM boot disk that is out of space
To recover a VM boot disk that has run out of space, complete the following steps:
Stop the existing VM by following Stop a VM.
Edit the existing VM:
```
kubectl --kubeconfig ADMIN_KUBECONFIG edit \
    virtualmachine.virtualmachine.gdc.goog -n PROJECT VM_NAME
```

Replace the existing VM disk name in the spec field with a new placeholder name:

```
...
spec:
  disks:
  - boot: true
    virtualMachineDiskRef:
      name: VM_DISK_PLACEHOLDER_NAME
```

Create a new VM with an operating system (OS) image different from the original VM. For example, if the original disk uses the OS ubuntu-2004, create the new VM with rocky-8.

Attach the original disk as an additional disk to the new VM:

```
...
spec:
  disks:
  - boot: true
    autoDelete: true
    virtualMachineDiskRef:
      name: NEW_VM_DISK_NAME
  - virtualMachineDiskRef:
      name: ORIGINAL_VM_DISK_NAME
```

Replace the following:
- NEW_VM_DISK_NAME: the name you give to the new VM disk.
- ORIGINAL_VM_DISK_NAME: the name of the original VM disk.
After you've created the VM and it is running, establish an SSH connection to the VM by following Connect to a VM.
Create a directory and mount the original disk to a mount point, for example,
/mnt/disks/new-disk.

Check which files and directories in the mount directory are using the most space:

```
cd /mnt/disks/MOUNT_DIR
du -hs -- * | sort -rh | head -10
```

Replace MOUNT_DIR with the name of the directory where you mounted the original disk.
The output is similar to the following:
```
18G   home
1.4G  usr
331M  var
56M   boot
5.8M  etc
36K   snap
24K   tmp
16K   lost+found
16K   dev
8.0K  run
```

Check each of the files and directories to verify how much space each one uses. This example checks the home directory because it uses 18G of space:

```
cd home
du -hs -- * | sort -rh | head -10
```

The output is similar to the following:

```
17G   log_file
...
4.0K  readme.md
4.0K  main.go
```

The example file log_file consumes 17G of space and is not necessary, so it is a good candidate to clear.

Delete the files you don't need that consume extra space, or back up the files to the new VM boot disk:
Move the files you want to keep:

```
mv /mnt/disks/MOUNT_DIR/home/FILENAME /home/backup/
```

Delete the files consuming extra space:

```
rm /mnt/disks/MOUNT_DIR/home/FILENAME
```

Replace FILENAME with the name of the file you want to move or delete.
Log out of the new VM and stop it by following Stop a VM.
Edit the new VM to remove the original disk from the spec field:

```
kubectl --kubeconfig ADMIN_KUBECONFIG \
    edit virtualmachine.virtualmachine.gdc.goog -n PROJECT NEW_VM_NAME
```

Remove the virtualMachineDiskRef list entry that contains the original VM disk name:

```
spec:
  disks:
  - autoDelete: true
    boot: true
    virtualMachineDiskRef:
      name: NEW_VM_DISK_NAME
  - virtualMachineDiskRef:         # Remove this list entry
      name: ORIGINAL_VM_DISK_NAME  # Remove this disk name
```

Edit the original VM and replace the VM_DISK_PLACEHOLDER_NAME you set in step two with the original disk name:

```
...
spec:
  disks:
  - boot: true
    virtualMachineDiskRef:
      name: VM_DISK_PLACEHOLDER_NAME  # Replace this name with the original disk name
```

Start the original VM. If you've cleared enough space, the VM boots successfully.
If you don't need the new VM, delete the VM:

```
kubectl --kubeconfig ADMIN_KUBECONFIG \
    delete virtualmachine.virtualmachine.gdc.goog -n PROJECT NEW_VM_NAME
```
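The du | sort | head triage used in the recovery steps works on any Linux directory tree. The following is a minimal local sketch you can run anywhere; the directory and file names are invented for illustration:

```shell
#!/bin/bash
# Build a throwaway directory tree with files of known sizes,
# then rank its contents by disk usage, largest first.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/home" "$demo_dir/var"

# Use dd so the files occupy real blocks and du reports
# the sizes we expect (truncate would create sparse files).
dd if=/dev/zero of="$demo_dir/home/log_file" bs=1M count=20 2>/dev/null
dd if=/dev/zero of="$demo_dir/var/cache.bin" bs=1M count=5 2>/dev/null

cd "$demo_dir"
du -hs -- * | sort -rh | head -10
# The largest entry (home, about 20M) is listed first,
# making the space hog easy to spot.

cd / && rm -rf "$demo_dir"
```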
Provision a virtual machine
This section describes how to troubleshoot issues that might occur while provisioning a new virtual machine (VM) in Google Distributed Cloud (GDC) air-gapped appliance.
The Application Operator (AO) must run all commands against the default user cluster.
Unable to create disk
If a PersistentVolumeClaim (PVC) is in the Pending state, review the following
causes and resolutions:
The storage class does not support creating a PVC with the ReadWriteMany access mode:

- Update the spec.dataVolumeTemplate.spec.pvc.storageClassName value of the virtual machine with a storage class that supports the ReadWriteMany access mode and uses a Container Storage Interface (CSI) driver as its storage provisioner.
- If no other storage class on the cluster can provide the ReadWriteMany capability, update the spec.dataVolumeTemplate.spec.pvc.accessMode value to include the ReadWriteOnce access mode.
The CSI driver is unable to provision a PersistentVolume:

Check for an error message:

```
kubectl describe pvc VM_NAME-boot-dv -n NAMESPACE_NAME
```

Replace the following variables:

- VM_NAME: the name of the virtual machine.
- NAMESPACE_NAME: the name of the namespace.

Configure the driver to resolve the error. To verify that PersistentVolume provisioning works, create a test PVC in a new spec with a different name than the one specified in dataVolumeTemplate.spec.pvc:

```
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: NAMESPACE_NAME
spec:
  storageClassName: standard-rwx
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
EOF
```

After the PersistentVolume object provisions successfully, delete the test PVC:

```
kubectl delete pvc test-pvc -n NAMESPACE_NAME
```
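The storage class fix edits the PVC template embedded in the VirtualMachine spec. The following is a minimal sketch of the relevant fields, using the field paths named above and the standard-rwx class from the test PVC; the surrounding indentation and the standard accessModes field name are assumptions based on the PVC schema:

```yaml
spec:
  dataVolumeTemplate:
    spec:
      pvc:
        storageClassName: standard-rwx  # a CSI-backed class that supports ReadWriteMany
        accessModes:
        - ReadWriteMany                 # or ReadWriteOnce if no class supports ReadWriteMany
```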
Unable to create a virtual machine
If the virtual machine resource is applied but does not reach the Running state, follow these steps:
Review the virtual machine status:

```
kubectl get vm VM_NAME -n NAMESPACE_NAME
```

Check the corresponding Pod status of the virtual machine:

```
kubectl get pod -l kubevirt.io/vm=VM_NAME
```

The output shows a Pod status. The possible options are as follows:
The ContainerCreating state
If the Pod is in the ContainerCreating state, follow these steps:
Get additional details about the Pod's state:

```
kubectl get pod -l kubevirt.io/vm=VM_NAME
```

If the volumes are unmounted, ensure that all the volumes specified in the spec.volumes field are successfully mounted. If a volume is a disk, check the disk status.

The spec.accessCredentials field specifies a value to mount an SSH public key. Ensure that the secret is created in the same namespace as the virtual machine.
If there are not enough resources on the cluster to create the Pod, follow these steps:

- If the cluster does not have enough compute resources to schedule the virtual machine Pod, remove other unwanted Pods to release resources.
- Reduce the spec.domain.resources.requests.cpu and spec.domain.resources.requests.memory values of the virtual machine.
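The steps above can be sketched as a spec fragment. The field paths follow those named above; the values and surrounding indentation are illustrative assumptions, not recommended sizes:

```yaml
spec:
  domain:
    resources:
      requests:
        cpu: "2"       # reduced from a larger CPU request
        memory: 4Gi    # reduced from a larger memory request
```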
The Error or CrashLoopBackoff state
To resolve Pods in the Error or CrashLoopBackoff states, retrieve logs from the
virtual machine compute Pod:

```
kubectl logs -l kubevirt.io/vm=VM_NAME -c compute
```
The Running state and virtual machine failure
If the Pod is in the Running state but the virtual machine itself fails,
follow these steps:
View the logs from the virtual machine log Pod:

```
kubectl logs -l kubevirt.io/vm=VM_NAME -c log
```

If the log shows errors in the virtual machine startup, check that the virtual machine's boot device is correct. Set the spec.domain.devices.disks.bootOrder value of the primary boot disk to 1. Use the following example as a reference:

```
...
spec:
  domain:
    devices:
      disks:
      - bootOrder: 1
        disk:
          bus: virtio
        name: VM_NAME-boot-dv
...
```
To troubleshoot configuration issues with the virtual machine image, create another virtual machine with a different image.
Access the serial console
This section describes how to use a VM instance's serial console to debug boot and networking issues, troubleshoot malfunctioning instances, interact with the Grand Unified Bootloader (GRUB), and perform other troubleshooting tasks.
Interacting with a serial port is comparable to using a terminal window: the input and output are in text mode, without graphical interface support. The operating system (OS) of the instance and the Basic Input/Output System (BIOS) often write output to the serial ports and accept input such as commands.
To get access to the serial console, work through the following sections:
Configure username and password
By default, GDC Linux system images are not configured to allow password-based logins for local users.
If your VM is running an image pre-configured with serial console login, you can set up a local password on the VM and log in through the serial console. In GDC Linux VMs, you configure the username and password through a startup script saved as a Kubernetes secret during or after VM creation.
The following instructions describe how to set up a local password after VM creation. To configure the username and password, complete the following steps:
- Create a text file.
In the text file, configure the username and password:

```
#!/bin/bash

username="USERNAME"
password="PASSWORD"

sudo useradd -m -s /bin/bash "$username"
echo "$username:$password" | sudo chpasswd
sudo usermod -aG sudo "$username"
```

Replace the following:

- USERNAME: the username that you want to add.
- PASSWORD: the password for the username. Avoid basic passwords, as some operating systems require a minimum password length and complexity.
Create the startup script as a Kubernetes secret:

```
kubectl --kubeconfig=ADMIN_KUBECONFIG create secret \
    generic STARTUP_SCRIPT_NAME -n PROJECT_NAMESPACE \
    --from-file=STARTUP_SCRIPT_PATH
```

Replace the following:

- PROJECT_NAMESPACE: the namespace of the project where the VM resides.
- STARTUP_SCRIPT_NAME: the name you give to the startup script. For example, configure-credentials.
- STARTUP_SCRIPT_PATH: the path to the startup script that contains the username and password you configured.
Edit the VM specification:

```
kubectl --kubeconfig=ADMIN_KUBECONFIG edit gvm \
    -n PROJECT_NAMESPACE VM_NAME
```

Replace VM_NAME with the name of the VM to add the startup script to.

In the startupScripts field, add the Kubernetes secret reference you created in step three:

```
spec:
  compute:
    memory: 8Gi
    vcpus: 8
  disks:
  - boot: true
    virtualMachineDiskRef:
      name: disk-name
  startupScripts:
  - name: STARTUP_SCRIPT_NAME
    scriptSecretRef:
      name: STARTUP_SCRIPT_NAME
```

If you are working on a new VM, skip this step.
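The warning above about basic passwords can be checked locally before you store the script as a secret. The following is an illustrative sketch; the check_password helper and its 12-character, one-digit rules are assumptions, not GDC tooling or any particular OS policy:

```shell
#!/bin/bash
# Reject passwords that are shorter than 12 characters or
# contain no digit before embedding them in the startup script.
check_password() {
  local pw="$1"
  if [ "${#pw}" -lt 12 ]; then
    echo "too short"
    return 1
  fi
  case "$pw" in
    *[0-9]*) echo "ok" ;;
    *) echo "needs a digit"; return 1 ;;
  esac
}

check_password "short1" || true          # prints "too short"
check_password "long-enough-42" || true  # prints "ok"
```

Adjust the rules to match whatever the guest OS enforces; the point is to fail fast before the password is baked into a secret.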
Access the VM serial console
To start accessing the VM serial console, do the following:
Connect to the serial console:

```
gdcloud compute connect-to-serial-port VM_NAME \
    --project PROJECT_NAMESPACE
```

When prompted, enter the username and password you defined in Configure username and password.