llm-d

This document describes how your Google Kubernetes Engine deployment can use Google Cloud Managed Service for Prometheus to collect metrics from llm-d. llm-d consists of many components, including GKE Inference Gateway and vLLM.

For information about collecting metrics from GKE Inference Gateway and vLLM, see the Managed Service for Prometheus documentation for each of those components.

The instructions in these documents apply only if you are using managed collection with Managed Service for Prometheus. If you are using self-deployed collection, then see the llm-d documentation.

After you configure GKE Inference Gateway and vLLM, you can access a predefined dashboard in Cloud Monitoring to view the metrics.

Prerequisites

To collect metrics from llm-d by using Managed Service for Prometheus and managed collection, your deployment must meet the following requirements:

  • Your cluster must be running Google Kubernetes Engine version 1.28.15-gke.2475000 or later.
  • You must be running Managed Service for Prometheus with managed collection enabled. For more information, see Get started with managed collection.
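
The version requirement above can be checked programmatically. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes that GKE version strings have the form MAJOR.MINOR.PATCH-gke.BUILD and that comparing the four numbers in order is an adequate "or later" test.

```python
import re

# Minimum GKE version stated in the prerequisites above.
MINIMUM_GKE_VERSION = "1.28.15-gke.2475000"

# Assumed format: MAJOR.MINOR.PATCH-gke.BUILD, with an optional leading "v".
_GKE_VERSION = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version: str) -> tuple:
    """Split a GKE version string into a tuple of four integers."""
    match = _GKE_VERSION.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized GKE version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version: str, minimum: str = MINIMUM_GKE_VERSION) -> bool:
    """Return True if `version` is the same as or later than `minimum`."""
    return parse_gke_version(version) >= parse_gke_version(minimum)


if __name__ == "__main__":
    # Example input only; substitute the version reported for your cluster.
    print(meets_minimum("1.29.1-gke.1589000"))
```

The version string to test is the one reported for your cluster, for example in the cluster details in the Google Cloud console.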

You must also change the configuration of the PodMonitoring resource for vLLM. Use the following configuration:

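apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: llm-d-metrics
spec:
  # Scrape only Pods that carry this llm-d model label.
  selector:
    matchLabels:
      llm-d.ai/model: ms-pd-llm-d-modelservice
  # Scrape /metrics on port 8200 every 10 seconds.
  endpoints:
  - port: 8200
    interval: 10s
    path: /metrics
  targetLabels:
    # Copy the llm-d.ai/role Pod label onto each metric as the label "role".
    fromPod:
    - from: llm-d.ai/role
      to: role
    # Kubernetes metadata labels to attach to each metric.
    metadata:
    - pod
    - container
    - node
    - top_level_controller_name
    - top_level_controller_type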
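
Before relying on the PodMonitoring resource, it can help to confirm that a Pod actually serves Prometheus metrics at the configured port and path. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the parser handles only the common `name{labels} value` and `name value` sample lines of the Prometheus text format, and the URL in the example assumes that you have forwarded the Pod's port 8200 to your local machine.

```python
import urllib.request


def fetch_metrics_text(url: str, timeout: float = 5.0) -> str:
    """Download the raw Prometheus text exposition from a metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_names(exposition_text: str) -> set:
    """Collect the distinct metric names found in Prometheus text output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or at whitespace.
        names.add(line.split("{", 1)[0].split()[0])
    return names


if __name__ == "__main__":
    # Assumption: the Pod's port 8200 is reachable on localhost.
    text = fetch_metrics_text("http://localhost:8200/metrics")
    for name in sorted(metric_names(text)):
        print(name)
```

If the script prints no metric names, or the request fails, check the port and path values in the PodMonitoring configuration against what the Pod exposes.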

View dashboards

The Cloud Monitoring integration includes the llm-d Prometheus Overview dashboard. Dashboards are automatically installed when you configure the integration. You can also view static previews of dashboards without installing the integration.

To view an installed dashboard, do the following:

  1. In the Google Cloud console, go to the Dashboards page:

    Go to Dashboards

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. Select the Dashboard List tab.
  3. Choose the Integrations category.
  4. Click the name of the dashboard, for example, llm-d Prometheus Overview.

To view a static preview of the dashboard, do the following:

  1. In the Google Cloud console, go to the Integrations page:

    Go to Integrations

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. Click the Kubernetes Engine deployment-platform filter.
  3. Locate the llm-d integration and click View Details.
  4. Select the Dashboards tab.