This document describes the logs and metrics that Gemini on Google Distributed Cloud connected API collects and exports.
Configure logging and monitoring
Before you can start collecting logs and metrics, you must do the following:

1. Enable the logging and monitoring APIs by using the following commands:

       gcloud services enable opsconfigmonitoring.googleapis.com --project PROJECT_ID
       gcloud services enable logging.googleapis.com --project PROJECT_ID
       gcloud services enable monitoring.googleapis.com --project PROJECT_ID

   Replace PROJECT_ID with the ID of the target Google Cloud project.

2. Grant the roles required to write logs and metrics:

       gcloud projects add-iam-policy-binding PROJECT_ID \
           --role roles/opsconfigmonitoring.resourceMetadata.writer \
           --member "serviceAccount:PROJECT_ID.svc.id.goog[kube-system/metadata-agent]"
       gcloud projects add-iam-policy-binding PROJECT_ID \
           --role roles/logging.logWriter \
           --member "serviceAccount:PROJECT_ID.svc.id.goog[kube-system/stackdriver-log-forwarder]"
       gcloud projects add-iam-policy-binding PROJECT_ID \
           --role roles/monitoring.metricWriter \
           --member "serviceAccount:PROJECT_ID.svc.id.goog[kube-system/gke-metrics-agent]"

   Replace PROJECT_ID with the ID of the target Google Cloud project.
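Because the same project ID appears in all six commands, it can help to generate them from a single value. The following is a minimal sketch, not an official script: the helper name `print_setup_commands` and the dry-run approach are assumptions made for illustration, while the services, roles, and service account names are the ones listed in the steps for this section. The helper only prints the commands so that you can review them before running anything.

```shell
# Print the gcloud setup commands for one project ID without running them.
# print_setup_commands is a hypothetical helper, not part of gcloud.
print_setup_commands() {
  project_id="$1"

  # APIs to enable, as listed in this section.
  for service in opsconfigmonitoring logging monitoring; do
    echo "gcloud services enable ${service}.googleapis.com --project ${project_id}"
  done

  # role:agent pairs, taken from the IAM bindings in this section.
  for pair in \
    "roles/opsconfigmonitoring.resourceMetadata.writer:metadata-agent" \
    "roles/logging.logWriter:stackdriver-log-forwarder" \
    "roles/monitoring.metricWriter:gke-metrics-agent"; do
    role="${pair%%:*}"    # text before the first colon
    agent="${pair##*:}"   # text after the last colon
    echo "gcloud projects add-iam-policy-binding ${project_id}" \
      "--role ${role}" \
      "--member \"serviceAccount:${project_id}.svc.id.goog[kube-system/${agent}]\""
  done
}
```

For example, `print_setup_commands my-project` prints three `gcloud services enable` commands followed by three `gcloud projects add-iam-policy-binding` commands, where `my-project` is a placeholder project ID.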
Logs
This section lists the Cloud Logging resource types that Gemini on GDC connected API supports. To view Gemini on GDC connected API logs, use the Logs Explorer in the Google Cloud console. Logging for Gemini on GDC connected API is always enabled.
The logged resource type for Gemini on GDC connected API is aiplatform.googleapis.com/Endpoint.
You can also capture and retrieve Gemini on GDC connected API logs by using the Cloud Logging API. For information about how to configure this logging mechanism, see the Cloud Logging client libraries documentation.
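As an alternative to the Logs Explorer and the client libraries, the resource type in this section can be used in a Cloud Logging filter from the command line. The following is a minimal sketch: the helper name `endpoint_log_filter` and the optional severity argument are assumptions made for illustration, and the commented `gcloud logging read` invocation requires an installed and authenticated gcloud CLI.

```shell
# Build a Cloud Logging filter for the resource type named in this section.
# endpoint_log_filter is a hypothetical helper; its optional argument is a
# minimum severity such as WARNING or ERROR.
endpoint_log_filter() {
  min_severity="${1:-}"
  filter='resource.type="aiplatform.googleapis.com/Endpoint"'
  if [ -n "$min_severity" ]; then
    filter="${filter} AND severity>=${min_severity}"
  fi
  printf '%s\n' "$filter"
}

# Example usage (PROJECT_ID is a placeholder):
# gcloud logging read "$(endpoint_log_filter WARNING)" --project PROJECT_ID --limit 20
```

The same filter string can be pasted into the query field of the Logs Explorer.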
Metrics
This section lists the Cloud Monitoring metrics that Gemini on GDC connected API supports. To view Gemini on GDC connected API metrics, use the Metrics Explorer in the Google Cloud console.
Distributed Cloud connected cluster metrics
Gemini on GDC connected API endpoints are deployed on Distributed Cloud connected clusters. For information about logs and metrics for Distributed Cloud connected, see Logs and metrics.
Inference Gateway metrics
| Prometheus metric name | Metric type | Data type | Labels | Chemist type | Chemist metric_kind | Chemist value_type | Chemist labels |
|---|---|---|---|---|---|---|---|
| ig_ops_successful_incoming_requests | Counter | | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/successful_requests | CUMULATIVE | INT64 | model |
| ig_ops_unique_users | Counter | | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/unique_users | CUMULATIVE | INT64 | model |
| ig_tokens_per_minute | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/tokens_per_min | CUMULATIVE | DISTRIBUTION | model |
| ig_total_response_time | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/response_time | CUMULATIVE | DISTRIBUTION | model |
| ig_ops_ffmpeg_image_latency | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/ffmpeg_image_latencies | CUMULATIVE | DISTRIBUTION | model |
| ig_ops_ffmpeg_video_latency | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/ffmpeg_video_latencies | CUMULATIVE | DISTRIBUTION | model |
| ig_ops_ffmpeg_audio_latency | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/ffmpeg_audio_latencies | CUMULATIVE | DISTRIBUTION | model |
| ig_time_to_first_token | Histogram | double | model, context_window | aiplatform.googleapis.com/prediction/internal/gdc/ig/ttft | CUMULATIVE | DISTRIBUTION | model, context_window |
| ig_time_per_output_token | Histogram | double | model, context_window | aiplatform.googleapis.com/prediction/internal/gdc/ig/tpot | CUMULATIVE | DISTRIBUTION | model, context_window |
| ig_cache_hit | Counter | | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/cache_hit_count | CUMULATIVE | DISTRIBUTION | model, _gdch_project |
| ig_cache_miss | Counter | | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/cache_miss_count | CUMULATIVE | DISTRIBUTION | model, _gdch_project |
GenAI Router metrics
| Prometheus metric name | Metric type | Data type | Labels | Chemist type | Chemist metric_kind | Chemist value_type | Chemist labels |
|---|---|---|---|---|---|---|---|
| llm_total_request_latency_milliseconds | Histogram | double | context_window, model | aiplatform.googleapis.com/prediction/internal/gdc/gair/total_request_latencies | CUMULATIVE | DISTRIBUTION | context_window, model |
| llm_unary_request_latency_milliseconds | Histogram | double | context_window, model | aiplatform.googleapis.com/prediction/internal/gdc/gair/unary_request_latencies | CUMULATIVE | DISTRIBUTION | context_window, model |
| llm_streaming_ttft_milliseconds | Histogram | double | context_window, model | aiplatform.googleapis.com/prediction/internal/gdc/gair/ttft_ms | CUMULATIVE | DISTRIBUTION | context_window, model |
| llm_streaming_tpot_milliseconds | Histogram | double | context_window, model | aiplatform.googleapis.com/prediction/internal/gdc/gair/tpot_ms | CUMULATIVE | DISTRIBUTION | context_window, model |
| llm_input_token_count | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/gair/input_token_count | CUMULATIVE | DISTRIBUTION | model |
| llm_output_token_count | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/gair/output_token_count | CUMULATIVE | DISTRIBUTION | model |
| llm_success_response_count | Counter | double | model | aiplatform.googleapis.com/prediction/internal/gdc/gair/success_response_count | CUMULATIVE | INT64 | model |
| llm_failure_response_count | Counter | double | model | aiplatform.googleapis.com/prediction/internal/gdc/gair/failure_response_count | CUMULATIVE | INT64 | model |
| llm_text_tokenization_latency_milliseconds | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/gair/text_tokenization_latencies | CUMULATIVE | DISTRIBUTION | model |
| llm_image_tokenization_latency_milliseconds | Histogram | double | | aiplatform.googleapis.com/prediction/internal/gdc/gair/image_tokenization_latencies | CUMULATIVE | DISTRIBUTION | |
| llm_audio_tokenization_latency_milliseconds | Histogram | double | | aiplatform.googleapis.com/prediction/internal/gdc/gair/audio_tokenization_latencies | CUMULATIVE | DISTRIBUTION | |
GPU metrics
| Prometheus metric name | Metric type | Data type | Labels | Chemist type | Chemist metric_kind | Chemist value_type | Chemist labels |
|---|---|---|---|---|---|---|---|
| DCGM_FI_DEV_MEM_COPY_UTIL | Gauge | int64 | gpu, UUID, pci_bus_id, device, modelName, Hostname, DCGM_FI_DRIVER_VERSION | aiplatform.googleapis.com/prediction/internal/gdc/gpu/memory_util | GAUGE | INT64 | uuid, gpu_model |
| DCGM_FI_DEV_MEMORY_TEMP | Gauge | int64 | Same as above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/memory_temp | GAUGE | INT64 | Same as above |
| DCGM_FI_DEV_POWER_USAGE | Gauge | double | Same as above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/power_usage | GAUGE | DOUBLE | Same as above |
| DCGM_FI_DEV_GPU_TEMP | Gauge | double | Same as above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/gpu_temp | GAUGE | INT64 | Same as above |
| DCGM_FI_DEV_GPU_UTIL | Gauge | double | Same as above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/gpu_util | GAUGE | INT64 | Same as above |
| DCGM_FI_DEV_ENC_UTIL | Gauge | int64 | Same as above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/encode_util | GAUGE | INT64 | Same as above |
| DCGM_FI_DEV_XID_ERRORS | Counter | int64 | Same as above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/xid_errors | CUMULATIVE | INT64 | Same as above |
| DCGM_FI_DEV_POWER_VIOLATION | Counter | int64 | Same as above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/violation_power | CUMULATIVE | INT64 | Same as above |
| DCGM_FI_DEV_THERMAL_VIOLATION | Counter | int64 | Same as above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/violation_thermal | CUMULATIVE | INT64 | Same as above |
| DCGM_FI_DEV_SYNC_BOOST_VIOLATION | Counter | int64 | Same as above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/violation_sync_boost | CUMULATIVE | INT64 | Same as above |
| DCGM_FI_DEV_BOARD_LIMIT_VIOLATION | Counter | int64 | Same as above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/violation_board_limit | CUMULATIVE | INT64 | Same as above |
| DCGM_FI_DEV_LOW_UTIL_VIOLATION | Counter | int64 | Same as above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/violation_low_util | CUMULATIVE | INT64 | Same as above |
| DCGM_FI_DEV_RELIABILITY_VIOLATION | Counter | int64 | Same as above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/violation_reliability | CUMULATIVE | INT64 | Same as above |
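
The metric type and label names in these tables can be combined into a Cloud Monitoring filter expression. The following is a minimal sketch under two assumptions that this document does not confirm: that the value in the Chemist type column is the metric type Cloud Monitoring exposes, and that these internal metric types can be queried from your project. The helper name `monitoring_filter` is hypothetical.

```shell
# Build a Cloud Monitoring filter from a metric type and an optional label.
# monitoring_filter is a hypothetical helper. Treating the Chemist type
# column as the Cloud Monitoring metric type is an assumption.
monitoring_filter() {
  metric_type="$1"
  label_key="${2:-}"
  label_value="${3:-}"
  filter="metric.type = \"${metric_type}\""
  if [ -n "$label_key" ]; then
    filter="${filter} AND metric.labels.${label_key} = \"${label_value}\""
  fi
  printf '%s\n' "$filter"
}
```

For example, `monitoring_filter aiplatform.googleapis.com/prediction/internal/gdc/gpu/gpu_util uuid GPU_UUID` produces a filter restricted to one GPU, where GPU_UUID is a placeholder. A string of this form can be supplied as the `filter` parameter of the Cloud Monitoring API `projects.timeSeries.list` method.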