Implantar e usar o coletor

Neste documento, descrevemos como implantar o coletor OpenTelemetry, configurar o coletor para usar o exportador otlphttp e a API Telemetry (OTLP) e executar um gerador de telemetria para gravar métricas no Cloud Monitoring. É possível visualizar essas métricas no Cloud Monitoring.

Se você estiver usando o Google Kubernetes Engine, siga as instruções em OpenTelemetry gerenciado para GKE, em vez de implantar e configurar manualmente um coletor do OpenTelemetry que usa a API Telemetry.

Se você estiver usando um SDK para enviar métricas de um aplicativo diretamente para a API Telemetry, consulte Usar SDKs para enviar métricas de aplicativos para mais informações e exemplos.

Você também pode usar um coletor do OpenTelemetry e a API Telemetry em conjunto com a instrumentação sem código do OpenTelemetry. Para mais informações, consulte Usar a instrumentação sem código do OpenTelemetry para Java.

Antes de começar

Nesta seção, descrevemos como configurar seu ambiente para implantar e usar o coletor.

Selecione ou crie um projeto do Google Cloud

Escolha um projeto do Google Cloud para este tutorial. Se você ainda não tiver um projeto do Google Cloud , crie um:

  1. Faça login na sua conta do Google Cloud . Se você começou a usar o Google Cloud, crie uma conta para avaliar o desempenho de nossos produtos em situações reais. Clientes novos também recebem US$ 300 em créditos para executar, testar e implantar cargas de trabalho.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

  5. Verify that billing is enabled for your Google Cloud project.

Instalar ferramentas de linha de comando

Este documento usa as seguintes ferramentas de linha de comando:

  • gcloud
  • kubectl

As ferramentas gcloud e kubectl fazem parte da Google Cloud CLI. Para informações sobre como instalá-los, consulte Como gerenciar componentes da CLI do Google Cloud. Para ver os componentes da CLI gcloud que você instalou, execute o seguinte comando:

gcloud components list

Para configurar a CLI gcloud para uso, execute os seguintes comandos:

gcloud auth login
gcloud config set project PROJECT_ID

Ativar APIs

Ative a API Cloud Monitoring e a API Telemetry no seu Google Cloud projeto. Preste atenção especial à API Telemetry, telemetry.googleapis.com. Este documento pode ser a primeira vez que você encontra essa API.

Ative as APIs executando os seguintes comandos:

gcloud services enable monitoring.googleapis.com
gcloud services enable telemetry.googleapis.com

Criar um cluster

Criar um cluster do GKE.

  1. Crie um cluster do Google Kubernetes Engine chamado otlp-test executando o seguinte comando:

    gcloud container clusters create-auto --location CLUSTER_LOCATION otlp-test --project PROJECT_ID
    
  2. Depois que o cluster for criado, conecte-se a ele executando o seguinte comando:

    gcloud container clusters get-credentials otlp-test --region CLUSTER_LOCATION --project PROJECT_ID
    

Autorizar a conta de serviço do Kubernetes

Os comandos a seguir concedem os papéis necessários do Identity and Access Management (IAM) à conta de serviço do Kubernetes. Estes comandos pressupõem que você está usando a Federação de Identidade da Carga de Trabalho para GKE:

export PROJECT_NUMBER=$(gcloud projects describe PROJECT_ID --format="value(projectNumber)")

gcloud projects add-iam-policy-binding projects/PROJECT_ID \
  --role=roles/logging.logWriter \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/opentelemetry/sa/opentelemetry-collector \
  --condition=None

gcloud projects add-iam-policy-binding projects/PROJECT_ID \
  --role=roles/monitoring.metricWriter \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/opentelemetry/sa/opentelemetry-collector \
  --condition=None

gcloud projects add-iam-policy-binding projects/PROJECT_ID \
  --role=roles/telemetry.tracesWriter \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/opentelemetry/sa/opentelemetry-collector \
  --condition=None

Se a conta de serviço tiver um formato diferente, use o comando na documentação do Google Cloud Managed Service para Prometheus para autorizar a conta de serviço com as seguintes mudanças:

  • Substitua o nome da conta de serviço gmp-test-sa pela sua conta.
  • Conceda os papéis mostrados no conjunto anterior de comandos, não apenas o papel roles/monitoring.metricWriter.

Implantar o OpenTelemetry Collector

Crie a configuração do coletor fazendo uma cópia do seguinte arquivo YAML e colocando-o em um arquivo chamado collector.yaml. Você também pode encontrar a seguinte configuração no GitHub, no repositório otlp-k8s-ingest.

Na sua cópia, substitua a ocorrência de ${GOOGLE_CLOUD_PROJECT} pelo ID do projeto, PROJECT_ID.

O OTLP para métricas do Prometheus só funciona quando você usa a versão 0.140.0 ou mais recente do coletor do OpenTelemetry.

# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

exporters:
  # The googlecloud exporter is used for logs
  googlecloud:
    log:
      default_log_name: opentelemetry-collector
    user_agent: Google-Cloud-OTLP manifests:0.4.0 OpenTelemetry Collector Built By Google/0.128.0 (linux/amd64)
  googlemanagedprometheus:
    user_agent: Google-Cloud-OTLP manifests:0.4.0 OpenTelemetry Collector Built By Google/0.128.0 (linux/amd64)
  # The otlphttp exporter is used to send traces to Google Cloud Trace and
  # metrics to Google Managed Prometheus using OTLP http/proto.
  # The otlp exporter could also be used to send them using OTLP grpc
  otlphttp:
    encoding: proto
    endpoint: https://telemetry.googleapis.com
    # Use the googleclientauth extension to authenticate with Google credentials
    auth:
      authenticator: googleclientauth


extensions:
  # Standard for the collector. Used for probes.
  health_check:
    endpoint: ${env:MY_POD_IP}:13133
  # This is an auth extension that adds Google Application Default Credentials to http and gRPC requests.
  googleclientauth:


processors:
  # This filter is a standard part of handling the collector's self-observability metrics. Not related to OTLP ingestion.
  filter/self-metrics:
    metrics:
      include:
        match_type: strict
        metric_names:
        - otelcol_process_uptime
        - otelcol_process_memory_rss
        - otelcol_grpc_io_client_completed_rpcs
        - otelcol_googlecloudmonitoring_point_count

  # The recommended batch size for the OTLP endpoint is 200 metric data points.
  batch:
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

  # The k8sattributes processor adds k8s resource attributes to metrics based on the source IP that sent the metrics to the collector.
  # k8s attributes are important for avoiding errors from timeseries "collisions".
  # These attributes help distinguish workloads from each other, and provide useful metadata (e.g. namespace) when querying.
  k8sattributes:
    extract:
      metadata:
      - k8s.namespace.name
      - k8s.deployment.name
      - k8s.statefulset.name
      - k8s.daemonset.name
      - k8s.cronjob.name
      - k8s.job.name
      - k8s.replicaset.name
      - k8s.node.name
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.pod.start_time
    passthrough: false
    pod_association:
    - sources:
      - from: resource_attribute
        name: k8s.pod.ip
    - sources:
      - from: resource_attribute
        name: k8s.pod.uid
    - sources:
      - from: connection

  # Standard processor for gracefully degrading when overloaded to prevent OOM.
  memory_limiter:
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

  # Standard processor for enriching self-observability metrics. Unrelated to OTLP ingestion.
  metricstransform/self-metrics:
    transforms:
    - action: update
      include: otelcol_process_uptime
      operations:
      - action: add_label
        new_label: version
        new_value: Google-Cloud-OTLP manifests:0.4.0 OpenTelemetry Collector Built By Google/0.128.0 (linux/amd64)

  # The resourcedetection processor, similar to the k8sattributes processor, enriches metrics with important metadata.
  # The gcp detector provides the cluster name and cluster location.
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

  # This transform processor avoids ingestion errors if metrics contain attributes with names that are reserved for the prometheus_target resource.
  transform/collision:
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")

  # The relative ordering of statements between ReplicaSet & Deployment and Job & CronJob are important.
  # The ordering of these controllers is decided based on the k8s controller documentation available at
  # https://kubernetes.io/docs/concepts/workloads/controllers.
  # The relative ordering of the other controllers in this list is inconsequential since they directly
  # create pods.
  transform/aco-gke:
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["top_level_controller_type"], "ReplicaSet") where resource.attributes["k8s.replicaset.name"] != nil
      - set(attributes["top_level_controller_name"], resource.attributes["k8s.replicaset.name"]) where resource.attributes["k8s.replicaset.name"] != nil
      - set(attributes["top_level_controller_type"], "Deployment") where resource.attributes["k8s.deployment.name"] != nil
      - set(attributes["top_level_controller_name"], resource.attributes["k8s.deployment.name"]) where resource.attributes["k8s.deployment.name"] != nil
      - set(attributes["top_level_controller_type"], "DaemonSet") where resource.attributes["k8s.daemonset.name"] != nil
      - set(attributes["top_level_controller_name"], resource.attributes["k8s.daemonset.name"]) where resource.attributes["k8s.daemonset.name"] != nil
      - set(attributes["top_level_controller_type"], "StatefulSet") where resource.attributes["k8s.statefulset.name"] != nil
      - set(attributes["top_level_controller_name"], resource.attributes["k8s.statefulset.name"]) where resource.attributes["k8s.statefulset.name"] != nil
      - set(attributes["top_level_controller_type"], "Job") where resource.attributes["k8s.job.name"] != nil
      - set(attributes["top_level_controller_name"], resource.attributes["k8s.job.name"]) where resource.attributes["k8s.job.name"] != nil
      - set(attributes["top_level_controller_type"], "CronJob") where resource.attributes["k8s.cronjob.name"] != nil
      - set(attributes["top_level_controller_name"], resource.attributes["k8s.cronjob.name"]) where resource.attributes["k8s.cronjob.name"] != nil
  # For each Prometheus unknown-typed metric, which is a gauge, create a counter that is an exact copy of this metric.
  # The GCP OTLP endpoint will add appropriate the appropriate suffixes for the counter and gauge.
  transform/unknown-counter:
    metric_statements:
    - context: metric
      statements:
      # Copy the unknown metric, but add a suffix so we can distinguish the copy from the original.
      - copy_metric(Concat([metric.name, "unknowncounter"], ":")) where metric.metadata["prometheus.type"] == "unknown" and not HasSuffix(metric.name, ":unknowncounter")
      # Change the copy to a monotonic, cumulative sum.
      - convert_gauge_to_sum("cumulative", true) where HasSuffix(metric.name, ":unknowncounter")
      # Delete the extra suffix once we are done.
      - set(metric.name, Substring(metric.name, 0, Len(metric.name)-Len(":unknowncounter"))) where HasSuffix(metric.name, ":unknowncounter")

  # When sending telemetry to the GCP OTLP endpoint, the gcp.project_id resource attribute is required to be set to your project ID.
  resource/gcp_project_id:
    attributes:
    - key: gcp.project_id
      # MAKE SURE YOU REPLACE THIS WITH YOUR PROJECT ID
      value: ${GOOGLE_CLOUD_PROJECT}
      action: insert
  # The metricstarttime processor is important to include if you are using the prometheus receiver to ensure the start time is set properly.
  # It is a no-op otherwise.
  metricstarttime:
    strategy: subtract_initial_point

receivers:
  # This collector is configured to accept OTLP metrics, logs, and traces, and is designed to receive OTLP from workloads running in the cluster.
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        cors:
          allowed_origins:
          - http://*
          - https://*
        endpoint: ${env:MY_POD_IP}:4318

  # Push the collector's own self-observability metrics to the otlp receiver.
  otlp/self-metrics:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:14317

service:
  extensions:
  - health_check
  - googleclientauth
  pipelines:
    # Recieve OTLP logs, and export logs using the googlecloud exporter.
    logs:
      exporters:
      - googlecloud
      processors:
      - k8sattributes
      - resourcedetection
      - memory_limiter
      - batch
      receivers:
      - otlp
    # Recieve OTLP metrics, and export metrics to GMP using the otlphttp exporter.
    metrics/otlp:
      exporters:
      - otlphttp
      processors:
      - k8sattributes
      - memory_limiter
      - resource/gcp_project_id
      - resourcedetection
      - transform/collision
      - transform/aco-gke
      - transform/unknown-counter
      - metricstarttime
      - batch
      receivers:
      - otlp
    # Scrape self-observability Prometheus metrics, and export metrics to GMP using the otlphttp exporter.
    metrics/self-metrics:
      exporters:
      - otlphttp
      processors:
      - filter/self-metrics
      - metricstransform/self-metrics
      - k8sattributes
      - memory_limiter
      - resource/gcp_project_id
      - resourcedetection
      - batch
      receivers:
      - otlp/self-metrics
    # Recieve OTLP traces, and export traces using the otlphttp exporter.
    traces:
      exporters:
      - otlphttp
      processors:
      - k8sattributes
      - memory_limiter
      - resource/gcp_project_id
      - resourcedetection
      - batch
      receivers:
      - otlp
  telemetry:
    logs:
      encoding: json
    metrics:
      readers:
      - periodic:
          exporter:
            otlp:
              protocol: grpc
              endpoint: ${env:MY_POD_IP}:14317

Configurar o OpenTelemetry Collector implantado

Configure a implantação do coletor criando recursos do Kubernetes.

  1. Crie o namespace opentelemetry e a configuração do coletor no namespace executando os seguintes comandos:

    kubectl create namespace opentelemetry
    
    kubectl create configmap collector-config -n opentelemetry --from-file=collector.yaml
    
  2. Configure o coletor com recursos do Kubernetes executando os seguintes comandos:

    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/otlp-k8s-ingest/refs/heads/otlpmetric/k8s/base/2_rbac.yaml
    
    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/otlp-k8s-ingest/refs/heads/otlpmetric/k8s/base/3_service.yaml
    
    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/otlp-k8s-ingest/refs/heads/otlpmetric/k8s/base/4_deployment.yaml
    
    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/otlp-k8s-ingest/refs/heads/otlpmetric/k8s/base/5_hpa.yaml
    
  3. Aguarde os pods do coletor atingirem o status "Em execução" e terem 1/1 contêineres prontos. Isso leva cerca de três minutos no Autopilot, se for a primeira carga de trabalho implantada. Para verificar os pods, use o seguinte comando:

    kubectl get po -n opentelemetry -w
    

    Para parar de monitorar o status do pod, pressione Ctrl+C para interromper o comando.

  4. Você também pode verificar os registros do coletor para garantir que não haja erros óbvios:

    kubectl logs -n opentelemetry deployment/opentelemetry-collector
    

Implantar o gerador de telemetria

É possível testar sua configuração usando a ferramenta telemetrygen de código aberto. Esse app gera telemetria e a envia para o coletor.

  1. Para implantar o app telemetrygen no namespace opentelemetry-demo, execute o seguinte comando:

    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/otlp-k8s-ingest/refs/heads/main/sample/app.yaml
    
  2. Depois de criar a implantação, pode levar um tempo para que os pods sejam criados e comecem a ser executados. Para verificar o status dos pods, execute o comando a seguir:

    kubectl get po -n opentelemetry-demo -w
    

    Para parar de monitorar o status do pod, pressione Ctrl+C para interromper o comando.

Consultar uma métrica usando o Metrics Explorer

A ferramenta telemetrygen grava em uma métrica chamada gen. É possível consultar essa métrica na interface do criador de consultas e no editor de consultas PromQL do Metrics Explorer.

No console Google Cloud , acesse a página do  Metrics explorer:

Acesse o Metrics Explorer

Se você usar a barra de pesquisa para encontrar essa página, selecione o resultado com o subtítulo Monitoring.

  • Se você usa a interface do criador de consultas do Metrics Explorer, o nome completo da métrica é prometheus.googleapis.com/gen/gauge.
  • Se você usar o editor de consultas do PromQL, poderá consultar a métrica usando o nome gen.

A imagem a seguir mostra um gráfico da métrica gen no Metrics Explorer:

Um gráfico mostra a métrica de geração, capturada pelo exportador otlphttp.

excluir o cluster

Depois de verificar a implantação consultando a métrica, é possível excluir o cluster. Para excluir o cluster, execute o seguinte comando:

gcloud container clusters delete --location CLUSTER_LOCATION otlp-test --project PROJECT_ID

A seguir