Understand ingestion metrics

Google SecOps provides detailed ingestion metrics to offer visibility into the health, volume, and processing status of your security data pipeline. Monitoring these metrics is crucial for ensuring data is flowing correctly, identifying potential issues, and optimizing your Google SecOps deployment.

Why Ingestion Metrics Are Important

  • Health Monitoring: Track the status of your data feeds and collection agents.
  • Troubleshooting: Quickly diagnose issues like data drops, parsing errors, or connectivity problems.
  • Volume Tracking: Understand how much data is being ingested from various sources, useful for capacity planning and cost management.
  • Normalization Insights: Monitor the success rate of logs being parsed into the Unified Data Model (UDM).
  • Alerting: Configure notifications for anomalies in data flow, errors, or volume changes.

Where to Access Ingestion Metrics

  • Google Cloud Monitoring:

    • Integrates Google SecOps metrics into Cloud Monitoring, allowing you to create dashboards and set up alert policies based on thresholds for metrics like ingested log counts, sizes, and normalization events.
    • Metrics are available under prefixes like Chronicle Collector and Chronicle Log Type.
    • Reference: Use Cloud Monitoring for ingestion insights
  • Google SecOps Health Hub:

    • The Health Hub provides a centralized view within the Google SecOps platform to monitor data source status, ingestion volumes, and parsing health. It highlights errors and provides links to dig deeper.
    • Reference: Use the Health Hub

  • BigQuery Export:

    • Google SecOps can export detailed ingestion_metrics tables to your BigQuery dataset. This allows for in-depth SQL-based analysis, custom reporting, and joining with other data.
    • Reference: Google SecOps data in BigQuery
    • Reference: Ingestion metrics reference for Looker and BigQuery
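
As a sketch of the kind of analysis the BigQuery export enables, the snippet below totals log_volume per log_type. The field names mirror the metrics described in this article, but the rows are illustrative stand-ins for actual query results from the exported ingestion_metrics tables:

```python
# Sketch: summarize log_volume per log_type, as you might after querying
# the exported ingestion_metrics tables. The rows below are illustrative
# stand-ins for real BigQuery results, not actual export data.
from collections import defaultdict

rows = [
    {"log_type": "WINDOWS_DNS", "log_volume": 1_200_000},  # bytes
    {"log_type": "WINDOWS_DNS", "log_volume": 900_000},
    {"log_type": "CS_EDR", "log_volume": 5_000_000},
]

volume_by_type = defaultdict(int)
for row in rows:
    volume_by_type[row["log_type"]] += row["log_volume"]

for log_type, total in sorted(volume_by_type.items()):
    print(f"{log_type}: {total / 1_000_000:.1f} MB")
```

In practice you would run the equivalent aggregation as SQL in BigQuery; the Python form is only meant to show the shape of the computation.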

Key Metric Categories and Fields

Ingestion metrics are broken down by different components of the ingestion pipeline (for example, Forwarder, Ingestion API, Normalizer, Feeds). Here are some key fields you'll encounter:

  • Volume Metrics:

    • log_count: Number of raw log entries received or processed.
    • log_volume: Total size of logs received or processed (in bytes).
    • event_count: Number of UDM events generated after parsing.
  • Normalization Metrics:

    • state: The outcome of the normalization process for logs (e.g., parsed, failed_parsing, failed_validation).
    • total_events: Count of successfully validated UDM events.
    • total_error_events: Count of events that failed parsing or validation.
    • drop_reason_code: Reason why a log or event was dropped.
  • Health & Performance:

    • last_heartbeat_time: Indicates the last time a collector or feed was active.
    • drop_count: Number of logs dropped by a component.
    • cpu_used, memory_used, disk_used: Resource utilization for on-premises components like the Bindplane Agent or legacy Forwarder.
    • latency_count: Metrics related to the time taken between ingestion and normalization.
  • Quota Metrics (Ingestion API):

    • quota_rejected_long_term_log_volume: Volume of logs rejected due to exceeding quota limits.
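
To show how these fields fit together, here is a minimal sketch that computes a parsing success rate from Normalizer records using the state and log_count fields described above. The records and their values are illustrative:

```python
# Sketch: derive a parsing success rate from Normalizer metric records.
# Field names (state, log_count) follow the article; values are made up.
records = [
    {"log_type": "WINDOWS_DNS", "state": "parsed", "log_count": 9_800},
    {"log_type": "WINDOWS_DNS", "state": "failed_parsing", "log_count": 150},
    {"log_type": "WINDOWS_DNS", "state": "failed_validation", "log_count": 50},
]

total = sum(r["log_count"] for r in records)
parsed = sum(r["log_count"] for r in records if r["state"] == "parsed")
success_rate = parsed / total if total else 0.0
print(f"parse success rate: {success_rate:.1%}")  # → 98.0%
```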

Understanding Dimensions

Metrics are typically broken down by dimensions, allowing you to filter and group data:

  • component: The part of the ingestion pipeline the metric refers to (e.g., Forwarder, Ingestion API, Normalizer, Out-of-Band Processor).
  • log_type: The type of log source (e.g., WINDOWS_DNS, CS_EDR).
  • collector_id / feed_id: Unique identifier for the specific collector instance or data feed.
  • namespace: For organizing logs, often used in API ingestion.

Note: As stated in the Ingestion metrics schema, aggregate ingestion metrics should be treated as indicators of approximate volume rather than exact counts for any given time range.
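
The effect of dimensions can be sketched as a group-by over metric records, keyed on component and log_type, the same way you would filter and group in Cloud Monitoring or BigQuery. The records here are illustrative:

```python
# Sketch: group metric records by the component and log_type dimensions.
# Dimension names follow the article; records and counts are illustrative.
from collections import defaultdict

records = [
    {"component": "Normalizer", "log_type": "WINDOWS_DNS", "log_count": 500},
    {"component": "Normalizer", "log_type": "CS_EDR", "log_count": 200},
    {"component": "Ingestion API", "log_type": "WINDOWS_DNS", "log_count": 520},
]

counts = defaultdict(int)
for r in records:
    counts[(r["component"], r["log_type"])] += r["log_count"]

for (component, log_type), count in sorted(counts.items()):
    print(f"{component}/{log_type}: {count}")
```

Note that the Ingestion API and Normalizer counts for the same log_type need not match exactly, consistent with the approximate-volume caveat above.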

Using Ingestion Metrics

  • Troubleshooting: Filter by log_type and state (e.g., failed_parsing) in the Normalizer metrics to find parsing issues. Check drop_count and drop_reason_code for data loss.
  • Alerting: Set up alerts in Cloud Monitoring for sudden drops in log_count for critical log sources, or high rates of total_error_events.
  • Capacity Planning: Analyze log_volume trends in BigQuery to forecast future needs.
  • Validating Configuration: After setting up a new data source, check the metrics to confirm data is flowing and being parsed as expected.
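
The alerting pattern above, flagging a sudden drop in log_count against a trailing baseline, can be sketched as follows. In production you would configure this as a Cloud Monitoring alert policy rather than code; the threshold and counts below are illustrative:

```python
# Sketch: flag a sudden drop in log_count relative to a trailing baseline,
# the logic you would express as a Cloud Monitoring alert policy threshold.
# drop_ratio and the hourly counts are illustrative assumptions.
def volume_drop_alert(history, current, drop_ratio=0.5):
    """Return True if current log_count fell below drop_ratio x the trailing mean."""
    if not history:
        return False
    baseline = sum(history) / len(history)
    return current < baseline * drop_ratio

hourly_log_counts = [10_200, 9_800, 10_050, 10_400]  # trailing hours
assert volume_drop_alert(hourly_log_counts, 3_000) is True   # sharp drop: alert
assert volume_drop_alert(hourly_log_counts, 9_900) is False  # normal volume
```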

By regularly monitoring and understanding these ingestion metrics, you can ensure a reliable and efficient data pipeline for your Google SecOps instance.
