Understand rule replays and MTTD
This document explains how rule replays (also called cleanup runs) handle late-arriving data and context updates, and how they affect Mean Time To Detect (MTTD) metrics.
Rule replays
Google SecOps processes large volumes of security data. To ensure accurate detections for rules that depend on contextual or correlated data, the rule engine automatically runs a rule replay process. This process evaluates the same block of event data multiple times at different intervals to handle late-arriving data and rules that need updated context, such as matching an Indicator of Compromise (IOC).
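For example, a rule that checks events against an IOC reference list can only fire once the indicator is on the list; if the IOC is added after the events were ingested, only a later replay produces the detection. The following is a minimal YARA-L 2.0 sketch, not a production rule; the rule name and the reference list %suspicious_ips are hypothetical:

```
// Hypothetical sketch: a detection that depends on IOC context.
// If an indicator is added to the list after the matching events are
// ingested, only a later replay of this rule emits the detection.
rule ioc_ip_match {
  meta:
    author = "example"
    description = "Matches outbound connections against an IOC IP list"

  events:
    $e.metadata.event_type = "NETWORK_CONNECTION"
    // %suspicious_ips is a hypothetical reference list of IOC IPs.
    $e.target.ip in %suspicious_ips

  condition:
    $e
}
```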
Rule replay triggers
Rules are re-evaluated (re-run) when relevant context data arrives or is processed later than the initial event data.
Common rule replay triggers include:
Late-arriving enrichment data
Data enrichment pipelines, such as the Entity Context Graph (ECG), might process data in batches. If a UDM event arrives before its related contextual data (like asset information or user context), a detection might be missed during the initial rule execution.
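For example, a rule that joins a UDM event with ECG user context cannot match until that context is ingested; the initial run misses, and a replay catches it. A minimal sketch, assuming the YARA-L 2.0 entity-context join pattern (the rule name and the "Contractor" title value are hypothetical):

```
// Hypothetical sketch: correlates a login with ECG user context.
// If the user's entity context arrives after the login event, the
// initial run finds no join and a later replay emits the detection.
rule login_by_contractor {
  meta:
    author = "example"
    description = "Login by a user whose entity context lists a contractor title"

  events:
    $e.metadata.event_type = "USER_LOGIN"
    $e.principal.user.userid = $userid

    // User context supplied by the Entity Context Graph (ECG).
    $context.graph.metadata.entity_type = "USER"
    $context.graph.entity.user.userid = $userid
    $context.graph.entity.user.title = "Contractor"

  match:
    $userid over 1h

  condition:
    $e and $context
}
```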
Retroactive UDM enrichment updates
If your rule uses aliased fields (enriched fields) in your detection logic, such as $udm.event.principal.hostname, and the source data (for example, DHCP records) arrives more than an hour later, those field values are updated retroactively for that specific event time. Subsequent rule executions ("cleanup runs") then use the newly enriched values, which can trigger a detection that was previously missed.
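A minimal sketch of a rule whose match depends on an enriched field (the hostname value is hypothetical; in many environments principal.hostname is populated by DHCP-based enrichment rather than parsed from every raw log):

```
// Hypothetical sketch: the match keys on an aliased (enriched) field.
// If the DHCP records that map IP to hostname arrive late, the initial
// run sees an empty hostname; a cleanup run re-evaluates the same event
// time with the retroactively enriched value and can emit the detection.
rule enriched_hostname_match {
  meta:
    author = "example"
    description = "Keys on principal.hostname, which may be enriched late"

  events:
    $e.metadata.event_type = "NETWORK_CONNECTION"
    // This value might only be present after retroactive enrichment.
    $e.principal.hostname = "finance-workstation-01"

  condition:
    $e
}
```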
Scheduled multi-event rules
Multi-Event (ME) rules run on a schedule (for example, every 10 minutes, hourly, or daily) to evaluate blocks of event time. To capture late enrichment updates to historical data, these rules re-evaluate the same time block later, often running at least two or three times (for example, with checks at 5-8 hours and again at 24-48 hours later).
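For example, a multi-event rule like the following sketch (names and thresholds are hypothetical) evaluates each 30-minute match window when its schedule fires; the engine then replays the same window later to pick up late-arriving or re-enriched events:

```
// Hypothetical sketch: a scheduled multi-event rule. Each 30-minute
// match window is evaluated on the rule's schedule and then replayed
// later to incorporate late-arriving or retroactively enriched events.
rule repeated_blocked_logins {
  meta:
    author = "example"
    description = "Five or more blocked logins for one user in 30 minutes"

  events:
    $e.metadata.event_type = "USER_LOGIN"
    $e.security_result.action = "BLOCK"
    $e.target.user.userid = $user

  match:
    $user over 30m

  condition:
    #e >= 5
}
```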
Impact on timing metrics
When a detection results from a rule replay, the alert's Detection Window or Event Timestamp refers to the time of the original malicious activity. The Created Time is the time the detection is actually generated, which can be hours or days later.
High detection latency (a large time difference between the event and the detection) is typically caused by re-enrichment of late-arriving data or by latency in the Entity Context Graph (ECG) pipeline.
This time difference can make a detection appear "late" or "delayed", which can confuse analysts and distort performance metrics such as MTTD.
| Metric component | What the time represents | How replays affect MTTD |
|---|---|---|
| Detection Window / Event Timestamp | Time the original security event occurred. | Remains accurate to the event time. |
| Detection Time / Created Time | Time the detection was actually emitted by the engine. | Appears "late" or "delayed" relative to the Event Timestamp when the detection comes from a secondary (replay) run that incorporates late enrichment data. This delta inflates the calculated MTTD. |
Best practices for measuring MTTD
MTTD quantifies the time from initial compromise to the effective detection of the threat. When you analyze detections triggered by rule replays, apply the following best practices to maintain accurate MTTD metrics.
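As a working formulation (one common way to compute the metric; your program might define the endpoints differently), MTTD averages the per-detection delta between the Created Time and the Event Timestamp:

$$\mathrm{MTTD} = \frac{1}{N}\sum_{i=1}^{N}\left(t_i^{\,\text{created}} - t_i^{\,\text{event}}\right)$$

Under this formulation, a single replay-generated detection with a 24-48 hour delta can noticeably inflate the average, especially over a small number of detections.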
Prioritize real-time detection systems
For the fastest detections, use Single-Event rules. These rules run in near-real time, typically with a delay of less than 5 minutes.
Fast single-event detections also support broader use of Composite detections.
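A minimal single-event sketch (the event type and regular expression are hypothetical); because the rule evaluates each event as it streams in, it does not wait on a schedule or on replay passes:

```
// Hypothetical sketch: a single-event rule evaluated in near-real time,
// typically emitting a detection within minutes of ingestion.
rule powershell_download_cradle {
  meta:
    author = "example"
    description = "PowerShell launched with a download cradle"

  events:
    $e.metadata.event_type = "PROCESS_LAUNCH"
    $e.target.process.command_line = /powershell.*downloadstring/ nocase

  condition:
    $e
}
```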
Account for rule replay in multi-event rules
Multi-event rules inherently incur higher latency because of their scheduled execution frequency. When you measure MTTD for detections generated by multi-event rules, recognize that detections produced by automated rule replays increase coverage and accuracy: replays often catch threats that require late context, but they necessarily increase the reported latency for those detections.
- For critical, time-sensitive alerting: use Single-Event rules or Multi-Event rules with the shortest practical run frequencies (for example, match windows under one hour).
- For complex, long-duration correlation (UEBA, multi-stage attacks): expect higher latency (up to 48 hours or more) if rules rely on extensive contextual joins or on reference lists that might update asynchronously. The benefit is higher-fidelity detection rather than absolute speed.
Optimize rules to reduce reliance on late enrichment
To optimize for detection speed and minimize the impact of retroactive enrichment runs, consider using non-aliased fields (fields that are not subject to downstream enrichment pipelines) in your rule logic where possible.
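For example, the earlier enriched-hostname sketch could instead key on the IP address parsed directly from the raw log. This is a hypothetical sketch, and which fields are enriched depends on your environment; the IP value is a documentation address:

```
// Hypothetical sketch: keys on principal.ip, which is parsed directly
// from the raw log, so the rule does not wait on retroactive DHCP
// enrichment to populate a hostname before it can match.
rule raw_ip_match {
  meta:
    author = "example"
    description = "Keys on a field parsed directly from the log"

  events:
    $e.metadata.event_type = "NETWORK_CONNECTION"
    $e.principal.ip = "203.0.113.7"

  condition:
    $e
}
```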