Understand rule replays and MTTD

This document explains how rule replays (also called cleanup runs) handle late-arriving data and context updates, and how these replays affect the Mean Time To Detect (MTTD) metrics.

Rule replays

Google SecOps processes large volumes of security data. To ensure accurate detections for rules that depend on contextual or correlated data, the rule engine automatically runs a rule replay process.

The rule replay process handles these categories of rules:

  • Single-event rules: When the UDM enrichment process updates a previously evaluated event, the system replays single-event rules. (For exceptions, see the entry for single-event rules with data tables.)

  • Windowed single-event (WSE) rules and single-event rules with data tables: These rules use a distinct scheduling mechanism for handling late-arriving data, one that differs from both standard single-event rules and multi-event rules.

  • Multi-event rules: Multi-event rules execute on a schedule, processing blocks of event time. These rules repeatedly re-evaluate the same time block at different intervals to capture late enrichment updates, such as matching user or asset context data or an Indicator of Compromise (IOC). The exact timings depend on the schedule configuration.

Rule replay triggers

The system re-evaluates (re-runs) rules so that it still catches detections when data arrives or is updated after the initial rule execution. Late-arriving data falls into the following categories:

  • Late-arriving source events: The raw log or UDM event itself arrives in Google SecOps significantly later than the event's actual timestamp.
  • Late-arriving enrichment data: Contextual data (for example, user, asset, or threat intelligence data) related to an event becomes available, or is updated, after the system first processes the event. This often occurs because enrichment pipelines, such as the entity context graph (ECG), process data in batches or depend on external data sources.
  • Retroactive UDM enrichment updates: Late-arriving source data (such as DHCP records that update hostnames) changes UDM event fields after the fact. Rules that use aliased (enriched) fields in their detection logic, such as $udm.event.principal.hostname, can trigger replays when this source data is delayed, because its late arrival retroactively updates those field values.

The system triggers rule replays differently depending on the rule type and the nature of the late data. The goal is to balance detection timeliness with data completeness.

How the system handles late-arriving data by rule type

The rule type and its configuration determine the time window within which late-arriving data can trigger a rule re-evaluation.

  • Single-event rules (without match windows or data tables):

    • Late source events: Generally, these rules process an event regardless of how old its timestamp is when it arrives in the system. The system doesn't impose a strict cut-off window for the initial processing of late source events.
    • Late enrichment: If enrichment data for a previously evaluated event arrives or an update occurs, the system re-evaluates these single-event rules against the event with the new context. This can happen hours or even days after the initial event.
  • Windowed single-event (WSE) rules and single-event rules with data tables:

    • These rules don't follow the same late data handling as other single-event rules or the true-up schedules of multi-event rules.
    • They have the following behavior:
      • Cut-off: These rules don't process events that arrive in Google SecOps (are ingested) 7 days or more after the event timestamp. This applies to late-arriving source events as well.
      • Late-arriving data (less than 7 days): The system processes events that arrive less than 7 days late, but potentially with higher latency.
      • Context updates: If context for an event arrives late or if an event is retroactively enriched, the system automatically re-evaluates rules against the enriched event. This rule replay can trigger new detections, even if the initial evaluation didn't produce one.
      • Late enrichment: If a UDM event is updated through enrichment (which can occur up to 7 days after ingestion), the system re-evaluates these rules against the updated event. However, unlike other rule types, updates to data table content don't trigger an automatic re-evaluation of past events for these rules.
      • Lookback window: These rules use a lookback window of approximately 7 days to re-evaluate events. If enrichment data arrives for an event that falls within this window, the system re-evaluates the rule.
  • Multi-event rules:

    • Multi-event rules run on a schedule and re-evaluate time blocks to account for late data. How you configure the rule's schedule determines the effective cut-off window:
      • Default schedule: The system typically runs automatic true-up runs approximately 5 hours and 24 hours after the event time. If the data arrives after the 24-hour run completes, it won't be evaluated by this rule for that time window.
      • Customizable schedules enabled: This feature gives you more control over run timings through the "Run Frequency" settings. For details, see Configure customized schedules for rules. The key timings are:
        • First run: The first run occurs at the event time plus the configured offset (for example, T + 1 hour).
        • True-up run 1: The first true-up run occurs approximately 4 hours after the first run. The system can therefore include events that arrive up to roughly 4 hours after the first run (about T + 5 hours with a 1-hour offset).
        • True-up run 2 (conditional): If you turn on Ensure enrichment completeness, a final true-up run occurs approximately 30 hours after the first run. This extends the window for processing late data to around T + 30-31 hours.
      • Cut-off implications: With customizable schedules, the last true-up run dictates the effective cut-off for including late data. This typically occurs around 4 hours after the first run, or around 30 hours after the first run if you enable Ensure enrichment completeness. Events or enrichments arriving after the final true-up run for a given time window won't be processed by this rule for that window.
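The cut-off rules above can be condensed into a small model. The following Python sketch is an approximation built only from the timings stated in this section: runs at about T + 5h and T + 24h for the default schedule; a first run at T + offset, a true-up about 4 hours later, and an optional one about 30 hours later for customizable schedules; and a 7-day cut-off for WSE rules. All function and constant names are hypothetical, real run times vary, and the model ignores match windows (which, as the scenarios below show, don't extend the cut-off) as well as standard single-event rules, which have no fixed cut-off.

```python
from datetime import timedelta

# Approximate timings from this section (hypothetical model, not an API).
DEFAULT_SCHEDULE_RUNS = (timedelta(hours=5), timedelta(hours=24))
TRUE_UP_1_DELAY = timedelta(hours=4)   # after the first run
TRUE_UP_2_DELAY = timedelta(hours=30)  # after the first run, if enabled
WSE_CUTOFF = timedelta(days=7)


def multi_event_run_times(offset=None, ensure_enrichment_completeness=False):
    """Return run times relative to the event time T.

    offset=None models the default schedule; otherwise the first run
    happens at T + offset, as with a customizable schedule.
    """
    if offset is None:
        return list(DEFAULT_SCHEDULE_RUNS)
    runs = [offset, offset + TRUE_UP_1_DELAY]
    if ensure_enrichment_completeness:
        runs.append(offset + TRUE_UP_2_DELAY)
    return runs


def multi_event_sees(arrival_delay, offset=None,
                     ensure_enrichment_completeness=False):
    """True if data arriving `arrival_delay` after T precedes the last run."""
    runs = multi_event_run_times(offset, ensure_enrichment_completeness)
    return arrival_delay < max(runs)


def wse_sees(arrival_delay):
    """WSE and data-table rules skip events ingested 7 or more days late."""
    return arrival_delay < WSE_CUTOFF


def hours(n):
    return timedelta(hours=n)


if __name__ == "__main__":
    print(multi_event_sees(hours(10)))                # Scenario 3: True
    print(multi_event_sees(hours(25)))                # Scenario 3: False
    print(multi_event_sees(hours(6), offset=hours(1)))  # Scenario 4: False
    print(multi_event_sees(hours(28), hours(1), True))  # Scenario 5: True
    # Scenario 6 doesn't state an offset; a 1-hour offset is assumed here.
    print(multi_event_sees(hours(36), hours(1), True))  # Scenario 6: False
    print(wse_sees(timedelta(days=8)))                # Scenario 7: False
```

The model treats "arrives before the last scheduled run" as the only condition for inclusion, which is a simplification: in practice the data must also be ingested and enriched early enough for that run to read it.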

Examples of late arriving data scenarios

  • Scenario 1: Late source event - Single-event rule

    • Google SecOps ingests an event with a timestamp from 3 days ago. A standard single-event rule processes this event as new data.
  • Scenario 2: Late enrichment - Single-event rule

    • The system processed a login event yesterday. Today, it ingests and enriches new information for the user involved (for example, a department change). The system re-evaluates the single-event rule against the login event with the updated user context.
  • Scenario 3: Late source event - Multi-event rule (Default schedule)

    • If an event arrives 10 hours after its event timestamp, the system misses it during the 5-hour true-up run, but processes it during the 24-hour true-up run. An event arriving 25 hours late won't be processed.
  • Scenario 4: Late source event - Multi-event rule (Customizable schedule)

    • You configure a multi-event rule with a first run offset of 1 hour. An event arrives 6 hours after its timestamp.
    • This event misses the first run (T + 1h) and the first true-up run (about T + 5h, 4 hours after the first run). The system won't process this event with this rule unless you enable Ensure enrichment completeness.
  • Scenario 5: Late enrichment - Multi-event rule (Customizable with enrichment completeness)

    • A multi-event rule has a 1-hour offset and you enable Ensure enrichment completeness. Enrichment data for an event arrives 28 hours after the event timestamp.
    • Because you turned on Ensure enrichment completeness, a second true-up run occurs about 30 hours after the first run (around T + 31h). If some or all of the late enrichment data is available by then, the system re-evaluates the rule using it.
  • Scenario 6: Late source event - Multi-event rule with Match Window

    • A multi-event rule has a 48-hour match window and a custom schedule with Ensure enrichment completeness enabled (final true-up around T + 30h). An event arrives 36 hours after its timestamp. This event will not be processed because it arrived after the final true-up run, even though the event time is within the rule's match window relative to other events. The cut-off is based on arrival time relative to the true-up schedule, not just the match window.
  • Scenario 7: Late source event - Windowed-single-event rule

    • If a source event arrives 8 days after its event timestamp, it falls outside the 7-day cut-off for WSE rules, so the system doesn't process it.

Impact on timing metrics

When a detection results from a rule replay, the system uses the following terminology:

  • The alert's Detection window or Event timestamp refers to the time of the original malicious activity.
  • The Created time is the time the system creates the detection, which can be much later, sometimes hours or days later.
  • Detection latency is the time difference between the Event timestamp and the detection's Created time.

High detection latency is typically caused by re-enrichment triggered by late-arriving data, or by a delayed update from a context source such as the entity context graph (ECG).

This time difference can make a detection appear "late" or "delayed", which can confuse analysts and distort performance metrics like MTTD.
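To make the effect on MTTD concrete, the following Python sketch computes detection latency as Created time minus Event timestamp and averages it. The detection times are made-up illustrative values, not measurements, and the helper names are hypothetical; the point is that a single replay detection created a day after the event dominates the mean.

```python
from datetime import datetime, timedelta, timezone


def detection_latency(event_timestamp, created_time):
    """Detection latency: Created time minus Event timestamp."""
    return created_time - event_timestamp


def mean_time_to_detect(latencies):
    """Arithmetic mean of a non-empty list of timedelta latencies."""
    return sum(latencies, timedelta()) / len(latencies)


# Made-up example: three near-real-time detections and one detection
# created by a replay run about a day after the event.
event_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
created_times = [
    event_time + timedelta(minutes=3),
    event_time + timedelta(minutes=4),
    event_time + timedelta(minutes=5),
    event_time + timedelta(hours=24),  # replay detection
]
latencies = [detection_latency(event_time, c) for c in created_times]

print(mean_time_to_detect(latencies[:3]))  # 0:04:00
print(mean_time_to_detect(latencies))      # 6:03:00
```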

  • Detection window / Event timestamp
    • Source of time: The time the original security event occurred.
    • How replays affect MTTD: Replays keep this accurate to the event time.
  • Detection time / Created time
    • Source of time: The time the engine actually emitted the detection.
    • How replays affect MTTD: A secondary (replay) run that incorporates late enrichment data causes this time to appear late or delayed relative to the Event timestamp. This delta negatively affects the MTTD calculation.

Best practices for measuring MTTD

MTTD quantifies the time from initial compromise to the effective detection of the threat. When you analyze detections triggered by rule replays, apply the following best practices to maintain accurate MTTD metrics.

Google SecOps provides several user-queryable metrics to measure MTTD accurately. For details about these metrics, see the Sample YARA-L 2.0 queries for Dashboards page.

An icon in the Detection Type column identifies detections the system generates from event data arriving more than 30 minutes late, rule reprocessing runs, or retrohunts. This icon also appears on the Alerts page in Google SecOps.
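One way to apply this flag when reporting is to compute MTTD separately for flagged and unflagged detections. The Python sketch below mirrors the three flag criteria (data more than 30 minutes late, a reprocessing run, or a retrohunt) against a hypothetical `Detection` record; the field names and `run_type` values are illustrative assumptions, not a Google SecOps export format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Threshold from the flag criteria described above.
LATE_ARRIVAL_THRESHOLD = timedelta(minutes=30)


@dataclass
class Detection:
    """Hypothetical detection record; field names are illustrative only."""
    event_time: datetime
    ingest_time: datetime
    created_time: datetime
    run_type: str = "live"  # hypothetical: "live", "reprocessing", "retrohunt"

    def is_flagged(self):
        """Mirror the icon criteria: late data, reprocessing run, or retrohunt."""
        arrived_late = self.ingest_time - self.event_time > LATE_ARRIVAL_THRESHOLD
        return arrived_late or self.run_type in ("reprocessing", "retrohunt")


def _mean_latency(detections):
    """Mean of Created time minus Event timestamp, or None if empty."""
    if not detections:
        return None
    total = sum((d.created_time - d.event_time for d in detections), timedelta())
    return total / len(detections)


def split_mttd(detections):
    """Return (MTTD of unflagged detections, MTTD of flagged detections)."""
    unflagged = [d for d in detections if not d.is_flagged()]
    flagged = [d for d in detections if d.is_flagged()]
    return _mean_latency(unflagged), _mean_latency(flagged)


if __name__ == "__main__":
    t = datetime(2024, 1, 1, 12, 0)
    detections = [
        Detection(t, t + timedelta(minutes=1), t + timedelta(minutes=4)),
        Detection(t, t + timedelta(minutes=2), t + timedelta(minutes=6)),
        # Made-up replay detection created by a reprocessing run.
        Detection(t, t + timedelta(minutes=1), t + timedelta(hours=30),
                  run_type="reprocessing"),
    ]
    on_time, flagged = split_mttd(detections)
    print(on_time, flagged)  # 0:05:00 1 day, 6:00:00
```

Reporting the two figures side by side keeps replay-driven detections visible without letting them inflate the MTTD of real-time detections.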

Prioritize real-time detection systems

For the fastest detections, use single-event rules. These rules run in near-real time, typically with a delay of less than 5 minutes.

Using single-event rules also supports more comprehensive use of Composite detections.

Account for rule replay in multi-event rules

Multi-event rules inherently incur higher latency due to their scheduled run frequency. When you measure MTTD for detections from multi-event rules, recognize that automated rule replays increase coverage and accuracy. These replays often catch threats requiring late context, which increases the reported latency for those detections.

  • For critical, time-sensitive alerting: Use single-event rules or multi-event rules with the shortest practical run frequencies. Reducing the match window doesn't directly affect latency, but it can increase efficiency by setting the minimum delay.

  • For complex, long-duration correlation (UEBA, multi-stage attacks): These rules rely on extensive contextual joins or reference lists, which might update asynchronously. They can experience high latency with late-arriving contextual or event data, but they trade absolute speed for higher-fidelity detection.

Optimize rules to reduce reliance on late enrichment

To optimize for detection speed and minimize the impact of retroactive enrichment runs, consider using non-aliased fields (fields that downstream enrichment pipelines don't process) in your rule logic where possible.
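To find where a rule depends on aliased fields, a simple text scan of the rule can help. The Python sketch below assumes a hypothetical, hand-maintained set of aliased field paths: only principal.hostname is named in this document, so you would need to extend the set from the Google SecOps enrichment documentation. The rule text is an illustrative YARA-L fragment, and the scanner is a plain regular-expression heuristic, not a YARA-L parser.

```python
import re

# Hypothetical set of aliased (enriched) UDM field paths. Only
# principal.hostname comes from this document; extend as needed.
ALIASED_FIELDS = {"principal.hostname"}

# Matches variable field references such as $e.principal.hostname or
# $udm.event.principal.hostname and captures the dotted path after the name.
FIELD_REF = re.compile(r"\$\w+((?:\.\w+)+)")


def aliased_fields_used(rule_text, aliased=ALIASED_FIELDS):
    """Return the sorted aliased field paths referenced in the rule text."""
    hits = set()
    for ref in FIELD_REF.findall(rule_text):
        path = ref.lstrip(".")
        for field in aliased:
            # Suffix match so both $e.x.y and $udm.event.x.y forms count.
            if path == field or path.endswith("." + field):
                hits.add(field)
    return sorted(hits)


# Illustrative YARA-L events section (not a complete rule).
EXAMPLE_RULE = """
events:
    $e.metadata.event_type = "USER_LOGIN"
    $e.principal.hostname = $host
    $e.target.user.userid = $user
"""

print(aliased_fields_used(EXAMPLE_RULE))  # ['principal.hostname']
```

A hit doesn't mean the rule must change; it marks the conditions whose results can shift when late source data retroactively updates the field, which is where a non-aliased alternative is worth considering.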
