YARA-L 2.0 known issues and limitations

This document is for Detection Engineers who want to debug rule logic and optimize YARA-L 2.0 execution. It explains how to handle non-standard engine behaviors, such as field unnesting, Cartesian product expansion in aggregations, and enrichment eventual consistency. By following these methods, you can prevent logic errors that lead to inflated outcome values or missed detections.

YARA-L 2.0 uses a specific execution model where repeated fields are expanded into individual event rows during evaluation. Because this transformation happens at the engine level, referencing multiple repeated fields or performing arithmetic on unsigned UDM types requires specific syntax workarounds to avoid compiler errors or incorrect result sets. This document outlines those technical constraints and the required logic patterns to resolve them.

Before you begin

Make sure your account has the following technical entitlements before testing or modifying YARA-L 2.0 rules:

Required IAM roles

  • roles/chronicle.viewer (Security Operations Viewer): To view existing rules and detection metadata.
  • roles/chronicle.editor (Security Operations Editor): To modify rule logic and save changes.

Required permissions

  • chronicle.rules.runTest: Required to execute the Run Test feature on historical data.

  • chronicle.detections.get: To inspect the output of unnested events in the detection dashboard.

Key terminology

  • UDM (Unified Data Model): The normalized schema used to structure all ingested security telemetry across the platform.
  • Unnesting: The engine-level expansion of a single UDM event containing a repeated field (array) into multiple rows. Each row represents a unique element from the array, which can lead to row multiplication during rule evaluation.
  • T₀ (initial run): The first execution of a rule on incoming telemetry. This occurs during the "streaming" phase, often before background enrichment processes (like GeoIP or ASN true-ups) are finalized.

Outcome aggregations with repeated field unnesting

When a rule references a repeated field that contains multiple elements, the engine splits the event into separate event rows, one for each element.

For example, the two IP addresses in the repeated field target.ip on event $e are split into two instances of $e, each with a different target.ip value.

rule outbound_ip_per_app {
  meta:

  events:
    $e.principal.application = $app

  match:
    $app over 10m

  outcome:
    $outbound_ip_count = count($e.target.ip) // yields 2.

  condition:
    $e
}

Event records: Before and after unnesting

The tables in this section demonstrate how a single event containing an array of IP addresses is transformed into two distinct records.

Before unnesting

The following table shows the event record before unnesting the repeated field:

metadata.id | principal.application | target.ip
aaaaaaaaa | Google SecOps | [192.0.2.20, 192.0.2.28]

After unnesting

The following table shows the event record after unnesting the repeated field:

metadata.id | principal.application | target.ip
aaaaaaaaa | Google SecOps | 192.0.2.20
aaaaaaaaa | Google SecOps | 192.0.2.28

Nested repeated fields (Cartesian product)

When a rule references a repeated field nested within another repeated field, like security_results.actions, the unnesting occurs at both levels (parent and child) simultaneously. This results in a Cartesian product of all elements.

In the following example, an event $e with two elements in security_results, each containing two elements in actions, is unnested into four instances.

rule security_action_per_app {
  meta:

  events:
    $e.principal.application = $app

  match:
    $app over 10m

  outcome:
    $security_action_count = count($e.security_results.actions) // yields 4.

  condition:
    $e
}

Event record before nested unnesting

The original record stores the actions within a nested array structure.

metadata.id | principal.application | security_results
aaaaaaaaa | Google SecOps | [ { actions: [ ALLOW, FAIL ] }, { actions: [ CHALLENGE, BLOCK ] } ]

Event records after nested unnesting

After expansion, each unique action becomes its own row, which can lead to unexpected counts in non-distinct aggregations.

metadata.id | principal.application | security_results.actions
aaaaaaaaa | Google SecOps | ALLOW
aaaaaaaaa | Google SecOps | FAIL
aaaaaaaaa | Google SecOps | CHALLENGE
aaaaaaaaa | Google SecOps | BLOCK

Impact on unrelated fields

This unnesting behavior in rule evaluation can produce unexpected outcome aggregations when the rule references one or more repeated fields with a parent field that is also a repeated field. Non-distinct aggregations like sum(), array(), and count() can't account for duplicate values on other fields on the same event produced by the unnesting behavior.

In the following example, event $e has a single hostname (google.com), but the $hostnames outcome aggregates over four unnested instances of the same event $e, each with a duplicate principal.hostname value. This outcome yields four hostnames (instead of one) because of the unnesting of repeated values on security_results.action.

rule security_action_per_app {
  meta:

  events:
    $e.principal.application = $app

  match:
    $app over 10m

  outcome:
    $hostnames = array($e.principal.hostname) // yields 4.
    $security_action_count = count($e.security_results.action) // yields 4.

  condition:
    $e
}

Event record before unnesting with unrelated fields

The hostname is a single value, but it sits alongside the repeated security results.

metadata.id | principal.application | principal.hostname | security_results
aaaaaaaaa | Google SecOps | google.com | [ { action: [ ALLOW, FAIL ] }, { action: [ CHALLENGE, BLOCK ] } ]

Event record after unnesting with unrelated fields

The hostname is now duplicated across four rows, causing the array() function to pick it up four times.

metadata.id | principal.application | principal.hostname | security_results.action
aaaaaaaaa | Google SecOps | google.com | ALLOW
aaaaaaaaa | Google SecOps | google.com | FAIL
aaaaaaaaa | Google SecOps | google.com | CHALLENGE
aaaaaaaaa | Google SecOps | google.com | BLOCK

Workaround for unnesting behavior

To make sure your outcome values are accurate when unnesting occurs, use the distinct version of your selected aggregation. The following functions ignore the duplicate rows created by unnesting:

  • max()
  • min()
  • array_distinct()
  • count_distinct()
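
As a sketch of this workaround, the rule from the "Impact on unrelated fields" example can be rewritten with distinct aggregations. The rule name is hypothetical, and the field paths are reused unchanged from that example, so confirm them against your UDM schema before saving the rule.

```
rule security_action_per_app_distinct {
  meta:

  events:
    $e.principal.application = $app

  match:
    $app over 10m

  outcome:
    // array_distinct() keeps one entry per unique hostname, so the
    // duplicated rows created by unnesting no longer inflate the array.
    $hostnames = array_distinct($e.principal.hostname)
    // count_distinct() counts each unique action value once.
    $security_action_count = count_distinct($e.security_results.action)

  condition:
    $e
}
```

Note that count_distinct() counts unique values rather than rows. If the same action value can legitimately appear more than once in your data, the distinct count is lower than the number of occurrences, so it isn't always a drop-in replacement for count().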

Outcome aggregations with multiple event variables

If a rule contains multiple event variables, the aggregation contains a separate item for each combination of events that is included in the detection. For example, consider the following rule and the three events listed after it:

events:
  $e1.field = $e2.field
  $e2.somefield = $ph

match:
  $ph over 1h

outcome:
  $some_outcome = sum(if($e1.otherfield = "value", 1, 0))

condition:
  $e1 and $e2

event1:
  // UDM event 1
  field="a"
  somefield="d"

event2:
  // UDM event 2
  field="b"
  somefield="d"

event3:
  // UDM event 3
  field="c"
  somefield="d"

The sum is calculated over every combination of events, which lets you use both event variables in the outcome value calculations. The following combinations are used in the calculation:

1: $e1 = event1, $e2 = event2
2: $e1 = event1, $e2 = event3
3: $e1 = event2, $e2 = event1
4: $e1 = event2, $e2 = event3
5: $e1 = event3, $e2 = event1
6: $e1 = event3, $e2 = event2

This results in a potential maximum sum of 6, even though $e2 can only correspond to 3 distinct events.

This affects sum, count, and array. For count and array, using count_distinct or array_distinct can solve the issue, but there is no workaround for sum.
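
For instance, to count how many distinct events were bound to $e2 instead of how many ($e1, $e2) combinations matched, one option is to aggregate over a field that identifies each event. This is a minimal sketch that assumes metadata.id is populated and unique for each event in your data; the outcome variable names are hypothetical.

```
outcome:
  // Counts each $e2 event once, no matter how many $e1 events it pairs with.
  $distinct_e2_events = count_distinct($e2.metadata.id)
  // Collects each distinct somefield value once across all combinations.
  $somefield_values = array_distinct($e2.somefield)
```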

Parentheses at the start of an expression

Beginning an expression with parentheses is unsupported and triggers a parsing error in the rule editor.

Invalid syntax

Expressions that begin with a parenthesis produce errors like the following:

parsing: error with token: ")"
invalid operator in events predicate

The following example generates this type of error:

($event.metadata.ingested_timestamp.seconds -
$event.metadata.event_timestamp.seconds) / 3600 > 1

Valid syntax variations

The following syntax variations return the same result, but with valid syntax:

$event.metadata.ingested_timestamp.seconds / 3600 -
$event.metadata.event_timestamp.seconds / 3600 > 1

1 / 3600 * ($event.metadata.ingested_timestamp.seconds -
$event.metadata.event_timestamp.seconds) > 1

1 < ($event.metadata.ingested_timestamp.seconds -
$event.metadata.event_timestamp.seconds) / 3600

Index array in outcome requires aggregation

Directly indexing a repeated field within the outcome section isn't permitted. It requires a temporary placeholder variable. For example, the following outcome assignment is invalid:

outcome:
  $principal_user_dept = $suspicious.principal.user.department[0]

Workaround

Capture the specific array index into a placeholder variable within the events section, then reference that placeholder in your outcome.

events:
  $principal_user_dept = $suspicious.principal.user.department[0]

outcome:
  $principal_user_department = $principal_user_dept

OR condition with non-existence

If you apply an OR condition between two separate event variables and the rule matches on non-existence, the rule compiles successfully but can produce false positive detections.

For example, the following rule syntax can match events having $event_a.field = "something" even though it shouldn't:

events:
  not ($event_a.field = "something" or $event_b.field = "something")

condition:
  $event_a and #event_b >= 0

Workaround

Separate the non-existence checks into individual blocks for each variable to maintain logic integrity.

events:
  not ($event_a.field = "something")
  not ($event_b.field = "something")

condition:
  $event_a and #event_b >= 0

Arithmetic with unsigned event fields

If you try to use an integer constant in an arithmetic operation with a UDM field whose type is an unsigned integer, you will get an error. For example:

events:
  $total_bytes = $e.network.received_bytes * 2

Standard integer constants default to signed integers, which are incompatible with UDM fields defined as unsigned integers, like network.received_bytes.

Workaround

You can bypass this error by forcing the integer constant to behave as a float through a division operation.

events:
  $total_bytes = $e.network.received_bytes * (2/1)

GeoIP enrichment and eventual consistency

The system prioritizes speed over immediate accuracy in the initial enrichment stages (Streaming and Latency-Sensitive), which can lead to missing data and potential false positives. The system continues to enrich the data in the background, but the data may not be available when the rule is run. This is part of the normal eventual consistency process.

To prevent false positives caused by enrichment lag, explicitly check that the field is not empty before evaluating its value.

For example, consider this rule event:

$e.principal.ip_geo_artifact.network.asn = "16509" and
$e.principal.ip_geo_artifact.location.country_or_region = "United Kingdom"

The rule requires both the ASN and the country_or_region values to match, and both are enriched fields. If the enrichment isn't completed in time, the rule can produce a false positive.

To avoid this, a better check for this rule would be:

$e.principal.ip_geo_artifact.network.asn != "" and
$e.principal.ip_geo_artifact.network.asn = "16509" and
$e.principal.ip_geo_artifact.location.country_or_region != "" and
$e.principal.ip_geo_artifact.location.country_or_region = "United Kingdom"

This rule eliminates the possibility of the event being triggered by IPs with the ASN 16509 but located outside the UK. This improves the overall precision of the rule.

To learn how to troubleshoot enrichment lag, see Enrichment lag in the Troubleshooting section.

Troubleshooting

This section outlines performance expectations and provides self-service fixes for common issues where live detection behavior differs from test results.

Future-dated events

Multi-event rules are designed to process events in chronological order relative to ingestion. An active multi-event rule doesn't create detections for events with future timestamps, for example, when metadata.event_timestamp is set to a date and time later than metadata.ingested_timestamp.
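
To check whether your data contains future-dated events, one option is a predicate in a single-event test rule that compares the two timestamps. This is a sketch only: it assumes the same metadata timestamp fields used in the parentheses examples earlier in this document, and it places the subtraction first so that the expression doesn't begin with a parenthesis.

```
events:
  // Matches events whose event time is later than their ingestion time.
  $e.metadata.event_timestamp.seconds - $e.metadata.ingested_timestamp.seconds > 0
```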

Enrichment lag

Google SecOps prioritizes ingestion speed to expose initial alerts as quickly as possible. However, background enrichment processes, such as resolving GeoIP, ASN, or UDM metadata, follow an eventual consistency model.

Initial run (T₀)

The live engine may evaluate a rule before background enrichment is complete. Depending on whether your logic relies on enriched fields for detections or exclusions, this can lead to the following temporary discrepancies:

  • False negatives (detection lag): This is a common result. If a rule depends on an enriched field to trigger (for example, target.user.department = "Finance"), and that field is null, the rule doesn't match during the initial run.

  • False positives (exclusion miss): If your rule uses enriched fields to filter out known-good activity (for example, not target.ip_geo_country = "US"), the rule may trigger a false positive because the "exclusion" data hasn't been applied yet.
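
To illustrate the exclusion case, the following events-section sketch guards a country-based exclusion with a non-empty check, so that the exclusion is evaluated only after the GeoIP field is populated. The field path follows the ip_geo_artifact examples earlier in this document, and the "United States" value is an assumption chosen for illustration.

```
events:
  // Skip events whose GeoIP enrichment hasn't been applied yet.
  $e.principal.ip_geo_artifact.location.country_or_region != ""
  // Exclusion: ignore traffic from the expected country.
  $e.principal.ip_geo_artifact.location.country_or_region != "United States"
```

With the guard in place, an unenriched event doesn't match during the initial run, and the exclusion is applied against the populated field during a later true-up run.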

True-up runs

These background runs re-evaluate the data after a delay (for example, 45 minutes or 30 hours). This "trues up" the detection states as follows:

  • Late detections: Events that were "false negatives" at T₀ now produce a detection once the enrichment is finalized.

  • Correction: Any T₀ false positives remain in the system, but the fully enriched data is visible in the UDM viewer for manual triaging.

Run test discrepancy

The Run Test tool operates on historical data that has already been reconciled. Because the data is fully enriched by the time you run a manual test, you see the "true-up" results immediately. This means you won't see the T₀ false negatives or exclusion-based false positives that occurred during the live initial run.

Error remediation

Use the following table to resolve discrepancies between live alerts and test results.

Issue | Description | Actionable fix
Exclusion failure | A rule fires despite an exclusion (for example, != "ASN_123") because the field was null during the initial run. | Add a non-empty check to the events section to make sure data is enriched before evaluation, for example: $e.principal.ip_geo_artifact.network.asn != ""
Live compared to test match | Live rules trigger alerts, but Run Test on the same data shows "No Results". | Add a $e.field != "" check for each enriched field (GeoIP, ASN, File Path) to synchronize live and historical behavior.
Missing metadata | Detections appear in the dashboard with empty GeoIP or File Path fields. | This is expected for T₀ runs. To fix, include a field != "" check or increase the first run offset in your run schedule to allow more time for ingestion.

Validation and testing

To verify that a rule is correctly handling delayed enrichment, do the following:

  1. Identify the lag: Locate a detection you believe is a false positive. In the Detection Type column, check for the lightbulb icon. Alerts without this icon are from the initial run, where enrichment lag is most common.

  2. Update the rule logic: Add a field != "" check for all enriched data points used in your logic.
    Example (file path):
    $e.target.process.parent_process.file.full_path != ""

  3. Test and verify:

    • Use the Run Test feature to make sure your logic still matches the intended historical data.
    • Verify that the rule now only triggers (or correctly excludes) during the true-up runs once the enrichment fields are populated.

For more details, see Manage your rule run schedule and Configure customized schedules for rules.
