sample_rate
optimization.sample_rate(byteOrString, rateNumerator, rateDenominator)
Description
This function determines whether to include an event based on a deterministic sampling strategy. This function returns:
- true for a fraction of input values roughly equal to rateNumerator/rateDenominator, indicating that the event should be included in the sample.
- false, indicating that the event shouldn't be included in the sample.
This function is useful for optimization scenarios where you want to process only a subset of events. It is equivalent to:
hash.fingerprint2011(byteOrString) % rateDenominator < rateNumerator
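The following minimal Python sketch makes the equivalence above concrete. Python does not provide hash.fingerprint2011, so the sketch substitutes a 64-bit hash derived from hashlib.blake2b. The helper names fingerprint64 and sample_rate, and the choice to encode strings as UTF-8 before hashing, are assumptions for illustration. Because the hash differs, the individual true/false decisions will not match the real function. The structure does carry over: the same input always produces the same decision, and across many distinct inputs roughly rateNumerator/rateDenominator of them are kept.

```python
import hashlib
from typing import Union


def fingerprint64(value: Union[bytes, str]) -> int:
    """Hypothetical stand-in for hash.fingerprint2011: an unsigned 64-bit hash.

    Assumption: STRING inputs are encoded as UTF-8 before hashing.
    """
    data = value.encode("utf-8") if isinstance(value, str) else value
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sample_rate(value: Union[bytes, str], rate_numerator: int, rate_denominator: int) -> bool:
    """Keep the value when hash % denominator < numerator, mirroring the equivalence above."""
    return fingerprint64(value) % rate_denominator < rate_numerator


if __name__ == "__main__":
    ids = [f"event-{i}" for i in range(100_000)]
    kept = sum(sample_rate(event_id, 1, 5) for event_id in ids)
    print(f"kept {kept} of {len(ids)} (~{kept / len(ids):.3f}, target 0.200)")
    # Deterministic: the same ID always gets the same answer.
    print(sample_rate("event-42", 1, 5) == sample_rate("event-42", 1, 5))
```

Because the decision depends only on the hashed value, re-running a rule over the same events selects the same sample. A random number generator would not guarantee that.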
Param data types
- byteOrString: Expression that evaluates to either a BYTE or STRING.
- rateNumerator: 'INT'
- rateDenominator: 'INT'
Return type
BOOL
Code sample
events:
    $e.metadata.event_type = "NETWORK_CONNECTION"
    $asset_id = $e.principal.asset.asset_id
    optimization.sample_rate($e.metadata.id, 1, 5) // Keep roughly 1 out of every 5 events

match:
    $asset_id over 1h

outcome:
    $event_count = count_distinct($e.metadata.id)
    // Estimate the usage by multiplying by the inverse of the sample rate
    $usage_past_hour = sum(5.0 * $e.network.sent_bytes)

condition:
    // Requiring a minimum number of sampled events avoids bias (e.g. a
    // device with just 1 connection will still show up 20% of the time, and
    // multiplying that traffic by 5 gives an incorrect estimate)
    $e and ($usage_past_hour > 1000000000) and $event_count >= 100
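A small simulation can illustrate the scaling in the outcome section and the minimum-count guard in the condition. The sketch below is hypothetical. It uses a blake2b-based hash as a stand-in for hash.fingerprint2011, invents per-event byte counts, and defines an illustrative helper, estimate_usage, that scales the sampled sum by rateDenominator/rateNumerator, mirroring sum(5.0 * $e.network.sent_bytes). For an asset with many events, the estimate lands close to the true total. An asset with a single event either drops out entirely or has its traffic multiplied by 5. This is why the rule also requires $event_count >= 100.

```python
import hashlib
from typing import List, Tuple, Union


def sample_rate(value: Union[bytes, str], rate_numerator: int, rate_denominator: int) -> bool:
    """Deterministic hash-based sampling; blake2b stands in for hash.fingerprint2011 (assumption)."""
    data = value.encode("utf-8") if isinstance(value, str) else value
    h = int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")
    return h % rate_denominator < rate_numerator


def estimate_usage(events: List[Tuple[str, int]], numerator: int = 1,
                   denominator: int = 5) -> Tuple[int, float]:
    """Return (sampled event count, estimated total bytes) for one asset.

    events holds hypothetical (event_id, sent_bytes) pairs. The estimate multiplies
    the sampled sum by the inverse of the sample rate, as the rule's outcome does.
    """
    sampled = [sent for event_id, sent in events
               if sample_rate(event_id, numerator, denominator)]
    scale = denominator / numerator
    return len(sampled), scale * sum(sampled)


if __name__ == "__main__":
    # Hypothetical busy asset: 10,000 connections of 1,000 bytes each (true total 10,000,000).
    busy = [(f"busy-{i}", 1_000) for i in range(10_000)]
    count, estimate = estimate_usage(busy)
    print(f"busy asset: {count} sampled events, estimated {estimate:,.0f} bytes")

    # Hypothetical quiet asset: one 1,000-byte connection. The estimate is 0 or 5,000, never 1,000.
    count, estimate = estimate_usage([("quiet-0", 1_000)])
    print(f"quiet asset: {count} sampled events, estimated {estimate:,.0f} bytes")
```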