The ML.HASH_BUCKETIZE function
This document describes the ML.HASH_BUCKETIZE function, which lets you
convert a string expression to a deterministic hash and then bucketize it by the
modulo value of that hash.
You can use this function with models that support manual feature preprocessing.
Syntax
ML.HASH_BUCKETIZE(string_expression, hash_bucket_size)
Arguments
ML.HASH_BUCKETIZE takes the following arguments:
- string_expression: the STRING expression to bucketize.
- hash_bucket_size: an INT64 value that specifies the number of buckets to create. This value must be greater than or equal to 0. If hash_bucket_size equals 0, the function only hashes the string without bucketizing the hashed value.
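To see how hash_bucket_size changes the result, the following sketch calls the function on the same inputs with 0 and with 3. The column aliases hash_only and bucket are illustrative names. The exact hash values depend on the function's internal hash, so no output is shown here.

```sql
-- Compare hashing without bucketizing (size 0) against bucketizing
-- into three buckets (size 3) for the same input strings.
SELECT
  f,
  ML.HASH_BUCKETIZE(f, 0) AS hash_only,  -- hash of the string, not bucketized
  ML.HASH_BUCKETIZE(f, 3) AS bucket      -- bucket identifier for 3 buckets
FROM UNNEST(['a', 'b', 'c', 'd']) AS f;
```

Because the hash is deterministic, rerunning the query on the same strings should return the same values in both columns.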
Output
ML.HASH_BUCKETIZE returns an INT64 value that identifies the bucket.
Example
The following example bucketizes string expressions into three buckets:
SELECT
  f,
  ML.HASH_BUCKETIZE(f, 3) AS bucket
FROM UNNEST(['a', 'b', 'c', 'd']) AS f;
The output looks similar to the following:
+---+--------+
| f | bucket |
+---+--------+
| a | 0      |
+---+--------+
| b | 1      |
+---+--------+
| c | 1      |
+---+--------+
| d | 2      |
+---+--------+
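For manual feature preprocessing, one possible pattern is to call the function inside the TRANSFORM clause of a CREATE MODEL statement. The following is a minimal sketch, not a recommended configuration: the dataset, model, and table names, the city and label columns, and the bucket count of 100 are all hypothetical. The cast to STRING is an optional design choice, made on the assumption that you want the model to treat the bucket as a categorical feature rather than as a number.

```sql
-- Hypothetical names throughout: mydataset.city_model,
-- mydataset.training_data, and the columns city and label.
CREATE OR REPLACE MODEL `mydataset.city_model`
  TRANSFORM(
    -- Hash the high-cardinality string column into 100 buckets.
    -- The function returns INT64; the cast makes the bucket a
    -- string so it can be handled as a category.
    CAST(ML.HASH_BUCKETIZE(city, 100) AS STRING) AS city_bucket,
    label
  )
  OPTIONS(
    model_type = 'LOGISTIC_REG',
    input_label_cols = ['label']
  )
AS
SELECT city, label
FROM `mydataset.training_data`;
```

A smaller bucket count makes it more likely that different strings share a bucket, so the count is a trade-off between feature dimensionality and collisions.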
What's next
- For information about feature preprocessing, see Feature preprocessing overview.