The ML.QUANTILE_BUCKETIZE function
This document describes the ML.QUANTILE_BUCKETIZE function, which lets you
break a continuous numerical feature into buckets based on quantiles.
When used in the
TRANSFORM clause,
the same quantiles are automatically used in prediction.
You can use this function with models that support manual feature preprocessing. For more information, see the following documents:
Syntax
ML.QUANTILE_BUCKETIZE(numerical_expression, num_buckets [, output_format]) OVER()
Arguments
ML.QUANTILE_BUCKETIZE takes the following arguments:
numerical_expression: the numerical expression to bucketize.num_buckets: anINT64value that specifies the number of buckets to splitnumerical_expressioninto.output_format: aSTRINGvalue that specifies the output format of the bucket. Valid output formats are as follows:bucket_names: returns aSTRINGvalue in the formatbin_<bucket_index>. For example,bin_3. Thebucket_indexvalue starts at 1. This is the default bucket format.bucket_ranges: returns aSTRINGvalue in the format[lower_bound, upper_bound)in interval notation. For example,(-inf, 2.5),[2.5, 4.6),[4.6, +inf).bucket_ranges_json: returns a JSON-formattedSTRINGvalue in the format{"start": "lower_bound", "end": "upper_bound"}. For example,{"start": "-Infinity", "end": "2.5"},{"start": "2.5", "end": "4.6"},{"start": "4.6", "end": "Infinity"}. The inclusivity and exclusivity of the lower and upper bound follow the same pattern as thebucket_rangesoption.
Output
ML.QUANTILE_BUCKETIZE returns a STRING value that contains the name of the bucket, in the format specified by the output_format argument.
Example
The following example breaks a numerical expression of five elements into three buckets:
SELECT f, ML.QUANTILE_BUCKETIZE(f, 3) OVER() AS bucket, ML.QUANTILE_BUCKETIZE(f, 3, "bucket_ranges") OVER() AS bucket_ranges, ML.QUANTILE_BUCKETIZE(f, 3, "bucket_ranges_json") OVER() AS bucket_ranges_json FROM UNNEST([1,2,3,4,5]) AS f ORDER BY f;
The output looks similar to the following:
+---+--------+---------------+------------------------------------+
| f | bucket | bucket_ranges | bucket_ranges_json |
|---|--------|---------------|------------------------------------|
| 1 | bin_1 | (-inf, 2) | {"start": "-Infinity", "end": "2"} |
| 2 | bin_2 | [2, 4) | {"start": "2", "end": "4"} |
| 3 | bin_2 | [2, 4) | {"start": "2", "end": "4"} |
| 4 | bin_3 | [4, +inf) | {"start": "4", "end": "Infinity"} |
| 5 | bin_3 | [4, +inf) | {"start": "4", "end": "Infinity"} |
+---+--------+---------------+------------------------------------+
What's next
- For information about feature preprocessing, see Feature preprocessing overview.