The ML.BUCKETIZE function
This document describes the ML.BUCKETIZE function, which lets you split
a numerical expression into buckets.
You can use this function with models that support manual feature preprocessing. For more information, see the following documents:
Syntax
ML.BUCKETIZE(numerical_expression, array_split_points [, exclude_boundaries] [, output_format])
Arguments
ML.BUCKETIZE takes the following arguments:
- numerical_expression: the numerical expression to bucketize.
- array_split_points: an array of numerical values that provide the points at which to split the- numerical_expressionvalue. The numerical values in the array must be finite, so not- -inf,- inf, or- NaN. Provide the numerical values in order, lowest to highest. The range of possible buckets is determined by the upper and lower boundaries of the array. For example, if the- array_split_pointsvalue is- [1, 2, 3, 4], then there are five potential buckets that the- numerical_expressionvalue can be bucketized into.
- exclude_boundaries: a- BOOLvalue that determines whether the upper and lower boundaries from- array_split_pointsare used. If- TRUE, then the boundary values aren't used to create buckets. For example, if the- array_split_pointsvalue is- [1, 2, 3, 4]and- exclude_boundariesis- TRUE, then there are three potential buckets that the- numerical_expressionvalue can be bucketized into. The default value is- FALSE.
- output_format: a- STRINGvalue that specifies the output format of the bucket. Valid output formats are as follows:- bucket_names: returns a- STRINGvalue in the format- bin_<bucket_index>. For example,- bin_3. The- bucket_indexvalue starts at 1. This is the default bucket format.
- bucket_ranges: returns a- STRINGvalue in the format- [lower_bound, upper_bound)in interval notation. For example,- (-inf, 2.5),- [2.5, 4.6),- [4.6, +inf).
- bucket_ranges_json: returns a JSON-formatted- STRINGvalue in the format- {"start": "lower_bound", "end": "upper_bound"}. For example,- {"start": "-Infinity", "end": "2.5"},- {"start": "2.5", "end": "4.6"},- {"start": "4.6", "end": "Infinity"}. The inclusivity and exclusivity of the lower and upper bound follow the same pattern as the- bucket_rangesoption.
 
Output
ML.BUCKETIZE returns a STRING value that contains the name of the bucket, in the format specified by the output_format argument.
Example
The following example bucketizes a numerical expression both with and without boundaries:
SELECT ML.BUCKETIZE(2.5, [1, 2, 3]) AS bucket, ML.BUCKETIZE(2.5, [1, 2, 3], TRUE) AS bucket_without_boundaries, ML.BUCKETIZE(2.5, [1, 2, 3], FALSE, "bucket_ranges") AS bucket_ranges, ML.BUCKETIZE(2.5, [1, 2, 3], FALSE, "bucket_ranges_json") AS bucket_ranges_json;
The output looks similar to the following:
+--------+---------------------------+---------------+----------------------------+
| bucket | bucket_without_boundaries | bucket_ranges | bucket_ranges_json         |
|--------|---------------------------|---------------|----------------------------|
| bin_3  | bin_2                     | [2, 3)        | {"start": "2", "end": "3"} |
+--------+---------------------------+---------------+----------------------------+
What's next
- For information about feature preprocessing, see Feature preprocessing overview.