The ML.CENTROIDS function
This document describes the ML.CENTROIDS function, which lets you return
information about the
centroids
in a
k-means model.
Syntax
ML.CENTROIDS( MODEL `PROJECT_ID.DATASET.MODEL`, STRUCT([, STANDARDIZE AS standardize]))
Arguments
ML.CENTROIDS takes the following arguments:
PROJECT_ID: your project ID.DATASET: the BigQuery dataset that contains the model.MODEL: the name of the model.STANDARDIZE: aBOOLvalue that specifies whether the centroid features should be standardized to assume that all features have a mean of0and a standard deviation of1. Standardizing the features allows the absolute magnitude of the values to be compared to each other. The default value isFALSE.
Output
ML.CENTROIDS returns the following columns:
trial_id: anINT64value that contains the hyperparameter tuning trial ID. This column is only returned if you ran hyperparameter tuning when creating the model.centroid_id: anINT64value that contains the centroid ID.feature: aSTRINGvalue that contains the feature column name.numerical_value: aFLOAT64value that contains the feature value for the centroid thatcentroid_ididentifies if the column identified by thefeaturevalue is numeric. Otherwise,numerical_valueisNULL.categorical_value: anARRAY<STRUCT>value that contains information about categorical features. Each struct contains the following fields:categorical_value.category: aSTRINGvalue that contains the name of each category.categorical_value.value: aSTRINGvalue that contains the value ofcategorical_value.categoryfor the centroid thatcentroid_ididentifies.
geography_value: aSTRINGvalue that contains thecategorical_value.categoryvalue for the centroid thatcentroid_ididentifies if the column identified by thefeaturevalue is of typeGEOGRAPHY. Otherwise,geography_valuevalue isNULL.
The output contains one row per feature per centroid.
Examples
The following examples show how to use ML.CENTROIDS with and without the
standardize argument.
Without standardization
Numerical features
The following example retrieves centroid information from the model
mydataset.my_kmeans_model in your default project. This model only
contains numerical features.
SELECT * FROM ML.CENTROIDS(MODEL `mydataset.my_kmeans_model`)
This query returns results like the following:
+-------------+-------------------+----------------------+---------------------+ | centroid_id | feature | numerical_value | categorical_value | +-------------+-------------------+----------------------+---------------------+ | 3 | x_coordinate | 3095929.0 | [] | | 3 | y_coordinate | 1.0089726307692308E7 | [] | | 2 | x_coordinate | 3117072.65625 | [] | | 2 | y_coordinate | 1.0083220745833334E7 | [] | | 1 | x_coordinate | 3259947.096227731 | [] | | 1 | y_coordinate | 1.0105690227895036E7 | [] | | 4 | x_coordinate | 3109887.9056603773 | [] | | 4 | y_coordinate | 1.0057112358490566E7 | [] | +-------------+-------------------+----------------------+---------------------+
Categorical features
The following example retrieves centroid information from the model
mydataset.my_kmeans_model in your default project. This model contains
categorical features.
SELECT * FROM ML.CENTROIDS(MODEL `mydataset.my_kmeans_model`) ORDER BY centroid_id;
This query returns results like the following:
+-------------+-------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| centroid_id | feature |numerical_value| categorical_value |
+-------------+-------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | department | NULL | [{"category":"Medieval Art","feature_value":"1.0"}] |
| 1 | medium | NULL | [{"category":"Iron","feature_value":"0.21602160216021601"},{"category":"Glass, ceramic","feature_value":"0.3933393339333933"},{"category":"Copper alloy","feature_value":"0.39063906390639064"}] |
| 2 | medium | NULL | [{"category":"Wood, gesso, paint","feature_value":"0.15"},{"category":"Carnelian","feature_value":"0.2692307692307692"},{"category":"Papyrus, ink","feature_value":"0.2653846153846154"},{"category":"Steatite, glazed","feature_value":"0.3153846153846154"}] |
| 2 | department | NULL | [{"category":"Egyptian Art","feature_value":"1.0"}] |
| 3 | medium | NULL | [{"category":"Faience","feature_value":"1.0"}] |
| 3 | department | NULL | [{"category":"Egyptian Art","feature_value":"1.0"}] |
| 4 | medium | NULL | [{"category":"Steatite","feature_value":"1.0"}] |
| 4 | department | NULL | [{"category":"Egyptian Art","feature_value":"1.0"}] |
| 5 | medium | NULL | [{"category":"Red quartzite","feature_value":"0.20316027088036118"},{"category":"Bronze or copper alloy","feature_value":"0.3476297968397291"},{"category":"Gold","feature_value":"0.4492099322799097"}] |
| 5 | department | NULL | [{"category":"Egyptian Art","feature_value":"1.0"}] |
+-------------+-------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Numerical and categorical features
The following are the results from the same query against a k-means model with both numerical and categorical features.
+-------------+--------------------+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| centroid_id | feature | numerical_value | categorical_value |
+-------------+--------------------+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | start_station_name | NULL | [{"category":"Toomey Rd @ South Lamar","value":"0.5714285714285714"},{"category":"State Capitol @ 14th & Colorado","value":"0.42857142857142855"}] |
| 1 | duration_minutes | 9.142857142857142 | [] |
| 2 | duration_minutes | 9.0 | [] |
| 2 | start_station_name | NULL | [{"category":"Rainey @ River St","value":"0.14285714285714285"},{"category":"11th & San Jacinto","value":"0.42857142857142855"},{"category":"ACC - West & 12th Street","value":"0.14285714285714285"},{"category":"East 11th St. at Victory Grill","value":"0.2857142857142857"}] |
+-------------+--------------------+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
With standardization
The following example retrieves centroid information from the model
mydataset.my_kmeans_model in your default project. The query in this example
assumes that all features have a mean of 0 and a standard deviation of 1.
SELECT * FROM ML.CENTROIDS(MODEL `mydataset.my_kmeans_model`, STRUCT(TRUE AS standardize))
What's next
- For more information about model weights support in BigQuery ML, see BigQuery ML model weights overview.
- For more information about supported SQL statements and functions for ML models, see End-to-end user journeys for ML models.