The ML.WEIGHTS function
This document describes the ML.WEIGHTS function, which lets you see the
underlying weights that a model uses during prediction. This function applies to
linear and logistic regression models
and
matrix factorization models.
For matrix factorization models, you can use the
ML.GENERATE_EMBEDDING function
as an alternative to the ML.WEIGHTS function.
ML.GENERATE_EMBEDDING returns the same factor weights and intercept data as
ML.WEIGHTS, but as an array in a single column rather than in two columns.
Having all of the embeddings in a single column lets you use the
VECTOR_SEARCH function
directly on the ML.GENERATE_EMBEDDING output.
Syntax
ML.WEIGHTS(
  MODEL `PROJECT_ID.DATASET.MODEL`
  [, STRUCT(STANDARDIZE AS standardize)])
Arguments
ML.WEIGHTS takes the following arguments:
- PROJECT_ID: your project ID.
- DATASET: the BigQuery dataset that contains the model.
- MODEL: the name of the model.
- STANDARDIZE: a BOOL value that specifies whether to return weights standardized as if all features had a mean of 0 and a standard deviation of 1. Standardized weights let you compare the absolute magnitudes of weights across features. The default value is FALSE. This argument applies only to linear and logistic regression models.
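The reason standardized weights are comparable can be seen with a little algebra. The following is a sketch, assuming each numerical feature x_j is standardized as z_j = (x_j - μ_j) / σ_j, where μ_j and σ_j are its mean and standard deviation in the training data; these symbols are introduced here for illustration only. For a linear model, or the logit of a logistic model, the same prediction can be written with raw or standardized features:

```latex
\hat{y} = b + \sum_j w_j x_j
        = \Big(b + \sum_j w_j \mu_j\Big) + \sum_j (\sigma_j w_j)\, z_j,
\qquad w'_j = \sigma_j w_j
```

Each standardized weight w'_j measures the change in the prediction per standard deviation of its feature, so magnitudes of w'_j can be compared across features, whereas the raw w_j depend on each feature's units.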
Output
ML.WEIGHTS has different output columns for different model types.
Linear and logistic regression models
For linear and logistic regression models, ML.WEIGHTS returns the
following columns:
- trial_id: an INT64 value that contains the hyperparameter tuning trial ID. This column is only returned if you ran hyperparameter tuning when creating the model.
- processed_input: a STRING value that contains the name of the feature input column. The value of this column matches the name of the feature column provided in the query_statement clause that was used when the model was trained.
- weight: if the column identified by the processed_input value is numerical, weight contains a FLOAT64 value and the category_weights column contains NULL values. If the column identified by the processed_input value is non-numerical and has been converted to one-hot encoding, the weight column is NULL and the category_weights column contains the category names and weights for each category.
- category_weights.category: a STRING value that contains the category name if the column identified by the processed_input value is non-numerical.
- category_weights.weight: a FLOAT64 value that contains the category's weight if the column identified by the processed_input value is non-numerical.
- class_label: a STRING value that contains the label for a given weight. This column is only used for multiclass models, where the output includes one row per <class_label, processed_input> combination.
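Because numerical features populate weight while one-hot encoded features populate category_weights, a common first step is to look only at the numerical rows. The following query is a sketch against the hypothetical model mydataset.mymodel: it requests standardized weights so that magnitudes are comparable, and sorts features by absolute weight. Depending on the model, the output can also contain an intercept row or, for multiclass models, one row per class label, so you may want to filter or group further.

```sql
-- Sketch: rank numerical input features by the magnitude of their
-- standardized weight. mydataset.mymodel is a hypothetical model name.
SELECT
  processed_input,
  weight
FROM
  ML.WEIGHTS(MODEL `mydataset.mymodel`, STRUCT(TRUE AS standardize))
WHERE
  -- One-hot encoded (categorical) features have a NULL weight;
  -- their weights are in category_weights instead.
  weight IS NOT NULL
ORDER BY
  ABS(weight) DESC
```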
If you used the
TRANSFORM clause
in the CREATE MODEL statement that created the model, ML.WEIGHTS outputs
the weights of the TRANSFORM output features. As with models created without
TRANSFORM, the weights are denormalized by default, and you can set the
standardize argument to TRUE to get standardized weights.
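For example, the following sketch trains a linear regression model whose TRANSFORM clause scales a numerical column, and then inspects its weights. The table mydataset.mytable and the columns f1 and label are hypothetical. Because ML.WEIGHTS reports TRANSFORM output features, the weight row is expected to be labeled f1_scaled rather than f1.

```sql
-- Sketch: train with a TRANSFORM clause. mydataset.mytable, f1, and label
-- are hypothetical names.
CREATE OR REPLACE MODEL `mydataset.mymodel`
  TRANSFORM(
    ML.STANDARD_SCALER(f1) OVER () AS f1_scaled,  -- TRANSFORM output feature
    label)
  OPTIONS (
    model_type = 'linear_reg',
    input_label_cols = ['label'])
AS
SELECT f1, label
FROM `mydataset.mytable`;

-- The weights are reported for f1_scaled, not for the raw column f1.
SELECT processed_input, weight
FROM ML.WEIGHTS(MODEL `mydataset.mymodel`);
```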
Matrix factorization models
For matrix factorization models, ML.WEIGHTS returns the following columns:
- trial_id: an INT64 value that contains the hyperparameter tuning trial ID. This column is only returned if you ran hyperparameter tuning when creating the model.
- processed_input: a STRING value that contains the name of the user or item column. The value of this column matches the name of the user or item column provided in the query_statement clause that was used when the model was trained.
- feature: a STRING value that contains the name of the specific user or item used during training.
- factor_weights: an ARRAY<STRUCT> value that contains the factors and the weights for each factor.
- factor_weights.factor: an INT64 value that contains the latent factor from training. This value ranges from 1 to the value of the NUM_FACTORS option.
- factor_weights.weight: a FLOAT64 value that contains the weight of the respective factor and feature.
- intercept: a FLOAT64 value that contains the intercept or bias term for a feature.
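Because factor_weights is an array of (factor, weight) structs, it is sometimes convenient to collapse it into a plain vector per user or item. The following is a sketch against a hypothetical model mydataset.my_mf_model: it orders the weights by factor number, keeps intercept as a separate column, and skips the global intercept row, whose processed_input is NULL. It builds a vector by hand from ML.WEIGHTS output; it is not guaranteed to match the exact layout that ML.GENERATE_EMBEDDING returns.

```sql
-- Sketch: one ARRAY<FLOAT64> of factor weights per user or item.
-- mydataset.my_mf_model is a hypothetical model name.
SELECT
  processed_input,
  feature,
  ARRAY(
    SELECT fw.weight
    FROM UNNEST(factor_weights) AS fw
    ORDER BY fw.factor  -- position k in the array holds factor k + 1
  ) AS factor_vector,
  intercept
FROM
  ML.WEIGHTS(MODEL `mydataset.my_mf_model`)
WHERE
  processed_input IS NOT NULL  -- skip the global intercept row
```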
There is an additional row in the output that contains the
global__intercept__ value calculated from the input data. This row has NULL
values for the processed_input and factor_weights columns. For
implicit feedback
models, global__intercept__ is always 0.
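To see how these pieces can fit together, the following sketch recombines them into a score for one user-item pair. It assumes the conventional biased matrix factorization form (global intercept + user intercept + item intercept + dot product of the two factor vectors), which this page does not state, and it assumes the global value is stored in the intercept column of the row whose processed_input is NULL. The model mydataset.my_mf_model, the columns user_id and item_id, and the values '42' and '7' are hypothetical, and the model is assumed to have been trained without hyperparameter tuning so that each CTE returns one row. Compare the result with ML.RECOMMEND output before relying on it.

```sql
-- Sketch only: assumes score = global intercept + user intercept
--   + item intercept + dot(user factors, item factors).
-- mydataset.my_mf_model, user_id, item_id, '42', and '7' are hypothetical.
WITH
  weights AS (
    SELECT * FROM ML.WEIGHTS(MODEL `mydataset.my_mf_model`)
  ),
  user_row AS (
    SELECT factor_weights, intercept
    FROM weights
    WHERE processed_input = 'user_id' AND feature = '42'
  ),
  item_row AS (
    SELECT factor_weights, intercept
    FROM weights
    WHERE processed_input = 'item_id' AND feature = '7'
  ),
  global_row AS (
    -- The global__intercept__ row has a NULL processed_input.
    SELECT intercept
    FROM weights
    WHERE processed_input IS NULL
  )
SELECT
  g.intercept + u.intercept + it.intercept
    + (
      -- Dot product: multiply weights that share the same factor number.
      SELECT SUM(uf.weight * itf.weight)
      FROM UNNEST(u.factor_weights) AS uf
      JOIN UNNEST(it.factor_weights) AS itf
        ON uf.factor = itf.factor
    ) AS reconstructed_score
FROM user_row AS u, item_row AS it, global_row AS g
```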
Examples
The following examples show how to use ML.WEIGHTS with and without the
standardize argument.
Without standardization
The following example retrieves weight information from mymodel in
mydataset. The dataset is in your default project. It returns the weights
that are associated with each one-hot encoded category for the input column
input_col.
SELECT
  category,
  weight
FROM
  UNNEST((
    SELECT
      category_weights
    FROM
      ML.WEIGHTS(MODEL `mydataset.mymodel`)
    WHERE
      processed_input = 'input_col'))
This query uses the UNNEST operator because the category_weights column is a
nested, repeated column.
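The query above extracts the categories for a single column. To flatten the category weights of every one-hot encoded column at once, you can instead cross join each row with its own category_weights array; rows for numerical columns drop out because their category_weights value is NULL. A sketch, using the same hypothetical mydataset.mymodel:

```sql
-- Sketch: one output row per (input column, category) pair.
SELECT
  w.processed_input,
  cw.category,
  cw.weight
FROM
  ML.WEIGHTS(MODEL `mydataset.mymodel`) AS w
CROSS JOIN
  UNNEST(w.category_weights) AS cw  -- NULL arrays produce no rows
ORDER BY
  w.processed_input,
  cw.weight DESC
```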
With standardization
The following example retrieves weight information from mymodel in
mydataset. The dataset is in your default project. It retrieves standardized
weights, which assume all features have a mean of 0 and a standard deviation
of 1.
SELECT * FROM ML.WEIGHTS(MODEL `mydataset.mymodel`, STRUCT(true AS standardize))
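To see how standardization changes each numerical weight, you can join the two variants of the output. This is a sketch that assumes mydataset.mymodel is not a multiclass model and was trained without hyperparameter tuning, so that processed_input alone identifies each row; otherwise, also join on class_label or trial_id.

```sql
-- Sketch: raw and standardized weights side by side for numerical features.
SELECT
  processed_input,
  raw_w.weight AS raw_weight,
  std_w.weight AS standardized_weight
FROM
  ML.WEIGHTS(MODEL `mydataset.mymodel`) AS raw_w
JOIN
  ML.WEIGHTS(MODEL `mydataset.mymodel`, STRUCT(TRUE AS standardize)) AS std_w
USING (processed_input)
WHERE
  raw_w.weight IS NOT NULL  -- skip one-hot encoded features
ORDER BY
  ABS(std_w.weight) DESC
```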
What's next
- For more information about model weights support in BigQuery ML, see BigQuery ML model weights overview.
- For more information about supported SQL statements and functions for ML models, see End-to-end user journeys for ML models.