The ML.ROC_CURVE function
This document describes the ML.ROC_CURVE function, which you can use to
evaluate metrics that are specific to binary classification models.
Syntax
ML.ROC_CURVE(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }
  [, GENERATE_ARRAY(THRESHOLDS)]
  [, STRUCT(TRIAL_ID AS trial_id)])
Arguments
ML.ROC_CURVE takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL_NAME: the name of the model.
- TABLE: the name of the input table that contains the evaluation data.

  If TABLE is specified, the input column names in the table must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name that's provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column that's named label in the training data is used.

  If you don't specify either TABLE or QUERY_STATEMENT, ML.ROC_CURVE computes the curve results as follows:

  - If the data is split during training, the split evaluation data is used to compute the curve results.
  - If the data is not split during training, the entire training input is used to compute the curve results.
- QUERY_STATEMENT: a GoogleSQL query that is used to generate the evaluation data. For the supported SQL syntax of the QUERY_STATEMENT clause in GoogleSQL, see Query syntax.

  If QUERY_STATEMENT is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column named label in the training data is used. Extra columns are ignored.

  If you used the TRANSFORM clause in the CREATE MODEL statement that created the model, then only the input columns present in the TRANSFORM clause must appear in QUERY_STATEMENT.

  If you don't specify either TABLE or QUERY_STATEMENT, ML.ROC_CURVE computes the curve results as follows:

  - If the data is split during training, the split evaluation data is used to compute the curve results.
  - If the data is not split during training, the entire training input is used to compute the curve results.
- THRESHOLDS: an ARRAY<FLOAT64> value that specifies the percentile values of the prediction output, supplied by the GENERATE_ARRAY function.
- TRIAL_ID: an INT64 value that identifies the hyperparameter tuning trial that you want the function to evaluate. The function uses the optimal trial by default. Only specify this argument if you ran hyperparameter tuning when creating the model; see the sketch after this list.
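For example, to evaluate a specific trial of a hyperparameter-tuned model, pass the trial number in the STRUCT argument. This is a minimal sketch; the model name mydataset.mytunedmodel and trial number 3 are hypothetical placeholders:

SELECT
  *
FROM
  ML.ROC_CURVE(
    MODEL `mydataset.mytunedmodel`,  -- hypothetical hyperparameter-tuned model
    TABLE `mydataset.mytable`,
    GENERATE_ARRAY(0.1, 0.9, 0.1),
    STRUCT(3 AS trial_id))           -- evaluate trial 3 instead of the optimal trial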
Output
ML.ROC_CURVE returns multiple rows, one for each threshold value that the
function evaluates for the model. The metrics include the following:
- threshold: a FLOAT64 value that contains the custom threshold for the binary classification model.
- recall: a FLOAT64 value that indicates the proportion of actual positive cases that were correctly predicted by the model.
- false_positive_rate: a FLOAT64 value that indicates the proportion of actual negative cases that were incorrectly predicted as positive by the model.
- true_positives: an INT64 value that contains the number of cases correctly predicted as positive by the model.
- false_positives: an INT64 value that contains the number of cases incorrectly predicted as positive by the model.
- true_negatives: an INT64 value that contains the number of cases correctly predicted as negative by the model.
- false_negatives: an INT64 value that contains the number of cases incorrectly predicted as negative by the model.
Examples
The following examples assume your model and input table are in your default project.
Evaluate the ROC curve of a binary class logistic regression model
The following query returns all of the output columns for ML.ROC_CURVE. You
can graph the recall and false_positive_rate values for an ROC curve. The
threshold values returned are chosen based on the percentile values of the
prediction output.
SELECT * FROM ML.ROC_CURVE(MODEL `mydataset.mymodel`, TABLE `mydataset.mytable`)
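If you only need the values to graph, a minimal sketch (assuming the same model and table as above) selects just the two plotted columns in plotting order:

SELECT
  false_positive_rate,  -- x-axis of the ROC curve
  recall                -- y-axis of the ROC curve
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`, TABLE `mydataset.mytable`)
ORDER BY false_positive_rate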
Evaluate an ROC curve with custom thresholds
The following query returns all of the output columns for ML.ROC_CURVE. The
threshold values returned are chosen based on the output of the GENERATE_ARRAY
function.
SELECT * FROM ML.ROC_CURVE(MODEL `mydataset.mymodel`, TABLE `mydataset.mytable`, GENERATE_ARRAY(0.4,0.6,0.01))
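GENERATE_ARRAY(0.4, 0.6, 0.01) produces thresholds from 0.4 to 0.6 in steps of 0.01 (floating-point rounding can affect whether the endpoint is included). You can preview the thresholds that any specification generates with a standalone query:

SELECT threshold
FROM UNNEST(GENERATE_ARRAY(0.4, 0.6, 0.01)) AS threshold  -- one row per candidate threshold
ORDER BY threshold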
Evaluate the precision-recall curve
Instead of getting an ROC curve (recall versus the false positive rate), the following query calculates a precision-recall curve by computing precision from the true positive and false positive counts:
SELECT recall, true_positives / (true_positives + false_positives) AS precision FROM ML.ROC_CURVE(MODEL `mydataset.mymodel`, TABLE `mydataset.mytable`)
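At extreme thresholds the model might predict no positive cases, which makes true_positives + false_positives zero and the division fail. A variant of the same query that uses BigQuery's SAFE_DIVIDE function to return NULL instead of an error in that case:

SELECT
  recall,
  SAFE_DIVIDE(true_positives, true_positives + false_positives) AS precision  -- NULL when there are no positive predictions
FROM
  ML.ROC_CURVE(MODEL `mydataset.mymodel`, TABLE `mydataset.mytable`)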
What's next
- For more information about model evaluation, see BigQuery ML model evaluation overview.
- For more information about supported SQL statements and functions for ML models, see End-to-end user journeys for ML models.