The AI.FORECAST function
This document describes the AI.FORECAST function, which lets you
forecast a time series by using BigQuery ML's built-in
TimesFM model.
Using the AI.FORECAST function with the built-in TimesFM model lets you
perform forecasting without having to create and train your own model, so you
can avoid the need for model management.
Syntax
SELECT
  *
FROM
  AI.FORECAST(
    { TABLE TABLE | (QUERY_STATEMENT) },
    data_col => 'DATA_COL',
    timestamp_col => 'TIMESTAMP_COL'
    [, model => 'MODEL']
    [, id_cols => ID_COLS]
    [, horizon => HORIZON]
    [, confidence_level => CONFIDENCE_LEVEL]
    [, context_window => CONTEXT_WINDOW]
  )
Arguments
AI.FORECAST takes the following arguments:
- TABLE: the name of the table that contains the data that you want to forecast. For example, `mydataset.mytable`.
  If the table is in a different project, then you must prepend the project ID to the table name in the following format, including backticks:
  `[PROJECT_ID].[DATASET].[TABLE]`
  For example, `myproject.mydataset.mytable`.
  To prevent query errors, we recommend providing the fully qualified table name, including backticks. This is especially important if the project name contains characters other than letters, numbers, and underscores.
- QUERY_STATEMENT: the GoogleSQL query that generates the data that you want to forecast. See the GoogleSQL query syntax page for the supported SQL syntax of the QUERY_STATEMENT clause.
- DATA_COL: a STRING value that specifies the name of the data column. The data column contains the data to forecast. The data column must use one of the following data types:
  - INT64
  - NUMERIC
  - BIGNUMERIC
  - FLOAT64
- TIMESTAMP_COL: a STRING value that specifies the name of the timestamp column. The timestamp column must use one of the following data types:
  - TIMESTAMP
  - DATE
  - DATETIME
- MODEL: a STRING value that specifies the name of the model to use. TimesFM 2.0 is the only supported value, and is the default value.
- ID_COLS: an ARRAY<STRING> value that specifies the names of one or more ID columns. Each unique combination of IDs identifies a unique time series to forecast. Specify one or more values for this argument to forecast multiple time series with a single query. The columns that you specify must use one of the following data types:
  - STRING
  - INT64
  - ARRAY<STRING>
  - ARRAY<INT64>
- HORIZON: an INT64 value that specifies the number of time series data points to forecast. The default value is 10. The valid input range is [1, 10,000].
- CONFIDENCE_LEVEL: a FLOAT64 value that specifies the percentage of the future values that fall in the prediction interval. The default value is 0.95. The valid input range is [0, 1).
- CONTEXT_WINDOW: an INT64 value that specifies the context window length used by BigQuery ML's built-in TimesFM model. The context window length determines how many of the most recent data points from the input time series are used by the model. For example, if your time series date range is March 1 to April 15, data points are selected starting at April 15 and working backwards. Valid values are as follows:
  - 64
  - 128
  - 256
  - 512
  - 1024
  - 2048
  If you don't specify a CONTEXT_WINDOW value, the AI.FORECAST function automatically chooses the smallest context window length that is still large enough to cover the number of time series data points in your input data. The following table shows the mapping between the number of time series data points in the input data and the selected context window length:
    Number of time series data points    Context window length
    (1, 64]                              64
    (64, 128]                            128
    (128, 256]                           256
    (256, 512]                           512
    (512, 1024]                          1,024
    (1024, 2048]                         2,048
    >2048                                2,048
  2,048 is the maximum number of time series data points that are passed to the model. Any additional time series data points in the input data are ignored.
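The context window rule described above can be illustrated outside of BigQuery. The following Python sketch is not the AI.FORECAST implementation; it only mirrors the documented behavior (pick the smallest valid window that covers the input, cap at 2,048, keep the most recent points). The function names choose_context_window and select_context are hypothetical, and the sketch assumes the input points are ordered from oldest to newest.

```python
from typing import List, Optional, Sequence

# Context window lengths listed as valid values for CONTEXT_WINDOW.
VALID_CONTEXT_WINDOWS = (64, 128, 256, 512, 1024, 2048)


def choose_context_window(num_points: int) -> int:
    """Return the smallest valid window that covers num_points, capped at 2048.

    Hypothetical helper that mirrors the documented default selection rule.
    """
    if num_points < 1:
        raise ValueError("num_points must be positive")
    for window in VALID_CONTEXT_WINDOWS:
        if num_points <= window:
            return window
    # More than 2048 points: the largest window is used and older points are dropped.
    return VALID_CONTEXT_WINDOWS[-1]


def select_context(
    points: Sequence[float], context_window: Optional[int] = None
) -> List[float]:
    """Keep only the most recent points that fit in the context window.

    Assumes points are ordered from oldest to newest.
    """
    if context_window is None:
        context_window = choose_context_window(len(points))
    elif context_window not in VALID_CONTEXT_WINDOWS:
        raise ValueError(f"context_window must be one of {VALID_CONTEXT_WINDOWS}")
    # Slicing from the end keeps the newest points; shorter inputs are kept whole.
    return list(points[-context_window:])
```

For instance, under these assumptions a series of 100 points with an explicit context window of 64 keeps only the last 64 points, while a series of 3,000 points with no explicit value keeps the last 2,048.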
Output
AI.FORECAST returns the following columns:
- id_cols: one or more values that contain the identifiers of a time series. id_cols can be an INT64, STRING, ARRAY<INT64> or ARRAY<STRING> value. The column names and types are inherited from the ID_COLS argument value specified in the function input.
- confidence_level: a FLOAT64 value that contains the confidence_level value that you specified in the function input, or 0.95 if you didn't specify a confidence_level value. This value is the same across all rows.
- prediction_interval_lower_bound: a FLOAT64 value that contains the lower bound of the prediction interval for each forecasted point.
- prediction_interval_upper_bound: a FLOAT64 value that contains the upper bound of the prediction interval for each forecasted point.
- ai_forecast_status: a STRING value that contains the forecast status. This value is empty if the operation was successful. If the operation wasn't successful, the value is the error string. A common error is "The time series data is too short." This error indicates that there wasn't enough historical data in the time series to generate a forecast. A minimum of 3 data points is required.
- forecast_timestamp: a TIMESTAMP value that contains the timestamps of the time series.
- forecast_value: a FLOAT64 value that contains the 50% quantile value for the forecasting output from the model. The 50% quantile value represents the median value of the forecasted data.
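Because ai_forecast_status is empty on success and holds an error string otherwise, client code that reads the results usually needs to separate the two cases. The following Python sketch shows one way to do that. It assumes the rows have already been fetched as dictionaries keyed by the output column names; the helper name split_by_status is hypothetical, and the sketch treats both an empty string and a missing value as success because the text above doesn't say which form "empty" takes.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def split_by_status(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Split forecast rows into successful rows and rows that carry an error.

    Hypothetical helper: a row counts as successful when its
    ai_forecast_status is empty (an empty string or None).
    """
    succeeded: List[Row] = []
    failed: List[Row] = []
    for row in rows:
        status = row.get("ai_forecast_status")
        if status:
            failed.append(row)
        else:
            succeeded.append(row)
    return succeeded, failed
```

A caller might then log the ai_forecast_status of each failed row, for example to find time series that have fewer than the 3 required data points.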
Example
The following example forecasts the daily number of bike trips for each user type over the next 30 days.
WITH
  citibike_trips AS (
    SELECT EXTRACT(DATE FROM starttime) AS date, usertype, COUNT(*) AS num_trips
    FROM `bigquery-public-data.new_york.citibike_trips`
    GROUP BY date, usertype
  )
SELECT *
FROM
  AI.FORECAST(
    TABLE citibike_trips,
    data_col => 'num_trips',
    timestamp_col => 'date',
    id_cols => ['usertype'],
    horizon => 30);
Locations
AI.FORECAST and the TimesFM model are available in all
supported BigQuery ML locations.
Pricing
AI.FORECAST usage is billed at the evaluation, inspection, and prediction
rate documented in the BigQuery ML on-demand pricing section
of the BigQuery ML pricing page.
What's next
- Try using a TimesFM model with the AI.FORECAST function.
- Evaluate forecasting results from the TimesFM model using the AI.EVALUATE function.
- For information about forecasting in BigQuery ML, see Forecasting overview.
- For more information about supported SQL statements and functions for time series forecasting models, see End-to-end user journeys for time series forecasting models.