The CREATE MODEL statement
To create a model in BigQuery, use the BigQuery ML CREATE MODEL statement. This statement is similar to the CREATE TABLE DDL statement. When you run a query that contains a CREATE MODEL statement, BigQuery generates a query job that processes the query. You can also create a model by using the Google Cloud console user interface (Preview).
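As a minimal sketch of what such a statement can look like, the query below trains a linear regression model from the rows returned by a SELECT. The dataset, model, table, and column names (mydataset, sample_model, training_table, feature_a, feature_b, label) are hypothetical placeholders, not names taken from this document.

```sql
-- Hypothetical names throughout; replace them with your own dataset,
-- model, table, and columns.
CREATE OR REPLACE MODEL `mydataset.sample_model`
OPTIONS (
  MODEL_TYPE = 'LINEAR_REG',        -- one of the model types listed in the syntax
  INPUT_LABEL_COLS = ['label']      -- column that holds the value to predict
) AS
SELECT
  feature_a,
  feature_b,
  label
FROM
  `mydataset.training_table`;
```

Running this query starts a query job, in the same way that a CREATE TABLE ... AS SELECT statement does.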
For more information about supported SQL statements and functions for each model type, see the following documents:
- End-to-end user journeys for generative AI models
- End-to-end user journeys for time series forecasting models
- End-to-end user journeys for ML models
- End-to-end user journeys for imported models
- Contribution analysis user journey
Required permissions
To create a dataset to store the model, you need the bigquery.datasets.create IAM permission.
To create a model, you need the following permissions:
- bigquery.jobs.create
- bigquery.models.create
- bigquery.models.getData
- bigquery.models.updateData
- bigquery.connections.delegate (for remote models)
The following predefined IAM roles grant these permissions:
For more information about IAM roles and permissions in BigQuery, see Introduction to IAM.
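As a rough sketch of how dataset-level access might be granted with a GoogleSQL GRANT statement, the example below assigns a predefined role on a dataset. The project, dataset, and principal names are hypothetical, and the choice of roles/bigquery.dataEditor is an assumption made for illustration only; check the role's permission list before relying on it. Because bigquery.jobs.create is granted at the project level, a dataset-level grant like this one would not cover it.

```sql
-- Hypothetical project, dataset, and principal; replace with your own.
-- Assumption: roles/bigquery.dataEditor is used here only as an example role.
GRANT `roles/bigquery.dataEditor`
ON SCHEMA `myproject`.mydataset
TO "user:analyst@example.com";
```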
CREATE MODEL syntax
{CREATE MODEL | CREATE MODEL IF NOT EXISTS | CREATE OR REPLACE MODEL}
model_name
[TRANSFORM (select_list)]
[INPUT (field_name field_type)
  OUTPUT (field_name field_type)]
[REMOTE WITH CONNECTION {`connection_name` | DEFAULT}]
[OPTIONS(model_option_list)]
[AS {query_statement |
  (
    training_data AS (query_statement),
    custom_holiday AS (holiday_statement)
  )}]
model_option_list:
MODEL_TYPE = { 'LINEAR_REG' |
'LOGISTIC_REG' |
'KMEANS' |
'MATRIX_FACTORIZATION' |
'PCA' |
'AUTOENCODER' |
'AUTOML_CLASSIFIER' |
'AUTOML_REGRESSOR' |
'BOOSTED_TREE_CLASSIFIER' |
'BOOSTED_TREE_REGRESSOR' |
'RANDOM_FOREST_CLASSIFIER' |
'RANDOM_FOREST_REGRESSOR' |
'DNN_CLASSIFIER' |
'DNN_REGRESSOR' |
'DNN_LINEAR_COMBINED_CLASSIFIER' |
'DNN_LINEAR_COMBINED_REGRESSOR' |
'ARIMA_PLUS' |
'ARIMA_PLUS_XREG' |
'TENSORFLOW' |
'TENSORFLOW_LITE' |
'ONNX' |
'XGBOOST' |
'CONTRIBUTION_ANALYSIS'}
[, MODEL_REGISTRY = { 'VERTEX_AI' } ]
[, VERTEX_AI_MODEL_ID = string_value ]
[, VERTEX_AI_MODEL_VERSION_ALIASES = string_array ]
[, INPUT_LABEL_COLS = string_array ]
[, MAX_ITERATIONS = int64_value ]
[, EARLY_STOP = { TRUE | FALSE } ]
[, MIN_REL_PROGRESS = float64_value ]
[, DATA_SPLIT_METHOD = { 'AUTO_SPLIT' | 'RANDOM' | 'CUSTOM' | 'SEQ' | 'NO_SPLIT' } ]
[, DATA_SPLIT_EVAL_FRACTION = float64_value ]
[, DATA_SPLIT_TEST_FRACTION = float64_value ]
[, DATA_SPLIT_COL = string_value ]
[, OPTIMIZE_STRATEGY = { 'AUTO_STRATEGY' | 'BATCH_GRADIENT_DESCENT' | 'NORMAL_EQUATION' } ]
[, L1_REG = float64_value ]
[, L2_REG = float64_value ]
[, LEARN_RATE_STRATEGY = { 'LINE_SEARCH' | 'CONSTANT' } ]
[, LEARN_RATE = float64_value ]
[, LS_INIT_LEARN_RATE = float64_value ]
[, WARM_START = { TRUE | FALSE } ]
[, AUTO_CLASS_WEIGHTS = { TRUE | FALSE } ]
[, CLASS_WEIGHTS = struct_array ]
[, INSTANCE_WEIGHT_COL = string_value ]
[, NUM_CLUSTERS = int64_value ]
[, KMEANS_INIT_METHOD = { 'RANDOM' | 'KMEANS++' | 'CUSTOM' } ]
[, KMEANS_INIT_COL = string_value ]
[, DISTANCE_TYPE = { 'EUCLIDEAN' | 'COSINE' } ]
[, STANDARDIZE_FEATURES = { TRUE | FALSE } ]
[, MODEL_PATH = string_value ]
[, BUDGET_HOURS = float64_value ]
[,