Modifying table schemas
This document describes how to modify the schema definitions for existing BigQuery tables.
You can make most schema modifications described in this document by using SQL data definition language (DDL) statements. These statements don't incur charges.
You can modify a table schema in all the ways described on this page by exporting your table data to Cloud Storage, and then loading the data into a new table with the modified schema definition. BigQuery load and extract jobs are free, but you incur costs for storing the exported data in Cloud Storage. A sketch of this export-and-reload flow follows; the remaining sections describe other ways of performing various types of schema modifications.
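The following minimal sketch uses the Python client library; the table IDs, bucket path, and modified schema are hypothetical placeholders.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical names for illustration.
source_table = "my-project.mydataset.mytable"
destination_table = "my-project.mydataset.mynewtable"
gcs_uri = "gs://my-bucket/export/data-*.json"

# Export the table to Cloud Storage. Extract jobs are free; Cloud Storage
# charges apply to the stored files.
extract_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)
client.extract_table(source_table, gcs_uri, job_config=extract_config).result()

# Load the exported data into a new table, supplying the modified schema
# (here, with a hypothetical column4 added).
modified_schema = [
    bigquery.SchemaField("column1", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("column2", "FLOAT", mode="REQUIRED"),
    bigquery.SchemaField("column3", "STRING", mode="REPEATED"),
    bigquery.SchemaField("column4", "STRING", mode="NULLABLE"),
]
load_config = bigquery.LoadJobConfig(
    schema=modified_schema,
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)
client.load_table_from_uri(
    gcs_uri, destination_table, job_config=load_config
).result()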
Add a column
You can add columns to an existing table's schema definition by using one of the following options:
- Add a new empty column.
- Overwrite a table with a load or query job.
- Append data to a table with a load or query job.
Any column you add must adhere to BigQuery's rules for column names. For more information on creating schema components, see Specifying a schema.
Add an empty column
If you add new columns to an existing table schema, the columns must be
NULLABLE or REPEATED. You cannot add a REQUIRED column to an existing
table schema. Adding a REQUIRED column to an existing table
schema in the API or bq command-line tool causes an error. However, you can create a
nested REQUIRED column as part of a new RECORD field.
REQUIRED columns can be added only when you
create a table while loading data, or when you create an empty table with a
schema definition.
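For illustration, the following minimal sketch creates an empty table whose schema includes a REQUIRED column by using the Python client library; the table ID and column names are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()

# REQUIRED modes are allowed here because the table is being created,
# not altered. The table ID and schema are hypothetical.
table = bigquery.Table(
    "my-project.mydataset.mynewtable",
    schema=[
        bigquery.SchemaField("id", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("name", "STRING", mode="NULLABLE"),
    ],
)
client.create_table(table)  # API request: tables.insert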
To add empty columns to a table's schema definition:
Console
In the Google Cloud console, go to the BigQuery page.
In the left pane, click Explorer.
If you don't see the left pane, click Expand left pane to open it.
In the Explorer pane, expand your project, click Datasets, and then select a dataset.
Click Overview > Tables, and then select the table.
In the details pane, click the Schema tab.
Click Edit schema. You might need to scroll to see this button.
In the Current schema page, under New fields, click Add field.
When you are done adding columns, click Save.
SQL
Use the
ALTER TABLE ADD COLUMN DDL statement:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
ALTER TABLE mydataset.mytable ADD COLUMN new_column STRING;
Click Run.
For more information about how to run queries, see Run an interactive query.
bq
Issue the bq update command and provide a JSON schema file. If the table
you're updating is in a project other than your default project, add the
project ID to the dataset name in the following format:
PROJECT_ID:DATASET.
bq update PROJECT_ID:DATASET.TABLE SCHEMA
Replace the following:
- PROJECT_ID: your project ID.
- DATASET: the name of the dataset that contains the table you're updating.
- TABLE: the name of the table you're updating.
- SCHEMA: the path to the JSON schema file on your local machine.
When you specify an inline schema, you cannot specify the column
description, mode, and RECORD (STRUCT)
type. All column modes default to NULLABLE. As a result, if you are
adding a new nested column to a RECORD, you must
supply a JSON schema file.
If you attempt to add columns using an inline schema definition, you must
supply the entire schema definition including the new columns. Because you
cannot specify column modes using an inline schema definition, the update
changes any existing REPEATED column to NULLABLE, which
produces the following error: BigQuery error in update
operation: Provided Schema does not match Table
PROJECT_ID:dataset.table. Field field has changed mode
from REPEATED to NULLABLE.
The preferred method of adding columns to an existing table using the bq command-line tool is to supply a JSON schema file.
To add empty columns to a table's schema using a JSON schema file:
First, issue the bq show command with the --schema flag and write the existing table schema to a file. If the table you're updating is in a project other than your default project, add the project ID to the dataset name in the following format: PROJECT_ID:DATASET.

bq show \
  --schema \
  --format=prettyjson \
  PROJECT_ID:DATASET.TABLE > SCHEMA
Replace the following:
- PROJECT_ID: your project ID.
- DATASET: the name of the dataset that contains the table you're updating.
- TABLE: the name of the table you're updating.
- SCHEMA: the schema definition file written to your local machine.
For example, to write the schema definition of mydataset.mytable to a file, enter the following command. mydataset.mytable is in your default project.

bq show \
  --schema \
  --format=prettyjson \
  mydataset.mytable > /tmp/myschema.json

Open the schema file in a text editor. The schema should look like the following:

[
  {
    "mode": "REQUIRED",
    "name": "column1",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "column2",
    "type": "FLOAT"
  },
  {
    "mode": "REPEATED",
    "name": "column3",
    "type": "STRING"
  }
]

Add the new columns to the end of the schema definition. If you attempt to add new columns elsewhere in the array, the following error is returned:
BigQuery error in update operation: Precondition Failed.

Using a JSON file, you can specify descriptions, NULLABLE or REPEATED modes, and RECORD types for new columns. For example, using the schema definition from the previous step, your new JSON array would look like the following. In this example, a new NULLABLE column is added named column4. column4 includes a description.

[
  {
    "mode": "REQUIRED",
    "name": "column1",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "column2",
    "type": "FLOAT"
  },
  {
    "mode": "REPEATED",
    "name": "column3",
    "type": "STRING"
  },
  {
    "description": "my new column",
    "mode": "NULLABLE",
    "name": "column4",
    "type": "STRING"
  }
]

For more information on working with JSON schema files, see Specifying a JSON schema file.
After updating your schema file, issue the following command to update the table's schema. If the table you're updating is in a project other than your default project, add the project ID to the dataset name in the following format: PROJECT_ID:DATASET.

bq update PROJECT_ID:DATASET.TABLE SCHEMA
Replace the following:
- PROJECT_ID: your project ID.
- DATASET: the name of the dataset that contains the table you're updating.
- TABLE: the name of the table you're updating.
- SCHEMA: the path to the JSON schema file on your local machine.
For example, enter the following command to update the schema definition of mydataset.mytable in your default project. The path to the schema file on your local machine is /tmp/myschema.json.

bq update mydataset.mytable /tmp/myschema.json
API
Call the tables.patch
method and use the schema property to add empty columns to your schema
definition. Because the tables.update method replaces the entire table
resource, the tables.patch method is preferred.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Node.js
Before trying this sample, follow the Node.js setup instructions in the
BigQuery quickstart using
client libraries.
For more information, see the
BigQuery Node.js API
reference documentation.
To authenticate to BigQuery, set up Application Default Credentials.
For more information, see
Set up authentication for client libraries.
Python
Before trying this sample, follow the Python setup instructions in the
BigQuery quickstart using
client libraries.
For more information, see the
BigQuery Python API
reference documentation.
To authenticate to BigQuery, set up Application Default Credentials.
For more information, see
Set up authentication for client libraries.
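A minimal sketch using the Python client library; the table ID and the new phone column are hypothetical placeholders. Calling update_table with a ["schema"] field mask sends the tables.patch request described in the API tab.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table ID for illustration.
table_id = "my-project.mydataset.mytable"

table = client.get_table(table_id)  # API request: tables.get

original_schema = table.schema
new_schema = original_schema[:]  # Creates a copy of the schema.
new_schema.append(bigquery.SchemaField("phone", "STRING", mode="NULLABLE"))

table.schema = new_schema
table = client.update_table(table, ["schema"])  # API request: tables.patch

if len(table.schema) == len(original_schema) + 1 == len(new_schema):
    print("A new column has been added.")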
Add a nested column to a RECORD column
In addition to adding new columns to a table's schema, you can also add new
nested columns to a RECORD column. The process for adding a new nested column is
similar to the process for adding a new column.
Console
Adding a new nested field to an existing RECORD column is not
supported by the Google Cloud console.
SQL
Adding a new nested field to an existing RECORD column by using a SQL DDL
statement is not supported.
bq
Issue the bq update command and provide a JSON schema file that adds the
nested field to the existing RECORD column's schema definition. If the
table you're updating is in a project other than your default project, add
the project ID to the dataset name in the following format:
PROJECT_ID:DATASET.
bq update PROJECT_ID:DATASET.TABLE SCHEMA
Replace the following:
- PROJECT_ID: your project ID.
- DATASET: the name of the dataset that contains the table you're updating.
- TABLE: the name of the table you're updating.
- SCHEMA: the path to the JSON schema file on your local machine.
When you specify an inline schema, you cannot specify the column
description, mode, and RECORD (STRUCT)
type. All column modes default to NULLABLE. As a result, if you are
adding a new nested column to a RECORD, you must
supply a JSON schema file.
To add a nested column to a RECORD using a JSON schema file:
First, issue the bq show command with the --schema flag and write the existing table schema to a file. If the table you're updating is in a project other than your default project, add the project ID to the dataset name in the following format: PROJECT_ID:DATASET.

bq show \
  --schema \
  --format=prettyjson \
  PROJECT_ID:DATASET.TABLE > SCHEMA
Replace the following:
- PROJECT_ID: your project ID.
- DATASET: the name of the dataset that contains the table you're updating.
- TABLE: the name of the table you're updating.
- SCHEMA: the schema definition file written to your local machine.
For example, to write the schema definition of mydataset.mytable to a file, enter the following command. mydataset.mytable is in your default project.

bq show \
  --schema \
  --format=prettyjson \
  mydataset.mytable > /tmp/myschema.json

Open the schema file in a text editor. The schema should look like the following. In this example, column3 is a nested repeated column. The nested columns are nested1 and nested2. The fields array lists the fields nested within column3.

[
  {
    "mode": "REQUIRED",
    "name": "column1",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "column2",
    "type": "FLOAT"
  },
  {
    "fields": [
      {
        "mode": "NULLABLE",
        "name": "nested1",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "nested2",
        "type": "STRING"
      }
    ],
    "mode": "REPEATED",
    "name": "column3",
    "type": "RECORD"
  }
]

Add the new nested column to the end of the fields array. In this example, nested3 is the new nested column.

[
  {
    "mode": "REQUIRED",
    "name": "column1",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "column2",
    "type": "FLOAT"
  },
  {
    "fields": [
      {
        "mode": "NULLABLE",
        "name": "nested1",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "nested2",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "nested3",
        "type": "STRING"
      }
    ],
    "mode": "REPEATED",
    "name": "column3",
    "type": "RECORD"
  }
]

For more information on working with JSON schema files, see Specifying a JSON schema file.
After updating your schema file, issue the following command to update the table's schema. If the table you're updating is in a project other than your default project, add the project ID to the dataset name in the following format: PROJECT_ID:DATASET.

bq update PROJECT_ID:DATASET.TABLE SCHEMA
Replace the following:
- PROJECT_ID: your project ID.
- DATASET: the name of the dataset that contains the table you're updating.
- TABLE: the name of the table you're updating.
- SCHEMA: the path to the JSON schema file on your local machine.
For example, enter the following command to update the schema definition of mydataset.mytable in your default project. The path to the schema file on your local machine is /tmp/myschema.json.

bq update mydataset.mytable /tmp/myschema.json
API
Call the tables.patch
method and use the schema property to add the nested columns to your
schema definition. Because the tables.update method replaces the entire
table resource, the tables.patch method is preferred.
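The client libraries can script this tables.patch flow. A minimal sketch with the Python client library, assuming the column3 RECORD schema from the preceding bq example and a hypothetical table ID:

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table ID; assumes the column3 RECORD column shown above.
table_id = "my-project.mydataset.mytable"
table = client.get_table(table_id)  # API request: tables.get

new_schema = []
for field in table.schema:
    if field.name == "column3" and field.field_type == "RECORD":
        # SchemaField objects are immutable, so rebuild the RECORD with the
        # new nested field appended to its existing fields.
        nested = list(field.fields) + [
            bigquery.SchemaField("nested3", "STRING", mode="NULLABLE")
        ]
        field = bigquery.SchemaField(
            field.name, field.field_type, mode=field.mode, fields=nested
        )
    new_schema.append(field)

table.schema = new_schema
client.update_table(table, ["schema"])  # API request: tables.patch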
Add columns when you overwrite or append data
You can add new columns to an existing table when you load data into it and choose to overwrite the existing table. When you overwrite an existing table, the schema of the data you're loading is used to overwrite the existing table's schema. For information on overwriting a table using a load job, see the document for your data's format.
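As a sketch, an overwrite load with the Python client library might look like the following; the table ID and source URI are hypothetical, and autodetect stands in for an explicitly supplied schema.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table ID and source URI.
table_id = "my-project.mydataset.mytable"
uri = "gs://my-bucket/mydata.json"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    # WRITE_TRUNCATE overwrites the table, so the loaded data's schema
    # replaces the existing schema.
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    autodetect=True,
)
client.load_table_from_uri(uri, table_id, job_config=job_config).result()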
Add columns in a load append job
You can add columns to a table when you append data to it in a load job. The new schema is determined by one of the following:
- Autodetection (for CSV and JSON files)
- A schema specified in a JSON schema file (for CSV and JSON files)
- The self-describing source data for Avro, ORC, Parquet, and Datastore export files
If you specify the schema in a JSON file, the new columns must be defined in it. If the new column definitions are missing, an error is returned when you attempt to append the data.
When you add new columns during an append operation,
the values in the new columns are set to NULL for existing rows.
To add a new column when you append data to a table during a load job, use one of the following options:
bq
Use the bq load command to load your data and specify the --noreplace
flag to indicate that you are appending the data to an existing table.
If the data you're appending is in CSV or newline-delimited JSON format,
specify the --autodetect flag to use schema auto-detection
or supply the schema in a JSON schema file. The added columns can be
automatically inferred from Avro or Datastore export files.
Set the --schema_update_option flag to ALLOW_FIELD_ADDITION to indicate
that the data you're appending contains new columns.
If the table you're appending is in a dataset in a project other than your
default project, add the project ID to the dataset name in the following
format: PROJECT_ID:DATASET.
(Optional) Supply the --location flag and set the value to your
location.
Enter the load command as follows:
bq --location=LOCATION load \
  --noreplace \
  --autodetect \
  --schema_update_option=ALLOW_FIELD_ADDITION \
  --source_format=FORMAT \
  PROJECT_ID:DATASET.TABLE \
  PATH_TO_SOURCE \
  SCHEMA
Replace the following:
- LOCATION: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
- FORMAT: the format of the schema: NEWLINE_DELIMITED_JSON, CSV, AVRO, PARQUET, ORC, or DATASTORE_BACKUP.
- PROJECT_ID: your project ID.
- DATASET: the name of the dataset that contains the table.
- TABLE: the name of the table you're appending.
- PATH_TO_SOURCE: a fully-qualified Cloud Storage URI, a comma-separated list of URIs, or the path to a data file on your local machine.
- SCHEMA: the path to a local JSON schema file. A schema file is required only for CSV and JSON files when --autodetect is unspecified. Avro and Datastore schemas are inferred from the source data.
Examples:
Enter the following command to append a local Avro data file,
/tmp/mydata.avro, to mydataset.mytable using a load job. Because schemas
can be automatically inferred from Avro data, you don't need to use
the --autodetect flag. mydataset is in your default project.
bq load \
--noreplace \
--schema_update_option=ALLOW_FIELD_ADDITION \
--source_format=AVRO \
mydataset.mytable \
/tmp/mydata.avro
Enter the following command to append a newline-delimited JSON data file in
Cloud Storage to mydataset.mytable using a load job. The --autodetect
flag is used to detect the new columns. mydataset is in your default
project.
bq load \
--noreplace \
--autodetect \
--schema_update_option=ALLOW_FIELD_ADDITION \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
gs://mybucket/mydata.json
Enter the following command to append a newline-delimited JSON data file in
Cloud Storage to mydataset.mytable using a load job. The schema
containing the new columns is specified in a local JSON schema file,
/tmp/myschema.json. mydataset is in myotherproject, not your default
project.
bq load \
--noreplace \
--schema_update_option=ALLOW_FIELD_ADDITION \
--source_format=NEWLINE_DELIMITED_JSON \
myotherproject:mydataset.mytable \
gs://mybucket/mydata.json \
/tmp/myschema.json
API
Call the jobs.insert
method. Configure a load job and set the following properties:
- Reference your data in Cloud Storage using the sourceUris property.
- Specify the data format by setting the sourceFormat property.
- Specify the schema in the schema property.
- Specify the schema update option using the schemaUpdateOptions property.
- Set the write disposition of the destination table to WRITE_APPEND using the writeDisposition property.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Node.js
Before trying this sample, follow the Node.js setup instructions in the
BigQuery quickstart using
client libraries.
For more information, see the
BigQuery Node.js API
reference documentation.
To authenticate to BigQuery, set up Application Default Credentials.
For more information, see
Set up authentication for client libraries.
Python
Before trying this sample, follow the Python setup instructions in the
BigQuery quickstart using
client libraries.
For more information, see the
BigQuery Python API
reference documentation.
To authenticate to BigQuery, set up Application Default Credentials.
For more information, see
Set up authentication for client libraries.
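A minimal sketch of such an append using the Python client library; the table ID and source URI are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table ID and source URI.
table_id = "my-project.mydataset.mytable"
uri = "gs://my-bucket/mydata.json"

print(f"Columns before: {len(client.get_table(table_id).schema)}")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    # Allow the appended data to introduce new columns.
    schema_update_options=[bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION],
    autodetect=True,  # detect the source file's schema, including new columns
)
client.load_table_from_uri(uri, table_id, job_config=job_config).result()

print(f"Columns after: {len(client.get_table(table_id).schema)}")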