Specify nested and repeated columns in table schemas

This page describes how to define a table schema with nested and repeated columns in BigQuery. For an overview of table schemas, see Specifying a schema.

Define nested and repeated columns

To create a column with nested data, set the data type of the column to RECORD in the schema. A RECORD can be accessed as a STRUCT type in GoogleSQL. A STRUCT is a container of ordered fields.

To create a column with repeated data, set the mode of the column to REPEATED in the schema. A repeated field can be accessed as an ARRAY type in GoogleSQL.

A RECORD column can have REPEATED mode, which is represented as an array of STRUCT types. Also, a field within a record can be repeated, which is represented as a STRUCT that contains an ARRAY. An array cannot contain another array directly. For more information, see Declaring an ARRAY type.

Limitations

Nested and repeated schemas are subject to the following limitations:

A schema cannot contain more than 15 levels of nested RECORD types.
Columns of type RECORD can contain nested RECORD types, also called child records. The maximum nested depth limit is 15 levels. This limit is independent of whether the RECORDs are scalar or array-based (repeated).

RECORD type is incompatible with UNION, INTERSECT, EXCEPT DISTINCT, and SELECT DISTINCT.

Example schema

The following example shows sample nested and repeated data. This table contains information about people. It consists of the following fields:

  • id
  • first_name
  • last_name
  • dob (date of birth)
  • addresses (a nested and repeated field)
    • addresses.status (current or previous)
    • addresses.address
    • addresses.city
    • addresses.state
    • addresses.zip
    • addresses.numberOfYears (years at the address)

The JSON data file would look like the following. Notice that the addresses column contains an array of values (indicated by [ ]). The multiple addresses in the array are the repeated data. The multiple fields within each address are the nested data.

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}
{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}

The schema for this table looks like the following:

[
    {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "first_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "last_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "dob",
        "type": "DATE",
        "mode": "NULLABLE"
    },
    {
        "name": "addresses",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {
                "name": "status",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "address",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "city",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "state",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "zip",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "numberOfYears",
                "type": "STRING",
                "mode": "NULLABLE"
            }
        ]
    }
]

Specifying the nested and repeated columns in the example

To create a new table with the previous nested and repeated columns, select one of the following options:

Console

Specify the nested and repeated addresses column:

  1. In the Google Cloud console, open the BigQuery page.

    Go to BigQuery

  2. In the left pane, click Explorer:

    Highlighted button for the Explorer pane.

    If you don't see the left pane, click Expand left pane to open the pane.

  3. In the Explorer pane, expand your project, click Datasets, and then select a dataset.

  4. In the details pane, click Create table.

  5. On the Create table page, specify the following details:

    • For Source, in the Create table from field, select Empty table.
    • In the Destination section, specify the following fields:

      • For Dataset, select the dataset in which you want to create the table.
      • For Table, enter the name of the table that you want to create.
    • For Schema, click Add field and enter the following table schema:

      • For Field name, enter addresses.
      • For Type, select RECORD.
      • For Mode, choose REPEATED. Addresses schema
      • Specify the following fields for a nested field:

        • In the Field name field, enter status.
        • For Type, choose STRING.
        • For Mode, leave the value set to NULLABLE.
        • Click Add field to add the following fields:

          Field name Type Mode
          address STRING NULLABLE
          city STRING NULLABLE
          state STRING NULLABLE
          zip STRING NULLABLE
          numberOfYears STRING NULLABLE

        Alternatively, click Edit as text and specify the schema as a JSON array.

SQL

Use the CREATE TABLE statement. Specify the schema using the column option:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE TABLE IF NOT EXISTS mydataset.mytable (
      id STRING,
      first_name STRING,
      last_name STRING,
      dob DATE,
      addresses
        ARRAY<
          STRUCT<
            status STRING,
            address STRING,
            city STRING,
            state STRING,
            zip STRING,
            numberOfYears STRING>>
    ) OPTIONS (
        description = 'Example name and addresses table');

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

bq

To specify the nested and repeated addresses column in a JSON schema file, use a text editor to create a new file. Paste in the example schema definition shown above.

After you create your JSON schema file, you can provide it through the bq command-line tool. For more information, see Using a JSON schema file.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/bigquery"
)

// createTableComplexSchema demonstrates creating a BigQuery table and specifying a complex schema that includes
// an array of Struct types.
func createTableComplexSchema(w io.Writer, projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydatasetid"
	// tableID := "mytableid"
	ctx := context.Background()

	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	sampleSchema := bigquery.Schema{
		{Name: "id", Type: bigquery.StringFieldType},
		{Name: "first_name", Type: bigquery.StringFieldType},
		{Name: "last_name", Type: bigquery.StringFieldType},
		{Name: "dob", Type: bigquery.DateFieldType},
		{Name: "addresses",
			Type:     bigquery.RecordFieldType,
			Repeated: true,
			Schema: bigquery.Schema{
				{Name: "status", Type: bigquery.StringFieldType},
				{Name: "address", Type: bigquery.StringFieldType},
				{Name: "city", Type: bigquery.StringFieldType},
				{Name: "state", Type: bigquery.StringFieldType},
				{Name: "zip", Type: bigquery.StringFieldType},
				{Name: "numberOfYears", Type: bigquery.StringFieldType},
			}},
	}

	metaData := &bigquery.TableMetadata{
		Schema: sampleSchema,
	}
	tableRef := client.Dataset(datasetID).Table(tableID)
	if err := tableRef.Create(ctx, metaData); err != nil {
		return err
	}
	fmt.Fprintf(w, "created table %s\n", tableRef.FullyQualifiedName())
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.