Class Session (0.11.0)

Session(
    context: typing.Optional[bigframes._config.bigquery_options.BigQueryOptions] = None,
    clients_provider: typing.Optional[bigframes.session.clients.ClientsProvider] = None,
)

Establishes a BigQuery connection to capture a group of job activities related to DataFrames.

Parameters

Name Description
context bigframes._config.bigquery_options.BigQueryOptions

Configuration adjusting how to connect to BigQuery and related APIs. Note that some options are ignored if clients_provider is set.

clients_provider bigframes.session.bigframes.session.clients.ClientsProvider

An object providing client library objects.

Properties

bqclient

API documentation for bqclient property.

bqconnectionclient

API documentation for bqconnectionclient property.

bqstoragereadclient

API documentation for bqstoragereadclient property.

cloudfunctionsclient

API documentation for cloudfunctionsclient property.

resourcemanagerclient

API documentation for resourcemanagerclient property.

Methods

close

close()

Terminated the BQ session, otherwises the session will be terminated automatically after 24 hours of inactivity or after 7 days.

read_csv

read_csv(
    filepath_or_buffer: str | IO["bytes"],
    *,
    sep: Optional[str] = ",",
    header: Optional[int] = 0,
    names: Optional[
        Union[MutableSequence[Any], np.ndarray[Any, Any], Tuple[Any, ...], range]
    ] = None,
    index_col: Optional[
        Union[int, str, Sequence[Union[str, int]], Literal[False]]
    ] = None,
    usecols: Optional[
        Union[
            MutableSequence[str],
            Tuple[str, ...],
            Sequence[int],
            pandas.Series,
            pandas.Index,
            np.ndarray[Any, Any],
            Callable[[Any], bool],
        ]
    ] = None,
    dtype: Optional[Dict] = None,
    engine: Optional[
        Literal["c", "python", "pyarrow", "python-fwf", "bigquery"]
    ] = None,
    encoding: Optional[str] = None,
    **kwargs
) -> dataframe.DataFrame

Loads DataFrame from comma-separated values (csv) file locally or from Cloud Storage.

The CSV file data will be persisted as a temporary BigQuery table, which can be automatically recycled after the Session is closed.

Examples:

>>> import <xref uid="bigframes.pandas">bigframes.pandas</xref> as bpd
>>> bpd.options.display.progress_bar = None

>>> gcs_path = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"
>>> df = bpd.read_csv(filepath_or_buffer=gcs_path)
>>> df.head(2)
      name post_abbr
0  Alabama        AL
1   Alaska        AK
<BLANKLINE>
[2 rows x 2 columns]
Parameters
Name Description
filepath_or_buffer str

A local or Google Cloud Storage (gs://) path with engine="bigquery" otherwise passed to pandas.read_csv.

sep Optional[str], default ","

the separator for fields in a CSV file. For the BigQuery engine, the separator can be any ISO-8859-1 single-byte character. To use a character in the range 128-255, you must encode the character as UTF-8. Both engines support sep=" " to specify tab character as separator. Default engine supports having any number of spaces as separator by specifying sep="\s+". Separators longer than 1 character are interpreted as regular expressions by the default engine. BigQuery engine only supports single character separators.

header Optional[int], default 0

row number to use as the column names. - None: Instructs autodetect that there are no headers and data should be read starting from the first row. - 0: If using engine="bigquery", Autodetect tries to detect headers in the first row. If they are not detected, the row is read as data. Otherwise data is read starting from the second row. When using default engine, pandas assumes the first row contains column names unless the names argument is specified. If names is provided, then the first row is ignored, second row is read as data, and column names are inferred from names. - N > 0: If using engine="bigquery", Autodetect skips N rows and tries to detect headers in row N+1. If headers are not detected, row N+1 is just skipped. Otherwise row N+1 is used to extract column names for the detected schema. When using default engine, pandas will skip N rows and assumes row N+1 contains column names unless the names argument is specified. If names is provided, row N+1 will be ignored, row N+2 will be read as data, and column names are inferred from names.

names default None

a list of column names to use. If the file contains a header row and you want to pass this parameter, then header=0 should be passed as well so the first (header) row is ignored. Only to be used with default engine.

index_col default None

column(s) to use as the row labels of the DataFrame, either given as string name or column index. index_col=False can be used with the default engine only to enforce that the first column is not used as the index. Using column index instead of column name is only supported with the default engine. The BigQuery engine only supports having a single column name as the index_col. Neither engine supports having a multi-column index.

usecols default None

List of column names to use): The BigQuery engine only supports having a list of string column names. Column indices and callable functions are only supported with the default engine. Using the default engine, the column names in usecols can be defined to correspond to column names provided with the names parameter (ignoring the document's header row of column names). The order of the column indices/names in usecols is ignored with the default engine. The order of the column names provided with the BigQuery engine will be consistent in the resulting dataframe. If using a callable function with the default engine, only column names that evaluate to True by the callable function will be in the resulting dataframe.

dtype data type for data or columns

Data type for data or columns. Only to be used with default engine.

engine Optional[Dict], default None

Type of engine to use. If engine="bigquery" is specified, then BigQuery's load API will be used. Otherwise, the engine will be passed to pandas.read_csv.

encoding Optional[str], default to None

encoding the character encoding of the data. The default encoding is UTF-8 for both engines. The default engine acceps a wide range of encodings. Refer to Python documentation for a comprehensive list, https://docs.python.org/3/library/codecs.html#standard-encodings The BigQuery engine only supports UTF-8 and ISO-8859-1.

Returns
Type Description
bigframes.dataframe.DataFrame A BigQuery DataFrames.

read_gbq

read_gbq(
    query_or_table: str,
    *,
    index_col: Iterable[str] | str = (),
    col_order: Iterable[str] = (),
    max_results: Optional[int] = None
) -> dataframe.DataFrame

Loads a DataFrame from BigQuery.

BigQuery tables are an unordered, unindexed data source. By default, the DataFrame will have an arbitrary index and ordering.

Set the index_col argument to one or more columns to choose an index. The resulting DataFrame is sorted by the index columns. For the best performance, ensure the index columns don't contain duplicate values.

If your query doesn't have an ordering, select GENERATE_UUID() AS rowindex in your SQL and set index_col='rowindex' for the best performance.

Examples:

>>> import <xref uid="bigframes.pandas">bigframes.pandas</xref> as bpd
>>> bpd.options.display.progress_bar = None

If the input is a table ID:

>>> df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")
>>> df.head(2)
                                     species island  culmen_length_mm  \
0        Adelie Penguin (Pygoscelis adeliae)  Dream              36.6
1        Adelie Penguin (Pygoscelis adeliae)  Dream              39.8
<BLANKLINE>
   culmen_depth_mm  flipper_length_mm  body_mass_g     sex
0             18.4              184.0       3475.0  FEMALE
1             19.1              184.0       4650.0    MALE
<BLANKLINE>
[2 rows x 7 columns]

Preserve ordering in a query input.

>>> df = bpd.read_gbq('''
...    SELECT
...       -- Instead of an ORDER BY clause on the query, use
...       -- ROW_NUMBER() to create an ordered DataFrame.
...       ROW_NUMBER() OVER (ORDER BY AVG(pitchSpeed) DESC)
...         AS rowindex,
...
...       pitcherFirstName,
...       pitcherLastName,
...       AVG(pitchSpeed) AS averagePitchSpeed
...     FROM `bigquery-public-data.baseball.games_wide`
...     WHERE year = 2016
...     GROUP BY pitcherFirstName, pitcherLastName
... ''', index_col="rowindex")
>>> df.head(2)
         pitcherFirstName pitcherLastName  averagePitchSpeed
rowindex
1                Albertin         Chapman          96.514113
2                 Zachary         Britton          94.591039
<BLANKLINE>
[2 rows x 3 columns]
Parameters
Name Description
query_or_table str

A SQL string to be executed or a BigQuery table to be read. The table must be specified in the format of project.dataset.tablename or dataset.tablename.

index_col Iterable[str] or str

Name of result column(s) to use for index in results DataFrame.

col_order Iterable[str]

List of BigQuery column names in the desired order for results DataFrame.

max_results Optional[int], default None

If set, limit the maximum number of rows to fetch from the query results.

Returns
Type Description