Session(
context: typing.Optional[bigframes._config.bigquery_options.BigQueryOptions] = None,
clients_provider: typing.Optional[bigframes.session.clients.ClientsProvider] = None,
)
Establishes a BigQuery connection to capture a group of job activities related to DataFrames.
Parameters | |
---|---|
Name | Description |
context |
bigframes._config.bigquery_options.BigQueryOptions
Configuration adjusting how to connect to BigQuery and related APIs. Note that some options are ignored if |
clients_provider |
bigframes.session.clients.ClientsProvider
An object providing client library objects. |
Properties
bqclient
API documentation for bqclient property.
bqconnectionclient
API documentation for bqconnectionclient property.
bqstoragereadclient
API documentation for bqstoragereadclient property.
cloudfunctionsclient
API documentation for cloudfunctionsclient property.
resourcemanagerclient
API documentation for resourcemanagerclient property.
Methods
close
close()
Terminates the BigQuery session. Otherwise, the session is terminated automatically after 24 hours of inactivity or after 7 days.
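A minimal sketch of making sure close() runs even when the work in between raises. This page does not say whether Session can be used directly in a with statement, so the sketch relies on contextlib.closing, which only needs an object with a close() method. FakeSession and run_with_session are hypothetical names used for illustration; FakeSession stands in for a real Session so the example runs without credentials.

```python
from contextlib import closing


class FakeSession:
    """Hypothetical stand-in for a Session; only close() is modeled."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_with_session(session, work):
    """Run work(session) and always call session.close() afterwards."""
    with closing(session):
        return work(session)


demo = FakeSession()
result = run_with_session(demo, lambda s: "done")
```

With a real Session, the same helper would end the BigQuery session as soon as the work finishes instead of waiting for the automatic timeout.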
read_csv
read_csv(
filepath_or_buffer: str | IO["bytes"],
*,
sep: Optional[str] = ",",
header: Optional[int] = 0,
names: Optional[
Union[MutableSequence[Any], np.ndarray[Any, Any], Tuple[Any, ...], range]
] = None,
index_col: Optional[
Union[int, str, Sequence[Union[str, int]], Literal[False]]
] = None,
usecols: Optional[
Union[
MutableSequence[str],
Tuple[str, ...],
Sequence[int],
pandas.Series,
pandas.Index,
np.ndarray[Any, Any],
Callable[[Any], bool],
]
] = None,
dtype: Optional[Dict] = None,
engine: Optional[
Literal["c", "python", "pyarrow", "python-fwf", "bigquery"]
] = None,
encoding: Optional[str] = None,
**kwargs
) -> dataframe.DataFrame
Loads a DataFrame from a comma-separated values (CSV) file, either local or in Cloud Storage.
The CSV data is persisted as a temporary BigQuery table, which can be automatically recycled after the Session is closed.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> gcs_path = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"
>>> df = bpd.read_csv(filepath_or_buffer=gcs_path)
>>> df.head(2)
name post_abbr
0 Alabama AL
1 Alaska AK
<BLANKLINE>
[2 rows x 2 columns]
Parameters | |
---|---|
Name | Description |
filepath_or_buffer |
str
A local or Google Cloud Storage ( |
sep |
Optional[str], default ","
The separator for fields in a CSV file. For the BigQuery engine, the separator can be any ISO-8859-1 single-byte character. To use a character in the range 128-255, you must encode the character as UTF-8. Both engines support |
header |
Optional[int], default 0
Row number to use as the column names. - |
names |
default None
A list of column names to use. If the file contains a header row and you want to pass this parameter, then |
index_col |
default None
Column(s) to use as the row labels of the DataFrame, given either as a string name or as a column index. |
usecols |
default None
List of column names to use. The BigQuery engine only supports a list of string column names. Column indices and callable functions are only supported with the default engine. Using the default engine, the column names in |
dtype |
Optional[Dict], default None
Data type for data or columns. Only to be used with the default engine. |
engine |
Optional[Literal["c", "python", "pyarrow", "python-fwf", "bigquery"]], default None
Type of engine to use. If |
encoding |
Optional[str], default None
The character encoding of the data. The default encoding is |
Returns | |
---|---|
Type | Description |
bigframes.dataframe.DataFrame |
A BigQuery DataFrame. |
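The usecols and dtype descriptions above place limits on the BigQuery engine. The following is a rough sketch of checking those two limits before choosing an engine. bigquery_engine_ok and read_csv_checked are hypothetical helpers, not part of the library, and only the constraints stated in the table are checked. Other parameters may carry restrictions that this sketch does not cover.

```python
from typing import Any, Optional


def bigquery_engine_ok(usecols: Any = None, dtype: Optional[dict] = None) -> bool:
    """Return True if usecols and dtype fit the documented BigQuery engine limits."""
    if dtype is not None:
        # dtype is documented as default-engine only.
        return False
    if usecols is None:
        return True
    # Only a list (or tuple) of string column names is accepted here;
    # column indices and callables need the default engine.
    if not isinstance(usecols, (list, tuple)):
        return False
    return all(isinstance(col, str) for col in usecols)


def read_csv_checked(path: str, usecols: Any = None, dtype: Optional[dict] = None):
    """Hypothetical wrapper: pick engine="bigquery" only when the checks pass."""
    # Imported lazily; calling this needs bigframes installed and credentials.
    import bigframes.pandas as bpd

    engine = "bigquery" if bigquery_engine_ok(usecols, dtype) else None
    return bpd.read_csv(path, usecols=usecols, dtype=dtype, engine=engine)
```

For example, `read_csv_checked(gcs_path, usecols=["name", "post_abbr"])` would request the BigQuery engine, while passing column indices or a dtype mapping would fall back to the default engine.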
read_gbq
read_gbq(
query_or_table: str,
*,
index_col: Iterable[str] | str = (),
col_order: Iterable[str] = (),
max_results: Optional[int] = None
) -> dataframe.DataFrame
Loads a DataFrame from BigQuery.
BigQuery tables are an unordered, unindexed data source. By default, the DataFrame will have an arbitrary index and ordering.
Set the index_col argument to one or more columns to choose an index. The resulting DataFrame is sorted by the index columns. For the best performance, ensure the index columns don't contain duplicate values.
If your query has no natural ordering, select GENERATE_UUID() AS rowindex in your SQL and set index_col='rowindex' for the best performance.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
If the input is a table ID:
>>> df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")
>>> df.head(2)
species island culmen_length_mm \
0 Adelie Penguin (Pygoscelis adeliae) Dream 36.6
1 Adelie Penguin (Pygoscelis adeliae) Dream 39.8
<BLANKLINE>
culmen_depth_mm flipper_length_mm body_mass_g sex
0 18.4 184.0 3475.0 FEMALE
1 19.1 184.0 4650.0 MALE
<BLANKLINE>
[2 rows x 7 columns]
Preserve ordering in a query input.
>>> df = bpd.read_gbq('''
... SELECT
... -- Instead of an ORDER BY clause on the query, use
... -- ROW_NUMBER() to create an ordered DataFrame.
... ROW_NUMBER() OVER (ORDER BY AVG(pitchSpeed) DESC)
... AS rowindex,
...
... pitcherFirstName,
... pitcherLastName,
... AVG(pitchSpeed) AS averagePitchSpeed
... FROM `bigquery-public-data.baseball.games_wide`
... WHERE year = 2016
... GROUP BY pitcherFirstName, pitcherLastName
... ''', index_col="rowindex")
>>> df.head(2)
pitcherFirstName pitcherLastName averagePitchSpeed
rowindex
1 Albertin Chapman 96.514113
2 Zachary Britton 94.591039
<BLANKLINE>
[2 rows x 3 columns]
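The two indexing patterns mentioned in this section (ROW_NUMBER() for an ordered result, GENERATE_UUID() for an unordered one) can be sketched as a small query wrapper. add_row_index and read_indexed are hypothetical helpers, and the table name in the usage line is a placeholder. The wrapper does plain string formatting with no escaping, so it is only suitable for trusted SQL.

```python
from typing import Optional


def add_row_index(
    inner_sql: str,
    order_by: Optional[str] = None,
    index_name: str = "rowindex",
) -> str:
    """Wrap inner_sql so the result carries an explicit index column.

    With order_by, rows are numbered by ROW_NUMBER(); without it, each
    row gets a unique GENERATE_UUID() value instead.
    """
    if order_by:
        index_expr = f"ROW_NUMBER() OVER (ORDER BY {order_by})"
    else:
        index_expr = "GENERATE_UUID()"
    return f"SELECT {index_expr} AS {index_name}, * FROM ({inner_sql})"


def read_indexed(inner_sql: str, order_by: Optional[str] = None):
    """Hypothetical wrapper that reads the wrapped query with rowindex as index."""
    # Imported lazily; calling this needs bigframes installed and credentials.
    import bigframes.pandas as bpd

    return bpd.read_gbq(add_row_index(inner_sql, order_by), index_col="rowindex")


# Placeholder table name, used only to show the generated SQL.
ordered_sql = add_row_index(
    "SELECT name, score FROM `my-project.my_dataset.scores`",
    order_by="score DESC",
)
unordered_sql = add_row_index("SELECT name FROM `my-project.my_dataset.scores`")
```

The order_by expression must refer to columns that the inner query returns, since it is evaluated against the subquery's output.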
Parameters | |
---|---|
Name | Description |
query_or_table |
str
A SQL string to be executed or a BigQuery table to be read. The table must be specified in the format of |
index_col |
Iterable[str] or str
Name of result column(s) to use for index in results DataFrame. |
col_order |
Iterable[str]
List of BigQuery column names in the desired order for results DataFrame. |
max_results |
Optional[int], default None
If set, limit the maximum number of rows to fetch from the query results. |
Returns | |
---|---|
Type | Description |
|