Class Session (2.29.0)

Session(
    context: typing.Optional[bigframes._config.bigquery_options.BigQueryOptions] = None,
    clients_provider: typing.Optional[bigframes.session.clients.ClientsProvider] = None,
)

Establishes a BigQuery connection to capture a group of job activities related to DataFrames.

Properties

MultiIndex

Constructs a MultiIndex.

Included for compatibility between bpd and Session.

See <xref uid="bigframes.pandas">bigframes.pandas</xref>.MulitIndex for full documentation.

bqclient

API documentation for bqclient property.

bqconnectionclient

API documentation for bqconnectionclient property.

bqconnectionmanager

API documentation for bqconnectionmanager property.

bqstoragereadclient

API documentation for bqstoragereadclient property.

bytes_processed_sum

The sum of all bytes processed by bigquery jobs using this session.

cloudfunctionsclient

API documentation for cloudfunctionsclient property.

objects

API documentation for objects property.

options

Options for configuring BigQuery DataFrames.

Included for compatibility between bpd and Session.

resourcemanagerclient

API documentation for resourcemanagerclient property.

session_id

API documentation for session_id property.

slot_millis_sum

The sum of all slot time used by bigquery jobs in this session.

Methods

DataFrame

DataFrame(*args, **kwargs)

Constructs a DataFrame.

Included for compatibility between bpd and Session.

See <xref uid="bigframes.pandas.DataFrame">bigframes.pandas.DataFrame</xref> for full documentation.

Index

Index(*args, **kwargs)

Constructs a Index.

Included for compatibility between bpd and Session.

See <xref uid="bigframes.pandas.Index">bigframes.pandas.Index</xref> for full documentation.

Series

Series(*args, **kwargs)

Constructs a Series.

Included for compatibility between bpd and Session.

See <xref uid="bigframes.pandas.Series">bigframes.pandas.Series</xref> for full documentation.

del

__del__()

Automatic cleanup of internal resources.

enter

__enter__()

Enter the runtime context of the Session object.

See With Statement Context Managers for more details.

exit

__exit__(*_)

Exit the runtime context of the Session object.

See With Statement Context Managers for more details.

close

close()

Delete resources that were created with this session's session_id. This includes BigQuery tables, remote functions and cloud functions serving the remote functions.

cut

cut(*args, **kwargs) -> bigframes.series.Series

Cuts a BigQuery DataFrames object.

Included for compatibility between bpd and Session.

See <xref uid="bigframes.pandas.cut">bigframes.pandas.cut</xref> for full documentation.

deploy_remote_function

deploy_remote_function(func, **kwargs)

Orchestrates the creation of a BigQuery remote function that deploys immediately.

This method ensures that the remote function is created and available for use in BigQuery as soon as this call is made.

deploy_udf

deploy_udf(func, **kwargs)

Orchestrates the creation of a BigQuery UDF that deploys immediately.

This method ensures that the UDF is created and available for use in BigQuery as soon as this call is made.

from_glob_path

from_glob_path(
    path: str, *, connection: Optional[str] = None, name: Optional[str] = None
) -> dataframe.DataFrame

Create a BigFrames DataFrame that contains a BigFrames Blob column from a global wildcard path. This operation creates a temporary BQ Object Table under the hood and requires bigquery.connections.delegate permission or BigQuery Connection Admin role. If you have an existing BQ Object Table, use read_gbq_object_table().

Returns
Type	Description
`bigframes.pandas.DataFrame`	Result BigFrames DataFrame.

read_arrow

read_arrow(pa_table: pyarrow.lib.Table) -> bigframes.dataframe.DataFrame

Load a PyArrow Table to a BigQuery DataFrames DataFrame.

Returns
Type	Description
`bigframes.dataframe.DataFrame`	A new DataFrame representing the data from the PyArrow table.

read_csv

read_csv(
    filepath_or_buffer: str | IO["bytes"],
    *,
    sep: Optional[str] = ",",
    header: Optional[int] = 0,
    names: Optional[
        Union[MutableSequence[Any], np.ndarray[Any, Any], Tuple[Any, ...], range]
    ] = None,
    index_col: Optional[
        Union[
            int,
            str,
            Sequence[Union[str, int]],
            bigframes.enums.DefaultIndexKind,
            Literal[False],
        ]
    ] = None,
    usecols: Optional[
        Union[
            MutableSequence[str],
            Tuple[str, ...],
            Sequence[int],
            pandas.Series,
            pandas.Index,
            np.ndarray[Any, Any],
            Callable[[Any], bool],
        ]
    ] = None,
    dtype: Optional[Dict] = None,
    engine: Optional[
        Literal["c", "python", "pyarrow", "python-fwf", "bigquery"]
    ] = None,
    encoding: Optional[str] = None,
    write_engine: constants.WriteEngineType = "default",
    **kwargs
) -> dataframe.DataFrame

Loads data from a comma-separated values (csv) file into a DataFrame.

The CSV file data will be persisted as a temporary BigQuery table, which can be automatically recycled after the Session is closed.

Note: using engine="bigquery" will not guarantee the same ordering as the file. Instead, set a serialized index column as the index and sort by that in the resulting DataFrame. Only files stored on your local machine or in Google Cloud Storage are supported.

Examples:

>>> import bigframes.pandas as bpd

>>> gcs_path = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"
>>> df = bpd.read_csv(filepath_or_buffer=gcs_path)
>>> df.head(2)
      name post_abbr
0  Alabama        AL
1   Alaska        AK
<BLANKLINE>
[2 rows x 2 columns]

Exceptions
Type	Description
`bigframes.exceptions.DefaultIndexWarning`	Using the default index is discouraged, such as with clustered or partitioned tables without primary keys.

Returns
Type	Description
`bigframes.pandas.DataFrame`	A BigQuery DataFrames.

read_gbq

Loads a DataFrame from BigQuery.

BigQuery tables are an unordered, unindexed data source. To add support pandas-compatibility, the following indexing options are supported via the index_col parameter:

(Empty iterable, default) A default index. Behavior may change. Explicitly set index_col if your application makes use of specific index values.

If a table has primary key(s), those are used as the index, otherwise a sequential index is generated.
(<xref uid="bigframes.enums.DefaultIndexKind.SEQUENTIAL_INT64">bigframes.enums.DefaultIndexKind.SEQUENTIAL_INT64</xref>) Add an arbitrary sequential index and ordering. Warning This uses an analytic windowed operation that prevents filtering push down. Avoid using on large clustered or partitioned tables.
(Recommended) Set the index_col argument to one or more columns. Unique values for the row labels are recommended. Duplicate labels are possible, but note that joins on a non-unique index can duplicate rows via pandas-compatible outer join behavior.

Note: By default, even SQL query inputs with an ORDER BY clause create a DataFrame with an arbitrary ordering. Use

row_number() OVER
(ORDER BY ...) AS rowindex

in your SQL query and set index_col='rowindex' to preserve the desired ordering.

If your query doesn't have an ordering, select

GENERATE_UUID() AS
    rowindex

in your SQL and set index_col='rowindex' for the best performance.

Examples:

>>> import bigframes.pandas as bpd

If the input is a table ID:

>>> df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")

Read table path with wildcard suffix and filters:

>>> df = bpd.read_gbq_table("bigquery-public-data.noaa_gsod.gsod19*", filters=[("_table_suffix", ">=", "30"), ("_table_suffix", "<=", "39")])

Preserve ordering in a query input.

>>> df = bpd.read_gbq('''
...    SELECT
...       -- Instead of an ORDER BY clause on the query, use
...       -- ROW_NUMBER() to create an ordered DataFrame.
...       ROW_NUMBER() OVER (ORDER BY AVG(pitchSpeed) DESC)
...         AS rowindex,
...
...       pitcherFirstName,
...       pitcherLastName,
...       AVG(pitchSpeed) AS averagePitchSpeed
...     FROM `bigquery-public-data.baseball.games_wide`
...     WHERE year = 2016
...     GROUP BY pitcherFirstName, pitcherLastName
... ''', index_col="rowindex")
>>> df.head(2)
         pitcherFirstName pitcherLastName  averagePitchSpeed
rowindex
1                Albertin         Chapman          96.514113
2                 Zachary         Britton          94.591039
<BLANKLINE>
[2 rows x 3 columns]

Reading data with columns and filters parameters:

>>> columns = ['pitcherFirstName', 'pitcherLastName', 'year', 'pitchSpeed']
>>> filters = [('year', '==', 2016), ('pitcherFirstName', 'in', ['John', 'Doe']), ('pitcherLastName', 'in', ['Gant']), ('pitchSpeed', '>', 94)]
>>> df = bpd.read_gbq(
...             "bigquery-public-data.baseball.games_wide",
...             columns=columns,
...             filters=filters,
...         )
>>> df.head(1)
  pitcherFirstName pitcherLastName  year  pitchSpeed
0             John            Gant  2016          95
<BLANKLINE>
[1 rows x 4 columns]

Exceptions
Type	Description
`bigframes.exceptions.DefaultIndexWarning`	Using the default index is discouraged, such as with clustered or partitioned tables without primary keys.
`ValueError`	When both `columns` and `col_order` are specified.
`ValueError`	If `configuration` is specified when directly reading from a table.

Returns
Type	Description
`bigframes.pandas.DataFrame`	A DataFrame representing results of the query or table.

read_gbq_function

read_gbq_function(function_name: str, is_row_processor: bool = False)

Loads a BigQuery function from BigQuery.

Then it can be applied to a DataFrame or Series.

BigQuery Utils provides many public functions under the bqutil project on Google Cloud Platform project (See: https://github.com/GoogleCloudPlatform/bigquery-utils/tree/master/udfs#using-the-udfs). You can checkout Community UDFs to use community-contributed functions. (See: https://github.com/GoogleCloudPlatform/bigquery-utils/tree/master/udfs/community#community-udfs).

Examples:

Use the cw_lower_case_ascii_only function from Community UDFs.

>>> import bigframes.pandas as bpd
>>> func = bpd.read_gbq_function("bqutil.fn.cw_lower_case_ascii_only")

You can run it on scalar input. Usually you would do so to verify that it works as expected before applying to all values in a Series.

>>> func('AURÉLIE')
'aurÉlie'

You can apply it to a BigQuery DataFrames Series.

>>> df = bpd.DataFrame({'id': [1, 2, 3], 'name': ['AURÉLIE', 'CÉLESTINE', 'DAPHNÉ']})
>>> df
   id       name
0   1    AURÉLIE
1   2  CÉLESTINE
2   3     DAPHNÉ
<BLANKLINE>
[3 rows x 2 columns]

>>> df1 = df.assign(new_name=df['name'].apply(func))
>>> df1
   id       name   new_name
0   1    AURÉLIE    aurÉlie
1   2  CÉLESTINE  cÉlestine
2   3     DAPHNÉ     daphnÉ
<BLANKLINE>
[3 rows x 3 columns]

You can even use a function with multiple inputs. For example, cw_regexp_replace_5 from Community UDFs.

>>> func = bpd.read_gbq_function("bqutil.fn.cw_regexp_replace_5")
>>> func('TestStr123456', 'Str', 'Cad$', 1, 1)
'TestCad$123456'

>>> df = bpd.DataFrame({
...     "haystack" : ["TestStr123456", "TestStr123456Str", "TestStr123456Str"],
...     "regexp" : ["Str", "Str", "Str"],
...     "replacement" : ["Cad$", "Cad$", "Cad$"],
...     "offset" : [1, 1, 1],
...     "occurrence" : [1, 2, 1]
... })
>>> df
           haystack regexp replacement  offset  occurrence
0     TestStr123456    Str        Cad$       1           1
1  TestStr123456Str    Str        Cad$       1           2
2  TestStr123456Str    Str        Cad$       1           1
<BLANKLINE>
[3 rows x 5 columns]
>>> df.apply(func, axis=1)
0       TestCad$123456
1    TestStr123456Cad$
2    TestCad$123456Str
dtype: string

Another use case is to define your own remote function and use it later. For example, define the remote function:

>>> @bpd.remote_function(cloud_function_service_account="default")  # doctest: +SKIP
... def tenfold(num: int) -> float:
...     return num * 10

Then, read back the deployed BQ remote function:

>>> tenfold_ref = bpd.read_gbq_function(  # doctest: +SKIP
...     tenfold.bigframes_remote_function,
... )

>>> df = bpd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
>>> df
    a   b   c
0   1   3   5
1   2   4   6
<BLANKLINE>
[2 rows x 3 columns]

>>> df['a'].apply(tenfold_ref)  # doctest: +SKIP
0    10.0
1    20.0
Name: a, dtype: Float64

It also supports row processing by using is_row_processor=True. Please note, row processor implies that the function has only one input parameter.

>>> @bpd.remote_function(cloud_function_service_account="default")  # doctest: +SKIP
... def row_sum(s: pd.Series) -> float:
...     return s['a'] + s['b'] + s['c']

>>> row_sum_ref = bpd.read_gbq_function(  # doctest: +SKIP
...     row_sum.bigframes_remote_function,
...     is_row_processor=True,
... )

>>> df = bpd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
>>> df
    a   b   c
0   1   3   5
1   2   4   6
<BLANKLINE>
[2 rows x 3 columns]

>>> df.apply(row_sum_ref, axis=1)  # doctest: +SKIP
0     9.0
1    12.0
dtype: Float64

Returns
Type	Description
`collections.abc.Callable`	A function object pointing to the BigQuery function read from BigQuery. The object is similar to the one created by the `remote_function` decorator, including the `bigframes_remote_function` property, but not including the `bigframes_cloud_function` property.

read_gbq_model

read_gbq_model(model_name: str)

Loads a BigQuery ML model from BigQuery.

Examples:

Read an existing BigQuery ML model.

>>> import bigframes.pandas as bpd
>>> model_name = "bigframes-dev.bqml_tutorial.penguins_model"
>>> model = bpd.read_gbq_model(model_name)

read_gbq_object_table

read_gbq_object_table(
    object_table: str, *, name: Optional[str] = None
) -> dataframe.DataFrame

Read an existing object table to create a BigFrames Blob DataFrame. Use the connection of the object table for the connection of the blob. This function dosen't retrieve the object table data. If you want to read the data, use read_gbq() instead.

Returns
Type	Description
`bigframes.pandas.DataFrame`	Result BigFrames DataFrame.

read_gbq_query

Turn a SQL query into a DataFrame.

Note: Because the results are written to a temporary table, ordering by ORDER BY is not preserved. A unique index_col is recommended. Use row_number() over () if there is no natural unique index or you want to preserve ordering.

Examples:

Simple query input:

>>> import bigframes.pandas as bpd
>>> df = bpd.read_gbq_query('''
...    SELECT
...       pitcherFirstName,
...       pitcherLastName,
...       pitchSpeed,
...    FROM `bigquery-public-data.baseball.games_wide`
... ''')

Preserve ordering in a query input.

>>> df = bpd.read_gbq_query('''
...    SELECT
...       -- Instead of an ORDER BY clause on the query, use
...       -- ROW_NUMBER() to create an ordered DataFrame.
...       ROW_NUMBER() OVER (ORDER BY AVG(pitchSpeed) DESC)
...         AS rowindex,
...
...       pitcherFirstName,
...       pitcherLastName,
...       AVG(pitchSpeed) AS averagePitchSpeed
...     FROM `bigquery-public-data.baseball.games_wide`
...     WHERE year = 2016
...     GROUP BY pitcherFirstName, pitcherLastName
... ''', index_col="rowindex")
>>> df.head(2)
         pitcherFirstName pitcherLastName  averagePitchSpeed
rowindex
1                Albertin         Chapman          96.514113
2                 Zachary         Britton          94.591039
<BLANKLINE>
[2 rows x 3 columns]

read_gbq_table

Turn a BigQuery table into a DataFrame.

Examples:

Read a whole table, with arbitrary ordering or ordering corresponding to the primary key(s).

>>> import bigframes.pandas as bpd
>>> df = bpd.read_gbq_table("bigquery-public-data.ml_datasets.penguins")

read_gbq_table_streaming

read_gbq_table_streaming(table: str) -> streaming_dataframe.StreamingDataFrame

Turn a BigQuery table into a StreamingDataFrame.

import bigframes.streaming as bst

sdf = bst.read_gbq_table("bigquery-public-data.ml_datasets.penguins")

Returns
Type	Description
`bigframes.streaming.dataframe.StreamingDataFrame`	A StreamingDataFrame representing results of the table.

read_json

read_json(
    path_or_buf: str | IO["bytes"],
    *,
    orient: Literal[
        "split", "records", "index", "columns", "values", "table"
    ] = "columns",
    dtype: Optional[Dict] = None,
    encoding: Optional[str] = None,
    lines: bool = False,
    engine: Literal["ujson", "pyarrow", "bigquery"] = "ujson",
    write_engine: constants.WriteEngineType = "default",
    **kwargs
) -> dataframe.DataFrame

Convert a JSON string to DataFrame object.

Note: using engine="bigquery" will not guarantee the same ordering as the file. Instead, set a serialized index column as the index and sort by that in the resulting DataFrame.

Examples:

>>> import bigframes.pandas as bpd

>>> gcs_path = "gs://bigframes-dev-testing/sample1.json"
>>> df = bpd.read_json(path_or_buf=gcs_path, lines=True, orient="records")
>>> df.head(2)
   id   name
0   1  Alice
1   2    Bob
<BLANKLINE>
[2 rows x 2 columns]

Exceptions
Type	Description
`bigframes.exceptions.DefaultIndexWarning`	Using the default index is discouraged, such as with clustered or partitioned tables without primary keys.
`ValueError`	`lines` is only valid when `orient` is `records`.

Returns
Type	Description
`bigframes.pandas.DataFrame`	The DataFrame representing JSON contents.

read_pandas

Loads DataFrame from a pandas DataFrame.

The pandas DataFrame will be persisted as a temporary BigQuery table, which can be automatically recycled after the Session is closed.

Examples:

>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> pandas_df = pd.DataFrame(data=d)
>>> df = bpd.read_pandas(pandas_df)
>>> df
   col1  col2
0     1     3
1     2     4
<BLANKLINE>
[2 rows x 2 columns]

Exceptions
Type	Description
`ValueError`	When the object is not a Pandas DataFrame.

read_parquet

read_parquet(
    path: str | IO["bytes"],
    *,
    engine: str = "auto",
    write_engine: constants.WriteEngineType = "default"
) -> dataframe.DataFrame

Load a Parquet object from the file path (local or Cloud Storage), returning a DataFrame.

Examples:

>>> import bigframes.pandas as bpd

>>> gcs_path = "gs://cloud-samples-data/bigquery/us-states/us-states.parquet"
>>> df = bpd.read_parquet(path=gcs_path, engine="bigquery")

Returns
Type	Description
`bigframes.pandas.DataFrame`	A BigQuery DataFrames.

read_pickle

read_pickle(
    filepath_or_buffer: FilePath | ReadPickleBuffer,
    compression: CompressionOptions = "infer",
    storage_options: StorageOptions = None,
    *,
    write_engine: constants.WriteEngineType = "default"
)

Load pickled BigFrames object (or any object) from file.

Examples:

>>> import bigframes.pandas as bpd

>>> gcs_path = "gs://bigframes-dev-testing/test_pickle.pkl"
>>> df = bpd.read_pickle(filepath_or_buffer=gcs_path)

Returns
Type	Description
`bigframes.pandas.DataFrame or bigframes.pandas.Series`	same type as object stored in file.

remote_function

remote_function(
    input_types: typing.Union[None, type, typing.Sequence[type]] = None,
    output_type: typing.Optional[type] = None,
    dataset: typing.Optional[str] = None,
    *,
    bigquery_connection: typing.Optional[str] = None,
    reuse: bool = True,
    name: typing.Optional[str] = None,
    packages: typing.Optional[typing.Sequence[str]] = None,
    cloud_function_service_account: str,
    cloud_function_kms_key_name: typing.Optional[str] = None,
    cloud_function_docker_repository: typing.Optional[str] = None,
    max_batching_rows: typing.Optional[int] = 1000,
    cloud_function_timeout: typing.Optional[int] = 600,
    cloud_function_max_instances: typing.Optional[int] = None,
    cloud_function_vpc_connector: typing.Optional[str] = None,
    cloud_function_vpc_connector_egress_settings: typing.Optional[
        typing.Literal["all", "private-ranges-only", "unspecified"]
    ] = None,
    cloud_function_memory_mib: typing.Optional[int] = 1024,
    cloud_function_ingress_settings: typing.Literal[
        "all", "internal-only", "internal-and-gclb"
    ] = "internal-only",
    cloud_build_service_account: typing.Optional[str] = None
)

Decorator to turn a user defined function into a BigQuery remote function. Check out the code samples at: https://cloud.google.com/bigquery/docs/remote-functions#bigquery-dataframes.

Note: input_types=Series scenario is in preview. It currently only supports dataframe with column types Int64/Float64/boolean/ string/binary[pyarrow].

Warning: To use remote functions with Bigframes 2.0 and onwards, please (preferred) set an explicit user-managed cloud_function_service_account or (discouraged) set cloud_function_service_account to use the Compute Engine service account by setting it to "default".

See, https://cloud.google.com/functions/docs/securing/function-identity.

Have the below APIs enabled for your project:
- BigQuery Connection API
- Cloud Functions API
- Cloud Run API
- Cloud Build API
- Artifact Registry API
- Cloud Resource Manager API
This can be done from the cloud console (change PROJECT_ID to yours): https://console.cloud.google.com/apis/enableflow?apiid=bigqueryconnection.googleapis.com,cloudfunctions.googleapis.com,run.googleapis.com,cloudbuild.googleapis.com,artifactregistry.googleapis.com,cloudresourcemanager.googleapis.com&project=PROJECT_ID

Or from the gcloud CLI:

$ gcloud services enable bigqueryconnection.googleapis.com cloudfunctions.googleapis.com run.googleapis.com cloudbuild.googleapis.com artifactregistry.googleapis.com cloudresourcemanager.googleapis.com
Have following IAM roles enabled for you:
- BigQuery Data Editor (roles/bigquery.dataEditor)
- BigQuery Connection Admin (roles/bigquery.connectionAdmin)
- Cloud Functions Developer (roles/cloudfunctions.developer)
- Service Account User (roles/iam.serviceAccountUser) on the service account PROJECT_NUMBER-compute@developer.gserviceaccount.com
- Storage Object Viewer (roles/storage.objectViewer)
- Project IAM Admin (roles/resourcemanager.projectIamAdmin) (Only required if the bigquery connection being used is not pre-created and is created dynamically with user credentials.)
Either the user has setIamPolicy privilege on the project, or a BigQuery connection is pre-created with necessary IAM role set:
1. To create a connection, follow https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions#create_a_connection
2. To set up IAM, follow https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions#grant_permission_on_function
  
  Alternatively, the IAM could also be setup via the gcloud CLI:
  
  $ gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:CONNECTION_SERVICE_ACCOUNT_ID" --role="roles/run.invoker".

Returns
Type	Description
`collections.abc.Callable`	A remote function object pointing to the cloud assets created in the background to support the remote execution. The cloud assets can be located through the following properties set in the object: `bigframes_cloud_function` - The google cloud function deployed for the user defined code. `bigframes_remote_function` - The bigquery remote function capable of calling into `bigframes_cloud_function`.

to_datetime

to_datetime(
    *args, **kwargs
) -> typing.Union[
    pandas._libs.tslibs.timestamps.Timestamp, datetime.datetime, bigframes.series.Series
]

Converts a BigQuery DataFrames object to datetime dtype.

Included for compatibility between bpd and Session.

See <xref uid="bigframes.pandas.to_datetime">bigframes.pandas.to_datetime</xref> for full documentation.

to_timedelta

to_timedelta(*args, **kwargs)

Converts a BigQuery DataFrames object to timedelta/duration dtype.

Included for compatibility between bpd and Session.

See <xref uid="bigframes.pandas.to_timedelta">bigframes.pandas.to_timedelta</xref> for full documentation.

udf

udf(
    *,
    input_types: typing.Union[None, type, typing.Sequence[type]] = None,
    output_type: typing.Optional[type] = None,
    dataset: str,
    bigquery_connection: typing.Optional[str] = None,
    name: str,
    packages: typing.Optional[typing.Sequence[str]] = None,
    max_batching_rows: typing.Optional[int] = None,
    container_cpu: typing.Optional[float] = None,
    container_memory: typing.Optional[str] = None
)

Decorator to turn a Python user defined function (udf) into a BigQuery managed user-defined function.

Examples:

>>> import datetime

Turning an arbitrary python function into a BigQuery managed python udf:

>>> bq_name = datetime.datetime.now().strftime("bigframes_%Y%m%d%H%M%S%f")
>>> @bpd.udf(dataset="bigfranes_testing", name=bq_name)  # doctest: +SKIP
... def minutes_to_hours(x: int) -> float:
...     return x/60

>>> minutes = bpd.Series([0, 30, 60, 90, 120])
>>> minutes
0      0
1     30
2     60
3     90
4    120
dtype: Int64

>>> hours = minutes.apply(minutes_to_hours)  # doctest: +SKIP
>>> hours  # doctest: +SKIP
0    0.0
1    0.5
2    1.0
3    1.5
4    2.0
dtype: Float64

To turn a user defined function with external package dependencies into a BigQuery managed python udf, you would provide the names of the packages (optionally with the package version) via packages param.

>>> bq_name = datetime.datetime.now().strftime("bigframes_%Y%m%d%H%M%S%f")
>>> @bpd.udf(  # doctest: +SKIP
...     dataset="bigfranes_testing",
...     name=bq_name,
...     packages=["cryptography"]
... )
... def get_hash(input: str) -> str:
...     from cryptography.fernet import Fernet
...
...     # handle missing value
...     if input is None:
...         input = ""
...
...     key = Fernet.generate_key()
...     f = Fernet(key)
...     return f.encrypt(input.encode()).decode()

>>> names = bpd.Series(["Alice", "Bob"])
>>> hashes = names.apply(get_hash)  # doctest: +SKIP

You can clean-up the BigQuery functions created above using the BigQuery client from the BigQuery DataFrames session:

>>> session = bpd.get_global_session()  # doctest: +SKIP
>>> session.bqclient.delete_routine(minutes_to_hours.bigframes_bigquery_function)  # doctest: +SKIP
>>> session.bqclient.delete_routine(get_hash.bigframes_bigquery_function)  # doctest: +SKIP

Returns
Type	Description
`collections.abc.Callable`	A managed function object pointing to the cloud assets created in the background to support the remote execution. The cloud ssets can be located through the following properties set in the object: `bigframes_bigquery_function` - The bigquery managed function deployed for the user defined code.

Returns
Type	Description
`bigframes.pandas.DataFrame or pandas.Series`	A DataFrame representing the result of the query. If `dry_run` is `True`, a `pandas.Series` containing query statistics is returned.

Class Session (2.29.0) Stay organized with collections Save and categorize content based on your preferences.

Properties

MultiIndex

bqclient

bqconnectionclient

bqconnectionmanager

bqstoragereadclient

bytes_processed_sum

cloudfunctionsclient

objects

options

resourcemanagerclient

session_id

slot_millis_sum

Methods

DataFrame

Index

Series

__del__

__enter__

__exit__

close

cut

deploy_remote_function

deploy_udf

from_glob_path

read_arrow

read_csv

read_gbq

read_gbq_function

read_gbq_model

read_gbq_object_table

read_gbq_query

read_gbq_table

read_gbq_table_streaming

read_json

read_pandas

read_parquet

read_pickle

remote_function

to_datetime

to_timedelta

udf

Class Session (2.29.0)

del

enter

exit