Customize Python functions for BigQuery DataFrames
BigQuery DataFrames lets you turn your custom Python functions into BigQuery artifacts that you can run on BigQuery DataFrames objects at scale. This extensibility support lets you perform operations beyond what is possible with the BigQuery DataFrames and SQL APIs, and lets you take advantage of open source libraries.
There are two variants of this extensibility mechanism: user-defined functions and remote functions.
Required roles
To get the permissions that you need to complete the tasks in this document, ask your administrator to grant you the following IAM roles on your project:
- BigQuery Data Editor (roles/bigquery.dataEditor)
- BigQuery Connection Admin (roles/bigquery.connectionAdmin)
- Cloud Functions Developer (roles/cloudfunctions.developer)
- Service Account User (roles/iam.serviceAccountUser)
- Storage Object Viewer (roles/storage.objectViewer)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
User-defined functions (UDFs)
With UDFs (Preview), you can turn your custom Python function into a Python UDF. For an example usage, see Create a persistent Python UDF.
Creating a UDF in BigQuery DataFrames creates a BigQuery routine as the Python UDF in the specified dataset. For a full set of supported parameters, see bigframes.pandas.udf.
Requirements
To use a BigQuery DataFrames UDF, enable the BigQuery API in your project. If you provide the bigquery_connection parameter, you must also enable the BigQuery Connection API.
Clean up
In addition to cleaning up the cloud artifacts directly in the Google Cloud console
or with other tools, you can clean up the BigQuery DataFrames UDFs that
were created with an explicit name argument by using the
bigframes.pandas.get_global_session().bqclient.delete_routine(routine_id)
command.
Limitations
- The code in the UDF must be self-contained, meaning it must not contain references to imports or variables defined outside of the function body.
- The code in the UDF must be compatible with Python 3.11, as that is the environment in which the code is executed in the cloud.
- Re-running the UDF definition code after trivial changes in the function code—for example, renaming a variable or inserting a new line—causes the UDF to be re-created, even if these changes are inconsequential to the behavior of the function.
- The user code is visible to users with read access on the BigQuery routines, so include sensitive content with caution.
- A project can have up to 1,000 Cloud Run functions at a time in a BigQuery location.
The BigQuery DataFrames UDF deploys a user-defined BigQuery Python function, and the related limitations apply.
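To illustrate the self-containment limitation, a minimal sketch in plain Python (no deployment involved): the function imports everything it needs inside its own body, so it carries no references to outside names.

```python
# Not self-contained: depends on a module imported outside the function
# body, so it would fail when deployed as a UDF.
#
#   import hashlib
#   def fingerprint(s: str) -> str:
#       return hashlib.md5(s.encode()).hexdigest()

# Self-contained: the import lives inside the function body.
def fingerprint(s: str) -> str:
    import hashlib
    return hashlib.md5(s.encode()).hexdigest()

print(fingerprint("bigframes"))
```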
Remote functions
BigQuery DataFrames lets you turn your custom scalar functions into BigQuery remote functions. For an example usage, see Create a remote function. For a full set of supported parameters, see remote_function.
Creating a remote function in BigQuery DataFrames creates the following:
- A Cloud Run function.
- A BigQuery connection. By default, a connection named bigframes-default-connection is used. You can use a pre-configured BigQuery connection if you prefer, in which case the connection creation is skipped. The service account for the default connection is granted the Cloud Run Invoker role (roles/run.invoker).
- A BigQuery remote function that uses the Cloud Run function that's been created with the BigQuery connection.
Requirements
To use BigQuery DataFrames remote functions, you must enable the following APIs:
- BigQuery API (bigquery.googleapis.com)
- BigQuery Connection API (bigqueryconnection.googleapis.com)
- Cloud Functions API (cloudfunctions.googleapis.com)
- Cloud Run Admin API (run.googleapis.com)
- Artifact Registry API (artifactregistry.googleapis.com)
- Cloud Build API (cloudbuild.googleapis.com)
- Compute Engine API (compute.googleapis.com)
- Cloud Resource Manager API (cloudresourcemanager.googleapis.com)
When you use BigQuery DataFrames remote functions, you need the
Project IAM Admin role (roles/resourcemanager.projectIamAdmin)
if you're using a default BigQuery connection, or the
Browser role (roles/browser)
if you're using a pre-configured connection. You can avoid this requirement by
setting the bigframes.pandas.options.bigquery.skip_bq_connection_check option
to True, in which case the connection (default or pre-configured) is used
as-is without any existence or permission check. If you're using the
pre-configured connection and skipping the connection check, verify the
following:
- The connection is created in the right location.
- If you're using BigQuery DataFrames remote functions, the service account has the Cloud Run Invoker role (roles/run.invoker) on the project.
View and manage connections
BigQuery connections are created in the same location as the BigQuery DataFrames session, using the name you provide in the custom function definition. To view and manage connections, do the following:
In the Google Cloud console, go to the BigQuery page.
Select the project in which you created the remote function.
In the left pane, click Explorer.
In the Explorer pane, expand the project, and then click Connections.
BigQuery remote functions are created in the dataset you specify,
or they are created in an anonymous dataset, which is a type of
hidden dataset.
If you don't set a name for a remote function during its creation,
BigQuery DataFrames applies a default name that begins with the
bigframes prefix. To view and manage remote functions created in a
user-specified dataset, do the following:
In the Google Cloud console, go to the BigQuery page.
Select the project in which you created the remote function.
In the left pane, click Explorer.
In the Explorer pane, expand the project, and then click Datasets.
Click the dataset in which you created the remote function.
Click the Routines tab.
To view and manage Cloud Run functions, do the following:
Go to the Cloud Run page.
Select the project in which you created the function.
In the list of available services, filter on Function Deployment type.
To identify functions created by BigQuery DataFrames, look for function names with the bigframes prefix.
Clean up
In addition to cleaning up the cloud artifacts directly in the Google Cloud console or with other tools, you can clean up the BigQuery remote functions that were created without an explicit name argument and their associated Cloud Run functions in the following ways:
- For a BigQuery DataFrames session, use the session.close() command.
- For the default BigQuery DataFrames session, use the bigframes.pandas.close_session() command.
- For a past session with session_id, use the bigframes.pandas.clean_up_by_session_id(session_id) command.
You can also clean up the BigQuery remote functions that were
created with an explicit name argument and their associated
Cloud Run functions by using the
bigframes.pandas.get_global_session().bqclient.delete_routine(routine_id)
command.
Limitations
- Remote functions take about 90 seconds to become usable when you first create them. Additional package dependencies might add to the latency.
- Re-running the remote function definition code after trivial changes in and around the function code—for example, renaming a variable, inserting a new line, or inserting a new cell in the notebook—might cause the remote function to be re-created, even if these changes are inconsequential to the behavior of the function.
- The user code is visible to users with read access on the Cloud Run functions, so include sensitive content with caution.
- A project can have up to 1,000 Cloud Run functions at a time in a region. For more information, see Quotas.
What's next
- Learn about ML and AI capabilities with BigQuery DataFrames.
- Learn how to generate BigQuery DataFrames code with Gemini.
- Learn how to analyze package downloads from PyPI with BigQuery DataFrames.
- View BigQuery DataFrames source code, sample notebooks, and samples on GitHub.
- Explore the BigQuery DataFrames API reference.