Migrate to BigQuery DataFrames version 2.0
Version 2.0 of BigQuery DataFrames makes security and performance improvements to the BigQuery DataFrames API, adds new features, and introduces breaking changes. This document describes the changes and provides migration guidance. You can apply these recommendations while still using the latest 1.x version of BigQuery DataFrames, before you install version 2.0.
BigQuery DataFrames version 2.0 has the following benefits:
- Faster queries and fewer tables created when you run queries that return results to the client, because allow_large_results defaults to False. This change can reduce storage costs, especially if you use physical bytes billing.
- Improved security by default in the remote functions deployed by BigQuery DataFrames.
Install BigQuery DataFrames version 2.0
To avoid breaking changes, pin to a specific version of
BigQuery DataFrames in your requirements.txt file (for example,
bigframes==1.42.0) or your pyproject.toml file (for example,
dependencies = ["bigframes==1.42.0"]). When you're ready to try
version 2.0, run pip install --upgrade bigframes to install the latest
version of BigQuery DataFrames.
Use the allow_large_results option
BigQuery has a
maximum response size limit for query jobs.
Starting in BigQuery DataFrames version 2.0, BigQuery DataFrames
enforces this limit by default in methods that return results to the client,
such as peek(), to_pandas(), and to_pandas_batches(). If your job returns
large results, you can set allow_large_results to True in your
BigQueryOptions object to avoid breaking changes. This option is set to
False by default in BigQuery DataFrames version 2.0.
import bigframes.pandas as bpd

bpd.options.bigquery.allow_large_results = True
You can override the allow_large_results option by using the
allow_large_results parameter in to_pandas() and other methods. For example:
bf_df = bpd.read_gbq(query)
# ... other operations on bf_df ...
pandas_df = bf_df.to_pandas(allow_large_results=True)
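If the results are too large to bring back as a single DataFrame, you can also stream them in chunks. The following is a minimal sketch, assuming a hypothetical public table and that to_pandas_batches() accepts the same allow_large_results parameter as to_pandas():

import bigframes.pandas as bpd

# Hypothetical example table; substitute your own query or table.
bf_df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_2013")

# Stream the results in batches instead of materializing one large
# DataFrame in memory. The allow_large_results parameter is assumed
# to mirror to_pandas(); check the API reference for your version.
for batch in bf_df.to_pandas_batches(allow_large_results=True):
    print(batch.shape)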
Use the @remote_function decorator
BigQuery DataFrames version 2.0 makes some changes to the default
behavior of the @remote_function decorator.
Keyword arguments are enforced for ambiguous parameters
To prevent passing values to an unintended parameter, BigQuery DataFrames version 2.0 and later versions enforce the use of keyword arguments for the following parameters:
- bigquery_connection
- reuse
- name
- packages
- cloud_function_service_account
- cloud_function_kms_key_name
- cloud_function_docker_repository
- max_batching_rows
- cloud_function_timeout
- cloud_function_max_instances
- cloud_function_vpc_connector
- cloud_function_memory_mib
- cloud_function_ingress_settings
When using these parameters, supply the parameter name. For example:
@remote_function(
    name="my_remote_function",
    ...
)
def my_remote_function(parameter: int) -> str:
    return str(parameter)
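After the function is deployed, you can apply it to your data, for example with Series.apply(). The following is a minimal sketch, assuming the my_remote_function definition above and a hypothetical integer column:

import bigframes.pandas as bpd

# Hypothetical data; any integer column works with the function above.
df = bpd.DataFrame({"value": [1, 2, 3]})

# Apply the deployed remote function element-wise to the column.
df["result"] = df["value"].apply(my_remote_function)
print(df.head())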
Set a service account
As of version 2.0, BigQuery DataFrames no longer uses the Compute Engine service account by default for the Cloud Run functions it deploys. To limit the permissions of the function that you deploy, do the following:
- Create a service account with minimal permissions.
- Supply the service account email to the cloud_function_service_account parameter of the @remote_function decorator.
For example:
@remote_function(
    cloud_function_service_account="my-service-account@my-project.iam.gserviceaccount.com",
    ...
)
def my_remote_function(parameter: int) -> str:
    return str(parameter)
If you want to use the Compute Engine service account, set the
cloud_function_service_account parameter of the @remote_function decorator
to "default". For example:
# This usage is discouraged. Use only if you have a specific reason
# to use the default Compute Engine service account.
@remote_function(cloud_function_service_account="default", ...)
def my_remote_function(parameter: int) -> str:
    return str(parameter)
Set ingress settings
As of version 2.0, BigQuery DataFrames sets the
ingress settings of the Cloud Run functions it
deploys to "internal-only". Previously, the ingress settings were set to
"all" by default. You can change the ingress settings by setting the
cloud_function_ingress_settings parameter of the @remote_function decorator.
For example:
@remote_function(cloud_function_ingress_settings="internal-and-gclb", ...)
def my_remote_function(parameter: int) -> str:
    return str(parameter)
Use custom endpoints
In BigQuery DataFrames versions earlier than 2.0, if a region didn't
support
regional service endpoints and
bigframes.pandas.options.bigquery.use_regional_endpoints = True, then
BigQuery DataFrames would fall back to
locational endpoints. Version 2.0 of
BigQuery DataFrames removes this fallback behavior. To connect to
locational endpoints in version 2.0, set the
bigframes.pandas.options.bigquery.client_endpoints_override option. For
example:
import bigframes.pandas as bpd

bpd.options.bigquery.client_endpoints_override = {
    "bqclient": "https://LOCATION-bigquery.googleapis.com",
    "bqconnectionclient": "LOCATION-bigqueryconnection.googleapis.com",
    "bqstoragereadclient": "LOCATION-bigquerystorage.googleapis.com",
}
Replace LOCATION with the name of the BigQuery location that you want to connect to.
Use the bigframes.ml.llm module
In BigQuery DataFrames version 2.0, the default model_name for
GeminiTextGenerator has been updated to "gemini-2.0-flash-001". To avoid
breakage if the default model changes in the future, supply a model_name
directly. For example:
import bigframes.ml.llm

model = bigframes.ml.llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")
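After you create the model, you can generate text from a DataFrame of prompts. The following is a minimal sketch, assuming a hypothetical single-column prompt DataFrame passed to the model's predict() method:

import bigframes.pandas as bpd
from bigframes.ml import llm

model = llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")

# Hypothetical prompt data; predict() reads the prompts from the
# input DataFrame and returns a DataFrame with the generated text.
prompts = bpd.DataFrame({"prompt": ["Summarize BigQuery DataFrames in one sentence."]})

result = model.predict(prompts)
print(result.head())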
What's next
- Learn how to visualize graphs using BigQuery DataFrames.
- Learn how to generate BigQuery DataFrames code with Gemini.
- Learn how to analyze package downloads from PyPI with BigQuery DataFrames.
- View BigQuery DataFrames source code, sample notebooks, and samples on GitHub.
- Explore the BigQuery DataFrames API reference.