Install BigQuery DataFrames
BigQuery DataFrames provides a Python DataFrame and machine learning (ML) API powered by the BigQuery engine. BigQuery DataFrames is an open-source package.
Install BigQuery DataFrames
To install the latest version of BigQuery DataFrames, run pip install --upgrade bigframes.
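To confirm that the install succeeded, you can check the installed version from Python. The following is a minimal sketch that uses the standard-library importlib.metadata module. The helper name installed_version is hypothetical and is not part of BigQuery DataFrames.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("bigframes")
if version is None:
    print("bigframes is not installed; run: pip install --upgrade bigframes")
else:
    print(f"bigframes {version} is installed")
```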
Available libraries
BigQuery DataFrames provides three libraries:
- bigframes.pandas provides a pandas API that you can use to analyze and manipulate data in BigQuery. Many workloads can be migrated from pandas to bigframes by just changing a few imports. The bigframes.pandas API is scalable to support processing terabytes of BigQuery data, and the API uses the BigQuery query engine to perform calculations.
- bigframes.bigquery provides many BigQuery SQL functions that might not have a pandas equivalent.
- bigframes.ml provides an API similar to the scikit-learn API for ML. The ML capabilities in BigQuery DataFrames let you preprocess data, and then train models on that data. You can also chain these actions together to create data pipelines.
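As an illustration of how two of these libraries fit together, the following sketch reads a table with bigframes.pandas and trains a linear regression with bigframes.ml. It is not a definitive workflow: the public penguins table, the chosen columns, and the function name train_body_mass_model are assumptions made for this example, and running it requires a configured project and credentials. bigframes.bigquery is not shown here.

```python
# Assumptions for illustration: a public sample table and three numeric columns.
TABLE_ID = "bigquery-public-data.ml_datasets.penguins"
FEATURE_COLUMNS = ["culmen_length_mm", "culmen_depth_mm", "flipper_length_mm"]
LABEL_COLUMN = "body_mass_g"


def train_body_mass_model(table_id=TABLE_ID):
    """Read a BigQuery table and fit a linear regression on it."""
    # Imports are deferred so this sketch can be loaded before bigframes
    # is installed; calling the function requires the package.
    import bigframes.pandas as bpd
    from bigframes.ml.linear_model import LinearRegression

    # pandas-style operations; the work is carried out by BigQuery.
    df = bpd.read_gbq(table_id)
    training_data = df[FEATURE_COLUMNS + [LABEL_COLUMN]].dropna()

    # scikit-learn-style fit on the features and the label.
    model = LinearRegression()
    model.fit(training_data[FEATURE_COLUMNS], training_data[[LABEL_COLUMN]])
    return model
```

Calling train_body_mass_model() returns the fitted model, which you can then use for scoring or prediction.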
Required roles
To get the permissions that you need to complete the tasks in this document, ask your administrator to grant you the following IAM roles on your project:
- BigQuery Job User (roles/bigquery.jobUser)
- BigQuery Read Session User (roles/bigquery.readSessionUser)
- Use BigQuery DataFrames in a BigQuery notebook:
  - BigQuery User (roles/bigquery.user)
  - Notebook Runtime User (roles/aiplatform.notebookRuntimeUser)
  - Code Creator (roles/dataform.codeCreator)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
When you're performing end user authentication in an interactive environment like a notebook, Python REPL, or the command line, BigQuery DataFrames prompts for authentication, if needed. Otherwise, see how to set up application default credentials for various environments.
Configure installation options
After you install BigQuery DataFrames, you can specify the following options.
Location and project
You need to specify the location and project in which you want to use BigQuery DataFrames.
You can define the location and project in your notebook in the following way:
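A minimal sketch of that configuration, assuming placeholder values for the project ID and location (replace both with your own). The try/except guard is only there so the snippet loads in an environment where bigframes is not yet installed.

```python
try:
    import bigframes.pandas as bpd
except ImportError:
    bpd = None  # bigframes is not installed in this environment

# Placeholder values (assumptions): replace with your own project and location.
PROJECT_ID = "your-project-id"
LOCATION = "US"

if bpd is not None:
    # Set these options before you run any query, because they apply to the
    # session that BigQuery DataFrames creates on first use.
    bpd.options.bigquery.project = PROJECT_ID
    bpd.options.bigquery.location = LOCATION
```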
Data processing location
BigQuery DataFrames is designed for scale, which it achieves by keeping data and processing on the BigQuery service. However, you can bring data into the memory of your client machine by calling .to_pandas() on a DataFrame or Series object. If you choose to do this, the memory limitation of your client machine applies.
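One way to stay within client memory is to reduce the data in BigQuery first and download only the result. The following sketch shows that pattern under stated assumptions: the public penguins table, its species and body_mass_g columns, and the function name mean_body_mass_by_species are chosen for illustration only.

```python
# Assumption: a public sample table, used here only for illustration.
TABLE_ID = "bigquery-public-data.ml_datasets.penguins"


def mean_body_mass_by_species(table_id=TABLE_ID):
    """Aggregate in BigQuery, then download only the small result."""
    # Deferred import so the sketch loads without bigframes installed.
    import bigframes.pandas as bpd

    df = bpd.read_gbq(table_id)  # the table stays in BigQuery
    summary = df.groupby("species")["body_mass_g"].mean()  # computed by BigQuery
    return summary.to_pandas()  # copies only the aggregate into local memory
```

The returned object is a regular pandas Series, so from that point on it is subject to the memory of the machine that runs the code.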
What's next
- Learn about manipulating data with BigQuery DataFrames.
- Learn how to generate BigQuery DataFrames code with Gemini.
- Learn how to analyze package downloads from PyPI with BigQuery DataFrames.
- View BigQuery DataFrames source code, sample notebooks, and samples on GitHub.
- Explore the BigQuery DataFrames API reference.