Manipulate data with BigQuery DataFrames
This document describes the data manipulation capabilities available with
BigQuery DataFrames: the pandas-like bigframes.pandas API and the BigQuery SQL
functions in the bigframes.bigquery library.
Required roles
To get the permissions that you need to complete the tasks in this document, ask your administrator to grant you the following IAM roles on your project:
- BigQuery Job User (roles/bigquery.jobUser)
- BigQuery Read Session User (roles/bigquery.readSessionUser)
- To use BigQuery DataFrames in a BigQuery notebook:
  - BigQuery User (roles/bigquery.user)
  - Notebook Runtime User (roles/aiplatform.notebookRuntimeUser)
  - Code Creator (roles/dataform.codeCreator)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
When you perform end user authentication in an interactive environment like a notebook, Python REPL, or the command line, BigQuery DataFrames prompts for authentication if needed. For other environments, see how to set up application default credentials.
pandas API
A notable feature of BigQuery DataFrames is that the
bigframes.pandas API is designed to be similar to the API of the pandas
library. This design lets you use familiar syntax for data manipulation tasks.
Operations defined through the BigQuery DataFrames API run server-side,
directly on data stored in BigQuery, so you don't need to transfer datasets
out of BigQuery.
To check which pandas APIs are supported by BigQuery DataFrames, see Supported pandas APIs.
Inspect and manipulate data
You can use the bigframes.pandas API to perform data inspection and
calculation operations. For example, you can use the bigframes.pandas library
to inspect the body_mass_g column, calculate the mean of body_mass_g, and
calculate the mean body_mass_g by species.
BigQuery library
The bigframes.bigquery library provides BigQuery SQL functions that might not have a pandas equivalent. The following sections present some examples.
Process array values
You can use the bigframes.bigquery.array_agg() function to aggregate values
into arrays after a groupby operation.
You can also use the array_length() and array_to_string() array functions.
Create a struct Series object
You can use the bigframes.bigquery.struct() function to create a new struct
Series object with a subfield for each column in a DataFrame.
Convert timestamps to Unix epochs
You can use the bigframes.bigquery.unix_micros() function to convert
timestamps into Unix microseconds.
You can also use the unix_seconds() and unix_millis() time functions.
Use the SQL scalar function
You can use the bigframes.bigquery.sql_scalar() function to apply arbitrary
SQL syntax that represents a single-column expression.
What's next
- Learn about custom Python functions for BigQuery DataFrames.
- Learn how to generate BigQuery DataFrames code with Gemini.
- Learn how to analyze package downloads from PyPI with BigQuery DataFrames.
- View BigQuery DataFrames source code, sample notebooks, and samples on GitHub.
- Explore the BigQuery DataFrames API reference.