Work with geospatial data in Bigtable

This page describes how to store and query geospatial data in Bigtable using the GoogleSQL geography functions. Geospatial data include geometrical representations of the Earth's surface in the form of points, lines, and polygons. You can group these entities into collections to track locations, routes, or other areas of interest.

Basic operations

The dedicated GoogleSQLgeography functions lets Bigtable optimize your database performance for geospatial calculations and query times. In the basic usage examples, we use a tourist_points_of_interest table with one column family called poi_data.

poi_data
row key name location
p1 Eiffel Tower POINT(2.2945 48.8584)

Write geography data

To write geospatial data to Bigtable columns, you need to provide string representations of geography entities in the GeoJSON or Well-Known Text (WKT) formats. Bigtable stores the underlying data as raw bytes, so you can provide any mix of entities in a single column (such as Points, LineStrings, Polygons, MultiPoints, MultiLineStrings, or MultiPolygons). The type of geospatial entity you provide is important at query time because you need to use the appropriate geography function to interpret the data.

For example, to add some attractions to the tourist_points_of_interest table, you can use the cbt CLI tool like so:

  • With WKT:
    cbt set tourist_points_of_interest p2 poi_data:name='Louvre Museum'
    cbt set tourist_points_of_interest p2 poi_data:location='POINT(2.3376 48.8611)'
    
    cbt set tourist_points_of_interest p3 poi_data:name='Place Charles de Gaulle'
    cbt set tourist_points_of_interest p3 poi_data:location='POLYGON(2.2941 48.8733, 2.2941 48.8742, 2.2957 48.8742, 2.2958 48.8732, 2.2941 48.8733)'
  • With GeoJSON:
    cbt set tourist_points_of_interest p2 poi_data:name='Louvre Museum'
    cbt set tourist_points_of_interest p2 poi_data:location='{"type": "Point", "coordinates": [2.3376, 48.8611]}'
    
    cbt set tourist_points_of_interest p3 poi_data:name='Place Charles de Gaulle'
    cbt set tourist_points_of_interest p3 poi_data:location='{"type": "Polygon", "coordinates": [[ [2.2941, 48.8733], [2.2941, 48.8742], [2.2957, 48.8742], [2.2958, 48.8732], [2.2941, 48.8733] ]] }'

Query geography data

You can query the tourist_points_of_interest table with GoogleSQL in Bigtable Studio and filter the results with geography functions. For example, to find all tourist attractions within 1500 meters of Trocadéro, you can use the ST_DWITHIN function:

SELECT _key, poi_data['name'], poi_data['location']
FROM points_of_interest
WHERE ST_DWITHIN(CAST(poi_data['location'] AS STRING), ST_GEOGPOINT(2.2874, 48.86322), 1500);

The result would be similar to the following:

Screenshot showing the tourist_points_of_interest table in Bigtable Studio
    with query results filtered with the ST_DWITHIN geography function.

The location column shows the original textual representation of the data provided in the write operation, but Bigtable stores it as bytes. In this example, even though Place Charles de Gaulle is defined as a polygon that stretches farther than the 1500 meter limit, it is included in the query results. That's because the ST_DWITHIN function returns true if any point contained in a geospatial entity is within the expected distance.

Advanced usage

The following sections describe how to use geography functions for complex scenarios like geofencing analysis. The Query optimization section explains how to use continuous materialized views for improved performance.

To illustrate advanced uses of geography functions in Bigtable, imagine a scenario where you are a researcher monitoring the behavior of emperor penguins who live on the southern side of the Snow Hill Island.

Your equipment provides hundreds of location pings for each penguin, so you create the penguin_movements table to store them:

penguin_details
row key penguin_id timestamp location
ping#123 pen_01 2025-12-06 08:15:22+00 POINT(-57.51 -64.42)
ping#124 pen_01 2025-12-06 10:22:05+00 POINT(-57.55 -64.43)
ping#125 pen_01 2025-12-07 12:35:45+00 POINT(-57.58 -64.41)
ping#126 pen_02 2025-12-12 06:05:11+00 POINT(-57.49 -64.39)

Example: Find who crossed the geofence

During the research, you observe penguin behavior around a specific feeding ground. Areas defined with virtual boundaries are commonly known as geofences. The feeding ground can be considered a geofenced area. You want to find out if any penguins visited the feeding ground on December 3, 2025. To answer the question, you use the ST_CONTAINS function, like so:

SELECT
  penguin_details['penguin_id'],
  penguin_details['location'],
  penguin_details['timestamp']
FROM
  penguin_movements
WHERE
  penguin_details['timestamp'] >= '2025-12-03 00:00:00 UTC'
  AND penguin_details['timestamp'] < '2025-12-04 00:00:00 UTC'
  -- The feeding ground boundary is defined with a WKT POLYGON entity
  -- Polygon's last point must be equal to its first point to close the loop
  AND ST_CONTAINS(
        ST_GEOGFROMTEXT('POLYGON((-57.21 -64.51, -57.23 -64.55, -57.08 -64.56, -57.06 -64.51, -57.21 -64.51))'),
        ST_GEOGFROMTEXT(CAST(penguin_details['location'] AS STRING))
      )

You get results similar to the following:

+------------+----------------------+------------------------+
| penguin_id |       location       |       timestamp        |
+------------+----------------------+------------------------+
| pen_01     | POINT(-57.15 -64.53) | 2025-12-03 08:15:22+00 |
| pen_02     | POINT(-57.18 -64.54) | 2025-12-03 14:30:05+00 |
| pen_10     | POINT(-57.10 -64.52) | 2025-12-03 11:45:10+00 |
+------------+----------------------+------------------------+

Example: Transform points to routes

To better visualize how the penguins travel, you decide to organize individual location pings into LineStrings by using the ST_MAKELINE function like so:

SELECT
  penguin_id,
  DATE(timestamp, 'UTC') AS route_date,
  ST_MAKELINE(ARRAY_AGG(location_cast_to_geog)) AS route
FROM (
  -- Sub-expression to sort all location pings by timestamp
  -- This way you make sure individual points can form a realistic route
  SELECT
    penguin_details['penguin_id'] AS penguin_id,
    -- Extract and cast timestamp once
    CAST(CAST(penguin_details['timestamp'] AS STRING) AS TIMESTAMP) AS timestamp,
    -- Extract and cast location once
    ST_GEOGFROMTEXT(CAST(penguin_details['location'] AS STRING)) AS location_cast_to_geog
  FROM
    penguin_movements
  ORDER BY
    -- Pre-sorts location pings for the ARRAY_AGG function
    penguin_id, timestamp ASC
)
GROUP BY
  penguin_id,
  route_date

With this query, you can get more advanced insights for your research. Expand the following sections to see example queries and results.

What is the daily travel distance for each penguin?

To calculate how much distance each penguin covers every day, you can use the ST_LENGTH function.

SELECT
  penguin_id,
  route_date,
  ST_LENGTH(route) AS daily_distance_meters
FROM (
  -- Subquery to aggregate points into daily routes
  SELECT
    penguin_id,
    DATE(timestamp, 'UTC') AS route_date,
    ST_MAKELINE(ARRAY_AGG(location_cast_to_geog)) AS route
  FROM (
    -- Sub-expression to sort all location pings by timestamp
    -- This way you make sure individual points can form a realistic route
    SELECT
      penguin_details['penguin_id'] AS penguin_id,
      -- Extract and cast timestamp once
      CAST(CAST(penguin_details['timestamp'] AS STRING) AS TIMESTAMP) AS timestamp,
      -- Extract and cast location once
      ST_GEOGFROMTEXT(CAST(penguin_details['location'] AS STRING)) AS location_cast_to_geog
    FROM
      penguin_movements
    ORDER BY
      -- Pre-sorts location pings for the ARRAY_AGG function
      penguin_id, timestamp ASC
  )
  GROUP BY
    penguin_id,
    route_date
) AS DailyRoutes;

This query returns a result similar to the following:

+------------+------------+-----------------------+
| penguin_id | route_date | daily_distance_meters |
+------------+------------+-----------------------+
| pen_01     | 2025-12-02 | 15420.7               |
| pen_03     | 2025-12-03 | 22105.1               |
| pen_32     | 2025-12-03 | 9850.3                |
+------------+------------+-----------------------+

How much of the daily route occurs within the geofenced area?

To answer this question, you filter the results with the ST_INTERSECTION function to check which parts of the penguin's route intersect with the geofenced area. Then, the ST_LENGTH function can help calculate the ratio of movement within the feeding grounds to the full distance traveled that day.

SELECT
  penguin_id,
  route_date,
  ST_LENGTH(ST_INTERSECTION(route, ST_GEOGFROMTEXT('POLYGON((-57.21 -64.51, -57.23 -64.55, -57.08 -64.56, -57.06 -64.51, -57.21 -64.51))'))) / ST_LENGTH(route) AS proportion_in_ground
FROM (
  -- Subquery to aggregate points into daily routes
  SELECT
    penguin_id,
    DATE(timestamp, 'UTC') AS route_date,
    ST_MAKELINE(ARRAY_AGG(location_cast_to_geog)) AS route
  FROM (
    -- Sub-expression to sort all location pings by timestamp
    -- This way you make sure individual points can form a realistic route
    SELECT
      penguin_details['penguin_id'] AS penguin_id,
      -- Extract and cast timestamp once
      CAST(CAST(penguin_details['timestamp'] AS STRING) AS TIMESTAMP) AS timestamp,
      -- Extract and cast location once
      ST_GEOGFROMTEXT(CAST(penguin_details['location'] AS STRING)) AS location_cast_to_geog
    FROM
      penguin_movements
    ORDER BY
      -- Pre-sorts location pings for the ARRAY_AGG function
      penguin_id, timestamp ASC
  )
  GROUP BY
    penguin_id,
    route_date
) AS DailyRoutes
WHERE
  ST_INTERSECTS(route, ST_GEOGFROMTEXT('POLYGON((-57.21 -64.51, -57.23 -64.55, -57.08 -64.56, -57.06 -64.51, -57.21 -64.51))'))

This query returns a result similar to the following:

+------------+------------+----------------------+
| penguin_id | route_date | proportion_in_ground |
+------------+------------+----------------------+
| pen_01     | 2025-12-03 | 0.652                |
| pen_03     | 2025-12-03 | 0.918                |
| pen_32     | 2025-12-04 | 0.231                |
+------------+------------+----------------------+

Optimize queries with continuous materialized views

Geospatial queries on large datasets can be slow because they might require scanning many rows. Bigtable doesn't support dedicated geospatial indexes, but you can improve your query performance with continuous materialized views.

To create a unique row key for such views, you can transform the GEOGRAPHY type into an S2 cell. S2 is a library that lets you perform complex trigonometry for sphere geometry. Calculating distances based on geographical coordinates can be computationally expensive. S2 cells can represent a specific area on the Earth's surface as a 64-bit integer, making them ideal for geospatial indexing with continuous materialized views.

To define an S2 cell, you call the S2_CELLIDFROMPOINT(location, level) function and provide the granularity level argument. This argument is a number between 0 and 30 and defines the size of each cell: the higher the number, the smaller the cell.

In the penguin research example, you can create a continuous materialized view indexed on each penguin's location and timestamp:

-- Query used to create the continuous materialized view
SELECT
  -- Create S2 cells for each penguin's location.
  -- Note that the `level` value is the same for each location so that
  -- every cell is the same size. This ensures consistency for your data.
  S2_CELLIDFROMPOINT(ST_GEOGFROMTEXT(CAST(penguin_details['location'] AS STRING)), level => 16) AS s2_cell_id,
  penguin_details['timestamp'] AS observation_time,
  penguin_details['penguin_id'] AS penguin_id,
  penguin_details['location'] AS location
FROM
  penguin_movements

Querying this view to answer the Who crossed the geofence? question becomes much more efficient with a two-phase approach. You first use the cell_id index in the continuous materialized view to quickly find candidate rows within the general vicinity, then apply the accurate ST_CONTAINS function on the reduced dataset:

SELECT
  idx.penguin_id,
  idx.location,
  idx.observation_time
FROM
  penguin_s2_time_index AS idx
WHERE
  -- Part one: use an approximate spatial filter and timestamp filter
  -- for fast scans on continuous materialized view keys.
  -- Use the S2_COVERINGCELLIDS function to create an array of S2 cells
  -- that cover the feeding ground polygon. The `level` argument must be the
  -- same value as the one you used to create the continuous materialized view.
  idx.s2_cell_id IN UNNEST(S2_COVERINGCELLIDS(
    ST_GEOGFROMTEXT('POLYGON((-57.21 -64.51, -57.23 -64.55, -57.08 -64.56, -57.06 -64.51, -57.21 -64.51))'),
    min_level => 16,
    max_level => 16,
    max_cells => 500
  ))
  AND idx.observation_time >= '2025-12-03 00:00:00 UTC'
  AND idx.observation_time < '2025-12-04 00:00:00 UTC'
  -- Part two: use ST_CONTAINS() on the returned set to ensure precision.
  -- S2 cells are squares, so they don't equal arbitrary geofence polygons.
  -- You still need to check if a specific point is contained within the area,
  -- but the filter applies to a smaller data set and is much faster.
  AND ST_CONTAINS(ST_GEOGFROMTEXT('POLYGON((-57.21 -64.51, -57.23 -64.55, -57.08 -64.56, -57.06 -64.51, -57.21 -64.51))'), idx.location);

Limitations

Bigtable doesn't support the following features when working with geography data:

  • Three-dimensional geometries. This includes the "Z" suffix in the WKT format, and the altitude coordinate in the GeoJSON format.
  • Linear reference systems. This includes the "M" suffix in WKT format.
  • WKT geometry objects other than geometry primitives or multipart geometries. In particular, Bigtable supports only Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, and GeometryCollection.
  • ST_CLUSTERDBSCAN function isn't supported.

For constraints specific to GeoJson and WKT input formats, see ST_GEOGFROMGEOJSON and ST_GEOGFROMTEXT.

What's next?