GoogleSQL for BigQuery supports the following syntax to use graphs within SQL queries.
Language list
| Name | Summary |
|---|---|
GRAPH_EXPAND
|
A TVF that returns a flattened version of the input graph. |
GRAPH_TABLE operator
|
Performs an operation on a graph in the FROM clause of a SQL
query and then produces a table with the results.
|
GRAPH_EXPAND
GRAPH_EXPAND(graph_name)
Description
Provides a flattened table representation of the data in the input graph.
The GRAPH_EXPAND TVF also supports querying graph properties defined by a
measure. For more information and examples of how to use
this function, see Work with measures.
Definitions
graph_name: ASTRINGvalue that contains the name of the graph to expand.
Details
The GRAPH_EXPAND function produces a flattened table by applying a series
of LEFT JOIN operations to node and edge tables. Because of this join behavior,
the function only accepts certain types of property graphs.
Input limitations
The schema of a property graph naturally forms a directed graph itself. Define the schema relationship graph for a property graph as follows:
- Each node table defines a node.
A directed edge from node table
Ato node tableBexists if the following conditions are met:- An edge table exists whose source key references one of the node tables and whose destination key references the other node table. The direction of the edge in the schema relationship graph might not match the direction of the edge defined on the property graph.
- The edge table defines a many-to-one relationship from table
Ato tableB. In other words, every row in tableAcorresponds to at most one row in tableB.
If the direction of the edge is ambiguous or there is a many-to-many relationship between the tables, omit the edge.
A property graph is valid input to the GRAPH_EXPAND function if it meets the
following requirements:
- The property graph has at least one node table.
- Every node table has at least one key column defined.
- At least one property is defined on a node or edge table.
- If the graph contains measure definitions, then a property name can't match a key column name unless the property's expression is identical to the name itself.
- Every node or edge table uses the default label.
The following conditions hold for the schema relationship graph of the property graph:
- There is a single root node, defined as a node table with in-degree zero.
- The graph is acyclic.
- Every node is reachable from the root node.
- The in-degree of every node is at most one.
Output
The GRAPH_EXPAND TVF returns a table that generally contains one column for
each property in the graph. However, if an edge table's properties are identical
to the properties of one of its adjacent node tables, then the redundant columns
are omitted.
The output column names are constructed by concatenating the
label name and property name with an underscore (_). For example, if
a node table has the label Person and a property called age, then the
column name corresponding to that property is Person_age.
To select columns that correspond to properties defined by a measure, you
must wrap them in the AGG function.
You can't directly select columns that correspond to properties defined by a
measure.
Examples
The following example creates a graph called StoreGraph based on the
Stores and Locations tables.
CREATE OR REPLACE TABLE mydataset.Stores (
name STRING PRIMARY KEY NOT ENFORCED,
location_id INT64 REFERENCES mydataset.Locations(id) NOT ENFORCED
) AS (
SELECT 'Store 1' AS name, 101 AS location_id
UNION ALL
SELECT 'Store 2' AS name, 101 AS location_id
);
CREATE OR REPLACE TABLE mydataset.Locations (
id INT64 PRIMARY KEY NOT ENFORCED,
name STRING,
population INT64
) AS (
SELECT 101 AS id, 'Anytown' AS name, 1000 AS population
UNION ALL
SELECT 102 AS id, 'Sometown' AS name, 500 AS population
);
CREATE OR REPLACE PROPERTY GRAPH mydataset.StoreGraph
NODE TABLES (
mydataset.Stores AS S,
mydataset.Locations AS L
PROPERTIES(id, name, population, MEASURE(SUM(population)) AS total_population)
)
EDGE TABLES (
mydataset.Stores AS SL
SOURCE KEY (location_id) REFERENCES L (id)
DESTINATION KEY (name) REFERENCES S (name)
);
The property graph consists of four nodes that represent Store 1, Store 2,
Anytown, and Sometown. The property graph contains two edges: one from
Anytown to Store 1 and another from Anytown to Store 2.
The schema relationship graph has a single edge
from the Stores node table to the Locations node table, because there is a
many-to-one relationship between them: many stores can belong to a single
location. This relationship is also reflected by the fact that location_id
is a foreign key in the Stores table.
The following query calls the GRAPH_EXPAND function and omits the
L_total_population column from the output because you can't directly select
a column for a property defined by a measure without using the AGG function:
SELECT * EXCEPT(L_total_population)
FROM GRAPH_EXPAND('mydataset.StoreGraph');
/*---------------+---------+------+---------+--------------+
| S_location_id | S_name | L_id | L_name | L_population |
+---------------+---------+------+---------+--------------+
| 101 | Store 2 | 101 | Anytown | 1000 |
| 101 | Store 1 | 101 | Anytown | 1000 |
+---------------+---------+------+---------+--------------*/
The Sometown location doesn't appear in the output because it's
not referenced by the location_id foreign key in the
Stores table, so it's dropped from the LEFT JOIN that produces the output.
The columns SL_location_id and SL_name don't appear because the properties
of the edge table SL are identical to the properties of its node table S.
The following query shows the difference between aggregating a measure and
a regular value. When you apply the AGG function to the L_total_population
measure, population is counted exactly once per distinct location_id value.
If you call the SUM function on L_population, then the L_population
column contributes the population for every row in the table with a given
location ID.
SELECT
S_location_id,
AGG(L_total_population) AS true_total_population,
SUM(L_population) AS overcounted_population
FROM GRAPH_EXPAND('mydataset.StoreGraph')
GROUP BY S_location_id;
/*---------------+-----------------------+------------------------+
| S_location_id | true_total_population | overcounted_population |
+---------------+-----------------------+------------------------+
| 101 | 1000 | 2000 |
+---------------+-----------------------+------------------------*/
GRAPH_TABLE operator
FROM GRAPH_TABLE ( property_graph_name multi_linear_query_statement ) [ [ AS ] alias ]
Description
Performs an operation on a graph in the FROM clause of a SQL query and then
produces a table with the results.
With the GRAPH_TABLE operator, you can use the GQL syntax
to query a property graph. The result of this operation is produced as a table that
you can use in the rest of the query.
Definitions
property_graph_name: The name of the property graph to query for patterns.multi_linear_query_statement: You can use GQL to query a property graph for patterns. For more information, see Graph query language.alias: An optional alias, which you can use to refer to the table produced by theGRAPH_TABLEoperator elsewhere in the query.
Examples
You can use the RETURN statement to return specific node and edge properties.
For example:
SELECT name, id
FROM GRAPH_TABLE(
graph_db.FinGraph
MATCH (n:Person)
RETURN n.name AS name, n.id AS id
);
/*-----------+
| name | id |
+-----------+
| Alex | 1 |
| Dana | 2 |
| Lee | 3 |
+-----------*/
The following query produces an error because id isn't
included in the RETURN statement, even though this property exists for
element n:
SELECT name, id
FROM GRAPH_TABLE(
graph_db.FinGraph
MATCH (n:Person)
RETURN n.name
);
The following query produces an error because directly outputting the graph
element n is not supported. Convert n to its JSON representation using the
TO_JSON
function for successful output.
-- Error
SELECT n
FROM GRAPH_TABLE(
graph_db.FinGraph
MATCH (n:Person)
RETURN n
);
SELECT TO_JSON(n) as json_node
FROM GRAPH_TABLE(
graph_db.FinGraph
MATCH (n:Person)
RETURN n
);
/*---------------------------+
| json_node |
+---------------------------+
| {"identifier":"mUZpbk...} |
| {"identifier":"mUZpbk...} |
| {"identifier":"mUZpbk...} |
+--------------------------*/