Spanner Graph, in close collaboration with Google Research Graph Mining, offers a suite of algorithms covering graph analysis needs for the following use cases:
Centrality
Centrality algorithms rank nodes by their structural importance within a graph. It can help you identify fraudulent accounts in fintech, influencers in social graphs, critical routers in telecom networks and more.
PageRank
PageRank scores nodes by importance. A node is considered more important if many
other important nodes link to it. The algorithm simulates a random walk through
the graph. Nodes that are visited more often in this walk are considered more
important. See
page rank
for algorithm details.
Function signature
PageRank(input_parameters) YIELD node, score
Input parameters
All common input parameters and:
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| source_nodes | Array of graph nodes | No | (none) | If present, source for personalized PageRank. |
| damping_factor | double | No | 0.85 | The probability that, at any given iteration, the algorithm chooses to traverse one of the outgoing edges of the current node. Must be in the range of [0, 1). |
| max_iterations | int | No | 10 | The maximum number of iterations of the algorithm. Must be positive. A higher number of iterations tends to produce more precise results at the cost of a longer runtime. |
| approx_precision | double | No | 1e-2 | Approximation precision threshold for the algorithm. Must be non-negative. Smaller values tends to produce more precise results at the cost of longer runtime. |
Output
Function yields:
| Name | Type | Description |
|---|---|---|
| node | GRAPH_ELEMENT (node) | A node in the graph. |
| score | FLOAT64 | The PageRank score of the node. |
Example
EXPORT DATA OPTIONS (
uri = "gs://my-bucket-name/my-output.csv",
format = "csv"
) AS
GRAPH FinGraph
CALL PageRank(
node_labels => ['Account'], edge_labels => ['Transfers'],
source_nodes => ARRAY {
MATCH (n:Account {id:7})
RETURN n
}
) YIELD node, score
RETURN node.id, score;
BetweennessCentrality
BetweennessCentrality measures how often a node lies on shortest paths between
other pairs of nodes. Nodes with high betweenness centrality often act as
critical bridges connecting different parts of the graph. See
betweenness centrality
for algorithm details.
Function signature
BetweennessCentrality(input_parameters) YIELD node, centrality
Input parameters
All common input parameters and:
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| num_source_nodes | INT64 | No | (none) | If set, must be positive, and specifies how many source nodes the algorithm should select for computing approximate betweenness centralities. If not set, or set to a number larger than the number of nodes in the graph, then all nodes are used as source nodes, that is, the algorithm computes exact betweenness centralities. Higher values lead to better approximations, but also to larger running times. |
Output
Function yields:
| Name | Type | Description |
|---|---|---|
| node | GRAPH_ELEMENT (node) | A node in the graph. |
| centrality | FLOAT64 | The centrality score of the node. |
Example
EXPORT DATA OPTIONS (
uri = "gs://my-bucket-name/my-output.csv",
format = "csv"
) AS
GRAPH FinGraph
CALL BetweennessCentrality(
node_labels => ['Account'], edge_labels => ['Transfers'], num_source_nodes => 5
) YIELD node, centrality
RETURN node.id, centrality;
ClosenessCentrality
ClosenessCentrality measures how close a node is to all other nodes in the
graph. Nodes with high closeness centrality can reach other nodes through
shorter paths. See closeness centrality
for algorithm details.
Function signature
ClosenessCentrality(input_parameters) YIELD node, centrality
Input parameters
All common input parameters and:
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| mode | STRING | No | EXACT | The algorithm mode. Supported values are EXACT and HYBRID. |
| use_wasserman_faust | BOOL | No | FALSE | If `false`, compute standard closeness centralities. If `true`, compute Wasserman-Faust closeness centralities. |
| epsilon | FLOAT64 | No | 0.1 | Used only for the `HYBRID` algorithm. Must be in the range (0.0, 1.0). Smaller values generally means higher precision. Larger values generally allows the algorithm to execute faster. |
| sample_size | INT64 | No | 0 | Used only for the `HYBRID` algorithm. The number of pivot nodes to use
for approximately computing the centrality values. Must be non-negative.
If not set or zero, a default value of min(100 * ln(N),
N) is used, where N is the node count of the
input graph.
|
Output
Function yields:
| Name | Type | Description |
|---|---|---|
| node | GRAPH_ELEMENT (node) | A node in the graph. |
| centrality | FLOAT64 | The closeness centrality score of the node. |
Example
EXPORT DATA OPTIONS (
uri = "gs://my-bucket-name/my-output.csv",
format = "csv"
) AS
GRAPH FinGraph
CALL ClosenessCentrality(
node_labels => ['Account'], edge_labels => ['Transfers'],
mode => 'EXACT'
) YIELD node, centrality
RETURN node.id, centrality;
Clustering
Clustering algorithms group nodes into clusters (also known as communities) such that nodes within each cluster are more densely connected to each other than to nodes in other clusters. This is useful to detect potential fraud rings in fintech, identify customer segmentations in retail and more.
WeaklyConnectedComponents
WeaklyConnectedComponents finds disjoint sets of nodes such that every node in a
set is reachable from any other node in the same set, but not from nodes in
other sets. See connected components
for algorithm details.
Function signature
WeaklyConnectedComponents(input_parameters) YIELD node, cluster
Input parameters
Output
Function yields:
| Name | Type | Description |
|---|---|---|
| node | GRAPH_ELEMENT (node) | A node in the graph. |
| cluster | INT64 | The ID of the connected component/cluster the node belongs to.
In the range of [0, N), where N is the number of nodes in the graph. |
Example
EXPORT DATA OPTIONS (
uri = "gs://my-bucket-name/wcc-output.csv",
format = "csv"
) AS
GRAPH FinGraph
CALL WeaklyConnectedComponents(
node_labels => ['Account'], edge_labels => ['Transfers']
) YIELD node, cluster
RETURN node.id, cluster;
ModularityClustering
ModularityClustering partitions the graph into clusters by optimizing for
modularity, a quality measure of the density of connections within clusters
compared to what would be expected in a random network. See modularity clustering for algorithm details.
Function signature
ModularityClustering(input_parameters) YIELD node, cluster
Input parameters
All common input parameters and:
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| resolution | double | No | 1.0 | Controls the granularity of the clustering. Must be finite and non-negative. Smaller resolution values tend to lead to larger clusters. Typical values are in the range [0.5, 5]. When the resolution is zero, the algorithm (with a sufficient number of iterations) finds the connected components of the graph (ignoring edges whose weight is zero). |
| max_iterations | int | No | 10 | Maximum number of outer iterations of the algorithm. Must be positive. Larger values tend to lead to higher-quality clusterings at the cost of longer runtime. |
| max_inner_iterations | int | No | 10 | Maximum number of inner iterations of the algorithm. Must be positive. Larger values tend to lead to higher-quality clusterings at the cost of longer runtime. |
Output
Function yields:
| Name | Type | Description |
|---|---|---|
| node | GRAPH_ELEMENT (node) | A node in the graph. |
| cluster | INT64 | The ID of the community/cluster the node belongs to.
In the range of [0, N), where N is the number of nodes in the graph. |
Example
EXPORT DATA OPTIONS (
uri = "gs://my-bucket-name/modularity-output.csv",
format = "csv"
) AS
GRAPH FinGraph
CALL ModularityClustering(
node_labels => ['Account'], edge_labels => ['Transfers'],
resolution => 1.0, max_iterations => 10
) YIELD node, cluster
RETURN node.id, cluster;
CorrelationClustering
CorrelationClustering partitions nodes based on their pairwise similarity or
dissimilarity, aiming to group similar nodes together while separating
dissimilar ones. See
correlation clustering
for algorithm details.
Function signature
CorrelationClustering(input_parameters) YIELD node, cluster
Input parameters
All common input parameters and:
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| resolution | double | Yes | (none) | Controls the granularity of the clustering. Must be explicitly defined by client. Must be finite and non-negative. Smaller values tend to lead to larger clusters. When the resolution is zero and all edge weights are positive, the algorithm (with a sufficient number of iterations) finds the connected components of the graph. |
| max_iterations | int | No | 10 | Maximum number of outer iterations of the algorithm. Must be positive. Larger values tend to lead to higher-quality clusterings at the cost of longer runtime. |
| max_inner_iterations | int | No | 10 | Maximum number of inner iterations of the algorithm. Must be positive. Larger values tend to lead to higher-quality clusterings at the cost of longer runtime. |
Output
Function yields:
| Name | Type | Description |
|---|---|---|
| node | GRAPH_ELEMENT (node) | A node in the graph. |
| cluster | INT64 | The ID of the community/cluster the node belongs to.
In the range of [0, N), where N is the number of nodes in the graph. |
Example
EXPORT DATA OPTIONS (
uri = "gs://my-bucket-name/correlation-output.csv",
format = "csv"
) AS
GRAPH FinGraph
CALL CorrelationClustering(
node_labels => ['Account'], edge_labels => ['Transfers'],
resolution => 0.5
) YIELD node, cluster
RETURN node.id, cluster;
LabelPropagation
LabelPropagation assigns nodes to clusters using a label propagation technique,
starting from an initial labeling which may use seed labels provided in the
input. As nodes iteratively adopt the label shared by the majority of their
neighbors, densely connected groups converge to a consensus label, forming
communities. See
label propagation
for algorithm details.
Function signature
LabelPropagation(input_parameters) YIELD node, cluster
Input parameters
All common input parameters and:
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| seed_label_property | string | No | (none) | The node property name for the seed label. |
| max_iterations | int | No | 10 | The maximum number of iterations of label propagation that the algorithm performs. Must be finite and positive. |
Output
Function yields:
| Name | Type | Description |
|---|---|---|
| node | GRAPH_ELEMENT (node) | A node in the graph. |
| cluster | INT64 | The propagated cluster/label ID of the node. |
Example
EXPORT DATA OPTIONS (
uri = "gs://my-bucket-name/lp-output.csv",
format = "csv"
) AS
GRAPH FinGraph
CALL LabelPropagation(
node_labels => ['Account'], edge_labels => ['Transfers'],
max_iterations => 10
) YIELD node, cluster
RETURN node.id, cluster;
CliqueFinding
CliqueFinding identifies potentially overlapping dense communities that meet a
minimum density threshold, in such a way that every clique (a group of nodes
where each node is connected to every other node) is contained in at least one
cluster. See
clique aggregator
for algorithm details.
Function signature
CliqueFinding(input_parameters) YIELD node, clique
Note: Unlike most algorithms, there can be multiple rows for the same node since a node may be part of many cliques.
Input parameters
All common input parameters and:
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| min_density | double | No | 0.9 | The minimum density of the clusters to be returned. Must be in [0, 1]. |
Output
Function yields:
| Name | Type | Description |
|---|---|---|
| node | GRAPH_ELEMENT (node) | A node in the graph. |
| clique | INT64 | The ID of the clique the node belongs to. |
Example
EXPORT DATA OPTIONS (
uri = "gs://my-bucket-name/clique-output.csv",
format = "csv"
) AS
GRAPH FinGraph
CALL CliqueFinding(
node_labels => ['Account'], edge_labels => ['Transfers'],
min_density => 0.9
) YIELD node, clique
RETURN node.id, clique;
Similarity
Similarity algorithms quantify how alike pairs of nodes are based on the structure of their neighborhoods. This is useful for inferring missing edges between nodes for entity resolution, recommendation and more.
Spanner Graph supports the following pairwise node similarities where similarity scores are computed based on neighborhoods of the nodes.
JaccardSimilarity: Based on ratio of common neighbors to total neighborsCosineSimilarity: Based on edge weights of common neighborsCommonNeighborsSimilarity: Based on number of shared neighborsTotalNeighborsSimilarity: Based on number of neighbors of at least one of the two nodes
See pairwise node similarity for algorithm details.
These four algorithms share the same arguments and output structure.
Function signature
[JaccardSimilarity|CosineSimilarity|CommonNeighborsSimilarity|TotalNeighborsSimilarity](input_parameters)
YIELD source_node, target_node, similarity
Input parameters
All common input parameters and:
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| source_nodes | Array of nodes | Yes | N/A | |
| target_nodes | Array of nodes | Yes | N/A |
Output
Function yields:
| Name | Type | Description |
|---|---|---|
| source_node | GRAPH_ELEMENT (node) | The source node used in the similarity calculation. |
| target_node | GRAPH_ELEMENT (node) | The target node used in the similarity calculation. |
| similarity | FLOAT64 | The calculated similarity score between source and target node. |
Example
EXPORT DATA OPTIONS (
uri = "gs://my-bucket-name/jaccard-output.csv",
format = "csv"
) AS
GRAPH FinGraph
CALL JaccardSimilarity(
source_nodes => ARRAY {
MATCH (n:Account {id: 7})
RETURN n
},
target_nodes => ARRAY {
MATCH (n:Account)
WHERE n.id != 7
RETURN n
}
) YIELD source_node, target_node, similarity
RETURN source_node.id AS source_id, target_node.id AS target_id, similarity;
Path Finding
Path finding algorithms compute optimal routes between nodes. This is useful for identifying cheapest route in supply chains, assessing vulnerability in cybersecurity and more.
ShortestPath
ShortestPath computes shortest paths between a given set of source nodes and a
given set of target nodes. See
many-to-many shortest paths
for algorithm details.
Function signature
ShortestPath(input_parameters) YIELD source_node, target_node, path, cost
Input parameters
All common input parameters and:
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| source_nodes | Array of nodes | Yes | N/A | |
| target_nodes | Array of nodes | Yes | N/A |
Output
Function yields:
| Name | Type | Description |
|---|---|---|
| source_node | GRAPH_ELEMENT (node) | The source node of the path. |
| target_node | GRAPH_ELEMENT (node) | The target node of the path. |
| path | GRAPH_PATH | The shortest path found. |
| cost | FLOAT64 | The cost of the shortest path. |
Example
EXPORT DATA OPTIONS (
uri = "gs://my-bucket-name/shortest-path-output.csv",
format = "csv"
) AS
GRAPH FinGraph
CALL ShortestPath(
source_nodes => ARRAY {
MATCH (n:Account {id: 7})
RETURN n
},
target_nodes => ARRAY {
MATCH (n:Account {id: 16})
RETURN n
}
) YIELD source_node, target_node, path, cost
RETURN source_node.id AS source_id, target_node.id AS target_id, PATH_LENGTH(path) AS length, cost;
What's next
- Spanner Graph run algorithms.
- Spanner Graph algorithm schema requirements and feature compatibility.
- Spanner Graph algorithm best practices.