Spanner Graph algorithms catalog

Spanner Graph, in close collaboration with Google Research Graph Mining, offers a suite of algorithms covering graph analysis needs for the following use cases:

Centrality

Centrality algorithms rank nodes by their structural importance within a graph. It can help you identify fraudulent accounts in fintech, influencers in social graphs, critical routers in telecom networks and more.

PageRank

PageRank scores nodes by importance. A node is considered more important if many other important nodes link to it. The algorithm simulates a random walk through the graph. Nodes that are visited more often in this walk are considered more important. See page rank for algorithm details.

Function signature

PageRank(input_parameters) YIELD node, score

Input parameters

All common input parameters and:

Name Type Required Default Description
source_nodes Array of graph nodes No (none) If present, source for personalized PageRank.
damping_factor double No 0.85 The probability that, at any given iteration, the algorithm chooses to traverse one of the outgoing edges of the current node. Must be in the range of [0, 1).
max_iterations int No 10 The maximum number of iterations of the algorithm. Must be positive. A higher number of iterations tends to produce more precise results at the cost of a longer runtime.
approx_precision double No 1e-2 Approximation precision threshold for the algorithm. Must be non-negative. Smaller values tends to produce more precise results at the cost of longer runtime.

Output

Function yields:

Name Type Description
node GRAPH_ELEMENT (node) A node in the graph.
score FLOAT64 The PageRank score of the node.

Example

EXPORT DATA OPTIONS (
  uri = "gs://my-bucket-name/my-output.csv",
  format = "csv"
) AS
GRAPH FinGraph
CALL PageRank(
    node_labels => ['Account'], edge_labels => ['Transfers'],
    source_nodes => ARRAY {
                      MATCH (n:Account {id:7})
                      RETURN n
                    }
  ) YIELD node, score
RETURN node.id, score;

BetweennessCentrality

BetweennessCentrality measures how often a node lies on shortest paths between other pairs of nodes. Nodes with high betweenness centrality often act as critical bridges connecting different parts of the graph. See betweenness centrality for algorithm details.

Function signature

BetweennessCentrality(input_parameters) YIELD node, centrality

Input parameters

All common input parameters and:

Name Type Required Default Description
num_source_nodes INT64 No (none) If set, must be positive, and specifies how many source nodes the algorithm should select for computing approximate betweenness centralities. If not set, or set to a number larger than the number of nodes in the graph, then all nodes are used as source nodes, that is, the algorithm computes exact betweenness centralities. Higher values lead to better approximations, but also to larger running times.

Output

Function yields:

Name Type Description
node GRAPH_ELEMENT (node) A node in the graph.
centrality FLOAT64 The centrality score of the node.

Example

EXPORT DATA OPTIONS (
  uri = "gs://my-bucket-name/my-output.csv",
  format = "csv"
) AS
GRAPH FinGraph
CALL BetweennessCentrality(
    node_labels => ['Account'], edge_labels => ['Transfers'], num_source_nodes => 5
  ) YIELD node, centrality
RETURN node.id, centrality;

ClosenessCentrality

ClosenessCentrality measures how close a node is to all other nodes in the graph. Nodes with high closeness centrality can reach other nodes through shorter paths. See closeness centrality for algorithm details.

Function signature

ClosenessCentrality(input_parameters) YIELD node, centrality

Input parameters

All common input parameters and:

Name Type Required Default Description
mode STRING No EXACT The algorithm mode. Supported values are EXACT and HYBRID.
use_wasserman_faust BOOL No FALSE If `false`, compute standard closeness centralities. If `true`, compute Wasserman-Faust closeness centralities.
epsilon FLOAT64 No 0.1 Used only for the `HYBRID` algorithm. Must be in the range (0.0, 1.0). Smaller values generally means higher precision. Larger values generally allows the algorithm to execute faster.
sample_size INT64 No 0 Used only for the `HYBRID` algorithm. The number of pivot nodes to use for approximately computing the centrality values. Must be non-negative. If not set or zero, a default value of min(100 * ln(N), N) is used, where N is the node count of the input graph.

Output

Function yields:

Name Type Description
node GRAPH_ELEMENT (node) A node in the graph.
centrality FLOAT64 The closeness centrality score of the node.

Example

EXPORT DATA OPTIONS (
  uri = "gs://my-bucket-name/my-output.csv",
  format = "csv"
) AS
GRAPH FinGraph
CALL ClosenessCentrality(
    node_labels => ['Account'], edge_labels => ['Transfers'],
    mode => 'EXACT'
  ) YIELD node, centrality
RETURN node.id, centrality;

Clustering

Clustering algorithms group nodes into clusters (also known as communities) such that nodes within each cluster are more densely connected to each other than to nodes in other clusters. This is useful to detect potential fraud rings in fintech, identify customer segmentations in retail and more.

WeaklyConnectedComponents

WeaklyConnectedComponents finds disjoint sets of nodes such that every node in a set is reachable from any other node in the same set, but not from nodes in other sets. See connected components for algorithm details.

Function signature

WeaklyConnectedComponents(input_parameters) YIELD node, cluster

Input parameters

All common input parameters.

Output

Function yields:

Name Type Description
node GRAPH_ELEMENT (node) A node in the graph.
cluster INT64 The ID of the connected component/cluster the node belongs to. In the range of [0, N), where N is the number of nodes in the graph.

Example

EXPORT DATA OPTIONS (
  uri = "gs://my-bucket-name/wcc-output.csv",
  format = "csv"
) AS
GRAPH FinGraph
CALL WeaklyConnectedComponents(
    node_labels => ['Account'], edge_labels => ['Transfers']
  ) YIELD node, cluster
RETURN node.id, cluster;

ModularityClustering

ModularityClustering partitions the graph into clusters by optimizing for modularity, a quality measure of the density of connections within clusters compared to what would be expected in a random network. See modularity clustering for algorithm details.

Function signature

ModularityClustering(input_parameters) YIELD node, cluster

Input parameters

All common input parameters and:

Name Type Required Default Description
resolution double No 1.0 Controls the granularity of the clustering. Must be finite and non-negative. Smaller resolution values tend to lead to larger clusters. Typical values are in the range [0.5, 5]. When the resolution is zero, the algorithm (with a sufficient number of iterations) finds the connected components of the graph (ignoring edges whose weight is zero).
max_iterations int No 10 Maximum number of outer iterations of the algorithm. Must be positive. Larger values tend to lead to higher-quality clusterings at the cost of longer runtime.
max_inner_iterations int No 10 Maximum number of inner iterations of the algorithm. Must be positive. Larger values tend to lead to higher-quality clusterings at the cost of longer runtime.

Output

Function yields:

Name Type Description
node GRAPH_ELEMENT (node) A node in the graph.
cluster INT64 The ID of the community/cluster the node belongs to. In the range of [0, N), where N is the number of nodes in the graph.

Example

EXPORT DATA OPTIONS (
  uri = "gs://my-bucket-name/modularity-output.csv",
  format = "csv"
) AS
GRAPH FinGraph
CALL ModularityClustering(
    node_labels => ['Account'], edge_labels => ['Transfers'],
    resolution => 1.0, max_iterations => 10
  ) YIELD node, cluster
RETURN node.id, cluster;

CorrelationClustering

CorrelationClustering partitions nodes based on their pairwise similarity or dissimilarity, aiming to group similar nodes together while separating dissimilar ones. See correlation clustering for algorithm details.

Function signature

CorrelationClustering(input_parameters) YIELD node, cluster

Input parameters

All common input parameters and:

Name Type Required Default Description
resolution double Yes (none) Controls the granularity of the clustering. Must be explicitly defined by client. Must be finite and non-negative. Smaller values tend to lead to larger clusters. When the resolution is zero and all edge weights are positive, the algorithm (with a sufficient number of iterations) finds the connected components of the graph.
max_iterations int No 10 Maximum number of outer iterations of the algorithm. Must be positive. Larger values tend to lead to higher-quality clusterings at the cost of longer runtime.
max_inner_iterations int No 10 Maximum number of inner iterations of the algorithm. Must be positive. Larger values tend to lead to higher-quality clusterings at the cost of longer runtime.

Output

Function yields:

Name Type Description
node GRAPH_ELEMENT (node) A node in the graph.
cluster INT64 The ID of the community/cluster the node belongs to. In the range of [0, N), where N is the number of nodes in the graph.

Example

EXPORT DATA OPTIONS (
  uri = "gs://my-bucket-name/correlation-output.csv",
  format = "csv"
) AS
GRAPH FinGraph
CALL CorrelationClustering(
    node_labels => ['Account'], edge_labels => ['Transfers'],
    resolution => 0.5
  ) YIELD node, cluster
RETURN node.id, cluster;

LabelPropagation

LabelPropagation assigns nodes to clusters using a label propagation technique, starting from an initial labeling which may use seed labels provided in the input. As nodes iteratively adopt the label shared by the majority of their neighbors, densely connected groups converge to a consensus label, forming communities. See label propagation for algorithm details.

Function signature

LabelPropagation(input_parameters) YIELD node, cluster

Input parameters

All common input parameters and:

Name Type Required Default Description
seed_label_property string No (none) The node property name for the seed label.
max_iterations int No 10 The maximum number of iterations of label propagation that the algorithm performs. Must be finite and positive.

Output

Function yields:

Name Type Description
node GRAPH_ELEMENT (node) A node in the graph.
cluster INT64 The propagated cluster/label ID of the node.

Example

EXPORT DATA OPTIONS (
  uri = "gs://my-bucket-name/lp-output.csv",
  format = "csv"
) AS
GRAPH FinGraph
CALL LabelPropagation(
    node_labels => ['Account'], edge_labels => ['Transfers'],
    max_iterations => 10
  ) YIELD node, cluster
RETURN node.id, cluster;

CliqueFinding

CliqueFinding identifies potentially overlapping dense communities that meet a minimum density threshold, in such a way that every clique (a group of nodes where each node is connected to every other node) is contained in at least one cluster. See clique aggregator for algorithm details.

Function signature

CliqueFinding(input_parameters) YIELD node, clique

Note: Unlike most algorithms, there can be multiple rows for the same node since a node may be part of many cliques.

Input parameters

All common input parameters and:

Name Type Required Default Description
min_density double No 0.9 The minimum density of the clusters to be returned. Must be in [0, 1].

Output

Function yields:

Name Type Description
node GRAPH_ELEMENT (node) A node in the graph.
clique INT64 The ID of the clique the node belongs to.

Example

EXPORT DATA OPTIONS (
  uri = "gs://my-bucket-name/clique-output.csv",
  format = "csv"
) AS
GRAPH FinGraph
CALL CliqueFinding(
    node_labels => ['Account'], edge_labels => ['Transfers'],
    min_density => 0.9
  ) YIELD node, clique
RETURN node.id, clique;

Similarity

Similarity algorithms quantify how alike pairs of nodes are based on the structure of their neighborhoods. This is useful for inferring missing edges between nodes for entity resolution, recommendation and more.

Spanner Graph supports the following pairwise node similarities where similarity scores are computed based on neighborhoods of the nodes.

  • JaccardSimilarity: Based on ratio of common neighbors to total neighbors
  • CosineSimilarity: Based on edge weights of common neighbors
  • CommonNeighborsSimilarity: Based on number of shared neighbors
  • TotalNeighborsSimilarity: Based on number of neighbors of at least one of the two nodes

See pairwise node similarity for algorithm details.

These four algorithms share the same arguments and output structure.

Function signature

[JaccardSimilarity|CosineSimilarity|CommonNeighborsSimilarity|TotalNeighborsSimilarity](input_parameters) YIELD source_node, target_node, similarity

Input parameters

All common input parameters and:

Name Type Required Default Description
source_nodes Array of nodes Yes N/A
target_nodes Array of nodes Yes N/A

Output

Function yields:

Name Type Description
source_node GRAPH_ELEMENT (node) The source node used in the similarity calculation.
target_node GRAPH_ELEMENT (node) The target node used in the similarity calculation.
similarity FLOAT64 The calculated similarity score between source and target node.

Example

EXPORT DATA OPTIONS (
  uri = "gs://my-bucket-name/jaccard-output.csv",
  format = "csv"
) AS
GRAPH FinGraph
CALL JaccardSimilarity(
    source_nodes => ARRAY {
                      MATCH (n:Account {id: 7})
                      RETURN n
                    },
    target_nodes => ARRAY {
                      MATCH (n:Account)
                      WHERE n.id != 7
                      RETURN n
                    }
  ) YIELD source_node, target_node, similarity
RETURN source_node.id AS source_id, target_node.id AS target_id, similarity;

Path Finding

Path finding algorithms compute optimal routes between nodes. This is useful for identifying cheapest route in supply chains, assessing vulnerability in cybersecurity and more.

ShortestPath

ShortestPath computes shortest paths between a given set of source nodes and a given set of target nodes. See many-to-many shortest paths for algorithm details.

Function signature

ShortestPath(input_parameters) YIELD source_node, target_node, path, cost

Input parameters

All common input parameters and:

Name Type Required Default Description
source_nodes Array of nodes Yes N/A
target_nodes Array of nodes Yes N/A

Output

Function yields:

Name Type Description
source_node GRAPH_ELEMENT (node) The source node of the path.
target_node GRAPH_ELEMENT (node) The target node of the path.
path GRAPH_PATH The shortest path found.
cost FLOAT64 The cost of the shortest path.

Example

EXPORT DATA OPTIONS (
  uri = "gs://my-bucket-name/shortest-path-output.csv",
  format = "csv"
) AS
GRAPH FinGraph
CALL ShortestPath(
    source_nodes => ARRAY {
                      MATCH (n:Account {id: 7})
                      RETURN n
                    },
    target_nodes => ARRAY {
                      MATCH (n:Account {id: 16})
                      RETURN n
                    }
  ) YIELD source_node, target_node, path, cost
RETURN source_node.id AS source_id, target_node.id AS target_id, PATH_LENGTH(path) AS length, cost;

What's next