Betweenness centrality

This algorithm computes the betweenness centralities of all nodes of a directed graph with positive edge weights, defined as follows. Let \(V\) be the set of all nodes in the graph. For any pair of distinct nodes \(x\) and \(y\), define the following notation:

\(P(x, y)\) is the set of all shortest paths from node \(x\) to node \(y\).
\(\delta_{xy}(z)\), where \(z\) is a node other than \(x\) and \(y\), is the pair-dependency of nodes \(x\) and \(y\) on \(z\). This quantity represents the fraction of shortest paths from \(x\) to \(y\) that pass through \(z\) (or zero if \(x\) is not reachable from \(y\)):

\[ \delta_{xy}(z) = \begin{cases} \frac{|\{p \in P(x, y) : z \in p\}|}{|P(x, y)|} & \text{, if } y \text{ is reachable from } x ;\\ 0 & \text{, otherwise.}\\ \end{cases} \]

The betweenness centrality of node \(z\) is the sum of the pair-dependencies of all pairs of nodes (other than \(z\) itself) on \(z\):

\[ \text{betweenness_centrality}(z) = \sum_{x, y \in V \setminus \{z\} : x \neq y} \delta_{xy}(z). \]

Note that the betweenness centrality of a node is always between \(0\) and \((|V| - 1) \cdot (|V| - 2)\), inclusive.

The algorithm also supports an approximate, source-constrained mode, where it computes pair-dependencies only for source nodes \(x\) in some set of source nodes \(S\) (a subset of \(V\)) that the algorithm samples:

\[ \text{betweenness_centrality}_S(z) = \sum_{x \in S \setminus \{z\} , y \in V \setminus \{x, z\}} \delta_{xy}(z). \]

Note that the approximate betweenness centrality of a node is always between \(0\) and \(|S| \cdot (|V| - 2)\), inclusive. Moreover, the exact betweenness centrality is a special case of the approximate betweenness centrality with \(S\) equal to \(V\).

Interface specifications

The following list summarizes the expectations on the graph passed as input to this algorithm. These expectations complement the ones listed in General assumptions on input graphs.

Directed graph (an undirected graph can be converted into a directed graph by replacing each undirected edge \(xy\) with two directed edges \(xy\) and \(yx\), both with the same weight as the original edge).

Edge weights: Must be positive. The output is a list of \(|V|\) floating-point numbers, corresponding to the betweenness centralities of all nodes.

The following parameters configure the algorithm:

Name	Type	Description
`num_source_nodes`	integer (optional)	If set, must be positive, and specifies how many source nodes the algorithm should select for computing approximate betweenness centralities. If not set, or set to a number larger than the number of nodes in the graph, then the algorithm uses all nodes as source nodes, that is, the algorithm computes exact centralities. Higher values lead to better approximations, but also to longer running times.
`sampling_seed`	integer (optional)	The seed for the random number generator that the algorithm uses to select the source nodes (if `num_source_nodes` is set). The algorithm selects the same set of source nodes when given the same input graph, `num_source_nodes`, and `sampling_seed`.

Algorithm

The algorithm starts by selecting which nodes it uses as source nodes. If num_source_nodes is not set, then the algorithm considers all nodes source nodes. If num_source_nodes is set, then the algorithm attempts to select this many source nodes, using reservoir sampling (without replacement). At each sampling step, the probability of selecting a given node that the algorithm has not yet selected is proportional to its out-degree. The algorithm never selects nodes with out-degree zero. If the number of nodes with positive out-degree is smaller than num_source_nodes (in particular, this happens if num_source_nodes is larger than the number of nodes in the graph), then the algorithm will use all nodes with positive out-degree as source nodes (note that this means that the algorithm will compute exact betweenness centralities).

For each source node \(x\) (in parallel), the algorithm then runs a parallel single-source shortest path (SSSP) algorithm to compute the shortest-path distances \(d(x, y)\) from \(x\) to all other nodes \(y\). Then it computes the number of shortest paths from \(x\) to any other node \(y\), by observing that \(|P(x, x)| = 1\) and for any node \(y \neq x\) that can be reached from \(x\), the following recurrence relation holds:

\[ |P(x, y)| = \sum_{z \in V : zy \in E, d(x, z) + w_{zy} = d(x, y)} |P(x, z)| , \]

where \(w_{zy}\) denotes the weight of edge \(zy\).

The algorithm then computes the contribution of the source node \(x\) to the betweenness centrality of each other node \(z\), namely

\[ \delta_{x\odot}(z) := \sum_{y \in V \setminus \{x, z\} } \delta_{xy}(z) , \]

using the fact that \(\delta_{x\odot}(x) = 0\) and for any other node \(z \neq x\) that can be reached from \(x\), the following recurrence relation holds:

\[ \delta_{x\odot}(z) = \sum_{y \in V : zy \in E, d(x, z) + w_{zy} = d(x, y)} \frac{|P(x,z)|}{|P(x,y)|}( 1+ \delta_{x\odot}(y) ) . \]

Finally, the algorithm adds up the contributions from all source nodes to obtain the final (exact or approximate) betweenness centralities.

Because the edges in the input graph have positive weights, the preceding recurrence relations don't introduce cyclic dependencies.

Computational complexity

The algorithm is a parallel algorithm that runs in \(O(sm\log n)\) work and \(O(m\log n)\) depth, where \(n\) and \(m\) denote the number of nodes and edges in the input graph, respectively, and \(s\) denotes the number of source nodes that the algorithm uses (which is at most num_source_nodes if that parameter is set, or \(n\) otherwise).

The algorithm uses \(O(\text{num_threads} \cdot n)\) memory, where \(\text{num_threads}\) is the number of threads that the algorithm uses during execution.

Example

This example illustrates the computation of the exact betweenness centrality for a specific node.

Input graph

Consider the following input graph:

A directed graph with seven nodes (a-g) and uniform edge weights.

For the provided graph, assume that the weight of each edge is 1.0.

Solution

Betweenness centrality scores for each node x in the graph, denoted \(BC_x\):

\(BC_a\) = 1.0, \(BC_b\) = 8.0, \(BC_c\) = 6.5, \(BC_d = 9.5\)
\(BC_e\) = 8.0, \(BC_f\) = 5.0, \(BC_g\) = 0.0

Algorithm details

Recall that betweenness centrality measures a node's centrality by determining how often it lies on the shortest paths between all other pairs of distinct nodes in the graph.

Instead of detailing all the calculations for all the nodes and respective pairs, this section focuses on a specific example to illustrate the algorithm's process. Assume the goal is to calculate the betweenness centrality for node \(d\) (colored in yellow), denoted by \(BC_d\).

The input graph highlighting node d in yellow, for which the algorithm calculates the betweenness centrality.

If all the shortest paths between node \(x\) and node \(y\) bypass node \(d\), it means node \(d\) is not needed as an intermediary for that specific connection. Consequently, the pair-dependency of the pair \((x, y)\) on node \(d\) is 0.

For example, tracing all the shortest paths between pairs of nodes like \((a, b)\), \((a, e)\), or \((e, g)\) reveals that node \(d\) is not on any of them. That means those pairs of nodes don't contribute to node \(d\)'s betweenness centrality. In this particular example, there are 19 such pairs.

Formally, the pair dependency of a pair \((x, y)\) on \(d\) is the total number of shortest paths between \((x, y)\) passing through \(d\), divided by the total number of shortest paths between \((x, y)\).

Consider the pairs of nodes for which the pair-dependency on node \(d\) is not 0:

pair (a, f): There are two shortest paths, one of which passes through node \(d\) (solid dark blue edges) and one of which doesn't (dashed light blue edges):

The graph showing shortest paths between node a and f, highlighting the one passing through node d.

The pair-dependency is therefore 0.5 (that is, 1/2).

pair (a, g): Again, there are two shortest paths, one of which passes through node \(d\):

The graph showing shortest paths between node a and g, highlighting the one passing through node d.

The pair-dependency is therefore 0.5 (that is, 1/2).

pair (b, e): Here again there are two shortest paths, one of which passes through node \(d\):

The graph showing shortest paths between node b and e, highlighting the one passing through node d.

The pair-dependency is therefore 0.5 (that is, 1/2).

pair (b, f): In this case there is a unique shortest path, and it passes through node \(d\):

The unique shortest path between node b and f, which passes through node d.

The pair-dependency is thus 1.0 (that is, 1/1).

pair (b, g): There is a unique shortest path, and it passes through node \(d\):

The unique shortest path between node b and g, which passes through node d.

The pair-dependency is thus 1.0 (that is, 1/1).

pair (e, a): There is a unique shortest path, and it passes through node \(d\):

The unique shortest path between node e and a, which passes through node d.