robin

ROBustness In Network

Valeria Policastro

2021-02-09

robin

In network analysis, many community detection algorithms have been developed. However,their applications leave unaddressed one important question: the statistical validation of the results.

robin (ROBustness in Network) is an R package for the validation of community detection. It has a double aim: to study the robustness of a community detection algorithm and to compare the robustness of different community detection algorithms on the same network.

Reference in Policastro V., Righelli D., Carissimo A., Cutillo L., De Feis I. (2021) https://arxiv.org/abs/2102.03106.

It provides: 1) a procedure to examine the robustness of a community detection algorithm against random perturbations of the original graph; 2) two tests to determine the statistical difference between stability measure curves; 3) a routine to choose among different community detection algorithms the one that better fits the network of interest; 4) a graphical interactive representation.

Installation

#install.packages("robin")

If there are problems with the installation try:

# if (!requireNamespace("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# BiocManager::install("gprege")
# 
# install.packages("robin")

Loading package

library("robin")

Input

The input of the robin package is a network that can be read from different format: edgelist, pajek, graphml, gml, ncol, lgl, dimacs, graphdb and igraph graphs.

prepGraph function creates an igraph object from the input file. This step is necessary for robin execution

my_network <- system.file("example/football.gml", package="robin")
# downloaded from: http://www-personal.umich.edu/~mejn/netdata/
graph <- prepGraph(file=my_network, file.format="gml")
graph
## IGRAPH ec64e86 U--- 115 613 -- 
## + attr: id (v/n), label (v/c), value (v/n)
## + edges from ec64e86:
##  [1] 1--  2 1--  5 1-- 10 1-- 17 1-- 24 1-- 34 1-- 36 1-- 42 1-- 66 1-- 91
## [11] 1-- 94 1--105 2-- 26 2-- 28 2-- 34 2-- 38 2-- 46 2-- 58 2-- 90 2--102
## [21] 2--104 2--106 2--110 3--  4 3--  7 3-- 14 3-- 15 3-- 16 3-- 48 3-- 61
## [31] 3-- 65 3-- 73 3-- 75 3--101 3--107 4--  6 4-- 12 4-- 27 4-- 41 4-- 53
## [41] 4-- 59 4-- 73 4-- 75 4-- 82 4-- 85 4--103 5--  6 5-- 10 5-- 17 5-- 24
## [51] 5-- 29 5-- 42 5-- 70 5-- 94 5--105 5--109 6-- 11 6-- 12 6-- 53 6-- 75
## [61] 6-- 82 6-- 85 6-- 91 6-- 98 6-- 99 6--108 7--  8 7-- 33 7-- 40 7-- 48
## [71] 7-- 56 7-- 59 7-- 61 7-- 65 7-- 86 7--101 7--107 8--  9 8-- 22 8-- 23
## + ... omitted several edges

Null model

robin offers two choices for the null model:

  1. external building according to users’ preferences, then the null graph is passed as a variable;

  2. generation by using the function random.

random function creates the null model for the robustness procedure robinRobust. The graph argument must be the same returned by prepGraph function.

The function random creates a random graph with the same degree distribution of the original graph, but with completely random edges, using the rewire function of the package igraph with the option “keeping_degseq” that preserves the degree distribution of the original network. The function rewire randomly assigns a number of edges between vertices with the given degree distribution.

graphRandom <- random(graph=graph)
graphRandom
## IGRAPH ec6b020 U--- 115 613 -- 
## + attr: id (v/n), label (v/c), value (v/n)
## + edges from ec6b020:
##  [1]  1-- 93  1--  5  1--  6 17-- 87 15-- 70  1-- 34  7-- 46  1-- 58  1-- 56
## [10]  1-- 91  1-- 10 61-- 99  2-- 73 28-- 40  2-- 34  2-- 93  2-- 26  2-- 76
## [19]  2-- 90  2-- 35  2--104 54--105  2-- 67 23-- 37 22-- 47 35--102 15-- 59
## [28]  3--114  3-- 48 64-- 85  3-- 65 12-- 73  3-- 29  3-- 56  3--107  4--  6
## [37]  4-- 83  4-- 27  4-- 41 53-- 62 18-- 81  4-- 97 45-- 55  4-- 82  4-- 85
## [46]  4--103  5-- 79 10-- 69  5-- 64  5-- 24  5-- 87  5-- 94 63--108 15-- 28
## [55] 88-- 99  5--106 10-- 11  6--102  6--110 75--105  6-- 72  6-- 75 25-- 50
## [64]  6--113  6-- 36  6--108  8-- 44  7-- 33  9-- 75 19--113  7-- 56  7-- 36
## + ... omitted several edges

Network visualization

plotGraph function offers a graphical representation of the network with the aid of networkD3 package.

plotGraph(graph)

Community detection

methodCommunity function detects communities using all the algorithms implemented in igraph package: “walktrap”, “edgeBetweenness”, “fastGreedy”, “spinglass”, “leadingEigen”, “labelProp”, “infomap”, “optimal”, “other”.

methodCommunity(graph=graph, method="fastGreedy") 
## IGRAPH clustering fast greedy, groups: 6, mod: 0.55
## + groups:
##   $`1`
##    [1]   7  14  16  33  40  48  61  65 101 107
##   
##   $`2`
##    [1]   8   9  10  17  22  23  24  42  47  50  52  54  68  69  74  78  79  89
##   [19] 105 109 111 112 115
##   
##   $`3`
##    [1]   1   2  20  26  30  31  34  36  38  46  56  80  81  83  90  94  95 102
##   [19] 104 106 110
##   + ... omitted several groups/vertices

membershipCommunities function detects the community membership.

membershipCommunities(graph=graph, method="fastGreedy") 
##   [1] 3 3 5 5 5 5 1 2 2 2 5 5 4 1 4 1 2 6 4 3 6 2 2 2 5 3 4 6 5 3 3 4 1 3 4 3 6
##  [38] 3 4 1 5 2 6 4 6 3 2 1 6 2 5 2 5 2 4 3 6 6 6 6 1 4 6 6 1 6 6 2 2 5 6 4 5 2
##  [75] 5 6 6 2 2 3 3 5 3 5 5 4 6 6 2 3 5 6 6 3 3 6 6 6 5 4 1 3 5 3 2 3 1 5 2 3 2
## [112] 2 6 6 2

Community visualization

plotComm function produces an interactive 3D plot of the communites detected by the chosen algorithm.

members <- membershipCommunities(graph=graph, method="fastGreedy")
plotComm(graph=graph, members=members)

Robustness of a community detection algorithm

robinRobust function implements the validation of the community robustness. In this example we use “vi” distance as stability measure, “indipendent” type procedure and “louvain” as community detection algorithm.

Users can choose also different measures as: “nmi”,“split.join”, “adjusted.rand”.

proc <- robinRobust(graph=graph, graphRandom=graphRandom, measure="vi", 
                  method="louvain", type="independent")

plotRobin function plots the stability measure curves. The (x,y)-axes represent the percentuage of perturbation and the average of the stability measure, respectively.

plotRobin(graph=graph, model1=proc$Mean, model2=proc$MeanRandom, measure="vi",
legend=c("real data", "null model"))

The procedure implemented depends on the network of interest.In this example, the Louvain algorithm fits good the network of interest,as the curve of the stability measure assumes lower values than the one obtained by the null model.

Statistical tests

The differeces between the stability measure curves are tested using: 1) Functional Data Analysis (FDA); 2) Gaussian Process (GP). Moreover to quantify the differences between the curves when they are very close the Area Under the Curves (AUC) are evaluated.

robinFDATest function implements the FDA testing.

robinFDATest(graph=graph, model1=proc$Mean, model2=proc$MeanRandom, 
             measure="vi",legend=c("real data", "null model"))
## [1] "First step: basis expansion"
## Swapping 'y' and 'argvals', because 'y' is  simpler,
##   and 'argvals' should be;  now  dim(argvals) =  13 ;  dim(y) =  13 x 20 
## [1] "Second step: joint univariate tests"
## [1] "Third step: interval-wise combination and correction"
## [1] "creating the p-value matrix: end of row 2 out of 9"
## [1] "creating the p-value matrix: end of row 3 out of 9"
## [1] "creating the p-value matrix: end of row 4 out of 9"
## [1] "creating the p-value matrix: end of row 5 out of 9"
## [1] "creating the p-value matrix: end of row 6 out of 9"
## [1] "creating the p-value matrix: end of row 7 out of 9"
## [1] "creating the p-value matrix: end of row 8 out of 9"
## [1] "creating the p-value matrix: end of row 9 out of 9"
## [1] "Interval Testing Procedure completed"
## Warning: Removed 87 row(s) containing missing values (geom_path).

## TableGrob (1 x 2) "arrange": 2 grobs
##   z     cells    name           grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
## $adj.pvalue
## [1] 0.5766 0.0035 0.0018 0.0018 0.0018 0.0021 0.0204 0.0040 0.0039
## 
## $pvalues
## [1] 0.5766 0.0018 0.0018 0.0018 0.0018 0.0018 0.0204 0.0018 0.0018

The first figure represents the stability measure plot using “louvain” algorithm for detecting communities. The second one represents the corresponding adjusted p-values of the Interval Testing procedure. Horizontal red line corresponds to the critical value 0.05.

robinGPTest function implements the GP testing.

robinGPTest(model1=proc$Mean, model2=proc$MeanRandom)
##  Profile  1 
##  Profile  2
## [1] 46.73269

The output is the Bayes Factor.

robinAUC function implements the AUC.

robinAUC(graph=graph, model1=proc$Mean, model2=proc$MeanRandom, 
             measure="vi")
## $area1
## [1] 0.1148519
## 
## $area2
## [1] 0.2573932

The outputs are the area under the two curves

Comparison of two community detection algorithms

robinCompare function compares two detection algorithms on the same network and permits the user to choose the one that better fits the network of interest.

In this example we consider the “Fast Greedy” and “Louvain” algorithms.

We firstly plot the communities dectected by both algorithms.

membersFast <- membershipCommunities(graph=graph, method="fastGreedy")
membersLouv <- membershipCommunities(graph=graph, method="louvain")
plotComm(graph=graph, members=membersFast)
plotComm(graph=graph, members=membersLouv)

Secondly, we compare them with robinCompare function.

comp <- robinCompare(graph=graph, method1="fastGreedy",
                method2="louvain", measure="vi", type="independent")

Thirdly, we plot the curves of the compared methods.

plotRobin(graph=graph, model1=comp$Mean1, model2=comp$Mean2, measure="vi", 
legend=c("fastGreedy", "louvain"), title="FastGreedy vs Louvain")

In this example, the Louvain algorithm fits better the network of interest, as the curve of the stability measure assumes lower values than the one obtained by the Fast greedy method.

Statistical tests

The following procedures test the statistical differences between the two curves using two different methods

robinFDATest(graph=graph, model1=comp$Mean1, model2=comp$Mean2, measure="vi")
## [1] "First step: basis expansion"
## Swapping 'y' and 'argvals', because 'y' is  simpler,
##   and 'argvals' should be;  now  dim(argvals) =  13 ;  dim(y) =  13 x 20 
## [1] "Second step: joint univariate tests"
## [1] "Third step: interval-wise combination and correction"
## [1] "creating the p-value matrix: end of row 2 out of 9"
## [1] "creating the p-value matrix: end of row 3 out of 9"
## [1] "creating the p-value matrix: end of row 4 out of 9"
## [1] "creating the p-value matrix: end of row 5 out of 9"
## [1] "creating the p-value matrix: end of row 6 out of 9"
## [1] "creating the p-value matrix: end of row 7 out of 9"
## [1] "creating the p-value matrix: end of row 8 out of 9"
## [1] "creating the p-value matrix: end of row 9 out of 9"
## [1] "Interval Testing Procedure completed"
## Warning: Removed 70 row(s) containing missing values (geom_path).

## TableGrob (1 x 2) "arrange": 2 grobs
##   z     cells    name           grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
## $adj.pvalue
## [1] 0.4060 0.0353 0.0425 0.0053 0.0225 0.1448 0.8266 0.9493 0.2464
## 
## $pvalues
## [1] 0.4060 0.0047 0.0425 0.0021 0.0038 0.0125 0.5606 0.9493 0.0288
robinGPTest(model1=comp$Mean1, model2=comp$Mean2)
##  Profile  1 
##  Profile  2
## [1] 26.87333

while

robinAUC(graph=graph, model1=comp$Mean1, model2=comp$Mean2, measure="vi")
## $area1
## [1] 0.1659276
## 
## $area2
## [1] 0.1205634

calculates the area under the curves.