`netjack`

This vignette introduces the `netjack` package and gives an overview of common data input and analysis pipelines. For a tutorial on creating custom network functions and network statistics, see the “Custom Functions in `netjack`” vignette.

Samples of *registered* networks, that is, networks that share the same node set, are increasingly common in a variety of scientific fields. The `netjack` package implements a framework that lets researchers quickly manipulate and analyze large samples of registered networks, and develop custom functionality that builds on the existing `netjack` framework.

In this vignette, we go over the following procedures:

- Basic data objects and function classes in `netjack`
- Data Input
- Network Manipulation Functions
- Network Statistic Functions
- Difference, Group and Group Difference Testing.

`netjack` is built around a series of S4 classes that represent different levels of network manipulation, and functions that act on each level of network manipulation. This section describes these classes and functions at a summary level.

The most basic data object is the `Net` object. This represents a single network, along with node level variables, such as partition assignments.

A `NetSample` object represents a collection of `Net` objects, along with network level variables. For example, if each network in the sample represents a single individual’s functional brain network, a network level variable could be the diagnostic status of each individual.

The `Net` and `NetSample` objects are representations of raw data. To work with these data objects, the `net_apply` function can be used to apply a *network manipulation function*. A function of this class takes a single network, performs a series of manipulations on it, and returns the manipulated networks. For example, the `node_jackknife()` function applied to a `Net` object returns a set of `Net` objects, each corresponding to the original network with one node removed in turn.
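
To make the idea of a node jackknife concrete, here is a minimal base R sketch that works directly on an adjacency matrix. The helper `node_jackknife_sketch` is a hypothetical name used only for illustration; it is not the implementation used by `netjack`, which operates on `Net` objects.

```r
# Hypothetical helper (not part of netjack): remove each node in turn
# from a square adjacency matrix and return the reduced matrices.
node_jackknife_sketch <- function(adj) {
  stopifnot(is.matrix(adj), nrow(adj) == ncol(adj))
  n <- nrow(adj)
  # One reduced matrix per node; drop = FALSE keeps the matrix shape.
  lapply(seq_len(n), function(i) adj[-i, -i, drop = FALSE])
}

# A small undirected toy network on 4 nodes (the path 1-2-3-4).
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
adj <- adj + t(adj)

jackknifed <- node_jackknife_sketch(adj)
length(jackknifed)    # one manipulated network per removed node
dim(jackknifed[[1]])  # each has one node fewer than the original
```

The result is a plain list of matrices, one per removed node, which mirrors the "set of manipulated networks" described above.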

To represent the output of `net_apply`, the S4 classes `NetSet` and `NetSampleSet` are used. These classes represent both the original `Net` or `NetSample` and the product of the `net_apply` call.

A common procedure in network analysis is the calculation of network statistics. In `netjack`, *network statistic functions* are applied via the `net_stat_apply` function, which quickly calculates a statistic for every network in a `NetSet` or `NetSampleSet` object. The output of `net_stat_apply` is a `NetStatSet` or `NetSampleStatSet` object.

`netjack` implements several statistical testing procedures that are described below. Additionally, to extract a `data.frame` of the calculated network statistics from a `NetStatSet` or `NetSampleStatSet` object, `to_data_frame()` can be used.

To illustrate the various features of `netjack`, two simulated datasets are provided, `GroupA` and `GroupB`. Networks can be loaded into the `netjack` framework from adjacency matrices, either as single `Net` objects or, more commonly, as one `NetSample` object.

```
library(netjack)
data("GroupA")
Subject1 <- as_Net(GroupA[[1]], "Subject1")
show(Subject1)
```

```
## Net
## Net Name: Subject1
## Node Size: 20
## Node variables: index
```

Node variables can be assigned during construction as a named list:

```
Subject2 <- as_Net(GroupA[[2]], "Subject1", node.variables = list(community = c(rep(1,10), rep(2,10))))
show(Subject2)
```

```
## Net
## Net Name: Subject1
## Node Size: 20
## Node variables: index community
```

Typically, a researcher using `netjack` is analyzing a sample of registered networks rather than a single network. `NetSample` objects can be constructed in much the same way as a `Net` object, using a list of adjacency matrices rather than a single matrix:

```
GroupASamp = as_NetSample(GroupA, net.names = as.character(1:20) , node.variables = list(community = c(rep(1,10), rep(2,10))), sample.variables = list(group = rep(1, 20)))
show(GroupASamp)
```

```
## Net Sample
## Net Names: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Sample Variables: group orig.net
```

Importantly, when a `NetSample` object is created, the list of node variables is applied to every network. This is appropriate for registered networks: in neuroimaging, for example, each node represents a specific brain region, and the nodes are the same for every subject.

Sample variables represent network level characteristics. For example, if each network represents a functional connectivity network from a neuroimaging study, a sample variable might be the diagnostic status of a particular individual.

Once a sample of networks is represented as a `NetSample` object, a network manipulation function can be applied. As described previously, these functions change a network in some way. As an example, the `node_jackknife` function returns a set of networks, where each node has been removed in turn.

Network manipulation functions can be applied via `net_apply` to a `Net` object to produce a `NetSet`, or to a `NetSample` object to produce a `NetSampleSet`:

```
Sub1Jackknifed <- net_apply(network = Subject1, net.function = "node_jackknife")
show(Sub1Jackknifed)
```

```
## Net Set
## Applied Function: "node_jackknife"
## Function Arguments:
## Original Network Name: Subject1
## Contains 20 manipulated networks.
```

```
GroupAJackknifed <- net_apply(network = GroupASamp, net.function = "node_jackknife")
show(GroupAJackknifed)
```

```
## Net Sample Set
## Applied Function: "node_jackknife"
## Function Arguments:
## Contains 20 NetSets
```

Network manipulation functions that involve node level variables can be used by including them in the `net.function.args` argument within `net_apply`. For example, `network_jackknife` removes sub-networks on the basis of a node level grouping variable.

```
GroupANetJackknifed <- net_apply(GroupASamp, net.function = "network_jackknife", net.function.args = list(network.variable = "community"))
show(GroupANetJackknifed)
```

```
## Net Sample Set
## Applied Function: "network_jackknife"
## Function Arguments:
## network.variable = community
## Contains 20 NetSets
```
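
As a rough illustration of removing sub-networks by a node level grouping variable, the base R sketch below drops all nodes of one group at a time from an adjacency matrix. The helper `network_jackknife_sketch` and the random toy matrix are assumptions made for this example; they are not taken from the `netjack` source.

```r
# Hypothetical helper (not part of netjack): for each group label,
# return the adjacency matrix with all nodes of that group removed.
network_jackknife_sketch <- function(adj, membership) {
  stopifnot(is.matrix(adj), nrow(adj) == length(membership))
  groups <- unique(membership)
  out <- lapply(groups, function(g) {
    keep <- membership != g
    adj[keep, keep, drop = FALSE]
  })
  names(out) <- as.character(groups)
  out
}

# Toy symmetric network with 20 nodes (values are simulated).
set.seed(1)
adj <- matrix(rbinom(400, 1, 0.5), 20, 20)
adj[lower.tri(adj)] <- t(adj)[lower.tri(adj)]  # mirror the upper triangle
diag(adj) <- 0

# Same community assignment as in the vignette: two groups of 10 nodes.
community <- c(rep(1, 10), rep(2, 10))
reduced <- network_jackknife_sketch(adj, community)
sapply(reduced, dim)  # one reduced network per community
```

With two communities, each original network yields two manipulated networks, one per removed community.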

Once a network manipulation function has been applied, network statistics can be computed.

A network statistic is a single numerical summary of some aspect of a network’s structure or topology. `netjack` focuses on the analysis of networks at a network statistic level, and provides simple interfaces for calculating network statistics on collections of networks.
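
As an example of such a summary, the sketch below computes global efficiency for an unweighted, undirected network in base R, using the common definition as the mean inverse shortest-path length over all pairs of distinct nodes. The function `global_efficiency_sketch` is a hypothetical name, and the `global_efficiency` function shipped with `netjack` may differ in detail (for instance in how it treats edge weights).

```r
# Hypothetical helper (not part of netjack): global efficiency of an
# unweighted network given as a square adjacency matrix.
global_efficiency_sketch <- function(adj) {
  stopifnot(is.matrix(adj), nrow(adj) == ncol(adj))
  n <- nrow(adj)
  # Start from direct edges: distance 1 if connected, Inf otherwise.
  d <- ifelse(adj > 0, 1, Inf)
  diag(d) <- 0
  # Floyd-Warshall: allow paths through each intermediate node k in turn.
  for (k in seq_len(n)) {
    d <- pmin(d, outer(d[, k], d[k, ], `+`))
  }
  # Average 1/distance over all ordered pairs of distinct nodes;
  # disconnected pairs contribute 1/Inf = 0.
  mean(1 / d[row(d) != col(d)])
}

# Path graph 1-2-3-4.
path <- matrix(0, 4, 4)
path[cbind(1:3, 2:4)] <- 1
path <- path + t(path)
global_efficiency_sketch(path)  # 13/18 for this path graph

# A fully connected network has efficiency 1.
complete <- matrix(1, 4, 4)
diag(complete) <- 0
global_efficiency_sketch(complete)
```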

Similar in structure to the network manipulation functions, network statistic functions are applied via the `net_stat_apply` function, which can be used with either a `NetSet` object or a `NetSampleSet` object. This produces a `NetStatSet` or a `NetSampleStatSet`, respectively.

`Sub1JackknifedGlobEff <- net_stat_apply(Sub1Jackknifed, net.stat.fun = global_efficiency)`

`show(Sub1JackknifedGlobEff)`

```
## Net Statistic Set
## Applied Function: "node_jackknife"
## Function Arguments:
## Applied Statistic Function: global_efficiency
## Statistic Function Arguments:
## Original Network Name: Subject1
```

```
GroupAJackknifedGlobEff <- net_stat_apply(GroupAJackknifed, net.stat.fun = global_efficiency)
show(GroupAJackknifedGlobEff)
```

```
## Net Sample Statistic Set
## Applied Function: "node_jackknife"
## Function Arguments:
## Applied Statistic Function: global_efficiency
## Statistic Function Arguments:
## Original Network Names: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
```

Once a `NetStatSet` or `NetSampleStatSet` has been computed, the network statistics can be extracted into a `data.frame` by using the `to_data_frame` function. The data frame returned is in long format, with a row for each manipulated network.

```
Sub1Data = to_data_frame(Sub1JackknifedGlobEff)
names(Sub1Data)
```

`## [1] "orig.net" "orig.stat" "net.names" "nets.stat" "stat.name"`

```
GroupAData = to_data_frame(GroupAJackknifedGlobEff)
head(GroupAData)
```

```
## orig.net orig.stat net.names nets.stat stat.name group
## 1 1 0.6447368 1 0.6403509 global_efficiency 1
## 2 1 0.6447368 2 0.6432749 global_efficiency 1
## 3 1 0.6447368 3 0.6461988 global_efficiency 1
## 4 1 0.6447368 4 0.6403509 global_efficiency 1
## 5 1 0.6447368 5 0.6491228 global_efficiency 1
## 6 1 0.6447368 6 0.6491228 global_efficiency 1
```
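
Because the result is an ordinary long-format data frame, it can be summarized with standard R tools. The sketch below uses a small mock data frame with the column names shown above (the values are made up for illustration) and computes the average change in the statistic for each removed node.

```r
# Mock data with the same column names as the to_data_frame() output;
# the numbers are invented and do not come from GroupA.
long <- data.frame(
  orig.net  = rep(c("1", "2"), each = 3),
  orig.stat = rep(c(0.64, 0.66), each = 3),
  net.names = rep(c("1", "2", "3"), times = 2),
  nets.stat = c(0.63, 0.65, 0.64, 0.66, 0.60, 0.67),
  stat.name = "global_efficiency",
  stringsAsFactors = FALSE
)

# Change of each manipulated network relative to its original network.
# The sign convention (manipulated minus original) is a choice made here.
long$change <- long$nets.stat - long$orig.stat

# Average change per removed node, across original networks.
mean_change <- aggregate(change ~ net.names, data = long, FUN = mean)
mean_change
```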

`netjack` implements three statistical testing procedures in easy-to-use functions for both tabular and graphical output. The first is the *difference test*, which assesses whether any specific network manipulation causes a significant difference from the original network in a given network statistic. This test is implemented with the `diff_test` function and graphically with the `diff_test_ggPlot` function. Plotting uses the `ggplot2` package, so the appearance of the plots is easy to customize.

The example dataset `GroupA` has been generated so that the removal of node 10 will result in a significant difference in global efficiency from the original networks. Below is the full set of steps for this analysis:

```
GroupASamp = as_NetSample(GroupA, net.names = as.character(1:20))
GroupAJackknifed = net_apply(GroupASamp, net.function = "node_jackknife")
GroupAJackknifedGlobEff = net_stat_apply(GroupAJackknifed, net.stat.fun = "global_efficiency")
diff_test(GroupAJackknifedGlobEff)
```

```
## net.names diff p adjusted.p
## mean of x 1 0.0023245614 7.548641e-02 1.161329e-01
## mean of x1 10 -0.1257156781 1.144533e-12 2.289065e-11
## mean of x2 11 0.0032017544 1.426799e-03 9.285723e-03
## mean of x3 12 0.0029093567 2.114492e-03 9.285723e-03
## mean of x4 13 0.0004239766 7.472946e-01 7.866259e-01
## mean of x5 14 0.0017397661 1.386111e-01 1.848149e-01
## mean of x6 15 0.0018859649 3.286142e-02 6.572285e-02
## mean of x7 16 0.0020321637 9.205373e-02 1.315053e-01
## mean of x8 17 0.0011549708 3.240808e-01 3.600898e-01
## mean of x9 18 0.0032017544 5.382591e-03 1.794197e-02
## mean of x10 19 0.0024707602 4.495854e-02 8.174280e-02
## mean of x11 2 0.0013011696 1.567161e-01 1.958951e-01
## mean of x12 20 0.0001315789 8.786445e-01 8.786445e-01
## mean of x13 3 0.0037865497 3.843974e-04 3.843974e-03
## mean of x14 4 0.0024707602 1.850869e-02 4.627173e-02
## mean of x15 5 0.0023245614 1.758724e-02 4.627173e-02
## mean of x16 6 0.0033479532 2.321431e-03 9.285723e-03
## mean of x17 7 0.0023245614 2.621901e-02 5.826446e-02
## mean of x18 8 0.0018859649 6.155882e-02 1.025980e-01
## mean of x19 9 0.0011549708 2.974065e-01 3.498900e-01
```

`diff_test_ggPlot(GroupAJackknifedGlobEff)`
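
To give a feel for what a test of this kind involves, the sketch below runs a one-sample t-test on the per-subject change for each removed node and then adjusts the p-values for multiple comparisons. This is an assumption about the general approach, not a description of the internals of `diff_test`: the `mean of x` row labels in the table above suggest a t-test, and the adjusted p-values there are consistent with a Benjamini-Hochberg correction (the largest p-value is left unchanged), but neither is confirmed here. All data in the sketch are simulated.

```r
set.seed(42)
n_subjects <- 20
n_nodes <- 5

# Simulated statistic on each subject's original network.
orig_stat <- rnorm(n_subjects, mean = 0.64, sd = 0.01)

# Simulated statistic after removing each node (rows: subjects,
# columns: removed node). Removing node 3 lowers the statistic.
jack_stat <- sapply(seq_len(n_nodes), function(j) {
  shift <- if (j == 3) -0.10 else 0
  orig_stat + shift + rnorm(n_subjects, sd = 0.005)
})

# Per-subject change, then a one-sample t-test of "mean change = 0"
# for every removed node.
diffs <- jack_stat - orig_stat
p <- apply(diffs, 2, function(d) t.test(d, mu = 0)$p.value)

result <- data.frame(
  net.names  = seq_len(n_nodes),
  diff       = colMeans(diffs),
  p          = p,
  adjusted.p = p.adjust(p, method = "BH")  # assumed correction method
)
result
```

In this simulation only the removal of node 3 was given a real effect, so it should stand out with by far the smallest adjusted p-value.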

The second test is the *group test*. It examines differences between two sample-level groups (such as healthy controls and individuals with a disorder) in a network statistic computed after a network manipulation.

In this example, `GroupA` has been simulated to have node 10 be important for global efficiency, while `GroupB` has node 15 as important for global efficiency. We combine these datasets into a single object and then perform the group test.

```
fullGroup = c(GroupA, GroupB)
fullSamp = as_NetSample(fullGroup,net.names = as.character(1:40), sample.variables = list(group = c(rep("GroupA", 20), rep("GroupB", 20))))
fullSampJackknifed = net_apply(fullSamp, net.function = "node_jackknife")
fullSampleJackknifedGlobEff = net_stat_apply(fullSampJackknifed, net.stat.fun = "global_efficiency")
group_test(fullSampleJackknifedGlobEff, grouping.variable = "group")
```

```
## net.names p 1 2 adjusted.p
## 1 1 3.479674e-01 0.6416667 0.6461988 3.479674e-01
## 2 10 5.457061e-15 0.5136264 0.6459064 5.457061e-15
## 3 11 6.055813e-01 0.6425439 0.6445906 6.055813e-01
## 4 12 6.505909e-01 0.6422515 0.6440058 6.505909e-01
## 5 13 1.402365e-01 0.6397661 0.6457602 1.402365e-01
## 6 14 1.435786e-01 0.6410819 0.6469298 1.435786e-01
## 7 15 3.240468e-16 0.6412281 0.5295224 3.240468e-16
## 8 16 1.961826e-01 0.6413743 0.6464912 1.961826e-01
## 9 17 9.077830e-02 0.6404971 0.6472222 9.077830e-02
## 10 18 7.671224e-01 0.6425439 0.6438596 7.671224e-01
## 11 19 2.873317e-01 0.6418129 0.6451754 2.873317e-01
## 12 2 2.970671e-01 0.6406433 0.6448830 2.970671e-01
## 13 20 1.230613e-01 0.6394737 0.6461988 1.230613e-01
## 14 3 8.593489e-01 0.6431287 0.6438596 8.593489e-01
## 15 4 2.796704e-01 0.6418129 0.6464912 2.796704e-01
## 16 5 3.257748e-01 0.6416667 0.6457602 3.257748e-01
## 17 6 7.125387e-01 0.6426901 0.6441520 7.125387e-01
## 18 7 2.771352e-01 0.6416667 0.6461988 2.771352e-01
## 19 8 1.800351e-01 0.6412281 0.6467836 1.800351e-01
## 20 9 3.712339e-01 0.6404971 0.6441520 3.712339e-01
```

`group_test_ggPlot(fullSampleJackknifedGlobEff, grouping.variable="group")`

Finally, the *group difference test* assesses whether the network manipulation has a differential impact on the network statistic between the groups. This test is implemented with `group_diff_test` and graphically with `group_diff_test_ggPlot`.

`group_diff_test(fullSampleJackknifedGlobEff, grouping.variable = "group")`

```
## net.names p 1 2 adjusted.p
## 1 1 7.837022e-01 0.0023245614 0.0027777778 8.707803e-01
## 2 10 5.474698e-19 -0.1257156781 0.0024853801 5.474698e-18
## 3 11 2.053881e-01 0.0032017544 0.0011695906 4.211775e-01
## 4 12 3.727280e-02 0.0029093567 0.0005847953 1.723862e-01
## 5 13 2.105888e-01 0.0004239766 0.0023391813 4.211775e-01
## 6 14 2.348841e-01 0.0017397661 0.0035087719 4.270620e-01
## 7 15 4.898286e-23 0.0018859649 -0.1138986355 9.796572e-22
## 8 16 5.036256e-01 0.0020321637 0.0030701754 7.748087e-01
## 9 17 5.329649e-02 0.0011549708 0.0038011696 1.776550e-01
## 10 18 1.113582e-01 0.0032017544 0.0004385965 3.181664e-01
## 11 19 6.274374e-01 0.0024707602 0.0017543860 8.646142e-01
## 12 2 9.032667e-01 0.0013011696 0.0014619883 9.508071e-01
## 13 20 4.309656e-02 0.0001315789 0.0027777778 1.723862e-01
## 14 3 1.857876e-02 0.0037865497 0.0004385965 1.238584e-01
## 15 4 6.484607e-01 0.0024707602 0.0030701754 8.646142e-01
## 16 5 9.900847e-01 0.0023245614 0.0023391813 9.900847e-01
## 17 6 1.398629e-01 0.0033479532 0.0007309942 3.496572e-01
## 18 7 7.441306e-01 0.0023245614 0.0027777778 8.707803e-01
## 19 8 3.589458e-01 0.0018859649 0.0033625731 5.982429e-01
## 20 9 7.695662e-01 0.0011549708 0.0007309942 8.707803e-01
```

`group_diff_test_ggPlot(fullSampleJackknifedGlobEff, grouping.variable="group")`

From this, we can see that removing node 10 or node 15 changes global efficiency from its original value by a significantly different amount in Group A than in Group B.
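
The distinction between the group test and the group difference test can be sketched in base R for a single removed node. The group test compares the statistic on the manipulated networks between groups, while the group difference test compares the change from the original network between groups. The two-sample t-tests below are an assumption chosen for illustration, not a statement of what `group_test` and `group_diff_test` compute internally, and the data are simulated.

```r
set.seed(7)
n_per_group <- 20
group <- factor(rep(c("GroupA", "GroupB"), each = n_per_group))

# Simulated statistic on each subject's original network.
orig_stat <- rnorm(2 * n_per_group, mean = 0.64, sd = 0.01)

# Simulated statistic after removing one particular node: the removal
# lowers the statistic in GroupA only.
drop_a <- ifelse(group == "GroupA", -0.10, 0)
jack_stat <- orig_stat + drop_a + rnorm(2 * n_per_group, sd = 0.005)

# Group test (sketch): compare the manipulated statistic between groups.
group_p <- t.test(jack_stat ~ group)$p.value

# Group difference test (sketch): compare the change from the original
# network between groups.
change <- jack_stat - orig_stat
group_diff_p <- t.test(change ~ group)$p.value

c(group = group_p, group_diff = group_diff_p)
```

Working with the change rather than the raw statistic removes baseline differences between subjects, which is why the two tests can give different answers when the groups already differ on the original networks.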