This vignette defines the full, active and manual risk sets, explains how a manual risk set is declared in the processing function remify::remify(), and shows what the processed risk set looks like in the remify object.


Consider the remify object for the network randomREHsmall.

library(remify) # loading package
data(randomREHsmall) # data

# processing the edgelist 
reh <- remify(edgelist = randomREHsmall$edgelist,
              directed = TRUE, # events are directed
              ordinal = FALSE, # model with waiting times
              model = "tie", # tie-oriented modeling
              actors = randomREHsmall$actors,
              origin = randomREHsmall$origin,
              omit_dyad = NULL)

# summary(reh)                                

Definition of risk set

A relational event history consists of a time-ordered sequence of (directed or undirected) interactions. For each event, we know:

  • its time of occurrence, either as timestamp/date/continuous value or just as order
  • the actors that were involved in the relational event
  • the type of the event (if measured)

For instance, the first five events of the randomREHsmall sequence are:

randomREHsmall$edgelist[1:5,]
##                  time  actor1 actor2
## 1 2020-03-05 16:36:37  Colton  Kayla
## 2 2020-03-05 19:34:11    Lexy Colton
## 3 2020-03-05 20:49:37  Colton  Kayla
## 4 2020-03-05 21:38:23  Colton  Kayla
## 5 2020-03-06 06:54:12 Richard Colton

where time, actor1 and actor2 describe each observed event in the sequence (note that in this example the event type is not annotated).

When modeling a relational event sequence, we have to define, at each time point, a risk set: the set of relational events (dyads) that could have been observed at that time point (this set also contains the event that was actually observed). The definition of the risk set is an important building block of the likelihood function in both the tie-oriented and the actor-oriented modeling frameworks. In the following sections, we discuss three possible definitions of the risk set: full, active and manual. These three types of risk set can be processed with remify::remify() by specifying the risk set type in the input argument riskset.


The full risk set

The most common definition of the risk set assumes that all possible dyads are at risk of occurring over the whole observation period. We refer to this definition as the full risk set. If the network has N actors and consists of directed events that can be of C possible event types, then the risk set consists of all possible directed dyads among the N actors for each event type, that is D = N(N-1)C dyads, or D = N(N-1)C/2 in the case of undirected dyads. For instance, in the random network (randomREHsmall) dyads are directed, there are N = 5 actors and C = 1 event type, therefore the size of the risk set is D = 5 * 4 * 1 = 20. The first five dyads in the full risk set are

# method getDyad(), see more in ?remify::getDyad
getDyad(x = reh, dyadID = c(1:5)) 
##   dyadID    actor1    actor2
## 1      1    Colton Francesca
## 2      2    Colton     Kayla
## 3      3    Colton      Lexy
## 4      4    Colton   Richard
## 5      5 Francesca    Colton
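The size of the full risk set follows directly from the formula above. A minimal base R sketch, where riskset_size() is a hypothetical helper written for this illustration (it is not a function of remify):

```r
# Hypothetical helper (not part of remify): size D of the full risk set,
# excluding self-loops.
#   n_actors: number of actors N
#   n_types:  number of event types C
#   directed: TRUE for directed events, FALSE for undirected ones
riskset_size <- function(n_actors, n_types = 1, directed = TRUE) {
  D <- n_actors * (n_actors - 1) * n_types  # ordered pairs times event types
  if (!directed) D <- D / 2                 # (i,j) and (j,i) are the same dyad
  D
}

riskset_size(5, 1, directed = TRUE)   # randomREHsmall: 20
riskset_size(5, 1, directed = FALSE)  # same actors, undirected events: 10
```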

The ID of the dyads (dyadID) reflects the order in which the dyads are processed by the function remify::remify(). The dyadID is defined by a two-step approach:

  1. Actors’ and types’ names are first sorted according to their alphanumeric order: digits from 0 to 9 come first, followed by the letters in alphabetical order. For instance, the vector of names c("user22","0usr","1user","1deer") is sorted as c("0usr","1deer","1user","user22"). For the actors in the random network, the sorted names are
# sorted vector of actors' names
sorted_actors <- sort(randomREHsmall$actors)
sorted_actors
## [1] "Colton"    "Francesca" "Kayla"     "Lexy"      "Richard"
# number of actors in the network
N <- length(randomREHsmall$actors)

and the same applies to the event types:

# no event type: we use a single placeholder value (a blank space)
sorted_types <- c(" ") 

# C = 1 for 'randomREHsmall'
C <- length(sorted_types) 

In this phase, the processing function remify::remify() also assigns numeric IDs to both actors and event types:

# IDs of actors will consist of an integer number from 1 to N
names(sorted_actors) <- 1:N
sorted_actors
##           1           2           3           4           5 
##    "Colton" "Francesca"     "Kayla"      "Lexy"   "Richard"
# IDs of types will be an integer number from 1 to C
names(sorted_types) <- 1:C # here there is a single (artificial) event type
sorted_types
##   1 
## " "
  2. Dyads are then defined by the triple c(actor1,actor2,type), enumerated by looping over type (outer loop), then over actor1, and finally over actor2 (inner loop), so that actor2 changes fastest. The loops are shown below:
# initializing matrix object where to store the dyads as [actor1,actor2,type]
dyad_mat <- matrix(NA, nrow = N*(N-1)*C, ncol = 3)
colnames(dyad_mat) <- c("actor1","actor2","type")
rownames(dyad_mat) <- 1:(N*(N-1)*C)

# initializing position index
d <- 1 

# start three loops
for(type in sorted_types){ # loop over event types, 
  for(actor1 in sorted_actors){ # loop over actor1
    for(actor2 in sorted_actors){ # loop over actor2
      if(actor1!=actor2){ # avoid self-loops
        dyad_mat[d,] <- c(actor1,actor2,type)
        d <- d + 1
      }
    }
  }
}

# same result as shown above with the method `getDyad()`
dyad_mat[1:5,]
##   actor1      actor2      type
## 1 "Colton"    "Francesca" " " 
## 2 "Colton"    "Kayla"     " " 
## 3 "Colton"    "Lexy"      " " 
## 4 "Colton"    "Richard"   " " 
## 5 "Francesca" "Colton"    " "
# checking the size of the _full_ risk set that is 20
dim(dyad_mat)[1] 
## [1] 20
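Because of this loop order, the position of a dyad can also be computed without building dyad_mat. The sketch below is our own derivation from the loops above; the helper dyad_position() is hypothetical and is not claimed to be how remify::remify() computes IDs internally. With actor IDs \(a_1 \neq a_2\) in 1..N and type ID \(c\) in 1..C, the position is \((c-1)N(N-1) + (a_1-1)(N-1) + a_2 - [a_2 > a_1]\), where the last term skips the self-loop.

```r
# Hypothetical helper: position of the directed dyad (a1, a2, type) in the
# order generated by the loops above (type outer, actor1, actor2 inner).
#   a1, a2: actor IDs in 1..n_actors (a1 != a2)
#   type:   event type ID in 1..n_types
dyad_position <- function(a1, a2, type, n_actors) {
  (type - 1) * n_actors * (n_actors - 1) +  # skip the blocks of earlier types
    (a1 - 1) * (n_actors - 1) +             # skip the rows of earlier senders
    (a2 - (a2 > a1))                        # receivers skip the self-loop
}

actor_names <- sort(c("Colton", "Francesca", "Kayla", "Lexy", "Richard"))
actor_id <- function(name) match(name, actor_names)

dyad_position(actor_id("Colton"), actor_id("Kayla"), 1, length(actor_names))    # 2
dyad_position(actor_id("Lexy"), actor_id("Colton"), 1, length(actor_names))     # 13
dyad_position(actor_id("Richard"), actor_id("Colton"), 1, length(actor_names))  # 17
```

These three values agree with the dyadID values that remify assigns to the events (Colton,Kayla), (Lexy,Colton) and (Richard,Colton) in attr(reh, "dyad").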

The matrix dyad_mat above describes the full risk set, and its row indices correspond to the ID of each dyad (dyadID). The dyadID is useful, for instance, in tie-oriented modeling, where the remify object contains the attribute "dyad" with the time-ordered sequence of the IDs of the observed dyads.

# accessing the first values of the attribute "dyad" 
# (attribute available only for tie-oriented modeling)
head(attr(reh,"dyad"))
## [1]  2 13  2  2 17  2
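Going the other way, a sequence of dyad IDs can be translated back into actor names. A minimal sketch that inverts the loop order above with integer division; decode_dyad() is a hypothetical helper, and the input vector simply copies the six values printed above:

```r
# Hypothetical helper: recover actor and type IDs from dyad IDs,
# inverting the loop order (type outer, actor1, actor2 inner).
decode_dyad <- function(dyad, n_actors) {
  per_type <- n_actors * (n_actors - 1)
  type <- (dyad - 1) %/% per_type + 1   # event type ID
  r    <- (dyad - 1) %%  per_type       # 0-based position within the type
  a1   <- r %/% (n_actors - 1) + 1      # sender (actor1) ID
  k    <- r %%  (n_actors - 1) + 1      # k-th receiver, self-loop excluded
  a2   <- k + (k >= a1)                 # shift past actor1's own ID
  data.frame(actor1 = a1, actor2 = a2, type = type)
}

actor_names <- sort(c("Colton", "Francesca", "Kayla", "Lexy", "Richard"))
ids <- decode_dyad(c(2, 13, 2, 2, 17, 2), n_actors = length(actor_names))
observed <- data.frame(actor1 = actor_names[ids$actor1],
                       actor2 = actor_names[ids$actor2],
                       stringsAsFactors = FALSE)
observed  # the first five rows match randomREHsmall$edgelist[1:5, ]
```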

Visualizing the risk set

One way to visualize the composition of the risk set at each time point is to plot a grid with the actors’ names on both axes: senders on the y-axis and receivers on the x-axis.

Considering the first four time points of randomREHsmall, we observe the (directed) dyad (Colton,Kayla) at times \(t_1\), \(t_3\) and \(t_4\), and the (directed) dyad (Lexy,Colton) at time \(t_2\). The cell corresponding to the relational event that occurred at each time point is colored in green. The remaining cells in gray indicate the dyadic events that could have occurred and are part of the risk set. White cells indicate the events that could not occur (in this case the self-loops, like (Colton,Colton), where sender and receiver are the same actor).
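The color coding of the grid can be reproduced as a character matrix before any plotting. A minimal sketch for the first time point, where the labels "observed", "at risk" and "not possible" stand for the green, gray and white cells; riskset_grid() is a hypothetical helper, and the plotting step itself is left out:

```r
# Hypothetical helper: status of every directed dyad at one time point.
# Rows are senders (y-axis), columns are receivers (x-axis).
riskset_grid <- function(actors, sender, receiver) {
  grid <- matrix("at risk", nrow = length(actors), ncol = length(actors),
                 dimnames = list(sender = actors, receiver = actors))
  diag(grid) <- "not possible"           # self-loops (white cells)
  grid[sender, receiver] <- "observed"   # event at this time point (green cell)
  grid
}

actor_names <- sort(c("Colton", "Francesca", "Kayla", "Lexy", "Richard"))
grid_t1 <- riskset_grid(actor_names, "Colton", "Kayla")  # event at t_1
table(grid_t1)  # 19 at risk, 5 not possible, 1 observed
```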

In undirected networks, a full risk set has a particular grid visualization. The dyads at risk lie on the lower triangle of the grid, because the actor names c(actor1,actor2) describing each dyad in the input edgelist are sorted according to their alphanumeric order before being processed. For instance, the event at \(t_2\), c("Lexy","Colton"), is rearranged as c("Colton","Lexy"), and the risk set changes as shown in the picture below.
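The rearrangement of undirected events amounts to sorting the two actor names of each event, after which every unordered pair appears only once, giving \(N(N-1)/2\) dyads per event type. A minimal base R sketch; sort_pair() is a hypothetical helper, not part of remify:

```r
# Hypothetical helper: undirected events store the two actors in sorted order
sort_pair <- function(actor1, actor2) {
  pair <- sort(c(actor1, actor2))
  c(actor1 = pair[1], actor2 = pair[2])
}
sort_pair("Lexy", "Colton")  # actor1 "Colton", actor2 "Lexy"

# undirected full risk set: each unordered pair of sorted actors appears once
actor_names <- sort(c("Colton", "Francesca", "Kayla", "Lexy", "Richard"))
undirected_dyads <- t(combn(actor_names, 2))
colnames(undirected_dyads) <- c("actor1", "actor2")
nrow(undirected_dyads)  # 5 * 4 / 2 = 10
```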


The active and manual risk set

A full risk set is assumed to have a constant structure throughout the whole event history. All possible dyads are assumed to be always at risk, regardless of whether (i) one or more actors are still able to interact with the other actors during the observation period, and (ii) some event types can actually occur.

A risk set whose structure changes over time can better accommodate relational event histories in which actors, dyads or event types may not be observed within prespecified time windows. Two alternative definitions of the risk set can be declared with remify::remify():

  • the active risk set, which is reduced to only the dyadic events observed across the event history and maintains the same (reduced) structure over time
  • the manual risk set, which allows the user to specify a more flexible risk set with a time-varying structure, in which actors, dyads or event types can be excluded at specific time intervals of the event history.

The active risk set

Some relational event networks have a large number of actors, while the number of observed dyads is far lower than the number of potential dyads (i.e. the size \(D\) of the full risk set).

A measure of global density can be calculated over the whole event sequence as the ratio \(D_{\text{obs}}/D\), where \(D_{\text{obs}}\) is the number of distinct dyads observed at least once, which can vary between \(1\) and \(D\). When only a small fraction of the dyads is observed, we can restrict the risk set to the observed dyads. This reduction leads to the active risk set, which maintains the same structure over time but is restricted to the dyads observed at least once in the event history. This type of risk set is declared by specifying riskset = "active" in remify::remify().
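The density and the active dyads are simple summaries of the observed dyad IDs. A minimal sketch on the six IDs printed earlier by head(attr(reh, "dyad")); since these are only the first events of the sequence, the numbers illustrate the computation and are not the density of the full randomREHsmall history:

```r
# Dyad IDs of the first six events (copied from head(attr(reh, "dyad")))
dyad_seq <- c(2, 13, 2, 2, 17, 2)
D_full <- 5 * 4 * 1                      # full risk set size: N = 5, C = 1

active_dyads <- sort(unique(dyad_seq))   # dyads observed at least once
D_obs <- length(active_dyads)

active_dyads     # 2 13 17
D_obs / D_full   # global density 3/20 = 0.15
```

Applied to the whole sequence, active_dyads would list exactly the dyads that the active risk set, as defined above, keeps.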

The use of the active risk set can significantly decrease the computational time of both the calculation of statistics and the estimation of model parameters. However, reducing the risk set to the active (observed) dyads excludes dyadic events that perhaps should still be included in the risk set. It is always good practice to explore the set of active dyads and weigh this choice against the type of data at hand, for instance by: (i) anticipating potential biases coming from the definition of an active risk set, and (ii) considering a modification of the active risk set that keeps additional actors/dyads/event types in the risk set even if they were not observed in the event history.


The manual risk set

There are circumstances in which one or more actors cannot take part in a relational event, or an event type cannot be observed. This can happen within a time window of one of the following kinds:

  • the time window is embedded in the event history, e.g., some actors temporarily drop out of the network
  • the time window starts at a time point after the beginning of the event history and ends with the end of the history, e.g., when actors leave the network without the possibility of returning
  • the time window starts at the beginning of the event history and stops before its end, e.g., actors that join the network after the beginning of the event history and can only interact after they join.
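Each of the three kinds of time window can be written as an interval [start, stop] on the time scale of the event history, with an open end falling back to the first or last time point. A minimal sketch on the timestamps of the first five events of randomREHsmall, assuming they are read in UTC; in_window() and as_utc() are hypothetical helpers and the window bounds are made up for illustration:

```r
# Timestamps of the first five events of randomREHsmall (time zone assumed UTC)
as_utc <- function(x) as.POSIXct(x, tz = "UTC")
times <- as_utc(c("2020-03-05 16:36:37", "2020-03-05 19:34:11",
                  "2020-03-05 20:49:37", "2020-03-05 21:38:23",
                  "2020-03-06 06:54:12"))

# Hypothetical helper: TRUE where a time point falls inside [start, stop];
# the defaults are the first and last time points of the history
in_window <- function(time, start = min(time), stop = max(time)) {
  time >= start & time <= stop
}

# 1. embedded window (e.g. an actor temporarily drops out)
in_window(times, start = as_utc("2020-03-05 20:00:00"),
          stop = as_utc("2020-03-05 22:00:00"))   # FALSE FALSE TRUE TRUE FALSE
# 2. from a later start until the end (e.g. an actor leaves for good)
in_window(times, start = as_utc("2020-03-06 00:00:00"))  # FALSE FALSE FALSE FALSE TRUE
# 3. from the beginning until an earlier stop (e.g. an actor joins later)
in_window(times, stop = as_utc("2020-03-05 23:59:59"))   # TRUE TRUE TRUE TRUE FALSE
```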

To give a sense of some real scenarios in which actors/dyads/event types may be excluded from the risk set, we introduce three examples:

  • Example 1: when the relational event network is about in-person interactions (e.g., at the university or at school) and it is measured over days (or even weeks or months). One or more actors may not be present during one or more days, therefore we want to exclude such actors from the risk set for the specific time spans in which they could not interact. Furthermore, one or more actors may join (leave) the network after (before) the beginning (end) of the event history and this can also define specific restrictions on the risk set for such actors.

  • Example 2: when relational events are observed at a conference where multiple sessions or workshops can occur at the same time. In this case, the set of dyads at risk reduces to smaller, separate risk sets, each one based on the group of actors participating in a specific session or workshop (constraints on the risk set here respond to spatial constraints during a session or a workshop).

  • Example 3: when the relational events are digital interactions and one or more actors cannot interact with one another because they do not appear in each other’s friends list (which may be a requirement in order to interact).
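For Example 2, the risk set during a block of parallel sessions can be seen as the union of the within-session dyads. A minimal sketch with a hypothetical split of the five randomREHsmall actors into two sessions (the assignment is invented for illustration):

```r
# Hypothetical session assignment, for illustration only
sessions <- list(A = c("Colton", "Kayla", "Lexy"),
                 B = c("Francesca", "Richard"))

# directed dyads among the members of each session, then their union
within_session <- do.call(rbind, lapply(sessions, function(members) {
  grid <- expand.grid(actor1 = members, actor2 = members,
                      stringsAsFactors = FALSE)
  grid[grid$actor1 != grid$actor2, ]   # drop self-loops
}))

nrow(within_session)  # 3*2 + 2*1 = 8 dyads at risk instead of 20
```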

In such scenarios and in many others, a full risk set would include relational events that are not feasible, which may lead to biased estimates of the model parameters. In contrast, a manual risk set makes it possible to account for changes of the risk set over time.

A manual risk set consists of a time-based definition of the ensemble of dyads at risk where the user specifies which dyads to remove from the full risk set at a specific time interval of the study. This can be done via the omit_dyad argument of the function remify::remify(). The user can define multiple modifications of the full risk set occurring at different, or even overlapping, time windows. In each modification, the user specifies the set of actors, or dyads, or event types to be omitted.
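Conceptually, a manual risk set can be pictured as a logical matrix with one row per time point and one column per dyad. The sketch below builds such a matrix in base R for a scenario in which Richard and Francesca cannot interact during the first day. It only conveys the idea: the modifications list is a hypothetical structure of our own, not the format expected by the omit_dyad argument (see ?remify::remify for that), and the timestamps are assumed to be in UTC.

```r
# Sketch of a manual risk set: TRUE = dyad at risk at that time point.
# `modifications` is a hypothetical structure, NOT the omit_dyad format.
as_utc <- function(x) as.POSIXct(x, tz = "UTC")
times <- as_utc(c("2020-03-05 16:36:37", "2020-03-05 19:34:11",
                  "2020-03-05 20:49:37", "2020-03-05 21:38:23",
                  "2020-03-06 06:54:12"))

# directed dyads in the same order as dyad_mat (actor1 outer, actor2 inner)
actor_names <- sort(c("Colton", "Francesca", "Kayla", "Lexy", "Richard"))
dyads <- expand.grid(actor2 = actor_names, actor1 = actor_names,
                     stringsAsFactors = FALSE)[, c("actor1", "actor2")]
dyads <- dyads[dyads$actor1 != dyads$actor2, ]

# one modification: Francesca and Richard cannot interact on the first day
modifications <- list(
  list(start = min(times), stop = as_utc("2020-03-05 23:59:59"),
       actors = c("Francesca", "Richard"))
)

at_risk <- matrix(TRUE, nrow = length(times), ncol = nrow(dyads))
for (m in modifications) {
  rows <- times >= m$start & times <= m$stop          # time points in the window
  cols <- dyads$actor1 %in% m$actors |
    dyads$actor2 %in% m$actors                         # dyads involving those actors
  at_risk[rows, cols] <- FALSE
}
rowSums(at_risk)  # 6 6 6 6 20
```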

Consider the first four time points of the small random network and assume this time that actors "Richard" and "Francesca" didn’t join the study until the second day of the study. This means that the risk set for at least the first four time points will have the following composition,