```
library(anovir)
#> Loading required package: bbmle
#> Loading required package: stats4
```

A pathogen's virulence is a key parameter in the mathematical models on which most epidemiological theory is based. In these models virulence is generally defined as the increased per capita rate of mortality of infected hosts due to infection [1,2]. This package allows this rate of mortality to be estimated from the relative survival of hosts in experimentally-infected vs. uninfected treatments over time.

The analysis of relative survival is frequently encountered in the medical literature, where it is the method of choice for estimating how survival in a population of patients is affected by a specific factor [3–6]. This factor can be a disease or illness, e.g., a particular type of cancer, or a specific event, such as breaking a hip bone. This approach compares the observed survival in the target population of patients against their expected survival had they not experienced the factor in question; the difference between them is attributable to the factor involved.

The following sections briefly outline the rationale for this approach when it is applied to comparing survival in matching populations of hosts that differ only in whether or not they were experimentally exposed to infection. For a more detailed version, see [7].

The mathematical models on which most epidemiological theory is based are population dynamics models describing the flow of hosts among different compartments, or subpopulations, of the host population as a whole [1,2]. For example, the following model has two compartments, or subpopulations, for uninfected (*X*) and infected (*Y*) hosts,

where all hosts are of the same age and the respective rates of change in the size of each subpopulation at time *t*, *dX/dt, dY/dt*, are determined by the value of various parameters at time *t*;

*b*(*t*), the birth rate of uninfected hosts,

\(\beta(t)\), the probability infection is transmitted when infected and uninfected hosts come into contact,

\(\mu(t)\), the 'natural' or background rate of mortality of uninfected hosts, and

\(\nu(t)\), **the pathogen's virulence**, which is the additional rate of mortality of infected hosts due to infection.
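One common way to write a two-compartment model with these four parameters is sketched below. It assumes that births are per capita and enter the uninfected class only, and that transmission follows mass action, so new infections arise at rate \(\beta(t)X(t)Y(t)\). Both are illustrative assumptions rather than details stated in the text.

\[\begin{align} \frac{dX}{dt} &= \left[ b(t) - \mu(t) \right] X(t) - \beta(t) X(t) Y(t) \\ \frac{dY}{dt} &= \beta(t) X(t) Y(t) - \left[ \mu(t) + \nu(t) \right] Y(t) \\ \end{align}\]

In this form each infected host creates new infections at rate \(\beta(t)X(t)\) and is lost from the infected subpopulation at rate \(\mu(t) + \nu(t)\).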

From models of this type, the pathogen's basic reproduction number, \(R_0\), can be expressed as,

\[R_0 = \frac{\beta X}{\mu + \nu}\]where the number of secondary infections a single infected host is expected to create when introduced into a susceptible population of uninfected hosts is proportional to how well the disease transmits (\(\beta\)) and the size of the population into which it is introduced (*X*), multiplied by the average longevity of an infected host, 1/(\(\mu+\nu\)).

The type of model described above can also be used to describe the observed dynamics in experimental populations of infected and uninfected hosts. Furthermore, experiments can be designed such that the only population dynamics to be observed will be those due to host mortality,

\[\begin{align} -\frac{dX / dt}{X(t)} &= \mu(t) \\ \\ -\frac{dY / dt}{Y(t)} &= \mu(t) + \nu(t) \\ \end{align}\]where the per capita rate of decrease in the size of the uninfected population at time *t*, -(*dX/dt*)/*X*(*t*), is due only to background mortality at time *t*, and that for the infected population, -(*dY/dt*)/*Y*(*t*), is due to the sum of both the background rate of mortality and mortality due to infection.

These conditions can be achieved, for example, by using only juvenile hosts to avoid any dynamics due to births and by housing infected and uninfected hosts separately to eliminate any dynamics due to the transmission of disease.
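As a minimal sketch of these mortality-only dynamics, assume both rates are constant over time. The values of `mu`, `nu`, `X0` and `Y0` below are arbitrary illustrative choices, not values from the package. Under that assumption each population declines exponentially, and the per capita rates of decrease recover \(\mu\) and \(\mu + \nu\).

```r
# Illustrative constant rates (assumed values, not taken from the package)
mu <- 0.05   # background rate of mortality
nu <- 0.10   # virulence: additional rate of mortality due to infection

t  <- seq(0, 30, by = 0.01)
X0 <- 1000   # initial number of uninfected hosts
Y0 <- 1000   # initial number of infected hosts

# Solutions of dX/dt = -mu * X and dY/dt = -(mu + nu) * Y
X <- X0 * exp(-mu * t)
Y <- Y0 * exp(-(mu + nu) * t)

# Per capita rate of decrease, -(dN/dt)/N = -d log(N)/dt,
# approximated by finite differences of log(N)
rate_X <- -diff(log(X)) / diff(t)
rate_Y <- -diff(log(Y)) / diff(t)

# Virulence is the difference between the two per capita rates
nu_hat <- mean(rate_Y) - mean(rate_X)
```

Here `mean(rate_X)` approximates `mu` and `nu_hat` approximates `nu`. With real data the rates need not be constant, which is why the survival functions described next are useful.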

The analysis of relative survival can be used to estimate these two rates of mortality from the type of data routinely generated in experiments comparing survival in cohorts of experimentally-infected vs. uninfected hosts. How this is done is described below, following a brief description of some survival functions and how they are related.

The probability that an individual alive in a particular population or treatment at the beginning of an experiment, *t*_{0}, will still be alive at time *t* can be expressed as,

\[S(t) = 1 - F(t)\]where *S*(*t*) is the cumulative survival function in continuous time at time *t*. It is the complement of the cumulative distribution function, *F*(*t*), which gives the probability the individual will have died by time *t*; 0 ≤ *S*(*t*), *F*(*t*) ≤ 1.

Differentiating *F*(*t*) with respect to time gives the rate at which mortality reduces the size of the population at time *t*,

\[f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}\]where *f*(*t*) is the probability density function for mortality. It corresponds with data collected for the number of individuals dying at time *t*, divided by the initial size of the population.

Whereas *f*(*t*) represents the probability an individual alive at *t*_{0} will die at time *t*, the hazard function, *h*(*t*), represents the probability an individual alive at time *t* will die at time *t*,

\[h(t) = \frac{f(t)}{S(t)}\]where the probability of dying at time *t*, *f*(*t*), is corrected by the probability of being alive at time *t*, *S*(*t*). This is the rate of mortality in the population at time *t* and represents the per capita risk of dying at time *t*.

If the hazard function *h*(*t*) represents the risk an individual alive at time *t* will die at time *t*, the cumulative hazard function, *H*(*t*), represents the individual's accumulated exposure to the risk of dying up to time *t*. It is related to the cumulative survival function, *S*(*t*), as,

\[H(t) = -\log S(t)\]and can take values greater than one.
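The relationships among these functions can be checked numerically. The sketch below assumes time to death follows a Weibull distribution with arbitrary illustrative parameters. It uses only base R distribution functions and is not the package's own implementation.

```r
# Assumed Weibull distribution for time to death (illustrative parameters)
shape <- 2
scale <- 10
t <- c(1, 5, 10, 20)

F_t <- pweibull(t, shape, scale)                      # F(t), probability of death by t
S_t <- pweibull(t, shape, scale, lower.tail = FALSE)  # S(t) = 1 - F(t)
f_t <- dweibull(t, shape, scale)                      # f(t) = dF(t)/dt
h_t <- f_t / S_t                                      # h(t) = f(t) / S(t)
H_t <- -log(S_t)                                      # H(t) = -log S(t)

# For a Weibull distribution, H(t) = (t / scale)^shape
all.equal(H_t, (t / scale)^shape)

# H(t) is not a probability: it exceeds one once t > scale
H_t
```

The hazard here increases with time because `shape` is greater than one. A shape of exactly one would give the constant hazard of an exponential distribution.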

Analyses testing for the effects of a pathogen on the survival of infected vs. uninfected hosts usually involve the estimation and comparison of one of the expressions above. What makes the analysis of relative survival different is how it treats the survival of infected hosts.

The analysis of relative survival assumes individuals in the target population are exposed to two independent and mutually exclusive sources of mortality;

*1. Background or 'natural' mortality*. This is the mortality individuals in the target population would be expected to experience had they not been afflicted by the disease or illness in question, and,

*2. Mortality due to disease or illness*. This is mortality individuals in the target population experience due to the disease or illness in question.

In the expressions below, and throughout the package, the index '1' will be used to indicate the effect of background mortality and the index '2' to indicate the effect of mortality due to infection.

When infected hosts die, it is not possible to tell whether they died due to background mortality or due to infection. However, to remain alive means the host has not died due to the cumulative effects of background mortality, *F*_{1}(*t*), or the cumulative effects of mortality due to infection, *F*_{2}(*t*). As these two sources of mortality are independent, the probability an infected host will be observed surviving until time *t*, *S*_{OBS.INF}(*t*), is,

\[S_{OBS.INF}(t) = \left[ 1 - F_1(t) \right] \left[ 1 - F_2(t) \right] = S_1(t) \, S_2(t)\]where *S*_{1}(*t*) and *S*_{2}(*t*) are the cumulative survival functions for background mortality and mortality due to infection at time *t*, respectively.

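The relative survival of infected hosts at time *t*, *S*_{REL}(*t*), is their observed survival relative to their expected survival had they not been infected, *S*_{1}(*t*),

\[S_{REL}(t) = \frac{S_{OBS.INF}(t)}{S_1(t)} = \frac{S_1(t) \, S_2(t)}{S_1(t)} = S_2(t)\]which equals the expected survival of infected hosts due only to the effects of infection, *S*_{2}(*t*). That is, relative survival is the observed survival of infected hosts corrected for background mortality.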

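Differentiating *S*_{OBS.INF}(*t*) with respect to time gives the probability density function for the observed mortality of infected hosts, *f*_{OBS.INF}(*t*),

\[f_{OBS.INF}(t) = -\frac{dS_{OBS.INF}(t)}{dt} = f_1(t) \, S_2(t) + f_2(t) \, S_1(t)\]where *f*_{1}(*t*) and *f*_{2}(*t*) are the probability density functions for the probability an individual alive at time *t*_{0} will die at time *t* due to background mortality or mortality due to infection, respectively.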

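Dividing *f*_{OBS.INF}(*t*) by *S*_{OBS.INF}(*t*) gives the hazard function for the observed mortality of infected hosts, *h*_{OBS.INF}(*t*),

\[h_{OBS.INF}(t) = \frac{f_1(t) \, S_2(t) + f_2(t) \, S_1(t)}{S_1(t) \, S_2(t)} = \frac{f_1(t)}{S_1(t)} + \frac{f_2(t)}{S_2(t)} = h_1(t) + h_2(t)\]where at time *t* the observed rate of mortality in the infected population is the sum of the background rate of mortality, *h*_{1}(*t*), plus the rate of mortality due to infection, *h*_{2}(*t*).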
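A short numerical check of these identities is sketched below. It assumes both sources of mortality follow Weibull distributions, with arbitrary illustrative parameters, and uses only base R functions.

```r
# Assumed Weibull distributions for the two sources of mortality
# (illustrative parameters only)
t <- seq(0.5, 15, by = 0.5)

S1 <- pweibull(t, shape = 2, scale = 20, lower.tail = FALSE)  # background mortality
S2 <- pweibull(t, shape = 3, scale = 10, lower.tail = FALSE)  # mortality due to infection
f1 <- dweibull(t, shape = 2, scale = 20)
f2 <- dweibull(t, shape = 3, scale = 10)

# Observed survival of infected hosts: both independent sources must be survived
S_obs_inf <- S1 * S2

# Relative survival: observed survival corrected for background mortality
S_rel <- S_obs_inf / S1

# Observed density and hazard for infected hosts
f_obs_inf <- f1 * S2 + f2 * S1
h_obs_inf <- f_obs_inf / S_obs_inf

# Hazards for each source of mortality
h1 <- f1 / S1
h2 <- f2 / S2

all.equal(S_rel, S2)            # relative survival equals S2(t)
all.equal(h_obs_inf, h1 + h2)   # observed hazard is the sum of the two hazards
```

In an experiment *S*_{1}(*t*) is not known exactly; it is estimated from the survival of the matching uninfected treatment.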

The expressions for -(*dY/dt*)/*Y*(*t*) and *h*_{OBS.INF}(*t*) are equivalent,

\[-\frac{dY / dt}{Y(t)} = \mu(t) + \nu(t) = h_1(t) + h_2(t) = h_{OBS.INF}(t)\]where the analysis of relative survival can be used to estimate *h*_{1}(*t*) and *h*_{2}(*t*), that is, the background rate of mortality and virulence as it is defined in the mathematical models on which most epidemiological theory is based.
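To illustrate how both hazards might be estimated from survival data, the sketch below simulates times of death and fits the model by maximum likelihood with base R's `optim()`. It is a hedged illustration, not the package's own routine. It assumes Weibull distributions for both sources of mortality, no censoring, and arbitrary 'true' parameter values; the helper names `haz`, `logS` and `nll` are hypothetical.

```r
set.seed(1)

# Assumed 'true' Weibull parameters (illustrative values only)
true <- c(shape1 = 2, scale1 = 20, shape2 = 3, scale2 = 10)
n <- 2000  # hosts per treatment

# Uninfected hosts die from background mortality only
t_uninf <- rweibull(n, true["shape1"], true["scale1"])

# Infected hosts die from whichever source of mortality acts first
t_inf <- pmin(rweibull(n, true["shape1"], true["scale1"]),
              rweibull(n, true["shape2"], true["scale2"]))

# Weibull hazard and log-survival functions
haz  <- function(t, shape, scale) (shape / scale) * (t / scale)^(shape - 1)
logS <- function(t, shape, scale) -(t / scale)^shape

# Negative log-likelihood; parameters are on the log scale to keep them positive
nll <- function(log_par) {
  p <- exp(log_par)
  # Uninfected: log f1(t) = log h1(t) + log S1(t)
  ll_uninf <- log(haz(t_uninf, p[1], p[2])) + logS(t_uninf, p[1], p[2])
  # Infected: log f_obs(t) = log[h1(t) + h2(t)] + log S1(t) + log S2(t)
  ll_inf <- log(haz(t_inf, p[1], p[2]) + haz(t_inf, p[3], p[4])) +
    logS(t_inf, p[1], p[2]) + logS(t_inf, p[3], p[4])
  -(sum(ll_uninf) + sum(ll_inf))
}

# Nelder-Mead search from a deliberately rough starting point
fit <- optim(log(c(1, 10, 1, 10)), nll, control = list(maxit = 5000))
est <- setNames(exp(fit$par), names(true))

# Compare assumed and estimated parameters
round(rbind(true, est), 2)
```

The estimated `shape2` and `scale2` describe *h*_{2}(*t*), the pathogen's virulence, even though the cause of death of individual infected hosts is never observed.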

In the empirical scenario described above, it was assumed that:

all of the hosts in the infected population were infected, and

they experienced equally virulent infections, that is, a single hazard function could describe the rate of mortality due to infection for each member of the population.

It was also assumed that hosts could not recover from infection.

Each of these assumptions can be relaxed and relative survival models adapted to allow for incomplete infection success, for virulence to vary among infected hosts, and for hosts to recover from infection.

For a more detailed text, with worked examples, see [7].

1. Anderson RM, May RM. 1979 Population biology of infectious-diseases. 1. *Nature* **280**, 361–367. (doi:10.1038/280361a0)

2. May RM, Anderson RM. 1979 Population biology of infectious-diseases. 2. *Nature* **280**, 455–461. (doi:10.1038/280455a0)

3. Dickman PW, Sloggett A, Hills M, Hakulinen T. 2004 Regression models for relative survival. *Statistics in Medicine* **23**, 51–64. (doi:10.1002/sim.1597)

4. Ederer F, Axtell LM, Cutler SJ. 1961 The relative survival rate: a statistical methodology. *Natl Cancer Inst Monogr* **6**, 101–121.

5. Esteve J, Benhamou E, Croasdale M, Raymond L. 1990 Relative survival and the estimation of net survival - elements for further discussion. *Statistics in Medicine* **9**, 529–538. (doi:10.1002/sim.4780090506)

6. Monson RR. 1974 Analysis of relative survival and proportional mortality. *Computers and Biomedical Research* **7**, 325–332. (doi:10.1016/0010-4809(74)90010-x)

7. Agnew P. 2019 Estimating virulence from relative survival. *bioRxiv* (doi:10.1101/530709)