`Rpadrino` provides an interface to the PADRINO database. PADRINO houses metadata on published Integral Projection Models, and all the information needed to rebuild them. `Rpadrino` provides a set of functions that wrap around `ipmr` so that you can rebuild these models using *R*. Additionally, there’s some data downloading and management functionality, as well as tools to help report and cite studies used in an analysis.

Below are some very brief examples of how to select data based on the Metadata table, rebuild those models, and conduct an analysis. The first step for any of this is to access PADRINO. For first-time users, this will always mean downloading it. For returning users, you may save and re-load PADRINO, but this example will not make use of the saving/loading functionality provided in `pdb_load()` and `pdb_save()`.

```
library(Rpadrino)

pdb <- pdb_download(save = FALSE)
```

We’ve now downloaded the PADRINO database. It consists of 10 tables, all linked by `ipm_id`.

We can get a brief overview of the information using the `print` method:

`pdb`

```
## A 'pdb' object with 56 unique species, 40 publications, and 280 models.
## Please cite all publications used in an analysis! These can be accessed with
## 'pdb_citations()'.
##
## The following models have continuously varying environments:
## aaaa15, aaaa16, aaaa21, aaaa54, aaaa59
## These can take longer to re-build - adjust your expectations accordingly!
```

We see that some of the IPMs have continuously varying environments. We don’t want to use those for now because they take a while to run. We can use the `pdb_subset()` function to get only deterministic models. Because of the way PADRINO is structured, `pdb_subset()` only accepts a set of `ipm_id`s that we want to **keep** (as opposed to other subset functions, which may accept some arbitrary logic). Thus, we need to create an index of those, which we can pass to `pdb_subset()` like so:

```
sub_ind <- setdiff(pdb$Metadata$ipm_id,
                   c("aaaa15", "aaaa16", "aaaa21", "aaaa54", "aaaa59"))

det_pdb <- pdb_subset(pdb, ipm_ids = sub_ind)
```
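The keep-list pattern can be tried without downloading anything. The sketch below uses a made-up data frame (`toy_metadata`, a hypothetical stand-in for `pdb$Metadata`) and invented IDs; only `setdiff()` itself is the same as in the real workflow.

```r
# Hypothetical stand-in for pdb$Metadata (not real PADRINO records)
toy_metadata <- data.frame(
  ipm_id = c("id_01", "id_02", "id_03", "id_04"),
  stringsAsFactors = FALSE
)

# IDs we want to drop
drop_ids <- c("id_02", "id_04")

# setdiff() keeps the elements of its first argument that are absent from
# the second, which turns a "drop" list into the "keep" list we need
keep_ids <- setdiff(toy_metadata$ipm_id, drop_ids)

keep_ids
```

Building the index as a "keep" vector makes it easy to chain several exclusion criteria before subsetting once.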

Great! Say we want to find all the *Asteraceae* in PADRINO. We can first check and see which `ipm_id`s correspond to those by querying the `tax_family` column of `pdb$Metadata`.

```
aster_ind <- det_pdb$Metadata$ipm_id[det_pdb$Metadata$tax_family == "Asteraceae"]

aster_pdb <- pdb_subset(pdb, ipm_ids = aster_ind)
```

Next, we can rebuild their IPMs. This is a two-step process in `Rpadrino`. The first step is to create `proto_ipm`s with `pdb_make_proto_ipm()`. This function takes a database object and, optionally, a subset of `ipm_id`s, and constructs a list of `proto_ipm`s. `proto_ipm`s are a common data structure used to represent IPM objects before they are actually constructed. One advantage of this extra step is that you can combine `proto_ipm`s generated from your own data using `ipmr` with `proto_ipm`s generated from PADRINO, so you can augment syntheses with your own data. An example of that is provided in the *Other Data Sources* vignette.

The simplest way to make `proto_ipm`s for a whole `pdb` object is:

`proto_list <- pdb_make_proto_ipm(aster_pdb)`

```
## 'ipm_id' aaa326 has the following notes that require your attention:
## aaa326: 'Demographic data from Metcalf Funct Ecol 2006'
```

```
## 'ipm_id' aaa329 has the following notes that require your attention:
## aaa329: 'Based on IPM from Rose Ecology 2005; The GPS coordinates were approximated
## to the closest geographic location described in the reference'
```

We see that there are some notes that require our attention. The first tells us that this IPM is using data from another publication. The second warns us that GPS coordinates aren’t exact, so we should be cautious if we wanted to merge the outputs from this analysis with other spatially referenced datasets (e.g. gridded climate data). We can inspect the IPM structure by printing each `proto_ipm` object:

`proto_list`

```
## This list of 'proto_ipm's contains the following species:
## Cirsium arvense
## Cirsium canescens
##
## You can inspect each model by printing it individually.
```

Next, we can reconstruct the IPMs using `pdb_make_ipm()`. There are a variety of different options we can specify here, and we’ll get into how that works in the next example.

```
cirsiums <- pdb_make_ipm(proto_list)

lambdas <- lambda(cirsiums)

cirsiums
```

```
## $aaa326
## A simple, density independent, deterministic IPM with 2 sub-kernel(s) defined.
## Deterministic lambda = 1
## $aaa329
## A simple, density dependent, deterministic IPM with 1 sub-kernel(s) defined.
## Lambda for the final time step of the model is: 1.146
## Call lambda(x, type_lambda = "all") for deterministic lambdas
## from each iteration.
## attr(,"class")
## [1] "pdb_ipm" "list"
```

We’ve rebuilt our first set of IPMs! Next, we’ll explore how to extract a bit more information from these.

Often, we’ll want more than just the deterministic population growth rates. We’ll explore how to run some more complicated analyses here. Specifically, we’ll compute the probability of surviving to age \(a\), \(l_a(z_0)\), and the average per-capita fecundity at age \(a\), \(f_a(z_0)\). After that, we’ll compute the mean and variance of lifespan (\(\bar\eta(z_0)\) and \(\sigma_\eta^2(z_0)\), respectively). Along the way, we’ll learn more about the structure of IPM objects in `Rpadrino` and `ipmr`.

We’re going to use a subset of PADRINO IPMs to illustrate how to implement these calculations. We’ll select simple IPMs that are density-independent, deterministic, and from North America, so that things run a bit faster.

```
# This creates a table of the number of state variables per ipm_id. Since
# simple IPMs, by definition, have only 1 state variable, we can use the names
# of the vector that it returns to choose only those.
n_state_vars   <- table(pdb$StateVariables$ipm_id)
simple_mod_ind <- names(n_state_vars)[n_state_vars == 1]

# Next, we want to capture any models that have stochastic dynamics, multiple
# values per parameter, or density dependence, or that are not from
# North America.
rm_mod_ind <- unique(c(pdb$EnvironmentalVariables$ipm_id,         # Stochastic models
                       pdb$ParSetIndices$ipm_id,                  # Multiple parameter values
                       pdb$Metadata$ipm_id[pdb$Metadata$has_dd],  # Density dependent
                       pdb$Metadata$ipm_id[pdb$Metadata$continent != "n_america"]))

# Remove these IDs from our simple_mod_ind vector
simple_mod_ind <- simple_mod_ind[!simple_mod_ind %in% rm_mod_ind]

# We're also going to remove monocarpic perennials, as their survival/growth
# kernels are slightly trickier to work with (note that there is an example
# of working with these in the manuscript/si/case_study_1.pdf file in the
# PADRINO GitHub repository).
monocarps <- pdb$Metadata$ipm_id[pdb$Metadata$organism_type == "biennial"]

simple_mod_ind <- simple_mod_ind[!simple_mod_ind %in% monocarps]

# Finally, we'll subset the database and build the IPM objects!
simple_pdb <- pdb_subset(pdb, simple_mod_ind) %>%
  pdb_make_proto_ipm() %>%
  pdb_make_ipm()
```

```
## 'ipm_id' aaa310 has the following notes that require your attention:
## aaa310: 'Geo and time info retrieved from COMPADRE (v.X.X.X.4)'
```

```
## 'ipm_id' aaa385 has the following notes that require your attention:
## aaa385: 'Same data as AAA385. State variable Height (Cm)'
```

```
## 'ipm_id' aaa388 has the following notes that require your attention:
## aaa388: 'Same data as AAA388. State variable Height (Cm)'
```

```
## 'ipm_id' dddd30 has the following notes that require your attention:
## dddd30: 'Frankenstein IPM'
```

```
## 'ipm_id' dddd31 has the following notes that require your attention:
## dddd31: 'Frankenstein IPM'
```

```
## 'ipm_id' dddd32 has the following notes that require your attention:
## dddd32: 'Frankenstein IPM'
```

We now have 28 distinct `ipm_id`s to work with!
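The counting-and-filtering logic used to pick simple models can be sketched on a toy table. Everything below (`toy_state_vars`, the IDs, and `rm_ids`) is hypothetical and only mimics the shape of `pdb$StateVariables`, which is assumed here to hold one row per state variable.

```r
# Hypothetical stand-in for pdb$StateVariables: one row per state variable
toy_state_vars <- data.frame(
  ipm_id         = c("id_01", "id_02", "id_02", "id_03"),
  state_variable = c("size", "size", "age", "height"),
  stringsAsFactors = FALSE
)

# table() counts rows per ipm_id; its names are the IDs themselves
n_state_vars <- table(toy_state_vars$ipm_id)

# Keep IDs with exactly one state variable ("simple" models)
simple_ids <- names(n_state_vars)[n_state_vars == 1]

# Hypothetical IDs flagged for removal by some other criterion
rm_ids <- c("id_03")

simple_ids <- simple_ids[!simple_ids %in% rm_ids]

simple_ids
```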

Age-specific calculations are straightforward once we’ve extracted sub-kernels. We can define functions that accept a single IPM object and then `lapply()` them to do our computations. We’ll start with \(l_a(z_0)\) and \(f_a(z_0)\). These are defined by the following equations:

\(l_a(z_0) = eP^a\), and

\(f_a(z_0) = (eFP^a)/l_a\).

\(P\) and \(F\) are survival/growth and fecundity kernels, respectively. \(e\) is a constant function \(e(z) \equiv 1\). Left multiplication with this function has the effect of summing columns. \(a\) is the age we wish to do the calculations for.

```
l_a <- function(ipm, a) {

  P <- ipm$sub_kernels$P

  # %^% is a function from ipmr that raises matrices to a power, rather than
  # a pointwise power that ^ does.
  P_a <- P %^% a

  colSums(P_a)
}

f_a <- function(ipm, a) {

  P <- ipm$sub_kernels$P
  F <- ipm$sub_kernels$F

  l_age <- l_a(ipm, a)

  P_a <- P %^% a

  colSums(F %*% P_a) / l_age
}
```
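To see what these two quantities look like numerically, here is a minimal base R sketch on made-up 2 x 2 kernels. The matrices `P_toy` and `F_toy` are hypothetical, and `mat_power()` is a small stand-in written for this example so that it runs without `ipmr`; it is not part of either package.

```r
# Base R stand-in for ipmr's %^%: raises a square matrix to the power a
# by repeated matrix multiplication (a = 0 returns the identity matrix)
mat_power <- function(M, a) {
  out <- diag(nrow(M))
  for (i in seq_len(a)) out <- M %*% out
  out
}

# Hypothetical kernels. Column j describes the fate of individuals that
# start in class j (matrix() fills column by column).
P_toy <- matrix(c(0.5, 0.2,    # from class 1: stay, or grow to class 2
                  0.0, 0.6),   # from class 2: stay in class 2
                nrow = 2)
F_toy <- matrix(c(0.0, 0.0,    # class 1 does not reproduce
                  1.5, 0.0),   # class 2 produces 1.5 recruits into class 1
                nrow = 2)

a <- 2

P_a <- mat_power(P_toy, a)

# l_a: probability of surviving to age a, by starting class
l_2 <- colSums(P_a)

# f_a: expected fecundity at age a, conditional on surviving to age a
f_2 <- colSums(F_toy %*% P_a) / l_2

l_2
f_2
```

Dividing by `l_2` is what makes the fecundity conditional on survival; without it, the result would be the unconditional expected reproduction at age `a`.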

Now, we just need to apply our functions to the IPMs. We’ll compute survival and fecundities for 5 year olds, and then plot the results:

```
l_as <- lapply(simple_pdb,
               function(x, a) l_a(x, a),
               a = 5)

f_as <- lapply(simple_pdb,
               function(x, a) f_a(x, a),
               a = 5)

# This only plots the figures for the first two species.
# Remove the [1:2] to see all of them.
# Uncomment the par(mfrow = c(...)) line to get an arrangement you like
# par(mfrow = c(2, 2))
for(i in seq_along(l_as)[1:2]) {

  nm <- pdb$Metadata$species_accepted[pdb$Metadata$ipm_id == names(l_as)[i]]

  plot(l_as[[i]], type = "l",
       # ylim = c(0, 1),
       main = paste0(nm,": Probability of survival to age 5"),
       xlab = expression(paste("Initial size z"[0])),
       ylab = "Pr(s)")

  plot(f_as[[i]], type = "l",
       # ylim = c(0, 1),
       main = paste0(nm,": Expected Fecundity at age 5 (given survival)"),
       xlab = expression(paste("Initial size z"[0])),
       ylab = "E[f]")
}
```

We can also generate survivorship curves for each one to investigate type I/II/III species. These require simulating cohorts for a number of years using the \(P\) and \(F\) kernels. Because this can take some time to run, we’ll only do it for a single model for each species in our data set.

```
keep_ind <- pdb_subset(pdb, simple_mod_ind) %>%
  .$Metadata %>%
  .[!duplicated(.$species_accepted), "ipm_id"]

use_ipms <- simple_pdb[keep_ind]

n_yrs <- 10

init_pops <- right_ev(use_ipms)
```

```
## 'x' did not converge to asymptotic dynamics after 51 iterations.
## Will re-iterate the model 100 times and check for convergence.
```

`## model is now converged :)`

```
## 'x' did not converge to asymptotic dynamics after 51 iterations.
## Will re-iterate the model 100 times and check for convergence.
```

`## model is now converged :)`

```
## 'x' did not converge to asymptotic dynamics after 51 iterations.
## Will re-iterate the model 100 times and check for convergence.
```

`## model is now converged :)`

```
## 'x' did not converge to asymptotic dynamics after 51 iterations.
## Will re-iterate the model 100 times and check for convergence.
```

`## model is now converged :)`

```
## 'x' did not converge to asymptotic dynamics after 51 iterations.
## Will re-iterate the model 100 times and check for convergence.
```

`## model is now converged :)`

```
# As above, remove the [1:2] to see all plots and use par() to control their
# arrangement
# par(mfrow = c(3, 2))
for(i in seq_along(init_pops)[1:2]) {

  f_age <- l_age <- numeric(n_yrs)

  P_a <- diag(nrow(use_ipms[[i]]$sub_kernels[[1]]))

  for(j in seq_len(n_yrs)) {

    P_now <- use_ipms[[i]]$sub_kernels$P
    F_now <- use_ipms[[i]]$sub_kernels$F

    l_age[j] <- sum(colSums(P_a) * init_pops[[i]][[1]])
    f_age[j] <- sum(colSums(F_now %*% P_a) * init_pops[[i]][[1]])

    P_a <- P_now %*% P_a
  }

  f_age <- f_age / l_age

  nm <- pdb$Metadata$species_accepted[pdb$Metadata$ipm_id == names(init_pops)[i]]

  plot(l_age, type = "l",
       ylim = c(0, 1),
       main = paste0(nm, ": Probability of survival"),
       xlab = "Age",
       ylab = "Pr(s)")

  plot(f_age, type = "l",
       # ylim = c(0, 1),
       main = paste0(nm, ": Average Fecundity"),
       xlab = "Age",
       ylab = "E[f]")
}
```

\(\bar\eta(z_0)\) is given by the following equation: \(\bar\eta(z_0) = eN\), where \(N\) is the fundamental operator. This can be thought of as the expected amount of time an individual with initial state \(z_0\) spends in state \(z'\) before death. It is computed as \((I - P)^{-1}\), where \(I\) is an identity operator (in practice, it is an identity matrix with dimension equal to \(P\)), and \(P\) is the survival/growth kernel of the IPM. Thus, all we need are the \(P\) kernels from each model.

```
make_N <- function(ipm) {

  P <- ipm$sub_kernels$P
  I <- diag(nrow(P))
  N <- solve(I - P)

  return(N)
}

eta_bar_z0 <- function(ipm) {

  N <- make_N(ipm)

  return(colSums(N))
}

mean_lifespan <- lapply(use_ipms, eta_bar_z0)
```
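A small numerical check may help build intuition for \(N\). The sketch below uses a hypothetical 2 x 2 survival/growth matrix (`P_toy`, not from PADRINO) and compares `solve(I - P)` with a truncated version of the series \(I + P + P^2 + \dots\), which sums the expected time spent in each state over all ages.

```r
# Hypothetical survival/growth matrix; column j is the fate of class j
P_toy <- matrix(c(0.5, 0.2,
                  0.0, 0.6),
                nrow = 2)

I <- diag(nrow(P_toy))

# Fundamental operator computed directly
N_direct <- solve(I - P_toy)

# The same quantity approximated by summing I + P + P^2 + ... + P^200
N_series <- I
P_pow    <- I
for (k in seq_len(200)) {
  P_pow    <- P_toy %*% P_pow
  N_series <- N_series + P_pow
}

# Mean lifespan by starting class is the column sum of N
eta_bar_toy <- colSums(N_direct)

eta_bar_toy
max(abs(N_direct - N_series))
```

The series only converges when individuals eventually die, that is, when repeated application of `P_toy` shrinks towards zero. That is worth keeping in mind when a result looks implausible.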

The formula for the variance in lifespan is only a bit more complicated. It is given by \(\sigma_\eta^2(z_0) = e(2N^2 - N) - (eN)^2\). For this, we also need to use `ipmr`’s `%^%` operator to ensure we correctly exponentiate the first term, and use the regular `^` to exponentiate the second term. Calculating \(N\) is the same as before. We’ll compute both the variance and the standard deviation of lifespan (the latter for plotting).
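As a quick sanity check on this formula, consider the simplest possible case: a single state with constant annual survival `p` (a hypothetical value, chosen only for illustration). Lifespan is then geometric, with mean \(1/(1-p)\) and variance \(p/(1-p)^2\), and the matrix expressions should reproduce both. The sketch uses `N %*% N` in place of `%^%` so that it runs in base R.

```r
# Hypothetical constant annual survival probability (a 1 x 1 "kernel")
p <- 0.8

P_toy <- matrix(p)
N     <- solve(diag(1) - P_toy)

# Matrix formulas for mean and variance of lifespan
mean_ls <- colSums(N)
var_ls  <- colSums(2 * N %*% N - N) - colSums(N)^2

# Closed-form geometric results for comparison
closed_mean <- 1 / (1 - p)
closed_var  <- p / (1 - p)^2

c(matrix = mean_ls, closed_form = closed_mean)
c(matrix = var_ls,  closed_form = closed_var)
```

The function defined next applies the same expression column-wise to a full kernel.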

```
sigma_eta_z0 <- function(ipm) {

  N <- make_N(ipm)

  out <- colSums(2 * N %^% 2 - N) - (colSums(N)) ^ 2

  return(out)
}

var_lifespan <- lapply(use_ipms, sigma_eta_z0)

sd_lifespan <- lapply(var_lifespan, sqrt)
```

Warning? Huh? Let’s see what’s going on.

`vapply(mean_lifespan, range, numeric(2L))`

```
## aaaa34 aaaa36 aaa310 aaa341 aaa351 aaa385 ddddd5 dddd10
## [1,] 1.810595 1.590430 6.36749 -348825.70 1.045242 1.005169 2.106308 1.000000
## [2,] 3.739900 5.874642 13.68391 -30555.21 2.023906 3.631503 4.638129 1.626605
## dddd30
## [1,] 1.299821
## [2,] 1.537262
```

`vapply(var_lifespan, range, numeric(2L))`

```
## aaaa34 aaaa36 aaa310 aaa341 aaa351 aaa385 ddddd5
## [1,] 2.475952 2.774118 71.19712 20383705217 0.05439601 0.02589388 3.35095
## [2,] 6.023528 13.706128 115.84883 121676736576 0.99166204 9.44496008 13.28191
## dddd10 dddd30
## [1,] 0.0000000 0.3902612
## [2,] 0.9258931 0.7019200
```

The fundamental operator should not have any negative numbers, and yet `aaa341` contains them. This is because of the survival function used in the model. We’ll show how to remedy this in the next section, but for now, we’ll just remove it from our analysis. Next, we’ll learn how to visualize our results quickly with `ggplot2`:

```
mean_lifespan <- mean_lifespan[!names(mean_lifespan) %in% "aaa341"]
sd_lifespan   <- sd_lifespan[!names(sd_lifespan) %in% "aaa341"]

library(ggplot2)

all_data <- data.frame(
  id      = NA,
  species = NA,
  mean_ls = NA,
  upper   = NA,
  lower   = NA,
  z_0     = NA
)

for(i in seq_along(mean_lifespan)) {

  temp <- data.frame(
    id      = names(mean_lifespan)[i],
    species = pdb$Metadata$species_accepted[pdb$Metadata$ipm_id == names(mean_lifespan)[i]],
    mean_ls = mean_lifespan[[i]],
    upper   = mean_lifespan[[i]] + 1.96 * sd_lifespan[[i]],
    lower   = mean_lifespan[[i]] - 1.96 * sd_lifespan[[i]],
    z_0     = seq(1, length(mean_lifespan[[i]]), 1)
  )

  all_data <- rbind(all_data, temp)
}

# Remove the NA dummy row, and restrict the lower CI to >= 0 (can't have negative
# lifespan)
all_data <- all_data[-1, ]

all_data$lower <- ifelse(all_data$lower < 0, 0, all_data$lower)

# Now, ggplot using facet wrap and geom_ribbon to get the confidence interval
ggplot(all_data, aes(x = z_0, y = mean_ls)) +
  geom_line() +
  geom_ribbon(aes(ymin = lower,
                  ymax = upper),
              fill = "grey50",
              alpha = 0.5) +
  facet_wrap( ~ species,
              scales = "free") +
  theme_bw()
```
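The long-format table can also be assembled without the `NA` dummy row. The sketch below is one possible alternative on hypothetical inputs (`toy_mean` and `toy_sd` stand in for `mean_lifespan` and `sd_lifespan`): it builds one data frame per model with `lapply()`, binds them with `do.call(rbind, ...)`, and floors the lower bound with `pmax()`.

```r
# Hypothetical per-model results: named lists of numeric vectors
toy_mean <- list(id_01 = c(2, 3, 4), id_02 = c(1, 1.5))
toy_sd   <- list(id_01 = c(1, 2, 3), id_02 = c(0.2, 0.4))

toy_data <- do.call(rbind, lapply(names(toy_mean), function(id) {
  data.frame(
    id      = id,
    mean_ls = toy_mean[[id]],
    upper   = toy_mean[[id]] + 1.96 * toy_sd[[id]],
    # pmax() floors the lower bound at 0 in the same step
    lower   = pmax(0, toy_mean[[id]] - 1.96 * toy_sd[[id]]),
    z_0     = seq_along(toy_mean[[id]])
  )
}))

toy_data
```

Either approach yields the same columns; this one avoids growing a data frame inside a loop and removes the need to drop a placeholder row afterwards.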

We saw above that sometimes data in PADRINO can give values that make
no sense. PADRINO is committed to providing IPMs *as they are
published*, and does not take a stance on the technical correctness
of these models. The analysis above provides an opportunity to quickly
illustrate how to address some of these issues. There is a separate
vignette with a more complete overview of known issues in PADRINO.
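One way a survival/growth kernel can produce negative entries in \(N\) is sketched below. This is an assumption made for illustration, not a diagnosis of `aaa341`: the toy matrix `P_bad` is hypothetical and simply lets one class "survive" with a total probability above 1, which a well-behaved kernel should never do.

```r
# Hypothetical 2 x 2 kernels, not taken from any PADRINO model
P_ok  <- matrix(c(0.5, 0.2,
                  0.0, 0.6),  nrow = 2)
P_bad <- matrix(c(0.5, 0.2,
                  0.0, 1.05), nrow = 2)  # class 2 column sums to more than 1

N_ok  <- solve(diag(2) - P_ok)
N_bad <- solve(diag(2) - P_bad)

# Column sums of N are the implied mean lifespans
colSums(N_ok)
colSums(N_bad)

# A quick screen that could be run before trusting lifespan results
any(N_bad < 0)
```

Checking `colSums()` of a model's \(P\) kernel for values at or above 1 is a cheap diagnostic to run before computing anything based on \((I - P)^{-1}\).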

We’ll introduce two new functions: `vital_rate_exprs<-` and `pdb_new_fun_form()`. Since we’re computing all new survival values, we’ll have to modify the `proto_ipm` first, then re-build the IPM object (the species in question is *Lonicera maackii*).

```
lonicera_proto <- pdb_make_proto_ipm(pdb, "aaa341")

# Inspect the vital rate expressions
vital_rate_exprs(lonicera_proto)
```

```
## $aaa341
## s: 1/(1 + exp(-(si + ss1 * size_1 + ss2 * size_1^2)))
## g_mean: gi + gs * size_1
## g: dnorm(size_2, g_mean, g_sd)
## Fp: 1/(1 + exp(-(fpi + fps * size_1)))
## Fs: exp(fi + fs * size_1)
## Fd: dnorm(size_2, fd_mean, fd_sd)
```

We can see that the function `s` is the one we want to update. We’ll use the setter `vital_rate_exprs<-` and `pdb_new_fun_form()` to do this. These two functions combine with the following syntax:

```
vital_rate_exprs(proto_ipms) <- pdb_new_fun_form(
  list(
    <ipm_id_1> = list(
      <vital_rate_name_1> = <expression_1>,
      <vital_rate_name_2> = <expression_2>
    ),
    <ipm_id_2> = list(
      <vital_rate_name_3> = <expression_3>
    )
  )
)
```

With a maximum survival probability of 0.98, our example of `aaa341` looks like this:

```
vital_rate_exprs(lonicera_proto) <- pdb_new_fun_form(
  list(
    aaa341 = list(
      s = pmin(0.98, 1 / (1 + exp(-(si + ss1 * size_1 + ss2 * size_1 ^ 2))))
    )
  )
)

vital_rate_exprs(lonicera_proto)
```

```
## $aaa341
## s: pmin(0.98, 1/(1 + exp(-(si + ss1 * size_1 + ss2 * size_1^2))))
## g_mean: gi + gs * size_1
## g: dnorm(size_2, g_mean, g_sd)
## Fp: 1/(1 + exp(-(fpi + fps * size_1)))
## Fs: exp(fi + fs * size_1)
## Fd: dnorm(size_2, fd_mean, fd_sd)
```

Great! Let’s re-run the analysis above with our corrected model!

```
# Rebuild the IPM, then use our functions for mean, variance, and SD on it.
lonicera_ipm <- pdb_make_ipm(lonicera_proto)

lonicera_mu_ls  <- eta_bar_z0(lonicera_ipm$aaa341)
lonicera_var_ls <- sigma_eta_z0(lonicera_ipm$aaa341)
lonicera_sd_ls  <- sqrt(lonicera_var_ls)

temp <- data.frame(
  id      = "aaa341",
  species = "Lonicera_maackii",
  mean_ls = lonicera_mu_ls,
  upper   = lonicera_mu_ls + 1.96 * lonicera_sd_ls,
  lower   = lonicera_mu_ls - 1.96 * lonicera_sd_ls,
  z_0     = seq(1, length(lonicera_mu_ls), 1)
)

all_data <- rbind(all_data, temp)

# Again, restrict the lower CI to have minimum of 0
all_data$lower <- ifelse(all_data$lower < 0, 0, all_data$lower)

# Rebuild our plot!
ggplot(all_data, aes(x = z_0, y = mean_ls)) +
  geom_line() +
  geom_ribbon(aes(ymin = lower,
                  ymax = upper),
              fill = "grey50",
              alpha = 0.5) +
  facet_wrap( ~ species,
              scales = "free") +
  theme_bw()
```

The results look a bit better for *Lonicera*, and are at least
now mathematically correct. When updating functional forms in PADRINO,
it is important to explore how assumptions like the maximum survival
probability in the modified form affect results. This is left as an
exercise to you!
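As a possible starting point for that exercise, the sketch below shows how the choice of cap alone bounds the answer. The coefficients `si`, `ss1`, and `ss2` are hypothetical values picked for illustration (they are not the ones stored in PADRINO), and the lifespan bound assumes an individual that sits at the cap every year.

```r
# Candidate caps on survival probability (hypothetical choices)
s_max <- c(0.95, 0.98, 0.99)

# Hypothetical logistic coefficients, not the values stored in PADRINO
si  <- -2
ss1 <- 0.5
ss2 <- 0.05

size_1 <- seq(0, 20, by = 5)

# Uncapped survival, same functional form as the 's' expression above
s_raw <- 1 / (1 + exp(-(si + ss1 * size_1 + ss2 * size_1^2)))

# One column of capped survival values per candidate cap
s_capped <- sapply(s_max, function(cap) pmin(cap, s_raw))
colnames(s_capped) <- paste0("cap_", s_max)

# If survival stayed at the cap every year, expected lifespan would be
# 1 / (1 - cap), so the cap sets an upper bound on mean lifespan
lifespan_bound <- 1 / (1 - s_max)

round(cbind(size_1, s_raw, s_capped), 3)
lifespan_bound
```

Because the bound grows quickly as the cap approaches 1, lifespan estimates for large individuals can be driven almost entirely by this one assumption. Rebuilding the IPM with a few different caps and comparing the resulting curves is a reasonable way to gauge that sensitivity.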

This is just a very brief overview of the analyses that are possible with `Rpadrino`. Ellner, Childs & Rees 2016 give a comprehensive overview of IPM theory and analyses. There are further vignettes in this package that describe data cleaning and how to use PADRINO with other databases.