# A Tutorial on Using the Simulation Functions Included in powerlmm

#### 2018-08-14

This vignette shows how you can further investigate your models using Monte Carlo simulations. In addition to reporting power, simulations allows you to investigate parameter bias, model misspecification, Type I errors and convergence issues. For instance, even if the analytical power calculations work well, it can be useful to check the small sample properties of the model, e.g. estimates of cluster-level random effects with only a few clusters.

## Setup

The simulation-function accepts the same study setup-object that we have used in the two- and three-level power vignettes. The simulation-function supports all models that are available in powerlmm.

#cores <- parallel::detectCores(logical = FALSE) # use all physical CPU cores
cores <- 16
nsim <- 1000

## Partially nested three-level model

We will start with an example of how to evaluate the partially nested three-level model. When doing simulations it can be a good idea to specify the model using raw values, to ensure all parameters are reasonable. Here I use estimates from a real clinical trial.

library(powerlmm)
p <- study_parameters(n1 = 11,
n2 = 10,
n3 = 6,
fixed_intercept = 37,
fixed_slope = -0.64,
sigma_subject_intercept = 2.8,
sigma_subject_slope = 0.4,
sigma_cluster_intercept = 0,
cor_subject = -0.2,
icc_slope = 0.05,
sigma_error = 2.6,
dropout = dropout_weibull(proportion = 0.3,
rate = 1/2),
partially_nested = TRUE,
effect_size = cohend(-0.8,
standardizer = "pretest_SD"))
p
##
##      Study setup (three-level, partially nested)
##
##               n1 = 11
##               n2 = 10 x 6 (treatment)
##                    60     (control)
##               n3 = 6      (treatment)
##                    0      (control)
##                    6      (total)
##          total_n = 60     (treatment)
##                    60     (control)
##                    120    (total)
##          dropout =  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10 (time)
##                     0, 11, 15, 18, 20, 22, 24, 26, 27, 29, 30 (%, control)
##                     0, 11, 15, 18, 20, 22, 24, 26, 27, 29, 30 (%, treatment)
## icc_pre_subjects = 0.54
## icc_pre_clusters = 0
##        icc_slope = 0.05
##        var_ratio = 0.02
##      effect_size = -0.8 (Cohen's d [SD: pretest_SD, control])

### How many simulations to run?

The get_monte_carlo_se-function accepts a power object, and will return the 95 % interval for a given amount of simulations. This is useful to gauge the precision of the empirical power estimates.

x <- get_power(p)
get_monte_carlo_se(x, nsim = c(1000, 2000, 5000))
##   power     se lwr_95 upr_95
## 1  0.65 0.0151   0.62   0.68
## 2  0.65 0.0107   0.63   0.67
## 3  0.65 0.0067   0.64   0.66

We would need about 5000 simulation if we want our empirical power to be +/- 1% from the analytical power.

Now, let’s run the simulation. First we need to write the lme4-formula, since the simulated data sets are analyzed using lme4::lmer. Then we pass the study_parameters-object to the simulate function, and when the simulation is finished we use summary to view the results. You can run the simulation in parallel by settings cores > 1. The parallel computations will be done using the parallel package. If performed on a Windows machine or inside a GUI, then PSOCK clusters are used, if on Unix/macOS and the simulation is run non-interactively in a terminal then forking is used.

f <- sim_formula("y ~ treatment * time +
(1 + time | subject) +
(0 + treatment:time | cluster)")

if(RUN_SIM) {
res <- simulate(object = p,
nsim = nsim,
formula = f,
satterthwaite = TRUE,
cores = cores,
save = FALSE)

}

summary(res)
## Model:  default
##
## Random effects
##
##          parameter  M_est   theta est_rel_bias prop_zero is_NA
##  subject_intercept  7.900  7.8000       0.0067      0.00     0
##      subject_slope  0.160  0.1600      -0.0160      0.00     0
##      cluster_slope  0.012  0.0084       0.4500      0.42     0
##              error  6.800  6.8000      -0.0014      0.00     0
##        cor_subject -0.200 -0.2000      -0.0210      0.00     0
##
## Fixed effects
##
##       parameter  M_est theta  M_se SD_est Power Power_bw Power_satt
##     (Intercept) 37.000 37.00 0.420  0.430 1.000        .          .
##       treatment  0.031  0.00 0.590  0.580 0.046        .          .
##            time -0.640 -0.64 0.068  0.072 1.000        .          .
##  treatment:time -0.300 -0.31 0.110  0.100 0.810     0.62       0.77
## ---
## Number of simulations: 1000  | alpha:  0.05
## Time points (n1):  11
## Subjects per cluster (n2 x n3):  10 x 6 (treatment)
##                                  60 (control)
## Total number of subjects:  120
## [Model: default] 0.4% of the models threw convergence warnings

### Automatic formula creation

It is also possible to automatically create the formula by leaving it blank. Then the lmer-formula is created by including the parameters that are not specified as NA or NULL in the model, see ?create_lmer_formula. N.B., parameters specified as 0 are included in the model.

## Investigate model misspecification

Let’s compare the correct partially nested model to a two-level linear mixed-effects model. The sim_formula_compare function lets us fit multiple models to each simulated data set, by passing the model formulas as named arguments. By setting effect_size to 0 the simulation results tells us the empirical Type I errors for the correct and misspecified model.

p <- update(p, effect_size = 0)

f <- sim_formula_compare("correct" = "y ~ treatment * time +
(1 + time | subject) +
(0 + treatment:time | cluster)",
"wrong" = "y ~ treatment * time +
(1 + time | subject)")
if(RUN_SIM) {
res2 <- simulate(object = p,
nsim = nsim,
formula = f,
satterthwaite = TRUE,
cores = cores,
save = FALSE)
}

summary(res2)
## Model:  correct
##
## Random effects
##
##          parameter  M_est   theta est_rel_bias prop_zero is_NA
##  subject_intercept  7.800  7.8000      -0.0064      0.00     0
##      subject_slope  0.160  0.1600      -0.0200      0.00     0
##      cluster_slope  0.013  0.0084       0.5000      0.43     0
##              error  6.700  6.8000      -0.0015      0.00     0
##        cor_subject -0.190 -0.2000      -0.0420      0.00     0
##
## Fixed effects
##
##       parameter   M_est theta  M_se SD_est Power Power_bw Power_satt
##     (Intercept) 37.0000 37.00 0.420  0.420 1.000        .          .
##       treatment -0.0120  0.00 0.590  0.580 0.055        .          .
##            time -0.6400 -0.64 0.068  0.068 1.000        .          .
##  treatment:time  0.0049  0.00 0.110  0.100 0.046    0.004      0.042
## ---
## Model:  wrong
##
## Random effects
##
##          parameter M_est theta est_rel_bias prop_zero is_NA
##  subject_intercept  7.80  7.80      -0.0064         0     0
##      subject_slope  0.16  0.16       0.0130         0     0
##              error  6.70  6.80      -0.0015         0     0
##        cor_subject -0.19 -0.20      -0.0580         0     0
##
## Fixed effects
##
##       parameter   M_est theta  M_se SD_est Power Power_bw Power_satt
##     (Intercept) 37.0000 37.00 0.420  0.420 1.000        .          .
##       treatment -0.0120  0.00 0.590  0.580 0.055        .          .
##            time -0.6400 -0.64 0.069  0.068 1.000        .          .
##  treatment:time  0.0048  0.00 0.097  0.100 0.060     0.01      0.058
## ---
## Number of simulations: 1000  | alpha:  0.05
## Time points (n1):  11
## Subjects per cluster (n2 x n3):  10 x 6 (treatment)
##                                  60 (control)
## Total number of subjects:  120
## [Model: correct] 0.3% of the models threw convergence warnings

All the parameters are summarized and presented for both models. Since we specified satterthwaite = TRUE, empirical power is both based on the Wald Z test and the Wald t test using Satterthwaite’s df approximation (from the lmerTest package).

## Compare multiple combinations of parameter values

We can also simulate multiple designs and compare them. Let’s compare 6 vs 12 clusters in the treatment arm, and a ìcc_slope of 0.05 and 0.1, and a Cohen’s d of 0.5 and 0.8.

p <- study_parameters(n1 = 11,
n2 = 10,
n3 = c(6, 12),
fixed_intercept = 37,
fixed_slope = -0.64,
sigma_subject_intercept = 2.8,
sigma_subject_slope = 0.4,
sigma_cluster_intercept = 0,
cor_subject = -0.2,
icc_slope = c(0.05, 0.1),
sigma_error = 2.6,
dropout = dropout_weibull(proportion = 0.3,
rate = 1/2),
partially_nested = TRUE,
effect_size = cohend(c(0.5, 0.8),
standardizer = "pretest_SD"))

f <- sim_formula("y ~ treatment * time + (1 + time | subject) +
(0 + treatment:time | cluster)")
if(RUN_SIM) {
res3 <- simulate(object = p,
nsim = nsim,
formula = f,
satterthwaite = TRUE,
cores = cores,
batch_progress = FALSE)
}

The setup and simulation is the same, except that we define vectors of values for some parameters. In this scenario we have a total of 2 * 2 * 2 = 8 different combination of parameter values. For each of the 8 setups nsim number of simulations will be performed. With multiple designs summary works a bit differently. Instead of summarizing all parameters, we now have to choose which parameters to summarize.

# Summarize the 'time:treatment' results
x <- summary(res3, para = "treatment:time")

# customize what to print
print(x, add_cols = c("n3_tx", "icc_slope", "effect_size"),
estimates = FALSE)
## Model: 'All' | Type: 'fixed' | Parameter(s): 'treatment:time'
## ---
##  effect_size icc_slope n3_tx Power Power_bw Power_satt
##          0.5      0.05     6  0.45     0.23       0.40
##          0.5      0.05    12  0.74     0.64       0.72
##          0.5      0.10     6  0.41     0.23       0.36
##          0.5      0.10    12  0.69     0.60       0.66
##          0.8      0.05     6  0.82     0.62       0.78
##          0.8      0.05    12  0.98     0.97       0.98
##          0.8      0.10     6  0.77     0.54       0.71
##          0.8      0.10    12  0.96     0.94       0.95
## ---
## nsim:  1000 | 29 columns not shown
# Summarize the cluster-level random slope
x <- summary(res3, para = "cluster_slope")
print(x, add_cols = c("n3_tx", "icc_slope", "effect_size"))
## Model: 'All' | Type: 'random' | Parameter(s): 'cluster_slope'
## ---
##  effect_size icc_slope n3_tx M_est  theta
##          0.5      0.05     6 0.012 0.0084
##          0.5      0.05    12 0.010 0.0084
##          0.5      0.10     6 0.020 0.0178
##          0.5      0.10    12 0.018 0.0178
##          0.8      0.05     6 0.012 0.0084
##          0.8      0.05    12 0.010 0.0084
##          0.8      0.10     6 0.021 0.0178
##          0.8      0.10    12 0.018 0.0178
## ---
## nsim:  1000 | 28 columns not shown

# ANCOVA vs 2-level LMM—data transformations

Sometimes it is useful to compare if the full longitudinal model is preferred over just analyzing posttest data. In this example we will compare the 2-level LMM to models that analyze the results as a cross-sectional model at posttest. This can be accomplished by using the data_transform argument, note that we must specify test = "treatment" for the OLS posttest models.

p <- study_parameters(n1 = 11,
n2 = 40,
icc_pre_subject = 0.5,
cor_subject = -0.5,
var_ratio = c(0, 0.01, 0.02, 0.03),
dropout = c(0, dropout_weibull(0.3, 1)),
effect_size = cohend(c(0, 0.8)))

# Formulas --------------------------------------------------------------------
# OLS "t-test"
f_PT <- sim_formula("y ~ treatment",
test = "treatment",
data_transform = transform_to_posttest)

# ANCOVA
f_PT_pre <- sim_formula("y ~ treatment + pretest",
test = "treatment",
data_transform = transform_to_posttest)

# analyze as 2-level longitudinal
f_LMM <- sim_formula("y ~ time * treatment +
(1 + time | subject)")

# constrain treatment differences at pretest
f_LMM_c <- sim_formula("y ~ time + time:treatment +
(1 + time | subject)")

# combine formulas
f <- sim_formula_compare("posttest" = f_PT,
"ANCOVA" = f_PT_pre,
"LMM" = f_LMM,
"LMM_c" = f_LMM_c)

# Run sim --------------------------------------------------------------------
if(RUN_SIM) {
res4 <- simulate(p,
formula = f,
nsim = nsim,
cores = cores,
satterthwaite = TRUE,
batch_progress = FALSE)
}

We are mostly interested in the treatment effect, so we use the para argument. Since the treatment effect is represented by different terms in the cross-sectional and longitudinal model we specify the name of the treatment effect for each model.

tests <-  list("posttest" = "treatment",
"ANCOVA" = "treatment",
"LMM" = "time:treatment",
"LMM_c" = "time:treatment")

x <- summary(res4, para = tests)
"effect_size"))
## Model: 'All' | Type: 'fixed' | Parameter(s): 'treatment', 'time:treatment'
## ---
##   model effect_size var_ratio  M_est theta M_se SD_est Power Power_bw Power_satt
##  ANCOVA         0.0      0.00  0.102     0  2.7    2.7 0.056    0.049      0.049
##  ANCOVA         0.0      0.01 -0.022     0  3.1    3.0 0.047    0.042      0.042
##  ANCOVA         0.0      0.02 -0.077     0  3.6    3.6 0.060    0.056      0.056
##  ANCOVA         0.0      0.03  0.148     0  4.1    4.0 0.045    0.042      0.042
##  ANCOVA         0.8      0.00 11.196     0  2.7    2.7 0.980    0.978      0.978
## ---
## nsim:  1000 | 23 columns not shown

We only show the first 5 results, for these data a figure will show the results better.

library(ggplot2)
x$model <- factor(x$model, levels = c("posttest",
"ANCOVA",
"LMM",
"LMM_c"))

d_limits <- data.frame(effect_size = c(0),
Power_satt = c(0.025, 0.075),
var_ratio = 0,
model = 0,
dropout = 0)

d_hline <- data.frame(effect_size = c(0, 0.8),
x = c(0.05, 0.8))

ggplot(x, aes(model,
Power_satt,
group = interaction(var_ratio, dropout),
color = factor(var_ratio),
linetype = factor(dropout))) +
geom_hline(data = d_hline,  aes(yintercept = x)) +
geom_point() +
geom_blank(data = d_limits) +
geom_line() +
labs(y = "Power (Satterthwaite)",
linetype = "Dropout",
color = "Variance ratio",
title = "Power and Type I errors",
subtitle = 'Comparing "cross-sectional" and longitudinal models\nunder various amounts of heterogeneity and dropout') +
facet_grid(dropout ~ effect_size,
scales = "free",
labeller = "label_both") +
coord_flip()

# Investigate LRT model selection strategies

## Do we need a 3-level model?

A common strategy when analyzing (longitudinal) data is to build the model in a data driven fashion. We can investigate how well such a strategy works using sim_formula_compare. We’ll define four model formulas, starting with a 2-level random intercept model and working up to a 3-level random intercept and slope model. The true model is a 3-level model with only a random slope at the third level. Let’s first setup the model

p <- study_parameters(n1 = 11,
n2 = 40,
n3 = 3,
icc_pre_subject = 0.5,
cor_subject = -0.5,
icc_slope = 0.05,
var_ratio = 0.03)

f0 <- sim_formula("y ~ time * treatment + (1 | subject)")
f1 <- sim_formula("y ~ time * treatment + (1 + time | subject)")
f2 <- sim_formula("y ~ time * treatment + (1 + time | subject) + (0 + time | cluster)")
f3 <- sim_formula("y ~ time * treatment + (1 + time | subject) + (1 + time | cluster)")

f <- sim_formula_compare("subject-intercept" = f0,
"subject-slope" = f1,
"cluster-slope" = f2,
"cluster-intercept" = f3)

Then we run the simulation, the four model formulas in f will be fit the each data set.

if(RUN_SIM) {
res5 <- simulate(p, formula = f,
nsim = nsim,
satterthwaite = TRUE,
cores = cores,
CI = FALSE)
}

During each simulation the REML log-likelihood is saved for each model, we can therefore perform the model selection using the summary method, as a post-processing step. Since REML is used it is assumed the fixed effects are the same, and that we compare a “full model” to a “reduced model”. Let’s try a forward selection strategy, where start with the first model and compare it to the next. If the comparison is significant we continue testing models, else we stop. The summary function performs this model comparison for each of the nsim simulations and returns the results from the “winning” model in each simulation replication.

x <- summary(res5, model_selection = "FW", LRT_alpha = 0.05)
x
## Model:  model_selection
##
## Random effects
##
##          parameter   M_est  theta est_rel_bias prop_zero is_NA
##  subject_intercept 100.000 100.00     -0.00098      0.00     0
##      subject_slope   2.900   2.80      0.00690      0.00     0
##              error 100.000 100.00      0.00140      0.00     0
##        cor_subject  -0.500  -0.50     -0.00780      0.00     0
##      cluster_slope   0.280   0.15      0.85000      0.00     0
##  cluster_intercept   8.000   0.00      8.00000      0.00     0
##        cor_cluster   0.019   0.00      0.01900      0.56     0
##
## Fixed effects
##
##       parameter    M_est theta M_se SD_est Power Power_bw Power_satt
##     (Intercept)  0.02800     0 1.10   1.10 0.054        .          .
##            time -0.00077     0 0.25   0.28 0.140        .          .
##       treatment -0.04300     0 1.50   1.50 0.047        .          .
##  time:treatment  0.00150     0 0.35   0.40 0.120    0.049       0.12
## ---
## Number of simulations: 1000  | alpha:  0.05
## Time points (n1):  11
## Subjects per cluster (n2 x n3):  40 x 3 (treatment)
##                                  40 x 3 (control)
## Total number of subjects:  240
## ---
## Results based on LRT model comparisons, using direction: FW (alpha = 0.05)
## Model selected (proportion)
## cluster-intercept     cluster-slope     subject-slope
##             0.009             0.422             0.569

The point of the model selection algorithm is to mimic a type of data driven model selection that is quite common. We see that this strategy do not lead to nominal Type I errors in this scenario. However, it is fairly common to increase the LRT’s alpha level to try to improve this strategy. Let’s try a range of alpha levels to see the impact.

alphas <- seq(0.01, 0.5, length.out = 50)
x <- vapply(alphas, function(a) {
type1 <- summary(res5, model_selection = "FW", LRT_alpha = a)
type1$summary$model_selection$FE$Power_satt[4]
}, numeric(1))

d <- data.frame(LRT_alpha = alphas, type1 = x)
ggplot(d, aes(LRT_alpha, type1)) +
geom_line() +
geom_hline(yintercept = 0.05, linetype = "dotted") +
labs(y = "Type I errors (time:treatment)",
title = "Impact of the alpha level when using LRT for model selection")

We can also see the results from each of the four models. Here we will look at the time:treatment effect.

x1 <- summary(res5, para = "time:treatment")
x1
## Model:  summary
##
## Fixed effects: 'time:treatment'
##
##              model  M_est theta M_se SD_est Power Power_bw Power_satt
##  subject-intercept 0.0015     0 0.14    0.4 0.500    0.320      0.490
##      subject-slope 0.0015     0 0.25    0.4 0.210    0.073      0.200
##      cluster-slope 0.0015     0 0.39    0.4 0.083    0.023      0.051
##  cluster-intercept 0.0015     0 0.40    0.4 0.078    0.025      0.041
## ---
## Number of simulations: 1000  | alpha:  0.05
## Time points (n1):  11
## Subjects per cluster (n2 x n3):  40 x 3 (treatment)
##                                  40 x 3 (control)
## Total number of subjects:  240

We see that the 2-lvl random intercept model lead to substantially inflated Type I errors = 0.49. The 2-level model that also adds a random slope is somewhat better but still not good, Type I errors = 0.2. The correct 3-level model that account for the third level using a random slope have close to nominal Type I errors = 0.05. The full 3-level that adds an unnecessary random intercept is somewhat conservative, Type I errors = 0.04.

When choosing a strategy Type I errors is not only factor we want to minimize, power is also important. So let’s see how power is affected.

# See if power is impacted
p1 <- update(p, effect_size = cohend(0.8))
if(RUN_SIM) {
res6 <- simulate(p1,
formula = f,
nsim = nsim,
satterthwaite = TRUE,
cores = cores,
CI = FALSE)
}

# we can also summary a fixed effect for convenience
x <- summary(res6, model_selection = "FW", para = "time:treatment")
print(x, verbose = FALSE)
## Model:  summary
##
## Fixed effects: 'time:treatment'
##
##            model M_est theta M_se SD_est Power Power_bw Power_satt
##  model_selection   1.1   1.1 0.37   0.41  0.79     0.58       0.66
## ---
x1 <- summary(res6, para = "time:treatment")
print(x1, verbose = FALSE)
## Model:  summary
##
## Fixed effects: 'time:treatment'
##
##              model M_est theta M_se SD_est Power Power_bw Power_satt
##  subject-intercept   1.1   1.1 0.14   0.41  0.99     0.98       0.99
##      subject-slope   1.1   1.1 0.25   0.41  0.94     0.84       0.94
##      cluster-slope   1.1   1.1 0.39   0.41  0.77     0.54       0.62
##  cluster-intercept   1.1   1.1 0.40   0.41  0.77     0.53       0.56
## ---

## Consequences of ignoring clustering in a posttest only model

We already investigated the consequences of ignoring the cluster-level. We can also investigate the consequences of ignoring higher-level clustering in a posttest only analysis, by using the data_transform argument in sim_formula.

# simulation formulas

# analyze as a posttest only 1-level OLS model
f0 <- sim_formula("y ~ treatment",
test = "treatment",
data_transform = transform_to_posttest)

# analyze as a posttest only 2-level model
f1 <- sim_formula("y ~ treatment + (1 | cluster)",
test = "treatment",
data_transform = transform_to_posttest)

f <- sim_formula_compare("post_ignore" = f0,
"post_2lvl" = f1)

if(RUN_SIM) {
res7 <- simulate(p,
formula = f,
nsim = nsim,
cores = cores,
satterthwaite = TRUE,
batch_progress = FALSE)
}

# Type I error
print(summary(res7, para = "treatment"), verbose = FALSE)
## Model:  summary
##
## Fixed effects: 'treatment'
##
##        model M_est theta M_se SD_est Power Power_bw Power_satt
##  post_ignore 0.038     0  2.3    4.1  0.29    0.110      0.280
##    post_2lvl 0.038     0  3.8    4.1  0.12    0.038      0.073
## ---

We see that the Type I errors is substantially increased when clustering is ignored.

# model selection
summary(res7,
model_selection = "FW",
para = "treatment",
LRT_alpha = 0.1)
## Model:  summary
##
## Fixed effects: 'treatment'
##
##            model M_est theta M_se SD_est Power Power_bw Power_satt
##  model_selection 0.038     0  3.5    4.1  0.17    0.061       0.16
## ---
## Number of simulations: 1000  | alpha:  0.05
## Time points (n1):  11
## Subjects per cluster (n2 x n3):  40 x 3 (treatment)
##                                  40 x 3 (control)
## Total number of subjects:  240
## ---
## Results based on LRT model comparisons, using direction: FW (alpha = 0.1)
## Model selected (proportion)
##   post_2lvl post_ignore
##       0.468       0.532
## ---
## At least one of the models applied a data transformation during simulation,
## summaries that depend on the true parameter values will no longer be correct,
## see 'help(summary.plcp_sim)'

Just as with the longitudinal model, the LRT model selection strategy rejects the 2-level model too often leading to inflated Type I errors.