# The exactmed function

## Introduction

This document aims to illustrate the usage of the function exactmed() and its behavior via additional examples.

We recall that exactmed() computes natural direct and indirect effects, and controlled direct effects for a binary outcome and a binary mediator. The user can specify the reference levels of the outcome and mediator variables using the input parameters hvalue_m and hvalue_y, respectively (see the function help). Controlled direct effects are obtained for both possible mediator values ($$m=0$$ and $$m=1$$). Natural and controlled effects can either be unadjusted or adjusted for covariates (that is, conditional effects). Adjusted effect estimates are obtained for covariates fixed at their sample-specific mean values (default) or at specific values (user-provided). Also, by default, exactmed() incorporates a mediator-exposure interaction term in the outcome model, which can be removed by setting interaction=FALSE. Concerning interval estimates, exactmed() generates, by default, $$95\%$$ confidence intervals obtained by the delta method. Alternatively, bootstrap confidence intervals, instead of delta method confidence intervals, can be obtained by specifying boot=TRUE in the function call. In this case, 1000 bootstrap datasets are generated by default.

## The dataset

In all the examples presented below we use the dataset datamed, available after loading the ExactMed package. Some of the features of this dataset can be found in its corresponding help file (help(datamed)). We recall that the exactmed() function only works on data frames with named columns and no missing values.

> library(ExactMed)
>
X M Y C1         C2
1 1 1 0  0  0.3753731
2 1 0 0  1  0.1971635
3 1 0 1  1 -0.5971041
4 0 1 0  0  0.7576990
5 0 1 0  0  1.2056864
6 0 1 0  0 -0.5983204

The following command verifies whether the dataset contains any missing values:

>
> as.logical(sum(is.na(datamed)))
[1] FALSE

## Examples

### Basic examples

Suppose that one wishes to obtain mediation effects estimates for a change in exposure from $$0$$ to $$1$$, assuming there is no exposure-mediator interaction and using the delta method to construct $$95\%$$ confidence intervals.

In this case, a valid call to exactmed() would be:

>
> results1 <- exactmed(
+   data = datamed, a = 'X', m = 'M', y = 'Y',
+   a1 = 1, a0 = 0, interaction = FALSE
+   )
'exactmed' will compute unadjusted natural effects
>
> results1
$Natural effects on OR scale Estimate Std.error 2.5% 97.5% P.val Direct effect 2.04899 0.27469 1.57554 2.66472 8.7484e-08 Indirect effect 0.88069 0.03097 0.82204 0.94352 0.00030218 Total effect 1.80452 0.23756 1.39414 2.33571 7.3256e-06$Natural effects on RR scale
Estimate Std.error    2.5%   97.5%      P.val
Direct effect    1.30183   0.06489 1.18065 1.43543 1.2130e-07
Indirect effect  0.96248   0.01014 0.94281 0.98257 0.00028454
Total effect     1.25298   0.06379 1.13399 1.38446 9.4328e-06

$Natural effects on RD scale Estimate Std.error 2.5% 97.5% P.val Direct effect 0.16514 0.03021 0.10593 0.22435 4.5850e-08 Indirect effect -0.02672 0.00732 -0.04107 -0.01238 0.00026137 Total effect 0.13842 0.03048 0.07868 0.19815 5.5775e-06$Controlled direct effects (m=0)
Estimate Std.error    2.5%   97.5%      P.val
OR scale  2.07495   0.28607 1.58363 2.71870 1.1937e-07
RR scale  1.26843   0.05652 1.16234 1.38420 9.5079e-08
RD scale  0.15878   0.02892 0.10210 0.21546 4.0057e-08

$Controlled direct effects (m=1) Estimate Std.error 2.5% 97.5% P.val OR scale 2.07495 0.28607 1.58363 2.71870 1.1937e-07 RR scale 1.40138 0.09725 1.22317 1.60555 1.1566e-06 RD scale 0.17947 0.03343 0.11395 0.24499 7.9365e-08 Mediation effects estimates adjusted for covariates are obtained through the use of the character vectors m_cov and y_cov, which contain the names of the covariates to be adjusted for in the mediator and outcome models, respectively. The following call to exactmed() incorporates covariates C1 and C2 in both the mediator and outcome models: > > results2 <- exactmed( + data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, + m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), + interaction = FALSE + ) > > results2$Natural effects on OR scale
Estimate Std.error    2.5%   97.5%      P.val
Direct effect    1.91812   0.27925 1.44197 2.55150 7.6751e-06
Indirect effect  0.99263   0.05626 0.88826 1.10927    0.89619
Total effect     1.90399   0.26714 1.44622 2.50664 4.4407e-06

$Natural effects on RR scale Estimate Std.error 2.5% 97.5% P.val Direct effect 1.27053 0.06748 1.14492 1.40991 6.5375e-06 Indirect effect 0.99782 0.01667 0.96568 1.03103 0.89593 Total effect 1.26776 0.06670 1.14354 1.40547 6.5043e-06$Natural effects on RD scale
Estimate Std.error     2.5%   97.5%      P.val
Direct effect    0.15019   0.03272  0.08605 0.21432 4.4402e-06
Indirect effect -0.00154   0.01178 -0.02463 0.02155    0.89604
Total effect     0.14865   0.03195  0.08602 0.21127 3.2871e-06

$Controlled direct effects (m=0) Estimate Std.error 2.5% 97.5% P.val OR scale 1.91814 0.27936 1.44182 2.55183 7.7378e-06 RR scale 1.26961 0.06625 1.14619 1.40633 4.7657e-06 RD scale 0.15000 0.03236 0.08657 0.21342 3.5640e-06$Controlled direct effects (m=1)
Estimate Std.error    2.5%   97.5%      P.val
OR scale  1.91814   0.27936 1.44182 2.55183 7.7378e-06
RR scale  1.27393   0.07778 1.13025 1.43587 7.3286e-05
RD scale  0.15087   0.03454 0.08318 0.21857 1.2534e-05

exactmed() also allows for the specification of two different sets of covariates in the mediator and outcome models. For example, the following specification of m_cov and y_cov means that the mediator model is adjusted for C1 and C2, while the outcome model is adjusted for C1 only.

However, we advise against this practice unless it is known that excluded covariates are independent of the dependent variable (mediator or outcome) being modeled given the rest of covariates.

>
> results3 <- exactmed(
+   data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,
+   m_cov = c('C1', 'C2'), y_cov = c('C1'),
+   interaction = FALSE
+   )
>
> results3
$Natural effects on OR scale Estimate Std.error 2.5% 97.5% P.val Direct effect 2.07747 0.29467 1.57326 2.74328 2.5397e-07 Indirect effect 0.88888 0.04509 0.80476 0.98180 0.020229 Total effect 1.84663 0.25710 1.40563 2.42600 1.0554e-05$Natural effects on RR scale
Estimate Std.error    2.5%   97.5%      P.val
Direct effect    1.29629   0.06554 1.17400 1.43133 2.8560e-07
Indirect effect  0.96677   0.01378 0.94013 0.99416   0.017754
Total effect     1.25321   0.06518 1.13176 1.38771 1.4271e-05

$Natural effects on RD scale Estimate Std.error 2.5% 97.5% P.val Direct effect 0.16572 0.03132 0.10433 0.22710 1.2175e-07 Indirect effect -0.02409 0.01021 -0.04410 -0.00409 0.018258 Total effect 0.14162 0.03174 0.07941 0.20384 8.1377e-06$Controlled direct effects (m=0)
Estimate Std.error    2.5%   97.5%      P.val
OR scale  2.08515   0.29879 1.57458 2.76128 2.9258e-07
RR scale  1.28132   0.06150 1.16628 1.40771 2.4082e-07
RD scale  0.16264   0.03059 0.10269 0.22259 1.0544e-07

$Controlled direct effects (m=1) Estimate Std.error 2.5% 97.5% P.val OR scale 2.08515 0.29879 1.57458 2.76128 2.9258e-07 RR scale 1.36112 0.09014 1.19544 1.54976 3.2297e-06 RD scale 0.17702 0.03443 0.10954 0.24450 2.7259e-07 By default, the adjusted parameter is TRUE. If the adjusted parameter is set to FALSE, exactmed() ignores the values of the vectors m_cov and y_cov, and computes unadjusted (crude) effect estimates as in the first example above: > > results4 <- exactmed( + data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, + m_cov = c('C1', 'C2'), y_cov = c('C1'), + adjusted = FALSE, interaction = FALSE + ) 'exactmed' will compute unadjusted natural effects > > results4$Natural effects on OR scale
Estimate Std.error    2.5%   97.5%      P.val
Direct effect    2.04899   0.27469 1.57554 2.66472 8.7484e-08
Indirect effect  0.88069   0.03097 0.82204 0.94352 0.00030218
Total effect     1.80452   0.23756 1.39414 2.33571 7.3256e-06

$Natural effects on RR scale Estimate Std.error 2.5% 97.5% P.val Direct effect 1.30183 0.06489 1.18065 1.43543 1.2130e-07 Indirect effect 0.96248 0.01014 0.94281 0.98257 0.00028454 Total effect 1.25298 0.06379 1.13399 1.38446 9.4328e-06$Natural effects on RD scale
Estimate Std.error     2.5%    97.5%      P.val
Direct effect    0.16514   0.03021  0.10593  0.22435 4.5850e-08
Indirect effect -0.02672   0.00732 -0.04107 -0.01238 0.00026137
Total effect     0.13842   0.03048  0.07868  0.19815 5.5775e-06

$Controlled direct effects (m=0) Estimate Std.error 2.5% 97.5% P.val OR scale 2.07495 0.28607 1.58363 2.71870 1.1937e-07 RR scale 1.26843 0.05652 1.16234 1.38420 9.5079e-08 RD scale 0.15878 0.02892 0.10210 0.21546 4.0057e-08$Controlled direct effects (m=1)
Estimate Std.error    2.5%   97.5%      P.val
OR scale  2.07495   0.28607 1.58363 2.71870 1.1937e-07
RR scale  1.40138   0.09725 1.22317 1.60555 1.1566e-06
RD scale  0.17947   0.03343 0.11395 0.24499 7.9365e-08

To perform an adjusted mediation analysis allowing for exposure-mediator interaction (by default TRUE) and using bootstrap based on $$100$$ resamples with initial random seed $$= 1991$$ to construct $$97\%$$ confidence intervals, one should call exactmed() as follows:

>
> results5 <- exactmed(
+   data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,
+   m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'),
+   boot = TRUE, nboot = 100, bootseed = 1991, confcoef = 0.97
+   )
>
> results5
$Natural effects on OR scale Estimate Std.error 1.5% 98.5% Direct effect 2.24870 0.37516 1.68812 3.32099 Indirect effect 0.88856 0.07284 0.75874 1.07384 Total effect 1.99810 0.28480 1.51926 2.71363$Natural effects on RR scale
Estimate Std.error    1.5%   98.5%
Direct effect    1.33700   0.07272 1.20829 1.48229
Indirect effect  0.96726   0.02151 0.92984 1.02289
Total effect     1.29323   0.06769 1.16111 1.44939

$Natural effects on RD scale Estimate Std.error 1.5% 98.5% Direct effect 0.18403 0.03365 0.12012 0.25767 Indirect effect -0.02390 0.01591 -0.05367 0.01544 Total effect 0.16013 0.03112 0.09568 0.22870$Controlled direct effects (m=0)
Estimate Std.error    1.5%   98.5%
OR scale  2.61843   0.55335 1.82091 4.25427
RR scale  1.41151   0.09319 1.25366 1.59634
RD scale  0.21741   0.04099 0.14197 0.30413

$Controlled direct effects (m=1) Estimate Std.error 1.5% 98.5% OR scale 1.30748 0.30771 0.79211 2.15864 RR scale 1.10061 0.09454 0.92150 1.35743 RD scale 0.06150 0.05285 -0.05274 0.18411 ### Firth’s penalization In the situation where we believe that we are facing a problem of separation or quasi-separation, Firth’s penalization can be used by setting the Firth parameter to TRUE (Firth penalized mediation analysis). If this is the case, Firth’s penalization is applied to both the mediator model and the outcome model. The Firth parameter implements Firth’s penalization to reduce the bias of the regression coefficients estimators under scarce or sparse data: > > results6 <- exactmed( + data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, + m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), Firth = TRUE, + boot = TRUE, nboot = 100, bootseed = 1991, confcoef = 0.97 + ) > > results6$Natural effects on OR scale
Estimate Std.error    1.5%   98.5%
Direct effect    2.23246   0.36864 1.68036 3.28209
Indirect effect  0.88991   0.07197 0.76166 1.07288
Total effect     1.98668   0.28090 1.51385 2.69182

$Natural effects on RR scale Estimate Std.error 1.5% 98.5% Direct effect 1.33452 0.07217 1.20680 1.47867 Indirect effect 0.96751 0.02135 0.93035 1.02269 Total effect 1.29116 0.06719 1.16007 1.44614$Natural effects on RD scale
Estimate Std.error     1.5%   98.5%
Direct effect    0.18263   0.03343  0.11919 0.25573
Indirect effect -0.02367   0.01576 -0.05316 0.01527
Total effect     0.15896   0.03093  0.09498 0.22710

$Controlled direct effects (m=0) Estimate Std.error 1.5% 98.5% OR scale 2.59835 0.54383 1.81264 4.20506 RR scale 1.40883 0.09257 1.25203 1.59214 RD scale 0.21597 0.04077 0.14098 0.30221$Controlled direct effects (m=1)
Estimate Std.error     1.5%   98.5%
OR scale  1.30556   0.30467  0.79391 2.14590
RR scale  1.10034   0.09396  0.92202 1.35523
RD scale  0.06124   0.05249 -0.05229 0.18295

### Stratum-specific effects

The following call to exactmed() returns mediation effects when the values of the covariates C1 and C2 are $$0.1$$ and $$0.4$$, respectively, assuming an exposure-mediator interaction and using the delta method to construct $$95\%$$ confidence intervals:

>
> results7 <- exactmed(
+   data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,
+   m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'),
+   m_cov_cond = c(C1 = 0.1,C2 = 0.4), y_cov_cond = c(C1 = 0.1, C2 = 0.4)
+   )
>
> results7
$Natural effects on OR scale Estimate Std.error 2.5% 97.5% P.val Direct effect 1.86610 0.27725 1.39466 2.49690 2.6816e-05 Indirect effect 0.89347 0.06371 0.77694 1.02747 0.11413480 Total effect 1.66730 0.25143 1.24065 2.24066 0.00069917$Natural effects on RR scale
Estimate Std.error    2.5%   97.5%      P.val
Direct effect    1.37432   0.10503 1.18315 1.59639 3.1743e-05
Indirect effect  0.95099   0.03022 0.89357 1.01210 0.11377623
Total effect     1.30697   0.10409 1.11809 1.52777 0.00077525

$Natural effects on RD scale Estimate Std.error 2.5% 97.5% P.val Direct effect 0.15465 0.03625 0.08360 0.22571 1.9913e-05 Indirect effect -0.02783 0.01759 -0.06230 0.00664 0.11360218 Total effect 0.12683 0.03702 0.05427 0.19938 0.00061244$Controlled direct effects (m=0)
Estimate Std.error    2.5%   97.5%      P.val
OR scale  2.61843   0.52125 1.77253 3.86803 1.3290e-06
RR scale  1.63173   0.16066 1.34536 1.97906 6.5954e-07
RD scale  0.23603   0.04712 0.14369 0.32838 5.4583e-07

$Controlled direct effects (m=1) Estimate Std.error 2.5% 97.5% P.val OR scale 1.30748 0.28346 0.85486 1.99974 0.21622 RR scale 1.14677 0.12957 0.91897 1.43104 0.22549 RD scale 0.06689 0.05389 -0.03873 0.17251 0.21448 Common adjustment covariates in vectors m_cov and y_cov must have the same values; otherwise, the execution of the exactmed() function is aborted and an error message is displayed in the R console. Example: > exactmed( + data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, + m_cov = c('C1','C2'), y_cov = c('C1', 'C2'), + m_cov_cond = c(C1 = 0.3, C2 = 0.4), y_cov_cond = c(C1 = 0.1, C2 = 0.4) + )  Error in .check_input_param(data = data, a = a, m = m, y = y, a1 = a1, : Covariate C1 has two different values specified If the covariates specified in m_cov_cond (y_cov_cond) constitute some proper subset of m_cov (y_cov) then the other covariates are set to their sample-specific mean levels. Hence, the call > > results8 <- exactmed( + data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, + m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), + m_cov_cond = c(C1 = 0.1), y_cov_cond = c(C1 = 0.1) + ) is equivalent to: > > mc2 <- mean(datamed$C2)
>  mc2
[1] -0.04769712
>
> results9 <- exactmed(
+   data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,
+   m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'),
+   m_cov_cond = c(C1 = 0.1, C2 = mc2), y_cov_cond = c(C1 = 0.1, C2 = mc2)
+   )

This can be checked by comparing the two outputs:

>
> identical(results8, results9)
[1] TRUE

With this in mind, an error is easily predicted if one makes this call:

> exactmed(
+   data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,
+   m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'),
+   m_cov_cond = c(C1 = 0.1), y_cov_cond = c(C1 = 0.1, C2 = 0.4)
+   )
  Error in .check_input_param(data = data, a = a, m = m, y = y, a1 = a1, : Covariate C2 has two different values specified (one implicitly)

### Categorical covariates

exactmed() also allows for categorical covariates. Covariates of this type must appear in the data frame as factor, character or logical columns. To illustrate how exactmed() works with categorical covariates, we replace the covariate C1 in the dataset datamed by a random factor column:

>
> cate <- factor(sample(c("a", "b", "c"), nrow(datamed), replace =TRUE))
> datamed$C1 <- cate It is possible to estimate mediation effects at specific values of categorical covariates using the input parameters m_cov_cond and y_cov_cond. Note that if the targeted covariates are both numerical and categorical, the above parameters require to be list-type vectors, instead of atomic vectors as when covariates are only numerical. Hence, if one wants to estimate mediation effects at level ‘a’ for C1 and at value $$0.4$$ for C2, assuming an exposure-mediator interaction and using the delta method to construct $$95\%$$ confidence intervals, exactmed() should be called as follows: > > results10 <- exactmed( + data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0, + m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'), + m_cov_cond = list(C1 = 'a', C2 = 0.4), y_cov_cond = list(C1 = 'a', C2 = 0.4) + ) > > results10$Natural effects on OR scale
Estimate Std.error    2.5%   97.5%      P.val
Direct effect    1.93666   0.26913 1.47490 2.54298 1.9726e-06
Indirect effect  0.80707   0.05473 0.70663 0.92179  0.0015722
Total effect     1.56302   0.22095 1.18478 2.06201  0.0015806

$Natural effects on RR scale Estimate Std.error 2.5% 97.5% P.val Direct effect 1.28784 0.07186 1.15443 1.43667 5.7977e-06 Indirect effect 0.93157 0.02153 0.89031 0.97474 0.0021622 Total effect 1.19971 0.07055 1.06911 1.34627 0.0019593$Natural effects on RD scale
Estimate Std.error     2.5%    97.5%      P.val
Direct effect    0.15482   0.03215  0.09180  0.21784 1.4724e-06
Indirect effect -0.04740   0.01505 -0.07691 -0.01790  0.0016367
Total effect     0.10742   0.03377  0.04123  0.17361  0.0014684

$Controlled direct effects (m=0) Estimate Std.error 2.5% 97.5% P.val OR scale 2.61643 0.50061 1.79824 3.80690 4.9841e-07 RR scale 1.39356 0.09416 1.22072 1.59089 9.0296e-07 RD scale 0.21365 0.04027 0.13473 0.29258 1.1236e-07$Controlled direct effects (m=1)
Estimate Std.error     2.5%   97.5%   P.val
OR scale  1.37503   0.28624  0.91436 2.06781 0.12605
RR scale  1.14656   0.10575  0.95694 1.37375 0.13812
RD scale  0.07787   0.05103 -0.02214 0.17789 0.12699

If one does not specify a value for the categorical covariate C1, exactmed() computes the effects by assigning each dummy variable (created internally by exactmed()) to a value equal to the proportion of observations in the corresponding category (equivalent to setting each dummy variable to its mean value):

>
> results11 <- exactmed(
+   data = datamed, a = 'X', m = 'M', y = 'Y', a1 = 1, a0 = 0,
+   m_cov = c('C1', 'C2'), y_cov = c('C1', 'C2'),
+   m_cov_cond = c(C2 = 0.4), y_cov_cond = c(C2 = 0.4)
+   )
>
> results11
$Natural effects on OR scale Estimate Std.error 2.5% 97.5% P.val Direct effect 2.01817 0.28018 1.53741 2.64928 4.2356e-07 Indirect effect 0.79861 0.05731 0.69383 0.91921 0.00172510 Total effect 1.61173 0.22105 1.23183 2.10879 0.00050099$Natural effects on RR scale
Estimate Std.error    2.5%   97.5%      P.val
Direct effect    1.31391   0.07090 1.18203 1.46049 4.2158e-07
Indirect effect  0.92786   0.02211 0.88552 0.97223  0.0016799
Total effect     1.21912   0.06962 1.09003 1.36350  0.0005212

$Natural effects on RD scale Estimate Std.error 2.5% 97.5% P.val Direct effect 0.16525 0.03184 0.10286 0.22765 2.0946e-07 Indirect effect -0.04990 0.01579 -0.08084 -0.01896 0.00157225 Total effect 0.11536 0.03280 0.05107 0.17964 0.00043667$Controlled direct effects (m=0)
Estimate Std.error    2.5%   97.5%      P.val
OR scale  2.61643   0.50061 1.79824 3.80690 4.9841e-07
RR scale  1.40827   0.09257 1.23803 1.60191 1.9076e-07
RD scale  0.21668   0.04026 0.13777 0.29559 7.3675e-08

\$Controlled direct effects (m=1)
Estimate Std.error     2.5%   97.5%   P.val
OR scale  1.37503   0.28624  0.91436 2.06781 0.12605
RR scale  1.15094   0.10884  0.95622 1.38531 0.13713
RD scale  0.07836   0.05129 -0.02217 0.17889 0.12656