Test and effect size details

Indrajeet Patil

2020-04-22

This vignette summarizes which test each function in the package carries out and which effect size it returns. It also gives recommendations on how to interpret those effect sizes.

Summary of statistical tests and effect sizes

Here is a summary table of all the statistical tests currently supported across various functions:

| Functions | Type | Test | Effect size | 95% CI available? |
|---|---|---|---|---|
| expr_anova_parametric (2 groups, between-subjects) | Parametric | Student’s and Welch’s t-test | Cohen’s d, Hedges’ g | \(\checkmark\) |
| expr_anova_parametric (> 2 groups, between-subjects) | Parametric | Fisher’s and Welch’s one-way ANOVA | \(\eta^2, \eta^2_p, \omega^2, \omega^2_p\) | \(\checkmark\) |
| expr_anova_nonparametric (2 groups, between-subjects) | Non-parametric | Mann-Whitney U-test | r | \(\checkmark\) |
| expr_anova_nonparametric (> 2 groups, between-subjects) | Non-parametric | Kruskal-Wallis rank sum test | \(\epsilon^2\) | \(\checkmark\) |
| expr_anova_robust (2 groups, between-subjects) | Robust | Yuen’s test for trimmed means | \(\xi\) | \(\checkmark\) |
| expr_anova_robust (> 2 groups, between-subjects) | Robust | Heteroscedastic one-way ANOVA for trimmed means | \(\xi\) | \(\checkmark\) |
| expr_anova_parametric (2 groups, within-subjects) | Parametric | Student’s t-test | Cohen’s d, Hedges’ g | \(\checkmark\) |
| expr_anova_parametric (> 2 groups, within-subjects) | Parametric | Fisher’s one-way repeated measures ANOVA | \(\eta^2_p, \omega^2\) | \(\checkmark\) |
| expr_anova_nonparametric (2 groups, within-subjects) | Non-parametric | Wilcoxon signed-rank test | r | \(\checkmark\) |
| expr_anova_nonparametric (> 2 groups, within-subjects) | Non-parametric | Friedman rank sum test | \(W_{Kendall}\) | \(\checkmark\) |
| expr_anova_robust (2 groups, within-subjects) | Robust | Yuen’s test on trimmed means for dependent samples | \(\xi\) | \(\checkmark\) |
| expr_anova_robust (> 2 groups, within-subjects) | Robust | Heteroscedastic one-way repeated measures ANOVA for trimmed means | \(\times\) | \(\times\) |
| expr_contingency_tab (unpaired) | Parametric | Pearson’s \(\chi^2\) test | Cramér’s V | \(\checkmark\) |
| expr_contingency_tab (paired) | Parametric | McNemar’s test | Cohen’s g | \(\checkmark\) |
| expr_contingency_tab | Parametric | One-sample proportion test | Cramér’s V | \(\checkmark\) |
| expr_corr_test | Parametric | Pearson’s r | r | \(\checkmark\) |
| expr_corr_test | Non-parametric | Spearman’s \(\rho\) | \(\rho\) | \(\checkmark\) |
| expr_corr_test | Robust | Percentage bend correlation | r | \(\checkmark\) |
| expr_t_onesample | Parametric | One-sample t-test | Cohen’s d, Hedges’ g | \(\checkmark\) |
| expr_t_onesample | Non-parametric | One-sample Wilcoxon signed-rank test | r | \(\checkmark\) |
| expr_t_onesample | Robust | One-sample percentile bootstrap | Robust estimator | \(\checkmark\) |
| expr_meta_parametric | Parametric | Meta-analysis via random-effects models | \(\beta\) | \(\checkmark\) |
| expr_meta_robust | Robust | Meta-analysis via robust random-effects models | \(\beta\) | \(\checkmark\) |

Note that the following recommendations on how to interpret the effect sizes are only suggestions; there is nothing universal about them. The interpretation of any effect size measure is always relative to the discipline, the specific data, and the aims of the analyst. The guidelines below are given for small, medium, and large effects, and the references give more information about the discipline for which these guidelines were originally recommended. This matters because what might be considered a small effect in psychology might be large in another field, such as public health.

One-sample tests

parametric

Test: One-sample t-test
Effect size: Cohen’s d, Hedges’ g

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| Cohen’s d | 0.20 – < 0.50 | 0.50 – < 0.80 | ≥ 0.80 | [-Inf,Inf] |
| Hedges’ g | 0.20 – < 0.50 | 0.50 – < 0.80 | ≥ 0.80 | [-Inf,Inf] |
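
As an illustration of how these two effect sizes relate in the one-sample case, here is a minimal Python sketch. It is not the package's implementation: the function names are hypothetical, the data are made up, and the small-sample correction uses the common approximation 1 - 3/(4*df - 1), which may differ slightly from the exact correction a given library applies.

```python
import statistics

def cohens_d_one_sample(values, mu=0.0):
    """Distance of the sample mean from the reference value mu, in SD units."""
    return (statistics.mean(values) - mu) / statistics.stdev(values)

def hedges_g_one_sample(values, mu=0.0):
    """Cohen's d multiplied by an approximate small-sample bias correction."""
    df = len(values) - 1
    correction = 1 - 3 / (4 * df - 1)  # assumption: approximate correction factor
    return cohens_d_one_sample(values, mu) * correction

sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.2, 5.7]  # made-up measurements
d = cohens_d_one_sample(sample, mu=5.0)
g = hedges_g_one_sample(sample, mu=5.0)
print(f"Cohen's d = {d:.3f}, Hedges' g = {g:.3f}")
```

Because the correction factor is below 1, g is always slightly smaller in magnitude than d, and the two converge as the sample size grows.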

non-parametric

Test: One-sample Wilcoxon Signed-rank Test
Effect size: r ( = \(Z/\sqrt{N_{obs}}\))

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| r | 0.10 – < 0.30 | 0.30 – < 0.50 | ≥ 0.50 | [0,1] |
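
The conversion from the test's Z statistic to r, together with the guideline labels from the table, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the Z value is assumed to come from whatever software ran the Wilcoxon test, and the labels are applied to the magnitude of r.

```python
import math

def r_from_z(z, n_obs):
    """Effect size r = Z / sqrt(N), with N the number of observations."""
    return z / math.sqrt(n_obs)

def interpret_r(r):
    """Map the magnitude of r onto the guideline labels from the table."""
    magnitude = abs(r)
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "below small"

r = r_from_z(z=2.0, n_obs=16)  # made-up test result
print(r, interpret_r(r))
```

The same arithmetic applies wherever r is defined this way later in the vignette, with the number of pairs in place of the number of observations for paired designs.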

robust

Test: One-sample percentile bootstrap test
Effect size: robust location measure

Two-sample tests

within-subjects design

parametric

Test: Student’s dependent samples t-test
Effect size: Cohen’s d, Hedges’ g

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| Cohen’s d | 0.20 | 0.50 | 0.80 | [-Inf,Inf] |
| Hedges’ g | 0.20 | 0.50 | 0.80 | [-Inf,Inf] |

non-parametric

Test: Wilcoxon signed-rank test
Effect size: r ( = \(Z/\sqrt{N_{pairs}}\))

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| r | 0.10 – < 0.30 | 0.30 – < 0.50 | ≥ 0.50 | [0,1] |

robust

Test: Yuen’s dependent sample trimmed means t-test
Effect size: Explanatory measure of effect size (\(\xi\))

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| \(\xi\) | 0.10 – < 0.30 | 0.30 – < 0.50 | ≥ 0.50 | [0,1] |

Reference: https://CRAN.R-project.org/package=WRS2/vignettes/WRS2.pdf

between-subjects design

parametric

Test: Student’s and Welch’s independent samples t-test
Effect size: Cohen’s d, Hedges’ g

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| Cohen’s d | 0.20 | 0.50 | 0.80 | [-Inf,Inf] |
| Hedges’ g | 0.20 | 0.50 | 0.80 | [-Inf,Inf] |

non-parametric

Test: Two-sample Mann–Whitney U Test
Effect size: r ( = \(Z/\sqrt{N_{obs}}\))

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| r | 0.10 – < 0.30 | 0.30 – < 0.50 | ≥ 0.50 | [0,1] |

Reference: https://rcompanion.org/handbook/F_04.html

robust

Test: Yuen’s independent sample trimmed means t-test
Effect size: Explanatory measure of effect size (\(\xi\))

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| \(\xi\) | 0.10 – < 0.30 | 0.30 – < 0.50 | ≥ 0.50 | [0,1] |

Reference: https://CRAN.R-project.org/package=WRS2/vignettes/WRS2.pdf

One-way ANOVAs

within-subjects design

parametric

Test: Fisher’s repeated measures one-way ANOVA
Effect size: \(\eta^2_p\), \(\omega^2\)

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| \(\omega^2\) | 0.01 – < 0.06 | 0.06 – < 0.14 | ≥ 0.14 | [0,1] |
| \(\eta^2_p\) | 0.01 – < 0.06 | 0.06 – < 0.14 | ≥ 0.14 | [0,1] |

Reference:

non-parametric

Test: Friedman’s rank sum test
Effect size: Kendall’s W

In the following table, k is the number of treatments, groups, or things being rated.

| k | Small | Medium | Large | Range |
|---|---|---|---|---|
| k = 3 | < 0.10 | 0.10 – < 0.30 | ≥ 0.30 | [0,1] |
| k = 5 | < 0.10 | 0.10 – < 0.25 | ≥ 0.25 | [0,1] |
| k = 7 | < 0.10 | 0.10 – < 0.20 | ≥ 0.20 | [0,1] |
| k = 9 | < 0.10 | 0.10 – < 0.20 | ≥ 0.20 | [0,1] |
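
To make Kendall’s W concrete, the following Python sketch computes it from a matrix of within-subject ranks and shows its relation to the Friedman statistic. It is a simplified illustration, not the package's code: the function names are hypothetical, the ranks are made up, and tied ranks within a subject are assumed not to occur (no tie correction is applied).

```python
def kendalls_w(ranks):
    """Kendall's W for rows of ranks 1..k, one row per subject (no ties)."""
    n = len(ranks)       # number of subjects (raters, blocks)
    k = len(ranks[0])    # number of treatments being ranked
    rank_sums = [sum(row[j] for row in ranks) for j in range(k)]
    mean_rank_sum = n * (k + 1) / 2
    # Squared deviations of the per-treatment rank sums from their mean
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (n ** 2 * (k ** 3 - k))

def friedman_chi2_from_w(w, n, k):
    """Friedman's chi-squared implied by W (no ties): chi2 = n * (k - 1) * W."""
    return n * (k - 1) * w

agreeing = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]  # every subject ranks alike
w = kendalls_w(agreeing)
print(w, friedman_chi2_from_w(w, n=3, k=3))
```

W is 1 when all subjects produce identical rankings and 0 when the rank sums of all treatments are equal.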

robust

Test: Heteroscedastic one-way repeated measures ANOVA for trimmed means
Effect size: Not available

between-subjects design

parametric

Test: Fisher’s or Welch’s one-way ANOVA
Effect size: \(\eta^2\), \(\eta^2_p\), \(\omega^2\), \(\omega^2_p\)

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| \(\eta^2\) | 0.01 – < 0.06 | 0.06 – < 0.14 | ≥ 0.14 | [0,1] |
| \(\omega^2\) | 0.01 – < 0.06 | 0.06 – < 0.14 | ≥ 0.14 | [0,1] |
| \(\eta^2_p\) | 0.01 – < 0.06 | 0.06 – < 0.14 | ≥ 0.14 | [0,1] |
| \(\omega^2_p\) | 0.01 – < 0.06 | 0.06 – < 0.14 | ≥ 0.14 | [0,1] |

Reference:
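
For the Fisher (equal-variance) case, \(\eta^2\) and \(\omega^2\) can be obtained from the usual sums of squares. The Python sketch below is an illustration only, with a hypothetical function name and made-up data; it does not cover Welch’s ANOVA and is not the package's implementation. With a single factor, the partial and non-partial versions of each measure coincide, so only two values are returned.

```python
def one_way_effect_sizes(groups):
    """Return (eta squared, omega squared) for a between-subjects one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    # Partition the total sum of squares into between- and within-group parts
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_within = ss_total - ss_between

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_within = ss_within / df_within

    eta_sq = ss_between / ss_total
    omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq

eta_sq, omega_sq = one_way_effect_sizes([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"eta^2 = {eta_sq:.3f}, omega^2 = {omega_sq:.3f}")
```

\(\omega^2\) subtracts the variation expected by chance from the numerator, which is why it comes out smaller than \(\eta^2\) on the same data.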

non-parametric

Test: Kruskal–Wallis test
Effect size: \(\epsilon^2\)

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| \(\epsilon^2\) | 0.01 – < 0.08 | 0.08 – < 0.26 | ≥ 0.26 | [0,1] |

Reference: https://rcompanion.org/handbook/F_08.html
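
As a rough illustration, \(\epsilon^2\) can be derived from the Kruskal–Wallis H statistic and the total sample size using the commonly cited formula \(\epsilon^2 = H (n + 1) / (n^2 - 1)\). The Python sketch below assumes H has already been computed elsewhere; the function names and the example numbers are hypothetical, and the labels follow the table above.

```python
def epsilon_squared(h_statistic, n_total):
    """Epsilon squared from the Kruskal-Wallis H and the total sample size."""
    return h_statistic * (n_total + 1) / (n_total ** 2 - 1)

def interpret_epsilon_squared(value):
    """Guideline labels for epsilon squared, taken from the table."""
    if value >= 0.26:
        return "large"
    if value >= 0.08:
        return "medium"
    if value >= 0.01:
        return "small"
    return "below small"

eps = epsilon_squared(h_statistic=6.0, n_total=31)  # made-up test result
print(f"{eps:.3f}", interpret_epsilon_squared(eps))
```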

robust

Test: Heteroscedastic one-way ANOVA for trimmed means
Effect size: Explanatory measure of effect size (\(\xi\))

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| \(\xi\) | 0.10 – < 0.30 | 0.30 – < 0.50 | ≥ 0.50 | [0,1] |

Reference: https://CRAN.R-project.org/package=WRS2/vignettes/WRS2.pdf

Contingency table analyses

association test - unpaired

Test: Pearson’s \(\chi^2\) test
Effect size: Cramér’s V

In the following table, k is the minimum number of categories in either rows or columns.

| k | Small | Medium | Large | Range |
|---|---|---|---|---|
| k = 2 | 0.10 – < 0.30 | 0.30 – < 0.50 | ≥ 0.50 | [0,1] |
| k = 3 | 0.07 – < 0.20 | 0.20 – < 0.35 | ≥ 0.35 | [0,1] |
| k = 4 | 0.06 – < 0.17 | 0.17 – < 0.29 | ≥ 0.29 | [0,1] |

Reference: https://rcompanion.org/handbook/H_10.html
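
The dependence of the guidelines on k is easier to see with a worked computation. The Python sketch below computes Cramér’s V from a table of counts using the standard definition \(V = \sqrt{\chi^2 / (n (k - 1))}\) and then looks up the label for the matching k. It is an illustration with hypothetical function names and made-up counts: no continuity or bias correction is applied, so results may differ from what a particular library reports.

```python
import math

# Guideline cut-offs (small, medium, large) by k, copied from the table above
V_THRESHOLDS = {2: (0.10, 0.30, 0.50), 3: (0.07, 0.20, 0.35), 4: (0.06, 0.17, 0.29)}

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))  # smaller of the row and column counts
    return math.sqrt(chi2 / (n * (k - 1)))

def interpret_v(v, k):
    """Label V using the cut-offs for the given k (only k = 2, 3, 4 are tabled)."""
    small, medium, large = V_THRESHOLDS[k]
    if v >= large:
        return "large"
    if v >= medium:
        return "medium"
    if v >= small:
        return "small"
    return "below small"

counts = [[30, 10], [15, 25]]  # made-up 2 x 2 table
v = cramers_v(counts)
print(f"V = {v:.3f} ({interpret_v(v, k=2)})")
```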

association test - paired

Test: McNemar’s test
Effect size: Cohen’s g

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| Cohen’s g | 0.05 – < 0.15 | 0.15 – < 0.25 | ≥ 0.25 | [0,1] |

Reference: https://rcompanion.org/handbook/H_05.html

goodness-of-fit test

Test: Pearson’s \(\chi^2\) goodness-of-fit test
Effect size: Cramér’s V

In the following table, k is the number of categories.

| k | Small | Medium | Large | Range |
|---|---|---|---|---|
| k = 2 | 0.100 – < 0.300 | 0.300 – < 0.500 | ≥ 0.500 | [0,1] |
| k = 3 | 0.071 – < 0.212 | 0.212 – < 0.354 | ≥ 0.354 | [0,1] |
| k = 4 | 0.058 – < 0.173 | 0.173 – < 0.289 | ≥ 0.289 | [0,1] |
| k = 5 | 0.050 – < 0.150 | 0.150 – < 0.250 | ≥ 0.250 | [0,1] |
| k = 6 | 0.045 – < 0.134 | 0.134 – < 0.224 | ≥ 0.224 | [0,1] |
| k = 7 | 0.043 – < 0.130 | 0.130 – < 0.217 | ≥ 0.217 | [0,1] |
| k = 8 | 0.042 – < 0.127 | 0.127 – < 0.212 | ≥ 0.212 | [0,1] |
| k = 9 | 0.042 – < 0.125 | 0.125 – < 0.209 | ≥ 0.209 | [0,1] |
| k = 10 | 0.041 – < 0.124 | 0.124 – < 0.207 | ≥ 0.207 | [0,1] |

Reference: https://rcompanion.org/handbook/H_03.html

Correlation analyses

parametric

Test: Pearson product-moment correlation coefficient
Effect size: Pearson’s correlation coefficient (r)

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| Pearson’s r | 0.10 – < 0.30 | 0.30 – < 0.50 | ≥ 0.50 | [-1,1] |

non-parametric

Test: Spearman’s rank correlation coefficient
Effect size: Spearman’s rank correlation coefficient (\(\rho\))

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| Spearman’s \(\rho\) | 0.10 – < 0.30 | 0.30 – < 0.50 | ≥ 0.50 | [-1,1] |

robust

Test: Percentage bend correlation coefficient
Effect size: Percentage bend correlation coefficient (\(\rho_{pb}\))

| Effect size | Small | Medium | Large | Range |
|---|---|---|---|---|
| \(\rho_{pb}\) | 0.10 – < 0.30 | 0.30 – < 0.50 | ≥ 0.50 | [-1,1] |

Suggestions

If you find any bugs or have any suggestions/remarks, please file an issue on GitHub: https://github.com/IndrajeetPatil/ggstatsplot/issues

Session Information

For details, see https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/session_info.html