Note this version of the R package and tutorial addresses two errors in the original paper: 1) The harmonic mean p-value for set \(\mathcal{R}\) should be compared against a threshold \(\alpha_L\), not \(\alpha_{|\mathcal{R}|}\). 2) The mechanism for calculating asymptotically exact harmonic mean p-values was wrongly stated. The harmonicmeanp package version 2.0 only corrected error 1. Here both errors are addressed, and many of the results have changed quantitatively. In particular, correcting the errors makes the test less powerful than previously claimed for narrowing down regions of significance. See the updated correction for further information: https://blog.danielwilson.me.uk/2019/07/correction-harmonic-mean-p-value-for.html

The harmonic mean p-value (HMP) is a method for performing a combined test of the null hypothesis that no p-value is significant (Wilson, 2019). It is more robust to dependence between p-values than Fisher’s (1934) method, making it more broadly applicable. Like Bonferroni correction, the HMP procedure controls the strong-sense family-wise error rate (ssFWER). It is more powerful than common alternatives including Bonferroni and Simes procedures when combining large proportions of all the p-values, at the cost of slightly lower power when combining small proportions of all the p-values. It rivals the power of the BH procedure (Benjamini and Hochberg, 1995) which controls both the weak-sense family-wise error rate (wsFWER) and the false discovery rate (FDR), in the sense that the HMP is expected to find one or more p-values or groups of p-values significant more often than the BH procedure finds one or more p-values significant.

Method Robust to dependence Indicative power1 Controls
Significance very rare Significance uncommon Significance common FDR wsFWER ssFWER
Fisher \(\times\) \(\bullet\) \(\bullet\,\bullet\,\bullet\,\bullet\) \(\bullet\,\bullet\,\bullet\,\bullet\,\bullet\) \(\checkmark\) \(\checkmark\) \(\checkmark\)
HMP \(\checkmark\) \(\bullet\,\bullet\,\circ\) \(\bullet\,\bullet\,\bullet\) \(\bullet\,\bullet\,\bullet\,\bullet\) \(\checkmark\) \(\checkmark\) \(\checkmark\)
BH \(\checkmark\checkmark\) \(\bullet\,\bullet\,\circ\) \(\bullet\,\bullet\,\circ\) \(\bullet\,\bullet\,\circ\) \(\checkmark\) \(\checkmark\) \(\times\)
Bonferroni \(\checkmark\checkmark\checkmark\) \(\bullet\,\bullet\,\circ\) \(\bullet\,\bullet\) \(\bullet\,\bullet\) \(\checkmark\) \(\checkmark\) \(\checkmark\)

There are two approaches to applying the HMP:

  1. One compares \(\overset{\circ}{p}_{\mathcal{R}}\), the HMP of a set of p-values \(\mathcal{R}\), to significance threshold \(\alpha_L\,w_\mathcal{R}\).

  2. Equivalently, one compares \(p_{\overset{\circ}{p}_{\mathcal{R}}}\), an asymptotically exact HMP, to the significance threshold \(\alpha\,w_\mathcal{R}\).

In the above,

  • \(\overset{\circ}{p}_{\mathcal{R}}=\left(\sum_{i\in\mathcal{R}}w_{i}\right)/\left(\sum_{i\in\mathcal{R}}w_{i}/p_{i}\right)\) defines the HMP for the set of p-values \(\mathcal{R}\). It is computed in R by \(\texttt{hmp.stat}\).

  • \(\alpha_L\) is a significance threshold that depends on the desired family-wise error rate \(\alpha\) and the total number of individual p-values \(L\). It is computed in R by \(\texttt{qharmonicmeanp}\).

  • \(w_1 \dots w_L\) are weights for the individual p-values that must sum to one, i.e. \(\sum_{i=1}^{L}w_{i}=1\), and \(w_\mathcal{R} = \sum_{i\in\mathcal{R}} w_i\) equals the sum of weights of the individual p-values in set \(\mathcal{R}\). The HMP is robust to the choice of weights, so it is reasonable to start with equal weights (\(w_{i}=1/L\)).

  • The asymptotically exact p-value is computed by R using the \(\texttt{p.hmp}\) command, which takes into account the total number of individual p-values, \(L\).

In this tutorial, I will focus on the second approach because one usually wishes to quote in reports a p-value that can be directly compared to the usual significance threshold, e.g. \(\alpha=0.05\). When the subscript \(\mathcal{R}\) is dropped from \(\overset{\circ}{p}_{\mathcal{R}}\) and \(p_{\overset{\circ}{p}_{\mathcal{R}}}\) it means \(\mathcal{R}\) includes all the p-values. This is the “headline” HMP.

The method may be used as follows:

  • The headline HMP is deemed significant when \(\overset{\circ}{p}\leq\alpha_L\) or, equivalently, \(p_{\overset{\circ}{p}}\leq\alpha\). Here significant means that we reject the null hypothesis that none of the p-values are significant.

  • If the headline HMP is not significant, neither is the HMP for any subset. If the headline HMP is significant, subsets may also be significant. The significance thresholds are all pre-determined so the number of subsets that are tested does not affect them.

  • The HMP for a subset of p-values is deemed significant when \(\overset{\circ}{p}_{\mathcal{R}}\leq\alpha_L\,w_{\mathcal{R}}\) or, equivalently, \(p_{\overset{\circ}{p}_{\mathcal{R}}}\leq\alpha\,w_{\mathcal{R}}\). Here significant means that we reject the null hypothesis that none of the p-values in subset \(\mathcal{R}\) are significant.


To use this tutorial, copy and paste the R code from your web browser to the R console. In the HTML version, you can select and copy R code simply by clicking within the code snippet (as long as JQuery is enabled in your web browser and the page was compiled with Rmarkdown v2). To ensure the tutorial runs correctly, execute each code snippet once, in order. This tutorial was compiled using Rmarkdown (Allaire et al., 2018), with equations rendered by Mathjax.

If you haven’t already installed the package, type at the R console:


Once you have installed the package, load it in the usual way:

## Loading required package: FMStable

To check you have version 3.0 installed, type


Example 1. Sliding Window Analysis

Download the 312457 p-values from chromosome 12 of the genome-wide association study (GWAS) for neuroticism (Okbay et al., 2016). This file is an excerpt of the original data. It took me a few seconds to download the data excerpt. The 8 megabyte file contains rs identifiers and SNP positions as per human genome build GRCh37/hg19 as well as the p-values.

system.time((gwas = read.delim("https://www.danielwilson.me.uk/files/Neuroticism_ch12.txt",
##    user  system elapsed 
##   0.795   0.046   1.447
##            rs    pos      p
## 1   rs7959779 149478 0.3034
## 2   rs4980821 149884 0.5905
## 3 rs192950336 150256 0.1125
## 4  rs61907205 151213 0.4896
## 5   rs2368809 151236 0.7066
## 6   rs4018398 151469 0.9420

The harmonic mean p-value (HMP) is a statistic with which one can perform a combined test of the null hypothesis that none of the p-values is significant even when the p-values are dependent. In GWAS, p-values will often be dependent because of genetic linkage. The HMP can be used to test the null hypothesis that no SNPs on chromosome 12 are significant. Let’s do it manually by first calculating the HMP, assuming equal weights. Note that a total of L=6524432 tests were performed genome-wide, so this number must be used to determine the weights if we are to control the genome-wide ssFWER, even though we are only analysing the 312457 SNPs on chromosome 12 in this example.

L = 6524432
gwas$w = 1/L
R = 1:nrow(gwas)
(HMP.R = sum(gwas$w[R])/sum(gwas$w[R]/gwas$p[R]))
## [1] 0.0008734522

One of the remarkable properties of the HMP is that for small values (e.g. below 0.05), the HMP can be directly interpreted as a p-value after adjusting for multiple comparisons. That the HMP equals \(\overset{\circ}{p}_{\mathcal{R}}=0.0008734522\) suggests it is strongly significant before multiple testing correction. To test this formally, first the HMP significance threshold is computed. For that I will assume a false positive rate of \(\alpha=0.05\), i.e. 5%.

# Specify the false positive rate
alpha = 0.05
# Compute the HMP significance threshold
(alpha.L = qharmonicmeanp(alpha, L))
## [1] 0.02593083

The multiple testing-adjusted threshold against which to evaluate the significance of the combined test is determined by the sum of the weights for the p-values being combined. The HMP for subset \(\mathcal{R}\) is significant when \(\overset{\circ}{p}_\mathcal{R}\leq \alpha_L w_\mathcal{R}\).

# Test whether the HMP for subset R is significance
w.R = sum(gwas$w[R])
alpha.L * w.R
## [1] 0.001241835

Therefore after adjusting for multiple comparison we can reject the null hypothesis of no association on chromosome 12 at level \(\alpha=0.05\) because 0.0008734522 is below 0.001241835.

An equivalent approach is to calculate an asymptotically exact p-value based on the HMP.

# Use p.hmp instead to compute the HMP test statistic and
# calculate its asymptotically exact p-value in one step
# Note this line has changed because of a previous error.
w.R*pharmonicmeanp(HMP.R/w.R, L=L, lower.tail=TRUE)
## [1] 0.001343897
# Compare it to the multiple testing threshold
## [1] 0.002394515

The asymptotically exact p-value of \(p_{\overset{\circ}{p}_{\mathcal{R}}}=0.001343897\) is close to the HMP of \(\overset{\circ}{p}_{\mathcal{R}}=0.0008734522\) and also significant because it is below \(0.002394515\). Note however that direct interpretation of the HMP is anti-conservative compared to the asymptotically exact test, which is why the HMP had to be compared directly to the more stringent threshold \(\alpha_L=0.02593083\). The asymptotically exact p-value can be computed in one step:

# Note that the p.hmp function has been redefined to take argument L. Omitting L will issue a warning.
R = 1:nrow(gwas)
##       p.hmp 
## 0.001343897

The combined p-value for chromosome 12 is useful because if the combined p-value is not significant, neither is any constituent p-value, after multiple testing correction, as always. Conversely, if the combined p-value is significant, there may be one or more subsets of constituent p-values that are also significant. These subsets can be hunted down because another useful property of the HMP is that the significance thresholds of these further tests are the same no matter how many combinations of subsets of the constituent p-values are tested. Specifically, for any subset \(\mathcal{R}\) of the L p-values, the HMP is compared against a threshold \(\alpha_L\,w_{\mathcal{R}}\) (equivalently, the asymptotically exact HMP is compared against a threshold \(\alpha\,w_{\mathcal{R}}\)), where \(w_{\mathcal{\mathcal{R}}}=\sum_{i\in\mathcal{R}}w_{i}\) and the \(w_{i}\)s are the weights of the individual p-values, constrained to sum to one. Assuming equal weights, \(w_{i}=1/L\), meaning that \(w_{\mathcal{R}}=\left|\mathcal{R}\right|/L\) equals the fraction of all tests being combined. In what follows I will mainly use the asymptotically exact p-values, rather than directly interpreting the HMP.

For example, separately test the p-values occurring at even and odd positions on chromosome 12:

R = which(gwas$pos%%2==0)
##       p.hmp 
## 0.002658581
w.R = sum(gwas$w[R])
## [1] 0.001200587
R = which(gwas$pos%%2==1)
##      p.hmp 
## 0.00230653
w.R = sum(gwas$w[R])
## [1] 0.001193928

Neither of the two tests is significant individually: for even positions, the combined p-value was \(p_{\overset{\circ}{p}_{\mathcal{R}}}=0.002658581\) which was above the significance threshold of \(\alpha\,w_{\mathcal{R}}=0.001200587\) and for odd positions, the combined p-value was \(p_{\overset{\circ}{p}_{\mathcal{R}}}=0.00230653\) which was above the significance threshold of \(\alpha\,w_{\mathcal{R}}=0.001193928\).

Comparing p-values with different significance thresholds can be confusing. Instead, it is useful to calculate adjusted p-values, which are compared directly to \(\alpha\), the intended strong-sense familywise error rate. An adjusted p-value is simply divided by its weight w. For example:

R = which(gwas$pos%%2==0)
p.R = p.hmp(gwas$p[R],gwas$w[R],L)
w.R = sum(gwas$w[R])
(p.R.adjust = p.R/w.R)
##   p.hmp 
## 0.11072
R = which(gwas$pos%%2==1)
p.R = p.hmp(gwas$p[R],gwas$w[R],L)
w.R = sum(gwas$w[R])
(p.R.adjust = p.R/w.R)
##      p.hmp 
## 0.09659422

Now it is easy to see that both tests are non-significant, assuming \(\alpha=0.05\).

Of course it makes little sense to combine p-values according to whether their position is an even or odd number. Instead we might wish to test the first 156229 SNPs on chromosome 12 separately from the second 156228 SNPs to begin to narrow down regions of significance.

R = 1:156229
p.R = p.hmp(gwas$p[R],gwas$w[R],L)
w.R = sum(gwas$w[R])
(p.R.adjust = p.R/w.R)
## p.hmp 
##     1
R = 156230:312457
p.R = p.hmp(gwas$p[R],gwas$w[R],L)
w.R = sum(gwas$w[R])
(p.R.adjust = p.R/w.R)
##      p.hmp 
## 0.02842931

This is much clearer: only in the second half of the chromosome can we reject the null hypothesis of no significant p-values at the \(\alpha=0.05\) level. For the first half of the chromosome, the adjusted p-value was \(p_{\overset{\circ}{p}_{\mathcal{R}}}/w_{\mathcal{R}}=1\). By the corrected definition of asymptotically exact HMPs, the adjusted p-value will not exceed 1, although in general while p-values must be 1 or below, adjusted p-values need not be. For the second half of the chromosome, the adjusted p-value was \(p_{\overset{\circ}{p}_{\mathcal{R}}}/w_{\mathcal{R}}=0.02842931\) which is below the standard significance threshold of \(\alpha=0.05\).

Note that it was completely irrelevant that we had already performed tests of even- and odd-positioned SNPs: as mentioned above, the significance thresholds are pre-determined by the \(w_{\mathcal{R}}\)’s no matter how many subsets of p-values are tested and no matter in what combinations. We can test any subset of the p-values without incurring further multiple testing penalties. For example, let’s test 50 megabase windows overlaping at 10 megabase intervals. Testing overlapping versus non-overlapping windows has no effect on the significance thresholds, but of course it has an effect on the resolution of our conclusions and on the computational time.

# Define overlapping sliding windows of 50 megabase at 10 megabase intervals
win.50M.beg = outer(0:floor(max(gwas$pos/50e6-1)),(0:4)/5,"+")*50e6
win.50M.beg = win.50M.beg[win.50M.beg+50e6<=max(gwas$pos)]
# Calculate the combined p-values for each window
  p.50M = sapply(win.50M.beg,function(beg) {
    R = which(gwas$pos>=beg & gwas$pos<(beg+50e6))
##    user  system elapsed 
##   0.083   0.017   0.117
# Calculate sums of weights for each combined test
  w.50M = sapply(win.50M.beg,function(beg) {
    R = which(gwas$pos>=beg & gwas$pos<(beg+50e6))
##    user  system elapsed 
##   0.049   0.009   0.059
# Calculate adjusted p-value for each window
p.50M.adj = p.50M/w.50M 

Now plot them

# Took a few seconds, plotting over 312k points
gwas$p.adj = gwas$p/gwas$w
plot(gwas$pos/1e6,-log10(gwas$p.adj),pch=".",xlab="Position on chromosome 12 (megabases)",
  ylab="Adjusted significance (-log10 adjusted p-value)",
# Superimpose the significance threshold, alpha, e.g. alpha=0.05
# When using the HMP to evaluate individual p-values, the HMP threshold must be used,
# which is slightly more stringent than Bonferroni for individual tests
# For comparison, plot the conventional GWAS threshold of 5e-8. Need to convert
# this into the adjusted p-value scale. Instead of comparing each raw p-value
# against a Bonferonni threshold of alpha/L=0.05/6524432, we would be comparing each
# against 5e-8. So the adjusted p-values p/w=p*L would be compared against
# 5e-8*L = 5e-8 * 6524432 = 0.3262216