About

Running multiple statistical tests on the same dataset inflates the familywise Type I error rate. With 20 independent tests at α = 0.05, the probability of at least one false positive reaches 1 − (1 − 0.05)^20 ≈ 0.64. The Bonferroni correction divides the significance threshold α by the number of comparisons m, controlling the familywise error rate (FWER) at the cost of statistical power. This tool implements four correction methods: classical Bonferroni, the less conservative Šidák correction, the step-down Holm-Bonferroni procedure, and the Benjamini-Hochberg procedure for false discovery rate control. Each method trades off Type I and Type II error differently.

Misapplying corrections leads to two costly outcomes: applying no correction inflates false discoveries in genomics, neuroimaging, or A/B testing, while over-correcting with Bonferroni on correlated tests masks true effects. This calculator flags which hypotheses survive each method, letting you compare power loss across approaches. Note: Šidák assumes independent tests and Benjamini-Hochberg assumes independent or positively dependent (PRDS) tests; Bonferroni and Holm remain valid under any dependence. For strongly correlated test statistics, consider permutation-based methods not covered here.

Formulas

The Bonferroni correction adjusts the significance level to maintain the familywise error rate (FWER) at or below a desired α. The adjusted threshold is:

αadj = α / m

A hypothesis Hi is rejected if pi ≤ αadj. The Šidák correction provides a slightly less conservative threshold for independent tests:

αŠidák = 1 − (1 − α)^(1/m)
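
Both thresholds are one-liners; a minimal Python sketch (function names are illustrative, not the calculator's code):

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Classical Bonferroni: reject p_i when p_i <= alpha / m."""
    return alpha / m

def sidak_threshold(alpha: float, m: int) -> float:
    """Sidak: 1 - (1 - alpha)**(1/m); valid for independent tests."""
    return 1 - (1 - alpha) ** (1 / m)

alpha, m = 0.05, 20
print(round(bonferroni_threshold(alpha, m), 6))  # 0.0025
print(round(sidak_threshold(alpha, m), 6))       # 0.002561, slightly less strict
```

The Šidák threshold is always at least as large as α/m, which is why it rescues slightly more hypotheses when independence holds.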

The Holm-Bonferroni step-down procedure sorts p-values in ascending order p(1) ≤ p(2) ≤ … ≤ p(m) and rejects H(i) while:

p(i) ≤ α / (m − i + 1)

The Benjamini-Hochberg procedure controls the false discovery rate (FDR). It sorts p-values ascending and finds the largest k such that:

p(k) ≤ (k / m) × α

All hypotheses H(1), …, H(k) are then rejected.

Where: α = nominal significance level (typically 0.05), m = total number of comparisons (hypotheses tested), pi = observed p-value for test i, i = rank of sorted p-value.
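
The two step procedures above can be sketched in plain Python (a hand-rolled illustration of the formulas, not the tool's actual implementation):

```python
def holm(pvals, alpha=0.05):
    """Holm step-down: reject H_(i) while p_(i) <= alpha / (m - i + 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for step, idx in enumerate(order):        # step 0 corresponds to i = 1
        if pvals[idx] <= alpha / (m - step):  # alpha / (m - i + 1)
            reject[idx] = True
        else:
            break                             # first failure stops the procedure
    return reject

def benjamini_hochberg(pvals, alpha=0.05):
    """BH: find the largest k with p_(k) <= (k/m) * alpha, reject H_(1..k)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject

pvals = [0.001, 0.008, 0.02, 0.028, 0.2]
print(holm(pvals))                # [True, True, False, False, False]
print(benjamini_hochberg(pvals))  # [True, True, True, True, False]
```

Holm stops at the first failure, while BH scans for the largest passing rank; that difference is why BH can reject hypotheses that Holm cannot.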

Reference Data

| Method | Controls | Adjusted Threshold | Power | Assumption | Best For |
| --- | --- | --- | --- | --- | --- |
| Bonferroni | FWER | α ÷ m | Lowest | Any dependence | Small m, strict control |
| Šidák | FWER | 1 − (1 − α)^(1/m) | Slightly higher | Independent tests | Independent tests, moderate m |
| Holm-Bonferroni | FWER | α ÷ (m − i + 1) | Higher than Bonferroni | Any dependence | General use, uniformly more powerful |
| Benjamini-Hochberg | FDR | (i ÷ m) × α | Highest | Independent or PRDS | Large m, exploratory research |
| Bonferroni (m=5, α=0.05) | FWER | 0.01 | – | – | Example threshold |
| Bonferroni (m=10, α=0.05) | FWER | 0.005 | – | – | Example threshold |
| Bonferroni (m=20, α=0.05) | FWER | 0.0025 | – | – | Example threshold |
| Bonferroni (m=50, α=0.05) | FWER | 0.001 | – | – | Example threshold |
| Bonferroni (m=100, α=0.05) | FWER | 0.0005 | – | – | Example threshold |
| Bonferroni (m=1000, α=0.05) | FWER | 0.00005 | – | – | Genomics scale |
| Bonferroni (m=20000, α=0.05) | FWER | 2.5 × 10^−6 | – | – | GWAS standard |
| Uncorrected | Per-comparison | α | Maximum | Single test | Single hypothesis only |
| FWER at m tests | – | 1 − (1 − α)^m | – | Independent | Calculating inflation |
| Genome-wide significance | FWER | 5 × 10^−8 | – | ~10^6 independent SNPs | GWAS convention |
| Suggestive significance | FWER | 1 × 10^−5 | – | – | GWAS follow-up |
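
The inflation row (1 − (1 − α)^m) is easy to verify numerically; a quick Python sketch reproducing the m = 20 figure from the About section:

```python
def fwer(alpha: float, m: int) -> float:
    """Familywise error rate for m independent tests at per-test alpha."""
    return 1 - (1 - alpha) ** m

for m in (1, 5, 20, 100):
    print(m, round(fwer(0.05, m), 4))
# m = 20 gives ~0.6415: roughly a 64% chance of at least one false positive.
```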

Frequently Asked Questions

When should I use Bonferroni versus Benjamini-Hochberg?
Use Bonferroni (or Holm) when you need strict familywise error rate control - meaning you want to guarantee that the probability of even one false positive stays below α. This is critical in confirmatory clinical trials or safety analyses. Use Benjamini-Hochberg when you can tolerate a controlled proportion of false discoveries among rejected hypotheses, which is typical in exploratory genomics, proteomics, or screening studies with hundreds or thousands of tests. BH preserves far more statistical power at scale.

Why is Bonferroni so conservative at large m?
The adjusted threshold α ÷ m shrinks in proportion to 1/m. At m = 1000 with α = 0.05, the threshold drops to 0.00005. Only very strong effects survive this cutoff, dramatically increasing Type II errors (false negatives). The method budgets for the worst-case dependence structure, which wastes power when tests are independent or weakly correlated. For large m, Holm-Bonferroni is strictly more powerful with identical FWER control.

Does Bonferroni assume the tests are independent?
No. Bonferroni is valid under any dependence structure - this is its strength and weakness. It uses the union bound (Boole's inequality): P(at least one false positive) ≤ m × (α/m) = α. Because it makes no distributional assumptions, it controls FWER regardless of correlation. The Šidák correction does assume independence and gives a slightly less conservative threshold as a result.

How is Holm-Bonferroni more powerful than Bonferroni?
Holm's step-down procedure is uniformly more powerful than Bonferroni while controlling the same FWER. It works by sorting p-values and comparing each against progressively relaxed thresholds: α/m, α/(m − 1), α/(m − 2), etc. The first non-rejected hypothesis stops the procedure. This means later tests face easier thresholds, rescuing borderline-significant results that Bonferroni would miss. There is no scenario where Bonferroni rejects a hypothesis but Holm does not.

What is the difference between FWER and FDR?
FWER (familywise error rate) is the probability of making at least one Type I error across all tests. FDR (false discovery rate) is the expected proportion of rejected hypotheses that are false positives. With m = 100 tests and FDR controlled at 0.05, if you reject 20 hypotheses, you expect about 1 to be a false positive. FWER control at the same level demands a high probability of no false positives at all. The choice depends on the cost of false positives: drug approvals need FWER; gene expression screening tolerates FDR.

Can I use Bonferroni when my tests are correlated?
You can, and it remains valid - but it becomes extremely wasteful. Highly correlated tests carry redundant information, so treating each as independent inflates m artificially. For repeated measures, consider Tukey's HSD or Dunnett's test. For nested/hierarchical designs, consider hierarchical FDR procedures or permutation-based corrections that account for the correlation structure. The effective number of independent comparisons m_eff can be estimated via eigenvalue decomposition of the correlation matrix.
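
The eigenvalue approach mentioned above can be sketched with one common estimator, the Nyholt (2004) eigenvalue-variance formula; choosing this particular estimator is an assumption here, since the text names only the general idea:

```python
import numpy as np

def meff_nyholt(corr: np.ndarray) -> float:
    """Effective number of independent tests from a correlation matrix:
    M_eff = 1 + (M - 1) * (1 - Var(eigenvalues) / M)  (Nyholt 2004)."""
    m = corr.shape[0]
    lam = np.linalg.eigvalsh(corr)
    return 1 + (m - 1) * (1 - lam.var(ddof=1) / m)

# Identity correlation (independent tests): M_eff equals M.
print(meff_nyholt(np.eye(4)))        # 4.0
# Perfectly correlated tests: M_eff collapses toward 1.
print(meff_nyholt(np.ones((4, 4))))  # approximately 1.0
```

In practice the estimated m_eff replaces m in the Bonferroni or Šidák formula, recovering some of the power lost to redundant tests.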