About

Running multiple statistical tests on the same dataset inflates the familywise Type I error rate. With 20 independent tests at α = 0.05, the probability of at least one false positive reaches 1 − (1 − 0.05)^20 ≈ 0.64. The Bonferroni correction divides the significance threshold α by the number of comparisons m, controlling the familywise error rate (FWER) at the cost of statistical power. This tool implements four correction methods: classical Bonferroni, the less conservative Šidák correction, the step-down Holm-Bonferroni procedure, and the Benjamini-Hochberg procedure for false discovery rate control. Each method trades off Type I and Type II error differently.

Misapplying corrections leads to two costly outcomes: applying no correction inflates false discoveries in genomics, neuroimaging, or A/B testing, while over-correcting with Bonferroni on correlated tests masks true effects. This calculator flags which hypotheses survive each method, letting you compare power loss across approaches. Note: Šidák assumes independent tests and Benjamini-Hochberg assumes independent or positively dependent (PRDS) tests; Bonferroni and Holm remain valid under any dependence. For strongly correlated test statistics, consider permutation-based methods not covered here.

Formulas

The Bonferroni correction adjusts the significance level to maintain the familywise error rate (FWER) at or below a desired α. The adjusted threshold is:

αadj = α / m

A hypothesis Hi is rejected if pi ≤ αadj. The Šidák correction provides a slightly less conservative threshold for independent tests:

αŠidák = 1 − (1 − α)^(1/m)
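
Both thresholds are one-liners; a minimal Python sketch (function names are illustrative, not the calculator's code):

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Classical Bonferroni: reject p_i when p_i <= alpha / m."""
    return alpha / m

def sidak_threshold(alpha: float, m: int) -> float:
    """Sidak: 1 - (1 - alpha)**(1/m); valid for independent tests."""
    return 1 - (1 - alpha) ** (1 / m)

alpha, m = 0.05, 20
print(round(bonferroni_threshold(alpha, m), 6))  # 0.0025
print(round(sidak_threshold(alpha, m), 6))       # 0.002561, slightly less strict
```

The Šidák threshold is always at least as large as α/m, which is why it rescues slightly more hypotheses when independence holds.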

The Holm-Bonferroni step-down procedure sorts p-values in ascending order p(1) ≤ p(2) ≤ … ≤ p(m) and rejects H(i) while:

p(i) ≤ α / (m − i + 1)

The Benjamini-Hochberg procedure controls the false discovery rate (FDR). It sorts p-values ascending and finds the largest k such that:

p(k) ≤ (k / m) × α

All hypotheses H(1), …, H(k) are then rejected.

Where: α = nominal significance level (typically 0.05), m = total number of comparisons (hypotheses tested), pi = observed p-value for test i, i = rank of sorted p-value.
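
The two step procedures above can be sketched in plain Python (a hand-rolled illustration of the formulas, not the tool's actual implementation):

```python
def holm(pvals, alpha=0.05):
    """Holm step-down: reject H_(i) while p_(i) <= alpha / (m - i + 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for step, idx in enumerate(order):        # step 0 corresponds to i = 1
        if pvals[idx] <= alpha / (m - step):  # alpha / (m - i + 1)
            reject[idx] = True
        else:
            break                             # first failure stops the procedure
    return reject

def benjamini_hochberg(pvals, alpha=0.05):
    """BH: find the largest k with p_(k) <= (k/m) * alpha, reject H_(1..k)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject

pvals = [0.001, 0.008, 0.02, 0.028, 0.2]
print(holm(pvals))                # [True, True, False, False, False]
print(benjamini_hochberg(pvals))  # [True, True, True, True, False]
```

Holm stops at the first failure, while BH scans for the largest passing rank; that difference is why BH can reject hypotheses that Holm cannot.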

Reference Data

| Method | Controls | Adjusted Threshold | Power | Assumption | Best For |
| --- | --- | --- | --- | --- | --- |
| Bonferroni | FWER | α ÷ m | Lowest | Any dependence | Small m, strict control |
| Šidák | FWER | 1 − (1 − α)^(1/m) | Slightly higher | Independent tests | Independent tests, moderate m |
| Holm-Bonferroni | FWER | α ÷ (m − i + 1) | Higher than Bonferroni | Any dependence | General use, uniformly more powerful |
| Benjamini-Hochberg | FDR | (i ÷ m) × α | Highest | Independent or PRDS | Large m, exploratory research |
| Bonferroni (m=5, α=0.05) | FWER | 0.01 | – | – | Example threshold |
| Bonferroni (m=10, α=0.05) | FWER | 0.005 | – | – | Example threshold |
| Bonferroni (m=20, α=0.05) | FWER | 0.0025 | – | – | Example threshold |
| Bonferroni (m=50, α=0.05) | FWER | 0.001 | – | – | Example threshold |
| Bonferroni (m=100, α=0.05) | FWER | 0.0005 | – | – | Example threshold |
| Bonferroni (m=1000, α=0.05) | FWER | 0.00005 | – | – | Genomics scale |
| Bonferroni (m=20000, α=0.05) | FWER | 2.5 × 10^−6 | – | – | GWAS standard |
| Uncorrected | Per-comparison | α | Maximum | Single test | Single hypothesis only |
| FWER at m tests | – | 1 − (1 − α)^m | – | Independent | Calculating inflation |
| Genome-wide significance | FWER | 5 × 10^−8 | – | ~10^6 independent SNPs | GWAS convention |
| Suggestive significance | FWER | 1 × 10^−5 | – | – | GWAS follow-up |
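
The inflation row (1 − (1 − α)^m) is easy to verify numerically; a quick Python sketch reproducing the m = 20 figure from the About section:

```python
def fwer(alpha: float, m: int) -> float:
    """Familywise error rate for m independent tests at per-test alpha."""
    return 1 - (1 - alpha) ** m

for m in (1, 5, 20, 100):
    print(m, round(fwer(0.05, m), 4))
# m = 20 gives ~0.6415: roughly a 64% chance of at least one false positive.
```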

Frequently Asked Questions

When should I use Bonferroni versus Benjamini-Hochberg?
Use Bonferroni (or Holm) when you need strict familywise error rate control - meaning you want to guarantee that the probability of even one false positive stays below α. This is critical in confirmatory clinical trials or safety analyses. Use Benjamini-Hochberg when you can tolerate a controlled proportion of false discoveries among rejected hypotheses, which is typical in exploratory genomics, proteomics, or screening studies with hundreds or thousands of tests. BH preserves far more statistical power at scale.

Why is Bonferroni so conservative at large m?
The adjusted threshold α ÷ m shrinks in proportion to 1/m. At m = 1000 with α = 0.05, the threshold drops to 0.00005. Only very strong effects survive this cutoff, dramatically increasing Type II errors (false negatives). The method budgets for the worst-case dependence structure, which wastes power when tests are independent or weakly correlated. For large m, Holm-Bonferroni is strictly more powerful with identical FWER control.

Does Bonferroni assume the tests are independent?
No. Bonferroni is valid under any dependence structure - this is its strength and weakness. It uses the union bound (Boole's inequality): P(at least one false positive) ≤ m × (α/m) = α. Because it makes no distributional assumptions, it controls FWER regardless of correlation. The Šidák correction does assume independence and gives a slightly less conservative threshold as a result.

How is Holm-Bonferroni more powerful than Bonferroni?
Holm's step-down procedure is uniformly more powerful than Bonferroni while controlling the same FWER. It works by sorting p-values and comparing each against progressively relaxed thresholds: α/m, α/(m − 1), α/(m − 2), etc. The first non-rejected hypothesis stops the procedure. This means later tests face easier thresholds, rescuing borderline-significant results that Bonferroni would miss. There is no scenario where Bonferroni rejects a hypothesis but Holm does not.

What is the difference between FWER and FDR?
FWER (familywise error rate) is the probability of making at least one Type I error across all tests. FDR (false discovery rate) is the expected proportion of rejected hypotheses that are false positives. With m = 100 tests and FDR controlled at 0.05, if you reject 20 hypotheses, you expect about 1 to be a false positive. FWER control at the same level demands a high probability of no false positives at all. The choice depends on the cost of false positives: drug approvals need FWER; gene expression screening tolerates FDR.

Can I use Bonferroni when my tests are correlated?
You can, and it remains valid - but it becomes extremely wasteful. Highly correlated tests carry redundant information, so treating each as independent inflates m artificially. For repeated measures, consider Tukey's HSD or Dunnett's test. For nested/hierarchical designs, consider hierarchical FDR procedures or permutation-based corrections that account for the correlation structure. The effective number of independent comparisons m_eff can be estimated via eigenvalue decomposition of the correlation matrix.
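
The eigenvalue approach mentioned above can be sketched with one common estimator, the Nyholt (2004) eigenvalue-variance formula; choosing this particular estimator is an assumption here, since the text names only the general idea:

```python
import numpy as np

def meff_nyholt(corr: np.ndarray) -> float:
    """Effective number of independent tests from a correlation matrix:
    M_eff = 1 + (M - 1) * (1 - Var(eigenvalues) / M)  (Nyholt 2004)."""
    m = corr.shape[0]
    lam = np.linalg.eigvalsh(corr)
    return 1 + (m - 1) * (1 - lam.var(ddof=1) / m)

# Identity correlation (independent tests): M_eff equals M.
print(meff_nyholt(np.eye(4)))        # 4.0
# Perfectly correlated tests: M_eff collapses toward 1.
print(meff_nyholt(np.ones((4, 4))))  # approximately 1.0
```

In practice the estimated m_eff replaces m in the Bonferroni or Šidák formula, recovering some of the power lost to redundant tests.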