Bayesian A/B Test Calculator
Free Bayesian A/B test calculator. Compute posterior distributions, probability to win, expected loss, and 95% credible intervals using Monte Carlo simulation.
About
Frequentist A/B tests answer the wrong question. A p-value is the probability of observing data at least as extreme as yours if no effect exists, not the probability that variant B actually beats variant A. Misinterpreting p < 0.05 as "95% chance B is better" is a statistical error that costs organizations real revenue. This calculator uses Bayesian inference with a Beta-Binomial conjugate model. Each variant's conversion rate θ is given a Beta(α0, β0) prior, and its posterior is Beta(α0 + c, β0 + n − c), where c is conversions and n is visitors. Monte Carlo sampling (100,000 draws) then estimates the probability that each variant wins and its expected loss directly from the posteriors.
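As a rough illustration of the update and sampling steps described above, the following Python sketch computes the posterior parameters for each variant and estimates the probability that B beats A. It is not the calculator's own code: the function names `posterior_params` and `probability_b_beats_a`, the fixed seed, and the visitor counts in the usage lines are all assumptions made for the example. It relies only on `random.betavariate` from the standard library.

```python
import random


def posterior_params(conversions, visitors, prior_alpha=1.0, prior_beta=1.0):
    """Return (alpha, beta) of the Beta posterior for one variant.

    Hypothetical helper: implements Beta(alpha0 + c, beta0 + n - c).
    """
    if not 0 <= conversions <= visitors:
        raise ValueError("conversions must be between 0 and visitors")
    return prior_alpha + conversions, prior_beta + visitors - conversions


def probability_b_beats_a(post_a, post_b, draws=100_000, seed=0):
    """Estimate P(theta_B > theta_A) by drawing paired posterior samples."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    wins = 0
    for _ in range(draws):
        theta_a = rng.betavariate(*post_a)
        theta_b = rng.betavariate(*post_b)
        if theta_b > theta_a:
            wins += 1
    return wins / draws


# Hypothetical counts, chosen only for illustration.
post_a = posterior_params(conversions=100, visitors=1000)
post_b = posterior_params(conversions=120, visitors=1000)
print(f"P(B beats A) is roughly {probability_b_beats_a(post_a, post_b):.3f}")
```

The estimate changes slightly with the seed and the number of draws, so treat the last digit as noise.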
Expected loss is the metric that matters for decision-making. It quantifies how much conversion rate (in absolute percentage points) you sacrifice, on average, by choosing a variant that turns out to be the worse one. A variant with 99% probability to win and 0.001% expected loss is a safe call. A variant with 80% probability to win and 2.5% expected loss deserves more data. The tool assumes a uniform prior Beta(1, 1) by default, which treats every conversion rate between 0 and 1 as equally likely. Adjust the prior parameters if you have historical baseline data. Note: this model assumes independent Bernoulli trials with a fixed conversion probability per variant. It does not account for time-varying effects, novelty bias, or segment interactions.
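To make the expected-loss idea concrete, here is a minimal Python sketch that estimates the loss of choosing each variant from the same set of posterior draws. The function name `expected_losses`, the seed, and the example counts are assumptions for illustration, and the loop is a plain Monte Carlo average rather than the calculator's actual implementation.

```python
import random


def expected_losses(a_conv, a_n, b_conv, b_n, prior=(1.0, 1.0),
                    draws=100_000, seed=0):
    """Return (loss_if_choose_a, loss_if_choose_b) as absolute rates.

    Hypothetical helper: averages max(theta_other - theta_chosen, 0)
    over paired draws from the two Beta posteriors.
    """
    rng = random.Random(seed)
    alpha0, beta0 = prior
    loss_a = 0.0
    loss_b = 0.0
    for _ in range(draws):
        theta_a = rng.betavariate(alpha0 + a_conv, beta0 + a_n - a_conv)
        theta_b = rng.betavariate(alpha0 + b_conv, beta0 + b_n - b_conv)
        loss_a += max(theta_b - theta_a, 0.0)  # cost of picking A when B is better
        loss_b += max(theta_a - theta_b, 0.0)  # cost of picking B when A is better
    return loss_a / draws, loss_b / draws


# Hypothetical counts, chosen only for illustration.
loss_a, loss_b = expected_losses(100, 1000, 120, 1000)
print(f"expected loss if A is chosen: {loss_a:.4%}")
print(f"expected loss if B is chosen: {loss_b:.4%}")
```

A simple decision rule under these assumptions is to ship the variant whose expected loss falls below a threshold you pick in advance and to keep collecting data otherwise.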
Formulas
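The Bayesian A/B framework uses a Beta-Binomial conjugate model. Given a prior Beta(α0, β0) and observed data (c conversions from n visitors), the posterior distribution over the conversion rate θ is:
$$\theta \mid c, n \sim \mathrm{Beta}(\alpha_0 + c,\; \beta_0 + n - c)$$
The Beta probability density function is:
$$f(\theta; \alpha, \beta) = \frac{\theta^{\alpha - 1}(1 - \theta)^{\beta - 1}}{B(\alpha, \beta)}$$
where B(α, β) = Γ(α)Γ(β) / Γ(α + β) is the Beta function. The probability that variant B beats variant A is computed via Monte Carlo sampling:
$$P(\theta_B > \theta_A) \approx \frac{1}{N} \sum_{i=1}^{N} I\left(\theta_B^{(i)} > \theta_A^{(i)}\right)$$
Expected loss for choosing variant A is:
$$\mathbb{E}[\max(\theta_B - \theta_A, 0)] \approx \frac{1}{N} \sum_{i=1}^{N} \max\left(\theta_B^{(i)} - \theta_A^{(i)}, 0\right)$$
Where θ = true conversion rate for a variant, θ^(i) = the i-th posterior draw for that variant, α0 = prior alpha parameter (shape 1), β0 = prior beta parameter (shape 2), c = observed conversions, n = total visitors, N = number of Monte Carlo samples, I(⋅) = indicator function returning 1 if the condition is true and 0 otherwise.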
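The Beta density and Beta function above can be evaluated numerically as in the sketch below. It works in log space with `math.lgamma`, because Γ overflows quickly for the parameter sizes a real test produces. The function names are assumptions for this example. Returning 0 outside the open interval (0, 1) is a simplification that ignores the endpoint behaviour of priors such as Beta(0.5, 0.5).

```python
import math


def log_beta_function(alpha, beta):
    """log B(alpha, beta) = log Gamma(alpha) + log Gamma(beta) - log Gamma(alpha + beta)."""
    return math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)


def beta_pdf(theta, alpha, beta):
    """Density of Beta(alpha, beta) at theta, computed in log space."""
    if not 0.0 < theta < 1.0:
        return 0.0  # simplification: endpoints and out-of-range values map to 0
    log_density = (
        (alpha - 1.0) * math.log(theta)
        + (beta - 1.0) * math.log(1.0 - theta)
        - log_beta_function(alpha, beta)
    )
    return math.exp(log_density)


# Posterior for a hypothetical variant with 120 conversions from 1000 visitors
# under a uniform prior: Beta(1 + 120, 1 + 880).
alpha, beta = 121.0, 881.0
for theta in (0.10, 0.12, 0.14):
    print(f"f({theta:.2f}) = {beta_pdf(theta, alpha, beta):.3f}")
```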
Reference Data
| Metric | Definition | Decision Threshold | Notes |
|---|---|---|---|
| Probability to Win | P(θB > θA) | ≥ 95% | Most common stopping rule |
| Expected Loss (A) | E[max(θB − θA, 0)] | ≤ 0.1% absolute | Risk of choosing A when B is better |
| Expected Loss (B) | E[max(θA − θB, 0)] | ≤ 0.1% absolute | Risk of choosing B when A is better |
| 95% Credible Interval | Central interval containing 95% of posterior mass | Narrower is better | Not the same as a confidence interval |
| Posterior Mean | α / (α + β) | - | Shrinks toward prior with small samples |
| Posterior Variance | αβ / ((α + β)²(α + β + 1)) | - | Decreases with more data |
| Relative Uplift | (θB − θA) / θA × 100% | Context-dependent | Computed from posterior means |
| Prior: Uniform | Beta(1, 1) | Default | No prior knowledge assumed |
| Prior: Jeffreys | Beta(0.5, 0.5) | Alternative | Minimally informative, invariant prior |
| Prior: Informed | Beta(α0, β0) | Custom | Use historical data to set parameters |
| Sample Size Guidance | - | ≥ 100 conversions per variant | Rule of thumb for stable posteriors |
| Minimum Detectable Effect | Smallest Δθ the test can resolve | 1-5% relative | Depends on sample size |
| Monte Carlo Error | ∝ 1/√N | Standard error ≤ 0.16% for probability to win at 100k samples | Increase samples for precision |
| Conjugacy | Beta prior × Binomial likelihood → Beta posterior | - | Closed-form update, exact |
| Stopping Rule | Expected loss below threshold | 0.01-0.5% | Unlike fixed-horizon frequentist tests, interim looks do not invalidate the posterior |
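
Several rows of the table (posterior mean, posterior variance, 95% credible interval, relative uplift) can be computed together, as in the sketch below. The function names, the dictionary keys, and the example counts are assumptions for illustration. The credible interval is approximated from sorted Monte Carlo draws rather than an exact inverse CDF, since the Python standard library does not provide one for the Beta distribution.

```python
import random


def posterior_summary(conversions, visitors, prior_alpha=1.0, prior_beta=1.0,
                      draws=100_000, seed=0):
    """Summarize one variant's Beta posterior (hypothetical helper)."""
    alpha = prior_alpha + conversions
    beta = prior_beta + visitors - conversions
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1.0))

    # Central 95% interval from the empirical 2.5% and 97.5% quantiles.
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    lower = samples[int(0.025 * draws)]
    upper = samples[min(draws - 1, int(0.975 * draws))]
    return {"mean": mean, "variance": variance,
            "ci_lower": lower, "ci_upper": upper}


def relative_uplift(mean_a, mean_b):
    """Relative uplift of B over A in percent, from posterior means."""
    return (mean_b - mean_a) / mean_a * 100.0


# Hypothetical counts, chosen only for illustration.
a = posterior_summary(100, 1000)
b = posterior_summary(120, 1000)
print(f"A: mean {a['mean']:.4f}, 95% CI [{a['ci_lower']:.4f}, {a['ci_upper']:.4f}]")
print(f"B: mean {b['mean']:.4f}, 95% CI [{b['ci_lower']:.4f}, {b['ci_upper']:.4f}]")
print(f"relative uplift: {relative_uplift(a['mean'], b['mean']):.1f}%")
```

The mean and variance are exact closed-form values, while the interval endpoints carry Monte Carlo noise that shrinks as the number of draws grows.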