Category Statistics & Probability

Paste your numbers (one per line, comma-separated, or space-separated) Accepts integers, decimals, negatives, scientific notation. Non-numeric text is ignored.



About

Benford's Law (also called the Newcomb-Benford Law) predicts that in many naturally occurring datasets, the leading digit d appears with probability P(d) = log₁₀(1 + 1/d). Digit 1 appears roughly 30.1% of the time, not 11.1% as naive intuition suggests. Datasets spanning multiple orders of magnitude (population counts, financial statements, physical constants, election returns) tend to follow this distribution. Failure to conform can indicate data fabrication, rounding bias, or truncation artifacts. Forensic accountants and auditors use Benford analysis as a first-pass anomaly screen on general ledger entries, tax returns, and expense reports.

This calculator extracts the leading significant digit of every value in your dataset, compares observed and expected frequencies, and runs three conformity tests: the χ² goodness-of-fit test (8 degrees of freedom, critical value 15.507 at α = 0.05), Nigrini's Mean Absolute Deviation (MAD), and the Kolmogorov-Smirnov (KS) statistic. The tests assume that observations are independent and that the dataset contains at least 100 values; smaller samples yield unreliable results. Entries with no significant digit (zero, non-numeric text) are silently excluded from the analysis.

Formulas

The probability that the first significant digit equals d (d ∈ {1, 2, …, 9}) under Benford's Law:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$
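As a quick check of the formula, the nine probabilities can be computed directly. This is a minimal sketch, not the calculator's own code; the function name `benford_probability` is chosen here for illustration.

```python
import math


def benford_probability(d: int) -> float:
    """Expected probability that the first significant digit equals d."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Probabilities for digits 1..9; they telescope to log10(10) = 1.
probabilities = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probabilities.items():
    print(f"{d}: {p:.5f}")
```

The printed values should match the Benford Probability column in the reference table.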

The Chi-Squared goodness-of-fit statistic with k − 1 = 8 degrees of freedom:

$$\chi^2 = \sum_{d=1}^{9} \frac{(O_d - E_d)^2}{E_d}$$

where O_d = observed count of digit d, and E_d = N ⋅ P(d) is the expected count given total sample size N. Reject the null hypothesis (data follows Benford) when χ² > 15.507 at significance level α = 0.05.
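The statistic and decision rule described above could be sketched as follows. The two sets of counts are hypothetical, made up for illustration: one sits close to Benford, the other has every digit equally common. The function name `chi_squared` is an assumption, not the calculator's API.

```python
import math

# Benford probabilities P(1)..P(9).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
CRITICAL_VALUE = 15.507  # chi-squared critical value for df = 8, alpha = 0.05


def chi_squared(observed_counts):
    """Return the chi-squared statistic for nine leading-digit counts."""
    if len(observed_counts) != 9:
        raise ValueError("expected nine counts, one per digit 1..9")
    n = sum(observed_counts)
    if n <= 0:
        raise ValueError("sample is empty")
    # E_d = N * P(d) is the expected count for digit d.
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed_counts, BENFORD))


# Hypothetical counts for N = 1000, chosen to sit close to Benford.
near_benford = [301, 176, 125, 97, 79, 67, 58, 51, 46]
# Hypothetical counts for N = 900 with every digit equally common.
uniform = [100] * 9

for name, counts in [("near_benford", near_benford), ("uniform", uniform)]:
    stat = chi_squared(counts)
    verdict = "reject" if stat > CRITICAL_VALUE else "do not reject"
    print(f"{name}: chi2 = {stat:.3f} -> {verdict}")
```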

Nigrini's Mean Absolute Deviation:

$$\mathrm{MAD} = \frac{1}{9} \sum_{d=1}^{9} \left| O_d' - P(d) \right|$$

where O_d′ is the observed proportion (relative frequency) of digit d.
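A minimal sketch of the MAD computation, working from raw counts and converting them to proportions first. The counts are hypothetical and the function name `mean_absolute_deviation` is chosen for illustration.

```python
import math

# Benford probabilities P(1)..P(9).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def mean_absolute_deviation(observed_counts):
    """Average absolute gap between observed proportions and P(d)."""
    if len(observed_counts) != 9:
        raise ValueError("expected nine counts, one per digit 1..9")
    n = sum(observed_counts)
    if n <= 0:
        raise ValueError("sample is empty")
    return sum(abs(o / n - p) for o, p in zip(observed_counts, BENFORD)) / 9


# Hypothetical counts: one close to Benford, one with all digits equally common.
near_benford = [301, 176, 125, 97, 79, 67, 58, 51, 46]
uniform = [100] * 9

print(f"near_benford MAD = {mean_absolute_deviation(near_benford):.4f}")
print(f"uniform MAD      = {mean_absolute_deviation(uniform):.4f}")
```

Note that MAD is computed on proportions, so its scale does not change with N.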

Kolmogorov-Smirnov statistic:

D = max_d |F_obs(d) − F_exp(d)|

where F_obs and F_exp are cumulative distribution functions of observed and expected proportions. Critical value at α = 0.05 is approximated as 1.36 / √N.
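The KS statistic over the nine digit categories could be sketched like this. It uses the 1.36 / √N approximation quoted above for the critical value; the counts and the function name `ks_statistic` are hypothetical.

```python
import math
from itertools import accumulate

# Benford probabilities P(1)..P(9).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def ks_statistic(observed_counts):
    """Return (D, critical value at alpha = 0.05) for nine digit counts."""
    if len(observed_counts) != 9:
        raise ValueError("expected nine counts, one per digit 1..9")
    n = sum(observed_counts)
    if n <= 0:
        raise ValueError("sample is empty")
    f_obs = list(accumulate(o / n for o in observed_counts))  # cumulative observed
    f_exp = list(accumulate(BENFORD))                          # cumulative expected
    d_max = max(abs(a - b) for a, b in zip(f_obs, f_exp))
    return d_max, 1.36 / math.sqrt(n)


# Hypothetical counts for N = 900 with every digit equally common.
uniform = [100] * 9
d_stat, critical = ks_statistic(uniform)
print(f"D = {d_stat:.4f}, critical = {critical:.4f}, reject = {d_stat > critical}")
```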

Reference Data

Leading Digit d	Benford Probability P(d)	Percentage	Cumulative %
1	0.30103	30.103%	30.103%
2	0.17609	17.609%	47.712%
3	0.12494	12.494%	60.206%
4	0.09691	9.691%	69.897%
5	0.07918	7.918%	77.815%
6	0.06695	6.695%	84.510%
7	0.05799	5.799%	90.309%
8	0.05115	5.115%	95.424%
9	0.04576	4.576%	100.000%

MAD Range	Conformity Level	Interpretation
0.000 - 0.006	Close Conformity	Strong adherence to Benford's Law. Typical of large, unmanipulated datasets.
0.006 - 0.012	Acceptable Conformity	Minor deviations within normal variance. Generally passes audit screens.
0.012 - 0.015	Marginal Conformity	Warrants further investigation. Could indicate partial data manipulation or natural boundary effects.
≥ 0.015	Non-Conformity	Significant deviation. Data may be fabricated, truncated, or drawn from a non-Benford process.
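The MAD ranges in the table above translate into a simple lookup. This sketch assumes each boundary value belongs to the higher band (so exactly 0.006 counts as Acceptable), since the table leaves the boundaries ambiguous; `classify_mad` is a hypothetical helper name.

```python
def classify_mad(mad: float) -> str:
    """Map a first-digit MAD value to the conformity level in the table."""
    if mad < 0:
        raise ValueError("MAD cannot be negative")
    if mad < 0.006:
        return "Close Conformity"
    if mad < 0.012:
        return "Acceptable Conformity"
    if mad < 0.015:
        return "Marginal Conformity"
    return "Non-Conformity"


for value in (0.0001, 0.008, 0.013, 0.06):
    print(f"MAD = {value}: {classify_mad(value)}")
```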

Dataset Type	Typically Conforms?	Notes
Population of cities/countries	Yes	Spans many orders of magnitude
Financial statement line items	Yes	Standard forensic accounting application
Stock prices	Yes	Over long periods with sufficient range
Physical constants	Yes	Mixed units amplify multi-order effect
Fibonacci sequence	Yes	Exact conformity in the limit
Telephone numbers	No	Assigned, not naturally generated
Human heights (cm)	No	Narrow range, single order of magnitude
Lottery results	No	Uniform distribution by design
Zip/postal codes	No	Assigned sequentially by geography
Invoice amounts (fabricated)	No	Fraudsters tend to over-represent digits 5-9
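One row of the table above is easy to check by hand. The sketch below tallies leading digits of the first 1000 Fibonacci numbers and compares the observed proportions with Benford's; the sample size of 1000 is an arbitrary choice for illustration.

```python
import math
from collections import Counter

# Benford probabilities P(1)..P(9).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def fibonacci(count):
    """Yield the first `count` Fibonacci numbers, starting 1, 1, 2, ..."""
    a, b = 1, 1
    for _ in range(count):
        yield a
        a, b = b, a + b


SAMPLE_SIZE = 1000
# Python integers are arbitrary precision, so str(f)[0] is the exact leading digit.
digits = Counter(int(str(f)[0]) for f in fibonacci(SAMPLE_SIZE))
observed = [digits[d] / SAMPLE_SIZE for d in range(1, 10)]
mad = sum(abs(o - p) for o, p in zip(observed, BENFORD)) / 9

for d in range(1, 10):
    print(f"{d}: observed {observed[d - 1]:.3f}, expected {BENFORD[d - 1]:.3f}")
print(f"MAD = {mad:.4f}")
```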

Frequently Asked Questions

Most statisticians recommend at least 100 to 500 observations for meaningful results. Below 100 values, the chi-squared test has low statistical power and MAD thresholds become unreliable. Nigrini suggests a minimum of 300 records for forensic accounting applications. This calculator will warn you if your sample is below 100.

Benford's Law applies to datasets that span multiple orders of magnitude and arise from multiplicative or exponential processes. Datasets confined to a narrow range (e.g., human heights in centimeters, all between 150 and 200) will not conform. Assigned numbers (phone numbers, zip codes, lottery draws) are also excluded because they lack the geometric scaling property. A dataset of invoice amounts between $1 and $999,999 conforms well; a dataset of exam scores between 60 and 100 does not.

The nine leading-digit counts must sum to N, which removes one degree of freedom and leaves k − 1 = 8; no distribution parameter is estimated from the data. At significance level α = 0.05, the χ² distribution with 8 degrees of freedom has a critical value of 15.507. If the computed χ² exceeds this threshold, you reject the null hypothesis that the data follow Benford's distribution. At α = 0.01 the critical value rises to 20.090, so stronger evidence is needed to reject.

Non-conformity is not proof of fraud; Benford's Law is a screening tool. Non-conformity flags a dataset for deeper investigation (duplicate analysis, stratification by vendor, round-number testing). Legitimate datasets can fail Benford tests due to natural constraints (price points clustered near $9.99, minimum transaction thresholds). Conversely, a sophisticated fraudster could fabricate Benford-conforming data. Always combine Benford analysis with second-digit tests, last-two-digit tests, and domain-specific auditing procedures.

Chi-squared is a formal hypothesis test that is sensitive to sample size. Very large datasets will almost always reject the null hypothesis due to tiny but statistically significant deviations. MAD (Mean Absolute Deviation) is a descriptive measure independent of sample size, making it more practical for forensic work. Nigrini's thresholds (Close < 0.006, Acceptable < 0.012, Marginal < 0.015, Non-conforming ≥ 0.015) provide stable benchmarks regardless of whether you have 500 or 500,000 records.
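The sample-size effect can be seen with a small worked example. The sketch below takes one hypothetical set of observed proportions (half a percentage point moved from digit 1 to digit 9) and evaluates both measures at N = 500 and N = 500,000. The proportions are made up for illustration.

```python
import math

# Benford probabilities P(1)..P(9).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
CRITICAL_VALUE = 15.507  # chi-squared critical value for df = 8, alpha = 0.05

# Hypothetical observed proportions: shift 0.005 from digit 1 to digit 9.
observed = list(BENFORD)
observed[0] -= 0.005
observed[8] += 0.005


def chi_squared(proportions, n):
    """With O_d = n * proportion and E_d = n * P(d), chi-squared grows linearly in n."""
    return n * sum((o - p) ** 2 / p for o, p in zip(proportions, BENFORD))


def mean_absolute_deviation(proportions):
    """MAD depends only on the proportions, not on n."""
    return sum(abs(o - p) for o, p in zip(proportions, BENFORD)) / 9


for n in (500, 500_000):
    stat = chi_squared(observed, n)
    print(f"N = {n}: chi2 = {stat:.2f}, reject = {stat > CRITICAL_VALUE}, "
          f"MAD = {mean_absolute_deviation(observed):.5f}")
```

The same proportions pass the χ² test at the smaller N and fail it at the larger one, while MAD stays in the Close band in both cases.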

Negative numbers and decimals are handled. The leading-digit extraction algorithm strips minus signs and leading zeros (including those after a decimal point). For example, −0.00482 yields leading digit 4, and −37.5 yields leading digit 3. The value 0 is excluded entirely since it has no significant leading digit. Scientific notation (e.g., 3.2e7) is also parsed correctly.
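One way to get the extraction behaviour described above is to parse each entry with Python's `decimal` module and read the first digit of the coefficient. This is a sketch under that assumption, not the calculator's actual implementation; `leading_digit` is a hypothetical helper name.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def leading_digit(text: str) -> Optional[int]:
    """Return the first significant digit of a numeric string, or None."""
    # Normalise the typographic minus sign (U+2212) to an ASCII hyphen.
    cleaned = text.strip().replace("\u2212", "-")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None  # non-numeric text is ignored
    if not value.is_finite() or value == 0:
        return None  # zero, NaN and infinity have no significant leading digit
    # The coefficient digits ignore sign, decimal point and exponent.
    for digit in value.as_tuple().digits:
        if digit != 0:
            return digit
    return None


for sample in ["\u22120.00482", "-37.5", "3.2e7", "0", "abc"]:
    print(f"{sample!r} -> {leading_digit(sample)}")
```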

The KS test examines the maximum deviation between cumulative distributions rather than bin-by-bin squared deviations. It is more sensitive to systematic shifts (e.g., low digits consistently over-represented at the expense of high digits) and less affected by a single outlier bin. The critical value is approximately 1.36 / √N at α = 0.05. For small samples the KS test can be more robust than chi-squared, but with only 9 discrete categories its power is limited compared to continuous applications.