Accepts integers, decimals, negatives, scientific notation. Non-numeric text is ignored.

About

Benford's Law (also called the Newcomb-Benford Law) predicts that in many naturally occurring datasets, the leading digit d appears with probability P(d) = log10(1 + 1/d). Digit 1 appears roughly 30.1% of the time, not 11.1% as naive intuition suggests. Datasets spanning multiple orders of magnitude (population counts, financial statements, physical constants, election returns) tend to follow this distribution. Failure to conform can indicate data fabrication, rounding bias, or truncation artifacts. Forensic accountants and auditors use Benford analysis as a first-pass anomaly screen on general ledger entries, tax returns, and expense reports.

This calculator extracts every leading significant digit from your dataset, computes observed vs. expected frequencies, and runs three conformity tests: the χ² goodness-of-fit test (degrees of freedom = 8, critical value 15.507 at α = 0.05), Nigrini's Mean Absolute Deviation (MAD), and the Kolmogorov-Smirnov (KS) statistic. The tool approximates conformity under the assumption that observations are independent and the dataset contains at least 100 values. Smaller samples yield unreliable results. Numbers with no significant digits (zero, non-numeric text) are silently excluded from analysis.

Formulas

The probability that the first significant digit equals d (d ∈ {1, 2, …, 9}) under Benford's Law:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$
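
As a quick check of the formula, the following minimal Python sketch evaluates P(d) for each digit. The function name `benford_probabilities` is chosen here for illustration and is not part of the calculator.

```python
import math

def benford_probabilities():
    """Return {d: P(d)} for d = 1..9, where P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_probabilities()
for d, p in probs.items():
    print(f"{d}: {p:.5f}")

# The nine probabilities sum to 1 (up to floating-point error).
print(sum(probs.values()))
```

The printed values can be compared against the Benford Probability column in the reference table.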

The Chi-Squared goodness-of-fit statistic with k − 1 = 8 degrees of freedom:

$$\chi^2 = \sum_{d=1}^{9} \frac{(O_d - E_d)^2}{E_d}$$

where O_d = observed count of digit d, and E_d = N · P(d) is the expected count given total sample size N. Reject the null hypothesis (data follows Benford) when χ² > 15.507 at significance level α = 0.05.
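
The statistic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the calculator's own code: the function name `chi_squared` and the sample counts are hypothetical, and the input is assumed to be a list of nine counts for digits 1 through 9.

```python
import math

CHI2_CRITICAL_05 = 15.507  # df = 8, alpha = 0.05 (value quoted in the text)

def chi_squared(observed_counts):
    """Chi-squared statistic for nine leading-digit counts (digits 1..9)."""
    n = sum(observed_counts)
    stat = 0.0
    for d, observed in enumerate(observed_counts, start=1):
        expected = n * math.log10(1 + 1 / d)  # E_d = N * P(d)
        stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts for a sample of N = 1000 values.
counts = [301, 176, 125, 97, 79, 67, 58, 51, 46]
stat = chi_squared(counts)
print(f"chi-squared = {stat:.3f}, reject at 0.05: {stat > CHI2_CRITICAL_05}")
```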

Nigrini's Mean Absolute Deviation:

$$\mathrm{MAD} = \frac{1}{9} \sum_{d=1}^{9} \left| \frac{O_d}{N} - P(d) \right|$$

where O_d / N is the observed proportion (relative frequency) of digit d.
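
A minimal Python sketch of the MAD computation follows. It assumes the input is a list of nine counts for digits 1 through 9; the function name `mean_absolute_deviation` and the sample counts are hypothetical.

```python
import math

def mean_absolute_deviation(observed_counts):
    """Mean absolute deviation between observed proportions and Benford's P(d)."""
    n = sum(observed_counts)
    total = 0.0
    for d, observed in enumerate(observed_counts, start=1):
        proportion = observed / n  # relative frequency of digit d
        total += abs(proportion - math.log10(1 + 1 / d))
    return total / 9

# Hypothetical counts for a sample of N = 1000 values.
counts = [301, 176, 125, 97, 79, 67, 58, 51, 46]
mad = mean_absolute_deviation(counts)
print(f"MAD = {mad:.5f}")
```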

Kolmogorov-Smirnov statistic:

$$D = \max_{d} \left| F_{\mathrm{obs}}(d) - F_{\mathrm{exp}}(d) \right|$$

where F_obs and F_exp are cumulative distribution functions of observed and expected proportions. Critical value at α = 0.05 is approximated as 1.36 / √N.
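
The KS statistic and its approximate critical value can be sketched as follows. The function names `ks_statistic` and `ks_critical_05` and the sample counts are assumptions made for this illustration.

```python
import math

def ks_statistic(observed_counts):
    """Maximum gap between the observed and Benford cumulative distributions."""
    n = sum(observed_counts)
    cum_obs = 0.0
    cum_exp = 0.0
    d_max = 0.0
    for d, observed in enumerate(observed_counts, start=1):
        cum_obs += observed / n
        cum_exp += math.log10(1 + 1 / d)
        d_max = max(d_max, abs(cum_obs - cum_exp))
    return d_max

def ks_critical_05(n):
    """Approximate critical value at alpha = 0.05: 1.36 / sqrt(N)."""
    return 1.36 / math.sqrt(n)

# Hypothetical counts for a sample of N = 1000 values.
counts = [301, 176, 125, 97, 79, 67, 58, 51, 46]
d_stat = ks_statistic(counts)
print(f"D = {d_stat:.5f}, critical = {ks_critical_05(sum(counts)):.5f}")
```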

Reference Data

| Leading Digit d | Benford Probability P(d) | Percentage | Cumulative % |
|---|---|---|---|
| 1 | 0.30103 | 30.103% | 30.103% |
| 2 | 0.17609 | 17.609% | 47.712% |
| 3 | 0.12494 | 12.494% | 60.206% |
| 4 | 0.09691 | 9.691% | 69.897% |
| 5 | 0.07918 | 7.918% | 77.815% |
| 6 | 0.06695 | 6.695% | 84.510% |
| 7 | 0.05799 | 5.799% | 90.309% |
| 8 | 0.05115 | 5.115% | 95.424% |
| 9 | 0.04576 | 4.576% | 100.000% |

| MAD Range | Conformity Level | Interpretation |
|---|---|---|
| 0.000 - 0.006 | Close Conformity | Strong adherence to Benford's Law. Typical of large, unmanipulated datasets. |
| 0.006 - 0.012 | Acceptable Conformity | Minor deviations within normal variance. Generally passes audit screens. |
| 0.012 - 0.015 | Marginal Conformity | Warrants further investigation. Could indicate partial data manipulation or natural boundary effects. |
| ≥ 0.015 | Non-Conformity | Significant deviation. Data may be fabricated, truncated, or drawn from a non-Benford process. |

| Dataset Type | Typically Conforms? | Notes |
|---|---|---|
| Population of cities/countries | Yes | Spans many orders of magnitude |
| Financial statement line items | Yes | Standard forensic accounting application |
| Stock prices | Yes | Over long periods with sufficient range |
| Physical constants | Yes | Mixed units amplify multi-order effect |
| Fibonacci sequence | Yes | Exact conformity in the limit |
| Telephone numbers | No | Assigned, not naturally generated |
| Human heights (cm) | No | Narrow range, single order of magnitude |
| Lottery results | No | Uniform distribution by design |
| Zip/postal codes | No | Assigned sequentially by geography |
| Invoice amounts (fabricated) | No | Fraudsters tend to over-represent digits 5-9 |

Frequently Asked Questions

Most statisticians recommend at least 100 to 500 observations for meaningful results. Below 100 values, the chi-squared test has low statistical power and MAD thresholds become unreliable. Nigrini suggests a minimum of 300 records for forensic accounting applications. This calculator will warn you if your sample is below 100.
Benford's Law applies to datasets that span multiple orders of magnitude and arise from multiplicative or exponential processes. Datasets confined to a narrow range (e.g., human heights in centimeters, all between 150 and 200) will not conform. Assigned or uniformly drawn numbers (phone numbers, zip codes, lottery draws) also fail to conform because they lack the geometric scaling property. A dataset of genuine invoice amounts between $1 and $999,999 typically conforms well; a dataset of exam scores between 60 and 100 does not.
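
To illustrate how a multiplicative process produces Benford-like leading digits, the sketch below tallies the leading digits of the first 1000 powers of 2. The choice of powers of 2 and the helper name `leading_digit_of_int` are assumptions made for this example, not something the calculator itself uses.

```python
import math
from collections import Counter

def leading_digit_of_int(n):
    """Leading digit of a positive integer, read from its decimal string."""
    return int(str(n)[0])

# Leading digits of 2**0, 2**1, ..., 2**999 (a purely multiplicative sequence).
counts = Counter(leading_digit_of_int(2 ** k) for k in range(1000))

for d in range(1, 10):
    observed = counts[d] / 1000
    expected = math.log10(1 + 1 / d)
    print(f"{d}: observed {observed:.3f}, expected {expected:.3f}")
```

The sequence covers about 300 orders of magnitude, which is why its observed proportions track the expected ones closely.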
With 9 possible leading digits and one constraint (the observed counts must sum to the sample size N), the test has 9 − 1 = 8 degrees of freedom. No parameters are estimated from the data, because the expected proportions are fixed by Benford's Law. At significance level α = 0.05, the chi-squared distribution table gives a critical value of 15.507. If the computed χ² exceeds this threshold, you reject the null hypothesis that the data follows Benford's distribution. At α = 0.01 the critical value rises to 20.090, making the test more conservative.
Non-conformity does not prove fraud. Benford's Law is a screening tool: non-conformity flags a dataset for deeper investigation (duplicate analysis, stratification by vendor, round-number testing). Legitimate datasets can fail Benford tests due to natural constraints (price points clustered near $9.99, minimum transaction thresholds). Conversely, a sophisticated fraudster could fabricate Benford-conforming data. Always combine Benford analysis with second-digit tests, last-two-digit tests, and domain-specific auditing procedures.
Chi-squared is a formal hypothesis test that is sensitive to sample size: very large datasets will almost always reject the null hypothesis, because even tiny deviations become statistically significant. MAD (Mean Absolute Deviation) is a descriptive measure whose formula does not involve the sample size N, making it more practical for forensic work. Nigrini's thresholds (Close < 0.006, Acceptable < 0.012, Marginal < 0.015, Non-conforming ≥ 0.015) provide stable benchmarks regardless of whether you have 500 or 500,000 records.
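
The threshold bands quoted above map directly onto a small lookup. The sketch below is illustrative only; the function name `classify_mad` is hypothetical, and the labels are taken from the reference table.

```python
def classify_mad(mad):
    """Map a first-digit MAD value to the conformity bands quoted in the text."""
    if mad < 0.006:
        return "Close Conformity"
    if mad < 0.012:
        return "Acceptable Conformity"
    if mad < 0.015:
        return "Marginal Conformity"
    return "Non-Conformity"

for value in (0.004, 0.010, 0.013, 0.020):
    print(value, classify_mad(value))
```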
Negative numbers and decimals are supported. The leading digit extraction algorithm strips minus signs and leading zeros (including those after a decimal point). For example, −0.00482 yields leading digit 4, and −37.5 yields leading digit 3. The value 0 is excluded entirely since it has no significant leading digit. Scientific notation (e.g., 3.2e7) is also parsed correctly.
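
One way to reproduce this behavior is sketched below. It is not the calculator's actual implementation: the function name `leading_digit` is hypothetical, and the use of Python's `decimal` module for parsing is a design choice made for this example (it avoids floating-point rounding near powers of ten). Inputs are assumed to be strings with an ASCII minus sign.

```python
from decimal import Decimal, InvalidOperation

def leading_digit(token):
    """Return the first significant digit of a numeric string, or None.

    None is returned for non-numeric text, zero, NaN, and infinities.
    """
    try:
        value = Decimal(token)
    except InvalidOperation:
        return None  # non-numeric text is ignored
    if not value.is_finite() or value.is_zero():
        return None  # zero has no significant leading digit
    # The coefficient digits ignore the sign and the exponent.
    return next(d for d in value.as_tuple().digits if d != 0)

for token in ["-0.00482", "-37.5", "3.2e7", "0", "abc"]:
    print(repr(token), leading_digit(token))
```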
The KS test examines the maximum deviation between cumulative distributions rather than bin-by-bin squared deviations. It is more sensitive to systematic shifts (e.g., low digits consistently over-represented, so the gap accumulates across the cumulative distribution) and less affected by a single outlier bin. The critical value is approximately 1.36 / √N at α = 0.05. For small samples the KS test can be more robust than chi-squared, but with only 9 discrete categories its power is limited compared to continuous applications.