About

Choosing the wrong class width collapses distinct data patterns into noise or fragments them into meaningless spikes. A histogram with too few bins hides bimodality; too many bins create random jaggedness that misleads interpretation. This calculator determines the class width h and number of classes k from raw ungrouped data using five established rules: Sturges' formula, Scott's normal reference rule, the Freedman-Diaconis estimator, the Square Root choice, and Rice's rule. Each method makes different assumptions about the underlying distribution. Sturges and Scott assume approximate normality. Freedman-Diaconis is robust against outliers because it relies on the interquartile range IQR rather than the sample standard deviation s. The tool parses your raw observations, computes all five widths simultaneously, builds a grouped frequency table with absolute, relative, and cumulative frequencies, and renders a histogram. Note: all five rules are heuristics or asymptotic approximations. For sample sizes below 30, results should be treated as rough guides, not definitive answers.

Formulas

The calculator determines the number of classes k and class width h from n observations spanning a range R = x_max − x_min. Five rules are evaluated simultaneously.

Sturges' Rule

k = ⌈1 + 3.322 ⋅ log₁₀(n)⌉ , h = R / k

Scott's Rule

h = 3.49 ⋅ s ⋅ n^(−1/3) , k = ⌈R / h⌉

Freedman-Diaconis Rule

h = 2 ⋅ IQR ⋅ n^(−1/3) , k = ⌈R / h⌉

Square Root Rule

k = ⌈√n⌉ , h = R / k

Rice Rule

k = ⌈2 ⋅ n^(1/3)⌉ , h = R / k

Where n = number of observations, R = range (x_max − x_min), s = sample standard deviation (Bessel-corrected), IQR = interquartile range (Q₃ − Q₁), h = class width, k = number of classes, and ⌈ ⌉ denotes the ceiling function.
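The five formulas can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function name bin_rules is hypothetical, and the quartiles come from Python's statistics.quantiles with its default settings, which place Q₁ and Q₃ at rank p(n + 1). The sketch assumes the data has some spread, so that s and IQR are positive.

```python
import math
import statistics


def bin_rules(data):
    """Return {rule: (k, h)} for the five rules listed above (illustrative sketch)."""
    n = len(data)
    r = max(data) - min(data)                    # range R = x_max - x_min
    s = statistics.stdev(data)                   # Bessel-corrected sample std dev
    q1, _, q3 = statistics.quantiles(data, n=4)  # default places quartiles at rank p(n + 1)
    iqr = q3 - q1

    # Rules that fix k first, then derive h = R / k.
    k_first = {
        "sturges": math.ceil(1 + 3.322 * math.log10(n)),
        "sqrt": math.ceil(math.sqrt(n)),
        "rice": math.ceil(2 * n ** (1 / 3)),
    }
    # Rules that fix h first, then derive k = ceil(R / h).
    h_first = {
        "scott": 3.49 * s * n ** (-1 / 3),
        "freedman_diaconis": 2 * iqr * n ** (-1 / 3),
    }

    result = {name: (k, r / k) for name, k in k_first.items()}
    for name, h in h_first.items():
        result[name] = (math.ceil(r / h), h)     # assumes h > 0
    return result


rules = bin_rules(list(range(1, 101)))           # the integers 1..100 as sample data
for name, (k, h) in rules.items():
    print(f"{name}: k = {k}, h = {h:.3f}")
```

Grouping the rules by whether they produce k or h first mirrors the formulas: Sturges, Square Root, and Rice depend only on n, while Scott and Freedman-Diaconis depend on the spread of the data.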

Reference Data

Rule	Formula for Classes / Width	Assumption	Best For	Weakness
Sturges (1926)	k = 1 + 3.322 ⋅ log₁₀(n)	Normal distribution	Small to moderate n (< 200)	Under-bins for large or skewed data
Scott (1979)	h = 3.49 ⋅ s ⋅ n^(−1/3)	Normal distribution	Continuous, roughly symmetric data	Sensitive to outliers via s
Freedman-Diaconis (1981)	h = 2 ⋅ IQR ⋅ n^(−1/3)	None (nonparametric)	Skewed data, outlier-heavy sets	May over-bin if IQR is very small
Square Root	k = √n	None	Quick estimation, Excel default	No theoretical optimality
Rice	k = 2 ⋅ n^(1/3)	None	Large datasets	Tends to over-bin for small n
Manual	User-defined k	Domain knowledge	Regulatory or publication standards	Requires expertise
Common sample size benchmarks
n = 30	Sturges: 6, √: 6, Rice: 7	Minimum for CLT approximation
n = 100	Sturges: 8, √: 10, Rice: 10	Typical classroom dataset
n = 500	Sturges: 10, √: 23, Rice: 16	Survey-scale data
n = 1000	Sturges: 11, √: 32, Rice: 20	Large-sample analytics
n = 10000	Sturges: 15, √: 100, Rice: 44	Big-data; Sturges notably under-bins
Descriptive statistics used internally
Range	R = x_max − x_min	Spread of data
Mean	x̄ = (1/n) ⋅ Σ_{i=1}^{n} x_i	Arithmetic average
Std Dev (s)	s = √( Σ_{i=1}^{n} (x_i − x̄)² / (n − 1) )	Sample standard deviation (Bessel-corrected)
IQR	IQR = Q₃ − Q₁	Middle 50% spread, outlier-resistant
Q1 (25th percentile)	Linear interpolation at rank 0.25(n + 1)	Lower quartile boundary
Q3 (75th percentile)	Linear interpolation at rank 0.75(n + 1)	Upper quartile boundary
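
The range, mean, and Bessel-corrected standard deviation from the table can be written out directly. The sketch below is only an illustration; the function name describe and the sample values are made up for the example.

```python
import math


def describe(data):
    """Range, mean, and sample standard deviation (divisor n - 1)."""
    n = len(data)
    mean = sum(data) / n
    squared_dev = sum((x - mean) ** 2 for x in data)   # sum of (x_i - mean)^2
    s = math.sqrt(squared_dev / (n - 1))               # Bessel's correction
    return {"range": max(data) - min(data), "mean": mean, "s": s}


stats = describe([2, 4, 4, 4, 5, 5, 7, 9])             # made-up sample
print(stats)
```

Dividing by n − 1 instead of n is what distinguishes the sample standard deviation from the population version; for small n the difference is noticeable.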

Frequently Asked Questions

Use the Freedman-Diaconis rule. It replaces standard deviation with the interquartile range (IQR), which is resistant to extreme values. For example, income data with a few very high earners would inflate the standard deviation used by Scott's rule, producing bins that are too wide and masking the shape of the lower-income majority. Freedman-Diaconis avoids this by anchoring to the middle 50% of data.
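
The effect is easy to reproduce. The sketch below uses made-up data (40 ordinary values plus two extreme ones, not real income figures) and compares the two widths; quartiles are taken from Python's statistics.quantiles defaults, which place them at rank p(n + 1).

```python
import statistics

# Made-up illustrative data: 40 typical values plus two extreme ones.
incomes = list(range(20, 60)) + [500, 900]
n = len(incomes)

s = statistics.stdev(incomes)                    # inflated by the two outliers
q1, _, q3 = statistics.quantiles(incomes, n=4)   # quartiles at rank p(n + 1)
iqr = q3 - q1                                    # unaffected by the outliers

scott_h = 3.49 * s * n ** (-1 / 3)
fd_h = 2 * iqr * n ** (-1 / 3)
print(f"Scott h = {scott_h:.1f}, Freedman-Diaconis h = {fd_h:.1f}")
```

With this data Scott's width comes out several times larger than the Freedman-Diaconis width, so most of the typical values would land in a single bin under Scott's rule.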

Sturges' formula grows logarithmically: k = 1 + 3.322 × log₁₀(n). At n = 10,000 you get only about 15 classes, while the Square Root rule yields 100. Sturges derived his formula assuming a binomial distribution converging to normal. For large n with non-normal data, it systematically under-bins, hiding multimodality and local structure. For n > 200, prefer Scott's or Freedman-Diaconis.

When all values are equal, the range R = 0, and both s and IQR are also zero, so Scott's and Freedman-Diaconis rules yield h = 0 and k = ⌈R / h⌉ would be 0 / 0. The calculator avoids the division by zero: it reports a single class of width 0 containing all observations. A toast notification warns that the data has no variability.
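
One way to guard the k = ⌈R / h⌉ step is sketched below. The function name safe_class_count is hypothetical, and the fallback to one class when h = 0 but R > 0 (possible when many values repeat and IQR collapses) is an assumption of this sketch, not documented calculator behavior.

```python
import math


def safe_class_count(r, h):
    """k = ceil(R / h), falling back to a single class when the division is undefined."""
    if r == 0 or h <= 0:
        # No spread (R = 0), or a zero width from s = 0 or IQR = 0:
        # report one class instead of dividing by zero.
        return 1
    return math.ceil(r / h)


print(safe_class_count(0, 0))        # constant data
print(safe_class_count(99, 21.76))   # ordinary case
```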

Yes. Select the "Manual" method and enter your desired number of classes k. The calculator then derives h = R / k. This is useful when a regulatory standard or publication guideline mandates a specific bin count, for instance ISO 3534-2 or APA style recommendations for histogram reporting.

Class width h is a single number: the span of each bin. Class interval boundaries are the actual edges. The first lower boundary is typically x_min (or slightly below it). Each subsequent boundary adds h. So for x_min = 10 and h = 5, the boundaries are 10, 15, 20, 25, and so on, giving the classes [10, 15), [15, 20), [20, 25), etc. The calculator uses left-closed, right-open intervals except for the last class, which is closed on both sides to include x_max.
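
The interval convention can be sketched as follows. This is an illustration under stated assumptions, not the calculator's code: the function name frequency_table is hypothetical, the first boundary is taken to be x_min, h is assumed positive, and k ⋅ h is assumed to cover the range. With non-integer data, floating-point rounding at the edges may need extra care.

```python
def frequency_table(data, k, h):
    """Rows of (lower, upper, count, relative, cumulative) for k classes of width h."""
    lo = min(data)                           # first lower boundary = x_min
    edges = [lo + i * h for i in range(k + 1)]
    counts = [0] * k
    for x in data:
        i = int((x - lo) // h)               # left-closed, right-open: [edge_i, edge_i+1)
        i = min(i, k - 1)                    # last class is closed, so x_max stays inside
        counts[i] += 1

    n = len(data)
    rows, cumulative = [], 0
    for i in range(k):
        cumulative += counts[i]
        rows.append((edges[i], edges[i + 1], counts[i], counts[i] / n, cumulative))
    return rows


table = frequency_table([10, 12, 15, 19, 20, 24, 25], k=3, h=5)
for row in table:
    print(row)
```

In this example the value 15 falls into [15, 20) rather than [10, 15), and 25 is kept in the last class [20, 25] because that class is closed on the right.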

Yes. The calculator rounds h upward to a "nice" number (1, 2, 5, 10, 20, 50, and so on) for readability. This may widen the total span k ⋅ h slightly beyond the original range R, so the last class might extend past x_max. Frequency counts remain accurate because every observation still falls into exactly one bin. However, the visual histogram may show a small empty tail on the right.
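
A common way to implement this kind of rounding is sketched below. It assumes that "nice" means 1, 2, or 5 times a power of ten, which is an interpretation of the description above rather than the calculator's confirmed rule; the function name nice_ceil is hypothetical.

```python
import math


def nice_ceil(h):
    """Round h up to the nearest value of the form 1, 2, or 5 times a power of ten."""
    if h <= 0:
        raise ValueError("h must be positive")
    base = 10 ** math.floor(math.log10(h))   # largest power of ten not exceeding h
    fraction = h / base                      # lies in [1, 10)
    for multiplier in (1, 2, 5, 10):
        if fraction <= multiplier:
            return multiplier * base
    return 10 * base                         # guard against floating-point edge cases


print(nice_ceil(12.375), nice_ceil(5), nice_ceil(21.76), nice_ceil(0.3))
```

Because the width only ever grows, k classes of the rounded width still cover the whole range.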

The calculator sorts the data and interpolates linearly between adjacent sorted values. For Q1, the rank position is 0.25 × (n + 1). If this is not an integer, interpolation between the two adjacent sorted values is applied. The same process applies for Q3 at rank 0.75 × (n + 1). This (n + 1) convention is the "exclusive" method, matching Excel's QUARTILE.EXC rather than QUARTILE.INC, whose rank is 0.25 × (n − 1) + 1. Statistical packages differ in their default quartile convention, so small discrepancies with other software are expected.
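
The interpolation at rank p(n + 1) can be sketched as below. The function name quantile_at_rank is hypothetical, and clamping the rank to the range 1 to n for very small samples is an assumption of this sketch.

```python
def quantile_at_rank(sorted_data, p):
    """Linear interpolation at 1-based rank p * (n + 1) in already-sorted data."""
    n = len(sorted_data)
    rank = p * (n + 1)
    rank = min(max(rank, 1), n)      # assumption: clamp so the rank stays inside the data
    lower = int(rank)                # 1-based index of the value at or below the rank
    fraction = rank - lower
    if lower >= n:
        return sorted_data[-1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + fraction * (above - below)


values = sorted([7, 1, 4, 8, 2, 6, 3, 5])
q1 = quantile_at_rank(values, 0.25)
q3 = quantile_at_rank(values, 0.75)
print(q1, q3, q3 - q1)
```

For these eight values the ranks are 2.25 and 6.75, so Q1 lies a quarter of the way from the 2nd to the 3rd sorted value and Q3 three quarters of the way from the 6th to the 7th.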