About

The coefficient of determination R² quantifies the proportion of variance in a dependent variable y that is predictable from an independent variable x. A value of 0.85 means 85% of the observed variation in y is explained by the linear model. Misinterpreting R² leads to overfitting, false confidence in weak models, or rejection of adequate ones. This calculator performs ordinary least squares (OLS) regression, computes R² as 1 − SSE/SST, and reports adjusted R², which penalizes model complexity. It assumes a linear relationship and homoscedastic residuals.

Note: R² alone does not confirm causation and can be misleading with nonlinear data. Always inspect residual patterns. For datasets with fewer than 5 observations, adjusted R² becomes unreliable due to small-sample bias. The F-statistic reported here tests the null hypothesis that the slope equals zero. A high R² with a non-significant F-statistic signals insufficient data.
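The warning about nonlinear data can be made concrete: a relationship that is perfectly quadratic can still yield R² = 0 from a linear fit, because the best-fit line through symmetric data is flat. A minimal plain-Python sketch (`linear_r2` is an illustrative helper name, data invented):

```python
def linear_r2(x, y):
    """R-squared of the OLS line fit to (x, y), computed as 1 - SSE/SST."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst

# y = x^2 exactly, yet the OLS slope is 0 and the line explains nothing:
r2 = linear_r2([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # -> 0.0
```

A residual plot would immediately reveal the U-shaped pattern that the R² value alone hides.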


Formulas

The coefficient of determination is defined as the complement of the ratio of residual variance to total variance:

R² = 1 − SSE / SST

Where the component sums of squares are computed as:

SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)²
SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
SSR = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²

The OLS regression line ŷ = b₀ + b₁x has coefficients:

b₁ = (n Σxᵢyᵢ − Σxᵢ Σyᵢ) / (n Σxᵢ² − (Σxᵢ)²)
b₀ = ȳ − b₁x̄
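These closed-form sums translate directly to code; a minimal sketch in plain Python (`ols_coefficients` is an illustrative name, not part of the calculator):

```python
def ols_coefficients(x, y):
    """Slope b1 and intercept b0 of the OLS line via the closed-form sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * (sum_x / n)  # b0 = y-bar - b1 * x-bar
    return b0, b1

# Points lying exactly on y = 2x + 1 recover its coefficients:
b0, b1 = ols_coefficients([1, 2, 3, 4], [3, 5, 7, 9])  # -> (1.0, 2.0)
```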

Adjusted R-squared corrects for the number of predictors k:

R²adj = 1 − (1 − R²)(n − 1) / (n − k − 1)

The Pearson correlation coefficient r satisfies R² = r² for simple linear regression. The F-statistic is:

F = (SSR / k) / (SSE / (n − k − 1))

Where n = number of observations, k = number of predictors (1 for simple linear regression), yᵢ = observed value, ŷᵢ = predicted value, ȳ = mean of observed values, b₀ = y-intercept, b₁ = slope.
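Putting the formulas together, a plain-Python sketch that reports R², adjusted R², and F for an already-fitted line (`regression_summary` is an illustrative name; the example data and coefficients are invented):

```python
def regression_summary(x, y, b0, b1, k=1):
    """R-squared, adjusted R-squared, and F-statistic for the line y-hat = b0 + b1*x."""
    n = len(y)
    y_mean = sum(y) / n
    y_hat = [b0 + b1 * a for a in x]
    sst = sum((b - y_mean) ** 2 for b in y)            # total sum of squares
    sse = sum((b - h) ** 2 for b, h in zip(y, y_hat))  # residual sum of squares
    ssr = sst - sse                                    # SST = SSR + SSE under OLS
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_stat = (ssr / k) / (sse / (n - k - 1))
    return r2, adj_r2, f_stat

# OLS on this small dataset gives b0 = 2.2, b1 = 0.6:
r2, adj_r2, f_stat = regression_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)
# -> r2 = 0.6, adj_r2 ≈ 0.467, f_stat = 4.5
```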

Reference Data

R² Range | Interpretation | Typical Domain | Action Guidance
0.95 - 1.00 | Excellent fit | Physics, engineering calibration | Verify not overfitting; check for data leakage
0.85 - 0.95 | Strong fit | Chemistry, controlled experiments | Model reliable for prediction within range
0.70 - 0.85 | Moderate fit | Biology, agriculture | Consider additional predictors or transformations
0.50 - 0.70 | Weak-moderate fit | Social sciences, psychology | Model captures trend but high residual noise
0.30 - 0.50 | Weak fit | Economics, marketing | Useful for directional insight only
0.00 - 0.30 | Poor fit | Behavioral data, stock returns | Linear model inadequate; try nonlinear or add variables
< 0.00 | Worse than mean model | Misspecified models | Model is harmful; discard and re-specify
Key Statistics Reference
SST | Total Sum of Squares - total variance of y around its mean ȳ
SSR | Regression Sum of Squares - variance explained by the regression line
SSE | Error (Residual) Sum of Squares - unexplained variance; SST = SSR + SSE
r | Pearson correlation coefficient; R² = r² for simple linear regression
SEE | Standard Error of Estimate - average distance of data points from the regression line, in y units
F | F-statistic - ratio of explained to unexplained variance per degree of freedom
Adj. R² | Adjusted R-squared - penalizes adding predictors that do not improve fit
n | Sample size - minimum 3 for regression, 10+ recommended
k | Number of independent predictors - 1 in simple linear regression

Frequently Asked Questions

What does a negative R² mean?

A negative R² means the fitted model performs worse than a horizontal line at the mean ȳ. This occurs when the model is fundamentally misspecified - for example, fitting a positive-slope line to data with a negative trend, or applying linear regression to strongly nonlinear data. A negative value is a signal to discard the current model and re-examine the relationship between variables.

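This failure mode is easy to reproduce by scoring a fixed, wrong-slope model with 1 − SSE/SST (plain-Python sketch; the data and the imposed model are invented for illustration):

```python
def r2_of_model(y, y_pred):
    """R-squared = 1 - SSE/SST for arbitrary predictions (not necessarily OLS)."""
    my = sum(y) / len(y)
    sse = sum((b - p) ** 2 for b, p in zip(y, y_pred))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst

x = [1, 2, 3, 4]
y = [8, 6, 4, 2]                  # clear downward trend
bad = [2 + 1.0 * a for a in x]    # imposed positive-slope model y-hat = 2 + x
r2_bad = r2_of_model(y, bad)      # negative: worse than predicting the mean
```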
What is the difference between R² and adjusted R²?

Standard R² never decreases when a predictor is added, even if that predictor is noise. Adjusted R² applies a penalty through the factor (n − 1)/(n − k − 1), so it can decrease if a new variable does not improve the model enough to justify the lost degree of freedom. Use adjusted R² whenever you compare models with different numbers of predictors. For simple linear regression with k = 1, the difference is small when n is large but substantial when n < 10.

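The penalty can be seen numerically. Assuming (hypothetically) n = 10 and an R² gain of only 0.01 from a second predictor:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared = 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding a second predictor lifts R^2 from 0.60 to 0.61, but with n = 10
# the extra lost degree of freedom makes the adjusted value fall:
a1 = adjusted_r2(0.60, n=10, k=1)   # ≈ 0.55
a2 = adjusted_r2(0.61, n=10, k=2)   # ≈ 0.499
```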
Can R² values be compared across models with different dependent variables?

No. R² depends on SST, which is specific to each dependent variable's distribution. Comparing R² = 0.90 from a model predicting temperature with R² = 0.70 from a model predicting stock price is meaningless. For cross-model comparison with different response variables, use information criteria (AIC, BIC) or out-of-sample prediction error (RMSE on test data).

How is the F-statistic related to R²?

For simple linear regression, the F-statistic is a monotonic transformation of R²: F = (R² / k) / ((1 − R²) / (n − k − 1)). A high R² with a low F-statistic (below approximately 4.0 for common significance levels) means the sample is too small to confirm the relationship is not due to chance. Always check F alongside R².

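The two F formulations are algebraically identical (divide numerator and denominator of the sums-of-squares form by SST); a quick numeric check with invented values:

```python
def f_from_r2(r2, n, k=1):
    """F from R-squared: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def f_from_sums(ssr, sse, n, k=1):
    """F from sums of squares: (SSR / k) / (SSE / (n - k - 1))."""
    return (ssr / k) / (sse / (n - k - 1))

# Same model both ways: R^2 = 0.6 corresponds to SSR = 3.6, SSE = 2.4, n = 5.
f_a = f_from_r2(0.6, n=5)       # ≈ 4.5
f_b = f_from_sums(3.6, 2.4, 5)  # ≈ 4.5
```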
How many data points are needed?

Statistically, you need at least n ≥ k + 2 observations to compute a regression at all (for k = 1, a minimum of 3 points). However, with very small samples, R² is inflated - two points always yield R² = 1.00 trivially. A practical guideline is n ≥ 10 per predictor for exploratory work and n ≥ 20 per predictor for confirmatory analysis.

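The two-point degenerate case is easy to verify: any two distinct points lie exactly on their own regression line, so R² = 1 regardless of the data (plain-Python sketch; `linear_r2` is an illustrative helper name):

```python
def linear_r2(x, y):
    """R-squared of the OLS line through (x, y), via 1 - SSE/SST."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst

r2_two = linear_r2([1, 2], [3, 7])  # -> 1.0 for any two distinct points
```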
Does a high R² prove causation?

No. R² measures linear association, not causation. Two variables may correlate due to a shared confounding variable. For example, ice cream sales and drowning rates both correlate with temperature, producing a high R² between them despite no causal link. Establishing causation requires controlled experiments, instrumental variables, or quasi-experimental designs - none of which are captured by R² alone.