# Coefficient of determination

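In statistics, the **coefficient of determination** $R^2$ is the proportion of variability in a data set that is accounted for by a statistical model. There are several common expressions for $R^2$. They are equivalent for a least squares fit that includes an intercept term. The version most common in statistics texts is based on an analysis of variance decomposition:

$$R^2 \equiv \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T}.$$

In the above definition,

$$SS_T = \sum_i (y_i - \bar{y})^2, \qquad SS_R = \sum_i (\hat{y}_i - \bar{y})^2, \qquad SS_E = \sum_i (y_i - \hat{y}_i)^2,$$

where $y_i$ are the observed values, $\bar{y}$ is their mean, and $\hat{y}_i$ are the fitted values from the model. That is, $SS_T$ is the total sum of squares, $SS_R$ is the explained sum of squares, and $SS_E$ is the residual sum of squares.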
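The following is a minimal sketch in plain Python (standard library only). It fits a straight line by least squares to a small made-up data set. It then computes $R^2$ in two ways: as the explained fraction $SS_R/SS_T$, and as $1 - SS_E/SS_T$. Here $SS_T$, $SS_R$ and $SS_E$ are the total, explained and residual sums of squares. The helper names `fit_simple_ols` and `sums_of_squares` are hypothetical and serve only as illustration.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Least squares intercept b0 and slope b1 for y ≈ b0 + b1 * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(y, y_hat):
    """Return (SS_T, SS_R, SS_E) for observations y and fitted values y_hat."""
    y_bar = fmean(y)
    ss_t = sum((yi - y_bar) ** 2 for yi in y)                 # total
    ss_r = sum((fi - y_bar) ** 2 for fi in y_hat)             # explained
    ss_e = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))    # residual
    return ss_t, ss_r, ss_e


# Small made-up data set, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
y_hat = [b0 + b1 * xi for xi in x]
ss_t, ss_r, ss_e = sums_of_squares(y, y_hat)

r2_explained = ss_r / ss_t       # SS_R / SS_T
r2_residual = 1 - ss_e / ss_t    # 1 - SS_E / SS_T
print(r2_explained, r2_residual)
```

The two forms agree here because this is a least squares fit with an intercept, for which $SS_T = SS_R + SS_E$. For other fitting methods, or for a model without an intercept, the two expressions can differ.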

## Explanation and interpretation of $R^2$

For expository purposes, consider a linear model of the form

$$Y_i = \beta_0 + \sum_{j=1}^{p} \beta_j X_{i,j} + \varepsilon_i,$$

where $Y_i$ is the response variable, $\beta_0, \dots, \beta_p$ are unknown coefficients, $X_{i,1}, \dots, X_{i,p}$ are $p$ regressors, and $\varepsilon_i$ is a mean-zero error term. The coefficient of determination $R^2$ is a measure of the global fit of the model. Specifically, for a least squares fit that includes the intercept $\beta_0$, $R^2 \in [0, 1]$. It represents the proportion of variability in $Y_i$ that may be attributed to some linear combination of the regressors (explanatory variables) in $X$.

More simply, $R^2$ is often interpreted as the proportion of response variation "explained" by the regressors in the model. Thus, $R^2 = 1$ indicates that the fitted model explains all variability in $y$. By contrast, $R^2 = 0$ indicates no linear relationship between the response variable and the regressors. An interior value such as $R^2 = 0.7$ may be interpreted as follows: "Approximately seventy percent of the variation in the response variable can be explained by the explanatory variable. The remaining thirty percent can be attributed to unknown, lurking variables or inherent variability."
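Here is a worked illustration with hypothetical numbers (not taken from any data set). Suppose a fitted model leaves a residual sum of squares $SS_E = 60$ out of a total sum of squares $SS_T = 200$. Then

$$R^2 = 1 - \frac{SS_E}{SS_T} = 1 - \frac{60}{200} = 0.7.$$

This matches the "seventy percent" reading above. The remaining $60/200 = 0.3$ of the variation is left in the residuals.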

If there is just one scalar-valued regressor (plus an intercept), then $R^2$ is the square of the correlation between the regressor and the response variable. More generally, $R^2$ is the square of the correlation between $y$ and the fitted values $\hat{y}$.

## Inflation of $R^2$

In least squares regression, $R^2$ is weakly increasing in the number of regressors in the model: adding a regressor can never lower it. As such, $R^2$ alone cannot be used to compare models with different numbers of covariates in a meaningful way. As a reminder of this, some authors denote $R^2$ by $R^2_p$, where $p$ is the number of columns in $X$.

Demonstration of this property is straightforward. To begin, recall that the objective of least squares regression is (in matrix notation)

$$\min_{b} SS_E(b) = \min_{b} \, (y - Xb)^{\top}(y - Xb).$$

The optimal value of the objective is weakly smaller as additional columns of $X$ are added. The smaller model is the larger one with the coefficients on the new columns constrained to zero. Minimizing over a larger, less constrained set of coefficients can only give an equal or smaller minimum. Given this, and noting that $SS_T$ depends only on $y$, the non-decreasing property of $R^2 = 1 - SS_E/SS_T$ follows directly from the definition above.
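The following simulation illustrates the argument. It is a sketch in plain Python, and its data are simulated with a fixed seed. The response depends on a single regressor. Columns of pure noise are then added one at a time, and $R^2$ is refit after each addition. The helpers `ols_fitted_values` (a small normal-equations solver suitable only for tiny, well-conditioned examples) and `r_squared` are hypothetical.

```python
import random
from statistics import fmean


def ols_fitted_values(columns, y):
    """Fitted values from a least squares fit of y on an intercept plus `columns`.

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination
    with partial pivoting; adequate only for small, well-conditioned examples.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]  # design matrix
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y], one row per coefficient.
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * p for a, p in zip(A[r], A[c])]
    b = [A[i][k] / A[i][i] for i in range(k)]  # A is now diagonal
    return [sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def r_squared(y, y_hat):
    """R^2 = 1 - SS_E / SS_T."""
    y_bar = fmean(y)
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    ss_e = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_e / ss_t


random.seed(0)  # fixed seed so the run is reproducible
n = 30
x1 = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 + 1.5 * v + random.gauss(0, 1) for v in x1]

columns = [x1]
r2_path = [r_squared(y, ols_fitted_values(columns, y))]
for _ in range(5):
    # Add a regressor of pure noise, unrelated to y by construction.
    columns.append([random.gauss(0, 1) for _ in range(n)])
    r2_path.append(r_squared(y, ols_fitted_values(columns, y)))

print([round(v, 4) for v in r2_path])
```

None of the added columns carries any information about `y`. Even so, the printed sequence never decreases, which is the inflation described above.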

## Adjusted $R^2$

Adjusted $R^2$ is a modification of $R^2$ that adjusts for the number of explanatory terms in a model. Unlike $R^2$, the adjusted $R^2$ increases only if the new term improves the model more than would be expected by chance. The adjusted $R^2$ can be negative, and is always less than or equal to $R^2$. The adjusted $R^2$ is defined as

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1},$$

where $p$ is the total number of regressors in the linear model (not counting the constant term), and $n$ is the sample size.
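Below is a minimal Python sketch of the adjustment $1 - (1 - R^2)(n - 1)/(n - p - 1)$. The function name `adjusted_r2` is hypothetical, and the inputs are illustrative values rather than results from any study. The second call shows how the adjusted value can fall below zero when $R^2$ is small relative to the number of regressors.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n is the sample size; p counts regressors, not including the constant term.
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Illustrative values only.
print(adjusted_r2(0.70, n=50, p=3))   # about 0.6804
print(adjusted_r2(0.05, n=10, p=3))   # about -0.425
```

The penalty factor $(n - 1)/(n - p - 1)$ grows with $p$ and shrinks toward 1 as $n$ grows. As a result, the adjustment matters most for small samples with many regressors.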

Adjusted $R^2$ *does not have the same interpretation as* $R^2$. As such, care must be taken in interpreting and reporting this statistic. Adjusted $R^2$ is particularly useful in the feature selection stage of model building.

## Notes on interpreting $R^2$

$R^2$ does *not* tell whether:

- the independent variables are a true cause of the changes in the dependent variable;
- omitted-variable bias exists; or
- the most appropriate set of independent variables has been chosen.

This page uses Creative Commons Licensed content from Wikipedia.