Simple linear regression

A simple linear regression is a linear regression in which there is only one covariate (predictor variable). It is the special case of multiple regression with a single predictor.

Simple linear regression is used to evaluate the linear relationship between two variables, for example the relationship between muscle strength and lean body mass. Put another way, it is used to develop an equation with which a dependent variable can be predicted or estimated from an independent variable.

The regression equation is given by

$$Y = a + bX + \varepsilon$$

where $Y$ is the dependent variable, $a$ is the y-intercept, $b$ is the gradient or slope of the line, $X$ is the independent variable and $\varepsilon$ is a random error term.

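The linear relationship between the two variables (i.e. dependent and independent) can be measured using a correlation coefficient, e.g. the Pearson product-moment correlation coefficient:

$$r = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^N (x_i - \bar{x})^2 \sum_{i=1}^N (y_i - \bar{y})^2}}$$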
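As an illustration, the Pearson correlation coefficient can be computed directly from its definition. The following is a minimal sketch in plain Python; the function name `pearson_r` is an assumption made for this example, and the data are the three points used in the numerical example of this article.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Points (1,-1), (2,4), (6,3) from the numerical example
r = pearson_r([1, 2, 6], [-1, 4, 3])
print(r)
```

For these points the sums are 7, 14 and 14, so the coefficient works out to 7/14 = 0.5, a moderate positive linear association.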

Estimating the Regression Line

The parameters of the linear regression line, $Y = a + bX$, can be estimated using the method of Ordinary Least Squares. This method finds the line that minimizes the sum of the squares of the regression residuals, $\sum_{i=1}^N \hat{\varepsilon}_i^2$. The residual is the difference between the observed value and the predicted value: $\hat{\varepsilon}_i = y_i - \hat{y}_i$, where $\hat{y}_i = \hat{a} + \hat{b} x_i$.

The minimization problem can be solved using calculus, producing the following formulas for the estimates of the regression parameters:

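$$\hat{b} = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^N (x_i - \bar{x})^2}$$

$$\hat{a} = \bar{y} - \hat{b} \bar{x}$$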
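The calculus step can be sketched as follows. This is a standard derivation written with the symbols used elsewhere in this article, where $S(a, b)$ is a name introduced here for the sum of squared residuals. Setting both partial derivatives of $S$ to zero gives the two normal equations:

$$\begin{aligned}
S(a, b) &= \sum_{i=1}^N (y_i - a - b x_i)^2 \\
\frac{\partial S}{\partial a} &= -2 \sum_{i=1}^N (y_i - a - b x_i) = 0 \quad\Rightarrow\quad \hat{a} = \bar{y} - \hat{b} \bar{x} \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^N x_i (y_i - a - b x_i) = 0
\end{aligned}$$

Substituting $\hat{a} = \bar{y} - \hat{b} \bar{x}$ into the second equation gives $\sum_{i=1}^N x_i \left[ (y_i - \bar{y}) - \hat{b} (x_i - \bar{x}) \right] = 0$. Because $\sum_{i=1}^N \bar{x} (y_i - \bar{y}) = 0$ and $\sum_{i=1}^N \bar{x} (x_i - \bar{x}) = 0$, each $x_i$ outside the brackets can be replaced by $(x_i - \bar{x})$, which yields

$$\hat{b} = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^N (x_i - \bar{x})^2}$$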

Ordinary Least Squares estimates have the following properties:

  1. The line goes through the point $(\bar{x}, \bar{y})$.
  2. The sum of the residuals is equal to zero.
  3. The estimates are unbiased, provided the error term has mean zero for every value of X.
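The estimation formulas and the first two properties can be checked numerically. The sketch below is a plain-Python illustration rather than a reference implementation; the helper name `ols_fit` is hypothetical, and the data are the three points from the numerical example in this article.

```python
def ols_fit(xs, ys):
    """Return (a_hat, b_hat), the OLS intercept and slope for one predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    b_hat = s_xy / s_xx              # slope estimate
    a_hat = y_bar - b_hat * x_bar    # intercept estimate
    return a_hat, b_hat

xs = [1, 2, 6]
ys = [-1, 4, 3]
a_hat, b_hat = ols_fit(xs, ys)

# Residuals: observed value minus predicted value
residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
print("intercept, slope:", a_hat, b_hat)
print("line at mean of X:", a_hat + b_hat * x_bar, "mean of Y:", y_bar)
print("sum of residuals:", sum(residuals))
```

For these points both estimates equal 0.5, the fitted line evaluated at the mean of X returns the mean of Y, and the residuals (-2, 2.5, -0.5) sum to zero. Unbiasedness is a statement about repeated sampling, so it cannot be checked on a single sample this way.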

Alternative formulas for the slope coefficient

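There are alternative (and simpler) formulas for calculating $\hat{b}$:

$$\hat{b} = \frac{\sum_{i=1}^N x_i y_i - N \bar{x} \bar{y}}{\sum_{i=1}^N x_i^2 - N \bar{x}^2} = r \frac{s_y}{s_x}$$

Here, $r$ is the correlation coefficient of X and Y, $s_x$ is the sample standard deviation of X and $s_y$ is the sample standard deviation of Y.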
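As a quick check, the slope obtained from the correlation coefficient and the two sample standard deviations can be compared with the slope computed directly from the deviations. This is a minimal sketch; the function names are assumptions made for this example, and the sample standard deviations use the N - 1 denominator.

```python
import math

def slope_direct(xs, ys):
    """Slope as the ratio of summed cross-deviations to summed squared x-deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

def slope_from_correlation(xs, ys):
    """Slope as r * s_y / s_x, with sample standard deviations (N - 1 denominator)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    r = s_xy / math.sqrt(s_xx * s_yy)    # Pearson correlation coefficient
    s_x = math.sqrt(s_xx / (n - 1))      # sample standard deviation of X
    s_y = math.sqrt(s_yy / (n - 1))      # sample standard deviation of Y
    return r * s_y / s_x

xs = [1, 2, 6]
ys = [-1, 4, 3]
print(slope_direct(xs, ys), slope_from_correlation(xs, ys))
```

Both functions should agree up to floating-point rounding, since the N - 1 factors cancel in the ratio of the standard deviations.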

Inference

Under the assumption that the error term is normally distributed, the estimate of the slope coefficient has a normal distribution with mean equal to b and standard error given by:

$$s_{\hat{b}} = \sqrt{\frac{\sum_{i=1}^N \hat{\varepsilon}_i^2 / (N-2)}{\sum_{i=1}^N (x_i - \bar{x})^2}}$$

A confidence interval for b can be created using a t-distribution with N-2 degrees of freedom:

$$[\hat{b} - s_{\hat{b}} t_{N-2}^*, \hat{b} + s_{\hat{b}} t_{N-2}^*]$$

Numerical Example

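Suppose we have the sample of points {(1,-1),(2,4),(6,3)}. The mean of X is 3 and the mean of Y is 2. The slope coefficient estimate is given by:

$$\hat{b} = \frac{(1-3)(-1-2) + (2-3)(4-2) + (6-3)(3-2)}{(1-3)^2 + (2-3)^2 + (6-3)^2} = \frac{7}{14} = 0.5$$

The standard error of the coefficient is 0.866. With N - 2 = 1 degree of freedom, the critical value of the t-distribution for a 95% confidence interval is 12.7062, so the interval is given by:

$$[0.5 - 0.866 \times 12.7062,\ 0.5 + 0.866 \times 12.7062] = [-10.504, 11.504]$$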
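The numbers in this example can be reproduced with a short script. The sketch below uses only the Python standard library; the function name `slope_inference` is hypothetical, and the critical value 12.7062 is passed in as given in the text rather than computed from the t-distribution.

```python
import math

def slope_inference(xs, ys, t_crit):
    """Return (b_hat, se_b, (lower, upper)) for the slope of a simple linear regression.

    t_crit is the critical value of the t-distribution with N - 2 degrees of
    freedom for the desired confidence level; it must be supplied by the caller.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 points so that N - 2 is positive")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    # Sum of squared residuals around the fitted line
    ssr = sum((y - (a_hat + b_hat * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope: residual variance over the spread of x
    se_b = math.sqrt((ssr / (n - 2)) / s_xx)
    return b_hat, se_b, (b_hat - t_crit * se_b, b_hat + t_crit * se_b)

b_hat, se_b, (lower, upper) = slope_inference([1, 2, 6], [-1, 4, 3], t_crit=12.7062)
print(f"slope = {b_hat:.3f}, standard error = {se_b:.3f}")
print(f"95% confidence interval = [{lower:.3f}, {upper:.3f}]")
```

Rounded to three decimals, this should reproduce the slope 0.5, the standard error 0.866 and the interval [-10.504, 11.504]. The interval is very wide because a single degree of freedom leaves almost no information about the error variance.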

This page uses content from the English-language version of Wikipedia. The original article was at Simple linear regression. The list of authors can be seen in the page history. As with Psychology Wiki, the text of Wikipedia is available under the GNU Free Documentation License.