Cronbach's alpha

Statistics: Scientific method · Research methods · Experimental design · Undergraduate statistics courses · Statistical tests · Game theory · Decision theory

Cronbach's $\alpha$ (alpha) is a statistic. It has an important use as a measure of the reliability of a psychometric instrument. It was first named as alpha by Cronbach (1951), as he had intended to continue with further instruments. It is the extension of an earlier version, the Kuder-Richardson Formula 20 (often shortened to KR-20), which is the equivalent for dichotomous items, and Guttman (1945) developed the same quantity under the name lambda-2. Cronbach's $\alpha$ is a coefficient of consistency and measures how well a set of variables or items measures a single, unidimensional latent construct.

Definition

Cronbach's $\alpha$ is defined as

${\displaystyle \alpha = { { {N} \over{N-1} } \left(1 - {{\sum_{i=1}^N \sigma^{2}_{Y_i}}\over{\sigma^{2}_{X}}}\right) } }$

where $N$ is the number of components (items or testlets), $\sigma_X^2$ is the variance of the observed total test scores, and $\sigma^{2}_{Y_i}$ is the variance of component i.

Alternatively, the standardized Cronbach's $\alpha$ can also be defined as

$\alpha = {N\cdot\bar c \over (\bar v + (N-1)\cdot\bar c)}$

where N is the number of components (items or testlets), $\bar{v}$ equals the average variance and $\bar{c}$ is the average of all covariances between the components.

Cronbach's alpha and internal consistency

Cronbach's alpha will generally increase when the correlations between the items increase. For this reason the coefficient is also called the internal consistency or the internal consistency reliability of the test.

Cronbach's alpha in classical test theory

Alpha is an unbiased estimator of reliability if and only if the components are essentially $\tau$ -equivalent (Lord & Novick, 1968^[1]). Under this condition the components can have different means and different variances, but their covariances should all be equal - which implies that they have 1 common factor in a factor analysis. One special case of essential $\tau$ -equivalence is that the components are parallel. Although the assumption of essential $\tau$ -equivalence may sometimes be met (at least approximately) by testlets, when applied to items it is probably never true. This is caused by the facts that (1) most test developers invariably include items with a range of difficulties (or stimuli that vary in their standing on the latent trait, in the case of personality, attitude or other non-cognitive instruments), and (2) the item scores are usually bounded from above and below. These circumstances make it unlikely that the items have a linear regression on a common factor. A factor analysis may then produce artificial factors that are related to the differential skewnesses of the components. When the assumption of essential $\tau$ -equivalence of the components is violated, alpha is not an unbiased estimator of reliability. Instead, it is a lower bound on reliability.

$\alpha$ can take values between negative infinity and 1 (although only positive values make sense). Some professionals, as a rule of thumb, require a reliability of 0.70 or higher (obtained on a substantial sample) before they will use an instrument. Obviously, this rule should be applied with caution when $\alpha$ has been computed from items that systematically violate its assumptions. Further, the appropriate degree of reliability depends upon the use of the instrument, e.g., an instrument designed to be used as part of a battery may be intentionally designed to be as short as possible (and thus somewhat less reliable). Other situations may require extremely precise measures (with very high reliabilities).

Cronbach's $\alpha$ is related conceptually to the Spearman-Brown prediction formula. Both arise from the basic classical test theory result that the reliability of test scores can be expressed as the ratio of the true score and total score (error and true score) variances:

$\rho_{XX}= { {\sigma^2_T}\over{\sigma_X^2} }$

Alpha is most appropriately used when the items measure different substantive areas within a single construct. Conversely, alpha (and other internal consistency estimates of reliability) are inappropriate for estimating the reliability of an intentionally heterogeneous instrument (such as screening devices like biodata or the original MMPI). Also, $\alpha$ can be artificially inflated by making scales which consist of superficial changes to the wording within a set of items or by analyzing speeded tests.

Cronbach's alpha in generalizability theory

Cronbach and others generalized some basic assumptions of classical test theory in their generalizability theory. If this theory is applied to test construction, then it is assumed that the items that constitute the test are a random sample from a larger universe of items. The expected score of a person in the universe is called the universum score, analogous to a true score. The generalizability is defined analogously as the variance of the universum scores divided by the variance of the observable scores, analogous to the concept of reliability in classical test theory. In this theory, Cronbach's alpha is an unbiased estimate of the generalizability. For this to be true the assumptions of essential $\tau$ -equivalence or parallelness are not needed. Consequently, Cronbach's alpha can be viewed as a measure of how well the sum score on the selected items capture the expected score in the entire domain, even if that domain is heterogeneous.

Cronbach's alpha and the intra-class correlation

Cronbach's alpha is equal to the stepped-up consistency version of the Intra-class correlation coefficient, which is commonly used in observational studies. This can be viewed as another application of generalizability theory, where the items are replaced by raters or observers who are randomly drawn from a population. Cronbach's alpha will then estimate how strongly the score obtained from the actual panel of raters correlates with the score that would have been obtained by another random sample of raters.

Cronbach's alpha and factor analysis

As stated in the section about its relation with classical test theory, Cronbach's alpha has a theoretical relation with factor analysis. There is also a more empirical relation: Selecting items such that they optimize Cronbach's alpha will often result in a test that is homogeneous in that they (very roughly) approximately satisfy a factor analysis with one common factor. The reason for this is that Cronbach's alpha increases with the average correlation between item, so optimization of it tends to select items that have correlations of similar size with most other items. It should be stressed that, although unidimensionality (i.e. fit to the one-factor model) is a necessary condition for alpha to be an unbiased estimator of reliability, the value of alpha is not related to the factorial homogeneity. The reason is that the value of alpha depends on the size of the average inter-item covariance, while unidimensionality depends on the pattern of the inter-item covariances.

Cronbach's alpha and other disciplines

Although this description of the use of $\alpha$ is given in terms of psychology, the statistic can be used in any discipline.

Construct creation

Coding two (or more) different variables with a high Cronbach's alpha into a construct for regression use is simple. Dividing the used variables by their means or averages results in a percentage value for the respective case. After all variables have been re-calculated in percentage terms, they can easily be summed to create the new construct.

References

↑ Lord, F. M. & Novick, M. R. (1968). Statistical theories of mental test scores. Reading MA: Addison-Wesley Publishing Company.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334.
Allen, M.J., & Yen, W. M. (2002). Introduction to Measurement Theory. Long Grove, IL: Waveland Press.