Changes: Standard scores

Latest revision as of 11:43, 26 April 2013


Figure: Comparison of various measures of the normal distribution: standard deviations, cumulative percentages, Z-scores, and T-scores.

In statistics, a standard score is a dimensionless quantity derived by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. This conversion process is called standardizing or normalizing; however, "normalizing" can refer to many types of ratios; see normalization (statistics) for more.

Standard scores are also called z-values, z-scores, normal scores, and standardized variables; the letter "Z" is used because the standard normal distribution is also known as the "Z distribution". They are most frequently used to compare a sample to a standard normal deviate (standard normal distribution, with μ=0 and σ=1), though they can be defined without assumptions of normality.

The standard score indicates how many standard deviations an observation is above or below the mean: the standard deviation is the unit of measurement of the z-score. It allows comparison of observations from different normal distributions, which is done frequently in research.

The z-score is only defined if one knows the population parameters, as in standardized testing; if one only has a sample set, then the analogous computation with sample mean and sample standard deviation yields the Student's t-statistic.

The standard score is not the same as the z-factor used in the analysis of high-throughput screening data, but is sometimes confused with it.

Formula

The standard score is

z={\frac {x-\mu }{\sigma }},

where:

x is a raw score to be standardized;

μ is the mean of the population;

σ is the standard deviation of the population.

The quantity z represents the distance between the raw score and the population mean in units of the standard deviation. z is negative when the raw score is below the mean, positive when above.
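As a minimal sketch of the formula above in Python (the function name `z_score` and the example figures are illustrative assumptions, not values from any cited source):

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Standard score of raw score x, given the population mean mu
    and the population standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Two hypothetical tests whose population parameters are assumed known.
z_a = z_score(x=650.0, mu=500.0, sigma=100.0)  # 1.5 standard deviations above the mean
z_b = z_score(x=26.0, mu=20.0, sigma=5.0)      # 1.2 standard deviations above the mean
z_c = z_score(x=400.0, mu=500.0, sigma=100.0)  # negative: the raw score is below the mean
```

Because both results are expressed in standard deviations, the two scores can be compared directly even though the raw scales differ; here the first score lies further above its population mean than the second.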

A key point is that calculating z requires the population mean and the population standard deviation, not the sample mean or sample standard deviation; that is, it requires the population parameters rather than the statistics of a sample drawn from that population. Knowing the true standard deviation of a population is often unrealistic, however, except in cases such as standardized testing, where the entire population is measured. Where it is impossible to measure every member of a population, the standard deviation may be estimated from a random sample. For example, the population of people who smoke cigarettes is not fully measured.

When a population is normally distributed, the percentile rank may be determined from the standard score and statistical tables.
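In place of a printed table, the lookup can be sketched with the standard normal cumulative distribution function from Python's standard library. This is only an illustration that assumes the population really is normally distributed; the helper name `percentile_rank` is hypothetical.

```python
from statistics import NormalDist  # available in Python 3.8 and later


def percentile_rank(z: float) -> float:
    """Percentage of a normally distributed population that falls
    below the given standard score."""
    return 100.0 * NormalDist().cdf(z)  # NormalDist() defaults to mu=0, sigma=1


rank_at_mean = percentile_rank(0.0)      # half the population lies below the mean
rank_one_sd_above = percentile_rank(1.0)  # roughly the 84th percentile
```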

Related statistics

If the sample mean and sample standard deviation are used (rather than the population mean and standard deviation), the resulting ratio is the (single-sample) Student's t-statistic. In regression analysis, one instead uses the studentized residual, because the standard error of the estimated response varies with the values of the explanatory variables.
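For contrast with the z-score, here is a rough sketch of the usual one-sample t-statistic, t = (x̄ − μ₀) / (s / √n), where s is the sample standard deviation. The function name, the hypothesized mean `mu0`, and the data are assumptions made for illustration only.

```python
import math
import statistics


def one_sample_t(sample: list[float], mu0: float) -> float:
    """One-sample t-statistic: the sample mean compared with a
    hypothesized mean mu0, scaled by the estimated standard error."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (x_bar - mu0) / (s / math.sqrt(n))


# Hypothetical data: sample mean 5, tested against a hypothesized mean of 4.
t = one_sample_t([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], mu0=4.0)
```

The only structural difference from the standardized sample mean is that the unknown population standard deviation is replaced by its sample estimate.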

The T score statistic is a simple transformation of the z score, calculated using the formula

T=10z+50

The T score has a mean of 50 and a standard deviation of 10 (Carroll & Carroll 2002, p. 56).
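The transformation is a simple rescaling, as this small Python sketch suggests (the function name `t_score` is an illustrative assumption):

```python
def t_score(z: float) -> float:
    """Convert a standard score to a T score (mean 50, standard deviation 10)."""
    return 10.0 * z + 50.0


t_at_mean = t_score(0.0)    # a score at the mean maps to 50
t_above = t_score(1.5)      # 1.5 standard deviations above the mean maps to 65
t_below = t_score(-2.0)     # 2 standard deviations below the mean maps to 30
```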

Applications

The z-score is most often used in the z-test in standardized testing – the analog of the Student's t-test for a population whose parameters are known, rather than estimated. As it is very unusual to know the entire population, the t-test is much more widely used.

Darby and Reissland (1981) use z-scores as a way of understanding the contributions of various subsets of data to an overall test of trend. The overall analysis was of trends in the rate of occurrence of cancer, and the subsets considered were approximately 55 different types of cancer, together with various groupings of these types. In this instance, the z-scores serve not directly as a test statistic for a significance test but as a numerical guide to finding subsets of data that might show trends different from the others.

Standardizing in mathematical statistics

Further information: Normalization (statistics)

In mathematical statistics, a random variable X is standardized using the theoretical (population) mean and standard deviation:

Z={X-\mu  \over \sigma }

where μ = E(X) is the mean and σ = √Var(X) is the standard deviation of the probability distribution of X.

If the random variable under consideration is the sample mean:

{\bar {X}}={1 \over n}\sum _{i=1}^{n}X_{i}

then the standardized version is

Z={{\bar {X}}-\mu  \over \sigma /{\sqrt {n}}}.
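A short Python sketch of the standardized sample mean follows. It assumes the population mean and standard deviation are known; the function name and the numbers are hypothetical.

```python
import math


def standardized_mean(sample: list[float], mu: float, sigma: float) -> float:
    """Standardize the sample mean using the known population mean mu and
    population standard deviation sigma: (x_bar - mu) / (sigma / sqrt(n))."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    x_bar = sum(sample) / n
    return (x_bar - mu) / (sigma / math.sqrt(n))


# Nine hypothetical observations with mean 105, drawn from a population
# assumed to have mean 100 and standard deviation 15.
scores = [100.0, 105.0, 110.0, 95.0, 115.0, 105.0, 100.0, 110.0, 105.0]
z_mean = standardized_mean(scores, mu=100.0, sigma=15.0)
```

With n = 9 the standard error is 15 / 3 = 5, so a sample mean of 105 sits one standard error above the population mean.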

See normalization (statistics) for other forms of normalization.

References and notes

Carroll, Susan Rovezzi; Carroll, David J. (2002), Statistics Made Simple for School Leaders (illustrated ed.), Rowman & Littlefield, ISBN 9780810843226, http://books.google.com/books?id=gccHkMDikb0C, retrieved on 7 June 2009

General references

Richard J. Larsen and Morris L. Marx (2000) An Introduction to Mathematical Statistics and Its Applications, Third Edition, ISBN 0139223037. p. 282.
Darby, S.C., Reissland, J.A. (1981) "Low levels of ionizing radiation and cancer — are we underestimating the risk? (with discussion)". Journal of the Royal Statistical Society, Series A, 144(3), 298–331.

External links

A Guide to Understanding & Calculating the Standard Score (Z-Score)
Norm Scale Calculator (Utility for the Transformation and Visualization of Norm Scores)
Z-Score to percentile conversion table With a given Z-Score, calculate the value's percentile rank.
Z-Score to percentile calculator Converts Z-Scores into percentiles (1 & 2 Sided).
Normal Distribution & calculation of Z-Scores and percentile rank with Excel functions

@@ Line 1: / Line 1: @@
 {{StatsPsy}}
-[[Image:Normal_distribution_and_scales.gif|thumb|350px|right|Compares the various grading methods in a normal distribution. Includes: Standard deviations, cummulative precentages, percentile equivalents, Z-scores, T-scores, standard nine, percent in stanine]]
+[[Image:The Normal Distribution.svg|thumb|350px|right|comparison of various measures of the normal distribution: standard deviations, cumulative percentages, Z-scores, and T-scores]]
-In [[statistics]], a '''standard score''' (also ''z'''''-score''' or '''normal score''') is a [[dimensionless number|dimensionless quantity]] derived by subtracting the [[population mean]] from an individual (raw) score and then dividing the difference by the population [[standard deviation]]:
+In [[statistics]], a '''standard score''' is a [[dimensionless number|dimensionless quantity]] derived by subtracting the [[population mean]] from an individual raw score and then dividing the difference by the [[statistical population|population]] [[standard deviation]].  This conversion process is called '''standardizing''' or '''normalizing'''; however, "normalizing" can refer to many types of ratios; see [[normalization (statistics)]] for more.
+Standard scores are also called '''z-values, ''z''-scores, normal scores,''' and '''standardized variables;''' the use of "Z" is because the normal distribution is also known as the "Z distribution". They are most frequently used to compare a sample to a [[standard normal deviate]] (standard normal distribution, with ''μ''=0 and ''σ''=1), though they can be defined without assumptions of normality.
-The standard score, which is also commonly known as the '''z-score''', is not the same as, but is sometimes confused with, the [[Z-Factor]] used in the analysis of [[high-throughput screening]] data.
+The standard score indicates how many [[standard deviation]]s an observation is above or below the mean: the standard deviation is the unit of measurement of the z-score.
-Knowing the true &sigma; of a population is often unrealistic except in cases such as [[standardized testing]] in which the entire population is known.  In cases where it is impossible to measure every member of a population, the standard deviation may be estimated using a random sample.
+It allows comparison of observations from different normal distributions, which is done frequently in research.
+The z-score is ''only'' defined if one knows the population parameters, as in [[standardized testing]]; if one only has a sample set, then the analogous computation with sample mean and sample standard deviation yields the [[Student's t-statistic]].
-The ''z'' score calculation requires the following to be known:
-*&sigma; (the [[standard deviation]] of the [[statistical population|population]])
+The standard score is not the same as the [[z-factor]] used in the analysis of [[high-throughput screening]] data, but is sometimes confused with it.
-*&mu; (the [[mean]] of the population)
-*''X'' (a raw score)
+== Formula ==
 The standard score is
-:<math> z = {X - \mu \over \sigma}.</math>
+: <math> z = \frac{x - \mu}{\sigma},</math>
+where:
+: ''x'' is a raw score to be standardized;
+: ''μ'' is the [[mean]] of the population;
+: ''σ'' is the [[standard deviation]] of the population.
 The quantity ''z'' represents the distance between the raw score and the population mean in units of the standard deviation.  ''z'' is negative when the raw score is below the mean, positive when above.
+A key point is that calculating ''z'' requires the population mean and the population standard deviation, not the sample mean or sample deviation.  It requires knowing the population parameters, not the statistics of a sample drawn from the population of interest. But knowing the true standard deviation of a population is often unrealistic except in cases such as [[standardized testing]], where the entire population is measured.  In cases where it is impossible to measure every member of a population, the standard deviation may be estimated using a random sample.  For example, a population of people who smoke [[cigarette]]s is not fully measured.
-Another name for a standard score is a '''''z''-score'''. The conversion process itself is sometimes called '''standardizing'''.
+When a population is [[normal distribution|normally distributed]], the [[percentile rank]] may be determined from the standard score and statistical tables.
-The key point to remember for the ''z'' score is that it is calculated using the population mean and the population standard deviation and not the sample mean or sample deviation.  Calculation of the ''z'' score requires knowledge of the population statistics as opposed to the statistics of a sample drawn from the population of interest.
+=== Related statistics ===
-Population statistics are rarely known in the real world except for circumstances such as [[standardized testing]].  The population of people taking a standardized test is known and the population statistics can be calculated because all of the scores of the test takers are available.  On the other hand, a population such as people who smoke [[cigarette]]s is not fully described so the population statistics are approximated using samples of the population.
+If using sample mean and sample standard deviation (rather than the population mean and standard deviation), the resulting ratio is the (single-sample) [[Student's t-statistic]]. In [[regression analysis]], one instead uses the [[studentized residual]], as the standard error of estimates of response variables vary for different input explanatory variables.
+The T score statistic is a simple transformation of the z score, calculated using the formula
-When a population is [[normal distribution|normally distributed]], the [[percentile rank]] may be determined from the standard score and ubiquitous tables.
+: <math> T = ({z * 10}) + 50 </math>
+The T score has a mean of 50 and a standard deviation of 10 {{harv |Carroll|Carroll|2002|, p.56}}.
+== Applications ==
+The z-score is most often used in the [[z-test]] in [[standardized testing]] – the analog of the [[Student's t-test]] for a population whose parameters are known, rather than estimated. As it is very unusual to know the entire population, the t-test is much more widely used.
+Darby and Reissland (1981) make use of z-scores as a way of understanding the contributions from various subsets of data to an overall test of trend. The overall analysis was of trends in the rate of occurrence of cancer and the subsets considered approximately 55 different types of cancer, together with various groupings of these types. In this instance, the use of z-scores is not immediately as a test statistic for a significance test, but rather as a numerical guide to finding subsets of data which might show different trends than others.
 ==Standardizing in mathematical statistics==
+{{further|[[Normalization (statistics)]]}}
 In [[mathematical statistics]], a [[random variable]] ''X'' is '''standardized''' using the theoretical (population) mean and standard deviation:
 :<math>Z = {X - \mu \over \sigma}</math>
-where &mu;&nbsp;=&nbsp;E(''X'') is the [[mean]] and &sigma;&sup2;&nbsp;=&nbsp;Var(''X'') the variance of the [[probability distribution]] of ''X''.
+where μ&nbsp;=&nbsp;E(''X'') is the [[mean]] and σ&nbsp;= the [[standard deviation]] of the [[probability distribution]] of ''X''.
 If the random variable under consideration is the [[sample mean]]:
@@ Line 39: / Line 57: @@
 :<math>\bar{X}={1 \over n} \sum_{i=1}^n X_i</math>
 then the standardized version is
-:<math>Z={\bar{X}-\mu\over\sigma/\sqrt{n}}</math>
+:<math>Z={\bar{X}-\mu\over\sigma/\sqrt{n}}.</math>
+See [[normalization (statistics)]] for other forms of normalization.
+==References and notes==
+* {{citation|last=Carroll|first=Susan Rovezzi|last2=Carroll|first2=David J.
+ |title=Statistics Made Simple for School Leaders |url=http://books.google.com/books?id=gccHkMDikb0C
+ |accessdate=7 June 2009 |edition=illustrated |year=2002|publisher=Rowman & Littlefield |isbn=9780810843226}}
+===General references===
+* Richard J. Larsen and Morris L. Marx (2000) ''An Introduction to Mathematical Statistics and Its Applications, Third Edition,'' ISBN 0139223037. p. 282.
+* Darby, S.C., Reissland, J.A. (1981) "Low levels of ionizing radiation and cancer &mdash; are we underestimating the risk? (with discussion)". Journal of the Royal Statistical Society, Series A, 144(3), 298&ndash;331.
+==External links==
+{{Spoken Wikipedia|Standard score.ogg|2006-07-09}}
+*[http://www.stats4students.com/Essentials/Standard-Score/Overview.php A Guide to Understanding & Calculating the Standard Score (Z-Score)]
+*[http://www.psychometrica.de/normwertrechner_en.html Norm Scale Calculator] (Utility for the Transformation and Visualization of Norm Scores)
+*[http://www.acposb.on.ca/conversion.htm  Z-Score to percentile conversion table] With a given Z-Score, calculate the value's percentile rank.
+*[http://www.measuringusability.com/pcalcz.php Z-Score to percentile calculator] Converts Z-Scores into percentiles (1 & 2 Sided).
+*[http://www.uark.edu/misc/lampinen/tutorials/normal.htm Normal Distribution & calculation of Z-Scores and percentile rank with Excel functions]
 ==See also==
+* [[Normalization (statistics)]]
-* [[Z-test]]
-* [[Z-Factor]]
+* [[Sampling distribution]]
-* [[moment (mathematics)]]
+* [[Score equating]]
-* [[central moment]]
+* [[Scoring (testing)]]
-* [[sampling distribution]]
+* [[Standard deviation]]
+* [[Standard normal deviate]]
+* [[Standard normal table]]
 * [[Student's t-test]]
+* [[Student's t-statistic]]
+* [[Studentized residual]]
+* [[Z-test]]
+{{Statistics}}
+[[Category:Data analysis]]
 [[Category:Probability and statistics]]
+[[Category:Statistical terminology]]
+[[Category:Statistical ratios]]
+[[Category:Test scores]]
-[[ja:偏差値]]
+<!--
+[[de:Standardisierung (Statistik)]]
+[[ko:편차치]]
+[[he:ציון תקן]]
+[[it:Standardizzazione (statistica)]]
 [[nl:Z-score]]
+[[ja:偏差値]]
+[[pl:Standaryzacja (statystyka)]]
+[[sl:Z-vrednost]]
+[[ur:ز۔قدر]]
+[[zh:標準分數]]
+-->
 {{enWP|Standard_score}}
