Mann-Whitney U test

From Psychology Wiki


The Mann-Whitney U test is one of the best-known non-parametric statistical significance tests. It is sometimes also called the Mann-Whitney-Wilcoxon (MWW) test.

The test is appropriate to the case of two independent samples of observations that are measured at least at an ordinal level, i.e. we can at least say, of any two observations, which is the greater. The test assesses whether the degree of overlap between the two observed distributions is less than would be expected by chance, on the null hypothesis that the two samples are drawn from a single population.

The test involves the calculation of a statistic, usually called U, whose distribution under the null hypothesis is known. In the case of small samples, the distribution is tabulated, but for samples above about 20 there is a good approximation using the normal distribution. Some books tabulate statistics other than U, such as the sum of ranks in one of the samples, but this deviation from standard practice is unhelpful.

The U test is included in most modern statistical packages. However, it is also easily calculated by hand, especially for small samples. There are two ways of doing this:

  • For small samples, a direct method is recommended. It is very quick, and it also gives an insight into the meaning of the U statistic. Choose the sample for which the observations seem to be smaller (or the smaller sample - the choice is relevant only to ease of computation). Call this sample 1, and call the other sample sample 2. Taking each observation in sample 1, count the number of observations in sample 2 that are smaller than it. The total of these counts is U.
  • For larger samples, a formula can be used. Arrange all the observations into a single ranked series, and then add up the ranks in the smaller group. The sum of ranks in the other group follows by calculation, since the sum of all the ranks equals N(N + 1)/2 where N is the total number of observations. U is then given by the following formula:
$$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$
where n1 and n2 are the two sample sizes, and R1 is the sum of the ranks in sample 1.

Note that the maximum value of U is the product of the two sample sizes, and if the value obtained by either of the methods above is more than half of this maximum, it should be subtracted from the maximum to obtain the value to look up in tables.
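
The two hand-calculation methods above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function names (`u_direct`, `u_rank_sum`, `u_for_table`) and the sample data are invented here, ties are assumed to be absent, and ranks are assumed to run upwards from 1 for the smallest observation. Under that ranking convention the formula counts the pairs in which the sample 2 observation is the larger one, so the two functions can return complementary values that add up to n1n2. The subtraction rule in the note above reduces both to the same value for looking up in tables.

```python
def u_direct(sample1, sample2):
    """Direct method: for each observation in sample 1, count the
    observations in sample 2 that are smaller than it."""
    return sum(1 for x in sample1 for y in sample2 if y < x)


def u_rank_sum(sample1, sample2):
    """Formula method: U = n1*n2 + n1*(n1 + 1)/2 - R1.

    Assumes no tied values; rank 1 goes to the smallest observation.
    """
    n1, n2 = len(sample1), len(sample2)
    pooled = sorted(list(sample1) + list(sample2))
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    r1 = sum(rank[x] for x in sample1)
    return n1 * n2 + n1 * (n1 + 1) // 2 - r1


def u_for_table(u, n1, n2):
    """Value to look up: subtract from the maximum n1*n2 if U exceeds half of it."""
    return min(u, n1 * n2 - u)


# Hypothetical data, chosen only to illustrate the calculation.
sample_a = [1.1, 2.4, 3.0]
sample_b = [2.0, 3.5, 4.2, 5.1]

u_a = u_direct(sample_a, sample_b)
u_b = u_rank_sum(sample_a, sample_b)
print(u_a, u_b)  # the two values add up to n1*n2 = 12
print(u_for_table(u_a, 3, 4), u_for_table(u_b, 3, 4))  # same table value
```

Counting pairs directly takes n1 × n2 comparisons, which is why the text recommends it only for small samples; the rank-sum route needs a single sort.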

For example, let us suppose that Aesop is dissatisfied with his classic experiment in which one tortoise was found to beat one hare in a race, and decides to carry out a significance test to discover whether the results could be extended to tortoises in general and hares in general. He collects a sample of 6 tortoises and 6 hares, and makes them all run his race. The order in which they reach the finishing post is as follows, writing T for a tortoise and H for a hare:

T H H H H H T T T T T H

(his original tortoise still goes at warp speed, and his original hare is still lazy, but the others run truer to stereotype). What is the value of U?

  • Using the direct method, we take each tortoise in turn, and count the number of hares it beats, getting the following results: 6, 1, 1, 1, 1, 1. So U = 6 + 1 + 1 + 1 + 1 + 1 = 11.
  • Using the indirect method:
the sum of the ranks achieved by the tortoises is 1 + 7 + 8 + 9 + 10 + 11 = 46.
Therefore U = 6×6 + 6×7/2 − 46 = 36 + 21 − 46 = 11.
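
The race example can be checked with a few lines of Python. This is only a sketch: the variable names are invented here, and the finishing order is encoded as the string given above, with position 1 being the first animal past the post.

```python
# Finishing order from the example: T = tortoise, H = hare.
order = "THHHHHTTTTTH"

n_t = order.count("T")
n_h = order.count("H")

# Direct method: for each tortoise, count the hares that finish after it.
u_direct = sum(order[i + 1:].count("H") for i, c in enumerate(order) if c == "T")

# Indirect method: sum the finishing positions (ranks) of the tortoises,
# then apply U = n1*n2 + n1*(n1 + 1)/2 - R1.
r_t = sum(i + 1 for i, c in enumerate(order) if c == "T")
u_formula = n_t * n_h + n_t * (n_t + 1) // 2 - r_t

print(r_t)                  # 46
print(u_direct, u_formula)  # 11 11
```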

Consulting the table referenced below, we find that this result does not confirm the greater speed of tortoises, though nor does it show any significant speed advantage for hares. It is left as an exercise for the reader to establish that statistical packages will give the same result, at rather greater expense.

For large samples, the normal approximation:

$$z = \frac{U - m_U}{\sigma_U}$$

can be used, where z is a standard normal deviate whose significance can be checked in tables of the normal distribution. mU and σU are the mean and standard deviation of U if the null hypothesis is true, and are given by the following formulae:

$$m_U = \frac{n_1 n_2}{2}$$
$$\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

All the formulae given here become more complicated in the presence of tied ranks, but if the number of ties is small (and especially if there are no large tie bands), the ties can be ignored when doing calculations by hand. Computer statistical packages apply the tie corrections as a matter of routine.
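
The normal approximation can be sketched as follows, ignoring ties as described above. The function name `normal_approximation` is invented for this illustration, and the two-sided p-value is obtained from the standard library's complementary error function rather than from printed tables. The tortoise and hare figures are used only to show the arithmetic; with 6 observations per sample they fall well below the sample size of about 20 at which the approximation is said to become good.

```python
import math


def normal_approximation(u, n1, n2):
    """Return (z, two-sided p) for U under the null hypothesis, with no tie correction."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # For a standard normal deviate, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Illustration only: U = 11 with n1 = n2 = 6 from the race example.
z, p = normal_approximation(11, 6, 6)
print(round(z, 2))  # about -1.12
print(p)            # well above 0.05, in line with the non-significant result
```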

The U test is useful in the same situations as the independent samples Student's t-test, and the question arises of which should be preferred. Before electronic calculators and computer packages made calculations easy, the U test was preferred on grounds of speed of calculation. It remains the logical choice when the data are inherently ordinal, and it is much less likely than the t-test to give a spuriously significant result because of one or two outliers. On the other hand, the U test is often recommended for situations where the distributions of the two samples are very different. This is an error: the U test assesses whether the two samples come from a common distribution, and Monte Carlo methods have shown that it can give erroneously significant results in some situations where the samples are drawn from distributions with the same mean but different variances. In that situation, the version of the t-test that allows for the samples to come from populations of different variance is likely to give more reliable results.
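
A small Monte Carlo experiment of the kind mentioned above can be sketched in Python. Everything specific here is an assumption made for illustration and is not taken from the studies in the reference list: normal populations with mean 0, a smaller sample of 10 drawn from the more variable population (standard deviation 4) against a larger sample of 40 (standard deviation 1), 2000 simulated experiments, and a two-sided test at the 0.05 level using the normal approximation without tie correction.

```python
import math
import random


def u_statistic(a, b):
    """Direct count of pairs in which the observation from b is smaller."""
    return sum(1 for x in a for y in b if y < x)


def two_sided_p(u, n1, n2):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs((u - mean_u) / sd_u) / math.sqrt(2))


def false_positive_rate(n1, sd1, n2, sd2, trials=2000, alpha=0.05, seed=0):
    """Share of simulated experiments declared significant when both
    populations are normal with the same mean (0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sd1) for _ in range(n1)]
        b = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if two_sided_p(u_statistic(a, b), n1, n2) < alpha:
            hits += 1
    return hits / trials


rate_equal = false_positive_rate(10, 1.0, 40, 1.0)
rate_unequal = false_positive_rate(10, 4.0, 40, 1.0)
print("equal variances:  ", rate_equal)
print("unequal variances:", rate_unequal)
```

If the test held its nominal level in both cases, each rate would be close to 0.05. Under the assumptions of this sketch the unequal-variance rate should come out noticeably higher, although the exact figures depend on the seed, the sample sizes and the variance ratio chosen.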

The U test is related to a number of other nonparametric statistical procedures. For example, it is equivalent to using Kendall's τ correlation coefficient in a situation where one of the variables being correlated can only take two values.

A statistic linearly related to U, the ρ statistic proposed by Richard Herrnstein, is widely used in studies of categorization (discrimination learning involving concepts) in birds (see animal cognition). ρ is calculated by dividing U by its maximum value for the given sample sizes, which is simply n1n2. ρ is thus a non-parametric measure of the overlap between two distributions; it can take values between 0 and 1. Both extreme values represent complete separation of the distributions, while a ρ of 0.5 represents complete overlap.
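
As a brief illustration, ρ can be computed by dividing the direct-method count by n1n2. The function name `rho` and the data below are invented for this sketch, and ties are assumed to be absent.

```python
def rho(sample1, sample2):
    """U divided by its maximum n1*n2, with U counted by the direct method."""
    u = sum(1 for x in sample1 for y in sample2 if y < x)
    return u / (len(sample1) * len(sample2))


# Complete separation gives one of the two extreme values.
print(rho([1, 2, 3], [4, 5, 6]))  # 0.0
print(rho([4, 5, 6], [1, 2, 3]))  # 1.0

# Interleaved samples give a value between the extremes.
print(rho([1, 4, 5], [2, 3, 6]))  # 4/9, about 0.44
```

For the tortoise and hare race, U = 11 and n1n2 = 36, so ρ = 11/36, roughly 0.31.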

References

  • Bi, J. (2006). Statistical analyses for R-index: Journal of Sensory Studies Vol 21(6) Dec 2006, 584-600.
  • Blair, R. C., Higgins, J. J., & Smitley, W. D. (1980). On the relative power of the U and t tests: British Journal of Mathematical and Statistical Psychology Vol 33(1) May 1980, 114-120.
  • Ciechalski, J. C. (1990). Action research, the Mann-Whitney U, and thou: Elementary School Guidance & Counseling Vol 25(1) Oct 1990, 54-63.
  • Curtis, D. A., & Marascuilo, L. A. (1992). Point estimates and confidence intervals for the parameters of the two-sample and matched-pair combined tests for ranks and normal scores: Journal of Experimental Education Vol 60(3) Spr 1992, 243-269.
  • D'Andrade, R. G. (1978). U-statistic hierarchical clustering: Psychometrika Vol 43(1) Mar 1978, 59-67.
  • Gibbons, J. D., & Chakraborti, S. (1991). Comparisons of the Mann-Whitney, Student's t, and alternate t tests for means of normal distributions: Journal of Experimental Education Vol 59(3) Spr 1991, 258-267.
  • Herrnstein, R. J., Loveland, D. H., & Cable, C. (1976). Natural concepts in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 2, 285-302.
  • Kasuya, E. (2001). Mann-Whitney U test when variances are unequal: Animal Behaviour Vol 61(6) Jun 2001, 1247-1249.
  • Kim, C. (1986). An empirical comparison of the power and the robustness of the two independent means t-test and the Mann-Whitney U-test for semantic differential and Likert type scale scores assuming a discretized normal distribution: Dissertation Abstracts International.
  • Lindman, H. R. (1972). Nonparametric statistics, Bayesian and classical: I. Sign test and Mann-Whitney U test. Oxford, England: Indiana U, No 72-6.
  • Pacut, A. (1987). How to use the Mann-Whitney Test to detect a change in distribution for groups: Acta Neurobiologiae Experimentalis Vol 47(1) 1987, 19-26.
  • Rasch, D., & Guiard, V. (2004). The robustness of parametric statistical methods: Psychology Science Vol 46(2) 2004, 175-208.
  • Rasmussen, J. L. (1983). Parametric vs nonparametric tests on non-normal and transformed data: Dissertation Abstracts International.
  • Rasmussen, J. L. (1986). An evaluation of parametric and non-parametric tests on modified and non-modified data: British Journal of Mathematical and Statistical Psychology Vol 39(2) Nov 1986, 213-220.
  • Simmons, H. J., Garber, E. E., & Simmons, G. T. (1988). An order statistic for coarse measurement scales: Ce derived from U: Journal of General Psychology Vol 115(2) Apr 1988, 203-213.
  • Trachtman, J. N., Giambalvo, V., & Dippner, R. S. (1978). On the assumptions concerning the assumptions of a t test: Journal of General Psychology Vol 99(1) Jul 1978, 107-116.
  • Ury, H. K., & Wiggins, A. D. (1976). A general upper bound on the variance of the Wilcoxon-Mann-Whitney U-statistic for symmetric distributions with shift alternatives: British Journal of Mathematical and Statistical Psychology Vol 29(2) Nov 1976, 263-267.
  • Wiedermann, W. T., & Alexandrowicz, R. W. (2007). A plea for more general tests than those for location only: Further considerations on Rasch & Guiard's 'The robustness of parametric statistical methods': Psychology Science Vol 49(1) 2007, 2-12.
  • Zimmerman, D. W. (1985). Power functions of the t test and Mann-Whitney U test under violation of parametric assumptions: Perceptual and Motor Skills Vol 61(2) Oct 1985, 467-470.
  • Zimmerman, D. W. (1987). Comparative power of Student t test and Mann-Whitney U test for unequal sample sizes and variances: Journal of Experimental Education Vol 55(3) Spr 1987, 171-174.

This page uses content from the English-language version of Wikipedia. The original article was at Mann-Whitney_U. The list of authors can be seen in the page history. As with Psychology Wiki, the text of Wikipedia is available under the GNU Free Documentation License.