{{StatsPsy}}
{{Expert}}
{{main|Hypothesis testing}}
   
In [[statistics]], a '''null hypothesis''' (''H''<sub>0</sub>) is a hypothesis set up to be nullified or refuted in order to support an ''[[alternative hypothesis]]''. This procedure is sometimes known as '''null hypothesis significance testing''' (NHST) or '''null hypothesis testing''' (NHT).
   
When used, the null hypothesis is presumed true until [[statistics|statistical]] [[evidence]], in the form of a [[statistical hypothesis testing|hypothesis test]], indicates otherwise — that is, until the researcher has a certain degree of confidence, usually 95% to 99%, that the data do not support the null hypothesis. It is possible for an experiment to fail to reject the null hypothesis. It is also possible for both the null hypothesis and the alternative hypothesis to be rejected, if there are more than those two possibilities.

Although it was originally proposed to be any hypothesis, in practice the null hypothesis has come to be identified with the "nil hypothesis", which states that "there is no phenomenon", and that the results in question could have arisen through [[chance]].
 
   
In scientific and medical applications, the null hypothesis plays a major role in testing the significance of differences in treatment and [[scientific control|control]] groups. The assumption at the outset of the experiment is that no difference exists between the two groups (for the variable being compared): this is the null hypothesis in this instance. Other types of null hypothesis may be, for example, that:
* values in samples from a given population can be modelled using a certain family of [[statistical distribution]]s.
* the variability of data in different groups is the same, although they may be centred around different values.
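
Both of these kinds of null hypothesis are routinely tested with standard procedures. As a minimal sketch in Python (the data are simulated, and the choice of the Shapiro-Wilk and Levene tests is an illustrative assumption, not something prescribed by the text above):

<source lang="python">
# Sketches of the two null hypotheses listed above, on simulated data:
# (1) the sample can be modelled by a normal distribution,
# (2) two groups have equal variance even though their means may differ.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
a = rng.normal(0.0, 1.0, size=100)   # hypothetical sample, group A
b = rng.normal(2.0, 1.0, size=100)   # hypothetical sample, group B

stat_n, p_norm = stats.shapiro(a)    # H0: the data are normally distributed
stat_v, p_var = stats.levene(a, b)   # H0: the two groups have equal variance
print(f"normality p = {p_norm:.3f}, equal-variance p = {p_var:.3f}")
</source>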
   
The term was coined by the [[England|English]] [[geneticist]] and statistician [[Ronald Fisher]].
 
== Example ==
 
For example, one may want to compare the test scores of two random [[statistical sample|sample]]s of men and women, and ask whether or not one population has a mean score different from the other. A null hypothesis would be that the mean score of the male population was the same as the mean score of the female population:
: ''H''<sub>0</sub> : &mu;<sub>1</sub> = &mu;<sub>2</sub>

where:

: ''H''<sub>0</sub> = the null hypothesis
: &mu;<sub>1</sub> = the mean of population 1, and
: &mu;<sub>2</sub> = the mean of population 2.
   
 
Alternatively, the null hypothesis can postulate that the two samples are drawn from the same population, so that the variance and shape of the distributions are equal, as well as the means.
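
A null hypothesis of equal means is commonly examined with a two-sample ''t''-test. The following minimal Python sketch illustrates this; the scores are simulated, and the sample sizes, distribution parameters, and seed are arbitrary assumptions:

<source lang="python">
# Testing H0: mu1 = mu2 with a two-sided two-sample t-test on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
men = rng.normal(loc=100, scale=15, size=50)    # hypothetical male scores
women = rng.normal(loc=100, scale=15, size=50)  # hypothetical female scores

t_stat, p_value = stats.ttest_ind(men, women)   # H0: equal population means
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
</source>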
 
   
Formulation of the null hypothesis is a vital step in testing [[statistical significance]]. Having formulated such a hypothesis, one can establish the probability of observing the obtained data, or data even more different from the prediction of the null hypothesis, if the null hypothesis is true. That probability is what is commonly called the "significance level" of the results.
   
That is, in scientific experimental design, we may predict that a particular factor will produce an effect on our dependent variable — this is our alternative hypothesis. We then consider how often we would expect to observe our experimental results, or results even more extreme, if we were to take many samples from a population where there was no effect (i.e. we test against our null hypothesis). If we find that this happens rarely (up to, say, 5% of the time), we can conclude that our results support our experimental prediction — we reject our null hypothesis.
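
The logic of "how often would results this extreme arise if there were no effect" can be made concrete with a permutation sketch: shuffle the group labels many times, so that the null hypothesis holds by construction, and count how often the shuffled difference is at least as extreme as the observed one. The data and permutation count below are illustrative assumptions:

<source lang="python">
# Permutation estimate of a p-value: relabelling the data at random
# simulates a world in which group membership has no effect.
import numpy as np

rng = np.random.default_rng(1)
treated = rng.normal(1.0, 2.0, size=30)   # hypothetical treated scores
control = rng.normal(0.0, 2.0, size=30)   # hypothetical control scores

observed = treated.mean() - control.mean()
pooled = np.concatenate([treated, control])

n_perm = 10_000
hits = 0
for _ in range(n_perm):
    rng.shuffle(pooled)                   # random relabelling: no effect
    diff = pooled[:30].mean() - pooled[30:].mean()
    if abs(diff) >= abs(observed):
        hits += 1

print(f"estimated p = {hits / n_perm:.4f}")  # below 0.05 => reject H0
</source>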
When a null hypothesis is formed, it is always in contrast to an implicit ''alternative hypothesis'', which is accepted if the observed data values are sufficiently improbable under the null hypothesis. The precise formulation of the null hypothesis has implications for the alternative. For example, if the null hypothesis is that sample A is drawn from a population with the same mean as sample B, the alternative hypothesis is that they come from populations with ''different'' means, which can be tested with a [[two-tailed test]] of significance. But if the null hypothesis is that sample A is drawn from a population whose mean is ''lower'' than the mean of the population from which sample B is drawn, the alternative hypothesis is that sample A comes from a population with a ''higher'' mean than the population from which sample B is drawn, which can be tested with a [[one-tailed test]].
 
   
== Directionality ==
 
   
In many statements of null hypotheses there is no appearance that these can have a "directionality", in that the statement says that values are identical. However, null hypotheses can and do have "direction" - in many of these instances statistical theory allows the formulation of the test procedure to be simplified, so that the test is equivalent to testing for an exact identity. That is, if we formulate a one-tailed alternative hypothesis ''that application of Drug A will lead to increased growth in patients'', the effective null hypothesis remains ''that application of Drug A will have no effect on growth in patients''. This is not merely the opposite of the alternative hypothesis — that is, it is not ''that application of Drug A will not lead to increased growth in patients''. That opposite statement, however, does remain the true null hypothesis.
 
   
To explain why this should be so, it is instructive to consider the nature of the hypotheses outlined above. We are predicting that patients exposed to Drug A will see increased growth compared to a control group who do not receive the drug. That is,
 
   
: ''H''<sub>1</sub>: &mu;<sub>drug</sub> > &mu;<sub>control</sub>
 
   
where:
: &mu; = the patients' mean growth.
 
The effective null hypothesis is ''H''<sub>0</sub>: μ<sub>drug</sub> = μ<sub>control</sub>

The true null hypothesis is ''H''<sub>T</sub>: μ<sub>drug</sub> ≤ μ<sub>control</sub>

The reduction occurs because, in order to gauge support for the alternative hypothesis, classical hypothesis testing requires us to calculate how often we would have obtained results as or more extreme than our experimental observations. In order to do this, we need first to define the probability of rejecting the null hypothesis for each possibility included in the null hypothesis, and second to ensure that these probabilities are all less than or equal to the quoted [[significance level]] of the test. For any reasonable test procedure the largest of all these probabilities will occur on the boundary of the region ''H''<sub>T</sub>, specifically for the cases included in ''H''<sub>0</sub> only. Thus the test procedure can be defined (that is, the [[critical values]] can be defined) for testing the null hypothesis ''H''<sub>T</sub> exactly as if the null hypothesis of interest was the reduced version ''H''<sub>0</sub>.
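
In practice, such a one-tailed comparison might be run as follows. This is a sketch only: the growth figures are simulated, and scipy >= 1.6 is assumed for the `alternative` keyword of `ttest_ind`:

<source lang="python">
# One-tailed test of H1: mu_drug > mu_control, carried out against the
# effective null mu_drug = mu_control, on simulated growth data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
drug = rng.normal(5.4, 1.0, size=40)      # hypothetical growth with Drug A
control = rng.normal(5.0, 1.0, size=40)   # hypothetical growth without it

t_stat, p_value = stats.ttest_ind(drug, control, alternative="greater")
print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.3f}")
</source>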
Note that there are some who argue that the null hypothesis cannot be as general as indicated above: as Fisher, who first coined the term "null hypothesis", said, "the null hypothesis must be exact, that is free of vagueness and ambiguity, because it must supply the basis of the 'problem of distribution,' of which the test of significance is the solution."<ref>Fisher, R.A. (1966). ''The design of experiments.'' 8th edition. Hafner: Edinburgh.</ref> Thus, according to this view, the null hypothesis must be numerically exact — it must state that a particular quantity or difference is equal to a particular number. In classical science, it is most typically the statement that there is ''no effect'' of a particular treatment; in observations, it is typically that there is ''no difference'' between the value of a particular measured variable and that of a prediction. The usefulness of this viewpoint must be queried - one can note that the majority of null hypotheses tested in practice do not meet this criterion of being "exact". For example, consider the usual test that two means are equal, where the true values of the variances are unknown - exact values of the variances are not specified.
Most statisticians believe that it is valid to state direction as part of a null hypothesis, or as part of a null hypothesis/alternative hypothesis pair (for example, see http://davidmlane.com/hyperstat/A73079.html). The logic is quite simple: if the direction is omitted, then if the null hypothesis is not rejected it is quite confusing to interpret the conclusion. Say the null is that the population mean equals 10, and the one-tailed alternative is that the mean is greater than 10. If the sample mean (x-bar) equals -200 and the corresponding t-test statistic equals -50, what is the conclusion? Not enough evidence to reject the null hypothesis? Surely not! But we cannot accept the one-sided alternative in this case. Therefore, to overcome this ambiguity, it is better to include the direction of the effect if the test is one-sided. The statistical theory required to deal with the simple cases dealt with here, and more complicated ones, makes use of the concept of an [[unbiased test]].
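
The ambiguity described above is easy to reproduce. In the sketch below (the sample values are illustrative assumptions, and scipy >= 1.6 is again assumed), the one-tailed p-value is close to 1 even though the data are wildly inconsistent with a mean of 10:

<source lang="python">
# Null mean = 10, one-tailed alternative mean > 10, but the (hypothetical)
# sample lies far below 10.
import numpy as np
from scipy import stats

sample = np.array([-210.0, -195.0, -200.0, -205.0, -190.0])

t_stat, p_value = stats.ttest_1samp(sample, popmean=10, alternative="greater")
print(f"t = {t_stat:.1f}, p = {p_value:.3f}")   # p is essentially 1

# "Fail to reject" here does not mean the data are consistent with
# mean = 10, hence the advice to build the direction into the null.
</source>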
== Limitations ==
 
A test of a null hypothesis is useful because it sets a limit on the probability of observing a [[data set]] as or more extreme than that observed if the null hypothesis is true. In general it is much harder to be precise about the corresponding probability if the alternative hypothesis is true.
 
If experimental observations contradict the prediction of the null hypothesis, it means that either the null hypothesis is false, or we have observed an event of very low probability. This gives us high confidence in the falsehood of the null hypothesis, confidence which can be increased by increasing the number of trials conducted. However, accepting the alternative hypothesis only commits us to a difference in observed parameters; it does not prove that the theory or principles that predicted such a difference are true, since it is always possible that the difference could be due to additional factors not recognized by the theory.
 
For example, rejection of a null hypothesis that predicts that the rates of symptom relief in a sample of patients who received a [[placebo]] and a sample who received a medicinal drug will be equal allows us to make a non-null statement (that the rates differed); it does not prove that the drug relieved the symptoms, though it gives us more confidence in that hypothesis.
 
The formulation, testing, and rejection of null hypotheses is methodologically consistent with the [[falsifiability]] model of [[Science|scientific discovery]] formulated by [[Karl Popper]] and widely believed to apply to most kinds of [[empirical research]]. However, concerns regarding the high [[Statistical power|power]] of [[statistical hypothesis testing|statistical tests]] to detect differences in large samples have led to suggestions for re-defining the null hypothesis, for example as a hypothesis that an effect falls within a range considered negligible. This is an attempt to address the confusion among non-statisticians between ''significant'' and ''substantial'', since large enough samples are likely to be able to indicate differences however minor.
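
One concrete form of such a re-defined null hypothesis is equivalence testing by two one-sided tests (TOST), sketched below. The paired differences, the ±0.5 margin of negligibility, and scipy >= 1.6 are all assumptions made for illustration, not a procedure prescribed by the text above:

<source lang="python">
# TOST sketch: re-define the null as "the effect lies outside the
# negligible range (-0.5, +0.5)" and reject it when both one-sided
# tests reject. Data and the 0.5 margin are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
diffs = rng.normal(0.1, 1.0, size=80)   # hypothetical paired differences
delta = 0.5                             # assumed negligibility margin

p_low = stats.ttest_1samp(diffs, -delta, alternative="greater").pvalue
p_high = stats.ttest_1samp(diffs, delta, alternative="less").pvalue

# Rejecting both one-sided nulls places the effect inside the margin.
if max(p_low, p_high) < 0.05:
    print("effect within the negligible range (equivalence shown)")
else:
    print("cannot show the effect is negligible")
</source>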
The theory underlying the idea of a null hypothesis is closely associated with the [[frequency probability|frequency]] theory of probability, in which probabilistic statements can only be made about the relative frequencies of events in arbitrarily large samples. One way in which a failure to reject the null hypothesis is meaningful is in relation to an arbitrarily large population from which the observed sample is supposed to be drawn. A second way in which it is meaningful is in an approach where both the experiment and all details of the statistical analysis are decided before doing the experiment. The significance level of the test is then conceptually identical to the probability of incorrectly rejecting the null hypothesis, judged at the pre-experiment stage, where this probability need not be a frequency-based/large-sample one.
 
== Publication bias ==
{{Main|Publication bias}}
   
 
In [[2002]], a group of psychologists launched a new journal dedicated to experimental studies in [[psychology]] which support the null hypothesis. The ''Journal of Articles in Support of the Null Hypothesis'' (JASNH) was founded to address a scientific publishing bias against such articles. [http://www.jasnh.com/] According to the editors,
   
:"other journals and reviewers have exhibited a bias against articles that did not reject the null hypothesis. We plan to change that by offering an outlet for experiments that do not reach the traditional significance levels (p < 0.05). Thus, reducing the file drawer problem, and reducing the bias in psychological literature. Without such a resource researchers could be wasting their time examining empirical questions that have already been examined. We collect these articles and provide them to the scientific community free of cost."
+
: "other journals and reviewers have exhibited a bias against articles that did not reject the null hypothesis. We plan to change that by offering an outlet for experiments that do not reach the traditional significance levels (p < 0.05). Thus, reducing the [[file drawer problem]], and reducing the bias in psychological literature. Without such a resource researchers could be wasting their time examining empirical questions that have already been examined. We collect these articles and provide them to the scientific community free of cost."
   
The "File Drawer problem" is a problem that exists due to the fact that academics tend not to publish results that indicate the null hypothesis could not be rejected. This does not mean that the relationship they were looking for did not exist, but it means they couldn't prove it. Even though these papers can often be interesting, they tend to end up unpublished, in "file drawers."
 
   
Ioannidis has inventoried factors that should alert readers to risks of publication bias.<ref name="pmid16060722">{{cite journal | author = Ioannidis J | title = Why most published research findings are false | journal = PLoS Med | volume = 2 | issue = 8 | pages = e124 | year = 2005 | id = PMID 16060722 | doi = 10.1371/journal.pmed.0020124}}</ref>
 
   
== Controversy ==<!-- This section is linked from [[Statistics]] -->
Anscombe notes that "Tests of the null hypothesis that there is no difference between certain treatments are often made in the analysis of agricultural or industrial experiments in which alternative methods or processes are compared. Such tests are [...] totally irrelevant. What are needed are estimates of magnitudes of effects, with standard errors."
 
   
 
Null hypothesis testing is controversial when the alternative hypothesis is suspected to be true at the outset of the experiment, making the null hypothesis the reverse of what the experimenter actually believes; it is put forward only to allow the data to contradict it. Many statisticians have pointed out that rejecting the null hypothesis says nothing or very little about the likelihood that the null is true. Under traditional null hypothesis testing, the null is rejected when P(Data | Null) (where P(x|y) denotes the [[conditional probability|probability of x given y]]) is very small, say below 0.05. However, researchers are really interested in P(Null | Data), which cannot be inferred from a [[p-value]]. In some cases, P(Null | Data) approaches 1 while P(Data | Null) approaches 0; in other words, we can reject the null when it is virtually certain to be true. For this and other reasons, [[Gerd Gigerenzer]] has called null hypothesis testing "mindless statistics", while Jacob Cohen describes it as a ritual conducted to convince ourselves that we have the evidence needed to confirm our theories.
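
The gap between P(Data | Null) and P(Null | Data) can be illustrated by simulation. In the sketch below, the null is true in 90% of hypothetical experiments and true effects are small; every number here is an illustrative assumption. Under these conditions most "significant" results nevertheless come from true nulls:

<source lang="python">
# Simulating many experiments to compare P(data | null) with P(null | data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_experiments, n = 100_000, 10
null_true = rng.random(n_experiments) < 0.9   # nulls a priori common
effect = np.where(null_true, 0.0, 0.3)        # small effect when H1 holds

# Each experiment: mean of n unit-variance observations.
xbar = rng.normal(effect, 1.0 / np.sqrt(n))
p = 2 * stats.norm.sf(np.abs(xbar) * np.sqrt(n))  # two-sided p-value

sig = p < 0.05
print("P(null | p < 0.05) =", round(null_true[sig].mean(), 3))
</source>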
 
   
[[Bayesian]] statisticians normally reject the idea of null hypothesis testing. Given a [[prior probability distribution]] for one or more parameters, sample evidence can be used to generate an updated [[posterior distribution]]. In this framework, but ''not'' in the null hypothesis testing framework, it is meaningful to make statements of the general form "the probability that the true value of the parameter is greater than 0 is p".
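
As a sketch of the kind of statement that is meaningful in the Bayesian framework, the following conjugate-normal update computes P(&theta; > 0 | data) for a mean with known variance. The prior, the data variance, and the observations are all assumptions made for illustration:

<source lang="python">
# Conjugate normal-normal update: posterior for an unknown mean theta,
# then the posterior probability that theta exceeds zero.
import numpy as np
from scipy import stats

data = np.array([0.8, 1.3, -0.2, 0.9, 1.1])   # hypothetical observations
sigma2 = 1.0                                   # known data variance (assumed)
mu0, tau2 = 0.0, 10.0                          # vague normal prior on theta

n = len(data)
post_var = 1.0 / (1.0 / tau2 + n / sigma2)
post_mean = post_var * (mu0 / tau2 + data.sum() / sigma2)

p_positive = stats.norm.sf(0.0, loc=post_mean, scale=np.sqrt(post_var))
print(f"P(theta > 0 | data) = {p_positive:.3f}")
</source>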
== References ==
<references/>
 
== See also ==
* [[Counternull]]
* [[Dream argument]]
* [[Straw man]]
* [[P-value]]
* [[Publication bias]]
* [[Statistical hypothesis testing]]
* [[Null Hypothesis: The Journal of Unlikely Science]]
== External links ==
* [http://www.stats4students.com/Essentials/Hypothesis-Testing/Overview.php A Guide to Understanding the Null Hypothesis]
* [http://www.null-hypothesis.co.uk Null Hypothesis - Journal of Unlikely Science]
* [http://core.ecu.edu/psyc/wuenschk/StatHelp/NHST-SHIT.htm References for arguments for and against null hypothesis significance testing]
* [http://davidmlane.com/hyperstat/A29337.html HyperStat Online]
{{Statistics}}
<!--
[[de:Hypothese (Statistik)]]
[[es:Hipótesis nula]]
[[fa:فرض صفر]]
[[ko:귀무가설]]
[[it:Ipotesi nulla]]
[[nl:Nulhypothese]]
[[pt:Hipótese nula]]
[[simple:Null hypothesis]]
[[fi:Nollahypoteesi]]
[[sv:Nollhypotes]]
-->
 
{{enWP|null_hypothesis}}
[[Category:Experimental design]]
[[Category:Hypothesis testing]]
 
[[Category:Statistics]]
