Changes: Biased sample

 
Line 1: Line 1:
 
{{StatsPsy}}
 
{{StatsPsy}}
A '''biased sample''' is a [[sample (statistics)|statistical sample]] of a [[statistical population|population]] where some members of the population are less likely to be included than others. An extreme form of biased sampling occurs when certain members of the population are totally excluded from the sample (that is, they have zero probability of being selected). For example, a survey of high school students to measure teenage use of illegal drugs will be a biased sample because it does not include home schooled students or dropouts. A sample is also biased if certain members are underrepresented or overrepresented relative to others in the population. For example, a "man on the street" interview which selects people who walk by a certain location is going to have an over-representation of healthy individuals who are more likely to be out of the home than individuals with a chronic illness.
+
{{expert}}
  +
In [[statistics]], '''sampling bias''' is a [[bias]] in which a sample is collected in such a way that some members of the intended [[statistical population|population]] are less likely to be included than others. It results in a '''biased sample''', a non-[[random sample]]<ref> [http://www.medilexicon.com/medicaldictionary.php?t=10087 Medical Dictionary - 'Sampling Bias'] Retrieved on September 23, 2009</ref> of a [[statistical population|population]] (or non-human factors) in which all individuals, or instances, were not equally likely to have been selected.<ref>[http://medical-dictionary.thefreedictionary.com/Sample+bias TheFreeDictionary – biased sample] Retrieved on 2009-09-23. Site in turn cites: Mosby's Medical Dictionary, 8th edition.</ref> If this is not accounted for, results can be erroneously attributed to the phenomenon under study rather than to the method of [[sampling (statistics)|sampling]].
   
==Problems caused by a biased sample==
+
Medical sources sometimes refer to sampling bias as '''ascertainment bias'''.<ref>{{cite book |author=Weising, Kurt |title=DNA fingerprinting in plants: principles, methods, and applications |publisher=Taylor & Francis Group |location=London |year=2005 |page=180 |isbn=0-8493-1488-7}}</ref><ref>Page 34 in: [http://www.tesisenxarxa.net/TESIS_UPF/AVAILABLE/TDX-0710109-123904//tars.pdf Selection and linkage desequilibrium tests under complex demographies and ascertainment bias] Francesc Calafell i Majó, Anna Ramírez i Soriano. July 2008</ref> Ascertainment bias has basically the same definition,<ref name=saem/><ref>[http://www.medilexicon.com/medicaldictionary.php?t=10080 medilexicon Medical Dictionary - 'Ascertainment Bias'] Retrieved on November 14, 2009</ref> but is still sometimes classified as a separate type of bias.<ref name=saem>[htem.org/sites/default/files/issuu/libraries/Panacek_Error_And_Bias_In_Clinical_Research_syllabus_1.pdf Panacek: Error in research] [[Society for Academic Emergency Medicine]]. Retrieved on November 14, 2009</ref>
   
A biased sample causes problems because any [[statistic]] computed from that sample has the potential to be consistently erroneous. The bias can lead to an over- or under-representation of the corresponding [[parameter]] in the population. Almost every sample in practice is biased because it is practically impossible to insure a perfectly random sample. If the degree of underrepresentation is small, the sample can be treated as a reasonable approximation to a random sample. Also, if the group that is underrepresented does not differ markedly from the other groups in the quantity being measured, then a random sample can still be a reasonable approximation.
+
==Distinction from selection bias==
  +
Sampling bias is mostly classified as a subtype of [[selection bias]],<ref>[http://medical.webends.com/kw/Selection%20Bias Dictionary of Cancer Terms – Selection Bias] Retrieved on September 23, 2009</ref> sometimes specifically termed '''sample selection bias''',<ref>[http://www.ncbi.nlm.nih.gov/pubmed/9504213 The effects of sample selection bias on racial differences in child abuse reporting] Ards S, Chung C, Myers SL Jr. Child Abuse Negl. 1999 December;23(12):1209; author reply 1211-5. PMID 9504213</ref><ref>[http://www.cs.nyu.edu/~mohri/postscript/bias.pdf Sample Selection Bias Correction Theory] [[Corinna Cortes]], Mehryar Mohri, Michael Riley, and Afshin Rostamizadeh. New York University.</ref><ref>[http://www.cs.nyu.edu/~mohri/pub/nsmooth.pdf Domain Adaptation and Sample Bias Correction Theory and Algorithm for Regression] [[Corinna Cortes]], Mehryar Mohri. New York University.</ref> but some classify it as a separate type of bias.<ref>[http://books.google.com/books?id=f0IDHvLiWqUC&printsec=frontcover&source=gbs_navlinks_s#v=onepage&q=&f=false Page 262 in: Behavioral Science. Board Review Series.] By Barbara Fadem. ISBN 0-7817-8257-0, ISBN 978-0-7817-8257-9. 216 pages</ref> A distinction, albeit not universally accepted, of sampling bias is that it undermines the [[external validity]] of a test (the ability of its results to be generalized to the rest of the population), while [[selection bias]] mainly addresses [[internal validity]] for differences or similarities found in the sample at hand. In this sense, errors occurring in the process of gathering the sample or cohort cause sampling bias, while errors in any process thereafter cause selection bias.
   
The word [[bias]] in common usage has a strong negative connotation, and implies a deliberate intent to mislead. In statistical usage, bias represents a mathematical property. While some individuals might deliberately use a biased sample to produce misleading results, more often, a biased sample is just a reflection of the difficulty in obtaining a truly representative sample.
+
However, selection bias and sampling bias are often used synonymously.<ref>[http://books.google.dk/books?id=EBq63uyt87QC&printsec=frontcover&source=gbs_navlinks_s#v=onepage&q=sometimes%20used%20synonymously&f=false Wallace/Maxcy-Rosenau-Last public health & preventive medicine (page 21)] 15ed, illustrated. By Robert B. Wallace. ISBN 0-07-144198-0, ISBN 978-0-07-144198-8</ref>
   
Some samples use a design that is deliberately biased. The U.S. [[National Center for Health Statistics]] will deliberately oversample from minority populations in many of its nationwide surveys in order to gain sufficient precision for estimates within these groups<sup>(NCHS 2007)</sup>. These surveys require the use of sample weights (see below) to produce proper estimates across all racial and ethnic groups.
+
==Types of sampling bias==
  +
* Selection from a '''specific real area'''. For example, a survey of high school students to measure teenage use of illegal drugs will be a biased sample because it does not include home-schooled students or dropouts. A sample is also biased if certain members are underrepresented or overrepresented relative to others in the population. For example, a "man on the street" interview which selects people who walk by a certain location is going to have an overrepresentation of healthy individuals who are more likely to be out of the home than individuals with a chronic illness. This may be an extreme form of biased sampling, because certain members of the population are totally excluded from the sample (that is, they have zero probability of being selected).
  +
* '''[[Self-selection]]''' bias, which is possible whenever the group of people being studied has any form of control over whether to participate. Participants' decision to participate may be correlated with traits that affect the study, making the participants a non-representative sample. For example, people who have strong opinions or substantial knowledge may be more willing to spend time answering a survey than those who do not. Another example is [[online and phone-in polls]], which are biased samples because the respondents are self-selected. Those individuals who are highly motivated to respond, typically individuals who have strong opinions, are overrepresented, and individuals that are indifferent or apathetic are less likely to respond. This often leads to a polarization of responses with extreme perspectives being given a disproportionate weight in the summary. As a result, these types of polls are regarded as unscientific.
  +
* '''Pre-screening''' of trial participants, or '''advertising''' for volunteers within particular groups. For example a study to "prove" that smoking does not affect fitness might recruit at the local fitness center, but advertise for smokers during the advanced aerobics class, and for non-smokers during the weight loss sessions.
  +
* '''Exclusion''' bias results from exclusion of particular groups from the sample, e.g. exclusion of subjects who have recently [[human migration|migrated]] into the study area (this may occur when newcomers are not available in a register used to identify the source population). Excluding subjects who move out of the study area during follow-up is rather equivalent of dropout or nonresponse, a [[selection bias]] in that it rather affects the internal validity of the study.
  +
* '''[[Healthy user bias]]''', when the study population is likely healthier than the general population, e.g. workers (i.e. someone in ill-health is unlikely to have a job as manual laborer).
  +
* '''[[Overmatching]]''', matching for an apparent confounder that actually is a result of the exposure. The control group becomes more similar to the cases in regard to exposure than the general population.
   
==Examples of biased samples==
+
===Symptom-based sampling===
  +
The study of medical conditions begins with anecdotal reports. By their nature, such reports only include those referred for diagnosis and treatment. A child who can't function in school is more likely to be diagnosed with [[dyslexia]] than a child who struggles but passes. A child examined for one condition is more likely to be tested for and diagnosed with other conditions, skewing [[comorbidity]] statistics. As certain diagnoses become associated with behavior problems or [[mental retardation]], parents try to prevent their children from being stigmatized with those diagnoses, introducing further bias. Studies carefully selected from whole populations are showing that many conditions are much more common and usually much milder than formerly believed.
   
Online and call-in polls are biased samples because the respondents are self-selected. Those individuals who are highly motivated to respond, typically individuals who have strong opinions, are overrepresented, and individuals that are indifferent or apathetic are less likely to respond. This often leads to a polarization of responses with extreme perspectives being given a disproportionate weight in the summary. As a result, these types of polls are regarded as unscientific.
+
===Truncate selection in pedigree studies===
  +
[[File:Ascertainment bias.png|600px|center|thumbnail|Simple pedigree example of sampling bias]]
  +
Geneticists are limited in how they can obtain data from human populations. As an example, consider a human characteristic. We are interested in deciding if the characteristic is inherited as a [[autosomal recessive|simple Mendelian]] trait. Following the laws of [[Mendelian inheritance]], if the parents in a family do not have the characteristic, but carry the allele for it, they are carriers (e.g. a non-expressive [[heterozygote]]). In this case their children will each have a 25% chance of showing the characteristic. The problem arises because we can't tell which families have both parents as carriers (heterozygous) unless they have a child who exhibits the characteristic. The description follows the textbook by Sutton.<ref>{{cite book |author=H. Eldon Sutton |title=An Introduction to Human Genetics |year= 1988 |edition=4th Edition |publisher=Harcourt Brace Jovanovich |location=San Diego |isbn=0-15-540099-1}}</ref>
   
A classic example of a biased sample and the misleading results it produced occurred in [[1936]]. In the early days of opinion polling, the American ''Literary Digest'' magazine collected over two million postal surveys and predicted that the Republican candidate in the U.S. presidential election, Alf Landon, would beat the incumbent president, Franklin Roosevelt by a large margin. The result was the exact opposite. The Literary Digest survey represented a sample collected from readers of the magazine, supplemented by records of registered automobile owners and telephone users. This sample included an over-representation of individuals who were rich, who, as a group, were more likely to vote for the Republican candidate. In contrast, a poll of only 50 thousand citizens selected by [[George Gallup]]'s organization successfully predicted the result, leading to the popularity of the [[Gallup poll]].
+
The figure shows the pedigrees of all the possible families with two children when the parents are carriers (Aa).
  +
* '''Nontruncate selection'''. In a perfect world we should be able to discover all such families with a gene, including those who are simply carriers. In this situation the analysis would be free from ascertainment bias and the pedigrees would be under "nontruncate selection". In practice, most studies identify, and include, families in a study based on their having affected individuals.
  +
* '''Truncate selection'''. When afflicted ''individuals'' have an equal chance of being included in a study this is called truncate selection, signifying the inadvertent exclusion (truncation) of families who are carriers for a gene. Because selection is performed on the individual level, families with two or more affected children would have a higher probability of becoming included in the study.
  +
* '''Complete truncate selection''' is a special case where each ''family'' with an affected child has an equal chance of being selected for the study.
   
==Statistical corrections for a biased sample==
+
The probabilities of each of the families being selected are given in the figure, along with the sample frequency of affected children. In this simple case, the researcher will look for a frequency of {{frac|4|7}} or {{frac|5|8}} for the characteristic, depending on the type of truncate selection used.
   
If entire segments of the population are excluded from a sample, then there are no adjustments that can produce estimates that are representative of the entire population. But if some groups are underrepresented and you can quantify the degree of underrepresentation, then sample weights can correct the bias.
+
===The caveman effect===
  +
An example of selection bias is called the "caveman effect." Much of our understanding of [[Prehistory|prehistoric]] peoples comes from caves, such as [[cave painting]]s made nearly 40,000 years ago. If there had been contemporary paintings on trees, animal skins or hillsides, they would have been washed away long ago. Similarly, evidence of fire pits, [[midden]]s, [[ceremonial burial|burial sites]], etc. are most likely to remain intact to the modern era in caves. Prehistoric people are associated with caves because that is where the data still exists, not necessarily because most of them lived in caves for most of their lives.{{Or|date=May 2012}}{{Citation needed|date=May 2012}}
   
For example, a hypothetical population might include 10 million men and 10 million women. Suppose that a biased sample of 100 patients included 20 men and 80 women. A researcher could correct for this imbalance by attaching a weight of 2.5 for each male and 0.675 for each female. This would adjust any estimates to achieve the same expected value as a sample that included exactly 50 men and 50 women.
+
==Problems caused by sampling bias==
  +
A biased sample causes problems because any [[statistic]] computed from that sample has the potential to be consistently erroneous. The bias can lead to an over- or underrepresentation of the corresponding [[parameter]] in the population. Almost every sample in practice is biased because it is practically impossible to ensure a perfectly random sample. If the degree of underrepresentation is small, the sample can be treated as a reasonable approximation to a random sample. Also, if the group that is underrepresented does not differ markedly from the other groups in the quantity being measured, then a random sample can still be a reasonable approximation.
   
==Spotlight fallacy==
+
The word [[bias]] in common usage has a strong negative connotation and implies a deliberate intent to mislead or other [[scientific fraud]]. In statistical usage, bias merely represents a mathematical property, regardless of whether it is deliberate, unconscious, or due to imperfections in the instruments used for observation. While some individuals might deliberately use a biased sample to produce misleading results, more often a biased sample simply reflects the difficulty of obtaining a truly representative sample.
   
The Spotlight fallacy is committed when a person uncritically assumes that all members or cases of a certain class or type are like those that receive the most attention or coverage in the media. This line of “reasoning” has the following form:
+
Some samples use a biased statistical design which nevertheless allows the estimation of parameters. The U.S. [[National Center for Health Statistics]], for example, deliberately oversamples from minority populations in many of its nationwide surveys in order to gain sufficient precision for estimates within these groups.<ref>[http://www.cdc.gov/nchs/about/otheract/minority/minority.htm National Center for Health Statistics (2007). Minority Health.]</ref> These surveys require the use of sample weights (see below) to produce proper estimates across all racial and ethnic groups. Provided that certain conditions are met (chiefly that the sample is drawn randomly from the entire population), these samples permit accurate estimation of population parameters.
   
1. Xs with quality Q receive a great deal of attention or coverage in the media.
+
==Historical examples==
2. Therefore all Xs have quality Q.
+
[[File:Acid2compliancebyusage.png|thumb|right|250px|Example of biased sample, claiming as of June 2008, that only 54% of web browsers ([[Internet Explorer]]) in use do not pass the [[Acid2]] test. The statistics are from visitors to one website comprising mostly web developers.<ref>{{cite web |url=http://www.w3schools.com/browsers/browsers_stats.asp |title=Browser Statistics |publisher=Refsnes Data |month= June |year=2008 |accessdate=2008-07-05}}</ref>]]
  +
A classic example of a biased sample and the misleading results it produced occurred in 1936. In the early days of opinion polling, the American ''[[Literary Digest]]'' magazine collected over two million postal surveys and predicted that the Republican candidate in the U.S. presidential election, [[Alf Landon]], would beat the incumbent president, [[Franklin Roosevelt]] by a large margin. The result was the exact opposite. The Literary Digest survey represented a sample collected from readers of the magazine, supplemented by records of registered automobile owners and telephone users. This sample included an over-representation of individuals who were rich, who, as a group, were more likely to vote for the Republican candidate. In contrast, a poll of only 50 thousand citizens selected by [[George Gallup]]'s organization successfully predicted the result, leading to the popularity of the [[Gallup poll]].
   
This line of reasoning is fallacious since the mere fact that someone or something attracts the most attention or coverage in the media does not mean that it automatically represents the whole population. For example, suppose a mass murderer from Old Town, Maine, received a great deal of attention in the media. It would hardly follow that everyone from the town is a mass murderer.
+
Another classic example occurred in the [[United States presidential election, 1948|1948 Presidential Election]]. On Election night, the [[Chicago Tribune]] printed the headline ''[[Dewey Defeats Truman|DEWEY DEFEATS TRUMAN]]'', which turned out to be mistaken. In the morning the grinning President-Elect, [[Harry S. Truman]], was photographed holding a newspaper bearing this headline. The reason the Tribune was mistaken is that their editor trusted the results of a [[phone survey]]. Survey research was then in its infancy, and few academics realized that a sample of telephone users was not representative of the general population. Telephones were not yet widespread, and those who had them tended to be prosperous and have stable addresses. (In many cities, the [[Bell System]] [[telephone directory]] contained the same names as the [[Social Register]].) In addition, the Gallup poll that the Tribune based its headline on was over two weeks old at the time of the printing.<ref>based on http://www.uh.edu/engines/epi1199.htm retrieved on September 29, 2007</ref>
   
The [[Spotlight fallacy]] derives its name from the fact that receiving a great deal of attention or coverage is often referred to as being in the spotlight. It is similar to [[Hasty Generalization]], Biased Sample and [[Misleading Vividness]] because the error being made involves generalizing about a population based on an inadequate or flawed sample.
+
==Statistical corrections for a biased sample==
The Spotlight Fallacy is a very common fallacy. This fallacy most often occurs when people assume that those who receive the most media attention actually represent the groups they belong to. For example, some people began to believe that all those who oppose abortion are willing to gun down doctors in cold blood simply because those incidents received a great deal of media attention. Since the media typically covers people or events that are unusual or exceptional, it is somewhat odd for people to believe that such people or events are representative.
+
If entire segments of the population are excluded from a sample, then there are no adjustments that can produce estimates that are representative of the entire population. But if some groups are underrepresented and the degree of underrepresentation can be quantified, then sample weights can correct the bias.[citation needed]
  +
  +
For example, a hypothetical population might include 10 million men and 10 million women. Suppose that a biased sample of 100 patients included 20 men and 80 women. A researcher could correct for this imbalance by attaching a weight of 2.5 for each male and 0.625 for each female. This would adjust any estimates to achieve the same expected value as a sample that included exactly 50 men and 50 women, unless men and women differed in their likelihood of taking part in the survey.
   
===Examples===
 
#I wouldn't like to go to America because of all the gun crime, we see it on the news all the time.
 
#'''Doctor:''' Why don't patients make some effort to look after themselves? My surgery is full of people who eat, drink, smoke and don't get any exercise. ''Of course he may have many more patients who do look after themselves and don't often turn up in his surgery.''
 
#Why do young people all take drugs and go around mugging old ladies? You read about it in the paper all the time!
 
#'''Child:''' When I grow up I want to be a singer. Have you seen how much money those pop-stars make?!
 
   
 
==See also==
 
==See also==
*[[Cherry picking]]
+
*[[Censored regression model]]
  +
*[[Cherry picking (fallacy)]]
  +
*[[File drawer problem]]
  +
*[[Friendship paradox]]
  +
*[[Reporting bias]]
  +
*[[Selection bias]]
  +
*[[Spectrum bias]]
  +
*[[Truncated regression model]]
  +
  +
==References==
  +
{{Reflist}}
  +
   
 
==External links==
 
==External links==
Line 49: Line 44:
 
{{Informal_Fallacy}}
 
{{Informal_Fallacy}}
   
 
:fr:Échantillon biaisé
 
:he:דגימה מוטה
 
:pt:Amostra polarizada
 
 
==References==
 
   
   
Line 60: Line 49:
 
[[Category:Inductive fallacies]]
 
[[Category:Inductive fallacies]]
 
[[Category:Misuse of statistics]]
 
[[Category:Misuse of statistics]]
[[Category:Statitical biases]]
+
[[Category:Sampling (experimental)]]
  +
[[Category:Statistical biases]]

Latest revision as of 21:15, September 6, 2013


This article is in need of attention from a psychologist/academic expert on the subject.
Please help recruit one, or improve this page yourself if you are qualified.
This banner appears on articles that are weak and whose contents should be approached with academic caution.

In statistics, sampling bias is a bias in which a sample is collected in such a way that some members of the intended population are less likely to be included than others. It results in a biased sample, a non-random sample[1] of a population (or non-human factors) in which not all individuals, or instances, were equally likely to have been selected.[2] If this is not accounted for, results can be erroneously attributed to the phenomenon under study rather than to the method of sampling.

Medical sources sometimes refer to sampling bias as ascertainment bias.[3][4] Ascertainment bias has basically the same definition,[5][6] but is still sometimes classified as a separate type of bias.[5]

Distinction from selection bias

Sampling bias is mostly classified as a subtype of selection bias,[7] sometimes specifically termed sample selection bias,[8][9][10] but some classify it as a separate type of bias.[11] A distinction, albeit not universally accepted, of sampling bias is that it undermines the external validity of a test (the ability of its results to be generalized to the rest of the population), while selection bias mainly addresses internal validity for differences or similarities found in the sample at hand. In this sense, errors occurring in the process of gathering the sample or cohort cause sampling bias, while errors in any process thereafter cause selection bias.

However, selection bias and sampling bias are often used synonymously.[12]

Types of sampling bias

  • Selection from a specific real area. For example, a survey of high school students to measure teenage use of illegal drugs will be a biased sample because it does not include home-schooled students or dropouts. This is an extreme form of biased sampling, because certain members of the population are totally excluded from the sample (that is, they have zero probability of being selected). A sample is also biased if certain members are underrepresented or overrepresented relative to others in the population. For example, a "man on the street" interview which selects people who walk by a certain location is going to have an overrepresentation of healthy individuals, who are more likely to be out of the home than individuals with a chronic illness.
  • Self-selection bias, which is possible whenever the group of people being studied has any form of control over whether to participate. Participants' decision to participate may be correlated with traits that affect the study, making the participants a non-representative sample. For example, people who have strong opinions or substantial knowledge may be more willing to spend time answering a survey than those who do not. Another example is online and phone-in polls, which are biased samples because the respondents are self-selected. Those individuals who are highly motivated to respond, typically individuals who have strong opinions, are overrepresented, and individuals that are indifferent or apathetic are less likely to respond. This often leads to a polarization of responses with extreme perspectives being given a disproportionate weight in the summary. As a result, these types of polls are regarded as unscientific.
  • Pre-screening of trial participants, or advertising for volunteers within particular groups. For example, a study to "prove" that smoking does not affect fitness might recruit at the local fitness center, but advertise for smokers during the advanced aerobics class, and for non-smokers during the weight loss sessions.
  • Exclusion bias results from exclusion of particular groups from the sample, e.g. exclusion of subjects who have recently migrated into the study area (this may occur when newcomers are not available in a register used to identify the source population). Excluding subjects who move out of the study area during follow-up is instead equivalent to dropout or nonresponse, a selection bias in that it affects the internal validity of the study.
  • Healthy user bias, when the study population is likely healthier than the general population, for example when the subjects are workers (someone in ill health is unlikely to hold a job as a manual laborer).
  • Overmatching, matching for an apparent confounder that actually is a result of the exposure. The control group becomes more similar to the cases in regard to exposure than the general population.

Symptom-based sampling

The study of medical conditions begins with anecdotal reports. By their nature, such reports only include those referred for diagnosis and treatment. A child who can't function in school is more likely to be diagnosed with dyslexia than a child who struggles but passes. A child examined for one condition is more likely to be tested for and diagnosed with other conditions, skewing comorbidity statistics. As certain diagnoses become associated with behavior problems or mental retardation, parents try to prevent their children from being stigmatized with those diagnoses, introducing further bias. Studies carefully selected from whole populations are showing that many conditions are much more common and usually much milder than formerly believed.

Truncate selection in pedigree studies

Geneticists are limited in how they can obtain data from human populations. As an example, consider a human characteristic. We are interested in deciding if the characteristic is inherited as a simple Mendelian trait. Following the laws of Mendelian inheritance, if the parents in a family do not have the characteristic, but carry the allele for it, they are carriers (i.e. a non-expressive heterozygote). In this case their children will each have a 25% chance of showing the characteristic. The problem arises because we can't tell which families have both parents as carriers (heterozygous) unless they have a child who exhibits the characteristic. The description follows the textbook by Sutton.[13]

The figure shows the pedigrees of all the possible families with two children when the parents are carriers (Aa).

  • Nontruncate selection. In a perfect world we should be able to discover all such families with a gene, including those who are simply carriers. In this situation the analysis would be free from ascertainment bias and the pedigrees would be under "nontruncate selection". In practice, most studies identify, and include, families in a study based on their having affected individuals.
  • Truncate selection. When afflicted individuals have an equal chance of being included in a study this is called truncate selection, signifying the inadvertent exclusion (truncation) of families who are carriers for a gene. Because selection is performed on the individual level, families with two or more affected children would have a higher probability of becoming included in the study.
  • Complete truncate selection is a special case where each family with an affected child has an equal chance of being selected for the study.

The probabilities of each of the families being selected are given in the figure, along with the sample frequency of affected children. In this simple case, the researcher will look for a frequency of 4/7 or 5/8 for the characteristic, depending on the type of truncate selection used.
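
As a rough check on the two frequencies quoted above, the following sketch enumerates two-child families of carrier parents and computes the expected share of affected children under each sampling scheme. It assumes that under truncate selection a family's chance of inclusion is proportional to its number of affected children. The function name `affected_frequency` and the weighting functions are illustrative and are not taken from Sutton's text.

```python
from fractions import Fraction
from math import comb


def affected_frequency(n_children, p_affected, selection_weight):
    """Expected share of affected children among the sampled families.

    selection_weight(k) is the (hypothetical) relative chance that a family
    with k affected children ends up in the study.
    """
    total_weight = Fraction(0)
    weighted_affected = Fraction(0)
    for k in range(n_children + 1):
        # Binomial probability of a family having exactly k affected children.
        prob_family = (
            comb(n_children, k)
            * p_affected**k
            * (1 - p_affected) ** (n_children - k)
        )
        weight = prob_family * selection_weight(k)
        total_weight += weight
        weighted_affected += weight * k
    return weighted_affected / (total_weight * n_children)


p = Fraction(1, 4)  # each child of two carriers (Aa x Aa) is affected with probability 1/4

# Every carrier family is found, affected children or not.
nontruncate = affected_frequency(2, p, lambda k: 1)
# Each family with at least one affected child is equally likely to be sampled.
complete_truncate = affected_frequency(2, p, lambda k: 1 if k >= 1 else 0)
# Assumption: inclusion chance grows in proportion to the number of affected children.
truncate = affected_frequency(2, p, lambda k: k)

print(nontruncate, complete_truncate, truncate)
```

Under these assumptions the sketch gives 1/4 with no truncation, 4/7 under complete truncate selection, and 5/8 under truncate selection, so both truncated schemes overstate the Mendelian expectation of 25%.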

The caveman effect

An example of selection bias is called the "caveman effect." Much of our understanding of prehistoric peoples comes from caves, such as cave paintings made nearly 40,000 years ago. If there had been contemporary paintings on trees, animal skins or hillsides, they would have been washed away long ago. Similarly, evidence of fire pits, middens, burial sites, etc. is most likely to have remained intact to the modern era in caves. Prehistoric people are associated with caves because that is where the data still exist, not necessarily because most of them lived in caves for most of their lives.[original research?][citation needed]

Problems caused by sampling bias

A biased sample causes problems because any statistic computed from that sample has the potential to be consistently erroneous. The bias can lead to an over- or underestimation of the corresponding parameter in the population. Almost every sample in practice is biased because it is practically impossible to ensure a perfectly random sample. If the degree of underrepresentation is small, the sample can be treated as a reasonable approximation to a random sample. Likewise, if the underrepresented group does not differ markedly from the other groups in the quantity being measured, the sample can still be treated as a reasonable approximation to a random sample.
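
The two conditions under which the bias stays small can be illustrated with a short calculation. This is only a sketch under stated assumptions: the group shares, inclusion rates and group means below are invented for illustration, and the expected sample mean uses the large-sample approximation in which each group appears in proportion to its population share times its inclusion rate.

```python
def expected_sample_mean(groups):
    """groups: list of (population_share, inclusion_rate, group_mean) tuples."""
    total = sum(share * rate for share, rate, _ in groups)
    return sum(share * rate * mean for share, rate, mean in groups) / total


def population_mean(groups):
    """True mean of the whole population, ignoring inclusion rates."""
    return sum(share * mean for share, _, mean in groups)


def bias(groups):
    """Expected sample mean minus the true population mean."""
    return expected_sample_mean(groups) - population_mean(groups)


# Hypothetical populations of two equally sized groups.
strong = [(0.5, 1.0, 10.0), (0.5, 0.25, 20.0)]          # heavy underrepresentation, different means
mild_underrep = [(0.5, 1.0, 10.0), (0.5, 0.9, 20.0)]    # slight underrepresentation
similar_groups = [(0.5, 1.0, 10.0), (0.5, 0.25, 10.5)]  # heavy underrepresentation, similar means

for name, groups in [("strong", strong),
                     ("mild underrepresentation", mild_underrep),
                     ("similar groups", similar_groups)]:
    print(f"{name}: bias = {bias(groups):.3f}")
```

In this toy setting the bias is large only when the second group is both heavily underrepresented and markedly different in the measured quantity.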

The word bias in common usage has a strong negative connotation, and implies a deliberate intent to mislead or other scientific fraud. In statistical usage, bias merely denotes a mathematical property, regardless of whether it is deliberate, unconscious, or due to imperfections in the instruments used for observation. While some individuals might deliberately use a biased sample to produce misleading results, more often a biased sample simply reflects the difficulty of obtaining a truly representative sample.

Some samples use a biased statistical design which nevertheless allows the estimation of parameters. The U.S. National Center for Health Statistics, for example, deliberately oversamples from minority populations in many of its nationwide surveys in order to gain sufficient precision for estimates within these groups.[14] These surveys require the use of sample weights (see below) to produce proper estimates across all racial and ethnic groups. Provided that certain conditions are met (chiefly that the sample is drawn randomly from the entire population), these samples permit accurate estimation of population parameters.

Historical examples


A classic example of a biased sample and the misleading results it produced occurred in 1936. In the early days of opinion polling, the American Literary Digest magazine collected over two million postal surveys and predicted that the Republican candidate in the U.S. presidential election, Alf Landon, would beat the incumbent president, Franklin Roosevelt, by a large margin. The result was the exact opposite. The Literary Digest survey represented a sample collected from readers of the magazine, supplemented by records of registered automobile owners and telephone users. This sample over-represented wealthy individuals, who, as a group, were more likely to vote for the Republican candidate. In contrast, a poll of only 50,000 citizens selected by George Gallup's organization successfully predicted the result, leading to the popularity of the Gallup poll.

Another classic example occurred in the 1948 presidential election. On election night, the Chicago Tribune printed the headline DEWEY DEFEATS TRUMAN, which turned out to be mistaken. In the morning the grinning president-elect, Harry S. Truman, was photographed holding a newspaper bearing this headline. The Tribune was mistaken because its editor trusted the results of a phone survey. Survey research was then in its infancy, and few academics realized that a sample of telephone users was not representative of the general population. Telephones were not yet widespread, and those who had them tended to be prosperous and have stable addresses. (In many cities, the Bell System telephone directory contained the same names as the Social Register.) In addition, the Gallup poll on which the Tribune based its headline was over two weeks old at the time of printing.[16]

Statistical corrections for a biased sample

If entire segments of the population are excluded from a sample, then there are no adjustments that can produce estimates that are representative of the entire population. But if some groups are underrepresented and the degree of underrepresentation can be quantified, then sample weights can correct the bias.[citation needed]

For example, a hypothetical population might include 10 million men and 10 million women. Suppose that a biased sample of 100 patients included 20 men and 80 women. A researcher could correct for this imbalance by attaching a weight of 2.5 for each male and 0.625 for each female. This would adjust any estimates to achieve the same expected value as a sample that included exactly 50 men and 50 women, unless men and women differed in their likelihood of taking part in the survey.
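
The weights in this example can be reproduced with a few lines of code. The following is a minimal sketch of the calculation: the function names are hypothetical, and the group means used to show the effect of weighting are invented for illustration and do not come from any survey.

```python
def poststratification_weights(population_counts, sample_counts):
    """Weight for each group so the weighted sample matches population shares."""
    n = sum(sample_counts.values())
    N = sum(population_counts.values())
    weights = {}
    for group, pop_count in population_counts.items():
        sampled = sample_counts.get(group, 0)
        if sampled == 0:
            # A group that is entirely absent from the sample cannot be
            # recovered by weighting.
            raise ValueError(f"group {group!r} is absent from the sample")
        expected = n * pop_count / N  # count a proportional sample would contain
        weights[group] = expected / sampled
    return weights


def weighted_mean(group_means, sample_counts, weights):
    """Weighted mean of a measurement, given each group's sample mean."""
    total = sum(sample_counts[g] * weights[g] for g in group_means)
    return sum(sample_counts[g] * weights[g] * group_means[g] for g in group_means) / total


population = {"men": 10_000_000, "women": 10_000_000}
sample = {"men": 20, "women": 80}

weights = poststratification_weights(population, sample)
print(weights)  # {'men': 2.5, 'women': 0.625}

# Hypothetical group means of some measured quantity.
means = {"men": 4.0, "women": 2.0}
unweighted = sum(sample[g] * means[g] for g in means) / sum(sample.values())
print(unweighted, weighted_mean(means, sample, weights))  # 2.4 3.0
```

With the weights applied, the 20 men count as 50 and the 80 women count as 50, so the estimate matches what a 50/50 sample with the same group means would give.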


See also

References

  1. Medical Dictionary - 'Sampling Bias' Retrieved on September 23, 2009
  2. TheFreeDictionary – biased sample Retrieved on 2009-09-23. Site in turn cites: Mosby's Medical Dictionary, 8th edition.
  3. Weising, Kurt (2005). DNA fingerprinting in plants: principles, methods, and applications, London: Taylor & Francis Group.
  4. Page 34 in: Selection and linkage desequilibrium tests under complex demographies and ascertainment bias Francesc Calafell i Majó, Anna Ramírez i Soriano. July 2008
  5. Panacek: Error in research (htem.org/sites/default/files/issuu/libraries/Panacek_Error_And_Bias_In_Clinical_Research_syllabus_1.pdf). Society for Academic Emergency Medicine. Retrieved on November 14, 2009
  6. medilexicon Medical Dictionary - 'Ascertainment Bias' Retrieved on November 14, 2009
  7. Dictionary of Cancer Terms – Selection Bias Retrieved on September 23, 2009
  8. The effects of sample selection bias on racial differences in child abuse reporting Ards S, Chung C, Myers SL Jr. Child Abuse Negl. 1999 December;23(12):1209; author reply 1211-5. PMID 9504213
  9. Sample Selection Bias Correction Theory Corinna Cortes, Mehryar Mohri, Michael Riley, and Afshin Rostamizadeh. New York University.
  10. Domain Adaptation and Sample Bias Correction Theory and Algorithm for Regression Corinna Cortes, Mehryar Mohri. New York University.
  11. Page 262 in: Behavioral Science. Board Review Series. By Barbara Fadem. ISBN 0-7817-8257-0, ISBN 978-0-7817-8257-9. 216 pages
  12. Wallace/Maxcy-Rosenau-Last public health & preventive medicine (page 21) 15ed, illustrated. By Robert B. Wallace. ISBN 0-07-144198-0, ISBN 978-0-07-144198-8
  13. H. Eldon Sutton (1988). An Introduction to Human Genetics, 4th Edition, San Diego: Harcourt Brace Jovanovich.
  14. National Center for Health Statistics (2007). Minority Health.
  15. (2008). Browser Statistics. Refsnes Data. URL accessed on 2008-07-05.
  16. based on http://www.uh.edu/engines/epi1199.htm retrieved on September 29, 2007


External links

National Center for Health Statistics (2007). Minority Health.




This page uses Creative Commons Licensed content from Wikipedia (view authors).
