Rescorla-Wagner learning model


The Rescorla-Wagner model is a model of classical conditioning in which the animal is theorized to learn from the discrepancy between what is expected to happen and what actually happens. It is a trial-level model in which each stimulus is either present or absent at some point in the trial. The prediction of the unconditioned stimulus on a trial is represented as the sum of the associative strengths of all conditioned stimuli present during the trial. This feature was a major advance over previous models and allowed a straightforward explanation of important experimental phenomena such as blocking. For this reason, the Rescorla-Wagner model has become one of the most influential models of learning, though it has been frequently criticized since its publication. It has attracted considerable attention in recent years, as many studies have suggested that the phasic activity of dopamine neurons in the mesostriatal dopamine projections of the midbrain encodes the type of prediction error detailed in the model.

The Rescorla-Wagner model was created by Robert A. Rescorla of the University of Pennsylvania and Allan R. Wagner of Yale University.

Success and popularity

The Rescorla-Wagner model has been successful and popular because^[1]:

it can generate clear and ordinal predictions
it has made a number of successful predictions
its representation of event processing in terms of intensity and unexpectedness has an intuitive appeal
it provides considerable heuristic value
it has relatively few free parameters and independent variables
it has had little competition from other theories

Basic assumptions of the model

The amount of surprise an organism experiences when encountering an Unconditioned Stimulus (US) is assumed to depend on the summed associative value of all cues present during that trial. This assumption differs from previous models, which considered only the associative value of a particular Conditioned Stimulus (CS) to be the determining aspect of surprise.
Excitation and inhibition are opposite features. A stimulus can have either a positive associative strength (being a conditioned excitor) or a negative associative strength (being a conditioned inhibitor); it cannot have both.
The associative strength of a stimulus is expressed directly in the behaviour it elicits or inhibits. There is no way of learning about a stimulus without showing what was learned in the organism's reactions.
The salience of a CS is a constant. The salience of a CS (alpha) is assumed not to change during training and can thus be represented by a constant.
The history of a cue does not affect its current state. Only the current associative value of a cue determines the amount of learning; it does not matter whether the CS has undergone several conditioning-extinction sessions or the like.

The first two assumptions are unique to the Rescorla-Wagner model. The last three assumptions were present in antecedents of the model and are less central to the theory, but they remain important to the structure of the model.^[2]

Equation

$\Delta V^{n+1}_X = \alpha_X \beta (\lambda - V_{tot})$

and

$V^{n+1}_X = V^n_X + \Delta V^{n+1}_X$

where

$\Delta V^{n+1}_X$ is the change in the strength of association of X on trial n+1
$\alpha_X$ is the salience of the CS X (bounded by 0 and 1)
$\beta$ is the rate parameter for the US (bounded by 0 and 1)
$\lambda$ is the maximum conditioning possible for the US
$V^n_X$ is the current associative strength of X
$V_{tot}$ is the total associative strength of all CSs present on the trial
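
The two equations above can be sketched as a short simulation. This is a minimal illustration rather than a reference implementation: the function names `rw_trial` and `run_trials`, the cue labels and the parameter values are all assumptions chosen for the example. It applies the update to a blocking design (A-US trials followed by AB-US trials) and to a control group that receives only AB-US trials.

```python
def rw_trial(V, present, alpha, beta, lam):
    """Return associative strengths after one Rescorla-Wagner trial.

    V       -- dict mapping cue name to its current associative strength
    present -- cues presented on this trial
    alpha   -- dict mapping cue name to its salience (between 0 and 1)
    beta    -- rate parameter for the US (between 0 and 1)
    lam     -- lambda, the maximum conditioning the US supports (0 if no US)
    """
    v_tot = sum(V[cue] for cue in present)  # summed strength of present cues
    error = lam - v_tot                     # prediction error shared by all cues
    updated = dict(V)
    for cue in present:                     # only present cues are updated
        updated[cue] = V[cue] + alpha[cue] * beta * error
    return updated


def run_trials(V, present, alpha, beta, lam, n_trials):
    """Apply the same trial type n_trials times."""
    for _ in range(n_trials):
        V = rw_trial(V, present, alpha, beta, lam)
    return V


# Illustrative parameter values (assumptions made for this sketch).
alpha = {"A": 0.3, "B": 0.3}
beta, lam = 0.5, 1.0

# Blocking group: A-US trials first, then AB-US trials.
blocked = run_trials({"A": 0.0, "B": 0.0}, ["A"], alpha, beta, lam, 50)
blocked = run_trials(blocked, ["A", "B"], alpha, beta, lam, 50)

# Control group: AB-US trials only.
control = run_trials({"A": 0.0, "B": 0.0}, ["A", "B"], alpha, beta, lam, 50)

print(f"blocked B: {blocked['B']:.3f}, control B: {control['B']:.3f}")
```

With these values, B ends close to zero in the blocking group and close to 0.5 in the control group. Because A already predicts the US after the first phase, the shared error term is almost zero during the compound trials, so B has little left to learn.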

The Revised RW model by Van Hamme and Wasserman (1994)

Van Hamme and Wasserman extended the original Rescorla-Wagner (RW) model and introduced a new factor in their revised RW model in 1994^[3]. They suggested that changes in associative strength are not limited to conditioned stimuli physically present on a given trial: the associative value of an absent CS can also be altered through a within-compound association with a CS that is present on that trial. A within-compound association is established if two CSs are presented together during training (a compound stimulus). If one of the two component CSs is subsequently presented alone, it is assumed to activate a representation of the other (previously paired) CS as well. Van Hamme and Wasserman propose that stimuli indirectly activated through within-compound associations have a negative learning parameter, and in this way phenomena of retrospective revaluation can be explained.

Consider the following example, an experimental paradigm called "backward blocking", which is indicative of retrospective revaluation. Here AB is the compound stimulus A+B:

Phase 1: AB-US

Phase 2: A-US

Test trials: In Group 1, which received both Phase 1 and Phase 2 trials, B elicits a weaker Conditioned Response (CR) than in the control group, which received only Phase 1 trials.

The original RW model cannot account for this effect, but the revised model can. In Phase 2, stimulus B is indirectly activated through its within-compound association with A. A physically present stimulus has a positive learning parameter (usually called alpha), but B, being absent and only indirectly activated during Phase 2, has a negative learning parameter. Thus during the second phase B's associative strength declines, whereas A's value increases because of its positive learning parameter.

Thus, the revised RW model can explain why the CR elicited by B after backward blocking training is weaker compared with AB-only conditioning.
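
The backward blocking account can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' formulation: the function name `vhw_trial`, the parameter values, the choice to compute the prediction error from the physically present cues only, and the hand-specified list of absent-but-expected cues (standing in for the within-compound association) are all assumptions of the sketch.

```python
def vhw_trial(V, present, absent_expected, alpha_present, alpha_absent, beta, lam):
    """One trial of a simplified revised RW update.

    present         -- cues physically presented on the trial
    absent_expected -- cues that are absent but activated through a
                       within-compound association (specified by hand here)
    alpha_absent    -- negative learning parameter for absent, expected cues
    """
    v_tot = sum(V[cue] for cue in present)  # assumption: error uses present cues only
    error = lam - v_tot
    updated = dict(V)
    for cue in present:
        updated[cue] = V[cue] + alpha_present * beta * error
    for cue in absent_expected:
        updated[cue] = V[cue] + alpha_absent * beta * error  # moves the opposite way
    return updated


# Illustrative values (assumptions made for this sketch).
ALPHA_PRESENT, ALPHA_ABSENT, BETA, LAM = 0.3, -0.15, 0.5, 1.0

# Phase 1: AB-US trials, received by both groups.
V = {"A": 0.0, "B": 0.0}
for _ in range(10):
    V = vhw_trial(V, ["A", "B"], [], ALPHA_PRESENT, ALPHA_ABSENT, BETA, LAM)
control = dict(V)

# Phase 2 (Group 1 only): A-US trials; B is absent but expected because of A.
group1 = dict(V)
for _ in range(20):
    group1 = vhw_trial(group1, ["A"], ["B"], ALPHA_PRESENT, ALPHA_ABSENT, BETA, LAM)

print(f"B after Phase 1 only: {control['B']:.3f}")
print(f"B after Phase 1 and 2: {group1['B']:.3f}")
```

In this sketch B's strength is lower in Group 1 than in the control group, which matches the weaker CR described above. Setting `ALPHA_ABSENT` to 0 recovers the original RW behaviour, in which B is left unchanged during Phase 2.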

Some failures of the RW Model

spontaneous recovery from extinction and recovery from extinction caused by reminder treatments (reinstatement): It is a well-established observation that a time-out interval after completion of extinction results in partial recovery from extinction, i.e. the previously extinguished response recurs, but usually at a lower level than before extinction training. Reinstatement refers to the phenomenon that exposure to the US alone after completion of extinction results in partial recovery from extinction. The RW model cannot account for these phenomena.

extinction of a previously conditioned inhibitor: The RW model predicts that repeated presentation of a conditioned inhibitor alone (a CS with negative associative strength) results in extinction of this stimulus (a decline of its negative associative value). This prediction is false. On the contrary, experiments show that repeated presentation of a conditioned inhibitor alone can even increase its inhibitory potential.
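
As a worked instance of this prediction, suppose the inhibitor X is presented alone without a US, so that $\lambda = 0$ and $V_{tot} = V^n_X$. The values $\alpha_X \beta = 0.2$ and $V^n_X = -0.5$ are illustrative assumptions:

$\Delta V^{n+1}_X = \alpha_X \beta (0 - V^n_X) = 0.2 \times 0.5 = 0.1$

$V^{n+1}_X = -0.5 + 0.1 = -0.4$

Each such trial moves $V_X$ closer to zero, which is the loss of inhibition that the model predicts and that experiments do not show.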

facilitated reacquisition after extinction: One of the assumptions of the model is that the conditioning history of a CS has no influence on its present status; only its current associative value is important. Contrary to this assumption, many experiments show that stimuli that were first conditioned and then extinguished are more easily reconditioned (i.e. fewer trials are necessary for conditioning).

the exclusiveness of excitation and inhibition: The RW model also assumes that excitation and inhibition are opponent features. A stimulus can have either excitatory potential (a positive associative strength) or inhibitory potential (a negative associative strength). By contrast, it is sometimes observed that stimuli can have both qualities. One example is backward excitatory conditioning, in which a CS is backwardly paired with the US (US-CS instead of CS-US). This usually makes the CS a conditioned excitor. Interestingly, the stimulus also has inhibitory features, which can be demonstrated with the retardation of acquisition test. This test is used to assess the inhibitory potential of a stimulus, since excitatory conditioning with a previously conditioned inhibitor is observed to be retarded. The backwardly conditioned stimulus passes this test and thus seems to have both excitatory and inhibitory features.

pairing a novel stimulus with a conditioned inhibitor: A conditioned inhibitor is assumed to have a negative associative value. When an inhibitor is presented together with a novel stimulus (i.e. one whose associative strength is zero), the model predicts that the novel cue should become a conditioned excitor. This is not the case in experimental situations. The prediction stems from the model's basic term $(\lambda - V_{tot})$. Since the summed associative strength of all stimuli present on the trial is negative (zero plus the inhibitory potential) and lambda is zero (no US present), the resulting change in associative strength is positive, thus making the novel cue a conditioned excitor.
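
The arithmetic behind this prediction can be checked with a small sketch. The function name `rw_trial`, the cue labels X (inhibitor) and N (novel cue) and all numeric values are assumptions made for illustration, including the use of a single rate parameter on non-reinforced trials.

```python
def rw_trial(V, present, alpha, beta, lam):
    """Return associative strengths after one Rescorla-Wagner trial."""
    error = lam - sum(V[cue] for cue in present)  # lambda minus summed strength
    updated = dict(V)
    for cue in present:
        updated[cue] = V[cue] + alpha[cue] * beta * error
    return updated


# Illustrative values: X is a conditioned inhibitor, N is a novel cue.
alpha = {"X": 0.3, "N": 0.3}
beta = 0.5
start = {"X": -0.5, "N": 0.0}

# One XN trial without a US (lambda = 0): the error is 0 - (-0.5) = 0.5.
first = rw_trial(start, ["X", "N"], alpha, beta, lam=0.0)

# Repeating the same trial lets both strengths settle.
V = dict(start)
for _ in range(50):
    V = rw_trial(V, ["X", "N"], alpha, beta, lam=0.0)

print(f"after 1 trial:   N = {first['N']:.3f}, X = {first['X']:.3f}")
print(f"after 50 trials: N = {V['N']:.3f}, X = {V['X']:.3f}")
```

After a single trial the novel cue has already gained positive strength, and with equal saliences the two cues settle at strengths of equal size and opposite sign, so their sum matches the absent US. The model therefore turns N into an excitor, which is the prediction that experiments contradict.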

CS-preexposure effect: The CS-preexposure effect (also called latent inhibition) is the well established observation that conditioning after exposure to the stimulus later used as the CS in conditioning is retarded. The RW model doesn't predict any effect of presenting a novel stimulus without a US.
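
In terms of the equation, a preexposure trial presents a novel CS X alone, so $V^n_X = 0$, and no US occurs, so $\lambda = 0$. The following is a sketch that assumes no other cue, such as the context, contributes to $V_{tot}$:

$\Delta V^{n+1}_X = \alpha_X \beta (0 - 0) = 0$

$V_X$ therefore stays at zero however many preexposure trials are given, and the model predicts that later conditioning proceeds as it would for a novel stimulus.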

higher-order conditioning: In higher-order conditioning a previously conditioned CS is paired with a novel cue (i.e. first CS1-US, then CS2-CS1). This usually makes the novel cue CS2 elicit reactions similar to those elicited by CS1. The model cannot account for this phenomenon, since no US is present during CS2-CS1 trials. However, by allowing CS1 to act like a US, one can reconcile the model with this effect.

sensory preconditioning: Sensory preconditioning refers to first pairing two novel cues (CS1-CS2) and then pairing one of them with a US (CS2-US). This turns both CS1 and CS2 into conditioned excitors. The RW model cannot explain this, since during the CS1-CS2 phase both stimuli have an associative value of zero and lambda is also zero (no US present), which results in no change in the associative strength of the stimuli.

References

[1] Miller, R. R., Barnet, R. C., and Grahame, N. J. (1995) Assessment of the Rescorla-Wagner model, Psychological Bulletin, 117 (3), pp. 363-386.
[2] Miller, R. R., Barnet, R. C., and Grahame, N. J. (1995) Assessment of the Rescorla-Wagner model, Psychological Bulletin, 117 (3), pp. 363-386.
[3] Van Hamme, L. J., and Wasserman, E. A. (1994) Cue competition in causality judgments: The role of nonpresentation of compound stimulus elements, Learning and Motivation, 25, pp. 127-151.

Rescorla, R. A., and Wagner, A. R. (1972) A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement, Classical Conditioning II, A. H. Black and W. F. Prokasy, Eds., pp. 64-99. Appleton-Century-Crofts.
Miller, R. R., Barnet, R. C., and Grahame, N. J. (1995) Assessment of the Rescorla-Wagner model, Psychological Bulletin, 117, pp. 363-386.
Van Hamme, L. J., and Wasserman, E. A. (1994) Cue competition in causality judgments: The role of nonpresentation of compound stimulus elements, Learning and Motivation, 25, pp. 127-151.

External links

Scholarpedia Rescorla Wagner model


This page uses Creative Commons Licensed content from Wikipedia.
