University of Warwick
Publications service & WRAP


Stimulus representation and the timing of reward-prediction errors in models of the dopamine system

Ludvig, Elliot Andrew, Sutton, Richard S. and Kehoe, E. James (2008) Stimulus representation and the timing of reward-prediction errors in models of the dopamine system. Neural Computation, Volume 20 (Number 12). pp. 3034-3054. doi:10.1162/neco.2008.11-07-654

Text: WRAP_Ludvig_neco%2E2008%2E11-07-654.pdf - Published Version (1168Kb)
Official URL: http://dx.doi.org/10.1162/neco.2008.11-07-654

Abstract

The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus representation for the TD model. In our model, all external stimuli, including rewards, spawn a series of internal microstimuli, which grow weaker and more diffuse over time. These microstimuli are used by the TD learning algorithm to generate predictions of future reward. This new stimulus representation injects temporal generalization into the TD model and enhances correspondence between model and data in several experiments, including those when rewards are omitted or received early. This improved fit mostly derives from the absence of large negative errors in the new model, suggesting that dopamine alone can encode the full range of TD errors in these situations.
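
As a rough illustration of the idea in the abstract, the sketch below builds a microstimulus representation and feeds it to linear TD(λ). It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the functional form or any parameter values, so the exponentially decaying memory trace, the Gaussian basis functions, the helper names (`microstimuli`, `build_features`, `td_lambda`), and every numeric setting here are illustrative choices.

```python
import numpy as np


def microstimuli(n_steps, n_micro=10, decay=0.985, sigma=0.08):
    """Microstimulus levels for the n_steps following a stimulus onset.

    Assumed form: a memory trace decays exponentially from 1, and each
    microstimulus is a Gaussian bump over trace height, scaled by the trace.
    Later microstimuli are therefore weaker and more spread out in time.
    """
    t = np.arange(n_steps)
    trace = decay ** t                                   # 1.0 at onset
    centers = np.arange(1, n_micro + 1) / n_micro        # bump centers in trace height
    basis = np.exp(-(trace[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))
    return basis * trace[:, None]                        # shape (n_steps, n_micro)


def build_features(n_steps, onsets, n_micro=10):
    """Concatenate one bank of microstimuli per stimulus (rewards included)."""
    base = microstimuli(n_steps, n_micro)
    x = np.zeros((n_steps, n_micro * len(onsets)))
    for k, onset in enumerate(onsets):
        # Each stimulus is silent before its onset, then replays the same bank.
        x[onset:, k * n_micro:(k + 1) * n_micro] = base[:n_steps - onset]
    return x


def td_lambda(x, rewards, n_trials=200, alpha=0.01, gamma=0.98, lam=0.95):
    """Linear TD(lambda) over repeated identical trials.

    Returns the learned weights and the TD errors from the final trial,
    where deltas[t] is the error computed on arriving at time step t.
    """
    n_steps, n_feat = x.shape
    w = np.zeros(n_feat)
    deltas = np.zeros(n_steps)
    for _ in range(n_trials):
        e = np.zeros(n_feat)                             # eligibility trace, reset per trial
        for t in range(n_steps - 1):
            delta = rewards[t + 1] + gamma * (w @ x[t + 1]) - (w @ x[t])
            e = gamma * lam * e + x[t]
            w += alpha * delta * e
            deltas[t + 1] = delta
    return w, deltas


# Hypothetical trial layout: cue at step 5, reward at step 25.
n_steps, cue_time, reward_time = 60, 5, 25
x = build_features(n_steps, onsets=[cue_time, reward_time])
rewards = np.zeros(n_steps)
rewards[reward_time] = 1.0
w, deltas = td_lambda(x, rewards)
print("TD error at reward after training:", deltas[reward_time])
```

Because neighbouring time steps share overlapping microstimuli, a weight change driven by the reward generalizes to nearby moments, which is the temporal generalization the abstract refers to. Treating the reward as a stimulus with its own microstimuli gives the learner features that can offset the cue's lingering prediction once the reward has arrived.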

Item Type: Journal Item
Subjects: Q Science > QA Mathematics; Q Science > QP Physiology
Divisions: Faculty of Science > Psychology
Library of Congress Subject Headings (LCSH): Dopamine, Dopaminergic mechanisms, Prediction theory
Journal or Publication Title: Neural Computation
Publisher: MIT Press
ISSN: 0899-7667
Official Date: 30 October 2008
Dates: 30 October 2008 (Published)
Volume: 20
Number: 12
Page Range: pp. 3034-3054
DOI: 10.1162/neco.2008.11-07-654
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Restricted or Subscription Access
Funder: iCORE (Alta.), Natural Sciences and Engineering Research Council of Canada (NSERC)
