Human Reinforcement Learning Subdivides Structured Action Spaces by Learning Effector-Specific Values

Open Access

28 October 2009

journal article
Published by Society for Neuroscience in Journal of Neuroscience

Vol. 29 (43), 13524-13531
https://doi.org/10.1523/jneurosci.2469-09.2009

Abstract

Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning—such as prediction error signals for action valuation associated with dopamine and the striatum—can cope with this “curse of dimensionality.” We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to “divide and conquer” reinforcement learning over high-dimensional action spaces.
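The contrast between the two learning models described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' fitted model: it assumes a simple delta-rule update, and the learning rate `ALPHA`, the number of options per hand `N_OPTIONS`, the summed reward used by the unitary learner, and all function names are hypothetical choices made for the example.

```python
from itertools import product

ALPHA = 0.2      # hypothetical learning rate (assumption, not from the paper)
N_OPTIONS = 3    # hypothetical number of options per hand (assumption)


def update_unitary(q, left, right, r_left, r_right, alpha=ALPHA):
    """Unitary model: one value per (left, right) pair, trained on summed reward."""
    delta = (r_left + r_right) - q[(left, right)]  # single prediction error
    q[(left, right)] += alpha * delta
    return delta


def update_decomposed(q_left, q_right, left, right, r_left, r_right, alpha=ALPHA):
    """Decomposed model: a separate value table and prediction error per hand."""
    delta_left = r_left - q_left[left]
    delta_right = r_right - q_right[right]
    q_left[left] += alpha * delta_left
    q_right[right] += alpha * delta_right
    return delta_left, delta_right


def joint_value(q_left, q_right, left, right):
    """Value of a bimanual action under the decomposed model (sum of parts)."""
    return q_left[left] + q_right[right]


# Unitary learner stores N^2 values; decomposed learner stores 2N values.
q_joint = {pair: 0.0 for pair in product(range(N_OPTIONS), repeat=2)}
q_left = [0.0] * N_OPTIONS
q_right = [0.0] * N_OPTIONS

# One trial: left hand picks option 0 and is rewarded, right hand picks
# option 1 and is not rewarded.
update_unitary(q_joint, 0, 1, 1.0, 0.0)
update_decomposed(q_left, q_right, 0, 1, 1.0, 0.0)

print(len(q_joint), len(q_left) + len(q_right))
print(q_joint[(0, 0)], joint_value(q_left, q_right, 0, 0))
```

After this single trial, the unitary learner has updated only the pair it actually tried, so the untried pair `(0, 0)` still has value zero. The decomposed learner has already credited the left-hand option, so that credit carries over to every pair containing it. This generalization, together with the smaller number of stored values, is the sense in which a decomposition can ease learning over a high-dimensional action space.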

