“A Unified Framework for Dopamine Signals across Timescales”, 2020-11-27:
Temporal difference (TD) error is a powerful teaching signal in machine learning
Teleport and speed manipulations are used to characterize dopamine signals in mice
Slowly ramping as well as phasic dopamine responses convey TD errors
Dopamine neurons compute TD error or changes in value on a moment-by-moment basis
Rapid phasic activity of midbrain dopamine neurons is thought to signal reward prediction errors (RPEs), resembling temporal difference errors used in machine learning. However, recent studies describing slowly increasing dopamine signals have instead proposed that they represent state values and arise independently of somatic spiking activity. Here we developed experimental paradigms using virtual reality that disambiguate RPEs from values. We examined dopamine circuit activity at various stages, including somatic spiking, calcium signals at somata and axons, and striatal dopamine concentrations. Our results demonstrate that ramping dopamine signals are consistent with RPEs rather than value, and this ramping is observed at all stages examined. Ramping dopamine signals can be driven by a dynamic stimulus that indicates a gradual approach to a reward. We provide a unified computational understanding of rapid phasic and slowly ramping dopamine signals: dopamine neurons perform a derivative-like computation over values on a moment-by-moment basis.
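The "derivative-like computation over values" can be illustrated with the standard TD error, delta_t = r_t + gamma * V(s_{t+1}) - V(s_t). The sketch below is not the paper's model or analysis code: the cubic value function over track position, the discount factor of 1, and the specific positions are all assumptions chosen only to show how a convex value function could yield a ramping TD error during a steady approach and a phasic TD error at a teleport.

```python
import numpy as np

def td_errors(values, rewards, gamma=1.0):
    """TD error per step: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    values = np.asarray(values, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    return rewards + gamma * values[1:] - values[:-1]

def value(x):
    """Hypothetical convex value over normalized track position x in [0, 1].
    The cubic form is an illustrative assumption, not taken from the paper."""
    return np.asarray(x, dtype=float) ** 3

# Constant-speed approach to the reward location (reward delivery itself is not modeled).
positions = np.linspace(0.0, 1.0, 11)
delta_ramp = td_errors(value(positions), np.zeros(len(positions) - 1))

# Teleport: the position jumps forward from 0.2 to 0.6 in a single step.
teleport_positions = np.array([0.0, 0.1, 0.2, 0.6, 0.7])
delta_teleport = td_errors(value(teleport_positions), np.zeros(4))

print("TD error during steady approach:", np.round(delta_ramp, 3))
print("TD error around a teleport:     ", np.round(delta_teleport, 3))
```

Under these assumptions, the TD error grows step by step during the steady approach because the value differences grow with position, and the largest TD error in the teleport sequence falls on the jump itself, since the change in value there is largest.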