“Has Dynamic Programming Improved Decision Making?”, 2018-08-22:
Dynamic programming (DP) is an extremely powerful tool for solving a wide class of sequential decision-making problems under uncertainty. In principle, it enables us to compute optimal decision rules that specify the best possible decision to take in any given situation. This article reviews developments in DP and contrasts its revolutionary impact on economics, operations research, engineering, and artificial intelligence with the comparative paucity of real-world applications where DP is actually used to improve decision making. I discuss the literature on numerical solution of DPs and its connection to the literature on reinforcement learning (RL) and artificial intelligence (AI).
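The "optimal decision rules" the abstract refers to are policies computed from the Bellman equation. As a minimal illustration (not from the article), value iteration on a toy two-state, two-action Markov decision process looks like this; the transition matrix, rewards, and discount factor are all made-up assumptions:

```python
import numpy as np

# Toy finite MDP: 2 states, 2 actions (illustrative numbers, not from the article).
P = np.array([                  # P[a, s, s'] = transition probability
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.1, 0.9]],   # action 1
])
R = np.array([                  # R[a, s] = expected immediate reward
    [1.0, 0.0],
    [0.5, 2.0],
])
beta = 0.95                     # discount factor

# Value iteration: repeatedly apply the Bellman operator until convergence.
V = np.zeros(2)
for _ in range(10_000):
    # Q[a, s] = R[a, s] + beta * sum_{s'} P[a, s, s'] * V[s']
    Q = R + beta * (P @ V)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# The optimal decision rule: the best action in each state.
policy = Q.argmax(axis=0)
print(V, policy)
```

The "curse of dimensionality" mentioned in the keywords is visible even here: the arrays `P` and `V` grow exponentially with the number of state variables, which is why exact value iteration is only feasible for small or heavily structured problems.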
Despite amazing, highly publicized successes of these algorithms that result in superhuman levels of performance in board games such as chess or Go, I am not aware of comparably successful applications of DP for helping individuals and firms to solve real-world problems. I point to the fuzziness of many real-world decision problems and the difficulty in mathematically formulating and modeling them as key obstacles to wider application of DP to improve decision making. Nevertheless, I provide several success stories where DP has demonstrably improved decision making and discuss a number of other examples where it seems likely that the application of DP could have substantial value.
I conclude that ‘applied DP’ offers substantial promise for economic policy making if economists can let go of the empirically untenable assumption of unbounded rationality and try to tackle the challenging decision problems faced every day by individuals and firms.
[Keywords: actor-critic algorithms, Alpha Zero, approximate dynamic programming, artificial intelligence, behavioral economics, Bellman equation, bounded rationality, curse of dimensionality, computational complexity, decision rules, dynamic pricing, dynamic programming, employee compensation, Herbert Simon, fleet sizing, identification problem, individual and firm behavior, life-cycle problem, locomotive allocation, machine learning, Markov decision processes, mental models, model-free learning, neural networks, neurodynamic programming, offline versus online training, optimal inventory management, optimal replacement, optimal search, principle of decomposition, Q-learning, revenue management, real-time dynamic programming, reinforcement learning, Richard Bellman, structural econometrics, supervised versus unsupervised learning]