Bibliography:

  1. ‘RL’ tag

  2. ‘AlphaStar’ tag

  3. ‘OA5’ tag

  4. Deep Reinforcement Learning Without Experience Replay, Target Networks, or Batch Updates

  5. Centaur: a foundation model of human cognition

  6. SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning

  7. Training Language Models to Self-Correct via Reinforcement Learning

  8. Carpentopod: A Walking Table Project

  9. Mind Wandering During Implicit Learning Is Associated With Increased Periodic EEG Activity And Improved Extraction Of Hidden Probabilistic Patterns

  10. Alexa Is in Millions of Households—and Amazon Is Losing Billions

  11. Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo

  12. Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

  13. CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay

  14. ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

  15. Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning

  16. Let Models Speak Ciphers: Multiagent Debate through Embeddings

  17. Predictive auxiliary objectives in deep RL mimic learning in the brain

  18. Small batch deep reinforcement learning

  19. Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning

  20. Comparative study of model-based and model-free reinforcement learning control performance in HVAC systems

  21. What Are Dreams For? Converging lines of research suggest that we might be misunderstanding something we do every night of our lives

  22. Learning to Model the World with Language

  23. Low-Poly Image Generation Using Evolutionary Algorithms in Ruby

  24. Using temperature to analyze the neural basis of a time-based decision

  25. Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks

  26. Twitching in Sensorimotor Development from Sleeping Rats to Robots

  27. Universal Mechanical Polycomputation in Granular Matter

  28. Lucy-SKG: Learning to Play Rocket League Efficiently Using Deep Reinforcement Learning

  29. Improving Language Models with Advantage-based Offline Policy Gradients

  30. Reinforcement Learning in Newcomb-like Environments

  31. WizardLM: Empowering Large Language Models to Follow Complex Instructions

  32. Bridging Discrete and Backpropagation: Straight-Through and Beyond

  33. Empirical Design in Reinforcement Learning

  34. A circuit mechanism linking past and future learning through shifts in perception

  35. Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier

  36. Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula

  37. Melting Pot 2.0

  38. Token Turing Machines

  39. Legged Locomotion in Challenging Terrains using Egocentric Vision

  40. Over-communicate no more: Situated RL agents learn concise communication protocols

  41. E3B: Exploration via Elliptical Episodic Bonuses

  42. Hyperbolic Deep Reinforcement Learning

  43. Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

  44. Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

  45. Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective (ALM)

  46. Human-level Atari 200× faster

  47. Nearest Neighbor Non-autoregressive Text Generation

  48. A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning

  49. Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter

  50. Improved Policy Optimization for Online Imitation Learning

  51. Offline RL for Natural Language Generation with Implicit Language Q Learning

  52. Fine-grained Image Captioning with CLIP Reward

  53. Reward Bases: Instantaneous reward revaluation with temporal difference learning

  54. Is Vanilla Policy Gradient Overlooked? Analyzing Deep Reinforcement Learning for Hanabi

  55. Quantifying and alleviating political bias in language models

  56. Machine Learning Helps Control Tokamak Plasmas

  57. Retrieval-Augmented Reinforcement Learning

  58. Policy Learning and Evaluation with Randomized Quasi-Monte Carlo

  59. A data-driven approach for learning to control computers

  60. Magnetic control of tokamak plasmas through deep reinforcement learning

  61. Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

  62. Learning Dynamics and Generalization in Deep Reinforcement Learning

  63. Agile Locomotion via Model-Free Learning

  64. Amortized Noisy Channel Neural Machine Translation

  65. Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs

  66. Simple but Effective: CLIP Embeddings for Embodied AI

  67. Offline Reinforcement Learning with Implicit Q-Learning (IQL)

  68. Recurrent Model-Free RL is a Strong Baseline for Many POMDPs

  69. DroQ: Dropout Q-Functions for Doubly Efficient Reinforcement Learning

  70. Batch size-invariance for policy optimization

  71. MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

  72. Bootstrapped Meta-Learning

  73. Megaverse: Simulating Embodied Agents at One Million Experiences per Second

  74. PES: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies

  75. Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft

  76. On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning

  77. Constructions in combinatorics via neural networks

  78. Muesli: Combining Improvements in Policy Optimization

  79. Podracer architectures for scalable Reinforcement Learning

  80. Counter-Strike Deathmatch with Large-Scale Behavioral Cloning

  81. ALD: Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

  82. Replay in Deep Learning: Current Approaches and Missing Biological Elements

  83. Large Batch Simulation for Deep Reinforcement Learning

  84. The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games

  85. Reinforcement Learning for Datacenter Congestion Control

  86. Training Larger Networks for Deep Reinforcement Learning

  87. How RL Agents Behave When Their Actions Are Modified

  88. A Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

  89. Randomized Ensembled Double Q-Learning (REDQ): Learning Fast Without a Model

  90. MLGO: a Machine Learning Guided Compiler Optimizations Framework

  91. Evolving Reinforcement Learning Algorithms

  92. Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments

  93. Autonomous navigation of stratospheric balloons using reinforcement learning

  94. A Unified Framework for Dopamine Signals across Timescales

  95. Offline Learning from Demonstrations and Unlabeled Experience

  96. Adversarial vulnerabilities of human decision-making

  97. D2RL: Deep Dense Architectures in Reinforcement Learning

  98. Human-centric Dialog Training via Offline Reinforcement Learning

  99. Emergent Social Learning via Multi-agent Reinforcement Learning

  100. Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning

  101. SPR: Data-Efficient Reinforcement Learning with Self-Predictive Representations

  102. Learning Breakout From RAM—Part 2

  104. Learning Breakout From RAM—Part 1

  106. Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks

  107. Improving GAN Training with Probability Ratio Clipping and Sample Reweighting

  108. Conservative Q-Learning for Offline Reinforcement Learning

  109. Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics (TQC)

  110. Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners

  111. Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels

  112. Chip Placement with Deep Reinforcement Learning

  113. CURL: Contrastive Unsupervised Representations for Reinforcement Learning

  114. Evolving Normalization-Activation Layers

  115. Benchmarking End-to-End Behavioral Cloning on Video Games

  116. Agent57: Outperforming the Atari Human Benchmark

  117. Deep neuroethology of a virtual rodent

  118. Q Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison

  119. Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?

  120. Causal Evidence Supporting the Proposal That Dopamine Transients Function As Temporal Difference Prediction Errors

  121. A Distributional Code for Value in Dopamine-Based Reinforcement Learning

  122. Combining Q-Learning and Search with Amortized Value Estimates

  123. SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference

  124. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

  125. QUARL: Quantized Reinforcement Learning (ActorQ)

  126. Deep Learning Enables Rapid Identification of Potent DDR1 Kinase Inhibitors

  127. Exponential slowdown for larger populations: The (μ+1)-EA on monotone functions

  128. Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field

  129. A View on Deep Reinforcement Learning in System Optimization

  130. Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP

  131. A General Dichotomy of Evolutionary Algorithms on Monotone Functions

  132. A Recipe for Training Neural Networks

  133. Universal quantum control through deep reinforcement learning

  134. Reinforcement Learning for Recommender Systems: A Case Study on Youtube

  135. Benchmarking Classic and Learned Navigation in Complex 3D Environments

  136. AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning

  137. Anxiety, Depression, and Decision Making: A Computational Perspective

  138. Reinforcement Learning in Artificial and Biological Systems

  139. Designing Neural Networks through Neuroevolution

  140. IRLAS: Inverse Reinforcement Learning for Architecture Search

  141. Quantifying Generalization in Reinforcement Learning

  142. Top-K Off-Policy Correction for a REINFORCE Recommender System

  143. Relative Entropy Regularized Policy Iteration

  144. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware

  145. Neural probabilistic motor primitives for humanoid control

  146. InstaNAS: Instance-aware Neural Architecture Search

  147. A Closer Look at Deep Policy Gradients

  148. One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

  149. Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow

  150. Learning to Perform Local Rewriting for Combinatorial Optimization

  151. R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning

  152. Benchmarking Reinforcement Learning Algorithms on Real-World Robots

  153. Deterministic Implementations for Reproducibility in Deep Reinforcement Learning

  154. Multi-task Deep Reinforcement Learning with PopArt

  155. Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction

  156. Searching Toward Pareto-Optimal Device-Aware Neural Architectures

  157. A Study of Reinforcement Learning for Neural Machine Translation

  158. Improving Abstraction in Text Summarization

  159. Learning to Optimize Join Queries With Deep Reinforcement Learning

  160. InfoNCE: Representation Learning with Contrastive Predictive Coding (CPC)

  161. Is Q-learning Provably Efficient?

  162. Maximum a Posteriori Policy Optimization

  163. The Unusual Effectiveness of Averaging in GAN Training

  164. Resource-Efficient Neural Architect

  165. DVRL: Deep Variational Reinforcement Learning for POMDPs

  166. Playing Atari with Six Neurons

  167. Measuring the Intrinsic Dimension of Objective Landscapes

  168. D4PG: Distributed Distributional Deterministic Policy Gradients

  169. Optimizing Query Evaluations using Reinforcement Learning for Web Search

  170. Delayed Impact of Fair Machine Learning

  171. Accelerated Methods for Deep Reinforcement Learning

  172. Learning Memory Access Patterns

  173. Investigating Human Priors for Playing Video Games

  174. ME-TRPO: Model-Ensemble Trust-Region Policy Optimization

  175. TD3: Addressing Function Approximation Error in Actor-Critic Methods

  176. Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

  177. Unicorn: Continual Learning with a Universal, Off-policy Agent

  178. ENAS: Efficient Neural Architecture Search via Parameter Sharing

  179. Regularized Evolution for Image Classifier Architecture Search

  180. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

  181. Interactive Grounded Language Acquisition and Generalization in a 2D World

  182. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

  183. Chapter 5: Monte Carlo Methods

  185. Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

  186. The Case for Learned Index Structures

  187. AI Safety Gridworlds

  188. Classification with Costly Features using Deep Reinforcement Learning

  189. Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection

  190. Towards the Use of Deep Reinforcement Learning with Global Policy For Query-based Extractive Summarization

  191. Swish: Searching for Activation Functions

  192. Gradient-free Policy Architecture Search and Adaptation

  193. Rainbow: Combining Improvements in Deep Reinforcement Learning

  194. OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

  195. Deep Reinforcement Learning that Matters

  196. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents

  197. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning

  198. The successor representation in human reinforcement learning

  199. Practical Block-wise Neural Network Architecture Generation

  200. Learning Policies for Adaptive Tracking with Deep Feature Cascades

  201. Reinforced Video Captioning with Entailment Rewards

  202. A Distributional Perspective on Reinforcement Learning

  203. Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning

  204. Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

  205. Efficient Architecture Search by Network Transformation

  206. Grammatical Error Correction with Neural Reinforcement Learning

  207. Noisy Networks for Exploration

  208. Gated-Attention Architectures for Task-Oriented Language Grounding

  209. The Persistence and Transience of Memory

  210. Deep reinforcement learning from human preferences § Appendix A.2: Atari

  211. Towards Synthesizing Complex Programs from Input-Output Examples

  212. IDK Cascades: Fast Deep Learning by Learning not to Overthink

  213. Teaching Machines to Describe Images via Natural Language Feedback

  214. Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks

  215. Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models

  216. Ask the Right Questions: Active Question Reformulation with Reinforcement Learning

  217. A Deep Reinforced Model for Abstractive Summarization

  218. Inferring and Executing Programs for Visual Reasoning

  219. Time-Contrastive Networks: Self-Supervised Learning from Video

  220. RAM: Dynamic Computational Time for Visual Attention

  221. Born to Learn: the Inspiration, Progress, and Future of Evolved Plastic Artificial Neural Networks (EPANNs)

  222. Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets

  223. End-to-end optimization of goal-driven and visually grounded dialogue systems

  224. Neural Episodic Control

  225. CoDeepNEAT: Evolving Deep Neural Networks

  226. Tuning Recurrent Neural Networks with Reinforcement Learning

  227. PathNet: Evolution Channels Gradient Descent in Super Neural Networks

  228. The Kelly Coin-Flipping Game: Exact Solutions

  229. Deep Reinforcement Learning: A Brief Survey

  230. Your TL;DR by an AI: A Deep Reinforced Model for Abstractive Summarization

  231. Loss is its own Reward: Self-Supervision for Reinforcement Learning

  232. Self-critical Sequence Training for Image Captioning

  233. Neural Combinatorial Optimization with Reinforcement Learning

  234. Reinforcement Learning with Unsupervised Auxiliary Tasks

  235. A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models

  236. Hybrid computing using a neural network with dynamic external memory

  237. Connecting Generative Adversarial Networks and Actor-Critic Methods

  238. Deep Reinforcement Learning for Mention-Ranking Coreference Models

  239. Deep Neural Networks for YouTube Recommendations

  240. The Malmo Platform for Artificial Intelligence Experimentation

  241. Progressive Neural Networks

  242. Learning to Optimize

  243. Deep Reinforcement Learning for Dialogue Generation

  244. ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning

  245. Learning from the memory of Atari 2600

  246. Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning

  247. Asynchronous Methods for Deep Reinforcement Learning

  248. Dueling Network Architectures for Deep Reinforcement Learning

  249. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

  250. Prioritized Experience Replay

  251. Deep Reinforcement Learning with Double Q-learning

  252. Gorila: Massively Parallel Methods for Deep Reinforcement Learning

  253. Reinforcement Learning Neural Turing Machines—Revised

  254. An Invitation to Imitation

  255. TRPO: Trust Region Policy Optimization

  256. DRAW: A Recurrent Neural Network For Image Generation

  257. Random feedback weights support learning in deep neural networks

  258. Learning to Execute

  259. Does temporal discounting explain unhealthy behavior? A systematic review and reinforcement learning perspective

  260. Playing Atari with Deep Reinforcement Learning

  261. The Arcade Learning Environment: An Evaluation Platform for General Agents

  262. Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting

  263. Off-Policy Actor-Critic

  264. Neural mechanisms of speed-accuracy tradeoff

  265. DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

  266. Compositional pattern producing networks: A novel abstraction of development

  267. Midbrain dopamine neurons encode a quantitative reward prediction error signal

  268. It Takes Two Neurons To Ride a Bicycle

  269. Recent Developments in the Evolution of Morphologies and Controllers for Physically Simulated Creatures § A Re-implementation of Sims’ Work Using the MathEngine Physics Engine

  270. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping

  271. 6.6 Actor-Critic Methods

  272. Descriptor predictive control: Tracking controllers for a riderless bicycle

  273. Control for an autonomous bicycle

  274. Simple statistical gradient-following algorithms for connectionist reinforcement learning

  275. Proceedings of the First International Conference on Genetic Algorithms and Their Applications

  276. Temporal Credit Assignment In Reinforcement Learning

  277. Experiments on the Mechanization of Game-Learning Part II. Rule-Based Learning and the Human Window [BOXES]

  278. Why the Law of Effect Will Not Go Away

  279. Experiments on the Mechanization of Game-Learning Part I. Characterization of the Model and Its Parameters [MENACE]

  280. A Matchbox Game-Learning Machine

  281. Some Studies in Machine Learning Using the Game of Checkers

  282. John Schulman’s Homepage

  283. LA Residents Complain about ‘Waze Craze’

  285. Sutton & Barto Book: Reinforcement Learning: An Introduction

  286. Evolving Stable Strategies

  287. Finding Nash Equilibria through Simulation

  288. Trackmania I—The History of Machine Learning in Trackmania

  290. The 37 Implementation Details of Proximal Policy Optimization

  291. Microsoft and Meta Join Google in Using AI to Help Run Their Data Centers

  293. Hedonic Loops and Taming RL

  294. Sony’s Racing Car AI Just Destroyed Its Human Competitors—By Being Nice (and Fast)

  296. Target-Driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning [Video]

  297. Measuring the Intrinsic Dimension of Objective Landscapes [Video]

  298. Zyme—An Evolvable Language

  300. 2021-hessel-figure4-anakinsebulbatpupodperformance.jpg

  301. 2001-cook-figure2-chaoticdynamicsofunsteeredvirtualbicycleover800runs.png

  303. http://incompleteideas.net/book/ebook/node45.html

  305. http://vision.psych.umn.edu/groups/schraterlab/dearden98bayesian.pdf

  306. https://blog.ought.com/dalca-4d47a90edd92

  308. https://github.com/Rolv-Arild/Necto

  309. https://github.com/bertdobbelaere/SorterHunter

  310. https://github.com/curiousjp/toy_sd_genetics?tab=readme-ov-file#toy_sd_genetics

  311. https://github.com/deepmind/acme/tree/master/acme/agents/tf/dmpo

  312. https://journals.sagepub.com/doi/10.1177/17456916231204811

  313. https://plato.stanford.edu/entries/selection-units/

  314. https://research.google/blog/quantization-for-fast-and-environmentally-sustainable-reinforcement-learning/

  315. https://spectrum.ieee.org/disney-robot

  316. https://web.archive.org/web/20140918110745/http://friggeri.net/blog/a-genetic-approach-to-css-compression/

  317. https://www.currentobituary.com/member/obit/282438

  318. https://www.everything2.net/index.pl?node_id=1190642

  319. https://www.lesswrong.com/posts/DKtWikjcdApRj3rWr/paper-understanding-and-controlling-a-maze-solving-policy

  320. https://www.lesswrong.com/posts/S54HKhxQyttNLATKu/deconfusing-direct-vs-amortised-optimization

  321. https://www.nature.com/articles/s41586-023-06419-4

  322. https://www.newyorker.com/magazine/1981/12/14/a-i

  323. https://www.quantamagazine.org/memories-help-brains-recognize-new-events-worth-remembering-20230517/

  324. https://www.reddit.com/r/MachineLearning/comments/18eh2hb/p_the_power_of_reinforcement_learning_look_how/

  326. https://www.reddit.com/r/MachineLearning/comments/1anv7n4/p_ai_learns_pvp_in_old_school_runescape/

  328. https://www.youtube.com/watch?v=DcYLT37ImBY
