Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
Interpretable Contrastive Monte Carlo Tree Search Reasoning
OpenAI co-founder Sutskever’s new safety-focused AI startup SSI raises $1 billion
The brain simulates actions and their consequences during REM sleep
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
DT-VIN: Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning
MCTSr: Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMA-3-8B
Safety Alignment Should Be Made More Than Just a Few Tokens Deep
OmegaPRM: Improve Mathematical Reasoning in Language Models by Automated Process Supervision
DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
From r to Q✱: Your Language Model is Secretly a Q-Function
Identifying general reaction conditions by bandit optimization
ReCoRe: Regularized Contrastive Representation Learning of World Model
Self-Supervised Behavior Cloned Transformers are Path Crawlers for Text Games
Why Won’t OpenAI Say What the Q✱ Algorithm Is? Supposed AI breakthroughs are frequently veiled in secrecy, hindering scientific consensus
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
The neural basis of mental navigation in rats: A brain–machine interface demonstrates volitional control of hippocampal activity
Volitional activation of remote place representations with a hippocampal brain–machine interface
Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
Predictive auxiliary objectives in deep RL mimic learning in the brain
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Comparative study of model-based and model-free reinforcement learning control performance in HVAC systems
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching
Improving Long-Horizon Imitation Through Instruction Prediction
When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming (CDHF)
Long-Term Value of Exploration: Measurements, Findings and Algorithms
Emergence of belief-like representations through reinforcement learning
MimicPlay: Long-Horizon Imitation Learning by Watching Human Play
Graph schemas as abstractions for transfer learning, inference, and planning
John Carmack’s ‘Different Path’ to Artificial General Intelligence
Merging enzymatic and synthetic chemistry with computational synthesis planning
PALMER: Perception-Action Loop with Memory for Long-Horizon Planning
Space is a latent [CSCG] sequence: Structured sequence learning as a unified theory of representation in the hippocampus
CICERO: Human-level play in the game of Diplomacy by combining language models with strategic reasoning
Creating a Dynamic Quadrupedal Robotic Goalkeeper with Reinforcement Learning
Top-down design of protein nanomaterials with reinforcement learning
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective (ALM)
PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations
Learning with Combinatorial Optimization Layers: a Probabilistic Approach
Spatial representation by ramping activity of neurons in the retrohippocampal cortex
Inner Monologue: Embodied Reasoning through Planning with Language Models
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Housekeep: Tidying Virtual Households using Commonsense Reasoning
Semantic Exploration from Language Abstractions and Pretrained Representations
Demonstrate Once, Imitate Immediately (DOME): Learning Visual Servoing for One-Shot Imitation Learning
Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances
Reinforcement Learning with Action-Free Pre-Training from Videos
On-the-fly Strategy Adaptation for ad-hoc Agent Coordination
VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning
Learning Synthetic Environments and Reward Networks for Reinforcement Learning
How to build a cognitive map: insights from models of the hippocampal formation
LID: Pre-Trained Language Models for Interactive Decision-Making
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
What is the point of computers? A question for pure mathematicians
An Experimental Design Perspective on Model-Based Reinforcement Learning
Reinforcement Learning on Human Decision Models for Uniquely Collaborative AI Teammates
Learning Representations for Pixel-based Control: What Matters and Why?
Learning Behaviors through Physics-driven Latent Imagination
Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies
Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks
Dropout’s Dream Land: Generalization from Learned Simulators to Reality
Brax—A Differentiable Physics Engine for Large Scale Rigid Body Simulation
Planning for Novelty: Width-Based Algorithms for Common Problems in Control, Planning and Reinforcement Learning
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World
Waymo Simulated Driving Behavior in Reconstructed Fatal Crashes within an Autonomous Vehicle Operating Domain
Latent Imagination Facilitates Zero-Shot Transfer in Autonomous Racing
Replaying real life: how the Waymo Driver avoids fatal human crashes
Learning Chess Blindfolded: Evaluating Language Models on State Tracking
COMBO: Conservative Offline Model-Based Policy Optimization
A✱ Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks
Inductive Biases for Deep Learning of Higher-Level Cognition
Multimodal dynamics modeling for off-road autonomous vehicles
What are the Statistical Limits of Offline RL with Linear Function Approximation?
The Overfitted Brain: Dreams evolved to assist generalization
RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning
Mathematical Reasoning via Self-supervised Skip-tree Training
Learning to Simulate Dynamic Environments with GameGAN [homepage]
Learning to Fly via Deep Model-Based Reinforcement Learning
Introducing Dreamer: Scalable Reinforcement Learning Using World Models
Reinforcement Learning for Combinatorial Optimization: A Survey
Learning to Prove Theorems by Learning to Generate Theorems
Combining Q-Learning and Search with Amortized Value Estimates
Dream to Control: Learning Behaviors by Latent Imagination
Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions
Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
An Application of Reinforcement Learning to Aerobatic Helicopter Flight
When to Trust Your Model: Model-Based Policy Optimization (MBPO)
VISR: Fast Task Inference with Variational Intrinsic Successor Features
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search
The Alignment Problem for Bayesian History-Based Reinforcement Learners
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Mining gold from implicit models to improve likelihood-free inference
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review
Estimate and Replace: A Novel Approach to Integrating Deep Neural Networks with Existing Applications
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
Differentiable Dynamic Programming for Structured Prediction and Attention
How to Explore Chemical Space Using Algorithms and Automation
Planning Chemical Syntheses With Deep Neural Networks and Symbolic AI
Generalization Guides Human Exploration in Vast Decision Spaces
Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Learning Transferable Architectures for Scalable Image Recognition
Path Integral Networks: End-to-End Differentiable Optimal Control
Visual Semantic Planning using Deep Successor Representations
AIXIjs: A Software Demo for General Reinforcement Learning
DeepArchitect: Automatically Designing and Training Deep Architectures
Stochastic Constraint Programming as Reinforcement Learning
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
Learning to Win by Reading Manuals in a Monte-Carlo Framework
Whatever next? Predictive brains, situated agents, and the future of cognitive science
PUCT: Continuous Upper Confidence Trees with Polynomial Exploration-Consistency
Evolution And Episodic Memory: An Analysis And Demonstration Of A Social Function Of Episodic Recollection
Policy Mining: Learning Decision Policies from Fixed Sets of Data
The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions
Approximate Bayes Optimal Policy Search Using Neural Networks
Introducing ‘Computer Use’, a New Claude 3.5 Sonnet, and Claude 3.5 Haiku
Best-Of-n With Misaligned Reward Models for Math Reasoning
2021-scanlon-waymoaccidentavoidance-worldsimreconstruction-case_AZ1796255_2_surfels_noagent_cropped-2021-03-05_12_56_39.mp4
2010-silver-figure1-illustrationofpomcpmctssearchoverapomdp.png
http://www.alpha60.de/research/programming_enter/DavidLink_ProgrammingEnter_ComputerResurrection60_2012.pdf
https://github.com/KeeyanGhoreshi/PokemonFireredSingleSequence
https://if50.substack.com/p/christopher-strachey-and-the-dawn
https://journals.sagepub.com/doi/10.1177/17456916231204811
https://netflixtechblog.com/artwork-personalization-c589f074ad76
https://www.aboutwayfair.com/careers/tech-blog/contextual-bandit-for-marketing-treatment-optimization
https://www.bkgm.com/articles/Berliner/ComputerBackgammon/index.html
https://www.dwarkeshpatel.com/p/demis-hassabis#%C2%A7timestamps
https://www.freepatentsonline.com/y2024/0104353.html#deepmind
https://www.instacart.com/company/how-its-made/using-contextual-bandit-models-in-large-action-spaces-at-instacart/
https://www.lesswrong.com/posts/S54HKhxQyttNLATKu/deconfusing-direct-vs-amortised-optimization
https://www.lesswrong.com/posts/ZwshvqiqCvXPsZEct/the-learning-theoretic-agenda-status-2023
https://www.lesswrong.com/posts/nmxzr2zsjNtjaHh7x/actually-othello-gpt-has-a-linear-emergent-world
https://www.quantamagazine.org/electric-ripples-in-the-resting-brain-tag-memories-for-storage-20240521/
https://www.reuters.com/technology/artificial-intelligence/openai-co-founder-sutskevers-new-safety-focused-ai-startup-ssi-raises-1-billion-2024-09-04/
https://arxiv.org/abs/2406.08404#schmidhuber
/doc/reinforcement-learning/model/2024-wang-2.pdf
https://openreview.net/forum?id=psXVkKO9No#deepmind
/doc/reinforcement-learning/model/2023-gao.pdf
https://arxiv.org/abs/2306.04930#microsoft
https://arxiv.org/abs/2301.04104#deepmind
https://www.nature.com/articles/s41467-022-35422-y
/doc/reinforcement-learning/imperfect-information/diplomacy/2022-bakhtin.pdf
https://arxiv.org/abs/2207.05608#google
Jeff Clune—Professor—Computer Science—University of British Columbia
https://arxiv.org/abs/2206.11795#openai
https://arxiv.org/abs/2206.04114#google
https://arxiv.org/abs/2204.05080#deepmind
https://arxiv.org/abs/2204.01691#google
https://arxiv.org/abs/2106.13281#google
https://waymo.com/blog/2021/03/replaying-real-life/
https://research.google/blog/introducing-dreamer-scalable-reinforcement-learning-using-world-models/
/doc/reinforcement-learning/imitation-learning/2018-gudmundsson.pdf
/doc/reinforcement-learning/model/2018-everitt.pdf
/doc/reinforcement-learning/model/2010-silver.pdf
/doc/reinforcement-learning/model/2001-cazenave.pdf
Wikipedia Bibliography: