“Learning Few-Shot Imitation As Cultural Transmission”, 2023-11-28:
Cultural transmission is the domain-general social skill that allows agents to acquire and use information from each other in real time with high fidelity and recall. It can be thought of as the process that perpetuates fit variants in cultural evolution. In humans, cultural evolution has led to the accumulation and refinement of skills, tools and knowledge across generations.
We provide a method for generating cultural transmission in artificially intelligent agents, in the form of few-shot imitation.
Our agents succeed [in a rich Unity-based 3D physical simulation, GoalCycle3D] at real-time imitation of a human in novel contexts without using any pre-collected human data. We identify a surprisingly simple set of ingredients sufficient for generating cultural transmission and develop an evaluation methodology for rigorously assessing it.
This paves the way for cultural evolution to play an algorithmic role in the development of artificial general intelligence.
[Preprint published as “Learning Robust Real-Time Cultural Transmission without Human Data”, Cultural General Intelligence et al 2022.]
…Via careful ablations, we identify a minimal sufficient “starter kit” of training ingredients required for cultural transmission to emerge in GoalCycle3D, namely:
function approximation,
memory (M),
the presence of an expert co-player (E),
expert dropout (D),
attentional bias towards the expert (AL), and
automatic domain randomization (ADR).
We refer to this collection by the acronym MEDAL-ADR:
Memory is implemented as an LSTM network in the agent architecture.
Our expert co-players are hard-coded bots, and are dropped in and out probabilistically during training episodes.
This probabilistic dropout provides the right experience for agents to learn to observe what a useful demonstrator is doing and then remember and reproduce it when the demonstrator is absent.
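The source does not specify the dropout schedule. As a minimal sketch, assuming the expert's presence follows a two-state Markov chain with hypothetical per-step probabilities `p_drop_out` and `p_drop_in` (neither name nor mechanism comes from the paper), a presence mask for one training episode could be generated like this:

```python
import random
from typing import List


def expert_presence_mask(num_steps: int, p_drop_out: float, p_drop_in: float,
                         seed: int = 0) -> List[bool]:
    """Hypothetical per-timestep presence mask for an expert co-player.

    Modelled (as an assumption) as a two-state Markov chain: a present expert
    leaves with probability p_drop_out each step, and an absent expert returns
    with probability p_drop_in each step.
    """
    rng = random.Random(seed)
    present = True  # assume the expert starts every episode in the world
    mask: List[bool] = []
    for _ in range(num_steps):
        mask.append(present)
        if present and rng.random() < p_drop_out:
            present = False
        elif not present and rng.random() < p_drop_in:
            present = True
    return mask


if __name__ == "__main__":
    mask = expert_presence_mask(200, p_drop_out=0.02, p_drop_in=0.01, seed=1)
    print("fraction of steps with expert present:", sum(mask) / len(mask))
```

With `p_drop_out=0` the expert never leaves; with `p_drop_out=1` and `p_drop_in=0` it is visible only on the first step. Intermediate values yield episodes that mix observation periods with solo periods, which is the kind of experience the text says agents need in order to learn to remember and reproduce a demonstration.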
Attentional bias towards the expert is learned via an auxiliary loss to predict the position of the co-player.
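A minimal sketch of how such an auxiliary loss might be combined with the RL objective, assuming a squared-error target on the co-player's 3D position, a hypothetical weight `aux_weight`, and (also an assumption) that the loss is only applied on timesteps where the expert is present. The paper's exact formulation may differ:

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def attention_loss(pred_positions: Sequence[Vec3],
                   true_positions: Sequence[Vec3],
                   expert_present: Sequence[bool]) -> float:
    """Squared Euclidean error between predicted and true co-player positions,
    averaged over the timesteps where the expert is present (assumption)."""
    total, count = 0.0, 0
    for pred, true, present in zip(pred_positions, true_positions, expert_present):
        if not present:
            continue  # no position target when the expert has dropped out
        total += sum((p - t) ** 2 for p, t in zip(pred, true))
        count += 1
    return total / count if count else 0.0


def total_loss(rl_loss: float, aux_loss: float, aux_weight: float = 1.0) -> float:
    """Add the weighted auxiliary prediction loss to the RL loss."""
    return rl_loss + aux_weight * aux_loss
```

Because the gradient of the prediction loss flows through the agent's shared representation, it pushes that representation to track where the expert is, which is one plausible reading of "attentional bias towards the expert".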
ADR gradually expands the distribution of tasks on which an agent trains, while maintaining a high cultural transmission capability.
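ADR is only described at a high level here. The sketch below follows the generic boundary-update scheme of automatic domain randomization as introduced in OpenAI's Rubik's-cube work, applied to a single hypothetical environment parameter. The thresholds, step size, buffer length, and the use of a cultural transmission score as the boundary signal are all assumptions, and only the upper boundary is handled for brevity:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ADRParam:
    """One randomized environment parameter with an adaptive range [low, high].

    Illustrative values only; not the paper's configuration.
    """
    low: float
    high: float
    step: float
    t_low: float = 0.3        # shrink the range if the boundary score falls to this
    t_high: float = 0.7       # expand the range if the boundary score reaches this
    buffer_size: int = 4      # scores collected before each range update
    hard_max: float = float("inf")
    _upper_scores: List[float] = field(default_factory=list, repr=False)

    def record_upper_boundary(self, score: float) -> None:
        """Record a score measured with the parameter set to `high`, and update
        the range once enough boundary scores have been collected."""
        self._upper_scores.append(score)
        if len(self._upper_scores) < self.buffer_size:
            return
        mean = sum(self._upper_scores) / len(self._upper_scores)
        self._upper_scores.clear()
        if mean >= self.t_high:
            self.high = min(self.high + self.step, self.hard_max)  # widen
        elif mean <= self.t_low:
            self.high = max(self.high - self.step, self.low)       # narrow
```

A hypothetical parameter such as the number of goals in a world would start in a narrow range and widen only while agents keep scoring well at its edge, so the task distribution grows at the pace the agent can handle.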
These components are ablated in turn in the sections from § “The role of memory, expert demonstrations and attention loss” to § “ADR for cultural transmission in complex worlds”: only when all of them act in concert does robust cultural transmission arise in complex worlds.
…We find that cultural transmission generalizes outside the training distribution, and that agents recall demonstrations within a single episode long after the expert has departed. Introspecting the agent’s “brain”, we find strikingly interpretable neurons responsible for encoding social information and goal states.
…Agents recall expert demonstrations with high fidelity: To demonstrate the recall capabilities of our best-performing agent, we quantify its performance across a set of tasks where the expert drops out. The intuition here is that if our agent is able to recall information well, then its score will remain high for many timesteps even after the expert has dropped out. However, if the agent is simply following the expert or has poor recall, then its score will instead drop almost immediately to near zero.
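This intuition can be illustrated with a toy, non-3D stand-in for GoalCycle3D. In this hypothetical setup an episode is a sequence of goal visits that must follow a fixed cyclic order; the expert is visible for the first `expert_steps` steps and then drops out; and rewards of +1 for the correct next goal, -1 for a wrong goal, and 0 for doing nothing are assumptions chosen for illustration:

```python
from typing import List, Optional


def run_episode(cycle: List[int], expert_steps: int, total_steps: int,
                has_memory: bool) -> List[int]:
    """Return the agent's per-step reward in a toy goal-cycle episode."""
    memory: List[int] = []  # remembered route (unique goals in visit order)
    rewards: List[int] = []
    for t in range(total_steps):
        target = cycle[t % len(cycle)]  # the correct next goal at step t
        choice: Optional[int]
        if t < expert_steps:
            choice = target  # copy the visible expert
            if has_memory and target not in memory:
                memory.append(target)
        elif has_memory and memory:
            choice = memory[t % len(memory)]  # replay the remembered route
        else:
            choice = None  # a pure follower has nothing to go on
        rewards.append(0 if choice is None else (1 if choice == target else -1))
    return rewards


if __name__ == "__main__":
    cycle = [2, 0, 1]
    for label, mem in (("recall", True), ("follower", False)):
        rewards = run_episode(cycle, expert_steps=6, total_steps=12, has_memory=mem)
        print(label, "score after dropout:", sum(rewards[6:]))
```

In this run the memory-equipped agent keeps scoring after the expert leaves (6 over the last six steps), while the pure follower's post-dropout score is 0, mirroring the two outcomes described above. If the expert leaves before completing one full cycle, the remembered route is incomplete and replay earns negative reward, a crude analogue of poor recall.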
To our knowledge, within-episode recall of a third-person demonstration has not previously been shown to arise from reinforcement learning. This is an important discovery, since the recent history of AI research has demonstrated the increased flexibility and generality of learned behaviors over pre-programmed ones. What’s more, third-person recall within an episode amortizes imitation onto a timescale of seconds and does not require perspective matching between co-players. As such, we achieve the fast adaptation benefits of previous first-person few-shot imitation works (eg. 22, 43, 44) but as a general-purpose emergent property from third-person RL rather than via a special-purpose first-person imitation algorithm.