“Learning Few-Shot Imitation As Cultural Transmission”, Avishkar Bhoopchand, Bethanie Brownfield, Adrian Collister, Agustin Dal Lago, Ashley Edwards, Richard Everett, Alexandre Fréchette, Yanko Gitahy Oliveira, Edward Hughes, Kory W. Mathewson, Piermaria Mendolicchio, Julia Pawar, Miruna Pȋslar, Alex Platonov, Evan Senter, Sukhdeep Singh, Alexander Zacherl, Lei M. Zhang, 2023-11-28:

[Twitter; video #31; blog] Cultural transmission is the domain-general social skill that allows agents to acquire and use information from each other in real-time with high fidelity and recall. It can be thought of as the process that perpetuates fit variants in cultural evolution. In humans, cultural evolution has led to the accumulation and refinement of skills, tools and knowledge across generations.

We provide a method for generating cultural transmission in artificially intelligent agents, in the form of few-shot imitation.

Our agents succeed [in a rich Unity-based 3D physical simulation, GoalCycle3D] at real-time imitation of a human in novel contexts without using any pre-collected human data. We identify a surprisingly simple set of ingredients sufficient for generating cultural transmission and develop an evaluation methodology for rigorously assessing it.

This paves the way for cultural evolution to play an algorithmic role in the development of artificial general intelligence.

[Preprint published as “Learning Robust Real-Time Cultural Transmission without Human Data”, Cultural General Intelligence Team et al 2022.]

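…Via careful ablations, we identify a minimal sufficient “starter kit” of training ingredients required for cultural transmission to emerge in GoalCycle3D, namely memory (M), the presence of an expert co-player (E), expert dropout (D), an attentional bias toward the expert (AL), and automatic domain randomization (ADR). We refer to this collection by the acronym MEDAL-ADR.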

These components are ablated in turn in § The role of memory, expert demonstrations and attention loss to § ADR for cultural transmission in complex worlds: only when all of them are acting in concert does robust cultural transmission arise in complex worlds.
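ADR (automatic domain randomization) is a curriculum technique introduced in OpenAI et al 2019’s Rubik’s-cube robot-hand work: each randomized environment parameter has a range, and the range is widened when the agent performs well at its boundary and narrowed when it performs poorly. The following is a minimal, generic sketch of that boundary-update rule, assuming scores normalized to [0, 1]; it is not the GoalCycle3D implementation, and the class names, parameter names (`world_size`), thresholds, and step sizes are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ADRParam:
    """One randomized environment parameter (hypothetical, e.g. world size)."""
    low: float
    high: float
    step: float        # how far a boundary moves per update
    min_low: float     # hard limits the boundaries may never cross
    max_high: float
    buffer_size: int = 10
    low_scores: list = field(default_factory=list)
    high_scores: list = field(default_factory=list)


class ADR:
    def __init__(self, params, t_low=0.2, t_high=0.8, boundary_prob=0.5, seed=0):
        self.params = params              # name -> ADRParam
        self.t_low = t_low                # below this mean score: shrink range
        self.t_high = t_high              # above this mean score: widen range
        self.boundary_prob = boundary_prob
        self.rng = random.Random(seed)

    def sample(self):
        """Sample an episode config; sometimes pin one parameter to a boundary."""
        config = {n: self.rng.uniform(p.low, p.high) for n, p in self.params.items()}
        probe = None
        if self.rng.random() < self.boundary_prob:
            name = self.rng.choice(list(self.params))
            side = self.rng.choice(["low", "high"])
            config[name] = getattr(self.params[name], side)
            probe = (name, side)
        return config, probe

    def update(self, probe, score):
        """Record the score of a boundary episode and move that boundary if due."""
        if probe is None:
            return  # only boundary episodes drive the curriculum
        name, side = probe
        p = self.params[name]
        buf = p.low_scores if side == "low" else p.high_scores
        buf.append(score)
        if len(buf) < p.buffer_size:
            return
        mean = sum(buf) / len(buf)
        buf.clear()
        if side == "high":
            if mean >= self.t_high:
                p.high = min(p.high + p.step, p.max_high)  # widen upward
            elif mean <= self.t_low:
                p.high = max(p.high - p.step, p.low)       # narrow
        else:
            if mean >= self.t_high:
                p.low = max(p.low - p.step, p.min_low)     # widen downward
            elif mean <= self.t_low:
                p.low = min(p.low + p.step, p.high)        # narrow


if __name__ == "__main__":
    adr = ADR({"world_size": ADRParam(low=10, high=10, step=1, min_low=10, max_high=40)})
    for _ in range(1000):
        config, probe = adr.sample()
        adr.update(probe, 1.0)  # stand-in for a perfectly performing agent
    print(adr.params["world_size"].low, adr.params["world_size"].high)
```

Only boundary-pinned episodes feed the buffers, so the curriculum reacts to performance at the edge of the current distribution rather than to its easy interior.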

…We find that cultural transmission generalizes outside the training distribution, and that agents recall demonstrations within a single episode long after the expert has departed. Introspecting the agent’s “brain”, we find strikingly interpretable neurons responsible for encoding social information and goal states.

Agents recall expert demonstrations with high fidelity: To demonstrate the recall capabilities of our best-performing agent, we quantify its performance across a set of tasks where the expert drops out. The intuition here is that if our agent is able to recall information well, then its score will remain high for many timesteps even after the expert has dropped out. However, if the agent is simply following the expert or has poor recall, then its score will instead drop almost immediately to near zero.
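The intuition above can be made concrete with a simple retention ratio: the agent’s reward rate in a window after the expert leaves, divided by its rate in an equal window before. A minimal sketch, assuming per-timestep rewards arrive as discrete +1 events for correct goal visits; `recall_retention` and `cyclic_rewards` are hypothetical helpers for illustration, not the paper’s evaluation metric.

```python
from typing import List, Optional, Sequence


def reward_rate(rewards: Sequence[float], start: int, end: int) -> float:
    """Mean reward per timestep over rewards[start:end] (0.0 if the span is empty)."""
    span = rewards[start:end]
    return sum(span) / len(span) if span else 0.0


def recall_retention(rewards: Sequence[float], t_drop: int, window: int) -> float:
    """Post-dropout reward rate as a fraction of the pre-dropout rate.

    ~1.0: the agent keeps scoring after the expert leaves (good recall).
    ~0.0: the agent's score collapses once the expert is gone (pure following).
    """
    before = reward_rate(rewards, max(0, t_drop - window), t_drop)
    after = reward_rate(rewards, t_drop, t_drop + window)
    return after / before if before > 0 else 0.0


def cyclic_rewards(n_steps: int, period: int, stop_at: Optional[int] = None) -> List[float]:
    """Synthetic trace: +1 every `period` steps, optionally stopping at `stop_at`."""
    return [
        1.0 if t % period == period - 1 and (stop_at is None or t < stop_at) else 0.0
        for t in range(n_steps)
    ]


if __name__ == "__main__":
    recaller = cyclic_rewards(200, period=5)               # keeps cycling goals
    follower = cyclic_rewards(200, period=5, stop_at=100)  # stops when expert leaves
    print(recall_retention(recaller, t_drop=100, window=50))
    print(recall_retention(follower, t_drop=100, window=50))
```

With the expert leaving at step 100, the synthetic “recaller” retains its full reward rate and the “follower” retains none, mirroring the two outcomes described in the paragraph.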

To our knowledge, within-episode recall of a third-person demonstration has not previously been shown to arise from reinforcement learning. This is an important discovery, since the recent history of AI research has demonstrated the increased flexibility and generality of learned behaviors over pre-programmed ones. What’s more, third-person recall within an episode amortizes imitation onto a timescale of seconds and does not require perspective matching between co-players. As such, we achieve the fast adaptation benefits of previous first-person few-shot imitation works (eg. 22, 43, 44) but as a general-purpose emergent property from third-person RL rather than via a special-purpose first-person imitation algorithm.