On GPT-3: meta-learning, scaling, implications, and deep theory. The scaling hypothesis: neural nets absorb data & compute, generalizing and becoming more Bayesian as problems get harder, manifesting new abilities even at trivial (by global standards) scale. The deep learning revolution has begun as foretold.
Powerful generative models like GPT-3 learn to imitate agents, and thus become agents themselves when prompted appropriately. This is an inevitable consequence of training on huge amounts of human-generated data, and it can be a safety problem.
Is human data (or its moral equivalent, such as data from DRL agents) necessary for this, and are other kinds of data, such as physics data, free of the problem? (If so, a safety strategy of filtering training data could reduce or eliminate hidden agency.)
I argue no: agency is not discrete or immaterial, but an ordinary continuum of capability, useful to a generative model in many contexts beyond those narrowly defined as ‘agents’, such as the “intentional stance” or variational approaches to solving physics problems (illustrated in the sketch below). Much like other DRL-elicited capabilities such as meta-learning, memory, exploration, or reasoning, ‘agency’ is a useful tool for a large family of problems, and a powerful model applied to that family may, at some point, develop concepts of agency or a theory of mind.
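To make the ‘variational’ point concrete: much of classical physics is natively posed as optimization, e.g. the principle of least action, where the true trajectory is the one extremizing an objective. A model learning to predict such data is therefore rewarded for internally representing something like goal-directed search. The following sketch is a hypothetical illustration (the free-fall setup, constants, and discretization are my own choices, not from this essay): it recovers a projectile’s path purely by minimizing a discretized action with a generic optimizer, never integrating the equations of motion.

```python
# Illustrative sketch (assumptions: uniform gravity, 1D vertical motion,
# fixed endpoints): recover a free-fall trajectory by *optimizing* the
# discretized classical action S = sum_i [0.5*m*v_i^2 - m*g*y_mid_i] * dt,
# i.e. treating a physics problem as goal-directed search, not simulation.

import numpy as np
from scipy.optimize import minimize

m, g = 1.0, 9.8          # mass (kg), gravitational acceleration (m/s^2)
T, n = 1.0, 100          # total time (s), number of segments
dt = T / n
y0, yT = 0.0, 0.0        # fixed endpoints: thrown up and caught at same height

def action(y_interior):
    """Discretized action: integral of (kinetic - potential) energy over time."""
    y = np.concatenate(([y0], y_interior, [yT]))
    v = np.diff(y) / dt                  # velocity on each segment
    y_mid = 0.5 * (y[:-1] + y[1:])       # midpoint height of each segment
    return np.sum((0.5 * m * v**2 - m * g * y_mid) * dt)

# Start from a straight line; the optimizer finds the stationary path.
result = minimize(action, np.zeros(n - 1), method="L-BFGS-B")

t = np.linspace(0, T, n + 1)[1:-1]       # interior grid times
exact = 0.5 * g * t * (T - t)            # analytic parabola for these endpoints
print("max error vs analytic parabola:", np.abs(result.x - exact).max())
```

The numerics are beside the point; what matters is that the ‘correct’ physics answer here is, by construction, the output of an optimization process, so even agent-free physics data embeds the agency-like pattern of searching toward an objective.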
Thus, a very wide range of problems, at scale, may surprisingly induce emergent agency.