“Hidden Incentives for Auto-Induced Distributional Shift”, David Krueger, Tegan Maharaj & Jan Leike (2020-09-19):

Decisions made by machine learning systems have increasing influence on the world, yet it is common for machine learning algorithms to assume that no such influence exists. An example is the use of the i.i.d. assumption in content recommendation. In fact, the (choice of) content displayed can change users’ perceptions and preferences, or even drive them away, causing a shift in the distribution of users. We introduce the term auto-induced distributional shift (ADS) to describe the phenomenon of an algorithm causing a change in the distribution of its own inputs.
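A toy sketch of ADS (my illustration, not the paper's code; all parameters and the drift rule are assumptions): a recommender serves one of two content types, and each recommendation slightly shifts the user population toward users who prefer what was shown, violating the i.i.d. assumption.

```python
import random

def simulate(steps=1000, influence=0.05, seed=0):
    """Toy illustration of auto-induced distributional shift (ADS).
    A recommender repeatedly shows one of two content types; each
    recommendation nudges the user distribution toward users who
    prefer the shown type (drift rule & parameters are hypothetical)."""
    rng = random.Random(seed)
    p_type_a = 0.5  # fraction of users preferring content type A
    for _ in range(steps):
        user_prefers_a = rng.random() < p_type_a
        shown_a = user_prefers_a  # naive recommender matches the current user
        # Under the i.i.d. assumption p_type_a would stay fixed; here the
        # shown content feeds back into the user distribution (ADS), and
        # the feedback is self-reinforcing.
        p_type_a += influence * (1 if shown_a else -1) * p_type_a * (1 - p_type_a)
    return p_type_a

final = simulate()
```

Because the drift is self-reinforcing, the user base tends to polarize toward one content type even though the recommender never "intends" this, which is exactly the effect the i.i.d. assumption hides.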

Our goal is to ensure that machine learning systems do not leverage ADS to increase performance when doing so could be undesirable. We demonstrate that changes to the learning algorithm, such as the introduction of meta-learning, can cause hidden incentives for auto-induced distributional shift (HI-ADS) to be revealed.

To address this issue, we introduce ‘unit tests’ and a mitigation strategy for HI-ADS, as well as a toy environment for modeling real-world issues with HI-ADS in content recommendation, where we demonstrate that strong meta-learners achieve gains in performance via ADS.

We show that meta-learning and Q-learning both sometimes fail the unit tests, but pass when using our mitigation strategy.

…In both the RL and SL unit tests, we find that ‘vanilla’ learning algorithms (e.g. minibatch SGD) pass the test, but introducing an outer loop of meta-learning (e.g. Population-Based Training (PBT) (Jaderberg et al. 2017)) can lead to high levels of failure. We find results consistent with our unit tests in the content recommendation environment: recommenders trained with PBT create earlier, faster, and larger drift in user interests, and for the same level of performance, create larger changes in the user base. These results suggest that failure of our unit tests indicates that an algorithm is prone to revealing HI-ADS in other settings.
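Why an outer loop of meta-learning can reveal HI-ADS: a PBT-style loop selects on performance measured across a learner's whole lifetime, so learners whose past actions shifted the input distribution in their own favor score higher and get copied. A minimal sketch of such an outer loop (simplified from Jaderberg et al. 2017; the function names and the omitted explore/perturb step are my assumptions):

```python
def pbt_outer_loop(population, train_step, score, generations=10):
    """Sketch of a Population-Based-Training-style outer loop.
    Selection on across-lifetime score is the channel that can reveal
    HI-ADS: a learner credited for a favorably shifted distribution
    outcompetes one that left the distribution alone."""
    for _ in range(generations):
        population = [train_step(p) for p in population]  # inner loop
        scores = [score(p) for p in population]
        best = population[scores.index(max(scores))]
        # exploit: worst member copies the best (explore/perturb step omitted)
        population[scores.index(min(scores))] = best
    return population

# usage with trivial stand-ins: "training" increments, "score" is identity
pop = pbt_outer_loop([0, 5], lambda x: x + 1, lambda x: x)
```

Vanilla SGD has no such selection across lifetimes, which is consistent with it passing the unit tests while PBT fails them.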

Finally, we propose and test a mitigation strategy we call context swapping: rotating learners through different environments throughout learning, so that no learner observes the longer-horizon results or correlations of its own actions in any one environment. This effectively mitigates HI-ADS in our unit-test environments, but did not work well in the content recommendation experiments. [“Capabilities generalize further than alignment”—content recommendation is much richer than their toy problem, so it may be harder to avoid learning general convergent strategies of empowerment & manipulation?]
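The rotation at the heart of context swapping can be sketched as a simple schedule (the schedule format and function name are my assumptions, not the paper's implementation): each step, every learner moves to the next environment, so none stays long enough to be credited for the distributional shift its own actions caused there.

```python
def context_swap_schedule(n_learners, n_steps):
    """Sketch of a context-swapping rotation: schedule[t][env] gives the
    index of the learner acting in that environment at step t. Each step
    every environment is handed to the next learner in cyclic order, so
    a learner never reaps the delayed effects of its own past actions."""
    return [[(env + t) % n_learners for env in range(n_learners)]
            for t in range(n_steps)]

sched = context_swap_schedule(3, 6)  # 3 learners, 3 environments, 6 steps
```

Every row is a permutation of the learners, so all environments stay occupied while credit for slow, self-induced drift is spread across the population rather than captured by its cause.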