“Value-Free Random Exploration Is Linked to Impulsivity”, 2022-08-04:
Deciding whether to forgo a good choice in favour of exploring a potentially more rewarding alternative is one of the most challenging arbitrations both in human reasoning and in artificial intelligence. Humans show substantial variability in their exploration, and theoretical (but only limited empirical) work has suggested that excessive exploration is a critical mechanism underlying the psychiatric dimension of impulsivity [ADHD symptoms measured by ASRS; and BIS, LSAS, STAI, IUS, OCIR, SDS, CFS, AQ10].
In this registered report, we put these theories to the test using large online samples, dimensional analyses, and computational modeling. Capitalising on recent advances in disentangling distinct human exploration strategies, we not only demonstrate that impulsivity is associated with a specific form of exploration (value-free random exploration) but also explore links between exploration and other psychiatric dimensions.
…In our current data, we confirmed that our participants used a mixture of resource-requiring complex strategies and computationally light heuristics. The resource-demanding strategies (such as Thompson sampling or UCB) require keeping track of expected means and uncertainties across the different choice options. The computationally lighter heuristic strategies, namely value-free random exploration (captured by ϵ-greedy) and novelty exploration (captured using a novelty bonus η), are less optimal but require substantially less computational power, making them very useful in practice.
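The way the two heuristics can sit on top of a complex strategy may be easier to see in code. The following is a minimal sketch, not the authors' fitted model: it assumes Gaussian beliefs about each option, adds the novelty bonus η to the sampled value of any option flagged as novel, and applies ϵ-greedy as a final step. The function name `choose_option` and all of its parameters are hypothetical, chosen for illustration only.

```python
import random


def choose_option(means, sds, is_novel, eta, epsilon, rng):
    """Pick an option index (illustrative sketch; names are hypothetical).

    means, sds : belief mean and standard deviation per option (assumed Gaussian)
    is_novel   : True for options the agent has not encountered before
    eta        : novelty bonus added to novel options
    epsilon    : probability of a value-free random choice
    rng        : a random.Random instance
    """
    n = len(means)

    # Value-free random exploration (epsilon-greedy): with probability
    # epsilon, ignore all values and choose uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(n)

    # Thompson sampling: draw one sample per option from its belief
    # distribution, so more uncertain options are sampled more widely.
    samples = [rng.gauss(m, s) for m, s in zip(means, sds)]

    # Novelty exploration: add the bonus eta to options flagged as novel.
    scores = [x + (eta if novel else 0.0) for x, novel in zip(samples, is_novel)]

    # Choose the option with the highest resulting score.
    return max(range(n), key=scores.__getitem__)
```

Only the Thompson step needs the means and uncertainties; the ϵ branch uses no value information at all, which is what makes it "value-free" and computationally cheap.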
Using model comparison as well as model simulations, we were able to demonstrate the presence of both complex and heuristic exploration strategies. The winning model, combining complex Thompson sampling with novelty (η) and value-free random (ϵ) exploration, was not entirely distinguishable from the second-best model, combining complex UCB with novelty and value-free random exploration, but was well distinguishable from other models (cf. confusion matrix, Supplementary Figure 6b) with relatively high confidence regarding its generative origins (cf. inversion matrix, Supplementary Figure 6c). This suggests that the two complex exploration strategies make similar predictions in our task, preventing us from disentangling them properly. However, we capture similar amounts of value-free random exploration, irrespective of the complex model used, demonstrating the robustness of our result.
Our results therefore show that participants supplemented complex strategies (UCB or Thompson sampling) with two heuristic strategies. Given that we find an association between value-free random exploration and impulsivity irrespective of the complex model used, the ambiguity between the two complex strategies does not affect the conclusions of this study.
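The relation between a confusion matrix and an inversion matrix can be illustrated with a short sketch. It assumes the common convention that the confusion matrix gives P(best-fitting model | simulated model) in rows, and that the inversion matrix gives P(simulated model | best-fitting model) under a uniform prior over generating models, which reduces to normalising each column. The function name `inversion_matrix` and the numbers in the usage example are hypothetical and are not taken from the study.

```python
def inversion_matrix(confusion):
    """Turn P(fit | generated) into P(generated | fit), assuming a uniform prior.

    confusion[i][j] is the proportion of datasets simulated from model i
    that were best fit by model j (each row sums to 1).
    """
    n_gen = len(confusion)
    n_fit = len(confusion[0])

    # Total mass of each "best-fitting model" column across generating models.
    col_sums = [sum(confusion[i][j] for i in range(n_gen)) for j in range(n_fit)]

    # Bayes' rule with a uniform prior: normalise every column to sum to 1.
    return [
        [confusion[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0
         for j in range(n_fit)]
        for i in range(n_gen)
    ]


# Hypothetical example: models 0 and 1 are partly confused with each other,
# while model 2 is recovered well.
example_confusion = [
    [0.5, 0.5, 0.0],
    [0.3, 0.6, 0.1],
    [0.0, 0.1, 0.9],
]
example_inversion = inversion_matrix(example_confusion)
```

In such a matrix, two models that mimic each other show sizeable off-diagonal entries between them, while a well-recovered model has most of its column mass on the diagonal.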
…Overall, our findings suggest at least two roles for exploration in impulsivity: a more flexible way of exploring that does not rely on (potentially wrong) prior knowledge, and a way to circumvent mental effort. Importantly, value-free random exploration is used by all participants in a goal-directed manner (i.e., they used it more when exploration was beneficial). This means that participants adapt their usage of value-free random exploration to the demands of the task.