“Classical Planning Algorithms on the Atari Video Games”, Nir Lipovetzky, Miquel Ramirez, Hector Geffner (2015-07):

The Atari 2600 games supported in the Arcade Learning Environment [Bellemare et al. 2013] all feature a known initial (RAM) state and actions that have deterministic effects. Classical planners, however, cannot be used off-the-shelf, as there is no compact PDDL model of the games, and action effects and goals are not known a priori. Indeed, there are no explicit goals, and the planner must select actions on-line while interacting with a simulator that returns successor states and rewards. None of this precludes the use of blind lookahead algorithms for action selection, like breadth-first search or Dijkstra’s method, yet such methods are not effective over large state spaces.

We thus turn to a different class of recently introduced classical planning algorithms that perform a structured exploration of the state space: like breadth-first search and Dijkstra’s algorithm they are “blind”, requiring no prior knowledge of state transitions, costs (rewards), or goals, and yet, like heuristic search algorithms, they have been shown to be effective for solving problems over huge state spaces.

The simplest such algorithm, called Iterated Width (IW), consists of a sequence of calls IW(1), IW(2), …, IW(k), where IW(i) is a breadth-first search in which a state is pruned unless it is the first state in the search to make true some subset of i atoms (i.e., its novelty exceeds i).
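The pruning rule for the i = 1 case can be sketched as a breadth-first search that keeps a generated state only if it makes some atom true for the first time. This is a minimal illustration, not the paper's implementation: the `atoms` and `successors` callbacks are assumptions of this sketch (in ALE the paper derives atoms from the 128 bytes of the emulator's RAM state), and no reward bookkeeping is shown.

```python
from collections import deque

def iw1(root, successors, atoms):
    """IW(1): breadth-first search that prunes any generated state which
    is not the first in the search to make some atom true.

    `atoms(s)` returns the set of boolean atoms true in state s;
    `successors(s)` returns the child states of s. Both interfaces are
    assumptions of this sketch, not part of the paper's ALE setup.
    Returns the states expanded, in breadth-first order."""
    seen_atoms = set(atoms(root))   # atoms already made true by some state
    queue = deque([root])
    expanded = []
    while queue:
        s = queue.popleft()
        expanded.append(s)
        for t in successors(s):
            novel = [a for a in atoms(t) if a not in seen_atoms]
            if not novel:
                continue            # novelty > 1 under IW(1): prune
            seen_atoms.update(novel)
            queue.append(t)
    return expanded
```

With only n distinct atoms, IW(1) expands at most n states regardless of the size of the state space, which is what makes this kind of blind search tractable where plain breadth-first search is not.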

The empirical results over 54 ALE games suggest that the performance of IW with the k parameter fixed to 1, i.e. IW(1), is at the level of the state-of-the-art represented by UCT.

A simple best-first variation of IW that combines exploration and exploitation proves to be very competitive as well.