“Optimal Ordered Problem Solver (OOPS)”, Jürgen Schmidhuber (2002-07-31):

We present a novel, general, optimally fast, incremental way of searching for a universal algorithm that solves each task in a sequence of tasks. The Optimal Ordered Problem Solver (OOPS) continually organizes and exploits previously found solutions to earlier tasks, efficiently searching not only the space of domain-specific algorithms, but also the space of search algorithms.

Essentially we extend the principles of optimal non-incremental universal search [Levin search] to build an incremental universal learner that is able to improve itself through experience.

In illustrative experiments, our self-improver becomes the first general system that learns to solve all n-disk Towers of Hanoi tasks (solution size 2^n − 1) for n up to 30, profiting from previously solved, simpler tasks involving samples of a simple context-free language.


In a quite pragmatic sense OOPS is the fastest general way of solving one task after another, always optimally exploiting solutions to earlier tasks when possible. It can be used for increasingly hard problems of optimization or prediction. Suppose there is only one task and a bias in the form of a probability distribution P on programs for a universal computer. In the i-th phase (i = 1, 2, 3, . . .) of asymptotically optimal non-incremental universal search (Levin 1973, Levin 1984, Hutter 2002a) we test all programs p with runtime ≤ 2^i P(p) until the task is solved. Now suppose there is a sequence of tasks, e.g., the n-th task is to find a shorter path through a maze than the best found so far. To reduce the search time for new tasks, previous incremental extensions of universal search tried to modify P through experience with earlier tasks, but in a heuristic, non-general, and suboptimal way prone to overfitting. OOPS, however, does it right.
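The phase structure described above can be sketched in a few lines. This is a minimal illustration of Levin-style universal search, not the paper's implementation: `run` is a hypothetical interpreter that tries a binary program on the task within a step budget, and the prior P(p) = 2^-len(p) is the standard uniform prefix prior.

```python
from itertools import count, product

def prior(program):
    """Uniform prefix prior over binary programs: P(p) = 2^-len(p)."""
    return 2.0 ** -len(program)

def levin_search(run, task, max_phase=30):
    """Phase i grants each candidate p a step budget proportional to 2^i * P(p)."""
    for i in count(1):
        if i > max_phase:
            return None
        # Only programs short enough to receive at least one step matter in
        # phase i; enumerating lengths up to i keeps the sketch finite.
        for length in range(1, i + 1):
            for bits in product("01", repeat=length):
                p = "".join(bits)
                budget = int(2 ** i * prior(p))
                if budget < 1:
                    continue
                result = run(p, task, budget)
                if result is not None:
                    return p, result
```

Note the characteristic property: a program is retried in every phase with a doubled budget, so a solution of probability P(p) needing t steps is found after total time on the order of t / P(p), which is what makes the method asymptotically optimal.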

Tested self-delimiting program prefixes (beginnings of code that may continue) are immediately executed while being generated. They grow by one instruction whenever they request this. The storage for the first found program computing a solution to the current task becomes non-writable. Programs tested during the search for solutions to later tasks may copy non-writable code into separate modifiable storage, to edit it and execute the modified result. Prefixes may also recompute the probability distribution on their suffixes in arbitrary computable ways. To solve the n-th task we sacrifice half the total search time for testing (via universal search) programs that have the most recent successful program as a prefix. The other half remains for testing fresh programs starting at the address right above the top non-writable address. When we are searching for a universal solver for all tasks in the sequence we have to time-share the second half (but not the first!) among all tasks 1..n. For realistic limited computers we need efficient backtracking in program space to reset storage contents modified by tested programs. We introduce a recursive procedure for doing this in a time-optimal fashion.
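The 50/50 budget split described above can be made concrete with a schematic sketch. This is not the paper's pseudocode: the helpers `search_extensions` (tests programs that begin with the frozen code, on the current task only) and `search_fresh` (tests fresh programs starting above the top non-writable address) are hypothetical stand-ins for the actual search routine.

```python
def oops_solve(task_id, tasks, frozen, search_extensions, search_fresh):
    """Schematic OOPS outer loop: double the budget until some program wins."""
    budget = 2
    while True:
        half = budget // 2
        # Half the total time: prefixes of the most recent frozen solver,
        # applied to the current task only.
        hit = search_extensions(frozen, tasks[task_id], half)
        if hit is not None:
            return hit
        # Other half: fresh programs, time-shared among all tasks so far,
        # since a fresh candidate must be a universal solver for all of them.
        share = max(1, half // (task_id + 1))
        for t in tasks[: task_id + 1]:
            hit = search_fresh(t, share)
            if hit is not None:
                return hit
        budget *= 2  # next phase, as in universal search
```

The design point this illustrates: the frozen-prefix half is never diluted across tasks, so reuse of the latest solution stays cheap even as the task sequence grows.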

OOPS can solve tasks unsolvable by traditional reinforcement learners and AI planners, such as Towers of Hanoi with 30 disks (minimal solution size > 10^9). In our experiments OOPS demonstrates incremental learning by reusing previous solutions to discover a prefix that temporarily rewrites the distribution on its suffixes, such that universal search is accelerated by 1,000×. This illustrates how OOPS can benefit from self-improvement and meta-searching, that is, searching for faster search procedures.
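The 10^9 figure follows from the classic recurrence: moving n disks requires moving n − 1 disks twice plus one move of the largest disk, so the minimal solution has 2^n − 1 moves. A small sketch checks the formula (the solver itself is the textbook recursion, not anything from the paper):

```python
def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Standard recursive Towers of Hanoi; yields the minimal move sequence."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, src, dst, aux)   # park n-1 disks on aux
    yield (src, dst)                               # move the largest disk
    yield from hanoi_moves(n - 1, aux, src, dst)   # restack n-1 disks on dst

# Verify the 2^n - 1 count on a small instance, then compute the 30-disk
# count directly rather than enumerating a billion moves.
assert sum(1 for _ in hanoi_moves(10)) == 2**10 - 1
print(2**30 - 1)  # 1073741823, i.e. just over 10^9 moves for 30 disks
```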

We mention several OOPS variants and outline OOPS-based reinforcement learners. Since OOPS will scale to larger problems in essentially unbeatable fashion, we also examine its physical limitations.

[Keywords: OOPS, bias-optimality, incremental optimal universal search, efficient planning & backtracking in program space, meta-searching & metalearning, self-improvement]

6.6 Experimental Results for Both Task Sets

Within roughly 0.3 days, OOPS found and froze code solving all 30 1^n2^n-tasks. Thereafter, within 2–3 additional days, it also found a universal Hanoi solver. The latter does not call the 1^n2^n solver as a subprogram (which would not make sense at all), but it does profit from experience: it begins with a rather short prefix that reshapes the distribution on the possible suffixes, and thus the search space, by temporarily increasing the probabilities of certain instructions of the earlier found 1^n2^n solver. This in turn happens to increase the probability of finding a Hanoi-solving suffix. It is instructive to study the sequence of intermediate solutions… The entire 4-day search for solutions to all 60 tasks tested 93,994,568,009 prefixes, corresponding to 345,450,362,522 instructions costing 678,634,413,962 time steps. Recall once more that the search time of an optimal solver is a natural measure of initial bias. Clearly, most tested prefixes are short: they either halt or get interrupted soon. Still, some programs do run for a long time; for example, the run of the self-discovered universal Hanoi solver working on instance 30 consumed 33 billion steps, which is already 5% of the total time. The stack used by the iterative equivalent of procedure Try for storage management (§4.1), however, never held more than 20,000 elements.

…Note also that we could continue to solve Hanoi tasks up to n > 40. The execution time required to solve such instances with an optimal solver greatly exceeds the search time required for finding the solver itself. Then it does not matter much whether OOPS already starts with a pre-wired Hanoi solver or first has to discover one, since the initial search time for the solver becomes negligible anyway.

…For example, we could use the principles of OOPS to create a non-gradient-based, near-bias-optimal variant of the successful recurrent network metalearner of Hochreiter et al. (2001). It should also be of interest to study probabilistic Speed Prior-based OOPS variants (Schmidhuber 2002e) and to devise applications of OOPS-like methods as components of universal reinforcement learners (§5.3).