“A* Search Without Expansions: Learning Heuristic Functions With Deep Q-Networks”, Forest Agostinelli, Alexander Shmakov, Stephen McAleer, Roy Fox, Pierre Baldi (2021-02-08):

Efficiently solving problems with large action spaces using A* search has been of importance to the artificial intelligence community for decades. This is because the computation and memory requirements of A* search grow linearly with the size of the action space. This burden becomes even more apparent when A* search uses a heuristic function learned by computationally expensive function approximators, such as deep neural networks.

To address this problem, we introduce Q* search, a search algorithm that uses deep Q-networks to guide search, taking advantage of the fact that the sum of the transition costs and heuristic values of the children of a node can be computed with a single forward pass through a deep Q-network, without explicitly generating those children [multiplexing]. This reduces computation time and requires only one node to be generated per iteration. We use Q* search to solve the Rubik’s Cube when formulated with a large action space that includes 1,872 meta-actions, and find that this 157× increase in the size of the action space incurs less than a 4× increase in computation time and less than a 3× increase in the number of nodes generated when performing Q* search.
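The core mechanic can be sketched in a few lines: children are ranked using a single call that returns a value for every action (standing in for one DQN forward pass), and a child state is only actually generated when its queue entry is popped. This is a minimal toy sketch, not the paper's implementation; the `q_values` and `step` callables and the unit transition cost are assumptions for illustration.

```python
import heapq

def q_star_search(start, goal, actions, step, q_values, max_iters=10_000):
    """Minimal Q*-style search sketch.

    Instead of expanding a node by generating every child and evaluating
    each one, we call q_values(state) once (one "forward pass") to score
    all actions, and push (state, action) entries onto the priority queue.
    The child state is generated only when its entry is popped, so exactly
    one node is generated per iteration.
    """
    counter = 0  # tie-breaker so heapq never compares states directly
    heap = [(0, counter, 0, start, None, [])]  # (f, tie, g, state, action, path)
    seen = set()
    for _ in range(max_iters):
        if not heap:
            break
        f, _, g, state, action, path = heapq.heappop(heap)
        if action is not None:
            # Generate only this one child -- the "without expansions" part.
            state = step(state, action)
            g += 1  # assumed unit transition cost in this toy domain
            path = path + [action]
        if state == goal:
            return path
        if state in seen:
            continue
        seen.add(state)
        # One call scores every action: q[a] ~ c(state, a) + h(child).
        q = q_values(state)
        for a in actions:
            counter += 1
            heapq.heappush(heap, (g + q[a], counter, g, state, a, path))
    return None
```

For example, on a number line with states as integers, actions `'+'`/`'-'`, and a perfect `q_values(s)` of `1 + |child - goal|`, the search walks straight to the goal while generating one node per pop.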

Furthermore, Q* search is up to 129× faster and generates up to 1,288× fewer nodes than A* search. Finally, although obtaining admissible heuristic functions from deep neural networks is an ongoing area of research, we prove that Q* search is guaranteed to find a shortest path given a heuristic function that neither overestimates the cost of a shortest path nor underestimates the transition cost.
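One plausible formalization of that two-sided condition (notation assumed here, not taken from the paper): writing c(s, a) for the true transition cost, T(s, a) for the child state, and q(s, a) for the learned value used to rank children,

```latex
% Sketch of the sufficient condition for the optimality guarantee
% (notation assumed): q must not underestimate the transition cost
% (left inequality) and must not overestimate the cost of a shortest
% path through action a (right inequality):
\[
  c(s, a) \;\le\; q(s, a) \;\le\; c(s, a) + \mathrm{dist}\!\big(T(s, a),\, s_{\mathrm{goal}}\big)
  \quad \text{for all } s, a,
\]
% where dist denotes the true shortest-path cost to the goal.
```

Under such a condition, the priorities g + q behave like the f-values of A* with an admissible, cost-respecting heuristic, which is why a shortest path is still found.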