“Learning to Reason in Large Theories without Imitation”, Kshitij Bansal, Christian Szegedy, Markus N. Rabe, Sarah M. Loos, Viktor Toman2019-05-25 (, , , )⁠:

In this paper, we demonstrate how to do automated theorem proving in the presence of a large knowledge base of potential premises without learning from human proofs.

We suggest an exploration mechanism that mixes in additional premises selected by a tf-idf (term frequency-inverse document frequency) based lookup in a deep reinforcement learning scenario. This helps with exploring and learning which premises are relevant for proving a new theorem.

Our experiments show that the theorem prover trained with this exploration mechanism outperforms provers that are trained only on human proofs. It approaches the performance of a prover trained by a combination of imitation and reinforcement learning.

We perform multiple experiments to understand the importance of the underlying assumptions that make our exploration approach work, thus explaining our design choices.