“The MONK’s Problems—A Performance Comparison of Different Learning Algorithms”, Sebastian B. Thrun, Jerzy W. Bala, Eric Bloedorn, Ivan Bratko, Bojan Cestnik, John Cheng, Kenneth A. De Jong, Saso Dzeroski, Douglas H. Fisher, Scott E. Fahlman, R. Hamann, K. Kaufman, S. Keller, I. Kononenko, J. Kreuziger, R. S. Michalski, T. Mitchell, P. Pachowicz, Y. Reich, H. Vafaie, W. Van de Welde, W. Wenzel, J. Wnek, J. Zhang (1991-12):

Once upon a time, in July 1991, the monks of Corsendonk Priory were faced with a school held in their priory, namely the 2nd European Summer School on Machine Learning. After listening for more than a week to a wide variety of learning algorithms, they felt rather confused: Which algorithm would be optimal? And which one should be avoided? As a consequence of this dilemma, they created a simple task on which all learning algorithms ought to be compared [benchmarked]: the three MONK’s problems.

This report summarizes the results.

[Keywords: machine learning, MONK’s problems, AQ17-DCI, AQ17-HCI, AQ17-FCLS, AQ14-NT, AQ15-GA, Assistant Professional, mFOIL, ID5R, IDL, ID5R-hat, TDIDT, ID3, AQR, CN2, CLASSWEB, ECOBWEB, PRISM, backpropagation, Cascade Correlation]

This report summarizes a comparison of different learning techniques which was performed at the 2nd European Summer School on Machine Learning, held in Belgium during summer 1991. A variety of symbolic and non-symbolic learning techniques—namely AQ17-DCI, AQ17-HCI, AQ17-FCLS, AQ14-NT, AQ15-GA, Assistant Professional, mFOIL, ID5R, IDL, ID5R-hat, TDIDT, ID3, AQR, CN2, CLASSWEB, ECOBWEB, PRISM, backpropagation, and Cascade Correlation—are compared on 3 classification problems, the MONK’s problems.

The MONK’s problems are derived from a domain in which each training example is represented by 6 discrete-valued attributes. Each problem involves learning a binary function defined over this domain, from a sample of training examples of this function. Experiments were performed with and without noise in the training examples.
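The structure of the domain can be sketched concretely. The six attributes take 3, 3, 2, 3, 4, and 2 discrete values respectively, giving 432 possible examples; the first MONK’s problem uses the target concept (a1 = a2) or (a5 = 1). A minimal sketch enumerating the domain and labeling it with that concept:

```python
from itertools import product

# Number of discrete values for each of the six attributes a1..a6
# (3 * 3 * 2 * 3 * 4 * 2 = 432 possible examples).
RANGES = [3, 3, 2, 3, 4, 2]

def monk1(a1, a2, a3, a4, a5, a6):
    """MONK-1 target concept: (a1 == a2) or (a5 == 1)."""
    return a1 == a2 or a5 == 1

# Enumerate the full domain and label every example with the binary target.
domain = list(product(*(range(1, r + 1) for r in RANGES)))
labels = [monk1(*x) for x in domain]

print(len(domain), sum(labels))  # 432 examples, 216 of them positive
```

A learner is then given a labeled sample of this domain (optionally with class noise added, as in the third problem) and evaluated on how well it recovers the target function over the remaining examples.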

One important characteristic of this comparison is that it was performed by a collection of researchers, each of whom was an advocate of the technique they tested (often they were its creators). In this sense, the results are less biased than in comparisons performed by a single person advocating a specific learning method, and more accurately reflect the generalization behavior of the learning techniques as applied by knowledgeable users.