In an episode of this experiment, the model is presented with a sequence of either 50 or 100 images. It knows beforehand that there are 5 (or 15) different classes of characters, each with its own label. However, it does not know what the labels are, nor has it seen these exact characters before. How well can a model learn to do this task if you train it with enough background knowledge?
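To make the episode setup concrete, here is a rough sketch of how such an episode could be sampled. The `class_pool` structure and the uniform per-step class sampling are illustrative assumptions, not a description of the exact data pipeline:

```python
import random

def sample_episode(class_pool, seq_len=50, num_classes=5, rng=random):
    """Sample one episode of the character-labeling task.

    `class_pool` is a hypothetical dict mapping a character class to a
    list of its images; the real data pipeline may look different.
    """
    chars = rng.sample(sorted(class_pool), num_classes)
    # Labels are shuffled per episode, so a fixed character-to-label
    # mapping can never be memorized during meta-training.
    labels = list(range(num_classes))
    rng.shuffle(labels)

    episode = []
    for _ in range(seq_len):
        idx = rng.randrange(num_classes)
        image = rng.choice(class_pool[chars[idx]])
        episode.append((image, labels[idx]))
    return episode
```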
In this task, the sequences are of length 50 and there are 5 different classes. Here are the accuracy numbers for the sgdstore model:
Instance 1 | Instance 2 | Instance 3 | Instance 4 | Instance 11 |
---|---|---|---|---|
35.76% | 89.38% | 93.72% | 95.25% | 96.74% |
Note that these results look worse than those reported in Santoro et al. This is likely because I use a different training/evaluation split: I keep the original background/evaluation split from Lake et al., whereas Santoro et al. meta-train on more data and test on less. The models I trained did indeed overfit slightly, indicating that more training data would help.
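For clarity, "Instance k" means the k-th time a particular class shows up within an episode. Here is a small sketch of how these per-instance accuracies can be tallied; the `(true_label, predicted_label)` layout is just an assumption about how predictions are recorded:

```python
from collections import defaultdict

def accuracy_by_instance(episodes):
    """Aggregate accuracy by how many times each class has appeared so far.

    `episodes` is assumed to be a list of episodes, each a list of
    (true_label, predicted_label) pairs in presentation order.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for episode in episodes:
        seen = defaultdict(int)
        for true_label, predicted in episode:
            seen[true_label] += 1
            instance = seen[true_label]  # 1 on first appearance, 2 on second, ...
            total[instance] += 1
            correct[instance] += int(predicted == true_label)
    return {k: correct[k] / total[k] for k in sorted(total)}
```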
The following graph shows, for three different models, the validation error over time (measured in episodes) during meta-training. The sgdstore model clearly does the best, but the LSTM catches up after way more training. The vanilla RNN (which is used as the controller for sgdstore) does terribly on its own. I will update the graph after I have run the sgdstore model for longer.
Training in this experiment was done with a batch size of 64 and a step size of 0.0003. It is likely that hyper-parameter tuning would improve these results considerably. I discovered in the 15-class task that batches of 16 work much better in terms of data efficiency. I have yet to test larger step sizes, but I bet those will help too.
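For reference, here is a minimal sketch of the meta-training loop with those hyper-parameters. The choice of Adam and the `model`/`sample_batch` interfaces are assumptions for illustration, not the actual training code:

```python
import torch
import torch.nn.functional as F

def meta_train(model, sample_batch, steps, batch_size=64, step_size=3e-4):
    """Meta-training loop sketch.

    `sample_batch(batch_size)` is assumed to return a batch of episodes as
    (inputs, targets) tensors, and `model` maps episode inputs to
    per-time-step class logits of shape (batch, seq_len, num_classes).
    """
    # Adam is an assumption here; any optimizer with a "step size" knob fits.
    opt = torch.optim.Adam(model.parameters(), lr=step_size)
    for _ in range(steps):
        inputs, targets = sample_batch(batch_size)
        logits = model(inputs)
        loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
        opt.zero_grad()
        loss.backward()
        opt.step()
```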
In this task, the sequences are of length 100 and there are 15 different classes. Here are the accuracy results for the sgdstore model:
Instance 1 | Instance 2 | Instance 3 | Instance 4 | Instance 10 | Instance 11 |
---|---|---|---|---|---|
9.73% | 77.07% | 84.73% | 87.39% | 87.74% | 90.91% |
Here is a plot of training over time. In this case, I used a batch size of 16 and a learning rate of 0.0003. It is clear that the sgdstore model learns an order of magnitude faster than the LSTM, but once again the LSTM does eventually catch up:
The above experiments were run with a learning rate of 0.0003. From the following graph, it's clear that the model would learn faster (in the short term) with a learning rate of 0.001:
Also, the memory modules in the above experiments were single-layer MLPs with 256 hidden units and two read heads. By "two read heads", I mean that the controller got to run two samples through the memory network. Here are two variations: one with "deep memory" (two layers of 256 units each), and another with four read heads. It is clear that deep memory helps in the long run, perhaps just due to the extra capacity:
Here are the accuracy measurements for the "deep memory" model:
Set | 1st | 2nd | 3rd | 10th | 11th |
---|---|---|---|---|---|
Eval | 11.4% | 81.7% | 86.5% | 91.2% | 91.0% |
Train | 13.4% | 88.6% | 92.7% | 95.6% | 96.8% |
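To make the "read heads" and "deep memory" terms above concrete, here is a rough sketch of the read path: the controller emits one query per head, each query is run through the memory MLP, and the outputs are concatenated and handed back to the controller. Treating the memory as a fixed `nn.Sequential` (rather than fast weights the controller trains online) and the Tanh activations are simplifications on my part:

```python
import torch
import torch.nn as nn

class MemoryRead(nn.Module):
    """Sketch of reading from an MLP memory with multiple read heads."""

    def __init__(self, ctrl_size, query_size, mem_out, num_heads=2, deep=False):
        super().__init__()
        self.num_heads = num_heads
        self.query_size = query_size
        self.to_queries = nn.Linear(ctrl_size, num_heads * query_size)
        layers = [nn.Linear(query_size, 256), nn.Tanh()]
        if deep:  # "deep memory": two hidden layers of 256 units
            layers += [nn.Linear(256, 256), nn.Tanh()]
        self.memory = nn.Sequential(*layers, nn.Linear(256, mem_out))

    def forward(self, ctrl_state):
        # ctrl_state: (batch, ctrl_size) -> one query vector per read head.
        queries = self.to_queries(ctrl_state)
        queries = queries.view(-1, self.num_heads, self.query_size)
        reads = [self.memory(queries[:, i]) for i in range(self.num_heads)]
        # Concatenated read results feed back into the controller's next step.
        return torch.cat(reads, dim=-1)
```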
I have found that I can get much better LSTM results than the ones reported in Santoro et al. They stop training after 100,000 episodes, which seems arbitrary (almost as if it was chosen to make their model look good, since it learns faster). I don't want to confuse learning speed with model capacity, which Santoro et al. seem to do.
I use two-layer LSTMs with 384 cells per layer. This is likely much more capacity than Santoro et al. allow for their LSTMs. I think it would be unfair not to give LSTMs the benefit of the doubt, even if their learning speeds are a lot worse than those of memory-augmented neural networks.
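As a sketch, that baseline looks roughly like this; the input encoding and the linear read-out to class logits are my assumptions about the wiring, not the exact architecture:

```python
import torch.nn as nn

class LSTMBaseline(nn.Module):
    """Two-layer LSTM baseline with 384 cells per layer."""

    def __init__(self, in_size, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(in_size, 384, num_layers=2, batch_first=True)
        self.readout = nn.Linear(384, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, in_size) -> per-time-step class logits.
        out, _ = self.lstm(x)
        return self.readout(out)
```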