all 19 comments

[–]iamquah 4 points5 points  (1 child)

What is a "symbolic method" exactly? I Googled "symbolic method nle" and didn't really find anything pertinent. I can't watch the vid so I thought I'd just ask away.

TIA!

[–]timthebaker 2 points3 points  (0 children)

For the NetHack competition, "symbolic" was defined as any non-neural-network approach.

Agent not using a neural network or significantly similar modeling technique.

In AI, there are two major schools of thought. One is "connectionist", which you can think of as basically neural-network-like approaches. These are systems usually consisting of nodes, and what you learn is the connection strengths between the nodes. They are meant to be very flexible and are inspired by the brain, which is understood to be a network of neurons. Sometimes connectionist architectures are called "sub-symbolic" because they exist below the level of symbols (i.e., symbols are more abstract than a network with learnable weights, and symbolic reasoning can supposedly emerge from such a network).
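A toy sketch of what "learning connection strengths between nodes" looks like in practice (a completely made-up two-layer network; the weight matrices are the only thing that would be learned):

```python
import numpy as np

# Toy "connectionist" model: two layers of nodes whose connection
# strengths (the weight matrices W1, W2) are the only things learned.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 8))   # input nodes -> hidden nodes
W2 = rng.normal(size=(8, 1))   # hidden nodes -> output node

def forward(x):
    hidden = np.tanh(x @ W1)   # activations flow along weighted connections
    return hidden @ W2         # output is just another weighted sum

x = rng.normal(size=(1, 5))
print(forward(x))              # the "knowledge" lives in W1/W2, not in explicit symbols
```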

The other school is sometimes called "symbolic AI", which refers to methods that manipulate symbols. I'm much less familiar with these besides knowing that they used to be popular. A basic symbolic AI approach for, say, chess, would be to give the agent a set of rules such as "a queen is more valuable than a rook, a bishop is more valuable than a pawn, controlling the center squares is valuable" and then let the agent learn how to weight the significance of those rules. The term "expert system" also comes to mind.
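A rough sketch of that kind of weighted-rule evaluator (the rules and weights here are made up purely for illustration):

```python
# Hypothetical rule-based chess evaluator: the rules are written by a human,
# and only the weights on those rules would be tuned.
PIECE_VALUES = {"pawn": 1, "bishop": 3, "knight": 3, "rook": 5, "queen": 9}
CENTER_SQUARES = {"d4", "d5", "e4", "e5"}

def evaluate(position, weights):
    material = sum(PIECE_VALUES[p] for p in position["my_pieces"].values()) \
             - sum(PIECE_VALUES[p] for p in position["their_pieces"].values())
    center_control = sum(1 for sq in position["my_pieces"] if sq in CENTER_SQUARES)
    return weights["material"] * material + weights["center"] * center_control

position = {
    "my_pieces": {"e4": "pawn", "d1": "queen"},
    "their_pieces": {"e5": "pawn", "a8": "rook"},
}
print(evaluate(position, {"material": 1.0, "center": 0.1}))
```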

In the 1900s, symbolic approaches dominated AI and received most of the attention and funding. In the 2000s, thanks to increases in computing power and the success of algorithms like convolutional neural networks, connectionist architectures have come to dominate. Despite what anyone tells you, both approaches have merit.

[–]timthebaker 2 points3 points  (12 children)

Saw the results on twitter a few weeks ago and thought NLE was a neat challenge for AI. Not only was the best approach (yours) symbolic, but in general the symbolic entries took the top 3 spots over "neural" approaches, which was cool. Congrats on winning. I haven't had time to go through the results in detail yet, but I'm hoping to pop into the discussions on this thread.

Michel, why do you think symbolic approaches outperformed in this competition, what is deep RL missing?

[–]procedural_only[S] 2 points3 points  (0 children)

Michel, why do you think symbolic approaches outperformed in this competition, what is deep RL missing?

I think there are actually multiple reasons for that, and even after eliminating some of them, symbolic methods may still be more applicable. Here are some initial reasons/ideas we came up with:

1. lack of some innate human priors:

a) objectness -- a NN needs to create the abstraction of an object just by looking at the ASCII characters. Objects are items, monsters, walls, doors, etc., and they all share some common properties (e.g. you can kick all of them). This applies only if we feed the network somewhat "raw" observations without any action space transformation (see the sketch after this list).

b) priors about how physics works -- like what happens if you throw something in a direction, or when you drop something

c) innate notions about natural numbers -- NNs notoriously struggle to learn arithmetic properly

d) priors about orientation and navigation in a somewhat 2D/3D space (non-euclidean though)

2. lack of some human-acquired priors:

a) generic ones like: what a weapon is, how many hands you (usually) have, what you can possibly do with a potion/fluid (i.e. drink it, dip something in it, throw it?), etc.

b) lack of knowledge present on e.g. the NetHack Wiki -- though in theory one could try to incorporate this knowledge by e.g. using a pre-trained NLP model on it for feature extraction.

3. Problems that make this environment hard from the perspective of currently known RL algorithms:

a) highly partial observations -- the agent needs to build a complex game-state representation during an episode

b) sparse rewards -- the score comes mostly from killing monsters

c) long episodes
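To illustrate 1a: a symbolic agent effectively gets "objectness" for free through a hand-written glyph table, while a network fed raw observations has to discover that abstraction on its own. A minimal toy sketch (the glyph meanings below are illustrative, not the exact NLE encoding):

```python
# Hypothetical glyph table: a symbolic bot is handed this mapping, while a NN
# working from raw characters has to learn the object abstraction itself.
GLYPH_TYPES = {
    "@": "player",
    "d": "monster",   # e.g. a dog/jackal-class glyph
    "!": "item",      # potion
    "|": "wall",
    "-": "wall",
    "+": "door",
    ".": "floor",
}

def parse_map(ascii_rows):
    """Turn a raw character grid into a list of (type, glyph, row, col) objects."""
    objects = []
    for r, row in enumerate(ascii_rows):
        for c, ch in enumerate(row):
            kind = GLYPH_TYPES.get(ch)
            if kind and kind != "floor":
                objects.append((kind, ch, r, c))
    return objects

screen = [
    "----+",
    "|@.d|",
    "|.!.|",
    "-----",
]
print(parse_map(screen))
```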

We have actually tried an experiment with training MuZero on a simplified action space, but we couldn't improve our score.

[–]moschles 1 point2 points  (9 children)

Why do you think symbolic approaches outperformed in this competition, what is deep RL missing?

You can always code up a bot for a specific game. And that bot will out-compete those agents required to learn it from scratch. The reason is not mystical -- it is because a coded bot is endowed with all the cognitive heavy lifting already done for it by a human programmer.

[–]timthebaker 1 point2 points  (8 children)

Well, to be fair, Alpha Zero learns from scratch and outperforms all traditional game-specific chess AI, which seems like a counterexample to your point. A bot with hand-crafted features will always serve as a good baseline, though, and I guess I am most curious about what NN-based agents fail to learn in the NetHack setting.

[–]moschles 1 point2 points  (0 children)

Alpha Zero

Alpha Zero was trained by a gigantic research outfit called Deepmind London. Those researchers have something like 2000 TPUs, and the models cost over a million dollars to train. NLE is a NetHack competition among 'teams' with a prize of $20,000. (If one of those 'teams' had that kind of resources, I'm convinced that their 795-million-parameter model would trounce the competition.) But it seems to me the more provincial answer is probably correct: the "symbolic" approaches take the top 3 slots because they are hand-coded.

[–]gor-ren 1 point2 points  (6 children)

Alpha Zero learns from scratch and outperforms all traditional game-specific chess AI

Yes, and it was hailed as a major breakthrough exactly because of this. It also required an ungodly amount of training to get to that performance, despite chess' relatively simple premise (8x8 grid, handful of pieces with different movement rules, deterministic, perfect state observations).

NetHack is vastly more complex than chess... large maps, different behaviour on different levels, weird/obtuse rules that will kill you or make you powerful, non-determinism, very limited observations, and so on.

I will guesstimate (with the caveat I'm not on the cutting edge of RL/ML by any means) that the RL agents used for this competition could not be given enough training to learn the idiosyncrasies of NetHack well enough to beat the symbolic bots. This touches on a weakness of current RL algorithms: poor sample efficiency.

Anyway, I think your real point is that symbolic approaches, whose good behaviour is hand-encoded from domain knowledge, aren't inherently better than a general RL agent that learns optimal behaviour through training. But you can appreciate that in a domain where current RL approaches can't learn well enough, the symbolic approaches win... for now :)

[–]timthebaker 1 point2 points  (4 children)

For sure, I'm really curious to see when/if NN-based approaches ever overtake the symbolic ones. Honestly, I could see a hybrid approach being attractive. Let a NN learn to make decisions, but code in a lot of the game's interactions symbolically.

[–]gor-ren 0 points1 point  (3 children)

You might be interested in "reward shaping", a way to encode human domain knowledge into an RL reward function to give agents a trail of breadcrumbs to follow.
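As a made-up NetHack-flavoured example, a shaped reward might add small breadcrumbs on top of the sparse score delta:

```python
# Hypothetical shaped reward: the environment's sparse score delta plus small
# hand-designed bonuses (breadcrumbs) reflecting human knowledge of the game.
def shaped_reward(score_delta, went_downstairs, newly_explored_tiles):
    reward = score_delta                     # the original, sparse signal
    reward += 5.0 if went_downstairs else 0  # descending is usually progress
    reward += 0.01 * newly_explored_tiles    # mild incentive to explore
    return reward

print(shaped_reward(score_delta=0, went_downstairs=True, newly_explored_tiles=12))
```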

[–]timthebaker 0 points1 point  (2 children)

Oh, neat. I'd bet some super hardcore folk probably hate this idea, but I'm game.

Is there a specific paper?

[–]gor-ren 1 point2 points  (1 child)

The classic paper is Policy invariance under reward transformations (more plainly: how to modify the reward function while ensuring the optimal policy doesn't change). It's a very formal and rigorous paper, though, and you might get further by reading the intro sections of papers that apply reward shaping instead.
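The paper's central trick is potential-based shaping: add F(s, s') = γ·Φ(s') − Φ(s) to the reward and the optimal policy provably stays the same. A minimal sketch (the potential function Φ below is a made-up NetHack-style example):

```python
# Potential-based shaping (Ng, Harada & Russell, 1999): adding
# F(s, s') = gamma * phi(s') - phi(s) to the reward leaves the optimal
# policy unchanged. phi() here is a made-up NetHack-style potential
# (deeper dungeon level = more promising state).
GAMMA = 0.99

def phi(state):
    return 10.0 * state["dungeon_level"]

def shaped_reward(env_reward, state, next_state):
    return env_reward + GAMMA * phi(next_state) - phi(state)

s, s_next = {"dungeon_level": 1}, {"dungeon_level": 2}
print(shaped_reward(0.0, s, s_next))  # 0 + 0.99 * 20 - 10 = 9.8
```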

e: I remember finding this YouTube video useful https://www.youtube.com/watch?v=0R3PnJEisqk

[–]timthebaker 1 point2 points  (0 children)

Great, thank you for the pointers

[–]jms4607 0 points1 point  (0 children)

The “ungodly” amount of training will seem laughable in 20 years

[–]moschles 2 points3 points  (3 children)

The winning agent isn't based on reinforcement learning in the end, but the victory of symbolic methods in this competition shows what RL is still missing to some extent -- so I believe this subreddit is a good place to discuss it.

No, RL is not "missing" something provided by symbolic methods. The symbolic methods are specifically tweaked to the game itself, using what researchers call "domain knowledge". Domain knowledge is the whole crux of the matter with Deepmind's Atari-playing agents: those agents learned the games starting only from raw pixels, without the aid of human beings pre-labelling the entities that appear on the screen. In the case of NetHack, you can come along and hand-code symbols that correspond to the primary entities that appear in the game world. Such software systems will necessarily outperform the deep learning agents, which have to create all the "entities" from scratch by uncovering their invariant features.

In short: you can always code up a bot for a specific game. And that bot will out-compete those agents required to learn it from scratch. The reason is not mystical -- it is because a coded bot is endowed with all the cognitive heavy lifting already done for it by a human being.

[–]timthebaker 1 point2 points  (2 children)

I posted in another comment, but I'll reiterate here on this top-level comment.

you can always code up a bot for a specific game. And that bot will out-compete those agents required to learn it from scratch. The reason is not mystical -- it is because a coded bot is endowed with all the cognitive heavy lifting already done for it by a human being.

This is incorrect. Alpha Zero is given nothing more than the rules of chess, and it learns, from scratch, to play better than any other bot. The reasoning is that a set of rules hand-crafted by a human is likely to be incomplete and biased. For example, in chess, sacrificing a piece goes against the standard rule of "trading even" or "trading up." That's a shallow example which you can argue against, but it captures the fact that many rules have exceptions, and exceptions to rules have exceptions of their own, etc.

It is hard to come up with a set of rules because so many rules have exceptions and because we often aren't even aware of what we humans are doing subconsciously when we make decisions. That being said, a bot with hand-crafted rules will always be a good baseline to measure against.

[–]moschles 1 point2 points  (1 child)

That's a shallow example which you can argue against, but it captures the fact that many rules have exceptions, and exceptions to rules have exceptions of their own, etc.

Your Alpha Zero example is shallow for other reasons. Those agents are trained by expensive research outfits, not by 'teams' with access to maybe a few PCs. Centers like Deepmind and OpenAI are training models that cost millions.

I stick to my original assertion: the "symbolic" NetHack-playing agents are in the top 3 of the competition because they are hand-coded bots.

[–]timthebaker 1 point2 points  (0 children)

I stick to my original assertion: the "symbolic" NetHack-playing agents are in the top 3 of the competition because they are hand-coded bots.

I agree with the above.

you can always code up a bot for a specific game. And that bot will out-compete those agents required to learn it from scratch.

I disagree with the above from your original comment. I agree the resources used for Alpha Zero are absurd, but it is a direct counterexample to the quote.

The NetHack competition was built in part by Facebook AI and hosted at NeurIPS, the most prominent machine learning conference. I think there was plenty of incentive for the big spenders in AI to throw some money at winning the competition, for the good PR and for the love of solving hard problems.

[–]rogal_the_stubborn 0 points1 point  (0 children)

Hey u/procedural_only, congrats on winning the challenge! Great result!

I was wondering if your agent is available online; I'm writing a paper and would like to benchmark an (inferior) learned agent against it. Thanks!