BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents (NetPlay)
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
The Tactical Amulet Extraction Bot: Predicting and Controlling NetHack's Randomness
https://ai.facebook.com/blog/launching-the-nethack-challenge-at-neurips-2021/
https://ai.facebook.com/blog/minihack-a-new-sandbox-for-open-ended-reinforcement-learning
https://www.aicrowd.com/challenges/neurips-2021-the-nethack-challenge
https://www.reddit.com/r/MachineLearning/comments/p88v9w/d_we_are_facebook_ai_researchs_nethack_learning/
https://www.reddit.com/r/nethack/comments/2tluxv/yaap_fullauto_bot_ascension_bothack
https://www.reddit.com/r/reinforcementlearning/comments/rtp5ts/nethack_2021_neurips_challenge_winning_agent/