Inner Monologue (by analogy to human inner-monologue) is a family of prompt engineering tricks for large language models which make them solve problems in a ‘step by step’ verbalized way; it is particularly effective on multi-step tasks with ‘one right answer’ such as math word & programming problems.
It can be induced by few-shot examples of several solved problems, by finetuning on a suitable corpus (eg. InstructGPT), or by a carefully-chosen prompt that sets up a ‘dialogue’ (the original discovery) or gives instructions (eg. “let’s think step by step”). It can be combined with better sampling strategies like best-of ranking, majority voting, or a critic; with self-distillation on its own monologue outputs (possibly repeatedly); with additional data like unit tests or retrieval results; & with access to oracles like REPLs or humans (a minimal sketch of the instruction-plus-voting variant is given below).
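As a concrete (hypothetical) illustration, the sketch below combines the zero-shot “let’s think step by step” instruction with self-consistency-style majority voting over sampled monologues; `complete()` is a stand-in for whatever text-completion API is available, not a real library call.

```python
from collections import Counter

def solve_step_by_step(question: str, n_samples: int = 5) -> str:
    """Zero-shot inner-monologue prompting with majority voting over final answers.
    `complete(prompt, temperature)` is a hypothetical placeholder for an LLM sampling call."""
    prompt = f"Q: {question}\nA: Let's think step by step."
    answers = []
    for _ in range(n_samples):
        # Sample a verbalized reasoning chain at nonzero temperature.
        monologue = complete(prompt, temperature=0.7)
        # Ask for the final answer conditioned on the sampled monologue.
        final = complete(prompt + " " + monologue + "\nTherefore, the final answer is",
                         temperature=0.0)
        answers.append(final.strip().rstrip("."))
    # Self-consistency: different chains may disagree; keep the most common answer.
    return Counter(answers).most_common(1)[0][0]
```

The same skeleton extends naturally to the other combinations mentioned above, eg. ranking the sampled chains with a critic or checking each candidate answer against unit tests before voting.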
It was discovered in July 2020 by early OA API & AI Dungeon 2 users, who found that GPT-3/‘Dragon’ would fail most simple arithmetic problems like multiplication (as reported in the GPT-3 paper), but could be coaxed into solving them by setting up a fictional dialogue between the player and a ‘character’ who works through the problem step by step. It has been rediscovered repeatedly since (eg. as “scratchpads” or “chain-of-thought”).
Inner-monologue is interesting because it: is a simple prompting technique which dramatically improves benchmark performance (“sampling can show the presence of knowledge but not the absence”); was not predicted, but discovered empirically after model release; appears to emerge only in large language models (>80b dense parameters); can have increasing returns to scale; can improve performance even when naive prompting shows flat scaling (“hidden scaling”); adds an RNN-esque flavor to feedforward language models; and involves planning (cf. Socratic Models/SayCan). It has also not been integrated into model training in any extensive way, and the limits of self-training & exploration are unknown.
A toy-model for how inner-monologue works is that such problems are sequential: when calculating out an arithmetic problem, an error in any step makes all following steps wrong. Such a process is a multiplicative pipeline, where failure rates multiply: ie. a per-step success rate P over n steps multiplies out to an end-to-end correctness rate of P^n, which shrinks rapidly as n grows or P falls. So inner-monologue makes the meta-learning task easier by being more specific and by reducing the problem to easier sub-tasks, potentially increasing the success rate far more than alternatives like scaling the model a few times (eg. on a 5-step problem, raising P from 90% to 99% lifts end-to-end accuracy from ~60% to ~95%; getting that improvement by pure scaling of naive prompts might require >10× more parameters; see the toy calculation below). Small models then aren’t smart enough to ‘get it’ from the instructions, and their baseline error rate is too high to execute steps reliably enough to see much gain.
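The toy model is just compounded probabilities; a few lines of Python reproduce the numbers in the example above:

```python
# Toy model: a task of n sequential steps, each independently correct with
# probability p, succeeds end-to-end with probability p**n.
def chain_success(p: float, n: int) -> float:
    return p ** n

print(chain_success(0.90, 5))  # ≈0.59 — a 90%-reliable step still fails the 5-step task ~40% of the time
print(chain_success(0.99, 5))  # ≈0.95 — pushing per-step reliability to 99% nearly solves it
```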
I speculate that the reason inner-monologue is not models’ default behavior, even though it predicts the answer so much more accurately, may be the lack of an implicit memory mechanism with which a model could adaptively execute extra computation while predicting the next token. Because models like GPT-3 or PaLM have no recurrent state, they must fake one by reusing their own predicted output as a working memory. However, such a ‘show-your-work’ writing style is highly unusual in the natural-language distribution they are trained to imitate, so they will not adopt it by default without a prompt steering them towards it; they instead try to emit the answer immediately, which is impossible given their feedforward limitation, and so they guess incorrectly.
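One way to see the ‘fake recurrence’ point is to count serial computation: a feedforward LM gets roughly one forward pass per emitted token, so verbalizing a monologue before the answer is its only way of spending more sequential compute on a hard problem. A toy illustration (the function and numbers are purely illustrative):

```python
# Each generated token costs one forward pass, so a model that "thinks out loud"
# for k monologue tokens performs ~k additional serial computation steps before
# committing to its answer; answering immediately allows only a single step.
def serial_steps(monologue_tokens: int, answer_tokens: int = 1) -> int:
    return monologue_tokens + answer_tokens  # one forward pass per generated token

print(serial_steps(0))    # 1   — direct answer, no extra computation
print(serial_steps(120))  # 121 — a step-by-step monologue buys 120 extra passes
```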
See Also
Links
“Tree of Thoughts (ToT): Deliberate Problem Solving With Large Language Models”, Yao et al 2023
“Large Language Model Programs”, Schlag et al 2023
“Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting”, Turpin et al 2023
“Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding”, Xie et al 2023
“Boosting Theory-of-Mind Performance in Large Language Models via Prompting”, Moghaddam & Honey 2023
“LLM+P: Empowering Large Language Models With Optimal Planning Proficiency”, Liu et al 2023
“Think Before You Act: Unified Policy for Interleaving Language Reasoning With Actions”, 2023
“Language Is Not All You Need: Aligning Perception With Language Models (Kosmos-1)”, Huang et al 2023
“Multimodal Chain-of-Thought Reasoning in Language Models”, Zhang et al 2023
“Large Language Models Are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning”, 2023
“Faithful Chain-of-Thought Reasoning”, Lyu et al 2023
“Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, Nay 2023
“ChatGPT Goes to Law School”, Choi et al 2023
“Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation With Interaction”, Pilault et al 2023
“PAL: Program-aided Language Models”, Gao et al 2022
“Measuring Progress on Scalable Oversight for Large Language Models”, Bowman et al 2022
“Large Language Models Can Self-Improve”, Huang et al 2022
“U-PaLM: Transcending Scaling Laws With 0.1% Extra Compute”, Tay et al 2022
“Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them”, Suzgun et al 2022
“Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Press et al 2022
“ReAct: Synergizing Reasoning and Acting in Language Models”, Yao et al 2022
“Language Models Are Multilingual Chain-of-Thought Reasoners”, Shi et al 2022
“Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning”, Lu et al 2022
“FOLIO: Natural Language Reasoning With First-Order Logic”, Han et al 2022
“Faithful Reasoning Using Large Language Models”, Creswell & Shanahan 2022
“Language Models Can Teach Themselves to Program Better”, Haluptzok et al 2022
“CodeT: Code Generation With Generated Tests”, Chen et al 2022
“Language Model Cascades”, Dohan et al 2022
“Can Large Language Models Reason about Medical Questions?”, Liévin et al 2022
“Inner Monologue: Embodied Reasoning through Planning With Language Models”, Huang et al 2022
“Language Models (Mostly) Know What They Know”, Kadavath et al 2022
“Exploring Length Generalization in Large Language Models”, Anil et al 2022
“Solving Quantitative Reasoning Problems With Language Models”, Lewkowycz et al 2022
“Large Language Models Are Zero-Shot Reasoners”, Kojima et al 2022
“Maieutic Prompting: Logically Consistent Reasoning With Recursive Explanations”, Jung et al 2022
“Instruction Induction: From Few Examples to Natural Language Task Descriptions”, Honovich et al 2022
“Least-to-Most Prompting Enables Complex Reasoning in Large Language Models”, Zhou et al 2022
“Dialog Inpainting: Turning Documents into Dialogues”, Dai et al 2022
“Unifying Language Learning Paradigms”, Tay et al 2022
“Can Language Models Learn from Explanations in Context?”, Lampinen et al 2022
“Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”, Zeng et al 2022
“STaR: Bootstrapping Reasoning With Reasoning”, Zelikman et al 2022
“A Conversational Paradigm for Program Synthesis”, Nijkamp et al 2022
“Self-Consistency Improves Chain-of-Thought Reasoning in Language Models”, Wang et al 2022
“Learning-by-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension”, 2022
“PromptChainer: Chaining Large Language Model Prompts through Visual Programming”, Wu et al 2022
“It Looks Like You’re Trying To Take Over The World”, Gwern 2022
“Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, Wei et al 2022
“Reasoning Like Program Executors”, Pi et al 2022
“A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More”, Drori et al 2021
“WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Hilton et al 2021
“NeuroLogic A✱esque Decoding: Constrained Text Generation With Lookahead Heuristics”, Lu et al 2021
“Reframing Human-AI Collaboration for Generating Free-Text Explanations”, Wiegreffe et al 2021
“DREAM: Uncovering Mental Models behind Language Models”, 2021
“Few-Shot Self-Rationalization With Natural Language Prompts”, Marasović et al 2021
“Training Verifiers to Solve Math Word Problems”, Cobbe et al 2021
“Unsupervised Neural Machine Translation With Generative Language Models Only”, Han et al 2021
“Show Your Work: Scratchpads for Intermediate Computation With Language Models”, Nye et al 2021
“AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts”, Wu et al 2021
“Teaching Autoregressive Language Models Complex Tasks By Demonstration”, Recchia 2021
“Program Synthesis With Large Language Models”, Austin et al 2021
“Decision Transformer: Reinforcement Learning via Sequence Modeling”, Chen et al 2021
“Explainable Multi-hop Verbal Reasoning Through Internal Monologue”, 2021
“A Simple Method to Keep GPT-3 Focused in a Conversation”, 2021
“Measuring Mathematical Problem Solving With the MATH Dataset”, Hendrycks et al 2021
“Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm”, Reynolds & McDonell 2021
“How We Accidentally Gave Our Bots Their Personalities”, 2021
“Word in Context: Agent and Agent Clarification (69% Dev)”, Brockman 2020
“I Found That Getting GPT-3 to Add Its Own ‘Internal Monologue’ in Parentheses to Be a Helpful Strategy…”, Blixt 2020
“Teaching GPT-3 to Do a Brute Force ‘For Loop’ Checking Answers Also Seems to Work”, KaryoKleptid 2020
“Seems to Work”, KaryoKleptid 2020
“Inducing Self-Explanation: a Meta-Analysis”, Bisra et al 2018
“Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems”, Ling et al 2017
“Research Ideas”, Gwern 2017
“Why Do Humans Reason? Arguments for an Argumentative Theory”, Mercier & Sperber 2011
“How to Dramatically Improve the Reasoning Ability of GPT-3”
“AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button.”
“Solving Math Word Problems: We’ve Trained a System That Solves Grade School Math Problems With Nearly Twice the Accuracy of a Fine-tuned GPT-3 Model. It Solves about 90% As Many Problems As Real Kids: a Small Sample of 9-12 Year Olds Scored 60% on a Test from Our Dataset, While Our System Scored 55% on Those Same Problems. This Is Important Because Today’s AI Is Still Quite Weak at Commonsense Multistep Reasoning, Which Is Easy Even for Grade School Kids. We Achieved These Results by Training Our Model to Recognize Its Mistakes, so That It Can Try Repeatedly Until It Finds a Solution That Works”
Wikipedia
Miscellaneous
- /doc/ai/nn/transformer/gpt/inner-monologue/2022-zeng-figure2-socraticmodelsworkflowoverview.png
- /doc/ai/nn/transformer/gpt/inner-monologue/2022-wei-figure8-lamdavsgpt3.png
- /doc/ai/nn/transformer/gpt/inner-monologue/2022-tay-ul2-innermonologueresults.png
- https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html
- https://ai.googleblog.com/2023/01/google-research-2022-beyond-language.html
- https://builtin.com/job/operations/expert-ai-teacher-contract/1267315
- https://generative.ink/posts/methods-of-prompt-programming/#serializing-reasoning
- https://gist.github.com/brockmanmatt/deafb4dba7e4399327e44f2c8fd97b2b
- https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md
- https://towardsdatascience.com/1-1-3-wait-no-1-1-2-how-to-have-gpt-sanity-check-itself-136e846987bf
- https://twitter.com/DaveMonlander/status/1612802240582135809
- https://twitter.com/KevinAFischer/status/1646018246225846272
- https://twitter.com/KevinAFischer/status/1646677902833102849
- https://twitter.com/KevinAFischer/status/1646690838981005312
- https://twitter.com/StudentInfosec/status/1640360234882310145
- https://twitter.com/andrewwhite01/status/1616933106786738176
- https://twitter.com/peterwildeford/status/1522633978305560576
- https://www.fhi.ox.ac.uk/wp-content/uploads/2021/08/QNRs_FHI-TR-2021-3.0.pdf
- https://www.lesswrong.com/posts/yDcMDJeSck7SuBs24/steganography-in-chain-of-thought-reasoning
- https://www.lesswrong.com/posts/zRn6cLtxyNodudzhw/visible-thoughts-project-and-bounty-announcement
- https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/
- https://www.reddit.com/r/ChatGPT/comments/10zavbv/extending_chatgpt_with_some_additional_internal/
- https://www.reddit.com/r/ChatGPT/comments/11anct1/its_easy_to_give_chatgpt_a_bonafide_consciousness/
- https://www.reddit.com/r/GPT3/comments/uzrexd/thinking_is_all_you_need/
Link Bibliography
- “Tree of Thoughts (ToT): Deliberate Problem Solving With Large Language Models”, Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan: https://arxiv.org/abs/2305.10601#deepmind
- “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting”, Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman: https://arxiv.org/abs/2305.04388
- “Boosting Theory-of-Mind Performance in Large Language Models via Prompting”, Shima Rahimi Moghaddam, Christopher J. Honey: https://arxiv.org/abs/2304.11490
- “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, John Nay: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335945
- “ChatGPT Goes to Law School”, Jonathan H. Choi, Kristin E. Hickman, Amy Monahan, Daniel Schwarcz: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335905
- “Large Language Models Can Self-Improve”, Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han: https://arxiv.org/abs/2210.11610#google
- “U-PaLM: Transcending Scaling Laws With 0.1% Extra Compute”: https://arxiv.org/abs/2210.11399#google
- “Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them”: https://arxiv.org/abs/2210.09261#google
- “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis: https://arxiv.org/abs/2210.03350#allen
- “Language Models Are Multilingual Chain-of-Thought Reasoners”: https://arxiv.org/abs/2210.03057#google
- “FOLIO: Natural Language Reasoning With First-Order Logic”: https://arxiv.org/abs/2209.00840
- “Can Large Language Models Reason about Medical Questions?”, Valentin Liévin, Christoffer Egeberg Hother, Ole Winther: https://arxiv.org/abs/2207.08143
- “Inner Monologue: Embodied Reasoning through Planning With Language Models”: https://arxiv.org/abs/2207.05608#google
- “Language Models (Mostly) Know What They Know”: https://arxiv.org/abs/2207.05221#anthropic
- “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models”: https://arxiv.org/abs/2205.10625#google
- “Dialog Inpainting: Turning Documents into Dialogues”, Zhuyun Dai, Arun Tejasvi Chaganty, Vincent Zhao, Aida Amini, Qazi Mamunur Rashid, Mike Green, Kelvin Guu: https://arxiv.org/abs/2205.09073#google
- “Unifying Language Learning Paradigms”: https://arxiv.org/abs/2205.05131#google
- “Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”: https://arxiv.org/abs/2204.00598#google
- “Self-Consistency Improves Chain-of-Thought Reasoning in Language Models”, Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Denny Zhou: https://arxiv.org/abs/2203.11171#google
- “It Looks Like You’re Trying To Take Over The World”, Gwern: clippy
- “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, Denny Zhou: https://arxiv.org/abs/2201.11903#google
- “Reasoning Like Program Executors”, Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Yan Gao, Qiang Fu, Jian-Guang Lou, Weizhu Chen: https://arxiv.org/abs/2201.11473#microsoft
- “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Jacob Hilton, Suchir Balaji, Reiichiro Nakano, John Schulman: https://openai.com/research/webgpt
- “Decision Transformer: Reinforcement Learning via Sequence Modeling”: https://sites.google.com/berkeley.edu/decision-transformer
- “Word in Context: Agent and Agent Clarification (69% Dev)”, Matt Brockman: http://gptprompts.wikidot.com/linguistics:word-in-context#toc3
- “I Found That Getting GPT-3 to Add Its Own ‘Internal Monologue’ in Parentheses to Be a Helpful Strategy…”, Blixt: https://news.ycombinator.com/item?id=23990902
- “Teaching GPT-3 to Do a Brute Force ‘For Loop’ Checking Answers Also Seems to Work”, KaryoKleptid: https://twitter.com/kleptid/status/1284098635689611264
- “Seems to Work”, KaryoKleptid: https://twitter.com/kleptid/status/1284069270603866113
- “Inducing Self-Explanation: a Meta-Analysis”, Kiran Bisra, Qing Liu, John C. Nesbit, Farimah Salimi, Philip H. Winne: 2018-bisra.pdf
- “Research Ideas”, Gwern: idea