- See Also
- Links
- “Contrastive Decoding Improves Reasoning in Large Language Models”, O’Brien & Lewis 2023
- “Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-based Self-Verification”, Zhou et al 2023
- “LLMs As Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines With LLMs”, Wu et al 2023
- “Android in the Wild: A Large-Scale Dataset for Android Device Control”, Rawles et al 2023
- “TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT”, Zha et al 2023
- “Explaining Competitive-Level Programming Solutions Using LLMs”, Li et al 2023
- “Teaching Arithmetic to Small Transformers”, Lee et al 2023
- “Let’s Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning”, Ma et al 2023
- “GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models”, Agarwal et al 2023
- “Large Language Models As Tax Attorneys: A Case Study in Legal Capabilities Emergence”, Nay et al 2023
- “Iterative Translation Refinement With Large Language Models”, Chen et al 2023
- “Let’s Verify Step by Step”, Lightman et al 2023
- “Improving Factuality and Reasoning in Language Models through Multiagent Debate”, Du et al 2023
- “How Language Model Hallucinations Can Snowball”, Zhang et al 2023
- “Tree of Thoughts (ToT): Deliberate Problem Solving With Large Language Models”, Yao et al 2023
- “Large Language Model Programs”, Schlag et al 2023
- “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting”, Turpin et al 2023
- “Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding”, Xie et al 2023
- “Boosting Theory-of-Mind Performance in Large Language Models via Prompting”, Moghaddam & Honey 2023
- “LLM+P: Empowering Large Language Models With Optimal Planning Proficiency”, Liu et al 2023
- “Think Before You Act: Unified Policy for Interleaving Language Reasoning With Actions”, Mezghani et al 2023
- “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023
- “Language Is Not All You Need: Aligning Perception With Language Models (Kosmos-1)”, Huang et al 2023
- “Multimodal Chain-of-Thought Reasoning in Language Models”, Zhang et al 2023
- “Large Language Models Are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning”, Ye et al 2023
- “Faithful Chain-of-Thought Reasoning”, Lyu et al 2023
- “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, Nay 2023
- “ChatGPT Goes to Law School”, Choi et al 2023
- “Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation With Interaction”, Pilault et al 2023
- “Solving Math Word Problems With Process & Outcome-based Feedback”, Uesato et al 2022
- “PAL: Program-aided Language Models”, Gao et al 2022
- “Measuring Progress on Scalable Oversight for Large Language Models”, Bowman et al 2022
- “Large Language Models Can Self-Improve”, Huang et al 2022
- “U-PaLM: Transcending Scaling Laws With 0.1% Extra Compute”, Tay et al 2022
- “Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them”, Suzgun et al 2022
- “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Press et al 2022
- “ReAct: Synergizing Reasoning and Acting in Language Models”, Yao et al 2022
- “Language Models Are Multilingual Chain-of-Thought Reasoners”, Shi et al 2022
- “Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning”, Lu et al 2022
- “FOLIO: Natural Language Reasoning With First-Order Logic”, Han et al 2022
- “Faithful Reasoning Using Large Language Models”, Creswell & Shanahan 2022
- “Limitations of Language Models in Arithmetic and Symbolic Induction”, Qian et al 2022
- “Language Models Can Teach Themselves to Program Better”, Haluptzok et al 2022
- “CodeT: Code Generation With Generated Tests”, Chen et al 2022
- “Language Model Cascades”, Dohan et al 2022
- “Can Large Language Models Reason about Medical Questions?”, Liévin et al 2022
- “Inner Monologue: Embodied Reasoning through Planning With Language Models”, Huang et al 2022
- “Language Models (Mostly) Know What They Know”, Kadavath et al 2022
- “Exploring Length Generalization in Large Language Models”, Anil et al 2022
- “Solving Quantitative Reasoning Problems With Language Models”, Lewkowycz et al 2022
- “Large Language Models Are Zero-Shot Reasoners”, Kojima et al 2022
- “Maieutic Prompting: Logically Consistent Reasoning With Recursive Explanations”, Jung et al 2022
- “Instruction Induction: From Few Examples to Natural Language Task Descriptions”, Honovich et al 2022
- “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models”, Zhou et al 2022
- “Dialog Inpainting: Turning Documents into Dialogues”, Dai et al 2022
- “Unifying Language Learning Paradigms”, Tay et al 2022
- “Can Language Models Learn from Explanations in Context?”, Lampinen et al 2022
- “Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”, Zeng et al 2022
- “STaR: Bootstrapping Reasoning With Reasoning”, Zelikman et al 2022
- “A Conversational Paradigm for Program Synthesis”, Nijkamp et al 2022
- “Self-Consistency Improves Chain-of-Thought Reasoning in Language Models”, Wang et al 2022
- “Learning-by-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension”, Zhao et al 2022
- “PromptChainer: Chaining Large Language Model Prompts through Visual Programming”, Wu et al 2022
- “It Looks Like You’re Trying To Take Over The World”, Gwern 2022
- “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, Wei et al 2022
- “Reasoning Like Program Executors”, Pi et al 2022
- “A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More”, Drori et al 2021
- “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Hilton et al 2021
- “NeuroLogic A✱esque Decoding: Constrained Text Generation With Lookahead Heuristics”, Lu et al 2021
- “Reframing Human-AI Collaboration for Generating Free-Text Explanations”, Wiegreffe et al 2021
- “DREAM: Uncovering Mental Models behind Language Models”, Gu et al 2021
- “Few-Shot Self-Rationalization With Natural Language Prompts”, Marasović et al 2021
- “Training Verifiers to Solve Math Word Problems”, Cobbe et al 2021
- “Unsupervised Neural Machine Translation With Generative Language Models Only”, Han et al 2021
- “Show Your Work: Scratchpads for Intermediate Computation With Language Models”, Nye et al 2021
- “AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts”, Wu et al 2021
- “Teaching Autoregressive Language Models Complex Tasks By Demonstration”, Recchia 2021
- “Program Synthesis With Large Language Models”, Austin et al 2021
- “Decision Transformer: Reinforcement Learning via Sequence Modeling”, Chen et al 2021
- “Explainable Multi-hop Verbal Reasoning Through Internal Monologue”, Liang et al 2021
- “A Simple Method to Keep GPT-3 Focused in a Conversation”, Mayne 2021
- “Measuring Mathematical Problem Solving With the MATH Dataset”, Hendrycks et al 2021
- “Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm”, Reynolds & McDonell 2021
- “How We Accidentally Gave Our Bots Their Personalities”, Latitude 2021
- “Word in Context: Agent and Agent Clarification (69% Dev)”, Brockman 2020
- “I Found That Getting GPT-3 to Add Its Own "internal Monologue" in Parentheses to Be a Helpful Strategy…”, blixt 2020
- kleptid @ "2020-07-17"
- kleptid @ "2020-07-17"
- “Inducing Self-Explanation: a Meta-Analysis”, Bisra et al 2018
- “Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems”, Ling et al 2017
- “Research Ideas”, Gwern 2017
- “Why Do Humans Reason? Arguments for an Argumentative Theory”, Mercier & Sperber 2011
- “How to Dramatically Improve the Reasoning Ability of GPT-3”
- “AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button.”
- “Solving Math Word Problems: We’ve Trained a System That Solves Grade School Math Problems With Nearly Twice the Accuracy of a Fine-tuned GPT-3 Model. It Solves about 90% As Many Problems As Real Kids: a Small Sample of 9-12 Year Olds Scored 60% on a Test from Our Dataset, While Our System Scored 55% on Those Same Problems. This Is Important Because Today’s AI Is Still Quite Weak at Commonsense Multistep Reasoning, Which Is Easy Even for Grade School Kids. We Achieved These Results by Training Our Model to Recognize Its Mistakes, so That It Can Try Repeatedly Until It Finds a Solution That Works”
- Sort By Magic
- Wikipedia
- Miscellaneous
- Link Bibliography
Inner Monologue (by analogy to human inner-monologue) is a family of prompt engineering tricks for large language models which make them solve problems in a ‘step by step’ verbalized way; it is particularly effective on multi-step tasks with ‘one right answer’ such as math word & programming problems.
It can be induced by few-shot examples of several solved problems, by finetuning on a suitable corpus (eg. InstructGPT), or by a carefully-chosen prompt which sets up a ‘dialogue’ (the original discovery) or gives instructions (eg. “let’s think step by step”). It can be combined with better sampling strategies like best-of ranking, majority voting, or a critic; self-distillation on its own monologue outputs (possibly repeatedly); additional data like unit tests or retrieval results; & access to oracles like REPLs or humans.
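As a concrete illustration (a minimal sketch, not any particular paper’s implementation; `complete()` is a hypothetical stand-in for whatever completion API is available, and the last-number heuristic for answer extraction is just one simple choice), the zero-shot instruction variant combined with majority voting over sampled monologues (self-consistency) might look like:

```python
import re
from collections import Counter

def complete(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical wrapper around an LLM completion API; swap in a real client."""
    raise NotImplementedError

def solve_step_by_step(question: str, n_samples: int = 5) -> str:
    """Zero-shot inner-monologue prompting plus self-consistency (majority vote)."""
    prompt = f"Q: {question}\nA: Let's think step by step."
    answers = []
    for _ in range(n_samples):
        monologue = complete(prompt)                  # model verbalizes intermediate steps
        numbers = re.findall(r"-?\d+(?:\.\d+)?", monologue)
        if numbers:                                   # crude heuristic: last number = final answer
            answers.append(numbers[-1])
    if not answers:                                   # fall back to direct answering
        return complete(f"Q: {question}\nA:", temperature=0.0).strip()
    return Counter(answers).most_common(1)[0][0]      # majority vote across sampled monologues
```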
It was discovered in July 2020 by early OA API & AI Dungeon 2 users, who found that GPT-3/‘Dragon’ would fail to solve most simple arithmetic problems like multiplication (as reported in the GPT-3 paper), but could be coaxed into solving them by setting up a fictional dialogue in which the player walks a ‘character’ through solving the problem step by step. It has been rediscovered repeatedly since (eg. as “scratchpad” or “chain-of-thought”).
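An illustrative reconstruction of that trick (not a verbatim 2020 transcript; the dialogue, wording, and numbers here are invented for illustration): the player frames the question as a dialogue so the ‘character’ narrates the sub-steps, and the model continues in the same style.

```python
# Illustrative reconstruction, not a verbatim AI Dungeon transcript.
prompt = """You are talking to a brilliant mathematician who explains every calculation.
You: What is 217 * 43? Please work it out step by step.
Mathematician: Certainly. 217 * 40 = 8,680. Then 217 * 3 = 651.
Adding them: 8,680 + 651 = 9,331. So 217 * 43 = 9,331.
You: What is 318 * 26? Please work it out step by step.
Mathematician:"""
```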
Inner-monologue is interesting because it: is a simple prompting technique which dramatically improves benchmark performance (“sampling can show the presence of knowledge but not the absence”); was not predicted but discovered empirically after model release; appears to emerge only in large language models (>80b dense parameters); can have increasing returns to scale; can scale performance even when naive prompting has flat scaling (“hidden scaling”); adds an RNN-esque flavor to feedforward language models; and involves planning (cf. Socratic Models/SayCan). It has also not been integrated into model training in any extensive way, and the limits of self-training & exploration are unknown.
A toy-model for how inner-monologue works is that such problems are sequential: when calculating out an arithmetic problem, an error in any step makes all following steps wrong. Such a process is a multiplicative pipeline where failure rates compound: ie. a per-step success rate of P over n steps yields an overall correctness rate of P^n, which shrinks rapidly as n grows or P falls. So inner-monologue makes the meta-learning task easier by being more specific, and by reducing the problem to easier sub-tasks, potentially increasing the success rate far more than alternatives like scaling the model a few times (eg. on a 5-step problem, P = 90% vs P = 99% yields roughly 60% vs 95% overall; achieving that improvement by pure scaling of naive prompts might require >10× scaling). Small models then aren’t smart enough to ‘get it’ from the instructions, and their baseline error rate is too high to execute steps reliably enough to see much gain.
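That multiplicative toy model is easy to check numerically (a sketch; the 5-step example and the 90% vs 99% step-success rates are the ones used above, assuming independent errors per step):

```python
def chain_accuracy(step_success: float, n_steps: int) -> float:
    """Overall accuracy when all n sequential steps must succeed independently."""
    return step_success ** n_steps

for p in (0.90, 0.99):
    print(f"P = {p:.0%}, 5 steps -> {chain_accuracy(p, 5):.1%}")
# P = 90%, 5 steps -> 59.0%
# P = 99%, 5 steps -> 95.1%
```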
I speculate the reason inner-monologue is not the model default, despite predicting the answer so much more accurately, may be the lack of an implicit memory mechanism in which a model could adaptively execute computations when predicting the next token. Because models like GPT-3 or PaLM have no recurrent state, they must fake it by reusing their predicted output as a working memory. However, such a ‘show-your-work’ writing style is highly unusual in the natural-language distribution they are trained to imitate, so they will not adopt it by default without a prompt steering them towards it; instead they try to emit the answer immediately, which is impossible given their feedforward limitation, and so they guess incorrectly.
See Also
Links
“Contrastive Decoding Improves Reasoning in Large Language Models”, O’Brien & Lewis 2023
“Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-based Self-Verification”, Zhou et al 2023
“LLMs As Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines With LLMs”, Wu et al 2023
“Android in the Wild: A Large-Scale Dataset for Android Device Control”, Rawles et al 2023
“TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT”, Zha et al 2023
“Explaining Competitive-Level Programming Solutions Using LLMs”, Li et al 2023
“Teaching Arithmetic to Small Transformers”, Lee et al 2023
“Let’s Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning”, Ma et al 2023
“GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models”, Agarwal et al 2023
“Large Language Models As Tax Attorneys: A Case Study in Legal Capabilities Emergence”, Nay et al 2023
“Iterative Translation Refinement With Large Language Models”, Chen et al 2023
“Let’s Verify Step by Step”, Lightman et al 2023
“Improving Factuality and Reasoning in Language Models through Multiagent Debate”, Du et al 2023
“How Language Model Hallucinations Can Snowball”, Zhang et al 2023
“Tree of Thoughts (ToT): Deliberate Problem Solving With Large Language Models”, Yao et al 2023
“Large Language Model Programs”, Schlag et al 2023
“Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting”, Turpin et al 2023
“Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding”, Xie et al 2023
“Boosting Theory-of-Mind Performance in Large Language Models via Prompting”, Moghaddam & Honey 2023
“LLM+P: Empowering Large Language Models With Optimal Planning Proficiency”, Liu et al 2023
“Think Before You Act: Unified Policy for Interleaving Language Reasoning With Actions”, Mezghani et al 2023
“How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023
“Language Is Not All You Need: Aligning Perception With Language Models (Kosmos-1)”, Huang et al 2023
“Multimodal Chain-of-Thought Reasoning in Language Models”, Zhang et al 2023
“Large Language Models Are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning”, Ye et al 2023
“Faithful Chain-of-Thought Reasoning”, Lyu et al 2023
“Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, Nay 2023
“ChatGPT Goes to Law School”, Choi et al 2023
“Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation With Interaction”, Pilault et al 2023
“Solving Math Word Problems With Process & Outcome-based Feedback”, Uesato et al 2022
“PAL: Program-aided Language Models”, Gao et al 2022
“Measuring Progress on Scalable Oversight for Large Language Models”, Bowman et al 2022
“Large Language Models Can Self-Improve”, Huang et al 2022
“U-PaLM: Transcending Scaling Laws With 0.1% Extra Compute”, Tay et al 2022
“Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them”, Suzgun et al 2022
“Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Press et al 2022
“ReAct: Synergizing Reasoning and Acting in Language Models”, Yao et al 2022
“Language Models Are Multilingual Chain-of-Thought Reasoners”, Shi et al 2022
“Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning”, Lu et al 2022
“FOLIO: Natural Language Reasoning With First-Order Logic”, Han et al 2022
“Faithful Reasoning Using Large Language Models”, Creswell & Shanahan 2022
“Limitations of Language Models in Arithmetic and Symbolic Induction”, Qian et al 2022
“Language Models Can Teach Themselves to Program Better”, Haluptzok et al 2022
“CodeT: Code Generation With Generated Tests”, Chen et al 2022
“Language Model Cascades”, Dohan et al 2022
“Can Large Language Models Reason about Medical Questions?”, Liévin et al 2022
“Inner Monologue: Embodied Reasoning through Planning With Language Models”, Huang et al 2022
“Language Models (Mostly) Know What They Know”, Kadavath et al 2022
“Exploring Length Generalization in Large Language Models”, Anil et al 2022
“Solving Quantitative Reasoning Problems With Language Models”, Lewkowycz et al 2022
“Large Language Models Are Zero-Shot Reasoners”, Kojima et al 2022
“Maieutic Prompting: Logically Consistent Reasoning With Recursive Explanations”, Jung et al 2022
“Instruction Induction: From Few Examples to Natural Language Task Descriptions”, Honovich et al 2022
“Least-to-Most Prompting Enables Complex Reasoning in Large Language Models”, Zhou et al 2022
“Dialog Inpainting: Turning Documents into Dialogues”, Dai et al 2022
“Unifying Language Learning Paradigms”, Tay et al 2022
“Can Language Models Learn from Explanations in Context?”, Lampinen et al 2022
“Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”, Zeng et al 2022
“STaR: Bootstrapping Reasoning With Reasoning”, Zelikman et al 2022
“A Conversational Paradigm for Program Synthesis”, Nijkamp et al 2022
“Self-Consistency Improves Chain-of-Thought Reasoning in Language Models”, Wang et al 2022
“Learning-by-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension”, Zhao et al 2022
“PromptChainer: Chaining Large Language Model Prompts through Visual Programming”, Wu et al 2022
“It Looks Like You’re Trying To Take Over The World”, Gwern 2022
“Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, Wei et al 2022
“Reasoning Like Program Executors”, Pi et al 2022
“A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More”, Drori et al 2021
“WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Hilton et al 2021
“NeuroLogic A✱esque Decoding: Constrained Text Generation With Lookahead Heuristics”, Lu et al 2021
“Reframing Human-AI Collaboration for Generating Free-Text Explanations”, Wiegreffe et al 2021
“DREAM: Uncovering Mental Models behind Language Models”, Gu et al 2021
“Few-Shot Self-Rationalization With Natural Language Prompts”, Marasović et al 2021
“Training Verifiers to Solve Math Word Problems”, Cobbe et al 2021
“Unsupervised Neural Machine Translation With Generative Language Models Only”, Han et al 2021
“Show Your Work: Scratchpads for Intermediate Computation With Language Models”, Nye et al 2021
“AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts”, Wu et al 2021
“Teaching Autoregressive Language Models Complex Tasks By Demonstration”, Recchia 2021
“Program Synthesis With Large Language Models”, Austin et al 2021
“Decision Transformer: Reinforcement Learning via Sequence Modeling”, Chen et al 2021
“Explainable Multi-hop Verbal Reasoning Through Internal Monologue”, Liang et al 2021
“A Simple Method to Keep GPT-3 Focused in a Conversation”, Mayne 2021
“Measuring Mathematical Problem Solving With the MATH Dataset”, Hendrycks et al 2021
“Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm”, Reynolds & McDonell 2021
“How We Accidentally Gave Our Bots Their Personalities”, Latitude 2021
“Word in Context: Agent and Agent Clarification (69% Dev)”, Brockman 2020
“I Found That Getting GPT-3 to Add Its Own "internal Monologue" in Parentheses to Be a Helpful Strategy…”, blixt 2020
kleptid @ "2020-07-17"
“Teaching GPT-3 to do a brute force 'for loop' checking answers also seems to work”
kleptid @ "2020-07-17"
“Inducing Self-Explanation: a Meta-Analysis”, Bisra et al 2018
“Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems”, Ling et al 2017
“Research Ideas”, Gwern 2017
“Why Do Humans Reason? Arguments for an Argumentative Theory”, Mercier & Sperber 2011
“How to Dramatically Improve the Reasoning Ability of GPT-3”
“AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button.”
“Solving Math Word Problems: We’ve Trained a System That Solves Grade School Math Problems With Nearly Twice the Accuracy of a Fine-tuned GPT-3 Model. It Solves about 90% As Many Problems As Real Kids: a Small Sample of 9-12 Year Olds Scored 60% on a Test from Our Dataset, While Our System Scored 55% on Those Same Problems. This Is Important Because Today’s AI Is Still Quite Weak at Commonsense Multistep Reasoning, Which Is Easy Even for Grade School Kids. We Achieved These Results by Training Our Model to Recognize Its Mistakes, so That It Can Try Repeatedly Until It Finds a Solution That Works”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
Reasoning
math-problems
machine-learning
Wikipedia
Miscellaneous
- /doc/ai/nn/transformer/gpt/inner-monologue/2023-lee-figure1-numberformattingforgpt2arithmetic.png
- /doc/ai/nn/transformer/gpt/inner-monologue/2022-05-28-gpt3user-thinkingisallyouneed.html
- /doc/ai/nn/transformer/gpt/inner-monologue/2022-zeng-figure2-socraticmodelsworkflowoverview.png
- /doc/ai/nn/transformer/gpt/inner-monologue/2022-wei-figure8-lamdavsgpt3.png
- /doc/ai/nn/transformer/gpt/inner-monologue/2022-tay-ul2-innermonologueresults.png
- https://blog.research.google/2022/06/minerva-solving-quantitative-reasoning.html
- https://blog.research.google/2023/01/google-research-2022-beyond-language.html
- https://builtin.com/job/customer-success/expert-ai-teacher-contract/1267315
- https://generative.ink/posts/methods-of-prompt-programming/#serializing-reasoning
- https://gist.github.com/brockmanmatt/deafb4dba7e4399327e44f2c8fd97b2b
- https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md
- https://statmodeling.stat.columbia.edu/2023/08/30/chatgpt-4-can-do-3-digit-multiplication/
- https://towardsdatascience.com/1-1-3-wait-no-1-1-2-how-to-have-gpt-sanity-check-itself-136e846987bf
- https://twitter.com/DaveMonlander/status/1612802240582135809
- https://twitter.com/KevinAFischer/status/1646018246225846272
- https://twitter.com/KevinAFischer/status/1646677902833102849
- https://twitter.com/KevinAFischer/status/1646690838981005312
- https://twitter.com/StudentInfosec/status/1640360234882310145
- https://twitter.com/andrewwhite01/status/1616933106786738176
- https://twitter.com/peterwildeford/status/1522633978305560576
- https://twitter.com/yoheinakajima/status/1670557048743010305
- https://www.fhi.ox.ac.uk/wp-content/uploads/2021/08/QNRs_FHI-TR-2021-3.0.pdf
- https://www.lesswrong.com/posts/yDcMDJeSck7SuBs24/steganography-in-chain-of-thought-reasoning
- https://www.lesswrong.com/posts/zRn6cLtxyNodudzhw/visible-thoughts-project-and-bounty-announcement
- https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/
- https://www.reddit.com/r/ChatGPT/comments/10zavbv/extending_chatgpt_with_some_additional_internal/
- https://www.reddit.com/r/ChatGPT/comments/11anct1/its_easy_to_give_chatgpt_a_bonafide_consciousness/
- https://www.waluigipurple.com/post/revising-poetry-with-gpt-4
Link Bibliography
- https://arxiv.org/abs/2309.09117#facebook: “Contrastive Decoding Improves Reasoning in Large Language Models”, Sean O’Brien, Mike Lewis
- https://arxiv.org/abs/2308.07921: “Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-based Self-Verification”
- https://arxiv.org/abs/2307.03381: “Teaching Arithmetic to Small Transformers”, Nayoung Lee, Kartik Sreenivasan, Jason D. Lee, Kangwook Lee, Dimitris Papailiopoulos
- https://arxiv.org/abs/2306.14308#google: “Let’s Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning”, Xiao Ma, Swaroop Mishra, Ahmad Beirami, Alex Beutel, Jilin Chen
- https://arxiv.org/abs/2305.13534: “How Language Model Hallucinations Can Snowball”, Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith
- https://arxiv.org/abs/2305.10601#deepmind: “Tree of Thoughts (ToT): Deliberate Problem Solving With Large Language Models”, Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan
- https://arxiv.org/abs/2305.04388: “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting”, Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman
- https://arxiv.org/abs/2304.11490: “Boosting Theory-of-Mind Performance in Large Language Models via Prompting”, Shima Rahimi Moghaddam, Christopher J. Honey
- https://arxiv.org/abs/2304.02015#alibaba: “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335945: “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, John Nay
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335905: “ChatGPT Goes to Law School”, Jonathan H. Choi, Kristin E. Hickman, Amy Monahan, Daniel Schwarcz
- https://arxiv.org/abs/2210.11610#google: “Large Language Models Can Self-Improve”, Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han
- https://arxiv.org/abs/2210.11399#google: “U-PaLM: Transcending Scaling Laws With 0.1% Extra Compute”
- https://arxiv.org/abs/2210.09261#google: “Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them”
- https://arxiv.org/abs/2210.03350#allen: “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis
- https://arxiv.org/abs/2210.03057#google: “Language Models Are Multilingual Chain-of-Thought Reasoners”
- https://arxiv.org/abs/2209.00840: “FOLIO: Natural Language Reasoning With First-Order Logic”
- https://arxiv.org/abs/2207.08143: “Can Large Language Models Reason about Medical Questions?”, Valentin Liévin, Christoffer Egeberg Hother, Ole Winther
- https://arxiv.org/abs/2207.05608#google: “Inner Monologue: Embodied Reasoning through Planning With Language Models”
- https://arxiv.org/abs/2207.05221#anthropic: “Language Models (Mostly) Know What They Know”
- https://arxiv.org/abs/2205.10625#google: “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models”
- https://arxiv.org/abs/2205.09073#google: “Dialog Inpainting: Turning Documents into Dialogues”, Zhuyun Dai, Arun Tejasvi Chaganty, Vincent Zhao, Aida Amini, Qazi Mamunur Rashid, Mike Green, Kelvin Guu
- https://arxiv.org/abs/2205.05131#google: “Unifying Language Learning Paradigms”
- https://arxiv.org/abs/2204.00598#google: “Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”
- https://arxiv.org/abs/2203.11171#google: “Self-Consistency Improves Chain-of-Thought Reasoning in Language Models”, Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Denny Zhou
- clippy: “It Looks Like You’re Trying To Take Over The World”, Gwern
- https://arxiv.org/abs/2201.11903#google: “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, Denny Zhou
- https://arxiv.org/abs/2201.11473#microsoft: “Reasoning Like Program Executors”, Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Yan Gao, Qiang Fu, Jian-Guang Lou, Weizhu Chen
- https://openai.com/research/webgpt: “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Jacob Hilton, Suchir Balaji, Reiichiro Nakano, John Schulman
- https://sites.google.com/berkeley.edu/decision-transformer: “Decision Transformer: Reinforcement Learning via Sequence Modeling”
- https://gptprompts.wikidot.com/linguistics:word-in-context#toc3: “Word in Context: Agent and Agent Clarification (69% Dev)”, Matt Brockman
- https://news.ycombinator.com/item?id=23990902: “I Found That Getting GPT-3 to Add Its Own "internal Monologue" in Parentheses to Be a Helpful Strategy…”, blixt
- https://twitter.com/kleptid/status/1284098635689611264: “Teaching GPT-3 to Do a Brute Force 'for Loop' Checking Answers Also Seems to Work”, KaryoKleptid
- https://twitter.com/kleptid/status/1284069270603866113: “Seems to Work”, KaryoKleptid
- 2018-bisra.pdf: “Inducing Self-Explanation: a Meta-Analysis”, Kiran Bisra, Qing Liu, John C. Nesbit, Farimah Salimi, Philip H. Winne
- idea: “Research Ideas”, Gwern