Bibliography (24):

  1. GPT-3: Language Models are Few-Shot Learners

  2. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

  3. Measuring and Narrowing the Compositionality Gap in Language Models (https://arxiv.org/pdf/2210.03350#page=17)

  4. Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps

  5. MuSiQue: Multi-hop Questions via Single-hop Question Composition

  6. Large Language Models are Zero-Shot Reasoners

  7. InstructGPT: Training language models to follow instructions with human feedback

  8. Measuring and Narrowing the Compositionality Gap in Language Models (https://arxiv.org/pdf/2210.03350#page=24)

  9. RealTime QA: What’s the Answer Right Now?

  10. SerpApi (https://serpapi.com/)

  11. Measuring and Narrowing the Compositionality Gap in Language Models (https://arxiv.org/pdf/2210.03350#page=24)

  12. Large Language Models Can Self-Improve

  13. Language Model Cascades

  14. Self-Consistency Improves Chain-of-Thought Reasoning in Language Models

  15. Language Models (Mostly) Know What They Know

  16. Boosting Theory-of-Mind Performance in Large Language Models via Prompting

  17. Ask Me Anything (AMA): A simple strategy for prompting language models

  18. What Can a Generative Language Model Answer About a Passage?

  19. Language Models are Multilingual Chain-of-Thought Reasoners

  20. Predictability and Surprise in Large Generative Models

  21. WebGPT: Improving the factual accuracy of language models through web browsing