Bibliography:

  1. ‘GPT’ tag

  2. ‘PaLM 2’ tag

  3. ‘LaMDA’ tag

  4. OmegaPRM: Improve Mathematical Reasoning in Language Models by Automated Process Supervision

  5. To Believe or Not to Believe Your LLM

  6. Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

  7. LLMs achieve adult human performance on higher-order theory of mind tasks

  8. VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

  9. ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

  10. Beyond Memorization: Violating Privacy Via Inference with Large Language Models

  11. HyperAttention: Long-context Attention in Near-Linear Time

  12. FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

  13. How Robust is Google’s Bard to Adversarial Image Attacks?

  14. Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models

  15. CausalLM is not optimal for in-context learning

  16. Simple synthetic data reduces sycophancy in large language models

  17. Large Language Models are Few-Shot Health Learners

  18. SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models

  19. q2d: Turning Questions into Dialogs to Teach Models How to Search

  20. Larger language models do in-context learning differently

  21. Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models

  22. Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation with Interaction

  23. Memory Augmented Large Language Models are Computationally Universal

  24. Med-PaLM: Large Language Models Encode Clinical Knowledge

  25. Character-Aware Models Improve Visual Text Rendering

  26. Efficiently Scaling Transformer Inference

  27. U-PaLM: Transcending Scaling Laws with 0.1% Extra Compute

  28. FLAN: Scaling Instruction-Finetuned Language Models

  29. Large Language Models Can Self-Improve

  30. RARR: Attributed Text Generation via Post-hoc Research and Revision

  31. Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them

  32. Language Models are Multilingual Chain-of-Thought Reasoners

  33. ReAct: Synergizing Reasoning and Acting in Language Models

  34. AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

  35. Inner Monologue: Embodied Reasoning through Planning with Language Models

  36. Solving Quantitative Reasoning Problems with Language Models

  37. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

  38. Unifying Language Learning Paradigms

  39. PaLM: Scaling Language Modeling with Pathways

  40. Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances

  41. Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance

  42. PaLM § Figure 19: [Explaining a Joke / Inference Chaining] Each “Input” was independently prepended with the same 2-shot exemplar shown at the top, and “Model Output” shows the greedy decoding output of PaLM 540B. The two exemplar jokes are known jokes (explanations written by authors), while all evaluated jokes were written by the authors. Of course, these jokes do share abstract premises with existing jokes (wordplay, reliability, humorous analogies, reversal-of-expectations). The inference chaining examples were also written by the authors.

  43. 6cdac06b552d242ed33f68d838d884af52e82e92.pdf#page=38&org=google

  44. AI Will Increase the Quantity—And Quality—Of Phishing Scams

  45. design#future-tag-features

  46. 2022-ahn-figure10-saycanrobotictasksuccessratescalinginnumberoftrainingtasks.jpg

  47. 2022-ahn-figure2-saycanqueryinglanguagemodelforoptions.jpg

  48. https://daleonai.com/bigcontextwindows

  49. https://every.to/chain-of-thought/i-spent-a-week-with-gemini-pro-1-5-it-s-fantastic

  50. 6413a29058d53aa23f7da249d170e9843a856112.html

  51. https://minerva-demo.github.io/#category=Algebra&index=1

  52. https://news.ycombinator.com/item?id=37564768

  53. 950c7fdc13bbf41fe7eaa8bda4d40370fcc3af6c.html

  54. https://old.reddit.com/r/singularity/comments/1atjz9v/ive_put_a_complex_codebase_into_a_single/

  55. 0bda7850f9c1f8b76ce61ec3d844c4aec7bb59bf.html

  56. https://research.google/blog/google-research-2022-beyond-language-vision-and-generative-models/

  57. https://research.google/blog/minerva-solving-quantitative-reasoning-problems-with-language-models/

  58. https://scale.com/leaderboard/coding

  59. https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/

  60. https://simonwillison.net/2024/Feb/21/gemini-pro-video

  61. 1cee0912bf1331b2bc9b97f4bebb4933d571ddc0.html

  62. https://thezvi.wordpress.com/2023/08/31/ai-27-portents-of-gemini/

  63. e2752311c771c38c1aefc1663328623e379f6642.html

  64. https://thezvi.wordpress.com/2024/02/27/the-gemini-incident-continues/

  65. https://thezvi.wordpress.com/2024/05/31/the-gemini-1-5-report/

  66. 604f004b22cd0eb150b457866f13f3517e96b08d.html

  67. https://www.dwarkeshpatel.com/p/demis-hassabis#%C2%A7timestamps

  68. https://www.freepatentsonline.com/y2024/0104353.html#deepmind

  69. 60589a2ab3b4503180e5d189b4a77b0c00730e73.html#deepmind

  70. https://www.lasso.security/blog/ai-package-hallucinations

  71. https://www.lesswrong.com/posts/75o8oja43LXGAqbAR/palm-2-and-gpt-4-in-extrapolating-gpt-n-performance

  72. https://www.lesswrong.com/posts/EHbJ69JDs4suovpLw/testing-palm-prompts-on-gpt3

  73. https://www.lesswrong.com/posts/JkKeFt2u4k4Q4Bmnx/linkpost-solving-quantitative-reasoning-problems-with

  74. https://www.lesswrong.com/posts/YzbQeCiwoLBHrvAh4

  75. https://www.lesswrong.com/posts/mLuQfS7gmfr4nwTdv/google-s-new-540-billion-parameter-language-model

  76. https://www.reddit.com/r/GPT3/comments/twxtwg/how_gpt3_answers_the_google_pathway_sample/

  77. 8ce410796a67d9d5edd61c93cc43bcc973320dce.html

  78. https://www.reddit.com/r/singularity/comments/1atjz9v/ive_put_a_complex_codebase_into_a_single/

  79. https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini

  80. https://www.theverge.com/2023/3/29/23662621/google-bard-chatgpt-sharegpt-training-denies

  81. https://x.com/JeffDean/status/1770653917543870571

  82. LLMs achieve adult human performance on higher-order theory of mind tasks

  83. https://arxiv.org/abs/2405.18870#google

  84. VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

  85. https://arxiv.org/abs/2404.05955

  86. ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

  87. https://arxiv.org/abs/2402.11753

  88. FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

  89. Jason Wei

  90. https://arxiv.org/abs/2310.03214#google

  91. How Robust is Google’s Bard to Adversarial Image Attacks?

  92. https://arxiv.org/abs/2309.11751

  93. Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models

  94. https://arxiv.org/abs/2308.12287

  95. Simple synthetic data reduces sycophancy in large language models

  96. https://arxiv.org/abs/2308.03958#deepmind

  97. SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models

  98. https://arxiv.org/abs/2305.11840#google

  99. q2d: Turning Questions into Dialogs to Teach Models How to Search

  100. https://arxiv.org/abs/2304.14318#google

  101. Larger language models do in-context learning differently

  102. Jason Wei

  103. Yi Tay

  104. https://arxiv.org/abs/2303.03846#google

  105. Med-PaLM: Large Language Models Encode Clinical Knowledge

  106. Jason Wei

  107. https://arxiv.org/abs/2212.13138#google

  108. Character-Aware Models Improve Visual Text Rendering

  109. William Chan

  110. https://arxiv.org/abs/2212.10562#google

  111. Efficiently Scaling Transformer Inference

  112. https://x.com/jekbradbury

  113. https://arxiv.org/abs/2211.05102#google

  114. U-PaLM: Transcending Scaling Laws with 0.1% Extra Compute

  115. Yi Tay

  116. Jason Wei

  117. Neil Houlsby

  118. https://arxiv.org/abs/2210.11399#google

  119. FLAN: Scaling Instruction-Finetuned Language Models

  120. Barret Zoph

  121. Yi Tay

  122. Jason Wei

  123. https://arxiv.org/abs/2210.11416#google

  124. Large Language Models Can Self-Improve

  125. https://arxiv.org/abs/2210.11610#google

  126. RARR: Attributed Text Generation via Post-hoc Research and Revision

  127. https://arxiv.org/abs/2210.08726#google

  128. Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them

  129. Yi Tay

  130. Jason Wei

  131. https://arxiv.org/abs/2210.09261#google

  132. Language Models are Multilingual Chain-of-Thought Reasoners

  133. Yi Tay

  134. Jason Wei

  135. https://arxiv.org/abs/2210.03057#google

  136. AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

  137. https://arxiv.org/abs/2208.01448#amazon

  138. Inner Monologue: Embodied Reasoning through Planning with Language Models

  139. Igor Mordatch

  140. Sergey Levine

  141. https://arxiv.org/abs/2207.05608#google

  142. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

  143. Jason Wei

  144. https://arxiv.org/abs/2205.10625#google

  145. Unifying Language Learning Paradigms

  146. Yi Tay

  147. Neil Houlsby

  148. https://arxiv.org/abs/2205.05131#google

  149. PaLM: Scaling Language Modeling with Pathways

  150. Yi Tay

  151. https://x.com/jekbradbury

  152. Vedant Misra

  153. Barret Zoph

  154. Jason Wei

  155. https://arxiv.org/abs/2204.02311#google

  156. Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances

  157. https://evjang.com/about/

  158. Sergey Levine

  159. https://arxiv.org/abs/2204.01691#google