OmegaPRM: Improve Mathematical Reasoning in Language Models by Automated Process Supervision
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
LLMs achieve adult human performance on higher-order theory of mind tasks
VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
HyperAttention: Long-context Attention in Near-Linear Time
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models
Simple synthetic data reduces sycophancy in large language models
SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models
q2d: Turning Questions into Dialogs to Teach Models How to Search
Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models
Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation with Interaction
Memory Augmented Large Language Models are Computationally Universal
RARR: Attributed Text Generation via Post-hoc Research and Revision
Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them
Language Models are Multilingual Chain-of-Thought Reasoners
ReAct: Synergizing Reasoning and Acting in Language Models
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Inner Monologue: Embodied Reasoning through Planning with Language Models
Solving Quantitative Reasoning Problems with Language Models
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances
Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance
PaLM § Figure 19: [Explaining a Joke / Inference Chaining] Each "Input" Was Independently Prepended With the Same 2-Shot Exemplar Shown at the Top, and "Model Output" Shows the Greedy Decoding Output of PaLM 540B. The Two Exemplar Jokes Are Known Jokes (explanations Written by the Authors), While All Evaluated Jokes Were Written by the Authors. Of Course, These Jokes Do Share Abstract Premises With Existing Jokes (wordplay, Reliability, Humorous Analogies, Reversal-Of-Expectations). The Inference Chaining Examples Were Also Written by the Authors.
6cdac06b552d242ed33f68d838d884af52e82e92.pdf#page=38&org=google
AI Will Increase the Quantity—And Quality—Of Phishing Scams
2022-ahn-figure10-saycanrobotictasksuccessratescalinginnumberoftrainingtasks.jpg
2022-ahn-figure2-saycanqueryinglanguagemodelforoptions.jpg
https://every.to/chain-of-thought/i-spent-a-week-with-gemini-pro-1-5-it-s-fantastic
https://old.reddit.com/r/singularity/comments/1atjz9v/ive_put_a_complex_codebase_into_a_single/
https://research.google/blog/google-research-2022-beyond-language-vision-and-generative-models/
https://research.google/blog/minerva-solving-quantitative-reasoning-problems-with-language-models/
https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/
https://thezvi.wordpress.com/2023/08/31/ai-27-portents-of-gemini/
https://thezvi.wordpress.com/2024/02/27/the-gemini-incident-continues/
https://thezvi.wordpress.com/2024/05/31/the-gemini-1-5-report/
https://www.dwarkeshpatel.com/p/demis-hassabis#%C2%A7timestamps
https://www.freepatentsonline.com/y2024/0104353.html#deepmind
https://www.lesswrong.com/posts/75o8oja43LXGAqbAR/palm-2-and-gpt-4-in-extrapolating-gpt-n-performance
https://www.lesswrong.com/posts/EHbJ69JDs4suovpLw/testing-palm-prompts-on-gpt3
https://www.lesswrong.com/posts/JkKeFt2u4k4Q4Bmnx/linkpost-solving-quantitative-reasoning-problems-with
https://www.lesswrong.com/posts/mLuQfS7gmfr4nwTdv/google-s-new-540-billion-parameter-language-model
https://www.reddit.com/r/GPT3/comments/twxtwg/how_gpt3_answers_the_google_pathway_sample/
https://www.reddit.com/r/singularity/comments/1atjz9v/ive_put_a_complex_codebase_into_a_single/
https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini
https://www.theverge.com/2023/3/29/23662621/google-bard-chatgpt-sharegpt-training-denies
LLMs achieve adult human performance on higher-order theory of mind tasks
https://arxiv.org/abs/2405.18870#google
VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
https://arxiv.org/abs/2310.03214#google
Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models
Simple synthetic data reduces sycophancy in large language models
https://arxiv.org/abs/2308.03958#deepmind
SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models
https://arxiv.org/abs/2305.11840#google
q2d: Turning Questions into Dialogs to Teach Models How to Search
https://arxiv.org/abs/2304.14318#google
https://arxiv.org/abs/2303.03846#google
https://arxiv.org/abs/2212.13138#google
https://arxiv.org/abs/2212.10562#google
https://arxiv.org/abs/2211.05102#google
https://arxiv.org/abs/2210.11399#google
https://arxiv.org/abs/2210.11416#google
https://arxiv.org/abs/2210.11610#google
RARR: Attributed Text Generation via Post-hoc Research and Revision
https://arxiv.org/abs/2210.08726#google
Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them
https://arxiv.org/abs/2210.09261#google
Language Models are Multilingual Chain-of-Thought Reasoners
https://arxiv.org/abs/2210.03057#google
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
https://arxiv.org/abs/2208.01448#amazon
Inner Monologue: Embodied Reasoning through Planning with Language Models
https://arxiv.org/abs/2207.05608#google
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
https://arxiv.org/abs/2205.10625#google
https://arxiv.org/abs/2205.05131#google
https://arxiv.org/abs/2204.02311#google
Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances
https://arxiv.org/abs/2204.01691#google
Wikipedia Bibliography: