Scalable Watermarking for Identifying Large Language Model Outputs
Inference Scaling for Long-Context Retrieval Augmented Generation
Project Zero: From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code
Training Language Models to Self-Correct via Reinforcement Learning
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models (ReSTEM)
Universal Self-Consistency for Large Language Model Generation
Instruction-Following Evaluation for Large Language Models
A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models
PAIR: Jailbreaking Black Box Large Language Models in 20 Queries
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Android in the Wild: A Large-Scale Dataset for Android Device Control
Google’s newest AI model uses nearly 5× more text data for training than its predecessor
What Happened to BERT & T5? On Transformer Encoders, PrefixLM and Denoising Objectives
https://blog.google/technology/ai/google-palm-2-ai-large-language-model/
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
https://arxiv.org/abs/2406.13121
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
https://arxiv.org/abs/2403.18802
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models (ReSTEM)
https://arxiv.org/abs/2312.06585
Google’s newest AI model uses nearly 5× more text data for training than its predecessor
https://www.cnbc.com/2023/05/16/googles-palm-2-uses-nearly-five-times-more-text-data-than-predecessor.html
Wikipedia Bibliography: