‘PaLM 2’ directory
See Also
Links
“Idiosyncrasies in Large Language Models”, Sun et al 2025
“SycEval: Evaluating LLM Sycophancy”, Fanous et al 2025
“Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs”, Saxena et al 2025
“How Different LLMs Answered the PhilPapers 2020 Survey”, Satron 2025
“Ingesting Millions of PDFs and Why Gemini 2.0 Changes Everything”, Filimonov 2025
“Proactive Agents for Multi-Turn Text-To-Image Generation Under Uncertainty”, Hahn et al 2024
“Alphabet Q3 Earnings Call: CEO Sundar Pichai’s Remarks”
“AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents”, Andriushchenko et al 2024
“Scalable Watermarking for Identifying Large Language Model Outputs”
“Inference Scaling for Long-Context Retrieval Augmented Generation”, Yue et al 2024
“Project Zero: From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code”
“Training Language Models to Self-Correct via Reinforcement Learning”, Kumar et al 2024
“On Scalable Oversight With Weak LLMs Judging Strong LLMs”, Kenton et al 2024
“Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?”, Lee et al 2024
“What Are the Odds? Language Models Are Capable of Probabilistic Reasoning”, Paruchuri et al 2024
“Can Language Models Use Forecasting Strategies?”, Pratt et al 2024
“Grokked Transformers Are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization”, Wang et al 2024
“Many-Shot In-Context Learning”, Agarwal et al 2024
“Few-Shot Recalibration of Language Models”, Li et al 2024
“Long-Form Factuality in Large Language Models”, Wei et al 2024
“Don’t Trust: Verify—Grounding LLM Quantitative Reasoning With Autoformalization”, Zhou et al 2024
“When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method”, Zhang et al 2024
“ReST Meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent”, Aksitov et al 2023
“Rich Human Feedback for Text-To-Image Generation”, Liang et al 2023
“Beyond Human Data: Scaling Self-Training for Problem-Solving With Language Models (ReSTEM)”, Singh et al 2023
“Universal Self-Consistency for Large Language Model Generation”, Chen et al 2023
“Instruction-Following Evaluation for Large Language Models”, Zhou et al 2023
“A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models”, Eisape et al 2023
“PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Chao et al 2023
“RLAIF: Scaling Reinforcement Learning from Human Feedback With AI Feedback”, Lee et al 2023
“Android in the Wild: A Large-Scale Dataset for Android Device Control”, Rawles et al 2023
“Google’s Newest AI Model Uses Nearly 5× More Text Data for Training Than Its Predecessor”, Elias 2023
“Pretraining Language Models With Human Preferences”, Korbak et al 2023
“Working With AI (Part 2): Code Conversion”
“Adversarial Misuse of Generative AI”
“How Good Are LLMs at Doing ML on an Unknown Dataset?”
“What Happened to BERT & T5? On Transformer Encoders, PrefixLM and Denoising Objectives”, Tay 2025
Sort By Magic
Annotations are sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections and auto-labeled for easier browsing.
Beginning with the newest annotation, the sort uses each annotation's embedding to build a list of nearest-neighbor annotations, so that adjacent entries form a progression of topics. For more details, see the link.
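The ordering described above can be illustrated with a small greedy nearest-neighbor chain. This is only a sketch under stated assumptions, not the site's actual implementation: the function name `sort_by_magic`, the use of cosine similarity, and the toy two-dimensional embeddings are all hypothetical, and the clustering into labeled sections is not covered.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sort_by_magic(annotations):
    """Order annotations by repeatedly hopping to the nearest unvisited one.

    `annotations` is a list of (title, embedding) pairs, newest first.
    Hypothetical helper: the similarity measure is an assumption.
    """
    if not annotations:
        return []
    remaining = list(annotations)
    current = remaining.pop(0)  # start from the newest annotation
    ordered = [current]
    while remaining:
        # Pick the unvisited annotation most similar to the current one.
        nearest = max(
            remaining,
            key=lambda item: cosine_similarity(current[1], item[1]),
        )
        remaining.remove(nearest)
        ordered.append(nearest)
        current = nearest
    return [title for title, _ in ordered]


# Toy embeddings (made up for illustration), listed newest first.
items = [
    ("A", [1.0, 0.0]),
    ("B", [0.0, 1.0]),
    ("C", [0.9, 0.1]),
    ("D", [0.1, 0.9]),
]
print(sort_by_magic(items))  # ['A', 'C', 'D', 'B']
```

Each step compares only against the most recently placed annotation, which is what produces a gradual drift from one topic to the next rather than a ranking by similarity to the starting item.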
llm-evaluation model-training scaling-reasoning multi-modal-reasoning agent-harmfulness
scaling-llm reasoning-autoformalization-contextual-learning long-context-generation multi-turn-uncertainty scaling-llm
human-feedback
Wikipedia
Miscellaneous
Bibliography
- “Proactive Agents for Multi-Turn Text-To-Image Generation Under Uncertainty”, Hahn et al 2024: https://arxiv.org/abs/2412.06771#deepmind
- “Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?”, Lee et al 2024: https://arxiv.org/abs/2406.13121#google
- “Grokked Transformers Are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization”, Wang et al 2024: https://arxiv.org/abs/2405.15071
- “Long-Form Factuality in Large Language Models”, Wei et al 2024: https://arxiv.org/abs/2403.18802#deepmind
- “Don’t Trust: Verify—Grounding LLM Quantitative Reasoning With Autoformalization”, Zhou et al 2024: https://arxiv.org/abs/2403.18120#google
- “Beyond Human Data: Scaling Self-Training for Problem-Solving With Language Models (ReSTEM)”, Singh et al 2023: https://arxiv.org/abs/2312.06585#deepmind
- “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Chao et al 2023: https://arxiv.org/abs/2310.08419
- “Google’s Newest AI Model Uses Nearly 5× More Text Data for Training Than Its Predecessor”, Elias 2023: https://www.cnbc.com/2023/05/16/googles-palm-2-uses-nearly-five-times-more-text-data-than-predecessor.html