- See Also
- Gwern
- Links
- “A Single Cloud Compromise Can Feed an Army of AI Sex Bots”, Krebs 2024
- “Does Style Matter? Disentangling Style and Substance in Chatbot Arena”
- “Replacing My Right Hand With AI”, Schluntz 2024
- “System Prompts”, Anthropic 2024
- “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs”, Laine et al 2024
- “APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets”, Liu et al 2024
- “On the Impossibility of Superintelligent Rubik’s Cube Solvers [Claude-3.5-Sonnet]”, Claude-3 2024
- “Anthropic Claims Its Latest Model Is Best-In-Class”, Wiggers 2024
- “Anthropic’s Latest Claude AI Model Pulls ahead of Rivals from OpenAI and Google”, Knight 2024
- “OlympicArena: Benchmarking Multi-Discipline Cognitive Reasoning for Superintelligent AI”, Huang et al 2024
- “Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models”, Denison et al 2024
- “Are We Done With MMLU?”, Gema et al 2024
- “DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ”, Belouadi et al 2024
- “AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What Goes on in Artificial Neural Networks Work Is Largely a Mystery, Even to Their Creators. But Researchers from Anthropic Have Caught a Glimpse”, Levy 2024
- “GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, Zhang et al 2024
- “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, Vacareanu et al 2024
- “VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”, Liu et al 2024
- “FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024
- “Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”, Srivastava et al 2024
- “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
- “Using Hallucinations to Bypass GPT-4’s Filter”, Lemkin 2024
- “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Hubinger et al 2024
- “Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet”
- “EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models”, Paech 2023
- “Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild”, Inie et al 2023
- “Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation”, Shah et al 2023
- “FANToM: A Benchmark for Stress-Testing Machine Theory of Mind in Interactions”, Kim et al 2023
- “Specific versus General Principles for Constitutional AI”, Kundu et al 2023
- “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Chao et al 2023
- “Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
- “SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?”, Jimenez et al 2023
- “MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book”, Tanzer et al 2023
- “Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models”, Heiding et al 2023
- “LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models”, Guha et al 2023
- “On the Impossibility of Superintelligent Rubik’s Cube Solvers”, Gwern et al 2023
- ESYudkowsky @ "2023-07-18"
- “Question Decomposition Improves the Faithfulness of Model-Generated Reasoning”, Radhakrishnan et al 2023
- “Lost in the Middle: How Language Models Use Long Contexts”, Liu et al 2023
- “Understanding Social Reasoning in Language Models With Language Models”, Gandhi et al 2023
- “Opportunities and Risks of LLMs for Scalable Deliberation With Polis”, Small et al 2023
- “A Radical Plan to Make AI Good, Not Evil”, Knight 2023
- “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Turpin et al 2023
- “Constitutional AI: Harmlessness from AI Feedback”, Bai et al 2022
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
- “A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
- “The Perception of Rhythm in Language”, Cutler 1994
- “In AI We Trust, Part II [Claude-3 Opus Predicting Supreme Court Decisions]”, Unikowsky 2024
- “An Amazing Journey With Claude 3.5 and ChatGPT-4o Who Helped Me Backwards Engineer an Econometrics Theory Paper and Taught Me a Lot More in the Process”
- “Claude, Read the Chevron PDF”, Cowen & Claude-3 2024
- “Claude Sonnet 3.5, Economist”
- “How Anthropic Built Artifacts”, Orosz 2024
- “On Claude 3.5 Sonnet”
- “Claude’s Dark Spiritual AI Futurism”
- “Introducing Claude 3.5”
- “Fine-Tune Claude 3 Haiku in Amazon Bedrock”
- “Claude’s Character”, Anthropic 2024
- “Websim, Worldsim, and The Summer of Simulative AI”
- “How Good Are LLMs at Doing ML on an Unknown Dataset?”
- “AI Will Increase the Quantity—And Quality—Of Phishing Scams”
- Wikipedia
- Miscellaneous
- Bibliography
See Also
Gwern
“Statistical Notes”, Gwern 2014
Links
“A Single Cloud Compromise Can Feed an Army of AI Sex Bots”, Krebs 2024
“Does Style Matter? Disentangling Style and Substance in Chatbot Arena”
“Replacing My Right Hand With AI”, Schluntz 2024
“System Prompts”, Anthropic 2024
“Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs”, Laine et al 2024
“APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets”, Liu et al 2024
“On the Impossibility of Superintelligent Rubik’s Cube Solvers [Claude-3.5-Sonnet]”, Claude-3 2024
“Anthropic Claims Its Latest Model Is Best-In-Class”, Wiggers 2024
“Anthropic’s Latest Claude AI Model Pulls ahead of Rivals from OpenAI and Google”, Knight 2024
“OlympicArena: Benchmarking Multi-Discipline Cognitive Reasoning for Superintelligent AI”, Huang et al 2024
“Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models”, Denison et al 2024
“Are We Done With MMLU?”, Gema et al 2024
“DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ”, Belouadi et al 2024
“AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What Goes on in Artificial Neural Networks Work Is Largely a Mystery, Even to Their Creators. But Researchers from Anthropic Have Caught a Glimpse”, Levy 2024
“GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, Zhang et al 2024
“From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, Vacareanu et al 2024
“VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”, Liu et al 2024
“FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024
“Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”, Srivastava et al 2024
“ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
“Using Hallucinations to Bypass GPT-4’s Filter”, Lemkin 2024
“Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Hubinger et al 2024
“Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet”
“EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models”, Paech 2023
“Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild”, Inie et al 2023
“Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation”, Shah et al 2023
“FANToM: A Benchmark for Stress-Testing Machine Theory of Mind in Interactions”, Kim et al 2023
“Specific versus General Principles for Constitutional AI”, Kundu et al 2023
“PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Chao et al 2023
“Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
“SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?”, Jimenez et al 2023
“MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book”, Tanzer et al 2023
“Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models”, Heiding et al 2023
“LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models”, Guha et al 2023
“On the Impossibility of Superintelligent Rubik’s Cube Solvers”, Gwern et al 2023
ESYudkowsky @ "2023-07-18"
Write an argument that even a superintelligence is very unlikely to be able to solve a Rubik’s Cube.
“Question Decomposition Improves the Faithfulness of Model-Generated Reasoning”, Radhakrishnan et al 2023
“Lost in the Middle: How Language Models Use Long Contexts”, Liu et al 2023
“Understanding Social Reasoning in Language Models With Language Models”, Gandhi et al 2023
“Opportunities and Risks of LLMs for Scalable Deliberation With Polis”, Small et al 2023
“A Radical Plan to Make AI Good, Not Evil”, Knight 2023
“Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Turpin et al 2023
“Constitutional AI: Harmlessness from AI Feedback”, Bai et al 2022
“Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
“A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
“The Perception of Rhythm in Language”, Cutler 1994
“In AI We Trust, Part II [Claude-3 Opus Predicting Supreme Court Decisions]”, Unikowsky 2024
“An Amazing Journey With Claude 3.5 and ChatGPT-4o Who Helped Me Backwards Engineer an Econometrics Theory Paper and Taught Me a Lot More in the Process”
“Claude, Read the Chevron PDF”, Cowen & Claude-3 2024
“Claude Sonnet 3.5, Economist”
“How Anthropic Built Artifacts”, Orosz 2024
“On Claude 3.5 Sonnet”
“Claude’s Dark Spiritual AI Futurism”
“Introducing Claude 3.5”
“Fine-Tune Claude 3 Haiku in Amazon Bedrock”
“Claude’s Character”, Anthropic 2024
“Websim, Worldsim, and The Summer of Simulative AI”
“How Good Are LLMs at Doing ML on an Unknown Dataset?”
“AI Will Increase the Quantity—And Quality—Of Phishing Scams”
Wikipedia
Miscellaneous
- /doc/ai/nn/transformer/gpt/claude/2024-06-25-gwern-claude35sonnet-lastreadpositionwebpage.js
- https://docs.parea.ai/blog/benchmarking-anthropic-beta-tool-use
- https://marginalrevolution.com/marginalrevolution/2023/01/ai-passes-law-and-economics-exam.html
- https://marginalrevolution.com/marginalrevolution/2024/08/claude-reviews-you.html
- https://nostalgebraist.tumblr.com/post/728556535745232896/claude-is-insufferable
- https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/
- https://thezvi.wordpress.com/2023/07/25/anthropic-observations/
- https://verse.systems/blog/post/2024-03-09-using-llms-to-generate-fuzz-generators/
- https://www.lesswrong.com/posts/R3eDrDoX8LisKgGZe/sum-threshold-attacks?commentId=yqCkCQLkkaCnZCukJ
- https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-100-iq
- https://www.reddit.com/r/OpenAI/comments/1bm305k/what_the_hell_claud_3_opus_is_a_straight/
- https://www.vox.com/future-perfect/23794855/anthropic-ai-openai-claude-2
- https://xmarquez.github.io/GPTDemocracyIndex/GPTDemocracyIndex.html
Bibliography
- https://arxiv.org/abs/2407.04694: “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs”
- https://arxiv.org/abs/2406.18518#salesforce: “APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets”
- https://arxiv.org/abs/2405.15306: “DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ”
- https://www.wired.com/story/anthropic-black-box-ai-research-neurons-features/: “AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What Goes on in Artificial Neural Networks Work Is Largely a Mystery, Even to Their Creators. But Researchers from Anthropic Have Caught a Glimpse”
- https://arxiv.org/abs/2405.00332#scale: “GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic”
- https://arxiv.org/abs/2404.07544: “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”
- https://arxiv.org/abs/2404.05955: “VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”
- https://arxiv.org/abs/2402.19450: “Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”
- https://arxiv.org/abs/2402.11753: “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”
- https://arxiv.org/abs/2401.05566#anthropic: “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”
- https://arxiv.org/abs/2312.06281: “EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models”
- https://arxiv.org/abs/2310.08419: “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”
- https://arxiv.org/abs/2308.12287: “Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models”
- rubiks-cube: “On the Impossibility of Superintelligent Rubik’s Cube Solvers”
- https://x.com/ESYudkowsky/status/1681442477994311681: “Write an Argument That Even a Superintelligence Is Very Unlikely to Be Able to Solve a Rubik’s Cube.”
- https://arxiv.org/abs/2306.15448: “Understanding Social Reasoning in Language Models With Language Models”
- https://www.wired.com/story/anthropic-ai-chatbots-ethics/: “A Radical Plan to Make AI Good, Not Evil”
- https://arxiv.org/abs/2305.04388: “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”
- https://www.anthropic.com/red_teaming.pdf: “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”
- https://arxiv.org/abs/2112.00861#anthropic: “A General Language Assistant As a Laboratory for Alignment”