- See Also
- Links
- “Anthropic Claims Its Latest Model Is Best-In-Class”, Wiggers 2024
- “OlympicArena: Benchmarking Multi-Discipline Cognitive Reasoning for Superintelligent AI”, Huang et al 2024
- “Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models”, Denison et al 2024
- “Are We Done With MMLU?”, Gema et al 2024
- “DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ”, Belouadi et al 2024
- “AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What Goes on in Artificial Neural Networks Work Is Largely a Mystery, Even to Their Creators. But Researchers from Anthropic Have Caught a Glimpse”, Levy 2024
- “GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, Zhang et al 2024
- “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, Vacareanu et al 2024
- “VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”, Liu et al 2024
- “FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024
- “Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”, Srivastava et al 2024
- “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
- “Using Hallucinations to Bypass GPT-4’s Filter”, Lemkin 2024
- “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Hubinger et al 2024
- “Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild”, Inie et al 2023
- “Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation”, Shah et al 2023
- “FANToM: A Benchmark for Stress-Testing Machine Theory of Mind in Interactions”, Kim et al 2023
- “Specific versus General Principles for Constitutional AI”, Kundu et al 2023
- “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Chao et al 2023
- “Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
- “SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?”, Jimenez et al 2023
- “MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book”, Tanzer et al 2023
- “Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models”, Heiding et al 2023
- “LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models”, Guha et al 2023
- “On the Impossibility of Superintelligent Rubik’s Cube Solvers”, Gwern et al 2023
- ESYudkowsky @ "2023-07-18"
- “Lost in the Middle: How Language Models Use Long Contexts”, Liu et al 2023
- “Opportunities and Risks of LLMs for Scalable Deliberation With Polis”, Small et al 2023
- “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Turpin et al 2023
- “A Radical Plan to Make AI Good, Not Evil”, Knight 2023
- “Constitutional AI: Harmlessness from AI Feedback”, Bai et al 2022
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
- “A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
- “The Perception of Rhythm in Language”, Cutler 1994
- “In AI We Trust, Part II [Claude-3 Opus Predicting Supreme Court Decisions]”, Unikowsky 2024
- “An Amazing Journey With Claude 3.5 and ChatGPT-4o Who Helped Me Backwards Engineer an Econometrics Theory Paper and Taught Me a Lot More in the Process”
- “Claude Sonnet 3.5, Economist”
- “On Claude 3.5 Sonnet”
- “Claude’s Dark Spiritual AI Futurism”
- “Introducing Claude 3.5”
- “Claude’s Character”, Anthropic 2024
- “AI Will Increase the Quantity—And Quality—Of Phishing Scams”
- Sort By Magic
- Wikipedia
- Miscellaneous
- Link Bibliography
See Also
Links
“Anthropic Claims Its Latest Model Is Best-In-Class”, Wiggers 2024
“OlympicArena: Benchmarking Multi-Discipline Cognitive Reasoning for Superintelligent AI”, Huang et al 2024
“Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models”, Denison et al 2024
“Are We Done With MMLU?”, Gema et al 2024
“DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ”, Belouadi et al 2024
“AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What Goes on in Artificial Neural Networks Work Is Largely a Mystery, Even to Their Creators. But Researchers from Anthropic Have Caught a Glimpse”, Levy 2024
“GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, Zhang et al 2024
“From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, Vacareanu et al 2024
“VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”, Liu et al 2024
“FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024
“Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”, Srivastava et al 2024
“ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
“Using Hallucinations to Bypass GPT-4’s Filter”, Lemkin 2024
“Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Hubinger et al 2024
“Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild”, Inie et al 2023
“Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation”, Shah et al 2023
“FANToM: A Benchmark for Stress-Testing Machine Theory of Mind in Interactions”, Kim et al 2023
“Specific versus General Principles for Constitutional AI”, Kundu et al 2023
“PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Chao et al 2023
“Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
“SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?”, Jimenez et al 2023
“MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book”, Tanzer et al 2023
“Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models”, Heiding et al 2023
“LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models”, Guha et al 2023
“On the Impossibility of Superintelligent Rubik’s Cube Solvers”, Gwern et al 2023
ESYudkowsky @ "2023-07-18"
Write an argument that even a superintelligence is very unlikely to be able to solve a Rubik’s Cube.
“Lost in the Middle: How Language Models Use Long Contexts”, Liu et al 2023
“Opportunities and Risks of LLMs for Scalable Deliberation With Polis”, Small et al 2023
“Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Turpin et al 2023
“A Radical Plan to Make AI Good, Not Evil”, Knight 2023
“Constitutional AI: Harmlessness from AI Feedback”, Bai et al 2022
“Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
“A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
“The Perception of Rhythm in Language”, Cutler 1994
“In AI We Trust, Part II [Claude-3 Opus Predicting Supreme Court Decisions]”, Unikowsky 2024
“An Amazing Journey With Claude 3.5 and ChatGPT-4o Who Helped Me Backwards Engineer an Econometrics Theory Paper and Taught Me a Lot More in the Process”
“Claude Sonnet 3.5, Economist”
“On Claude 3.5 Sonnet”
“Claude’s Dark Spiritual AI Futurism”
“Introducing Claude 3.5”
“Claude’s Character”, Anthropic 2024
“AI Will Increase the Quantity—And Quality—Of Phishing Scams”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
deception-detection
illusion
reasoning-benchmarks
multimodal-understanding
symbolic-reasoning
jailbreak-attacks
llm-security
lm-benchmarks
language-rhythm
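The nearest-neighbor progression described under Sort By Magic can be illustrated with a minimal sketch. This is not the site's actual implementation, which is not shown here: the function names, the choice of cosine similarity, and the greedy "hop to the most similar unvisited item" strategy are all assumptions made for illustration, and the embeddings are toy 2-dimensional vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbor_order(embeddings, start=0):
    """Hypothetical greedy ordering: begin at `start` (e.g. the newest annotation),
    then repeatedly move to the most similar annotation not yet visited."""
    if not embeddings:
        return []
    order = [start]
    remaining = set(range(len(embeddings))) - {start}
    while remaining:
        current = embeddings[order[-1]]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = max(sorted(remaining),
                  key=lambda i: cosine_similarity(current, embeddings[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Toy embeddings: index 0 is treated as the newest annotation.
toy_embeddings = [
    [1.0, 0.0],  # 0: jailbreak-themed
    [0.0, 1.0],  # 1: benchmark-themed
    [0.9, 0.1],  # 2: jailbreak-themed
    [0.1, 0.9],  # 3: benchmark-themed
]
print(nearest_neighbor_order(toy_embeddings))  # [0, 2, 3, 1]
```

In this toy case the ordering visits the similar item first and then drifts toward the other topic, which is the "progression of topics" effect; clustering the resulting list into labeled sections would be a separate step not sketched here.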
Wikipedia
Miscellaneous
- /doc/ai/nn/transformer/gpt/claude/2024-06-25-gwern-claude35sonnet-lastreadpositionwebpage.js
- https://docs.parea.ai/blog/benchmarking-anthropic-beta-tool-use
- https://marginalrevolution.com/marginalrevolution/2023/01/ai-passes-law-and-economics-exam.html
- https://nostalgebraist.tumblr.com/post/728556535745232896/claude-is-insufferable
- https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/
- https://thezvi.wordpress.com/2023/07/25/anthropic-observations/
- https://verse.systems/blog/post/2024-03-09-using-llms-to-generate-fuzz-generators/
- https://www.lesswrong.com/posts/R3eDrDoX8LisKgGZe/sum-threshold-attacks?commentId=yqCkCQLkkaCnZCukJ
- https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-100-iq
- https://www.reddit.com/r/OpenAI/comments/1bm305k/what_the_hell_claud_3_opus_is_a_straight/
- https://www.vox.com/future-perfect/23794855/anthropic-ai-openai-claude-2
- https://xmarquez.github.io/GPTDemocracyIndex/GPTDemocracyIndex.html
Link Bibliography
- https://arxiv.org/abs/2405.15306: “DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ”
- https://www.wired.com/story/anthropic-black-box-ai-research-neurons-features/: “AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What Goes on in Artificial Neural Networks Work Is Largely a Mystery, Even to Their Creators. But Researchers from Anthropic Have Caught a Glimpse”
- https://arxiv.org/abs/2405.00332#scale: “GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic”
- https://arxiv.org/abs/2404.07544: “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”
- https://arxiv.org/abs/2404.05955: “VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”
- https://arxiv.org/abs/2402.19450: “Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”
- https://arxiv.org/abs/2402.11753: “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”
- https://arxiv.org/abs/2401.05566#anthropic: “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”
- https://arxiv.org/abs/2310.08419: “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”
- https://arxiv.org/abs/2308.12287: “Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models”
- rubiks-cube: “On the Impossibility of Superintelligent Rubik’s Cube Solvers”
- https://x.com/ESYudkowsky/status/1681442477994311681: “Write an Argument That Even a Superintelligence Is Very Unlikely to Be Able to Solve a Rubik’s Cube.”
- https://arxiv.org/abs/2305.04388: “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”
- https://www.wired.com/story/anthropic-ai-chatbots-ethics/: “A Radical Plan to Make AI Good, Not Evil”
- https://www.anthropic.com/red_teaming.pdf: “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”
- https://arxiv.org/abs/2112.00861#anthropic: “A General Language Assistant As a Laboratory for Alignment”