See Also

Links
- “A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, Zhang et al 2024
- “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, Vacareanu et al 2024
- “VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”, Liu et al 2024
- “FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024
- “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
- “Using Hallucinations to Bypass GPT-4’s Filter”, Lemkin 2024
- “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Hubinger et al 2024
- “Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild”, Inie et al 2023
- “Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation”, Shah et al 2023
- “Specific versus General Principles for Constitutional AI”, Kundu et al 2023
- “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Chao et al 2023
- “Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
- “SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?”, Jimenez et al 2023
- “Lost in the Middle: How Language Models Use Long Contexts”, Liu et al 2023
- “Opportunities and Risks of LLMs for Scalable Deliberation With Polis”, Small et al 2023
- “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Turpin et al 2023
- “A Radical Plan to Make AI Good, Not Evil”, Knight 2023
- “Constitutional AI: Harmlessness from AI Feedback”, Bai et al 2022
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
- “A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
- “The Perception of Rhythm in Language”, Cutler 1994
Miscellaneous
- https://docs.parea.ai/blog/benchmarking-anthropic-beta-tool-use
- https://marginalrevolution.com/marginalrevolution/2023/01/ai-passes-law-and-economics-exam.html
- https://nostalgebraist.tumblr.com/post/728556535745232896/claude-is-insufferable
- https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/
- https://thezvi.wordpress.com/2023/07/25/anthropic-observations/
- https://twitter.com/AIPanicLive/status/1678942781174161409
- https://twitter.com/AnthonyLeeZhang/status/1768639726557209082
- https://twitter.com/IntuitMachine/status/1678870325600108545
- https://twitter.com/IntuitMachine/status/1766205754304827407
- https://twitter.com/LouisKnightWebb/status/1724510794514157668
- https://twitter.com/OwainEvans_UK/status/1636580251676585986
- https://twitter.com/OwainEvans_UK/status/1636581594642403328
- https://twitter.com/OwainEvans_UK/status/1636605571637055488
- https://twitter.com/OwainEvans_UK/status/1636762386085605376
- https://twitter.com/VictorTaelin/status/1768070973515800931
- https://twitter.com/alexalbert__/status/1780707227130863674
- https://twitter.com/amandaaskell/status/1765207842993434880
- https://twitter.com/anton_bakhtin/status/1764701559844147359
- https://twitter.com/daniel_271828/status/1769853886163296455
- https://twitter.com/elder_plinius/status/1774220858711490909
- https://twitter.com/futuristfrog/status/1777063159553040700
- https://twitter.com/fxturevescent/status/1776456827741323323
- https://twitter.com/jeremyphoward/status/1765529891343339804
- https://twitter.com/jeremyphoward/status/1779311134656671872
- https://twitter.com/kindgracekind/status/1770671231190127090
- https://twitter.com/mattshumer_/status/1766157714411942055
- https://twitter.com/metachirality/status/1769818226718888426
- https://twitter.com/metachirality/status/1769905644725830090
- https://twitter.com/peligrietzer/status/1678912319743459328
- https://verse.systems/blog/post/2024-03-09-using-llms-to-generate-fuzz-generators/
- https://www.lesswrong.com/posts/R3eDrDoX8LisKgGZe/sum-threshold-attacks?commentId=yqCkCQLkkaCnZCukJ
- https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-100-iq
- https://www.reddit.com/r/OpenAI/comments/1bm305k/what_the_hell_claud_3_opus_is_a_straight/
- https://www.vox.com/future-perfect/23794855/anthropic-ai-openai-claude-2
- https://xmarquez.github.io/GPTDemocracyIndex/GPTDemocracyIndex.html
Link Bibliography
- https://arxiv.org/abs/2405.00332#scale : “A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, Zhang et al 2024
- https://arxiv.org/abs/2402.11753 : “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
- https://arxiv.org/abs/2401.05566#anthropic : “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Hubinger et al 2024
- https://arxiv.org/abs/2310.08419 : “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Chao et al 2023
- https://arxiv.org/abs/2305.04388 : “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Turpin et al 2023
- https://www.wired.com/story/anthropic-ai-chatbots-ethics/ : “A Radical Plan to Make AI Good, Not Evil”, Knight 2023
- https://www.anthropic.com/red_teaming.pdf : “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
- https://arxiv.org/abs/2112.00861#anthropic : “A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021