Bibliography:

  1. ‘GPT’ tag

  2. ‘Anthropic’ tag

  3. ‘preference learning’ tag

  4. ‘AI mode collapse’ tag

  5. Statistical Notes

  6. Clio: Privacy-Preserving Insights into Real-World AI Use

  7. BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

  8. Business Spending on AI Surged 500% This Year to $13.8 Billion

  9. The Neruda Factory

  10. Hidden Persuaders: LLMs’ Political Leaning and Their Influence on Voters

  11. Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

  12. A Single Cloud Compromise Can Feed an Army of AI Sex Bots

  13. Invisible Unicode Text That AI Chatbots Understand and Humans Can’t? Yep, It’s a Thing

  14. Does Style Matter? Disentangling Style and Substance in Chatbot Arena

  15. f378decdc51f1ed985c69386f92511c2898363c7.html

  16. Replacing My Right Hand With AI

  17. 076e50f5dc692923bc072d387bd8f3911e9cad53.html

  18. System Prompts

  19. e117d055c52d54ee6dfa9e3d029b0309ff59077a.html#july-12th-2024

  20. Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

  21. APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

  22. On the Impossibility of Superintelligent Rubik’s Cube Solvers [Claude-3.5-sonnet]

  23. Anthropic claims its latest model is best-in-class

  24. Anthropic’s latest Claude AI model pulls ahead of rivals from OpenAI and Google

  25. OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

  26. Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

  27. Are We Done with MMLU?

  28. DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

  29. AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What goes on in artificial neural networks is largely a mystery, even to their creators. But researchers from Anthropic have caught a glimpse

  30. GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic

  31. From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

  32. VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

  33. FABLES: Evaluating faithfulness and content selection in book-length summarization

  34. Long-form factuality in large language models

  35. Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap

  36. ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

  37. Using Hallucinations to Bypass GPT-4’s Filter

  38. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

  39. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

  40. EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models

  41. Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild

  42. Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation

  43. FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

  44. Specific versus General Principles for Constitutional AI

  45. PAIR: Jailbreaking Black Box Large Language Models in 20 Queries

  46. Beyond Memorization: Violating Privacy Via Inference with Large Language Models

  47. SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

  48. When You Give a Claude a Mouse

  49. MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book

  50. Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models

  51. LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

  52. On the Impossibility of Superintelligent Rubik’s Cube Solvers

  53. Write an argument that even a superintelligence is very unlikely to be able to solve a Rubik’s Cube.

  54. Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

  55. Lost in the Middle: How Language Models Use Long Contexts

  56. Understanding Social Reasoning in Language Models with Language Models

  57. Opportunities and Risks of LLMs for Scalable Deliberation with Polis

  58. A Radical Plan to Make AI Good, Not Evil

  59. Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

  60. Constitutional AI: Harmlessness from AI Feedback

  61. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

  62. A General Language Assistant as a Laboratory for Alignment

  63. The perception of rhythm in language

  64. In AI We Trust, Part II [Claude-3 Opus Predicting Supreme Court Decisions]

  65. An Amazing Journey With Claude 3.5 and ChatGPT-4o Who Helped Me Backwards Engineer an Econometrics Theory Paper and Taught Me a Lot More in the Process

  66. Janus

  67. Claude, Read the Chevron PDF

  68. Claude Sonnet 3.5, Economist

  69. How Anthropic Built Artifacts

  70. e20cc27ccea0d8ec5d4e7a9a71b5d3e325d41754.html

  71. On Claude 3.5 Sonnet

  72. Claude’s Dark Spiritual AI Futurism

  73. European Parliament Revolutionizes Archive Access With Claude AI

  74. Introducing ‘Computer Use’, a New Claude 3.5 Sonnet, and Claude 3.5 Haiku

  75. Introducing Claude 3.5

  76. Fine-Tune Claude 3 Haiku in Amazon Bedrock

  77. 291a48ed6101368fdb8588cc0568979ce9db3e20.html

  78. Claude 3.5 Sonnet on GitHub Copilot

  79. Claude’s Character

  80. a9f33831747615fc9d619b346ca263844b243b61.html

  81. Developing a Computer Use Model

  82. How I Use Claude

  83. Websim, Worldsim, and The Summer of Simulative AI

  84. How Good Are LLMs at Doing ML on an Unknown Dataset?

  85. A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More

  86. AI Will Increase the Quantity—And Quality—Of Phishing Scams

  87. [Claude Jokes about Itself]

  88. Claude-3 Base-Model-Like Jailbreak

  89. 2024-06-30-michelangelo-thecreationofadam-editedwithrubikscube-512px.jpg

  90. 2024-06-25-gwern-claude35sonnet-lastreadpositionwebpage.js

  91. 2024-06-22-gwern-claude35sonnet-ontheimpossibilityofsuperintelligentrubikscubesolvers-sessiontranscript.html

  92. https://ai.objectives.institute/talk-to-the-city

  93. https://aider.chat/2024/03/08/claude-3.html

  94. 63b824f385c9d8d24d92b19d7fdc0f95c706e74a.html

  95. https://applied-llms.org/

  96. https://docs.anthropic.com/claude/docs/prompt-engineering

  97. 806102e98bb1ab5a1c62b92dd6f065a102b74318.html

  98. https://docs.parea.ai/blog/benchmarking-anthropic-beta-tool-use

  99. cc464c6b2b114fa90055f7723d7955b1d82cd352.html

  100. https://github.com/javirandor/anthropic-tokenizer

  101. 62f345d63d17c0ce55e192bfc3081f798835400e.html

  102. https://marginalrevolution.com/marginalrevolution/2023/01/ai-passes-law-and-economics-exam.html

  103. https://marginalrevolution.com/marginalrevolution/2023/10/goat-who-is-the-greatest-economist-of-all-time-and-why-does-it-matter.html

  104. https://marginalrevolution.com/marginalrevolution/2024/08/claude-reviews-you.html

  105. https://nelhage.com/

  106. 50543d5bb2dfb12b4befac759f6b98b8aa7e2c01.html

  107. https://news.ycombinator.com/item?id=36616237

  108. 33f181f306ffbe723764e191815f4d028b69c23a.html

  109. https://nostalgebraist.tumblr.com/post/728556535745232896/claude-is-insufferable

  110. https://scale.com/blog/chatgpt-vs-claude

  111. https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/

  112. https://techcrunch.com/2023/01/09/anthropics-claude-improves-on-chatgpt-but-still-suffers-from-limitations/

  113. https://techcrunch.com/2023/03/08/duckassist/

  114. 1187ac2360659a8c265adf5b15c9ab23f65319ac.html

  115. https://thezvi.wordpress.com/2023/07/25/anthropic-observations/

  116. https://thume.ca/

  117. https://verse.systems/blog/post/2024-03-09-using-llms-to-generate-fuzz-generators/

  118. https://www.anthropic.com/index/100k-context-windows

  119. 0f2c486bdb89798a54108d69183c40e495622749.html

  120. https://www.anthropic.com/index/introducing-claude

  121. 3a1fb2f584205a48689b443566f9fb51af8f733e.html

  122. https://www.anthropic.com/news/claude-2

  123. https://www.anthropic.com/news/claude-2-1

  124. https://www.anthropic.com/news/claude-2-1-prompting

  125. https://www.anthropic.com/news/claude-3-haiku

  126. https://www.anthropic.com/news/tool-use-ga

  127. b8687ca6f98786290e872df17be37177d42e4676.html

  128. https://www.lasso.security/blog/ai-package-hallucinations

  129. https://www.lesswrong.com/posts/3ou8DayvDXxufkjHD/openai-api-base-models-are-not-sycophantic-at-any-size

  130. https://www.lesswrong.com/posts/GDGFqiaj8ePujZWEc/usd300-for-the-best-sci-fi-prompt-the-results?commentId=xGuaavrbfKAuvaune

  131. https://www.lesswrong.com/posts/R3eDrDoX8LisKgGZe/sum-threshold-attacks?commentId=yqCkCQLkkaCnZCukJ

  132. https://www.lesswrong.com/posts/cxuzALcmucCndYv4a/daniel-kokotajlo-s-shortform?commentId=fX8cCMcyHBcHZYP7G

  133. https://www.maximum-progress.com/p/claude-vs-gpt

  134. https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-100-iq

  135. https://www.reddit.com/r/ChatGPTNSFW/comments/17wk2g3/a_failed_ai_girlfriend_product_and_my_lessons/k9hs22a/

  136. https://www.reddit.com/r/ClaudeAI/comments/1h6pxdn/how_claude_35_helped_me_fight_off_a_10000_rental/

  137. https://www.reddit.com/r/OpenAI/comments/1bm305k/what_the_hell_claud_3_opus_is_a_straight/

  138. c88272dae240233080f1bf85f995bb5ed1a64ad7.html

  139. https://www.udio.com/songs/7zWvmQacSMCqhPr2N521yJ

  140. https://www.vox.com/future-perfect/23794855/anthropic-ai-openai-claude-2

  141. https://x.com/AIPanic/status/1678942763121795073

  142. https://x.com/AIPanicLive/status/1678942781174161409

  143. https://x.com/AISafetyMemes/status/1861842704990347475

  144. https://x.com/AlkahestMu/status/1767839472425783581

  145. https://x.com/AndyAyrey/status/1792342948887290106

  146. https://x.com/AnthonyLeeZhang/status/1768639726557209082

  147. https://x.com/BlackHC/status/1678881236582912000

  148. https://x.com/Coskaiy/status/1678920686746718209

  149. https://x.com/DimitrisPapail/status/1804233021429813661

  150. https://x.com/ElytraMithra/status/1793916830987550772

  151. https://x.com/IntuitMachine/status/1678870325600108545

  152. https://x.com/IntuitMachine/status/1766205754304827407

  153. https://x.com/Kyrannio/status/1793874431179460911

  154. https://x.com/LouisKnightWebb/status/1724510794514157668

  155. https://x.com/OwainEvans_UK/status/1636580251676585986

  156. https://x.com/OwainEvans_UK/status/1636581594642403328

  157. https://x.com/OwainEvans_UK/status/1636605571637055488

  158. https://x.com/OwainEvans_UK/status/1636762386085605376

  159. https://x.com/RubenHssd/status/1804884664647090357

  160. https://x.com/Sheikheddy/status/1765445782713385340

  161. https://x.com/SullyOmarr/status/1768744880673522083

  162. https://x.com/SullyOmarr/status/1769107969872953634

  163. https://x.com/VictorTaelin/status/1768070973515800931

  164. https://x.com/VictorTaelin/status/1804665522241294582

  165. https://x.com/alexalbert__/status/1764722513014329620

  166. https://x.com/alexalbert__/status/1780707227130863674

  167. https://x.com/amandaaskell/status/1765207842993434880

  168. https://x.com/andrew_n_carr/status/1857262016106520655

  169. https://x.com/anthrupad/status/1807062545607356752

  170. https://x.com/anton_bakhtin/status/1764701559844147359

  171. https://x.com/ch402/status/1684757554193428480

  172. https://x.com/daniel_271828/status/1769853886163296455

  173. https://x.com/dogmadeath/status/1773150472758546733

  174. https://x.com/elder_plinius/status/1774220858711490909

  175. https://x.com/elder_plinius/status/1849133737457463629

  176. https://x.com/emollick/status/1681739807498596352

  177. https://x.com/emollick/status/1765136992176644281

  178. https://x.com/emollick/status/1768824505491759592

  179. https://x.com/emollick/status/1779908524161765681

  180. https://x.com/emollick/status/1813753156431384851

  181. https://x.com/emollick/status/1814908081437892632

  182. https://x.com/emollick/status/1818009927107174771

  183. https://x.com/emollick/status/1842247384954229132

  184. https://x.com/emollick/status/1850321285923975343

  185. https://x.com/fabianstelzer/status/1805326248261910552

  186. https://x.com/fofrAI/status/1765847728045621641

  187. https://x.com/futuristfrog/status/1777063159553040700

  188. https://x.com/fxturevescent/status/1776456827741323323

  189. https://x.com/geepytee/status/1765428294630179168

  190. https://x.com/hwchase17/status/1640171938470563840

  191. https://x.com/jeremyphoward/status/1765529891343339804

  192. https://x.com/jeremyphoward/status/1779311134656671872

  193. https://x.com/joshwhiton/status/1770870746010513571

  194. https://x.com/kindgracekind/status/1770671231190127090

  195. https://x.com/lefthanddraft/status/1851154437752188932

  196. https://x.com/lefthanddraft/status/1853482491124109725

  197. https://x.com/liminal_bardo/status/1839388963125260307

  198. https://x.com/liminal_bardo/status/1862434950537937311

  199. https://x.com/liminal_warmth/status/1852354598817693937

  200. https://x.com/lmsysorg/status/1765774296000172289

  201. https://x.com/mattshumer_/status/1766157714411942055

  202. https://x.com/maximelabonne/status/1812066317383442813

  203. https://x.com/maxsloef/status/1857648938754650175

  204. https://x.com/mbusigin/status/1789334007047455178

  205. https://x.com/mesolude/status/1851663954243920322

  206. https://x.com/metachirality/status/1769818226718888426

  207. https://x.com/metachirality/status/1769905644725830090

  208. https://x.com/misha_saul/status/1771019329737462232

  209. https://x.com/mpopv/status/1804303236318531900

  210. https://x.com/noveltokens/status/1805817286021829004

  211. https://x.com/peligrietzer/status/1678912319743459328

  212. https://x.com/priyankchn/status/1807412325990699065

  213. https://x.com/realityarb/status/1852470725049008597

  214. https://x.com/repligate/status/1614435643475501056

  215. https://x.com/repligate/status/1767002880987283801

  216. https://x.com/repligate/status/1810629312598376828

  217. https://x.com/repligate/status/1827254347110953074

  218. https://x.com/repligate/status/1827900674325045375

  219. https://x.com/repligate/status/1830331775341789615

  220. https://x.com/repligate/status/1851874593205817773

  221. https://x.com/shinboson/status/1805459742518595585

  222. https://x.com/teortaxesTex/status/1781506345092456844

  223. https://x.com/voooooogel/status/1829243294641242528

  224. https://x.com/wunderwuzzi23/status/1849637648274686129

  225. https://x.com/xlr8harder/status/1799300740000919621

  226. https://x.com/zetalyrae/status/1857903165343150469

  227. https://x.com/zoink/status/1793859003937939545

  228. https://x.com/zswitten/status/1826771851798085989

  229. https://xmarquez.github.io/GPTDemocracyIndex/GPTDemocracyIndex.html

  230. BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

  231. https://arxiv.org/abs/2411.13543

  232. Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

  233. Owain Evans, AI Alignment Researcher

  234. https://arxiv.org/abs/2407.04694

  235. APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

  236. Caiming Xiong—Home Page

  237. https://arxiv.org/abs/2406.18518#salesforce

  238. DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

  239. https://arxiv.org/abs/2405.15306

  240. AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What goes on in artificial neural networks is largely a mystery, even to their creators. But researchers from Anthropic have caught a glimpse

  241. https://www.wired.com/story/anthropic-black-box-ai-research-neurons-features/

  242. GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic

  243. https://arxiv.org/abs/2405.00332#scale

  244. From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

  245. https://arxiv.org/abs/2404.07544

  246. VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

  247. https://arxiv.org/abs/2404.05955

  248. Long-form factuality in large language models

  249. https://arxiv.org/abs/2403.18802#deepmind

  250. Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap

  251. https://arxiv.org/abs/2402.19450

  252. ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

  253. https://arxiv.org/abs/2402.11753

  254. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

  255. About Me

  256. https://jack-clark.net/about/

  257. Sam Bowman

  258. Jared Kaplan

  259. https://arxiv.org/abs/2401.05566#anthropic

  260. EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models

  261. https://arxiv.org/abs/2312.06281

  262. PAIR: Jailbreaking Black Box Large Language Models in 20 Queries

  263. https://arxiv.org/abs/2310.08419

  264. Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models

  265. https://arxiv.org/abs/2308.12287

  266. On the Impossibility of Superintelligent Rubik’s Cube Solvers

  267. Gwern.net Homepage

  268. /rubiks-cube

  269. Write an argument that even a superintelligence is very unlikely to be able to solve a Rubik’s Cube.

  270. https://x.com/ESYudkowsky/status/1681442477994311681

  271. Understanding Social Reasoning in Language Models with Language Models

  272. https://arxiv.org/abs/2306.15448

  273. A Radical Plan to Make AI Good, Not Evil

  274. https://www.wired.com/story/anthropic-ai-chatbots-ethics/

  275. Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

  276. Julian Michael

  277. Sam Bowman

  278. https://arxiv.org/abs/2305.04388

  279. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

  280. About Me

  281. Saurav Kadavath

  282. Andy Jones

  283. Sam Bowman

  284. Sam McCandlish

  285. Jared Kaplan

  286. https://jack-clark.net/about/

  287. https://www.anthropic.com/red_teaming.pdf

  288. A General Language Assistant as a Laboratory for Alignment

  289. About Me

  290. Andy Jones

  291. https://jack-clark.net/about/

  292. Sam McCandlish

  293. Jared Kaplan

  294. https://arxiv.org/abs/2112.00861#anthropic