‘Codex’ directory

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.

Wikipedia (2)

GitHub Copilot
OpenAI Codex :

https://en.wikipedia.org/wiki/OpenAI_Codex

Miscellaneous

Bibliography

https://red.anthropic.com/2025/ai-for-cyber-defenders/: “AI for Cyber Defenders: AI Models Are Now Useful for Cybersecurity Tasks in Practice, Not Just Theory ”, Anthropic

link-bibliography
https://arxiv.org/abs/2506.17298: “Mercury: Ultra-Fast Language Models Based on Diffusion ”, Inception Labs, Samar Khanna, Siddhant Kharbanda, Shufan Li, Harshit Varma, Eric Wang, Sawyer Birnbaum, Ziyang Luo, Yanis Miraoui, Akash Palrecha, Stefano Ermon, Aditya Grover, Volodymyr Kuleshov

link-bibliography
https://arxiv.org/abs/2505.07215: “Measuring General Intelligence With Generated Games ”, Vivek Verma, David Huang, William Chen, Dan Klein, Nicholas Tomlin

link-bibliography
https://arxiv.org/abs/2505.02881: “Rewriting Pre-Training Data Boosts LLM Performance in Math and Code [SwallowCode/Math] ”, Kazuki Fujii, Yukito Tajima, Sakae Mizuki, Hinari Shimada, Taihei Shiotani, Koshiro Saito, Masanari Ohi, Masaki Kawamura, Taishi Nakamura, Takumi Okamoto, Shigeki Ishida, Kakeru Hattori, Youmi Ma, Hiroya Takamura, Rio Yokota, Naoaki Okazaki

link-bibliography
https://www.reddit.com/r/emacs/comments/1ka2zmv/emacs_in_the_golden_age_of_llms/: “Emacs in the Golden Age of LLMs Has Become the Truly Flexible Editor It Was Always Promised to Be but Never Achieved ”, AmateurPhotoGuy415

link-bibliography
https://arxiv.org/abs/2502.06807#openai: “Competitive Programming With Large Reasoning Models ”, Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Łukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu, Szymon Sidor, Vineet Kosaraju, Wenda Zhou

link-bibliography
https://registerspill.thorstenball.com/p/they-all-use-it: “They All Use It ”, Thorsten Ball

link-bibliography
https://arxiv.org/abs/2410.06992: “SWE-Bench+: Enhanced Coding Benchmark for LLMs ”, Reem Aleithan, Haoran Xue, Mohammad Mahdi Mohajer, Elijah Nnorom, Gias Uddin, Song Wang

link-bibliography
https://arxiv.org/abs/2410.07095#openai: “MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering ”, Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Lilian Weng, Aleksander Madry

link-bibliography
https://www.ft.com/content/4868bd38-613c-4fa9-ba9d-1ed8fa8a40c8: “AI-Powered Coding Pulls in Almost $1bn of Funding to Claim ‘Killer App’ Status ”, Madhumita Murgia

link-bibliography
https://arxiv.org/abs/2406.18518#salesforce: “APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets ”, Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

link-bibliography
https://arxiv.org/abs/2405.15793: “SWE-Agent: Agent-Computer Interfaces Enable Automated Software Engineering ”, John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press

link-bibliography
https://www.wsj.com/tech/ai/a-peter-thiel-backed-ai-startup-cognition-labs-seeks-2-billion-valuation-998fa39d: “A Peter Thiel-Backed AI Startup, Cognition Labs, Seeks $2 Billion Valuation: Funding round Could Increase Startup’s Valuation Nearly Sixfold in a Matter of Weeks, Reflecting AI Frenzy ”, Berber Jin

link-bibliography
https://arxiv.org/abs/2403.18624: “Vulnerability Detection With Code Language Models: How Far Are We? ”, Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, Yizheng Chen

link-bibliography
https://www.bloomberg.com/news/articles/2024-03-12/cognition-ai-is-a-peter-thiel-backed-coding-assistant: “Gold-Medalist Coders Build an AI That Can Do Their Job for Them: A New Startup Called Cognition AI Can Turn a User’s Prompt into a Website or Video Game ”, Ashlee Vance

link-bibliography
2024-harding.pdf: “Coding on Copilot: 2023 Data Shows Downward Pressure on Code Quality, Plus Projections for 2024 ”, William Harding, Matthew Kloster

link-bibliography
https://arxiv.org/abs/2401.05566#anthropic: “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training ”, Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez

link-bibliography
https://arxiv.org/abs/2312.11556: “StarVector: Generating Scalable Vector Graphics Code from Images ”, Juan A. Rodriguez, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, David Vazquez, Christopher Pal, Marco Pedersoli

link-bibliography
https://arxiv.org/abs/2310.04406: “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models ”, Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang

link-bibliography
https://arxiv.org/abs/2310.03262: “PassUntil: Predicting Emergent Abilities With Infinite Resolution Evaluation ”, Shengding Hu, Xin Liu, Xu Han, Xinrong Zhang, Chaoqun He, Weilin Zhao, Yankai Lin, Ning Ding, Zebin Ou, Guoyang Zeng, Zhiyuan Liu, Maosong Sun

link-bibliography
https://arxiv.org/abs/2310.02059: “Security Weaknesses of Copilot Generated Code in GitHub ”, Yujia Fu, Peng Liang, Amjed Tahir, Zengyang Li, Mojtaba Shahin, Jiaxin Yu, Jinfu Chen

link-bibliography
https://arxiv.org/abs/2308.07921: “Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-Based Self-Verification ”, Aojun Zhou, Ke Wang, Zimu Lu, Weikang Shi, Sichun Luo, Zipeng Qin, Shaoqing Lu, Anya Jia, Linqi Song, Mingjie Zhan, Hongsheng Li

link-bibliography
https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots: “AI Is a Lot of Work: As the Technology Becomes Ubiquitous, a Vast Tasker Underclass Is Emerging—And Not Going Anywhere ”, Josh Dzieza

link-bibliography
https://arxiv.org/abs/2306.04930#microsoft: “When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming (CDHF) ”, Hussein Mozannar, Gagan Bansal, Adam Fourney, Eric Horvitz

link-bibliography
https://arxiv.org/abs/2303.11455: “Large Language Models and Simple, Stupid Bugs ”, Kevin Jesse, Toufique Ahmed, Premkumar T. Devanbu, Emily Morgan

link-bibliography
https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/: “Introducing Microsoft 365 Copilot—Your Copilot for Work ”, Jared Spataro

link-bibliography
https://arxiv.org/abs/2303.03846#google: “Larger Language Models Do In-Context Learning Differently ”, Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

link-bibliography
https://arxiv.org/abs/2302.12433: “ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics ”, Zhangir Azerbayev, Bartosz Piotrowski, Hailey Schoelkopf, Edward W. Ayers, Dragomir Radev, Jeremy Avigad

link-bibliography
https://www.cnbc.com/2023/01/31/google-testing-chatgpt-like-chatbot-apprentice-bard-with-employees.html: “Google Is Asking Employees to Test Potential ChatGPT Competitors, including a Chatbot Called 'Apprentice Bard' ”, Jennifer Elias

link-bibliography
https://arxiv.org/abs/2301.08653: “An Analysis of the Automatic Bug Fixing Performance of ChatGPT ”, Dominik Sobania, Martin Briesch, Carol Hanna, Justyna Petke

link-bibliography
https://azure.microsoft.com/en-us/blog/general-availability-of-azure-openai-service-expands-access-to-large-advanced-ai-models-with-added-enterprise-benefits/: “General Availability of Azure OpenAI Service Expands Access to Large, Advanced AI Models With Added Enterprise Benefits ”, Eric Boyd

link-bibliography
https://arxiv.org/abs/2211.15533: “The Stack: 3 TB of Permissively Licensed Source Code ”, Denis Kocetkov, Raymond Li, Loubna Ben Allal, Jia Li, Chenghao Mou, Carlos Muñoz Ferrandis, Yacine Jernite, Margaret Mitchell, Sean Hughes, Thomas Wolf, Dzmitry Bahdanau, Leandro von Werra, Harm de Vries

link-bibliography
https://greylock.com/greymatter/kevin-scott-ai-programming-possibility/: “Programming Possibility: Kevin Scott on AI’s Impact on Cognitive Work ”, Reid Hoffman, Kevin Scott

link-bibliography
https://arxiv.org/abs/2210.09261#google: “Challenging BIG-Bench Tasks (BBH) and Whether Chain-Of-Thought Can Solve Them ”, Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei

link-bibliography
https://arxiv.org/abs/2209.01975: “Vote-K: Selective Annotation Makes Language Models Better Few-Shot Learners ”, Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah Smith, Tao Yu

link-bibliography
https://arxiv.org/abs/2207.08143: “Can Large Language Models Reason about Medical Questions? ”, Valentin Liévin, Christoffer Egeberg Hother, Ole Winther

link-bibliography
https://arxiv.org/abs/2205.06537#github: “Productivity Assessment of Neural Code Completion ”, Albert Ziegler, Eirini Kalliamvakou, Shawn Simister, Ganesh Sittampalam, Alice Li, Andrew Rice, Devon Rifkin, Edward Aftandilian

link-bibliography
https://arxiv.org/abs/2204.05999#facebook: “InCoder: A Generative Model for Code Infilling and Synthesis ”, Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis

link-bibliography
https://arxiv.org/abs/2204.02311#google: “PaLM: Scaling Language Modeling With Pathways ”, Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel

link-bibliography
2022-vaithilingam.pdf: “Expectation versus Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models ”, Priyan Vaithilingam, Tianyi Zhang, Elena Glassman

link-bibliography
https://arxiv.org/abs/2201.10005#openai: “Text and Code Embeddings by Contrastive Pre-Training ”, Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, Peter Welinder, Lilian Weng

link-bibliography
https://arxiv.org/abs/2112.15594: “A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More ”, Iddo Drori, Sunny Tran, Roman Wang, Newman Cheng, Kevin Liu, Leonard Tang, Elizabeth Ke, Nikhil Singh, Taylor L. Patti, Jayson Lynch, Avi Shporer, Nakul Verma, Eugene Wu, Gilbert Strang

link-bibliography
https://arxiv.org/abs/2112.09332#openai: “WebGPT: Browser-Assisted Question-Answering With Human Feedback ”, Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman

link-bibliography
https://openai.com/research/webgpt: “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing ”, Jacob Hilton, Suchir Balaji, Reiichiro Nakano, John Schulman

link-bibliography
https://arxiv.org/abs/2112.11446#deepmind: “Scaling Language Models: Methods, Analysis & Insights from Training Gopher ”, Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Erich Elsen, Siddhant Jayakumar, Elena Buchatskaya, David Budden, Esme Sutherland, Karen Simonyan, Michela Paganini, Laurent Sifre, Lena Martens, Xiang Lorraine Li, Adhiguna Kuncoro, Aida Nematzadeh, Elena Gribovskaya, Domenic Donato, Angeliki Lazaridou, Arthur Mensch, Jean-Baptiste Lespiau, Maria Tsimpoukelli, Nikolai Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Toby Pohlen, Zhitao Gong, Daniel Toyama, Cyprien de Masson d’Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew Johnson, Blake Hechtman, Laura Weidinger, Iason Gabriel, William Isaac, Ed Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem Ayoub, Jeff Stanway, Lorrayne Bennett, Demis Hassabis, Koray Kavukcuoglu, Geoffrey Irving

link-bibliography
https://arxiv.org/abs/2111.11904#microsoft: “Can Pre-Trained Language Models Be Used to Resolve Textual and Semantic Merge Conflicts? ”, Jialu Zhang, Todd Mytkowicz, Mike Kaufman, Ruzica Piskac, Shuvendu K. Lahiri

link-bibliography
https://arxiv.org/abs/2111.08267: “Solving Probability and Statistics Problems by Program Synthesis ”, Leonard Tang, Elizabeth Ke, Nikhil Singh, Nakul Verma, Iddo Drori

link-bibliography
2021-jiang-2.pdf: “GenLine and GenForm: Two Tools for Interacting With Generative Language Models in a Code Editor ”, Ellen Jiang, Edwin Toh, Alejandra Molina, Aaron Donsbach, Carrie Cai, Michael Terry

link-bibliography