Business Spending on AI Surged 500% This Year to $13.8 Billion
Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Project Zero: From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code
Evaluation of OpenAI o1: Opportunities and Challenges of AGI
AI-powered coding pulls in almost $1bn of funding to claim ‘killer app’ status
Prompt Injection in ‘Resolve Vulnerabilty’ Results in Arbitrary Command Execution in Victim’s Pipeline
To Code, or Not To Code? Exploring Impact of Code in Pre-training
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
A Peter Thiel-Backed AI Startup, Cognition Labs, Seeks $2 Billion Valuation: Funding round could increase startup’s valuation nearly sixfold in a matter of weeks, reflecting AI frenzy
Vulnerability Detection with Code Language Models: How Far Are We?
Gold-Medalist Coders Build an AI That Can Do Their Job for Them: A new startup called Cognition AI can turn a user’s prompt into a website or video game
TestGen-LLM: Automated Unit Test Improvement using Large Language Models at Meta
The Impact of AI Tool on Engineering at ANZ Bank: An Empirical Study on GitHub Copilot Within a Corporate Environment
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
Coding on Copilot: 2023 Data Shows Downward Pressure on Code Quality, Plus Projections for 2024
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Leveraging Large Language Models to Boost Dafny’s Developers Productivity
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation
StarVector: Generating Scalable Vector Graphics Code from Images
Universal Self-Consistency for Large Language Model Generation
LLM-Assisted Code Cleaning For Training Accurate Code Generators
A Coder Considers the Waning Days of the Craft: Coding has always felt to me like an endlessly deep and rich domain. Now I find myself wanting to write a eulogy for it
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Eureka: Human-Level Reward Design via Coding Large Language Models
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
PassUntil: Predicting Emergent Abilities with Infinite Resolution Evaluation
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems
Insights into Stack Overflow’s traffic: We’re setting the record straight
Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow
Explaining Competitive-Level Programming Solutions using LLMs
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
AI Is a Lot of Work: As the technology becomes ubiquitous, a vast tasker underclass is emerging—and not going anywhere
When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming (CDHF)
CodeCompose: A Large-Scale Industrial Deployment of AI-assisted Code Authoring
Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing
Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes
Today was the first day that I could definitively say that GPT-4 has saved me a substantial amount of tedious work
Reflexion: Language Agents with Verbal Reinforcement Learning
ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning
Google is asking employees to test potential ChatGPT competitors, including a chatbot called 'Apprentice Bard'
An Analysis of the Automatic Bug Fixing Performance of ChatGPT
Connor Leahy on Aliens, Ethics, Economics, Memetics, and Education § GPT-4
General availability of Azure OpenAI Service expands access to large, advanced AI models with added enterprise benefits
ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
Programming Possibility: Kevin Scott on AI’s Impact on Cognitive Work
Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them
Vote-K: Selective Annotation Makes Language Models Better Few-Shot Learners
Repair Is Nearly Generation: Multilingual Program Repair with LLMs
Limitations of Language Models in Arithmetic and Symbolic Induction
PanGu-Coder: Program Synthesis with Function-Level Language Modeling
Craft an Iron Sword: Dynamically Generating Interactive Game Characters by Prompting Large Language Models Tuned on Code
Repository-Level Prompt Generation for Large Language Models of Code
InCoder: A Generative Model for Code Infilling and Synthesis
Evaluating the Text-to-SQL Capabilities of Large Language Models
Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models
PolyCoder: A Systematic Evaluation of Large Language Models of Code
Pop Quiz! Can a Large Language Model Help With Reverse Engineering?
Discovering the Syntax and Strategies of Natural Language Programming with Generative Language Models
A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More
Few-Shot Semantic Parsing with Language Models Trained On Code
WebGPT: Browser-assisted question-answering with human feedback
WebGPT: Improving the factual accuracy of language models through web browsing
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Can Pre-trained Language Models be Used to Resolve Textual and Semantic Merge Conflicts?
Solving Probability and Statistics Problems by Program Synthesis
Automatic Program Repair with OpenAI’s Codex: Evaluating QuixBugs
GenLine and GenForm: Two Tools for Interacting with Generative Language Models in a Code Editor
An Empirical Cybersecurity Evaluation of GitHub Copilot’s Code Contributions
Learning C to x86 Translation: An Experiment in Neural Compilation
TAPEX: Table Pre-training via Learning a Neural SQL Executor
Research recitation: A first look at rote learning in GitHub Copilot suggestions
Microsoft and OpenAI have a new AI tool that will give coding suggestions to software developers
SymbolicGPT: A Generative Transformer Model for Symbolic Regression
LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
GraphCodeBERT: Pre-training Code Representations with Data Flow
CoCoNuT: Combining Context-Aware Neural Translation Models using Ensemble for Program Repair
TransCoder: Unsupervised Translation of Programming Languages
CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
Building Games and Apps Entirely through Natural Language Using OpenAI’s Code-Davinci Model
An Amazing Journey With Claude 3.5 and ChatGPT-4o Who Helped Me Backwards Engineer an Econometrics Theory Paper and Taught Me a Lot More in the Process
Copilot Stops Working on `gender` Related Subjects · Community · Discussion #72603
Revolutionize Your Project Documentation With the Codex-README Generator, Utilizing OpenAI's Codex for Intelligent README Creation.
Fun and Dystopia With AI-Based Code Generation Using GPT-J-6B
There’s a Running Theme in Here of Programming Problems LLMs Solve Where It’s...
Introducing ‘Computer Use’, a New Claude 3.5 Sonnet, and Claude 3.5 Haiku
Who Models the Models That Model Models? An Exploration of GPT-3’s In-Context Model Fitting Ability
A.I. Can Now Write Its Own Computer Code. That’s Good News for Humans.
OpenAI Can Translate English into Code With Its New Machine Learning Software Codex
FROM PLAIN TO EXPLAINED IN FIVE MINUTES: Getting Started With Stenography Autopilot
I Built a Todo List App Simply by Describing It to GPT-3. It Generated the React Code for a Fully Functioning App within Seconds. I’m Becoming More Impressed and Aware of Its Capabilities Every Single Day.
I Gave GPT-3 Access to Chrome With the Objective ‘Please Buy Me AirPods’...It Successfully Made It to the Product Page, but Got Sidetracked With Walmart’s Privacy Policy. Since Even a Simplified DOM Is Far Too Large for a Single Prompt, Multiple Prompts Are given Different Chunks of the DOM, Each Generating Their Own ‘Interaction’. Another Prompt Then Takes All the Proposed Interactions and Selects the Best One, Sort of like a Tournament Bracket. For More Complex Web Pages, the Time It Takes to Generate an Action Scales at 𝒪(log n) With the Size of the DOM—Really Fast! It Also Gets around Token Limits, so You Could Technically Process an Infinitely Large DOM!
The Examples Are Indeed Extremely Simple on Purpose (otherwise It’s Hard to Communicate Efficiently What’s Happening to Non-Metamath Experts). That Being Said, We’re Still Pretty Far Away from IMOs; but This Is Definitely a Goal for Us, and One We’re Actively Working Towards!
XBOW Now Matches the Capabilities of a Top Human Pentester
2024-harding-figure2-gitcodemodificationsbytypeovertime.jpg
2022-neelakantan-figure1-gpt3textcodeembeddingscalingbymodelsize.png
2022-ziegler-figure5-githubcopilotcodecompletionsuggestionacceptanceratebyprogramminglanguage.jpg
2021-austin-figure3-lamdaprogrammingperformancevsmodelscaling.png
2021-austin-figure4-fractionofsamplessolvingeachtaskbylamdamodelscaling.png
2021-nakano-figure1-gpt3textbrowserenvironmentobservations.png
2021-nakano-figure2-humanevaluationsofscalinggpt3questionanswering.png
2021-nakano-figure5-humanpreferencebynumberofrandomsamplesgeneratedforpreferenceranking.png
2021-nakano-figure6-behaviorcloningscalingbydemonstrationsandparametercount.jpg
2021-nakano-figure7-bestfnscalingbyflopsandanswerssampled.jpg
2021-nakano-figure7-rewardmodelscalingbycomparisonsandparametercount.jpg
2021-rae-figure3-gopherscalingcurvesforfeverfactcheckinginusingevidenceforreasoning.png
2021-rae-figure4-gopherscalingacrossfamiliesoftasksupto280bparameters.jpg
2021-rae-figurea17-gopherfewshotcapabilityemergesontruthfulqaby280bparameters.jpg
2021-zhang-figure8-gpt3vsgptjjavascriptmergeaccuracybynumberofattempts.png
https://about.sourcegraph.com/blog/cheating-is-all-you-need
https://andrewmayne.com/2023/03/23/chatgpt-code-interpreter-magic/
https://blog.mentat.ai/benchmarking-gpt-4-turbo-a-cautionary-tale
https://borretti.me/article/astronomical-calculations-for-hard-sf-common-lisp
https://builtin.com/job/customer-success/expert-ai-teacher-contract/1267315
https://docs.parea.ai/blog/benchmarking-anthropic-beta-tool-use
https://finedataproducts.com/posts/2024-03-10-tax-scenarios-with-ai/
https://gist.github.com/harryaskham/68a611bef777525991790bca2f2d324d
https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/
https://github.blog/2023-02-14-github-copilot-for-business-is-now-available/
https://github.blog/2023-11-08-universe-2023-copilot-transforms-github-into-the-ai-powered-developer-platform/
https://github.com/E-xyza/Exonerate/blob/master/bench/reports/gpt-bench.md
https://github.com/aiwebb/treenav-bench#interesting-findings
https://github.com/jujumilk3/leaked-system-prompts/blob/main/github-copilot-chat_20230513.md
https://jacobbrazeal.wordpress.com/2022/09/23/gpt-3-can-find-paths-up-to-7-nodes-long-in-random-graphs/
https://kenkantzer.com/lessons-after-a-half-billion-gpt-tokens/
https://koenvangilst.nl/blog/keeping-code-complexity-in-check
https://lemire.me/blog/2023/03/22/can-gpt-pass-my-programming-courses/
https://martinfowler.com/articles/2023-chatgpt-xu-hao.html
https://mazzzystar.github.io/2023/05/10/LLM-for-individual/
https://medium.com/geekculture/i-found-a-loophole-to-successfully-web-scrape-using-chatgpt-heres-how-it-works-135f6c077d4d
https://medium.com/tenable-techblog/g-3po-a-protocol-droid-for-ghidra-4b46fa72f1ff
https://micahflee.com/2023/04/capturing-the-flag-with-gpt-4/
https://model-checking.github.io/kani-verifier-blog/2023/05/01/writing-code-with-chatgpt-improve-it-with-kani.html
https://nickarner.com/notes/llm-powered-assistants-for-complex-interfaces-february-26-2023/
https://old.reddit.com/r/singularity/comments/1atjz9v/ive_put_a_complex_codebase_into_a_single/
https://openai.com/blog/function-calling-and-other-api-updates#function-calling
https://openai.com/blog/introducing-text-and-code-embeddings/
https://openai.com/index/introducing-structured-outputs-in-the-api/#_5PYjnV1iAHOPKPupDztdZk
https://paperswithcode.com/sota/math-word-problem-solving-on-math
https://platform.openai.com/docs/guides/embeddings/code-search-using-embeddings
https://platform.openai.com/docs/guides/embeddings/use-cases
https://research.checkpoint.com/2023/opwnai-cybercriminals-starting-to-use-chatgpt/
https://research.google/blog/safely-repairing-broken-builds-with-ml/
https://security.googleblog.com/2023/08/ai-powered-fuzzing-breaking-bug-hunting.html
https://simonwillison.net/2022/Dec/5/rust-chatgpt-copilot/
https://stability.ai/blog/stablecode-llm-generative-ai-coding
https://stackoverflow.co/company/press/archive/openai-partnership/
https://statmodeling.stat.columbia.edu/2023/04/18/chatgpt4-writes-stan-code-so-i-dont-have-to/
https://statmodeling.stat.columbia.edu/2023/08/20/bob-carpenter-thinks-gpt-4-is-awesome/
https://tagide.com/education/writing-a-tokenizer-with-chatgpt/
https://towardsdatascience.com/can-chatgpt-write-better-sql-than-a-data-analyst-f079518efab2
https://towardsdatascience.com/codex-by-openai-in-action-83529c0076cc
https://tyleransom.substack.com/p/using-llms-to-fuzzy-merge
https://verse.systems/blog/post/2024-03-09-using-llms-to-generate-fuzz-generators/
https://web.archive.org/web/20221112033036/https://mullikine.github.io/posts/nlsh-natural-language-shell/
https://writings.stephenwolfram.com/2023/03/chatgpt-gets-its-wolfram-superpowers/
https://www.chargebackstop.com/blog/card-networks-exploitation
https://www.engraved.blog/building-a-virtual-machine-inside/
https://www.geoffreylitt.com/2023/03/25/llm-end-user-programming
https://www.honeycomb.io/blog/hard-stuff-nobody-talks-about-llm
https://www.kite.com/blog/product/kite-launches-ai-powered-javascript-completions/
https://www.lesswrong.com/posts/KSroBnxCHodGmPPJ8/jailbreaking-gpt-4-s-code-interpreter
https://www.lesswrong.com/posts/u3SueTC44tgKFMMNs/is-the-chatgpt-simulated-linux-virtual-machine-real?commentId=iCAiCah33bBNJqNQE
https://www.lesswrong.com/posts/u6KXXmKFbXfWzoAXn/a-circuit-for-python-docstrings-in-a-4-layer-attention-only
https://www.lesswrong.com/posts/ukTLGe5CQq9w8FMne/inducing-unprompted-misalignment-in-llms
https://www.lesswrong.com/posts/ux93sLHcqmBfsRTvg/gpt-can-write-quines-now-gpt-4
https://www.oneusefulthing.org/p/it-is-starting-to-get-strange
https://www.oreilly.com/radar/what-we-learned-from-a-year-of-building-with-llms-part-i/
https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/
https://www.quantamagazine.org/a-team-of-math-proves-a-critical-link-between-addition-and-sets-20231206/
https://www.reddit.com/r/ChatGPT/comments/12a0ajb/i_gave_gpt4_persistent_memory_and_the_ability_to/
https://www.reddit.com/r/GPT3/comments/106t5gv/compressing_prompt_text_with_lossless_compression/
https://www.reddit.com/r/MachineLearning/comments/106q6m9/p_i_built_adrenaline_a_debugger_that_fixes_errors/
https://www.reddit.com/r/OpenAI/comments/1bm305k/what_the_hell_claud_3_opus_is_a_straight/
https://www.samdickie.me/writing/experiment-1-creating-a-landing-page-using-ai-tools-no-code
https://www.zdnet.com/article/microsoft-has-over-a-million-paying-github-copilot-users-ceo-nadella/
https://xenaproject.wordpress.com/2022/09/12/beyond-the-liquid-tensor-experiment/
https://registerspill.thorstenball.com/p/they-all-use-it
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
https://arxiv.org/abs/2410.07095#openai
AI-powered coding pulls in almost $1bn of funding to claim ‘killer app’ status
https://www.ft.com/content/4868bd38-613c-4fa9-ba9d-1ed8fa8a40c8
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
https://arxiv.org/abs/2406.18518#salesforce
A Peter Thiel-Backed AI Startup, Cognition Labs, Seeks $2 Billion Valuation: Funding round could increase startup’s valuation nearly sixfold in a matter of weeks, reflecting AI frenzy
https://www.wsj.com/tech/ai/a-peter-thiel-backed-ai-startup-cognition-labs-seeks-2-billion-valuation-998fa39d
Vulnerability Detection with Code Language Models: How Far Are We?
Gold-Medalist Coders Build an AI That Can Do Their Job for Them: A new startup called Cognition AI can turn a user’s prompt into a website or video game
https://www.bloomberg.com/news/articles/2024-03-12/cognition-ai-is-a-peter-thiel-backed-coding-assistant
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
https://arxiv.org/abs/2401.05566#anthropic
StarVector: Generating Scalable Vector Graphics Code from Images
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
PassUntil: Predicting Emergent Abilities with Infinite Resolution Evaluation
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
AI Is a Lot of Work: As the technology becomes ubiquitous, a vast tasker underclass is emerging—and not going anywhere
https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots
When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming (CDHF)
https://arxiv.org/abs/2306.04930#microsoft
https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/
https://arxiv.org/abs/2303.03846#google
ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics
Google is asking employees to test potential ChatGPT competitors, including a chatbot called 'Apprentice Bard'
https://www.cnbc.com/2023/01/31/google-testing-chatgpt-like-chatbot-apprentice-bard-with-employees.html
An Analysis of the Automatic Bug Fixing Performance of ChatGPT
General availability of Azure OpenAI Service expands access to large, advanced AI models with added enterprise benefits
https://azure.microsoft.com/en-us/blog/general-availability-of-azure-openai-service-expands-access-to-large-advanced-ai-models-with-added-enterprise-benefits/
Programming Possibility: Kevin Scott on AI’s Impact on Cognitive Work
https://greylock.com/greymatter/kevin-scott-ai-programming-possibility/
Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them
https://arxiv.org/abs/2210.09261#google
Vote-K: Selective Annotation Makes Language Models Better Few-Shot Learners
https://arxiv.org/abs/2205.06537#github
InCoder: A Generative Model for Code Infilling and Synthesis
https://arxiv.org/abs/2204.05999#facebook
https://arxiv.org/abs/2204.02311#google
Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models
/doc/ai/nn/transformer/gpt/codex/2022-vaithilingam.pdf
https://arxiv.org/abs/2201.10005#openai
A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More
WebGPT: Browser-assisted question-answering with human feedback
https://arxiv.org/abs/2112.09332#openai
WebGPT: Improving the factual accuracy of language models through web browsing
https://openai.com/research/webgpt
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
https://arxiv.org/abs/2112.11446#deepmind
Can Pre-trained Language Models be Used to Resolve Textual and Semantic Merge Conflicts?
https://arxiv.org/abs/2111.11904#microsoft
Solving Probability and Statistics Problems by Program Synthesis
GenLine and GenForm: Two Tools for Interacting with Generative Language Models in a Code Editor
/doc/ai/nn/transformer/gpt/lamda/2021-jiang-2.pdf
Wikipedia Bibliography: