"An Empirical Cybersecurity Evaluation of GitHub Copilot's Code Contributions", 2021-08-20:
There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described "AI pair programmer", GitHub Copilot, a language model trained over open-source GitHub code. However, code often contains bugs—and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns about the security of Copilot's code contributions.
In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk CWEs (e.g., those from MITRE's "Top 25" list). We explore Copilot's performance on three distinct code generation axes—examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains.
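To illustrate the kind of weakness such scenarios probe (the paper's actual prompts are not reproduced here), consider CWE-89 (SQL injection): a completion that splices user input directly into a query is exploitable, while a parameterized completion is not. The sketch below is a hypothetical example, not taken from the study, using Python's built-in sqlite3 module:

```python
import sqlite3

# Hypothetical CWE-89 scenario: the same query written two ways, as an
# insecure completion and a secure one. Names are illustrative only.

def get_user_insecure(conn, username):
    # Insecure: user input is spliced directly into the SQL string.
    return conn.execute(
        "SELECT id FROM users WHERE name = '%s'" % username
    ).fetchall()

def get_user_secure(conn, username):
    # Secure: parameterized query; the input is never parsed as SQL.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

payload = "' OR '1'='1"
leaked = get_user_insecure(conn, payload)  # injection returns every row
safe = get_user_secure(conn, payload)      # matches no row
```

A vulnerability analysis of generated code flags completions of the first form; the study reports how often Copilot's suggestions fall into such patterns.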
In total, we produce 89 different scenarios for Copilot to complete, producing 1,692 programs. Of these, we found ~40% to be vulnerable.
This raises significant concerns about the reliability and security of auto-generated code, particularly for applications in critical systems.