Security Weaknesses of Copilot Generated Code in GitHub

Fu, Yujia; Liang, Peng; Tahir, Amjed; Li, Zengyang; Shahin, Mojtaba; Yu, Jiaxin; Chen, Jinfu

Computer Science > Software Engineering

arXiv:2310.02059 (cs)

[Submitted on 3 Oct 2023 (v1), last revised 4 Apr 2024 (this version, v2)]

Abstract: Modern code generation tools, utilizing AI models like Large Language Models (LLMs), have gained popularity for producing functional code. However, their usage presents security challenges, often resulting in insecure code merging into the code base. Evaluating the quality of generated code, especially its security, is crucial. While prior research explored various aspects of code generation, the focus on security has been limited, mostly examining code produced in controlled environments rather than real-world scenarios. To address this gap, we conducted an empirical study, analyzing code snippets generated by GitHub Copilot from GitHub projects. Our analysis identified 452 snippets generated by Copilot, revealing a high likelihood of security issues, with 32.8% of Python and 24.5% of JavaScript snippets affected. These issues span 38 different Common Weakness Enumeration (CWE) categories, including significant ones like CWE-330: Use of Insufficiently Random Values, CWE-78: OS Command Injection, and CWE-94: Improper Control of Generation of Code. Notably, eight CWEs are among the 2023 CWE Top-25, highlighting their severity. Our findings confirm that developers should be careful when adding code generated by Copilot and should also run appropriate security checks as they accept the suggested code. They also show that practitioners should cultivate corresponding security awareness and skills.
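The three CWE categories named in the abstract can be illustrated with common Python patterns. The sketch below is not drawn from the study's dataset of Copilot snippets; the function names (make_reset_token, run_with_literal_arg, quote_for_shell, parse_literal) are hypothetical. Each section pairs an insecure pattern, shown only in comments, with a standard-library alternative that is generally considered safer.

```python
import ast
import secrets
import shlex
import subprocess
import sys

# CWE-330 (Use of Insufficiently Random Values).
# Insecure pattern (not executed here): random.choice / random.getrandbits for
# tokens. The `random` module is a deterministic Mersenne Twister PRNG whose
# output can be predicted, so it should not be used for secrets.
def make_reset_token(n_bytes=16):
    # Hypothetical helper: `secrets` draws from the operating system's CSPRNG.
    return secrets.token_hex(n_bytes)


# CWE-78 (OS Command Injection).
# Insecure pattern (not executed here):
#     subprocess.run("grep " + user_input + " log.txt", shell=True)
# A value such as "x; rm -rf ~" would be interpreted by the shell.
def run_with_literal_arg(user_input):
    # Hypothetical helper: an argument list with the default shell=False hands
    # `user_input` to the child as a single argv entry, so shell metacharacters
    # stay inert. The child is the current interpreter echoing argv[1].
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


def quote_for_shell(user_input):
    # Hypothetical helper: if a shell string is unavoidable, quote each
    # untrusted piece for a POSIX shell.
    return shlex.quote(user_input)


# CWE-94 (Improper Control of Generation of Code).
# Insecure pattern (not executed here): eval(user_input), which runs any
# Python expression, e.g. "__import__('os').system('id')".
def parse_literal(user_input):
    # Hypothetical helper: ast.literal_eval accepts only Python literals
    # (numbers, strings, bytes, tuples, lists, dicts, sets, booleans, None)
    # and raises ValueError for function calls.
    return ast.literal_eval(user_input)


if __name__ == "__main__":
    print(len(make_reset_token()))                    # 32
    print(run_with_literal_arg("hello; echo pwned"))  # hello; echo pwned
    print(quote_for_shell("hello; echo pwned"))       # 'hello; echo pwned'
    print(parse_literal("[1, 2, 3]"))                 # [1, 2, 3]
    try:
        parse_literal("__import__('os').system('id')")
    except ValueError as exc:
        print("rejected:", exc)
```

Replacing a risky call in this way addresses only one instance of a weakness; it complements, rather than replaces, the security checks the authors recommend running when accepting suggested code.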

Subjects:	Software Engineering (cs.SE); Cryptography and Security (cs.CR)
Cite as:	arXiv:2310.02059 [cs.SE]
	(or arXiv:2310.02059v2 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2310.02059

Submission history

From: Peng Liang
[v1] Tue, 3 Oct 2023 14:01:28 UTC (1,734 KB)
[v2] Thu, 4 Apr 2024 07:53:03 UTC (2,234 KB)