On the Security of Python Virtual Machines: An Empirical Study

2022 IEEE International Conference on Software Maintenance and Evolution (ICSME) Pub Date : 2022-10-01 DOI:10.1109/ICSME55016.2022.00028

Xinrong Lin, Baojian Hua, Qiliang Fan

{"title":"On the Security of Python Virtual Machines: An Empirical Study","authors":"Xinrong Lin, Baojian Hua, Qiliang Fan","doi":"10.1109/ICSME55016.2022.00028","DOIUrl":null,"url":null,"abstract":"Python continues to be one of the most popular programming languages and has been used in many safety-critical fields such as medical treatment, autonomous driving systems, and data science. These fields put forward higher security requirements to Python ecosystems. However, existing studies on machine learning systems in Python concentrate on data security, model security and model privacy, and just assume the underlying Python virtual machines (PVMs) are secure and trustworthy. Unfortunately, whether such an assumption really holds is still unknown.This paper presents, to the best of our knowledge, the first and most comprehensive empirical study on the security of CPython, the official and most deployed Python virtual machine. To this end, we first designed and implemented a software prototype dubbed PVMSCAN, then use it to scan the source code of the latest CPython (version 3.10) and other 10 versions (3.0 to 3.9), which consists of 3,838,606 lines of source code. Empirical results give relevant findings and insights towards the security of Python virtual machines, such as: 1) CPython virtual machines are still vulnerable, for example, PVMSCAN detected 239 vulnerabilities in version 3.10, including 55 null dereferences, 86 uninitialized variables and 98 dead stores; Python/C API-related vulnerabilities are very common and have become one of the most severe threats to the security of PVMs: for example, 70 Python/C API-related vulnerabilities are identified in CPython 3.10; 3) the overall quality of the code remained stable during the evolution of Python VMs with vulnerabilities per thousand line (VPTL) to be 0.50; and 4) automatic vulnerability rectification is effective: 166 out of 239 (69.46%) vulnerabilities can be rectified by a simple yet effective syntax-directed heuristics.We have reported our empirical results to the developers of CPython, and they have acknowledged us and already confirmed and fixed 2 bugs (as of this writing) while others are still being analyzed. This study not only demonstrates the effectiveness of our approach, but also highlights the need to improve the reliability of infrastructures like Python virtual machines by leveraging state-of-the-art security techniques and tools.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSME55016.2022.00028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Python continues to be one of the most popular programming languages and has been used in many safety-critical fields such as medical treatment, autonomous driving systems, and data science. These fields put forward higher security requirements to Python ecosystems. However, existing studies on machine learning systems in Python concentrate on data security, model security and model privacy, and just assume the underlying Python virtual machines (PVMs) are secure and trustworthy. Unfortunately, whether such an assumption really holds is still unknown.This paper presents, to the best of our knowledge, the first and most comprehensive empirical study on the security of CPython, the official and most deployed Python virtual machine. To this end, we first designed and implemented a software prototype dubbed PVMSCAN, then use it to scan the source code of the latest CPython (version 3.10) and other 10 versions (3.0 to 3.9), which consists of 3,838,606 lines of source code. Empirical results give relevant findings and insights towards the security of Python virtual machines, such as: 1) CPython virtual machines are still vulnerable, for example, PVMSCAN detected 239 vulnerabilities in version 3.10, including 55 null dereferences, 86 uninitialized variables and 98 dead stores; Python/C API-related vulnerabilities are very common and have become one of the most severe threats to the security of PVMs: for example, 70 Python/C API-related vulnerabilities are identified in CPython 3.10; 3) the overall quality of the code remained stable during the evolution of Python VMs with vulnerabilities per thousand line (VPTL) to be 0.50; and 4) automatic vulnerability rectification is effective: 166 out of 239 (69.46%) vulnerabilities can be rectified by a simple yet effective syntax-directed heuristics.We have reported our empirical results to the developers of CPython, and they have acknowledged us and already confirmed and fixed 2 bugs (as of this writing) while others are still being analyzed. This study not only demonstrates the effectiveness of our approach, but also highlights the need to improve the reliability of infrastructures like Python virtual machines by leveraging state-of-the-art security techniques and tools.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

论Python虚拟机的安全性:一个实证研究

Python仍然是最流行的编程语言之一，并已被用于许多安全关键领域，如医疗、自动驾驶系统和数据科学。这些领域对Python生态系统提出了更高的安全性要求。然而，现有的关于Python机器学习系统的研究主要集中在数据安全、模型安全和模型隐私上，并且仅仅假设底层的Python虚拟机(pvm)是安全可信的。不幸的是，这种假设是否真的成立仍然未知。据我们所知，本文首次对CPython(官方和部署最多的Python虚拟机)的安全性进行了最全面的实证研究。为此，我们首先设计并实现了一个名为PVMSCAN的软件原型，然后使用它扫描最新的CPython(3.10版本)和其他10个版本(3.0到3.9)的源代码，共3838606行源代码。实证结果对Python虚拟机的安全性给出了相关的发现和见解，例如:1)CPython虚拟机仍然存在漏洞，例如，PVMSCAN在3.10版本中检测到239个漏洞，包括55个null解引用，86个未初始化变量和98个死存储;Python/C api相关漏洞非常普遍，已经成为pvm安全最严重的威胁之一:例如，在CPython 3.10中识别了70个Python/C api相关漏洞;3)在Python vm的演进过程中，代码的整体质量保持稳定，每千行漏洞数(VPTL)为0.50;4)漏洞自动修正是有效的:239个漏洞中有166个(69.46%)可以通过简单而有效的语法导向启发式修正。我们已经向CPython的开发人员报告了我们的经验结果，他们已经承认了我们，并且已经确认并修复了2个错误(在撰写本文时)，而其他错误仍在分析中。这项研究不仅证明了我们方法的有效性，而且还强调了通过利用最先进的安全技术和工具来提高基础设施(如Python虚拟机)可靠性的必要性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)

自引率

0.00%

发文量

期刊最新文献

RestTestGen: An Extensible Framework for Automated Black-box Testing of RESTful APIs COBREX: A Tool for Extracting Business Rules from COBOL On the Security of Python Virtual Machines: An Empirical Study The Phantom Menace: Unmasking Security Issues in Evolving Software Impact of Defect Instances for Successful Deep Learning-based Automatic Program Repair