
IEEE Transactions on Software Engineering: Latest Publications

On the Influence of Data Resampling for Deep Learning-Based Log Anomaly Detection: Insights and Recommendations
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-12-09 | DOI: 10.1109/TSE.2024.3513413
Xiaoxue Ma;Huiqi Zou;Pinjia He;Jacky Keung;Yishu Li;Xiao Yu;Federica Sarro
Numerous Deep Learning (DL)-based approaches have gained attention in software Log Anomaly Detection (LAD), yet class imbalance in training data remains a challenge, with anomalies often comprising less than 1% of datasets like Thunderbird. Existing DL-based LAD (DLLAD) methods may underperform in severely imbalanced datasets. Although data resampling has proven effective in other software engineering tasks, it has not been explored in LAD. This study aims to fill this gap by providing an in-depth analysis of the impact of diverse data resampling methods on existing DLLAD approaches from two distinct perspectives. First, we assess the performance of these DLLAD approaches across four datasets with different levels of class imbalance, and we explore the impact of resampling ratios of normal to abnormal data on DLLAD approaches. Second, we evaluate the effectiveness of the data resampling methods when utilizing optimal resampling ratios of normal to abnormal data. Our findings indicate that oversampling methods generally outperform undersampling and hybrid sampling methods. Data resampling on raw data yields superior results compared to data resampling in the feature space. These improvements are attributed to the increased attention given to important tokens. By exploring the resampling ratio of normal to abnormal data, we suggest generating more data for minority classes through oversampling while removing less data from majority classes through undersampling. In conclusion, our study provides valuable insights into the intricate relationship between data resampling methods and DLLAD. By addressing the challenge of class imbalance, researchers and practitioners can enhance DLLAD performance.
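To make the resampling idea concrete, here is a minimal sketch (not the authors' code) of raw-data oversampling for log anomaly detection: abnormal log sequences are duplicated until a target normal-to-abnormal ratio is reached. The function name, labels, and default ratio are illustrative assumptions.

```python
# Minimal sketch of raw-data oversampling for imbalanced log datasets.
import random

def oversample_logs(sequences, labels, target_ratio=10.0, seed=42):
    """Randomly duplicate abnormal sequences (label == 1) so that
    len(normal) / len(abnormal) approaches `target_ratio`."""
    rng = random.Random(seed)
    normal = [s for s, y in zip(sequences, labels) if y == 0]
    abnormal = [s for s, y in zip(sequences, labels) if y == 1]
    # Number of abnormal samples needed to hit the target ratio.
    wanted = max(len(abnormal), int(len(normal) / target_ratio))
    extra = [rng.choice(abnormal) for _ in range(wanted - len(abnormal))]
    new_seqs = normal + abnormal + extra
    new_labels = [0] * len(normal) + [1] * (len(abnormal) + len(extra))
    return new_seqs, new_labels
```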
Citations: 0
FM-PRO: A Feature Modeling Process
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-12-09 | DOI: 10.1109/TSE.2024.3513635
Johan Martinson;Wardah Mahmood;Jude Gyimah;Thorsten Berger
Almost any software system needs to exist in multiple variants. While branching or forking—a.k.a. clone & own—are simple and inexpensive strategies, they do not scale well with the number of variants created. Software platforms—a.k.a. software product lines—scale and allow deriving variants by selecting the desired features in an automated, tool-supported process. However, product lines are difficult to adopt and to evolve, requiring mechanisms to manage features and their implementations in complex codebases. Such systems can easily have thousands of features with intricate dependencies. Feature models have arguably become the most popular notation to model and manage features, mainly due to their intuitive, tree-like representation. Introduced more than 30 years ago, thousands of techniques relying on feature models have been presented, including model configuration, synthesis, analysis, and evolution techniques. However, despite many success stories, organizations still struggle with adopting software product lines, limiting the usefulness of such techniques. Surprisingly, no modeling process exists to systematically create feature models, despite them being the main artifact of a product line. This challenges organizations, even hindering the adoption of product lines altogether. We present FM-PRO, a process to engineer feature models. It can be used with different adoption strategies for product lines, including creating one from scratch (pro-active adoption) and re-engineering one from existing cloned variants (extractive adoption). The resulting feature models can be used for configuration, planning, evolution, reasoning about variants, or keeping an overview understanding of complex software platforms. We systematically engineered the process based on empirically elicited modeling principles. We evaluated and refined it in a real-world industrial case study, two surveys with industrial and academic feature-modeling experts, as well as an open-source case study. We hope that FM-PRO helps to adopt feature models and that it facilitates higher-level, feature-oriented engineering practices, establishing features as a better and more abstract way to manage increasingly complex codebases.
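For readers unfamiliar with feature models, the toy sketch below shows the kind of tree-structured model the process targets, with mandatory and optional features and a configuration-validity check; the classes and rules are a generic simplification, not FM-PRO's own notation.

```python
# Generic sketch of a tree-structured feature model with a validity check.
from dataclasses import dataclass, field

@dataclass
class Feature:
    name: str
    mandatory: bool = False
    children: list = field(default_factory=list)

def valid(feature, selection, parent_selected=True):
    """A mandatory child must be selected whenever its parent is,
    and a selected child requires its parent to be selected."""
    selected = feature.name in selection
    if selected and not parent_selected:
        return False
    if feature.mandatory and parent_selected and not selected:
        return False
    return all(valid(c, selection, selected) for c in feature.children)

root = Feature("Editor", mandatory=True, children=[
    Feature("SyntaxHighlighting", mandatory=True),
    Feature("Autocomplete"),
])
print(valid(root, {"Editor", "SyntaxHighlighting"}))  # True
print(valid(root, {"Editor"}))  # False: mandatory child not selected
```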
Citations: 0
MoCo: Fuzzing Deep Learning Libraries via Assembling Code
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-12-02 | DOI: 10.1109/TSE.2024.3509975
Pin Ji;Yang Feng;Duo Wu;Lingyue Yan;Penglin Chen;Jia Liu;Zhihong Zhao
The rapidly developing Deep Learning (DL) techniques have been applied in software systems of various types. However, they can also pose new safety threats with potentially serious consequences, especially in safety-critical domains. DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts that directly affect the behaviors of DL systems. Previous research on fuzzing DL libraries still has limitations in generating tests corresponding to crucial testing scenarios and constructing test oracles. In this paper, we propose MoCo, a novel fuzz testing method for DL libraries via assembling code. The seed tests used by MoCo are code files that implement DL models, covering both model construction and training in the most common real-world application scenarios for DL libraries. MoCo first disassembles the seed code files to extract templates and code blocks, then applies code block mutation operators (e.g., API replacement, random generation, and boundary checking) to generate new code blocks that fit the template. To ensure the correctness of the code block mutation, we employ a Large Language Model to parse the official documentation of DL libraries for information about the parameters and the constraints between them. By inserting context-appropriate code blocks into the template, MoCo can generate a tree of code files with intergenerational relations. According to the derivation relations in this tree, we construct the test oracle based on the execution state consistency and the calculation result consistency. Since the granularity of code assembly is controlled rather than randomly divergent, we can quickly pinpoint the lines of code where the bugs are located and the corresponding triggering conditions. We conduct a comprehensive experiment to evaluate the efficiency and effectiveness of MoCo using three widely-used DL libraries (i.e., TensorFlow, PyTorch, and Jittor). During the experiments, MoCo detects 77 new bugs of four types in three DL libraries, of which 55 have been confirmed and 39 have been fixed by developers. The experimental results demonstrate that MoCo can generate high-quality tests that cover crucial testing scenarios and detect different types of bugs, which helps developers improve the reliability of DL libraries.
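A toy sketch (not MoCo itself) of the template-and-code-block assembly idea: a model-construction template with a hole is filled with code blocks produced by an API-replacement mutation operator. The template, seed block, and swap table are illustrative assumptions.

```python
# Toy template-based test assembly with an API-replacement mutation operator.
import random

TEMPLATE = """import torch.nn as nn
model = nn.Sequential(
    {block}
)
"""

SEED_BLOCK = "nn.Linear(16, 16),\n    nn.ReLU(),"
# Hypothetical table of compatible API swaps.
API_SWAPS = {"nn.ReLU()": ["nn.Tanh()", "nn.Sigmoid()", "nn.GELU()"]}

def mutate(block, rng):
    """API-replacement: swap one known API call for a compatible alternative."""
    for api, alts in API_SWAPS.items():
        if api in block:
            return block.replace(api, rng.choice(alts), 1)
    return block

rng = random.Random(0)
for _ in range(3):
    # Each iteration emits the source text of one assembled test program.
    print(TEMPLATE.format(block=mutate(SEED_BLOCK, rng)))
```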
{"title":"MoCo: Fuzzing Deep Learning Libraries via Assembling Code","authors":"Pin Ji;Yang Feng;Duo Wu;Lingyue Yan;Penglin Chen;Jia Liu;Zhihong Zhao","doi":"10.1109/TSE.2024.3509975","DOIUrl":"10.1109/TSE.2024.3509975","url":null,"abstract":"The rapidly developing Deep Learning (DL) techniques have been applied in software systems of various types. However, they can also pose new safety threats with potentially serious consequences, especially in safety-critical domains. DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts that directly affect the behaviors of DL systems. Previous research on fuzzing DL libraries still has limitations in generating tests corresponding to crucial testing scenarios and constructing test oracles. In this paper, we propose <monospace>MoCo</monospace>, a novel fuzzing testing method for DL libraries via assembling code. The seed tests used by <monospace>MoCo</monospace> are code files that implement DL models, covering both model construction and training in the most common real-world application scenarios for DL libraries. <monospace>MoCo</monospace> first disassembles the seed code files to extract templates and code blocks, then applies code block mutation operators (e.g., API replacement, random generation, and boundary checking) to generate new code blocks that fit the template. To ensure the correctness of the code block mutation, we employ the Large Language Model to parse the official documents of DL libraries for information about the parameters and the constraints between them. By inserting context-appropriate code blocks into the template, <monospace>MoCo</monospace> can generate a tree of code files with intergenerational relations. According to the derivation relations in this tree, we construct the test oracle based on the execution state consistency and the calculation result consistency. Since the granularity of code assembly is controlled rather than randomly divergent, we can quickly pinpoint the lines of code where the bugs are located and the corresponding triggering conditions. We conduct a comprehensive experiment to evaluate the efficiency and effectiveness of <monospace>MoCo</monospace> using three widely-used DL libraries (i.e., TensorFlow, PyTorch, and Jittor). During the experiments, <monospace>MoCo</monospace> detects 77 new bugs of four types in three DL libraries, where 55 bugs have been confirmed, and 39 bugs have been fixed by developers. The experimental results demonstrate that <monospace>MoCo</monospace> can generate high-quality tests that cover crucial testing scenarios and detect different types of bugs, which helps developers improve the reliability of DL libraries.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"371-388"},"PeriodicalIF":6.5,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142759889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Sprint2Vec: A Deep Characterization of Sprints in Iterative Software Development
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-11-29 | DOI: 10.1109/TSE.2024.3509016
Morakot Choetkiertikul;Peerachai Banyongrakkul;Chaiyong Ragkhitwetsagul;Suppawong Tuarob;Hoa Khanh Dam;Thanwadee Sunetnanta
Iterative approaches like Agile Scrum are commonly adopted to enhance the software development process. However, challenges such as schedule and budget overruns still persist in many software projects. Several approaches employ machine learning techniques, particularly classification, to facilitate decision-making in iterative software development. Existing approaches often concentrate on characterizing a sprint to predict productivity alone. We introduce Sprint2Vec, which leverages three aspects of sprint information – sprint attributes, issue attributes, and the developers involved in a sprint – to comprehensively characterize it for predicting both productivity and quality outcomes of the sprints. Our approach combines traditional feature extraction techniques with automated deep learning-based unsupervised feature learning techniques. We utilize methods like Long Short-Term Memory (LSTM) to enhance our feature learning process. This enables us to learn features from unstructured data, such as textual descriptions of issues and sequences of developer activities. We conducted an evaluation of our approach on two regression tasks: predicting the deliverability (i.e., the amount of work delivered from a sprint) and quality of a sprint (i.e., the amount of delivered work that requires rework). The evaluation results on five well-known open-source projects (Apache, Atlassian, Jenkins, Spring, and Talendforge) demonstrate our approach's superior performance compared to baseline and alternative approaches.
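As a rough illustration of the LSTM-based feature learning step, the sketch below encodes an integer-coded developer-activity sequence into a fixed-size sprint vector; the vocabulary size and dimensions are made-up assumptions, not Sprint2Vec's actual configuration.

```python
# Sketch of LSTM sequence encoding for developer-activity sequences.
import torch
import torch.nn as nn

class ActivityEncoder(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, activity_ids):
        # activity_ids: (batch, seq_len) integer-coded developer actions.
        _, (h_n, _) = self.lstm(self.embed(activity_ids))
        return h_n[-1]  # (batch, hidden_dim) fixed-size sprint feature

enc = ActivityEncoder()
batch = torch.randint(0, 100, (4, 20))  # 4 sprints, 20 actions each
print(enc(batch).shape)  # torch.Size([4, 64])
```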
Citations: 0
PackHunter: Recovering Missing Packages for C/C++ Projects
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-11-27 | DOI: 10.1109/TSE.2024.3506629
Rongxin Wu;Zhiling Huang;Zige Tian;Chengpeng Wang;Xiangyu Zhang
The reproducibility of software artifacts is a critical aspect of software development and application. However, current research indicates that a notable proportion of C/C++ projects encounter non-reproducibility issues stemming from build failures, primarily attributed to the absence of necessary packages. This paper introduces PackHunter, a novel technique that automates the recovery of missing packages in C/C++ projects. By identifying missing files during the project's build process, PackHunter can determine potentially missing packages and synthesize an installation script. Specifically, it simplifies C/C++ projects through program reduction to reduce build overhead and simulates the presence of missing files via a mock build to ensure a successful build for probing missing files. In addition, PackHunter leverages a sophisticated design to eliminate packages that do not contain the required missing files, effectively reducing the search space. Furthermore, PackHunter introduces a greedy strategy to prioritize the packages, eventually recovering missing packages with only a few rounds of package enumeration. We have implemented PackHunter as a tool and evaluated it on 30 real-world projects. The results demonstrate that PackHunter can recover missing packages efficiently, achieving a 26.59× speed-up over the state-of-the-art approach. The effectiveness of PackHunter highlights its potential to assist developers in building C/C++ artifacts and promote software reproducibility.
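The greedy prioritization can be pictured as a set-cover loop: among packages known to ship the still-missing files, repeatedly pick the one covering the most. The sketch below is a simplified illustration under that assumption, with made-up package data, not PackHunter's implementation.

```python
# Greedy set-cover sketch: map missing files to a small set of packages.
def pick_packages(missing_files, package_contents):
    """package_contents maps package name -> set of files it provides."""
    missing = set(missing_files)
    chosen = []
    while missing:
        # Pick the package that covers the most still-missing files.
        best = max(package_contents,
                   key=lambda p: len(package_contents[p] & missing))
        covered = package_contents[best] & missing
        if not covered:
            break  # remaining files are not provided by any known package
        chosen.append(best)
        missing -= covered
    return chosen, missing

pkgs = {
    "libssl-dev": {"openssl/ssl.h", "openssl/err.h"},
    "zlib1g-dev": {"zlib.h"},
}
print(pick_packages(["openssl/ssl.h", "zlib.h"], pkgs))
```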
Citations: 0
On-the-Fly Syntax Highlighting: Generalisation and Speed-Ups
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-11-26 | DOI: 10.1109/TSE.2024.3506040
Marco Edoardo Palma;Alex Wolf;Pasquale Salza;Harald C. Gall
On-the-fly syntax highlighting involves the rapid association of visual secondary notation with each character of a language derivation. This task has grown in importance due to the widespread use of online software development tools, which frequently display source code and heavily rely on efficient syntax highlighting mechanisms. In this context, resolvers must address three key demands: speed, accuracy, and development costs. Speed constraints are crucial for ensuring usability, providing responsive feedback for end users and minimizing system overhead. At the same time, precise syntax highlighting is essential for improving code comprehension. Achieving such accuracy, however, requires the ability to perform grammatical analysis, even in cases of varying correctness. Additionally, the development costs associated with supporting multiple programming languages pose a significant challenge. The technical challenges in balancing these three aspects explain why developers today experience significantly worse code syntax highlighting online compared to what they have locally. The current state-of-the-art relies on leveraging programming languages' original lexers and parsers to generate syntax highlighting oracles, which are used to train base Recurrent Neural Network models. However, questions of generalisation remain. This paper addresses this gap by extending the validation dataset of previous work to six mainstream programming languages, thus providing a more thorough evaluation. In response to limitations related to evaluation performance and training costs, this work introduces a novel Convolutional Neural Network (CNN) based model, specifically designed to mitigate these issues. Furthermore, this work addresses a previously unexplored area: performance gains when deploying such models on GPUs. The evaluation demonstrates that the new CNN-based implementation is significantly faster than existing state-of-the-art methods, while still delivering the same near-perfect accuracy.
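A compact sketch of a CNN-based per-token highlighter in the spirit of the paper's model: 1-D convolutions over token embeddings predict one highlight class per token. Layer sizes, the vocabulary, and the class count are assumptions, not the paper's architecture.

```python
# Sketch of a per-token highlight classifier built from 1-D convolutions.
import torch
import torch.nn as nn

class HighlightCNN(nn.Module):
    def __init__(self, vocab=256, embed=32, channels=64, classes=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.conv = nn.Sequential(
            nn.Conv1d(embed, channels, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(channels, classes, kernel_size=5, padding=2),
        )

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)    # (batch, embed, seq_len)
        return self.conv(x).transpose(1, 2)       # (batch, seq_len, classes)

model = HighlightCNN()
print(model(torch.randint(0, 256, (2, 40))).shape)  # torch.Size([2, 40, 8])
```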
{"title":"On-the-Fly Syntax Highlighting: Generalisation and Speed-Ups","authors":"Marco Edoardo Palma;Alex Wolf;Pasquale Salza;Harald C. Gall","doi":"10.1109/TSE.2024.3506040","DOIUrl":"10.1109/TSE.2024.3506040","url":null,"abstract":"On-the-fly syntax highlighting involves the rapid association of visual secondary notation with each character of a language derivation. This task has grown in importance due to the widespread use of online software development tools, which frequently display source code and heavily rely on efficient syntax highlighting mechanisms. In this context, resolvers must address three key demands: speed, accuracy, and development costs. Speed constraints are crucial for ensuring usability, providing responsive feedback for end users and minimizing system overhead. At the same time, precise syntax highlighting is essential for improving code comprehension. Achieving such accuracy, however, requires the ability to perform grammatical analysis, even in cases of varying correctness. Additionally, the development costs associated with supporting multiple programming languages pose a significant challenge. The technical challenges in balancing these three aspects explain why developers today experience significantly worse code syntax highlighting online compared to what they have locally. The current state-of-the-art relies on leveraging programming languages’ original lexers and parsers to generate syntax highlighting oracles, which are used to train base Recurrent Neural Network models. However, questions of generalisation remain. This paper addresses this gap by extending previous work validation dataset to six mainstream programming languages thus providing a more thorough evaluation. In response to limitations related to evaluation performance and training costs, this work introduces a novel Convolutional Neural Network (CNN) based model, specifically designed to mitigate these issues. Furthermore, this work addresses an area previously unexplored performance gains when deploying such models on GPUs. The evaluation demonstrates that the new CNN-based implementation is significantly faster than existing state-of-the-art methods, while still delivering the same near-perfect accuracy.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"355-370"},"PeriodicalIF":6.5,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142718350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Triple Peak Day: Work Rhythms of Software Developers in Hybrid Work
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-11-22 | DOI: 10.1109/TSE.2024.3504831
Javier Hernandez;Vedant Das Swain;Jina Suh;Daniel McDuff;Judith Amores;Gonzalo Ramos;Kael Rowan;Brian Houck;Shamsi Iqbal;Mary Czerwinski
The future of work is rapidly changing, with remote and hybrid settings blurring the boundaries between professional and personal life. To understand how work rhythms vary across different work settings, we conducted a month-long study of 65 software developers, collecting anonymized computer activity data as well as daily ratings for perceived stress, productivity, and work setting. In addition to confirming the double-peak pattern of activity at 10:00 am and 2:00 pm observed in prior research, we observed a significant third peak around 9:00 pm. This third peak was associated with higher perceived productivity during remote days but increased stress during onsite and hybrid days, highlighting a nuanced interplay between work demands and work settings. Additionally, we found strong correlations between computer activity, productivity, and stress, including an inverted U-shaped relationship where productivity peaked at around six hours of computer activity before declining on more active days. These findings provide new insights into evolving work rhythms and highlight the impact of different work settings on productivity and stress.
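As a side note on the inverted-U finding, a quadratic fit is one simple way to locate such a peak; the sketch below does this on fabricated illustrative numbers (not the study's data) chosen to peak near six hours.

```python
# Quadratic-fit sketch for an inverted-U productivity/activity relationship.
import numpy as np

hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])            # illustrative only
productivity = np.array([2.0, 2.8, 3.5, 4.0, 4.3, 4.4, 4.2, 3.8, 3.2])

a, b, c = np.polyfit(hours, productivity, deg=2)
peak = -b / (2 * a)  # vertex of the fitted parabola
print(f"fitted peak at {peak:.1f} hours; a={a:.3f} (negative => inverted U)")
```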
{"title":"Triple Peak Day: Work Rhythms of Software Developers in Hybrid Work","authors":"Javier Hernandez;Vedant Das Swain;Jina Suh;Daniel McDuff;Judith Amores;Gonzalo Ramos;Kael Rowan;Brian Houck;Shamsi Iqbal;Mary Czerwinski","doi":"10.1109/TSE.2024.3504831","DOIUrl":"10.1109/TSE.2024.3504831","url":null,"abstract":"The future of work is rapidly changing, with remote and hybrid settings blurring the boundaries between professional and personal life. To understand how work rhythms vary across different work settings, we conducted a month-long study of 65 software developers, collecting anonymized computer activity data as well as daily ratings for perceived stress, productivity, and work setting. In addition to confirming the double-peak pattern of activity at 10:00 am and 2:00 pm observed in prior research, we observed a significant third peak around 9:00 pm. This third peak was associated with higher perceived productivity during remote days but increased stress during onsite and hybrid days, highlighting a nuanced interplay between work demands and work settings. Additionally, we found strong correlations between computer activity, productivity, and stress, including an inverted U-shaped relationship where productivity peaked at around six hours of computer activity before declining on more active days. These findings provide new insights into evolving work rhythms and highlight the impact of different work settings on productivity and stress.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"344-354"},"PeriodicalIF":6.5,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142690736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
GenProgJS: A Baseline System for Test-Based Automated Repair of JavaScript Programs
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-11-21 | DOI: 10.1109/TSE.2024.3497798
Viktor Csuvik;Dániel Horváth;Márk Lajkó;László Vidács
Originally, GenProg was created to repair buggy programs written in the C programming language, launching a new discipline in the Generate-and-Validate approach to Automated Program Repair (APR). Since then, a number of other tools have been published using a variety of repair approaches. Some of these still operate on programs written in C/C++, others on Java or even Python programs. In this work, a tool named GenProgJS is presented, which generates candidate patches for faulty JavaScript programs. The algorithm it uses is very similar to the genetic algorithm used in the original GenProg, hence the name. In addition to the traditional approach, solutions used in some more recent works were also incorporated, and JavaScript language-specific approaches were also taken into account when the tool was designed. To the best of our knowledge, the tool presented here is the first to apply GenProg's general generate-and-validate approach to JavaScript programs. We evaluate the method on the BugsJS bug database, where it successfully fixed 31 bugs in 6 open source Node.js projects. These bugs belong to 14 different categories, showing the generic nature of the method. During the experiments, code transformations applied on the original source code are all traced, and an in-depth analysis of mutation operators and fine-grained changes is also presented. We share our findings with the APR research community and describe the difficulties and differences we faced while designing this JavaScript repair tool. The source code of GenProgJS is publicly available on GitHub, with a pre-configured Docker environment where it can easily be launched.
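For orientation, the loop below is a bare-bones sketch of GenProg-style generate-and-validate repair: mutate candidates, score them by the fraction of passing tests, and select the fittest. The mutation operator and fitness function are placeholders, not GenProgJS internals.

```python
# Bare-bones generate-and-validate repair loop (GenProg-style sketch).
import random

def repair(program, mutate, fitness, pop_size=40, generations=50, seed=0):
    """Evolve candidate patches; `fitness` returns the fraction of
    passing test cases (1.0 means all tests pass)."""
    rng = random.Random(seed)
    population = [program] * pop_size
    for _ in range(generations):
        # Generate: apply a mutation operator to each candidate.
        population = [mutate(p, rng) for p in population]
        # Validate: rank candidates by test-suite fitness.
        scored = sorted(population, key=fitness, reverse=True)
        if fitness(scored[0]) == 1.0:
            return scored[0]  # all tests pass: a plausible patch
        # Select: resample the next generation from the fitter half.
        population = [rng.choice(scored[: pop_size // 2])
                      for _ in range(pop_size)]
    return None  # no plausible patch found within the budget
```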
{"title":"GenProgJS: A Baseline System for Test-Based Automated Repair of JavaScript Programs","authors":"Viktor Csuvik;Dániel Horváth;Márk Lajkó;László Vidács","doi":"10.1109/TSE.2024.3497798","DOIUrl":"10.1109/TSE.2024.3497798","url":null,"abstract":"Originally, GenProg was created to repair buggy programs written in the C programming language, launching a new discipline in Generate-and-Validate approach of Automated Program Repair (APR). Since then, a number of other tools has been published using a variety of repair approaches. Some of these still operate on programs written in C/C++, others on Java or even Python programs. In this work, a tool named GenProgJS is presented, which generates candidate patches for faulty JavaScript programs. The algorithm it uses is very similar to the genetic algorithm used in the original GenProg, hence the name. In addition to the traditional approach, solutions used in some more recent works were also incorporated, and JavaScript language-specific approaches were also taken into account when the tool was designed. To the best of our knowledge, the tool presented here is the first to apply GenProg's general generate-and-validate approach to JavaScript programs. We evaluate the method on the BugsJS bug database, where it successfully fixed 31 bugs in 6 open source Node.js projects. These bugs belong to 14 different categories showing the generic nature of the method. During the experiments, code transformations applied on the original source code are all traced, and an in-depth analysis of mutation operators and fine-grained changes are also presented. We share our findings with the APR research community and describe the difficulties and differences we faced while designed this JavaScript repair tool. The source code of GenProgJS is publicly available on Github, with a pre-configured Docker environment where it can easily be launched.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"325-343"},"PeriodicalIF":6.5,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10759840","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142684362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
On Inter-Dataset Code Duplication and Data Leakage in Large Language Models
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-11-21 | DOI: 10.1109/TSE.2024.3504286
José Antonio Hernández López;Boqi Chen;Mootez Saad;Tushar Sharma;Dániel Varró
Motivation. Large language models (LLMs) have exhibited remarkable proficiency in diverse software engineering (SE) tasks, such as code summarization, code translation, and code search. Handling such tasks typically involves acquiring foundational coding knowledge on large, general-purpose datasets during a pre-training phase, and subsequently refining on smaller, task-specific datasets as part of a fine-tuning phase. Problem statement. Data leakage, i.e., using information from the test set during model training, is a well-known issue in training machine learning models. A manifestation of this issue is the intersection of the training and testing splits. While intra-dataset code duplication examines this intersection within a given dataset and has been addressed in prior research, inter-dataset code duplication, which gauges the overlap between different datasets, remains largely unexplored. If this phenomenon exists, it could compromise the integrity of LLM evaluations because of the inclusion of fine-tuning test samples that were already encountered during pre-training, resulting in inflated performance metrics. Contribution. This paper explores the phenomenon of inter-dataset code duplication and its impact on evaluating LLMs across diverse SE tasks. Study design. We conduct an empirical study using the CodeSearchNet dataset (csn), a widely adopted pre-training dataset, and five fine-tuning datasets used for various SE tasks. We first identify the intersection between the pre-training and fine-tuning datasets using a deduplication process. Next, we pre-train two versions of LLMs using a subset of csn: one leaky LLM, which includes the identified intersection in its pre-training set, and one non-leaky LLM that excludes these samples. Finally, we fine-tune both models and compare their performances using fine-tuning test samples that are part of the intersection. Results. Our findings reveal a potential threat to the evaluation of LLMs across multiple SE tasks, stemming from the inter-dataset code duplication phenomenon. We also demonstrate that this threat is accentuated by the chosen fine-tuning technique. Furthermore, we provide evidence that open-source models such as CodeBERT, GraphCodeBERT, and UnixCoder could be affected by inter-dataset duplication. Based on our findings, we delve into prior research that may be susceptible to this threat. Additionally, we offer guidance to SE researchers on strategies to prevent inter-dataset code duplication.
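A small sketch of the kind of intersection check such a deduplication process involves: normalize each code sample and hash it, then flag fine-tuning samples whose fingerprints occur in the pre-training corpus. The normalization here is a simplified assumption, not the study's exact procedure.

```python
# Hash-based sketch of finding pre-training / fine-tuning overlap.
import hashlib
import re

def fingerprint(code: str) -> str:
    code = re.sub(r"#.*", "", code)           # drop line comments (simplified)
    code = re.sub(r"\s+", " ", code).strip()  # collapse whitespace
    return hashlib.sha256(code.encode()).hexdigest()

def intersection(pretrain_corpus, finetune_corpus):
    seen = {fingerprint(s) for s in pretrain_corpus}
    return [s for s in finetune_corpus if fingerprint(s) in seen]

# Two formatting variants of the same function are flagged as duplicates.
print(intersection(["def f(x):  return x + 1"],
                   ["def f(x):\n    return x + 1"]))
```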
Citations: 0
Line-Level Defect Prediction by Capturing Code Contexts With Graph Convolutional Networks
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-11-20 | DOI: 10.1109/TSE.2024.3503723
Shouyu Yin;Shikai Guo;Hui Li;Chenchen Li;Rong Chen;Xiaochen Li;He Jiang
Software defect prediction refers to the systematic analysis and review of software using various approaches and tools to identify potential defects or errors. Software defect prediction aids developers in swiftly identifying defects and optimizing development resource allocation, thus enhancing software quality and reliability. Previous defect prediction approaches still face two main limitations: 1) a lack of contextual semantic information, and 2) ignoring joint reasoning between different granularities of defect prediction. In response to these challenges, we propose LineDef, a line-level defect prediction approach that captures code contexts with graph convolutional networks. Specifically, LineDef comprises three components: the token embedding component, the graph extraction component, and the multi-granularity defect prediction component. The token embedding component maps each token to a vector to obtain a high-dimensional semantic feature representation of the token. Subsequently, the graph extraction component utilizes a sliding window to extract line-level and token-level graphs, addressing the challenge of capturing contextual semantic relationships in the code. Finally, the multi-granularity defect prediction component leverages graph convolutional layers and attention mechanisms to acquire prediction labels and risk scores, thereby achieving file-level and line-level defect prediction. Experimental studies on 32 datasets across 9 different software projects show that LineDef exhibits significantly enhanced balanced accuracy, ranging from 15.61% to 45.20%, compared to state-of-the-art file-level defect prediction approaches, and a remarkable cost-effectiveness improvement, ranging from 15.32% to 278%, compared to state-of-the-art line-level defect prediction approaches. These results demonstrate that the LineDef approach can extract more comprehensive information from lines of code for defect prediction.
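A compact sketch of the graph-convolution building block such an approach rests on: line/token embeddings are propagated over a normalized adjacency matrix. The shapes, adjacency, and normalization are illustrative; this is not LineDef's actual model.

```python
# Minimal graph-convolution layer over code-line embeddings.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj: (n, n) with self-loops; symmetric normalization D^-1/2 A D^-1/2.
        deg = adj.sum(dim=1)
        d_inv_sqrt = deg.clamp(min=1).pow(-0.5)
        norm_adj = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)
        return torch.relu(self.lin(norm_adj @ x))

x = torch.randn(5, 16)  # 5 code lines, 16-dim embeddings
# Chain adjacency (each line linked to its neighbors) plus self-loops.
adj = torch.eye(5) + torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
print(GCNLayer(16, 8)(x, adj).shape)  # torch.Size([5, 8])
```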
Citations: 0