
Latest Publications in IEEE Transactions on Software Engineering

Sprint2Vec: A Deep Characterization of Sprints in Iterative Software Development
IF 6.5, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2024-11-29. DOI: 10.1109/TSE.2024.3509016
Morakot Choetkiertikul;Peerachai Banyongrakkul;Chaiyong Ragkhitwetsagul;Suppawong Tuarob;Hoa Khanh Dam;Thanwadee Sunetnanta
Iterative approaches like Agile Scrum are commonly adopted to enhance the software development process. However, challenges such as schedule and budget overruns still persist in many software projects. Several approaches employ machine learning techniques, particularly classification, to facilitate decision-making in iterative software development. Existing approaches often concentrate on characterizing a sprint to predict productivity alone. We introduce Sprint2Vec, which leverages three aspects of sprint information (sprint attributes, issue attributes, and the developers involved in a sprint) to comprehensively characterize a sprint for predicting both its productivity and quality outcomes. Our approach combines traditional feature extraction techniques with automated deep learning-based unsupervised feature learning techniques. We utilize methods like Long Short-Term Memory (LSTM) networks to enhance our feature learning process. This enables us to learn features from unstructured data, such as textual descriptions of issues and sequences of developer activities. We conducted an evaluation of our approach on two regression tasks: predicting the deliverability (i.e., the amount of work delivered from a sprint) and the quality of a sprint (i.e., the amount of delivered work that requires rework). The evaluation results on five well-known open-source projects (Apache, Atlassian, Jenkins, Spring, and Talendforge) demonstrate our approach's superior performance compared to baseline and alternative approaches.
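The feature-combination idea in the abstract can be illustrated with a minimal Python sketch that concatenates hand-crafted sprint attributes with a pooled encoding of a developer-activity sequence. The attribute names, the toy embedding, and the mean-pooling stand-in for the paper's LSTM encoder are all hypothetical assumptions for illustration, not Sprint2Vec's actual implementation.

```python
def traditional_features(sprint):
    """Hand-crafted sprint attributes (a hypothetical subset)."""
    return [
        sprint["planned_issues"],
        sprint["duration_days"],
        len(sprint["developers"]),
    ]

def sequence_features(activity_seq, vocab, dim=4):
    """Stand-in for the paper's LSTM encoder: embed each developer
    activity with a toy deterministic lookup and mean-pool the sequence."""
    def embed(act):
        idx = vocab[act]
        return [((idx + i) % 5) / 5.0 for i in range(dim)]
    vecs = [embed(a) for a in activity_seq]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def sprint_vector(sprint, vocab):
    """Concatenate traditional and sequence-learned features,
    mirroring the hybrid characterization described in the abstract."""
    return traditional_features(sprint) + sequence_features(sprint["activities"], vocab)

vocab = {"commit": 0, "comment": 1, "close_issue": 2}
sprint = {"planned_issues": 12, "duration_days": 14,
          "developers": ["a", "b"],
          "activities": ["commit", "comment", "commit", "close_issue"]}
vec = sprint_vector(sprint, vocab)
print(len(vec))  # 3 traditional attributes + 4 pooled dims = 7
```

A real pipeline would replace the toy embedding and mean-pooling with a trained LSTM over issue texts and activity sequences, as the paper describes.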
{"title":"Sprint2Vec: A Deep Characterization of Sprints in Iterative Software Development","authors":"Morakot Choetkiertikul;Peerachai Banyongrakkul;Chaiyong Ragkhitwetsagul;Suppawong Tuarob;Hoa Khanh Dam;Thanwadee Sunetnanta","doi":"10.1109/TSE.2024.3509016","DOIUrl":"10.1109/TSE.2024.3509016","url":null,"abstract":"Iterative approaches like Agile Scrum are commonly adopted to enhance the software development process. However, challenges such as schedule and budget overruns still persist in many software projects. Several approaches employ machine learning techniques, particularly classification, to facilitate decision-making in iterative software development. Existing approaches often concentrate on characterizing a sprint to predict solely productivity. We introduce Sprint2Vec, which leverages three aspects of sprint information – sprint attributes, issue attributes, and the developers involved in a sprint, to comprehensively characterize it for predicting both productivity and quality outcomes of the sprints. Our approach combines traditional feature extraction techniques with automated deep learning-based unsupervised feature learning techniques. We utilize methods like Long Short-Term Memory (LSTM) to enhance our feature learning process. This enables us to learn features from unstructured data, such as textual descriptions of issues and sequences of developer activities. We conducted an evaluation of our approach on two regression tasks: predicting the deliverability (i.e., the amount of work delivered from a sprint) and quality of a sprint (i.e., the amount of delivered work that requires rework). 
The evaluation results on five well-known open-source projects (Apache, Atlassian, Jenkins, Spring, and Talendforge) demonstrate our approach's superior performance compared to baseline and alternative approaches.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 1","pages":"220-242"},"PeriodicalIF":6.5,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10771809","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142753723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
PackHunter: Recovering Missing Packages for C/C++ Projects
IF 6.5, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2024-11-27. DOI: 10.1109/TSE.2024.3506629
Rongxin Wu;Zhiling Huang;Zige Tian;Chengpeng Wang;Xiangyu Zhang
The reproducibility of software artifacts is a critical aspect of software development and application. However, current research indicates that a notable proportion of C/C++ projects encounter non-reproducibility issues stemming from build failures, primarily attributed to the absence of necessary packages. This paper introduces PackHunter, a novel technique that automates the recovery of missing packages in C/C++ projects. By identifying missing files during the project's build process, PackHunter can determine potentially missing packages and synthesize an installation script. Specifically, it simplifies C/C++ projects through program reduction to reduce build overhead and simulates the presence of missing files via a mock build to ensure a successful build for probing missing files. In addition, PackHunter leverages a sophisticated design to eliminate packages that do not contain the required missing files, effectively reducing the search space. Furthermore, PackHunter introduces a greedy strategy to prioritize the packages, eventually recovering missing packages with only a few rounds of package enumeration. We have implemented PackHunter as a tool and evaluated it on 30 real-world projects. The results demonstrate that PackHunter can recover missing packages efficiently, achieving a 26.59× speed-up over the state-of-the-art approach. The effectiveness of PackHunter highlights its potential to assist developers in building C/C++ artifacts and promote software reproducibility.
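The greedy prioritization described in the abstract resembles greedy set cover: repeatedly pick the candidate package that provides the most still-missing files, after first pruning packages that provide none. The sketch below is a hypothetical illustration of that idea (the package and file names are invented); it is not PackHunter's actual algorithm.

```python
def greedy_recover(missing_files, package_index):
    """Greedy sketch: pick packages by how many still-missing files they cover.

    package_index maps a package name to the set of files it provides."""
    # Search-space reduction: drop packages providing none of the missing files.
    candidates = {p: fs & missing_files
                  for p, fs in package_index.items() if fs & missing_files}
    uncovered, chosen = set(missing_files), []
    while uncovered:
        best = max(candidates, key=lambda p: len(candidates[p] & uncovered),
                   default=None)
        if best is None or not candidates[best] & uncovered:
            break  # some files are not provided by any known package
        chosen.append(best)
        uncovered -= candidates.pop(best)
    return chosen, uncovered

package_index = {
    "libfoo-dev": {"foo.h", "foo.pc"},
    "libbar-dev": {"bar.h"},
    "unrelated": {"baz.h"},
}
pkgs, left = greedy_recover({"foo.h", "bar.h"}, package_index)
print(sorted(pkgs), left)  # ['libbar-dev', 'libfoo-dev'] set()
```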
{"title":"PackHunter: Recovering Missing Packages for C/C++ Projects","authors":"Rongxin Wu;Zhiling Huang;Zige Tian;Chengpeng Wang;Xiangyu Zhang","doi":"10.1109/TSE.2024.3506629","DOIUrl":"10.1109/TSE.2024.3506629","url":null,"abstract":"The reproducibility of software artifacts is a critical aspect of software development and application. However, current research indicates that a notable proportion of C/C++ projects encounter non-reproducibility issues stemming from build failures, primarily attributed to the absence of necessary packages. This paper introduces \u0000<small>PackHunter</small>\u0000, a novel technique that automates the recovery of missing packages in C/C++ projects. By identifying missing files during the project's build process, \u0000<small>PackHunter</small>\u0000 can determine potentially missing packages and synthesize an installation script. Specifically, it simplifies C/C++ projects through program reduction to reduce build overhead and simulates the presence of missing files via mock build to ensure a successful build for probing missing files. Besides, \u0000<small>PackHunter</small>\u0000 leverages a sophisticated design to eliminate packages that do not contain the required missing files, effectively reducing the search space. Furthermore, \u0000<small>PackHunter</small>\u0000 introduces a greedy strategy to prioritize the packages, eventually recovering missing packages with few times of package enumeration. We have implemented \u0000<small>PackHunter</small>\u0000 as a tool and evaluated it on 30 real-world projects. The results demonstrate that \u0000<small>PackHunter</small>\u0000 can recover missing packages efficiently, achieving 26.59\u0000<inline-formula><tex-math>$boldsymbol{times}$</tex-math></inline-formula>\u0000 speed up over the state-of-the-art approach. 
The effectiveness of \u0000<small>PackHunter</small>\u0000 highlights its potential to assist developers in building C/C++ artifacts and promote software reproducibility.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 1","pages":"206-219"},"PeriodicalIF":6.5,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142753721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
On-the-Fly Syntax Highlighting: Generalisation and Speed-ups
IF 7.4, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2024-11-26. DOI: 10.1109/tse.2024.3506040
Marco Edoardo Palma, Alex Wolf, Pasquale Salza, Harald C. Gall
{"title":"On-the-Fly Syntax Highlighting: Generalisation and Speed-ups","authors":"Marco Edoardo Palma, Alex Wolf, Pasquale Salza, Harald C. Gall","doi":"10.1109/tse.2024.3506040","DOIUrl":"https://doi.org/10.1109/tse.2024.3506040","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"13 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142718350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Triple Peak Day: Work Rhythms of Software Developers in Hybrid Work
IF 7.4, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2024-11-22. DOI: 10.1109/tse.2024.3504831
Javier Hernandez, Vedant Das Swain, Jina Suh, Daniel McDuff, Judith Amores, Gonzalo Ramos, Kael Rowan, Brian Houck, Shamsi Iqbal, Mary Czerwinski
{"title":"Triple Peak Day: Work Rhythms of Software Developers in Hybrid Work","authors":"Javier Hernandez, Vedant Das Swain, Jina Suh, Daniel McDuff, Judith Amores, Gonzalo Ramos, Kael Rowan, Brian Houck, Shamsi Iqbal, Mary Czerwinski","doi":"10.1109/tse.2024.3504831","DOIUrl":"https://doi.org/10.1109/tse.2024.3504831","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"18 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142690736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
GenProgJS: a Baseline System for Test-based Automated Repair of JavaScript Programs
IF 7.4, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2024-11-21. DOI: 10.1109/tse.2024.3497798
Viktor Csuvik, Dániel Horváth, Márk Lajkó, László Vidács
{"title":"GenProgJS: a Baseline System for Test-based Automated Repair of JavaScript Programs","authors":"Viktor Csuvik, Dániel Horváth, Márk Lajkó, László Vidács","doi":"10.1109/tse.2024.3497798","DOIUrl":"https://doi.org/10.1109/tse.2024.3497798","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"23 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142684362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
On Inter-Dataset Code Duplication and Data Leakage in Large Language Models
IF 6.5, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2024-11-21. DOI: 10.1109/TSE.2024.3504286
José Antonio Hernández López;Boqi Chen;Mootez Saad;Tushar Sharma;Dániel Varró
Motivation. Large language models (LLMs) have exhibited remarkable proficiency in diverse software engineering (SE) tasks, such as code summarization, code translation, and code search. Handling such tasks typically involves acquiring foundational coding knowledge on large, general-purpose datasets during a pre-training phase, and subsequently refining on smaller, task-specific datasets as part of a fine-tuning phase. Problem statement. Data leakage, i.e., using information from the test set during model training, is a well-known issue when training machine learning models. A manifestation of this issue is the intersection of the training and testing splits. While intra-dataset code duplication examines this intersection within a given dataset and has been addressed in prior research, inter-dataset code duplication, which gauges the overlap between different datasets, remains largely unexplored. If this phenomenon exists, it could compromise the integrity of LLM evaluations because of the inclusion of fine-tuning test samples that were already encountered during pre-training, resulting in inflated performance metrics. Contribution. This paper explores the phenomenon of inter-dataset code duplication and its impact on evaluating LLMs across diverse SE tasks. Study design. We conduct an empirical study using the CodeSearchNet dataset (csn), a widely adopted pre-training dataset, and five fine-tuning datasets used for various SE tasks. We first identify the intersection between the pre-training and fine-tuning datasets using a deduplication process. Next, we pre-train two versions of LLMs using a subset of csn: one leaky LLM, which includes the identified intersection in its pre-training set, and one non-leaky LLM that excludes these samples. Finally, we fine-tune both models and compare their performances using fine-tuning test samples that are part of the intersection. Results.
Our findings reveal a potential threat to the evaluation of LLMs across multiple SE tasks, stemming from the inter-dataset code duplication phenomenon. We also demonstrate that this threat is accentuated by the chosen fine-tuning technique. Furthermore, we provide evidence that open-source models such as CodeBERT, GraphCodeBERT, and UnixCoder could be affected by inter-dataset duplication. Based on our findings, we delve into prior research that may be susceptible to this threat. Additionally, we offer guidance to SE researchers on strategies to prevent inter-dataset code duplication.
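The intersection between pre-training and fine-tuning splits can be illustrated with a fingerprint-based deduplication pass: normalize each sample, hash it, and flag fine-tuning test samples whose hash also occurs in the pre-training corpus. This sketch uses a crude, hypothetical normalization step and only conveys the idea; the paper applies its own deduplication process.

```python
import hashlib

def normalize(code):
    """Crude normalization: strip '#' comments and whitespace so that
    trivially differing copies hash identically (an illustrative choice)."""
    lines = [ln.split("#")[0].strip() for ln in code.splitlines()]
    return " ".join(ln for ln in lines if ln)

def fingerprint(code):
    return hashlib.sha256(normalize(code).encode()).hexdigest()

def leaky_samples(pretrain_corpus, finetune_test_set):
    """Fine-tuning test samples whose fingerprint also occurs in pre-training."""
    seen = {fingerprint(c) for c in pretrain_corpus}
    return [c for c in finetune_test_set if fingerprint(c) in seen]

pretrain = ["def add(a, b):\n    return a + b  # sum",
            "def mul(a, b):\n    return a * b"]
test = ["def add(a, b):\n    return a + b",
        "def sub(a, b):\n    return a - b"]
print(len(leaky_samples(pretrain, test)))  # 1: `add` appears in both splits
```

Samples flagged this way would either be excluded from the pre-training set (the paper's non-leaky setup) or retained to measure the inflation they cause.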
{"title":"On Inter-Dataset Code Duplication and Data Leakage in Large Language Models","authors":"José Antonio Hernández López;Boqi Chen;Mootez Saad;Tushar Sharma;Dániel Varró","doi":"10.1109/TSE.2024.3504286","DOIUrl":"10.1109/TSE.2024.3504286","url":null,"abstract":"<italic>Motivation.</i>\u0000 Large language models (\u0000<sc>LLM</small>\u0000s) have exhibited remarkable proficiency in diverse software engineering (\u0000<sc>SE</small>\u0000) tasks, such as code summarization, code translation, and code search. Handling such tasks typically involves acquiring foundational coding knowledge on large, general-purpose datasets during a pre-training phase, and subsequently refining on smaller, task-specific datasets as part of a fine-tuning phase. \u0000<italic>Problem statement.</i>\u0000 Data leakage \u0000<italic>i.e.,</i>\u0000 using information of the test set to perform the model training, is a well-known issue in training of machine learning models. A manifestation of this issue is the intersection of the training and testing splits. While \u0000<italic>intra-dataset</i>\u0000 code duplication examines this intersection within a given dataset and has been addressed in prior research, \u0000<italic>inter-dataset code duplication</i>\u0000, which gauges the overlap between different datasets, remains largely unexplored. If this phenomenon exists, it could compromise the integrity of \u0000<sc>LLM</small>\u0000 evaluations because of the inclusion of fine-tuning test samples that were already encountered during pre-training, resulting in inflated performance metrics. \u0000<italic>Contribution.</i>\u0000 This paper explores the phenomenon of inter-dataset code duplication and its impact on evaluating \u0000<sc>LLM</small>\u0000s across diverse \u0000<sc>SE</small>\u0000 tasks. 
\u0000<italic>Study design.</i>\u0000 We conduct an empirical study using the \u0000<sc>CodeSearchNet</small>\u0000 dataset (\u0000<sc>csn</small>\u0000), a widely adopted pre-training dataset, and five fine-tuning datasets used for various \u0000<sc>SE</small>\u0000 tasks. We first identify the intersection between the pre-training and fine-tuning datasets using a deduplication process. Next, we pre-train two versions of \u0000<sc>LLM</small>\u0000s using a subset of \u0000<sc>csn</small>\u0000: one leaky \u0000<sc>LLM</small>\u0000, which includes the identified intersection in its pre-training set, and one non-leaky \u0000<sc>LLM</small>\u0000 that excludes these samples. Finally, we fine-tune both models and compare their performances using fine-tuning test samples that are part of the intersection. \u0000<italic>Results.</i>\u0000 Our findings reveal a potential threat to the evaluation of \u0000<sc>LLM</small>\u0000s across multiple \u0000<sc>SE</small>\u0000 tasks, stemming from the inter-dataset code duplication phenomenon. We also demonstrate that this threat is accentuated by the chosen fine-tuning technique. Furthermore, we provide evidence that open-source models such as \u0000<sc>CodeBERT</small>\u0000, \u0000<sc>GraphCodeBERT</small>\u0000, and \u0000<sc>UnixCoder</small>\u0000 could be affected by inter-dataset duplication. Based on our findings, we delve into prior research that may be susceptible to this threat. 
Additionally, we offer guidance to \u0000<sc>SE</small>\u0000 researchers on strategies to prevent inter-dataset code duplication.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 1","pages":"192-205"},"PeriodicalIF":6.5,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142684364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Line-Level Defect Prediction by Capturing Code Contexts With Graph Convolutional Networks
IF 6.5, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2024-11-20. DOI: 10.1109/TSE.2024.3503723
Shouyu Yin;Shikai Guo;Hui Li;Chenchen Li;Rong Chen;Xiaochen Li;He Jiang
Software defect prediction refers to the systematic analysis and review of software using various approaches and tools to identify potential defects or errors. Software defect prediction aids developers in swiftly identifying defects and optimizing development resource allocation, thus enhancing software quality and reliability. Previous defect prediction approaches still face two main limitations: 1) a lack of contextual semantic information, and 2) ignoring joint reasoning between different granularities of defect prediction. In response to these challenges, we propose LineDef, a line-level defect prediction approach that captures code contexts with graph convolutional networks. Specifically, LineDef comprises three components: the token embedding component, the graph extraction component, and the multi-granularity defect prediction component. The token embedding component maps each token to a vector to obtain a high-dimensional semantic feature representation of the token. Subsequently, the graph extraction component utilizes a sliding window to extract line-level and token-level graphs, addressing the challenge of capturing contextual semantic relationships in the code. Finally, the multi-granularity defect prediction component leverages graph convolutional layers and attention mechanisms to acquire prediction labels and risk scores, thereby achieving file-level and line-level defect prediction. Experimental studies on 32 datasets across 9 different software projects show that LineDef exhibits significantly enhanced balanced accuracy, ranging from 15.61% to 45.20%, compared to state-of-the-art file-level defect prediction approaches, and a remarkable cost-effectiveness improvement ranging from 15.32% to 278%, compared to state-of-the-art line-level defect prediction approaches. These results demonstrate that the LineDef approach can extract more comprehensive information from lines of code for defect prediction.
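The sliding-window graph extraction can be pictured as connecting each code line to its neighbours within a fixed window, so a graph convolution can propagate context between nearby lines. A minimal sketch follows; the window size and edge representation are illustrative assumptions, not LineDef's actual graph construction.

```python
def line_graph_edges(num_lines, window=2):
    """Sliding-window line-level graph sketch: link each line to the next
    `window` lines, approximating local code context for a GCN to exploit."""
    edges = set()
    for i in range(num_lines):
        for j in range(i + 1, min(i + window + 1, num_lines)):
            edges.add((i, j))
    return sorted(edges)

# 4 lines of code, window of 2 -> each line linked to the next two lines
print(line_graph_edges(4))
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```

A graph convolutional layer would then aggregate the feature vectors of each line's neighbours along these edges before the attention-based prediction heads score lines and files.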
{"title":"Line-Level Defect Prediction by Capturing Code Contexts With Graph Convolutional Networks","authors":"Shouyu Yin;Shikai Guo;Hui Li;Chenchen Li;Rong Chen;Xiaochen Li;He Jiang","doi":"10.1109/TSE.2024.3503723","DOIUrl":"10.1109/TSE.2024.3503723","url":null,"abstract":"Software defect prediction refers to the systematic analysis and review of software using various approaches and tools to identify potential defects or errors. Software defect prediction aids developers in swiftly identifying defects and optimizing development resource allocation, thus enhancing software quality and reliability. Previous defect prediction approaches still face two main limitations: 1) lacking of contextual semantic information and 2) Ignoring the joint reasoning between different granularities of defect predictions. In response to these challenges, we propose LineDef, a line-level defect prediction approach by capturing code contexts with graph convolutional networks. Specifically, LineDef comprises three components: the token embedding component, the graph extraction component, and the multi-granularity defect prediction component. The token embedding component maps each token to a vector to obtain a high-dimensional semantic feature representation of the token. Subsequently, the graph extraction component utilizes a sliding window to extract line-level and token-level graphs, addressing the challenge of capturing contextual semantic relationships in the code. Finally, the multi-granularity defect prediction component leverages graph convolutional layers and attention mechanisms to acquire prediction labels and risk scores, thereby achieving file-level and line-level defect prediction. 
Experimental studies on 32 datasets across 9 different software projects show that LineDef exhibits significantly enhanced balanced accuracy, ranging from 15.61% to 45.20%, compared to state-of-the-art file-level defect prediction approaches, and a remarkable cost-effectiveness improvement ranging from 15.32% to 278%, compared to state-of-the-art line-level defect prediction approaches. These results demonstrate that LineDef approach can extract more comprehensive information from lines of code for defect prediction.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 1","pages":"172-191"},"PeriodicalIF":6.5,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142678938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Does Treatment Adherence Impact Experiment Results in TDD?
IF 6.5, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2024-11-15. DOI: 10.1109/TSE.2024.3497332
Itir Karac;Jose Ignacio Panach;Burak Turhan;Natalia Juristo
Context: In software engineering (SE) experiments, the way in which a treatment is applied could affect results. Different interpretations of how to apply the treatment and decisions on treatment adherence could lead to different results when data are analysed. Objective: This paper aims to study whether treatment adherence has an impact on the results of an SE experiment. Method: The experiment used as a test case for our research uses Test-Driven Development (TDD) and Incremental Test-Last Development (ITLD) as treatments. We reported elsewhere the design and results of such an experiment, in which 24 participants were recruited from industry. Here, we compare experiment results depending on whether we use data from adherent participants only or data from all participants irrespective of their adherence to treatments. Results: Only 40% of the participants adhered to both the TDD protocol and the ITLD protocol; 27% never followed TDD; 20% used TDD even in the control group; 13% are defiers (used TDD in the ITLD session but not in the TDD session). Considering that both TDD and ITLD are less complex than other SE methods, we can hypothesize that more complex SE techniques could see even lower adherence to the treatment. Conclusion: Both TDD and ITLD are applied differently across participants. Training participants may not be enough to ensure medium-to-large adherence among experiment participants. Adherence to treatments impacts results and should not be taken for granted in SE experiments.
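The reported adherence breakdown (40% adherent, 27% never followed TDD, 20% used TDD even as control, 13% defiers) amounts to cross-tabulating assigned versus observed process per participant. A hedged sketch with a hypothetical 15-participant sample chosen to match those shares:

```python
def adherence_profile(records):
    """Classify participants from (used TDD in TDD session, used TDD in
    ITLD session) pairs, following the categories in the abstract."""
    counts = {"adherent": 0, "never_tdd": 0, "always_tdd": 0, "defier": 0}
    for used_in_tdd, used_in_itld in records:
        if used_in_tdd and not used_in_itld:
            counts["adherent"] += 1    # followed both protocols as assigned
        elif not used_in_tdd and not used_in_itld:
            counts["never_tdd"] += 1   # never followed TDD
        elif used_in_tdd and used_in_itld:
            counts["always_tdd"] += 1  # used TDD even in the control session
        else:
            counts["defier"] += 1      # TDD in ITLD session, not in TDD session
    n = len(records)
    return {k: round(100 * v / n) for k, v in counts.items()}

# hypothetical 15-participant sample roughly matching the reported shares
records = ([(True, False)] * 6 + [(False, False)] * 4 +
           [(True, True)] * 3 + [(False, True)] * 2)
print(adherence_profile(records))
```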
{"title":"Does Treatment Adherence Impact Experiment Results in TDD?","authors":"Itir Karac;Jose Ignacio Panach;Burak Turhan;Natalia Juristo","doi":"10.1109/TSE.2024.3497332","DOIUrl":"10.1109/TSE.2024.3497332","url":null,"abstract":"<bold>Context:</b>\u0000 In software engineering (SE) experiments, the way in which a treatment is applied could affect results. Different interpretations of how to apply the treatment and decisions on treatment adherence could lead to different results when data are analysed. \u0000<bold>Objective:</b>\u0000 This paper aims to study whether treatment adherence has an impact on the results of an SE experiment. \u0000<bold>Method:</b>\u0000 The experiment used as test case for our research uses Test-Driven Development (TDD) and Incremental Test-Last Development, (ITLD) as treatments. We reported elsewhere the design and results of such an experiment where 24 participants were recruited from industry. Here, we compare experiment results depending on the use of data from adherent participants or data from all the participants irrespective of their adherence to treatments. \u0000<bold>Results:</b>\u0000 Only 40% of the participants adhere to both TDD protocol and to the ITLD protocol; 27% never followed TDD; 20% used TDD even in the control group; 13% are defiers (used TDD in ITLD session but not in TDD session). Considering that both TDD and ITLD are less complex than other SE methods, we can hypothesize that more complex SE techniques could get even lower adherence to the treatment. \u0000<bold>Conclusion:</b>\u0000 Both TDD and ITLD are applied differently across participants. Training participants could not be enough to ensure a medium to large adherence of experiment participants. 
Adherence to treatments impacts results and should not be taken for granted in SE experiments.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 1","pages":"135-152"},"PeriodicalIF":6.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10754655","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142642983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Scoping Software Engineering for AI: The TSE Perspective
IF 6.5, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING. Pub Date: 2024-11-13. DOI: 10.1109/TSE.2024.3470368
Sebastian Uchitel;Marsha Chechik;Massimiliano Di Penta;Bram Adams;Nazareno Aguirre;Gabriele Bavota;Domenico Bianculli;Kelly Blincoe;Ana Cavalcanti;Yvonne Dittrich;Filomena Ferrucci;Rashina Hoda;LiGuo Huang;David Lo;Michael R. Lyu;Lei Ma;Jonathan I. Maletic;Leonardo Mariani;Collin McMillan;Tim Menzies;Martin Monperrus;Ana Moreno;Nachiappan Nagappan;Liliana Pasquale;Patrizio Pelliccione;Michael Pradel;Rahul Purandare;Sukyoung Ryu;Mehrdad Sabetzadeh;Alexander Serebrenik;Jun Sun;Kla Tantithamthavorn;Christoph Treude;Manuel Wimmer;Yingfei Xiong;Tao Yue;Andy Zaidman;Tao Zhang;Hao Zhong
{"title":"Scoping Software Engineering for AI: The TSE Perspective","authors":"Sebastian Uchitel;Marsha Chechik;Massimiliano Di Penta;Bram Adams;Nazareno Aguirre;Gabriele Bavota;Domenico Bianculli;Kelly Blincoe;Ana Cavalcanti;Yvonne Dittrich;Filomena Ferrucci;Rashina Hoda;LiGuo Huang;David Lo;Michael R. Lyu;Lei Ma;Jonathan I. Maletic;Leonardo Mariani;Collin McMillan;Tim Menzies;Martin Monperrus;Ana Moreno;Nachiappan Nagappan;Liliana Pasquale;Patrizio Pelliccione;Michael Pradel;Rahul Purandare;Sukyoung Ryu;Mehrdad Sabetzadeh;Alexander Serebrenik;Jun Sun;Kla Tantithamthavorn;Christoph Treude;Manuel Wimmer;Yingfei Xiong;Tao Yue;Andy Zaidman;Tao Zhang;Hao Zhong","doi":"10.1109/TSE.2024.3470368","DOIUrl":"10.1109/TSE.2024.3470368","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 11","pages":"2709-2711"},"PeriodicalIF":6.5,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10752650","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A Context-Aware Clustering Approach for Assisting Operators in Classifying Security Alerts 协助操作员对安全警报进行分类的情境感知聚类方法
IF 6.5 1区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-11-13 DOI: 10.1109/TSE.2024.3497588
Yu Liu;Tong Li;Runzi Zhang;Zhao Jin;Mingkai Tong;Wenmao Liu;Yiting Wang;Zhen Yang
Modern software has evolved from delivering software products to delivering web services and applications, which need to be protected by security operation centers (SOCs) against ubiquitous cyber attacks. Numerous security alerts are generated every day and have to be processed efficiently and correctly to identify potential threats. Many AIOps (artificial intelligence for IT operations) approaches have been proposed to (semi-)automate the inspection of alerts so as to reduce manual effort as much as possible. However, because attacks keep growing more complex, a significant amount of manual work is still required in practice to ensure correct analysis results. In this paper, we propose a Context-Aware cLustering approach for cLassifying sEcurity alErts (CALLEE), which fully exploits the rich relationships among alerts to precisely identify similar alerts, significantly reducing the workload of the SOC. Specifically, we first design a core conceptual model to capture connections among security alerts, based on which we establish corresponding heterogeneous information networks. Next, we systematically design a set of meta-paths to profile typical alert scenarios precisely, contributing to obtaining representations of security alerts. We then cluster security alerts based on their contextual similarities, considering the trade-off between the number of clusters and the homogeneity of each cluster. Finally, security operators only need to manually inspect a limited number of alerts within each cluster, pragmatically reducing their workload while ensuring the accuracy of alert classification. To evaluate the effectiveness of our approach, we collaborated with our industrial partner and applied the approach to a real alert dataset. The results show that our approach can reduce the workload of the SOC by 99.76%, outperforming baseline approaches. In addition, we further investigated the integration of our proposal into the real business scenario of our industrial partner. Feedback from practitioners shows that CALLEE is applicable and helpful in industrial settings.
现代软件已经从提供软件产品发展到web服务和应用程序,这些服务和应用程序需要由安全操作中心(SOC)保护,以抵御无处不在的网络攻击。每天都会不断产生大量的安全警报,必须对这些警报进行有效和正确的处理,以识别潜在的威胁。已经提出了许多AIOps (IT操作的人工智能)方法来(半)自动化警报检查,以便尽可能减少人工工作。然而,由于越来越复杂的攻击,在实践中仍然需要大量的手工工作来确保正确的分析结果。在本文中,我们提出了一种上下文感知的安全警报分类聚类方法(CALLEE),该方法充分利用警报之间的丰富关系来精确识别相似的警报,从而大大减少了SOC的工作量。具体而言,我们首先设计了一个核心概念模型来捕获安全警报之间的联系,并在此基础上建立了相应的异构信息网络。接下来,我们系统地设计了一组元路径来精确地分析典型的警报场景,有助于获得安全警报的表示。然后,我们根据上下文相似性对安全警报进行集群,考虑集群数量和每个集群的同质性之间的权衡。最后,安全操作员只需要手动检查每个集群中有限数量的警报,在确保警报分类准确性的同时,切实减少了他们的工作量。为了评估我们方法的有效性,我们与我们的工业合作伙伴合作,并将该方法实用地应用于真实的警报数据集。结果表明,我们的方法可以将SOC的工作负载减少99.76%,优于基准方法。此外,我们进一步研究我们的建议与我们的工业合作伙伴的实际业务场景的集成。从业人员的反馈表明CALLEE在工业环境中具有实用的适用性和帮助。
{"title":"A Context-Aware Clustering Approach for Assisting Operators in Classifying Security Alerts","authors":"Yu Liu;Tong Li;Runzi Zhang;Zhao Jin;Mingkai Tong;Wenmao Liu;Yiting Wang;Zhen Yang","doi":"10.1109/TSE.2024.3497588","DOIUrl":"10.1109/TSE.2024.3497588","url":null,"abstract":"Modern software has evolved from delivering software products to web services and applications, which need to be protected by security operation centers (SOC) against ubiquitous cyber attacks. Numerous security alerts are continuously generated every day, which have to be efficiently and correctly processed to identify potential threats. Many AIOps (artificial intelligence for IT operations) approaches have been proposed to (semi-)automate the inspection of alerts so as to reduce manual effort as much as possible. However, due to the ever-complicating attacks, a significant amount of manual work is still required in practice to ensure correct analysis results. In this paper, we propose a Context-Aware cLustering approach for cLassifying sEcurity alErts (CALLEE), which fully exploits the rich relationships among alerts in order to precisely identify similar alerts, significantly reducing the workload of SOC. Specifically, we first design a core conceptual model to capture connections among security alerts, based on which we establish corresponding heterogeneous information networks. Next, we systematically design a set of meta-paths to profile typical alert scenarios precisely, contributing to obtaining the representation of security alerts. We then cluster security alerts based on their contextual similarities, considering the tradeoff between the number of clusters and the homogeneity of each cluster. Finally, security operators only need to manually inspect a limited number of alerts within each cluster, pragmatically reducing their workload while ensuring the accuracy of alert classification. 
To evaluate the effectiveness of our approach, we collaborate with our industrial partner and pragmatically apply the approach to a real alert dataset. The results show that our approach can reduce the workload of SOC by 99.76%, outperforming baseline approaches. In addition, we further investigate the integration of our proposal with the real business scenario of our industrial partner. The feedback from practitioners shows that CALLEE is pragmatically applicable and helpful in industrial settings.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 1","pages":"153-171"},"PeriodicalIF":6.5,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
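As a rough illustration of the meta-path idea described above (not the paper's actual pipeline, which learns representations over heterogeneous information networks), alerts that reach the same context entities along a few hand-picked meta-paths can be grouped with a simple similarity threshold. The attribute names, the Jaccard measure, and the threshold below are all assumptions made for the sketch:

```python
from itertools import combinations

def metapath_similarity(a: dict, b: dict, metapaths: list) -> float:
    """Average Jaccard overlap of the context entities two alerts reach
    along each meta-path (e.g. Alert-srcIP-Alert, Alert-signature-Alert)."""
    scores = []
    for p in metapaths:
        ctx_a, ctx_b = set(a.get(p, ())), set(b.get(p, ()))
        if ctx_a or ctx_b:
            scores.append(len(ctx_a & ctx_b) / len(ctx_a | ctx_b))
    return sum(scores) / len(scores) if scores else 0.0

def cluster_alerts(alerts: list, metapaths: list, threshold: float = 0.5) -> list:
    """Greedy single-link clustering: union any alert pair whose
    average meta-path similarity reaches the threshold."""
    parent = list(range(len(alerts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(alerts)), 2):
        if metapath_similarity(alerts[i], alerts[j], metapaths) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(alerts)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

An operator would then inspect one representative alert per cluster instead of every alert, which is where the kind of workload reduction the paper reports comes from.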
Copyright © 2023 Book学术 All rights reserved.