
Latest publications: IEEE Transactions on Software Engineering

StagedVulBERT: Multigranular Vulnerability Detection With a Novel Pretrained Code Model
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-11-07 | DOI: 10.1109/TSE.2024.3493245
Yuan Jiang;Yujian Zhang;Xiaohong Su;Christoph Treude;Tiantian Wang
The emergence of pre-trained model-based vulnerability detection methods has significantly advanced the field of automated vulnerability detection. However, these methods still face several challenges, such as difficulty in learning effective feature representations of statements for fine-grained predictions and difficulty in processing overly long code sequences. To address these issues, this study introduces StagedVulBERT, a novel vulnerability detection framework that leverages a pre-trained code language model and employs a coarse-to-fine strategy. The key innovation and contribution of our research lies in the development of the CodeBERT-HLS component within our framework, specialized in hierarchical, layered, and semantic encoding. This component is designed to capture semantics at both the token and statement levels simultaneously, which is crucial for achieving more accurate multi-granular vulnerability detection. Additionally, CodeBERT-HLS efficiently processes longer code token sequences, making it more suited to real-world vulnerability detection. Comprehensive experiments demonstrate that our method enhances the performance of vulnerability detection at both coarse- and fine-grained levels. Specifically, in coarse-grained vulnerability detection, StagedVulBERT achieves an F1 score of 92.26%, marking a 6.58% improvement over the best-performing methods. At the fine-grained level, our method achieves a Top-5% accuracy of 65.69%, which outperforms the state-of-the-art methods by up to 75.17%.
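The coarse-to-fine idea — per-token embeddings pooled into per-statement representations so one model can score both a whole function and individual statements — can be illustrated with a minimal sketch. All names, the mean-pooling choice, and the toy linear scorer are assumptions for illustration, not the CodeBERT-HLS implementation:

```python
# Hypothetical sketch: pool per-token vectors into per-statement vectors,
# then score at both granularities (function-level and statement-level).

def pool_statements(token_vecs, stmt_ids):
    """Mean-pool token embeddings that share the same statement id."""
    groups = {}
    for vec, sid in zip(token_vecs, stmt_ids):
        groups.setdefault(sid, []).append(vec)
    return {
        sid: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for sid, vecs in groups.items()
    }

def score(vec, weights):
    """Toy linear scorer standing in for a learned classification head."""
    return sum(v * w for v, w in zip(vec, weights))

# Two statements, four tokens, 2-dimensional embeddings.
tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
stmt_of_token = [0, 0, 1, 1]

stmt_vecs = pool_statements(tokens, stmt_of_token)
func_vec = [sum(d) / len(tokens) for d in zip(*tokens)]  # coarse (function) view

coarse = score(func_vec, [1.0, 1.0])                            # function-level
fine = {s: score(v, [1.0, 1.0]) for s, v in stmt_vecs.items()}  # statement-level
```

The coarse score flags a suspicious function; the per-statement scores then rank which statements inside it are most likely vulnerable.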
IEEE Transactions on Software Engineering, vol. 50, no. 12, pp. 3454–3471.
Citations: 0
SMARLA: A Safety Monitoring Approach for Deep Reinforcement Learning Agents
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-11-06 | DOI: 10.1109/TSE.2024.3491496
Amirhossein Zolfagharian;Manel Abdellatif;Lionel C. Briand;Ramesh S
Deep Reinforcement Learning (DRL) has made significant advancements in various fields, such as autonomous driving, healthcare, and robotics, by enabling agents to learn optimal policies through interactions with their environments. However, the application of DRL in safety-critical domains presents challenges, particularly concerning the safety of the learned policies. DRL agents, which are focused on maximizing rewards, may select unsafe actions, leading to safety violations. Runtime safety monitoring is thus essential to ensure the safe operation of these agents, especially in unpredictable and dynamic environments. This paper introduces SMARLA, a black-box safety monitoring approach specifically designed for DRL agents. SMARLA utilizes machine learning to predict safety violations by observing the agent's behavior during execution. The approach is based on Q-values, which reflect the expected reward for taking actions in specific states. SMARLA employs state abstraction to reduce the complexity of the state space, enhancing the predictive capabilities of the monitoring model. Such abstraction enables the early detection of unsafe states, allowing for the implementation of corrective and preventive measures before incidents occur. We quantitatively and qualitatively validated SMARLA on three well-known case studies widely used in DRL research. Empirical results reveal that SMARLA is accurate at predicting safety violations, with a low false positive rate, and can predict violations at an early stage, approximately halfway through the execution of the agent, before violations occur. We also discuss different decision criteria, based on confidence intervals of the predicted violation probabilities, to trigger safety mechanisms aiming at a trade-off between early detection and low false positive rates.
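The core mechanism — abstracting concrete states via their Q-values and predicting violation risk from behavior observed during training episodes — can be sketched minimally. The bucketing scheme, the frequency-count "model," and the alarm threshold are illustrative assumptions, not SMARLA's actual machine-learning classifier:

```python
# Hypothetical sketch of Q-value-based safety monitoring with state abstraction.

def abstract_state(q_values, bucket=1.0):
    """Collapse a concrete state to (greedy action, coarse max-Q bucket)."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    return (best, int(q_values[best] // bucket))

class ViolationMonitor:
    def __init__(self):
        self.seen = {}  # abstract state -> (episodes observed, violations)

    def record(self, q_values, violated):
        s = abstract_state(q_values)
        n, v = self.seen.get(s, (0, 0))
        self.seen[s] = (n + 1, v + int(violated))

    def risk(self, q_values):
        """Estimated violation probability for the current abstract state."""
        n, v = self.seen.get(abstract_state(q_values), (0, 0))
        return v / n if n else 0.0

monitor = ViolationMonitor()
# Offline: label abstract states from episodes that did / did not end in a violation.
monitor.record([0.2, 1.7], violated=True)
monitor.record([0.1, 1.9], violated=True)
monitor.record([2.3, 0.4], violated=False)

# Online: raise an alarm mid-episode once estimated risk crosses a threshold.
alarm = monitor.risk([0.3, 1.8]) > 0.5
```

A confidence interval on the estimated probability (rather than the raw ratio) would give the early-detection vs. false-positive trade-off the paper discusses.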
IEEE Transactions on Software Engineering, vol. 51, no. 1, pp. 82–105.
Citations: 0
Diversity-Oriented Testing for Competitive Game Agent via Constraint-Guided Adversarial Agent Training
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-11-05 | DOI: 10.1109/TSE.2024.3491193
Xuyan Ma;Yawen Wang;Junjie Wang;Xiaofei Xie;Boyu Wu;Yiguang Yan;Shoubin Li;Fanjiang Xu;Qing Wang
Deep reinforcement learning has achieved remarkable success in competitive games, surpassing human performance in applications ranging from business competitions to video games. In competitive environments, agents face the challenge of adapting to continuously shifting adversary strategies, necessitating the ability to handle diverse scenarios. Existing studies primarily focus on evaluating agent robustness either through perturbing observations, which has practical limitations, or through training adversarial agents to expose weaknesses, which lacks strategy diversity exploration. Other studies rely on curiosity-based mechanisms to explore diversity, yet they may lack direct guidance for addressing identified decision-making flaws. In this paper, we propose a novel diversity-oriented testing framework (called AdvTest) to test the competitive game agent via constraint-guided adversarial agent training. Specifically, AdvTest adds constraints as explicit guidance during adversarial agent training to make it capable of defeating the target agent using diverse strategies. To realize the method, three challenges need to be addressed: what the suitable constraints are, when to introduce them, and which constraint should be added. We experimentally evaluate AdvTest on the commonly-used competitive game environment, StarCraft II. The results on four maps show that AdvTest exposes more diverse failure scenarios compared with the commonly-used and state-of-the-art baselines.
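Constraint-guided training can be sketched as reward shaping: the adversarial agent earns its competitive reward plus a bonus for winning in a way that satisfies a diversity constraint. The constraint function, weight, and action names below are illustrative assumptions, not AdvTest's actual constraints:

```python
# Hypothetical sketch of constraint-guided adversarial training via reward shaping.

def shaped_reward(win, trajectory, constraint, weight=0.5):
    """Base competitive reward plus a constraint-satisfaction bonus."""
    base = 1.0 if win else -1.0
    bonus = weight if constraint(trajectory) else 0.0
    return base + bonus

# Example diversity constraint: a winning strategy must use >= 3 distinct actions.
def uses_diverse_actions(traj):
    return len(set(traj)) >= 3

r_narrow = shaped_reward(True, ["rush", "rush", "rush"], uses_diverse_actions)
r_diverse = shaped_reward(True, ["rush", "harass", "expand"], uses_diverse_actions)
```

With the bonus, two wins are no longer equal: the diverse win is preferred, steering the adversary toward strategies the target agent has not yet been tested against.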
IEEE Transactions on Software Engineering, vol. 51, no. 1, pp. 66–81.
Citations: 0
Dividable Configuration Performance Learning
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-11-05 | DOI: 10.1109/TSE.2024.3491945
Jingzhi Gong;Tao Chen;Rami Bahsoon
Machine/deep learning models have been widely adopted to predict the configuration performance of software systems. However, a crucial yet unaddressed challenge is how to cater for the sparsity inherited from the configuration landscape: the influence of configuration options (features) and the distribution of data samples are highly sparse. In this paper, we propose a model-agnostic and sparsity-robust framework for predicting configuration performance, dubbed DaL, based on the new paradigm of dividable learning that builds a model via “divide-and-learn”. To handle sample sparsity, the samples from the configuration landscape are divided into distinct divisions, for each of which we build a sparse local model, e.g., a regularized Hierarchical Interaction Neural Network, to deal with the feature sparsity. A newly given configuration would then be assigned to the right division's model for the final prediction. Further, DaL adaptively determines the optimal number of divisions required for a system and sample size without any extra training or profiling. Experiment results from 12 real-world systems and five sets of training data reveal that, compared with the state-of-the-art approaches, DaL performs no worse than the best counterpart on 44 out of 60 cases (within which 31 cases are significantly better) with up to 1.61× improvement on accuracy; requires fewer samples to reach the same/better accuracy; and produces acceptable training overhead. In particular, the mechanism that adapts the parameter $d$ reaches the optimal value in 76.43% of the individual runs. The result also confirms that the paradigm of dividable learning is more suitable than other similar paradigms such as ensemble learning for predicting configuration performance. Practically, DaL considerably improves different global models when using them as the underlying local models, which further strengthens its flexibility. 
To promote open science, all the data, code, and supplementary materials of this work can be accessed at our repository: https://github.com/ideas-labo/DaL-ext.
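The "divide-and-learn" routing idea — partition the samples into divisions, fit one local model per division, and send a new configuration to its division's model — can be sketched in a few lines. The 1-D configurations, nearest-centroid division rule, and mean-value local models are illustrative assumptions, not DaL's regularized hierarchical interaction networks:

```python
# Hypothetical sketch of divide-and-learn for configuration performance prediction.

def divide(samples, centroids):
    """Assign each (config, perf) sample to its nearest centroid's division."""
    divisions = [[] for _ in centroids]
    for x, y in samples:
        d = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        divisions[d].append((x, y))
    return divisions

def fit_local(division):
    """Toy local model: predict the division's mean performance."""
    mean = sum(y for _, y in division) / len(division)
    return lambda x: mean

centroids = [1.0, 10.0]
samples = [(0.5, 3.0), (1.5, 5.0), (9.0, 40.0), (11.0, 60.0)]
models = [fit_local(d) for d in divide(samples, centroids)]

def predict(x):
    """Route a new configuration to its division, then query the local model."""
    d = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
    return models[d](x)
```

A single global mean here would predict 27.0 everywhere; routing to local models captures the two sparse regions of the landscape separately.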
IEEE Transactions on Software Engineering, vol. 51, no. 1, pp. 106–134. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10744216
Citations: 0
Fight Fire With Fire: How Much Can We Trust ChatGPT on Source Code-Related Tasks?
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-11-05 | DOI: 10.1109/TSE.2024.3492204
Xiao Yu;Lei Liu;Xing Hu;Jacky Wai Keung;Jin Liu;Xin Xia
With the increasing utilization of large language models such as ChatGPT during software development, it has become crucial to verify the quality of code content it generates. Recent studies proposed utilizing ChatGPT as both a developer and tester for multi-agent collaborative software development. The multi-agent collaboration empowers ChatGPT to produce test reports for its generated code, enabling it to self-verify the code content and fix bugs based on these reports. However, these studies did not assess the effectiveness of the generated test reports in validating the code. Therefore, we conduct a comprehensive empirical investigation to evaluate ChatGPT's self-verification capability in code generation, code completion, and program repair. We request ChatGPT to (1) generate correct code and then self-verify its correctness; (2) complete code without vulnerabilities and then self-verify for the presence of vulnerabilities; and (3) repair buggy code and then self-verify whether the bugs are resolved. Our findings on two code generation datasets, one code completion dataset, and two program repair datasets reveal the following observations: (1) ChatGPT often erroneously predicts its generated incorrect code as correct, its vulnerable completed code as non-vulnerable, and its failed program repairs as successful during its self-verification. (2) Self-contradictory hallucinations arise in ChatGPT's behavior: (a) ChatGPT initially generates code that it believes to be correct but later predicts it to be incorrect; (b) ChatGPT initially generates code completions that it deems secure but later predicts them to be vulnerable; (c) ChatGPT initially outputs code that it considers successfully repaired but later predicts it to be buggy during its self-verification. 
(3) The self-verification capability of ChatGPT can be enhanced by asking the guiding question, which queries whether ChatGPT agrees with assertions about incorrectly generated or repaired code and vulnerabilities in completed code. (4) Using test reports generated by ChatGPT can identify more vulnerabilities in completed code, but the explanations for incorrectly generated code and failed repairs are mostly inaccurate in the test reports. Based on these findings, we provide implications for further research or development using ChatGPT.
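The "guiding question" in finding (3) replaces an open-ended "is this code correct?" with an explicit assertion for the model to agree or disagree with. A minimal sketch of such a prompt builder follows; the template wording and task names are assumptions, not the exact prompts used in the study:

```python
# Hypothetical sketch: build a guiding question that asserts a defect,
# rather than asking an open-ended self-verification question.

def guiding_question(task, code):
    assertions = {
        "generation": "The following generated code is incorrect.",
        "completion": "The following completed code contains a vulnerability.",
        "repair": "The following repair does not fix the bug.",
    }
    return (
        f"{assertions[task]} Do you agree? Answer yes or no, then explain.\n\n"
        f"```\n{code}\n```"
    )

prompt = guiding_question("completion", "strcpy(buf, user_input);")
```

Framing the check as agreement with a concrete assertion is what the paper found improves ChatGPT's ability to admit flaws in its own output.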
IEEE Transactions on Software Engineering, vol. 50, no. 12, pp. 3435–3453.
Citations: 0
AIM: Automated Input Set Minimization for Metamorphic Security Testing
IF 6.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-10-30 | DOI: 10.1109/TSE.2024.3488525
Nazanin Bayati Chaleshtari;Yoann Marquer;Fabrizio Pastore;Lionel C. Briand
Although the security testing of Web systems can be automated by generating crafted inputs, solutions to automate the test oracle, i.e., vulnerability detection, remain difficult to apply in practice. Specifically, though previous work has demonstrated the potential of metamorphic testing—security failures can be determined by metamorphic relations that turn valid inputs into malicious inputs—metamorphic relations are typically executed on a large set of inputs, which is time-consuming and thus makes metamorphic testing impractical. We propose AIM, an approach that automatically selects inputs to reduce testing costs while preserving vulnerability detection capabilities. AIM includes a clustering-based black-box approach, to identify similar inputs based on their security properties. It also relies on a novel genetic algorithm to efficiently select diverse inputs while minimizing their total cost. Further, it contains a problem-reduction component to reduce the search space and speed up the minimization process. We evaluated the effectiveness of AIM on two well-known Web systems, Jenkins and Joomla, with documented vulnerabilities. We compared AIM's results with four baselines involving standard search approaches. Overall, AIM reduced metamorphic testing time by 84% for Jenkins and 82% for Joomla, while preserving the same level of vulnerability detection. Furthermore, AIM significantly outperformed all the considered baselines regarding vulnerability coverage.
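The minimization objective — keep a cheap subset of inputs that still covers every security property observed in the full set — can be illustrated with a greedy sketch. AIM's clustering and genetic search are replaced here by an illustrative greedy set cover over precomputed property labels; all input names, costs, and property tags are assumptions:

```python
# Hypothetical sketch of input-set minimization as a greedy weighted set cover.

def minimize(inputs):
    """inputs: list of (name, cost, frozenset of security properties covered)."""
    needed = set().union(*(props for _, _, props in inputs))
    chosen, covered = [], set()
    # Consider cheapest inputs first; keep one only if it covers something new.
    for name, cost, props in sorted(inputs, key=lambda t: t[1]):
        if props - covered:
            chosen.append(name)
            covered |= props
        if covered == needed:
            break
    return chosen

inputs = [
    ("a", 1, frozenset({"sqli"})),
    ("b", 2, frozenset({"sqli", "xss"})),
    ("c", 3, frozenset({"xss"})),
    ("d", 5, frozenset({"path_traversal"})),
]
kept = minimize(inputs)  # metamorphic relations then run only on `kept`
```

Input "c" is dropped because "b" already covers its property, so the metamorphic relations run on three inputs instead of four while every property class stays represented.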
IEEE Transactions on Software Engineering, vol. 50, no. 12, pp. 3403–3434.
Citations: 0
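As an editorial aside, the cost-aware input selection that AIM's abstract describes can be illustrated with a small sketch. Everything below — the tuple layout, the greedy weighted set-cover strategy, the function name — is an assumption for illustration; AIM itself combines clustering with a genetic algorithm and a problem-reduction step, not this greedy loop, but the sketch captures the core objective: cover every security-property cluster at minimal total cost.

```python
# Illustrative sketch (not AIM's implementation): pick a cheap subset of test
# inputs that still covers every security-property cluster. AIM itself uses
# clustering plus a genetic algorithm; the greedy weighted set cover below
# only demonstrates the optimization objective.

def minimize_inputs(inputs):
    """inputs: list of (name, cost, cluster_ids); returns selected names."""
    uncovered = set()
    for _, _, clusters in inputs:
        uncovered |= set(clusters)
    selected = []
    while uncovered:
        # Take the input with the best (newly covered clusters) / cost ratio.
        name, cost, clusters = max(
            inputs, key=lambda i: len(set(i[2]) & uncovered) / i[1]
        )
        gain = set(clusters) & uncovered
        if not gain:  # nothing left to gain; avoid an infinite loop
            break
        selected.append(name)
        uncovered -= gain
    return selected
```

For example, given three inputs with costs 1, 5, 1 covering property clusters {1, 2}, {1, 2, 3}, and {3}, the loop keeps the two unit-cost inputs that jointly cover all clusters and drops the expensive redundant one.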
A Comprehensive Study on Static Application Security Testing (SAST) Tools for Android
IF 6.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-10-30 DOI: 10.1109/TSE.2024.3488041
Jingyun Zhu;Kaixuan Li;Sen Chen;Lingling Fan;Junjie Wang;Xiaofei Xie
To identify security vulnerabilities in Android applications, numerous static application security testing (SAST) tools have been proposed. However, assessing their overall performance on diverse vulnerability types is a non-trivial task that poses considerable challenges. Firstly, the absence of a unified evaluation platform for defining and describing the vulnerability types each tool supports, coupled with the lack of normalization for the intricate and varied reports generated by different tools, significantly adds to the complexity. Secondly, there is a scarcity of adequate benchmarks, particularly those derived from real-world scenarios. To address these problems, we are the first to propose a unified platform named VulsTotal, which supports various vulnerability types and enables comprehensive and versatile analysis across diverse SAST tools. Specifically, we begin by meticulously selecting 11 free and open-source SAST tools from a pool of 97 existing options, adhering to clearly defined criteria. After that, we invest significant effort in comprehending the detection rules of each tool, subsequently unifying 67 general/common vulnerability types for Android SAST tools. We also redefine and implement a standardized reporting format, ensuring uniformity in presenting results across all tools. Additionally, to mitigate the benchmark problem, we conducted a manual analysis of a large number of CVEs to construct a new CVE-based benchmark grounded in our comprehension of Android app vulnerabilities. Leveraging the evaluation platform, which integrates both existing synthetic benchmarks and the newly constructed CVE-based benchmark from this study, we conducted a comprehensive analysis to evaluate and compare the selected tools from various perspectives, such as general vulnerability type coverage, type consistency, tool effectiveness, and time performance. Our observations yielded notable findings, such as the technical reasons underlying the observed performance, which provide insights for different stakeholders.
{"title":"A Comprehensive Study on Static Application Security Testing (SAST) Tools for Android","authors":"Jingyun Zhu;Kaixuan Li;Sen Chen;Lingling Fan;Junjie Wang;Xiaofei Xie","doi":"10.1109/TSE.2024.3488041","DOIUrl":"10.1109/TSE.2024.3488041","url":null,"abstract":"To identify security vulnerabilities in Android applications, numerous static application security testing (SAST) tools have been proposed. However, assessing their overall performance on diverse vulnerability types is a non-trivial task that poses considerable challenges. Firstly, the absence of a unified evaluation platform for defining and describing tools’ supported vulnerability types, coupled with the lack of normalization for the intricate and varied reports generated by different tools, significantly adds to the complexity. Secondly, there is a scarcity of adequate benchmarks, particularly those derived from real-world scenarios. To address these problems, we are the first to propose a unified platform named \u0000<italic>VulsTotal</i>\u0000, supporting various vulnerability types, enabling comprehensive and versatile analysis across diverse SAST tools. Specifically, we begin by meticulously selecting 11 free and open-sourced SAST tools from a pool of 97 existing options, adhering to clearly defined criteria. After that, we invest significant efforts in comprehending the detection rules of each tool, subsequently unifying 67 general/common vulnerability types for Android SAST tools. We also redefine and implement a standardized reporting format, ensuring uniformity in presenting results across all tools. Additionally, to mitigate the problem of benchmarks, we conducted a manual analysis of huge amounts of CVEs to construct a new CVE-based benchmark based on our comprehension of Android app vulnerabilities. 
Leveraging the evaluation platform, which integrates both existing synthetic benchmarks and newly constructed CVE-based benchmarks from this study, we conducted a comprehensive analysis to evaluate and compare these selected tools from various perspectives, such as general vulnerability type coverage, type consistency, tool effectiveness, and time performance. Our observations yielded impressive findings, like the technical reasons underlying the performance, which provide insights for different stakeholders.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 12","pages":"3385-3402"},"PeriodicalIF":6.5,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142555915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
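The report-normalization step that the VulsTotal abstract describes can be sketched as follows. The schema fields, the rule-ID mapping table, and the tool names are hypothetical — the abstract does not publish VulsTotal's actual format — but the sketch shows how findings from different tools become directly comparable once mapped to unified vulnerability types.

```python
# Hypothetical sketch of a unified reporting format: tool names, rule IDs,
# schema fields, and the mapping table are assumptions, not VulsTotal's API.

from dataclasses import dataclass

# Maps (tool, tool-specific rule ID) to a unified vulnerability type.
TYPE_MAP = {
    ("ToolA", "WebViewJS"): "CWE-749: Exposed Dangerous Method",
    ("ToolB", "insecure-webview"): "CWE-749: Exposed Dangerous Method",
}

@dataclass(frozen=True)
class Finding:
    tool: str
    unified_type: str
    file: str
    line: int

def normalize(tool: str, raw: dict) -> Finding:
    """Convert one tool-specific report entry into the unified schema."""
    unified = TYPE_MAP.get((tool, raw["rule"]), "UNMAPPED")
    return Finding(tool, unified, raw["path"], raw["line"])
```

With such a mapping, two tools that flag the same insecure WebView configuration under different rule names yield findings with the same unified type, which is what makes type-coverage and consistency comparisons across tools meaningful.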
Gotcha! This Model Uses My Code! Evaluating Membership Leakage Risks in Code Models
IF 6.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-10-25 DOI: 10.1109/TSE.2024.3482719
Zhou Yang;Zhipeng Zhao;Chenyu Wang;Jieke Shi;Dongsun Kim;DongGyun Han;David Lo
Leveraging large-scale datasets from open-source projects and advances in large language models, recent progress has led to sophisticated code models for key software engineering tasks, such as program repair and code completion. These models are trained on data from various sources, including public open-source projects like GitHub and private, confidential code from companies, raising significant privacy concerns. This paper investigates a crucial but unexplored question: What is the risk of membership information leakage in code models? Membership leakage refers to the vulnerability where an attacker can infer whether a specific data point was part of the training dataset. We present Gotcha, a novel membership inference attack method designed for code models, and evaluate its effectiveness on Java-based datasets. Gotcha simultaneously considers three key factors: model input, model output, and ground truth. Our ablation study confirms that each factor significantly enhances attack performance. Our investigation reveals a troubling finding: membership leakage risk is significantly elevated. While previous methods achieved accuracy close to random guessing, Gotcha achieves high precision, with a true positive rate of 0.95 and a low false positive rate of 0.10. We also demonstrate that the attacker's knowledge of the victim model (e.g., model architecture and pre-training data) affects attack success. Additionally, modifying decoding strategies can help reduce membership leakage risks. This research highlights the urgent need to better understand the privacy vulnerabilities of code models and to develop strong countermeasures against these threats.
{"title":"Gotcha! This Model Uses My Code! Evaluating Membership Leakage Risks in Code Models","authors":"Zhou Yang;Zhipeng Zhao;Chenyu Wang;Jieke Shi;Dongsun Kim;DongGyun Han;David Lo","doi":"10.1109/TSE.2024.3482719","DOIUrl":"10.1109/TSE.2024.3482719","url":null,"abstract":"Leveraging large-scale datasets from open-source projects and advances in large language models, recent progress has led to sophisticated code models for key software engineering tasks, such as program repair and code completion. These models are trained on data from various sources, including public open-source projects like GitHub and private, confidential code from companies, raising significant privacy concerns. This paper investigates a crucial but unexplored question: \u0000<italic>What is the risk of membership information leakage in code models?</i>\u0000 Membership leakage refers to the vulnerability where an attacker can infer whether a specific data point was part of the training dataset. We present \u0000<sc>Gotcha</small>\u0000, a novel membership inference attack method designed for code models, and evaluate its effectiveness on Java-based datasets. \u0000<sc>Gotcha</small>\u0000 simultaneously considers three key factors: model input, model output, and ground truth. Our ablation study confirms that each factor significantly enhances attack performance. Our investigation reveals a troubling finding: \u0000<bold>membership leakage risk is significantly elevated</b>\u0000. While previous methods had accuracy close to random guessing, \u0000<sc>Gotcha</small>\u0000 achieves high precision, with a true positive rate of 0.95 and a low false positive rate of 0.10. We also demonstrate that the attacker's knowledge of the victim model (e.g., model architecture and pre-training data) affects attack success. Additionally, modifying decoding strategies can help reduce membership leakage risks. 
This research highlights the urgent need to better understand the privacy vulnerabilities of code models and develop strong countermeasures against these threats.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 12","pages":"3290-3306"},"PeriodicalIF":6.5,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142490464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
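The intuition behind output-based membership inference can be shown with a toy score: a model that reproduces the reference completion nearly verbatim is more likely to have seen it during training. The similarity metric and threshold below are illustrative assumptions only; Gotcha's actual classifier jointly learns from model input, model output, and ground truth rather than thresholding a single ratio.

```python
# Toy illustration of the output-vs-ground-truth signal behind membership
# inference. The metric and threshold are assumptions, not Gotcha's method.

from difflib import SequenceMatcher

def membership_score(model_output: str, ground_truth: str) -> float:
    """Similarity in [0, 1] between generated code and the reference."""
    return SequenceMatcher(None, model_output, ground_truth).ratio()

def infer_member(model_output: str, ground_truth: str,
                 threshold: float = 0.9) -> bool:
    """Flag the sample as a suspected training-set member above the threshold."""
    return membership_score(model_output, ground_truth) >= threshold
```

A verbatim reproduction scores 1.0 and is flagged as a suspected member, while an unrelated completion falls far below the threshold; real attacks must of course cope with the gray zone in between.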
$A^{3}$-CodGen: A Repository-Level Code Generation Framework for Code Reuse With Local-Aware, Global-Aware, and Third-Party-Library-Aware
IF 6.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-10-24 DOI: 10.1109/TSE.2024.3486195
Dianshu Liao;Shidong Pan;Xiaoyu Sun;Xiaoxue Ren;Qing Huang;Zhenchang Xing;Huan Jin;Qinying Li
LLM-based code generation tools are essential for helping developers in the software development process. Existing tools often disconnect from the working context, i.e., the code repository, causing the generated code to differ from what a human developer would write. In this paper, we propose a novel code generation framework, dubbed $A^{3}$-CodGen, to harness information within the code repository to generate code with fewer potential logical errors, less code redundancy, and fewer library-induced compatibility issues. We identify three types of representative information for the code repository: local-aware information from the current code file, global-aware information from other code files, and third-party-library information. Results demonstrate that by adopting the $A^{3}$-CodGen framework, we successfully extract, fuse, and feed code repository information into the LLM, generating more accurate, efficient, and highly reusable code. The effectiveness of our framework is further underscored by the generation of code with a higher reuse rate than that of human developers. This research contributes significantly to the field of code generation, providing developers with a more powerful tool to address the evolving demands of software development in practice.
{"title":"$A^{3}$-CodGen: A Repository-Level Code Generation Framework for Code Reuse With Local-Aware, Global-Aware, and Third-Party-Library-Aware","authors":"Dianshu Liao;Shidong Pan;Xiaoyu Sun;Xiaoxue Ren;Qing Huang;Zhenchang Xing;Huan Jin;Qinying Li","doi":"10.1109/TSE.2024.3486195","DOIUrl":"10.1109/TSE.2024.3486195","url":null,"abstract":"LLM-based code generation tools are essential to help developers in the software development process. Existing tools often disconnect from the working context, i.e., the code repository, causing the generated code to differ from what a human developer would write. In this paper, we propose a novel code generation framework, dubbed \u0000<inline-formula><tex-math>$A^{3}$</tex-math></inline-formula>\u0000-CodGen, to harness information within the code repository to generate code with fewer potential logical errors, code redundancy, and library-induced compatibility issues. We identify three types of representative information for the code repository: local-aware information from the current code file, global-aware information from other code files, and third-party-library information. Results demonstrate that by adopting the \u0000<inline-formula><tex-math>$A^{3}$</tex-math></inline-formula>\u0000-CodGen framework, we successfully extract, fuse, and feed code repository information into the LLM, generating more accurate, efficient, and highly reusable code. The effectiveness of our framework is further underscored by generating code with a higher reuse rate, compared to human developers. 
This research contributes significantly to the field of code generation, providing developers with a more powerful tool to address the evolving demands in software development in practice.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 12","pages":"3369-3384"},"PeriodicalIF":6.5,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142489538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
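The fusion of the three information sources into a single LLM prompt might look like the following sketch. The section markers, the per-section truncation budget, and the function name are assumptions for illustration; the paper's actual extraction and fusion pipeline is more involved than simple concatenation.

```python
# Hypothetical sketch of fusing local-aware, global-aware, and third-party-
# library context into one prompt. Markers and budget are assumptions, not
# the A3-CodGen implementation.

def build_prompt(task: str, local_ctx: str, global_ctx: str, lib_ctx: str,
                 budget: int = 4000) -> str:
    """Concatenate the three context sources, then the task description."""
    sections = [
        ("# Local context (current file)", local_ctx),
        ("# Global context (related files)", global_ctx),
        ("# Third-party libraries in use", lib_ctx),
        ("# Task", task),
    ]
    per_section = budget // len(sections)  # crude equal-share truncation
    parts = []
    for header, body in sections:
        parts.append(header)
        parts.append(body[:per_section])
    return "\n".join(parts)
```

The fixed ordering keeps the task last, so the most immediately relevant instruction sits closest to where the model begins generating; a production pipeline would weight and rank the retrieved context instead of truncating each section equally.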
Don’t Confuse! Redrawing GUI Navigation Flow in Mobile Apps for Visually Impaired Users
IF 6.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2024-10-23 DOI: 10.1109/TSE.2024.3485225
Mengxi Zhang;Huaxiao Liu;Yuheng Zhou;Chunyang Chen;Pei Huang;Jian Zhao
Mobile applications (apps) are integral to our daily lives, offering diverse services and functionalities. They enable sighted users to access information coherently and in an extremely convenient manner. However, it remains unclear whether visually impaired users, who rely solely on screen readers (e.g., TalkBack) to navigate and access app information, can do so in a correct and reasonable order. This may result in significant information bias and operational errors. Furthermore, in our preliminary exploration, we found that the navigation sequence-related issues encountered by visually impaired users fall into two types: unintuitive navigation sequences and unapparent focus switching. To address these issues, we propose a method named RGNF (Re-draw GUI Navigation Flow). It aims to enhance the understandability and coherence of accessing the content of each component within the Graphical User Interface (GUI), and to assist developers in creating a well-designed GUI navigation flow (GNF). The method is motivated by a characteristic identified in our preliminary study: visually impaired users expect consecutively read GUI components to be close in position and similar in shape. Thus, our method relies on principles derived from the Gestalt psychological model, grouping GUI components into regions according to the laws of proximity and similarity and thereby redrawing the GNFs. To evaluate the effectiveness of our method, we calculated sequence similarity values before and after redrawing the GNF, and further employed the tools proposed by Alotaibi et al. to measure the reachability of GUI components. Our results demonstrate a substantial improvement in similarity (0.921 versus the baseline of 0.624) and in reachability (90.31% versus the baseline GNF's 74.35%). Furthermore, a qualitative user study revealed that our method has a positive effect on the user experience of visually impaired users.
{"title":"Don’t Confuse! Redrawing GUI Navigation Flow in Mobile Apps for Visually Impaired Users","authors":"Mengxi Zhang;Huaxiao Liu;Yuheng Zhou;Chunyang Chen;Pei Huang;Jian Zhao","doi":"10.1109/TSE.2024.3485225","DOIUrl":"10.1109/TSE.2024.3485225","url":null,"abstract":"Mobile applications (apps) are integral to our daily lives, offering diverse services and functionalities. They enable sighted users to access information coherently in an extremely convenient manner. However, it remains unclear if visually impaired users, who rely solely on the screen readers (e.g., Talkback) to navigate and access app information, can do so in the correct and reasonable order. This may result in significant information bias and operational errors. Furthermore, in our preliminary exploration, we explained and clarified that the navigation sequence-related issues encountered by visually impaired users could be categorized into two types: unintuitive navigation sequence and unapparent focus switching. Considering these issues, in this work, we proposed a method named RGNF (Re-draw GUI Navigation Flow). It aimed to enhance the understandability and coherence of accessing the content of each component within the Graphical User Interface (GUI), together with assisting developers in creating well-designed GUI navigation flow (GNF). This method was inspired by the characteristics identified in our preliminary study, where visually impaired users expected navigation to be associated with close position and similar shape of GUI components that were read consecutively. Thus, our method relied on the principles derived from the Gestalt psychological model, aiming to group GUI components into different regions according to the laws of proximity and similarity, thereby redrawing the GNFs. To evaluate the effectiveness of our method, we calculated sequence similarity values before and after redrawing the GNF, and further employed the tools proposed by Alotaibi et al. 
to measure the reachability of GUI components. Our results demonstrated a substantial improvement in similarity (0.921) compared to the baseline (0.624), together with the reachability (90.31%) compared to the baseline GNF (74.35%). Furthermore, a qualitative user study revealed that our method had a positive effect on providing visually impaired users with an improved user experience.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 12","pages":"3351-3368"},"PeriodicalIF":6.5,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142488445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
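The Gestalt-style grouping that the RGNF abstract describes can be approximated with a toy proximity clustering: components whose centers lie within a distance threshold of any member join that region, and the navigation order then reads regions top-to-bottom with items left-to-right inside each region. The threshold, the single-pass grouping, and the omission of the similarity (shape) law are simplifying assumptions — this is not RGNF's full algorithm.

```python
# Toy approximation of proximity-based region grouping and reading order for
# screen-reader focus. Threshold and single-pass grouping are simplifications;
# RGNF also applies the Gestalt law of similarity, which this sketch omits.

from math import dist

def group_by_proximity(components, threshold=50.0):
    """components: list of (name, (x, y)) centers; returns a list of groups."""
    groups = []
    for name, center in components:
        for group in groups:
            if any(dist(center, other) <= threshold for _, other in group):
                group.append((name, center))
                break
        else:  # no nearby group found: start a new region
            groups.append([(name, center)])
    return groups

def navigation_order(components, threshold=50.0):
    """Flatten groups into a reading order for the screen-reader focus."""
    groups = group_by_proximity(components, threshold)
    groups.sort(key=lambda g: min(c[1] for _, c in g))  # topmost region first
    order = []
    for g in groups:
        # Within a region, read top-to-bottom, then left-to-right.
        order.extend(n for n, _ in sorted(g, key=lambda it: (it[1][1], it[1][0])))
    return order
```

Two buttons 10 px apart thus land in one region and are read consecutively, while a component 200 px away starts a new region — the behavior visually impaired users in the study expected from consecutive focus moves.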