Pub Date: 2024-12-31  DOI: 10.1109/TSE.2024.3523487
Chenxing Zhong;Shanshan Li;He Zhang;Huang Huang;Lanxin Yang;Yuanfang Cai
Evolutionary design is a widely accepted practice for defining microservice boundaries. It is performed through a sequence of incremental refactoring tasks (which we call "microservice refactoring"), each restructuring only part of a microservice system (the refactoring part) into well-defined services, so that the architecture improves in a controlled manner. Despite its popularity in practice, microservice refactoring suffers from insufficient methodological support. While numerous studies address similar software design tasks, namely software remodularization and microservitization, their approaches prove inadequate when applied to microservice refactoring. Our analysis reveals that these approaches may even degrade the overall architecture, because they optimize only the refactoring part and neglect its relationships with the rest of the system. As a first response to this need, we propose Micro2Micro, which re-partitions the refactoring part while optimizing three quality objectives, including the interdependence between the refactoring and non-refactoring parts. In addition, it allows architects to intervene in the decision-making process by interactively incorporating their knowledge into the iterative search for optimal refactoring solutions. An empirical study on 13 open-source projects of different sizes shows that the solutions produced by Micro2Micro improve quality by up to 45% on average over the original architecture. Users of Micro2Micro found the suggested solutions highly satisfactory, acknowledging its advantages in infusing human intelligence into decisions, providing immediate quality feedback, and enabling quick exploration of alternatives.
{"title":"Refactoring Microservices to Microservices in Support of Evolutionary Design","authors":"Chenxing Zhong;Shanshan Li;He Zhang;Huang Huang;Lanxin Yang;Yuanfang Cai","doi":"10.1109/TSE.2024.3523487","DOIUrl":"10.1109/TSE.2024.3523487","url":null,"abstract":"<italic>Evolutionary design</i> is a widely accepted practice for defining microservice boundaries. It is performed through a sequence of incremental refactoring tasks (we call it <italic>“microservice refactoring”</i>), each restructuring only part of a microservice system (<italic>a.k.a., refactoring part</i>) into well-defined services for improving the architecture in a controlled manner. Despite its popularity in practice, microservice refactoring suffers from insufficient methodological support. While there are numerous studies addressing similar software design tasks, <italic>i</i>.<italic>e</i>., software remodularization and microservitization, their approaches prove inadequate when applied to microservice refactoring. Our analysis reveals that their approaches may even degrade the entire architecture in microservice refactoring, as they only optimize the refactoring part in such applications, but neglect the relationships between the refactoring part and the remaining system. As the first response to the need, <italic>Micro2Micro</i> is proposed to re-partition the refactoring part while optimizing three quality objectives including the interdependence between the refactoring and non-refactoring parts. In addition, it allows architects to intervene in the decision-making process by interactively incorporating their knowledge into the iterative search for optimal refactoring solutions. An empirical study on 13 open-source projects of different sizes shows that the solutions from <italic>Micro2Micro</i> perform well and exhibit quality improvement with an average up to 45% to the original architecture. Users of <italic>Micro2Micro</i> found the suggested solutions highly satisfactory. They acknowledge the advantages in terms of infusing human intelligence into decisions, providing immediate quality feedback, and quick exploration capability.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"484-502"},"PeriodicalIF":6.5,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142908423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-27  DOI: 10.1109/TSE.2024.3523284
Linyi Han;Shidong Pan;Zhenchang Xing;Jiamou Sun;Sofonias Yitagesu;Xiaowang Zhang;Zhiyong Feng
Augmenting missing key aspects in Textual Vulnerability Descriptions (TVDs) is crucial for effective vulnerability analysis. Key aspects of a TVD include, for instance, Attack Vector and Vulnerability Type; they help security engineers understand and address a vulnerability in a timely manner. For software with a large user base (non-long-tail software), augmenting missing key aspects has significantly advanced vulnerability analysis and software security research. However, software with a limited user base (long-tail software) is often overlooked due to inconsistent software names, the limited availability of TVDs, and domain-specific jargon, all of which complicate vulnerability analysis and software repair. In this paper, we introduce a novel software feature inference framework designed to augment the missing key aspects of TVDs for long-tail software. First, we tackle the non-standard software names found in community-maintained vulnerability databases by cross-referencing government databases with Common Vulnerabilities and Exposures (CVEs). Next, we employ Large Language Models (LLMs) to generate the missing key aspects. Because the limited availability of historical TVDs restricts the variety of examples, we use the Common Weakness Enumeration (CWE) to classify all TVDs and select cluster centers as representative examples. To ensure accuracy, we present Natural Language Inference (NLI) models specifically designed for long-tail software, which identify and eliminate incorrect responses. Additionally, we use a wiki repository to provide explanations for proprietary terms. Our evaluations demonstrate that our approach significantly improves the accuracy of augmenting missing key aspects of TVDs for long-tail software, from 0.27 to 0.56 (+107%). Interestingly, the accuracy for non-long-tail software also increases, from 64% to 71%. As a result, our approach can be useful in various downstream tasks that require complete TVD information.
{"title":"Do Chase Your Tail! Missing Key Aspects Augmentation in Textual Vulnerability Descriptions of Long-Tail Software Through Feature Inference","authors":"Linyi Han;Shidong Pan;Zhenchang Xing;Jiamou Sun;Sofonias Yitagesu;Xiaowang Zhang;Zhiyong Feng","doi":"10.1109/TSE.2024.3523284","DOIUrl":"10.1109/TSE.2024.3523284","url":null,"abstract":"Augmenting missing key aspects in Textual Vulnerability Descriptions (TVDs) is crucial for effective vulnerability analysis. For instance, in TVDs, key aspects include <italic>Attack Vector</i>, <italic>Vulnerability Type</i>, among others. These key aspects help security engineers understand and address the vulnerability in a timely manner. For software with a large user base (non-long-tail software), augmenting these missing key aspects has significantly advanced vulnerability analysis and software security research. However, software instances with a limited user base (long-tail software) often get overlooked due to inconsistency software names, TVD limited avaliability, and domain-specific jargon, which complicates vulnerability analysis and software repairs. In this paper, we introduce a novel software feature inference framework designed to augment the missing key aspects of TVDs for long-tail software. Firstly, we tackle the issue of non-standard software names found in community-maintained vulnerability databases by cross-referencing government databases with Common Vulnerabilities and Exposures (CVEs). Next, we employ Large Language Models (LLMs) to generate the missing key aspects. However, the limited availability of historical TVDs restricts the variety of examples. To overcome this limitation, we utilize the Common Weakness Enumeration (CWE) to classify all TVDs and select cluster centers as representative examples. To ensure accuracy, we present Natural Language Inference (NLI) models specifically designed for long-tail software. These models identify and eliminate incorrect responses. Additionally, we use a wiki repository to provide explanations for proprietary terms. Our evaluations demonstrate that our approach significantly improves the accuracy of augmenting missing key aspects of TVDs for log-tail software from 0.27 to 0.56 (+107%). Interestingly, the accuracy of non-long-tail software also increases from 64% to 71%. As a result, our approach can be useful in various downstream tasks that require complete TVD information.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"466-483"},"PeriodicalIF":6.5,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142888021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-24  DOI: 10.1109/tse.2024.3522038
Leslie Lamport
{"title":"A Retrospective of Proving the Correctness of Multiprocess Programs","authors":"Leslie Lamport","doi":"10.1109/tse.2024.3522038","DOIUrl":"https://doi.org/10.1109/tse.2024.3522038","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"8 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142884233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-23  DOI: 10.1109/tse.2024.3521306
Jeff Kramer
{"title":"Reflections of a Former Editor-in-Chief of TSE","authors":"Jeff Kramer","doi":"10.1109/tse.2024.3521306","DOIUrl":"https://doi.org/10.1109/tse.2024.3521306","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"60 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142879938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-19  DOI: 10.1109/TSE.2024.3520477
Kelson Silva;Jorge Melegati;Fabio Silveira;Xiaofeng Wang;Mauricio Ferreira;Eduardo Guerra
Uncertainty is present in software architecture decisions due to a lack of knowledge about the requirements and the solutions involved. However, this uncertainty is usually not made explicit, and decisions may rest on unproven premises or false assumptions. This paper focuses on ArchHypo, a technique that uses hypotheses engineering to manage uncertainties related to software architecture. It proposes formulating a technical plan based on the assessment of each hypothesis, incorporating measures that mitigate its impact and reduce uncertainty. To evaluate the technique, this paper reports its application in a mission-critical project that faced several technical challenges. Conclusions were based on data extracted from the project documentation and on a questionnaire answered by all team members. Applying ArchHypo provided a structured way to divide the architectural work across iterations, which facilitated architectural decision-making, although further research is needed to fully understand its impact in different contexts. On the other hand, the team identified the learning curve and the required process adjustments as significant challenges that could hinder widespread adoption. In conclusion, the evidence found in this study indicates that ArchHypo can provide a suitable way to manage uncertainties related to software architecture, facilitating the strategic postponement of decisions while addressing their potential impact.
{"title":"ArchHypo: Managing Software Architecture Uncertainty Using Hypotheses Engineering","authors":"Kelson Silva;Jorge Melegati;Fabio Silveira;Xiaofeng Wang;Mauricio Ferreira;Eduardo Guerra","doi":"10.1109/TSE.2024.3520477","DOIUrl":"10.1109/TSE.2024.3520477","url":null,"abstract":"Uncertainty is present in software architecture decisions due to a lack of knowledge about the requirements and the solutions involved. However, this uncertainty is usually not made explicit, and decisions can be made based on unproven premises or false assumptions. This paper focuses on a technique called ArchHypo that uses hypotheses engineering to manage uncertainties related to software architecture. It proposes formulating a technical plan based on each hypothesis’ assessment, incorporating measures able to mitigate its impact and reduce uncertainty. To evaluate the proposed technique, this paper reports an application of the technique in a mission-critical project that faced several technical challenges. Conclusions were based on data extracted from the project documentation and a questionnaire answered by all team members. As a result, the application of ArchHypo provided a structured approach to dividing the architectural work through iterations, which facilitated architectural decision-making. However, further research is needed to fully understand its impact across different contexts. On the other hand, the team identified the learning curve and process adjustments required for ArchHypo's adoption as significant challenges that could hinder its widespread adoption. In conclusion, the evidence found in this study indicates that the technique has the potential to provide a suitable way to manage the uncertainties related to software architecture, facilitating the strategic postponement of decisions while addressing their potential impact.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"430-448"},"PeriodicalIF":6.5,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10807272","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142858371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-16  DOI: 10.1109/TSE.2024.3519159
Ishrak Hayet;Adam Scott;Marcelo d'Amorim
Test oracle generation is an important and challenging problem. Neural solutions have recently been proposed for oracle generation, but they remain inaccurate; for example, the accuracy of the state-of-the-art technique teco is only 27.5% on its own dataset of 3,540 test cases. We propose ChatAssert, a prompt engineering framework for oracle generation that uses dynamic and static information to iteratively refine the prompts used to query large language models (LLMs). ChatAssert uses code summaries and examples to help an LLM generate candidate test oracles, a lightweight static analysis to help it repair generated oracles that fail to compile, and dynamic information obtained from test runs to help it repair oracles that compile but do not pass. Experimental results on an independent, publicly available dataset show that ChatAssert improves on teco across key evaluation metrics; for example, it improves Acc@1 by 15%. Overall, the results provide initial yet strong evidence that using external tools in the formulation of prompts is an important aid in LLM-based oracle generation.
{"title":"ChatAssert: LLM-Based Test Oracle Generation With External Tools Assistance","authors":"Ishrak Hayet;Adam Scott;Marcelo d'Amorim","doi":"10.1109/TSE.2024.3519159","DOIUrl":"10.1109/TSE.2024.3519159","url":null,"abstract":"Test oracle generation is an important and challenging problem. Neural-based solutions have been recently proposed for oracle generation but they are still inaccurate. For example, the accuracy of the state-of-the-art technique <sc>teco</small> is only 27.5% on its dataset including 3,540 test cases. We propose <sc>ChatAssert</small>, a prompt engineering framework designed for oracle generation that uses dynamic and static information to iteratively refine prompts for querying large language models (LLMs). <sc>ChatAssert</small> uses code summaries and examples to assist an LLM in generating candidate test oracles, uses a lightweight static analysis to assist the LLM in repairing generated oracles that fail to compile, and uses dynamic information obtained from test runs to help the LLM in repairing oracles that compile but do not pass. Experimental results using an independent publicly-available dataset show that <sc>ChatAssert</small> improves the state-of-the-art technique, <sc>teco</small>, on key evaluation metrics. For example, it improves <italic>Acc@1</i> by 15%. Overall, results provide initial yet strong evidence that using external tools in the formulation of prompts is an important aid in LLM-based oracle generation.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 1","pages":"305-319"},"PeriodicalIF":6.5,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142832239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Crowdsourced testing has gained prominence in software testing because it effectively addresses the fragmentation problem in mobile app testing. The inherent openness of crowdsourced testing brings diversity to the testing outcome, but it also leaves app developers with a substantial quantity of test reports to inspect. To help developers find the bugs in crowdsourced test reports as early as possible, crowdsourced test report prioritization has emerged as an effective technique that establishes an optimal inspection order over the reports. Crowdsourced test reports consist of app screenshots and textual descriptions, yet current prioritization approaches rely mostly on the textual descriptions; some add vectorized image features at the whole-image or widget level, but they still lack the precision to accurately characterize the distinctive features of crowdsourced test reports. As for prioritization strategy, prevailing approaches simply combine features using weighted coefficients, without adequately considering their semantics, which can produce biased and ineffective orderings. In this paper, we propose EncrePrior
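For context on the strategy this (truncated) abstract criticizes, the sketch below implements a simple weighted-coefficient baseline: reports are ordered greedily by their dissimilarity to already-inspected reports, with text and image features combined by fixed weights. The feature extractors (text_vec, image_vec) and the weights are illustrative assumptions; EncrePrior itself is not shown here.

```python
import math

def cosine_dist(a, b):
    """Cosine distance between two feature vectors (1.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm if norm else 1.0

def prioritize(reports, text_vec, image_vec, w_text=0.6, w_image=0.4):
    """Greedy diversity ordering: surface distinct bugs as early as possible."""
    order, remaining = [], list(reports)
    while remaining:
        def novelty(r):
            # Distance to the closest already-inspected report, with the two
            # modalities combined by fixed weights (the critiqued strategy).
            if not order:
                return 1.0
            return min(w_text * cosine_dist(text_vec(r), text_vec(s)) +
                       w_image * cosine_dist(image_vec(r), image_vec(s))
                       for s in order)
        best = max(remaining, key=novelty)
        order.append(best)
        remaining.remove(best)
    return order
```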