Quantum computing has emerged as a promising field with the potential to revolutionize various domains by harnessing the principles of quantum mechanics. As quantum hardware and algorithms continue to advance, developing high-quality quantum software has become crucial. However, testing quantum programs poses unique challenges due to the distinctive characteristics of quantum systems and the complexity of multi-subroutine programs. This paper addresses the specific testing requirements of multi-subroutine quantum programs. We begin by surveying existing quantum libraries to identify critical properties of such programs and to characterize the challenges of testing them. Building on this understanding, we develop testing criteria and techniques that span the whole testing process, from unit testing to integration testing. We delve into various aspects, including IO analysis, quantum relation checking, structural testing, behavior testing, integration of subroutine pairs, and test case generation. We also introduce novel testing principles and criteria to guide the testing process. To evaluate our approach, we conduct comprehensive testing on typical quantum subroutines, using diverse mutants and randomized inputs. The analysis of failures provides valuable insights into the effectiveness of our testing methodology. Additionally, we present case studies on representative multi-subroutine quantum programs, demonstrating the practical application and effectiveness of our proposed testing principles and criteria.
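To make the flavor of such checks concrete, here is a minimal sketch (ours, not the paper's) of one statistical ingredient of unit-testing a quantum subroutine: comparing measured outcome frequencies against the ideal output distribution. The harness `run_and_measure` and the tolerance value are illustrative assumptions.

```python
# A minimal sketch (not the paper's method) of one ingredient of testing a
# quantum subroutine: checking that measured outcome frequencies are close to
# the ideal output distribution. `run_and_measure` is a hypothetical harness
# that executes the subroutine `shots` times on a simulator and returns a
# dict mapping output bitstrings to counts; the tolerance is an assumption.

def total_variation_distance(counts, expected, shots):
    """0.0 means the empirical and ideal distributions match exactly."""
    outcomes = set(counts) | set(expected)
    return 0.5 * sum(abs(counts.get(b, 0) / shots - expected.get(b, 0.0))
                     for b in outcomes)

def assert_output_distribution(run_and_measure, expected, shots=4096, tol=0.05):
    counts = run_and_measure(shots)
    tvd = total_variation_distance(counts, expected, shots)
    assert tvd <= tol, f"distribution off by TVD={tvd:.3f} (tolerance {tol})"

# Example: a Bell-state preparation subroutine should yield "00" and "11"
# with probability 0.5 each:
#   assert_output_distribution(bell_harness, {"00": 0.5, "11": 0.5})
```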
{"title":"Testing Multi-Subroutine Quantum Programs: From Unit Testing to Integration Testing","authors":"Peixun Long, Jianjun Zhao","doi":"10.1145/3656339","DOIUrl":"https://doi.org/10.1145/3656339","url":null,"abstract":"<p>Quantum computing has emerged as a promising field with the potential to revolutionize various domains by harnessing the principles of quantum mechanics. As quantum hardware and algorithms continue to advance, developing high-quality quantum software has become crucial. However, testing quantum programs poses unique challenges due to the distinctive characteristics of quantum systems and the complexity of multi-subroutine programs. This paper addresses the specific testing requirements of multi-subroutine quantum programs. We begin by investigating critical properties by surveying existing quantum libraries and providing insights into the challenges of testing these programs. Building upon this understanding, we focus on testing criteria and techniques based on the whole testing process perspective, spanning from unit testing to integration testing. We delve into various aspects, including IO analysis, quantum relation checking, structural testing, behavior testing, integration of subroutine pairs, and test case generation. We also introduce novel testing principles and criteria to guide the testing process. We conduct comprehensive testing on typical quantum subroutines, including diverse mutants and randomized inputs, to evaluate our proposed approach. The analysis of failures provides valuable insights into the effectiveness of our testing methodology. Additionally, we present case studies on representative multi-subroutine quantum programs, demonstrating the practical application and effectiveness of our proposed testing principles and criteria.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"108 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140560998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Defect predictors, static bug detectors, and humans inspecting the code can propose locations in the program that are more likely to be buggy before bugs are discovered through testing. Automated test generators such as search-based software testing (SBST) techniques can use this information to direct their search for test cases to likely-buggy code, thus speeding up the detection of existing bugs in those locations. Often the predictions given by these tools or humans are imprecise, which can misguide the SBST technique and degrade its performance. In this paper, we study the impact of imprecision in defect prediction on the bug detection effectiveness of SBST.
Our study finds that the recall of the defect predictor, i.e., the proportion of correctly identified buggy code, has a significant impact on the bug detection effectiveness of SBST, with a large effect size. More precisely, the SBST technique detects 7.5 fewer bugs on average (out of 420 bugs) for every 5% decrement in recall. On the other hand, the effect of precision, a measure of false alarms, is not of meaningful practical significance, as indicated by a very small effect size.
In the context of combining defect prediction and SBST, our recommendation is to increase the recall of defect predictors as a primary objective and precision as a secondary objective. In our experiments, we find that 75% precision is as good as 100% precision. To account for the imprecision of defect predictors, in particular low recall values, SBST techniques should be designed to search for test cases that also cover the predicted non-buggy parts of the program, while prioritising the parts that have been predicted as buggy.
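As an illustration of that last recommendation, the following sketch (ours, not the paper's tooling) weights coverage goals so that predicted-buggy branches dominate the fitness of a candidate test suite while non-flagged branches still contribute; the branch representation and the weight value are assumptions.

```python
# A minimal sketch of the recommendation above, not the paper's tool: score a
# candidate test suite so that covering predicted-buggy branches earns a
# higher reward, while predicted non-buggy branches still contribute. Branch
# IDs, the weight value, and `covered_branches` are illustrative assumptions.

def coverage_fitness(covered_branches, all_branches, predicted_buggy, weight=4.0):
    """Higher is better; used as the fitness of a candidate test suite."""
    score = 0.0
    for branch in all_branches:
        if branch in covered_branches:
            # Prioritise predicted-buggy code, but never ignore the rest,
            # which guards against low recall in the defect predictor.
            score += weight if branch in predicted_buggy else 1.0
    return score

# Example: with weight=4.0, a suite covering 2 predicted-buggy branches and
# 10 others scores 2*4 + 10 = 18, beating a suite covering 16 non-flagged
# branches (score 16) while still rewarding broad coverage.
```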
{"title":"On the Impact of Lower Recall and Precision in Defect Prediction for Guiding Search-Based Software Testing","authors":"Anjana Perera, Burak Turhan, Aldeida Aleti, Marcel Böhme","doi":"10.1145/3655022","DOIUrl":"https://doi.org/10.1145/3655022","url":null,"abstract":"<p>Defect predictors, static bug detectors and humans inspecting the code can propose locations in the program that are more likely to be buggy before they are discovered through testing. Automated test generators such as search-based software testing (SBST) techniques can use this information to direct their search for test cases to likely-buggy code, thus speeding up the process of detecting existing bugs in those locations. Often the predictions given by these tools or humans are imprecise, which can misguide the SBST technique and may deteriorate its performance. In this paper, we study the impact of imprecision in defect prediction on the bug detection effectiveness of SBST. </p><p>Our study finds that the recall of the defect predictor, i.e., the proportion of correctly identified buggy code, has a significant impact on bug detection effectiveness of SBST with a large effect size. More precisely, the SBST technique detects 7.5 fewer bugs on average (out of 420 bugs) for every 5% decrements of the recall. On the other hand, the effect of precision, a measure for false alarms, is not of meaningful practical significance as indicated by a very small effect size. </p><p>In the context of combining defect prediction and SBST, our recommendation is to increase the recall of defect predictors as a primary objective and precision as a secondary objective. In our experiments, we find that 75% precision is as good as 100% precision. To account for the imprecision of defect predictors, in particular low recall values, SBST techniques should be designed to search for test cases that also cover the predicted non-buggy parts of the program, while prioritising the parts that have been predicted as buggy.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"39 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140561175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper explores the adoption of Generative Artificial Intelligence (AI) tools within the domain of software engineering, focusing on the influencing factors at the individual, technological, and social levels. We applied a convergent mixed-methods approach to offer a comprehensive understanding of AI adoption dynamics. We initially conducted a questionnaire survey with 100 software engineers, drawing upon the Technology Acceptance Model (TAM), the Diffusion of Innovation Theory (DOI), and the Social Cognitive Theory (SCT) as guiding theoretical frameworks. Employing the Gioia Methodology, we derived a theoretical model of AI adoption in software engineering: the Human-AI Collaboration and Adaptation Framework (HACAF). This model was then validated using Partial Least Squares – Structural Equation Modeling (PLS-SEM) based on data from 183 software engineers. Findings indicate that at this early stage of AI integration, the compatibility of AI tools within existing development workflows predominantly drives their adoption, challenging conventional technology acceptance theories. The impact of perceived usefulness, social factors, and personal innovativeness seems less pronounced than expected. The study provides crucial insights for future AI tool design and offers a framework for developing effective organizational implementation strategies.
{"title":"Navigating the Complexity of Generative AI Adoption in Software Engineering","authors":"Daniel Russo","doi":"10.1145/3652154","DOIUrl":"https://doi.org/10.1145/3652154","url":null,"abstract":"<p>This paper explores the adoption of Generative Artificial Intelligence (AI) tools within the domain of software engineering, focusing on the influencing factors at the individual, technological, and social levels. We applied a convergent mixed-methods approach to offer a comprehensive understanding of AI adoption dynamics. We initially conducted a questionnaire survey with 100 software engineers, drawing upon the Technology Acceptance Model (TAM), the Diffusion of Innovation Theory (DOI), and the Social Cognitive Theory (SCT) as guiding theoretical frameworks. Employing the Gioia Methodology, we derived a theoretical model of AI adoption in software engineering: the Human-AI Collaboration and Adaptation Framework (HACAF). This model was then validated using Partial Least Squares – Structural Equation Modeling (PLS-SEM) based on data from 183 software engineers. Findings indicate that at this early stage of AI integration, the compatibility of AI tools within existing development workflows predominantly drives their adoption, challenging conventional technology acceptance theories. The impact of perceived usefulness, social factors, and personal innovativeness seems less pronounced than expected. The study provides crucial insights for future AI tool design and offers a framework for developing effective organizational implementation strategies.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"13 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140312365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Autonomous systems, such as drones and rescue robots, are increasingly used during emergencies. They deliver services and provide situational awareness that facilitate emergency management and response. To do so, they need to interact and cooperate with humans in their environment. Human behaviour is uncertain and complex, so it can be difficult to reason about it formally. In this paper, we propose IDEA: an adaptive software architecture that enables cooperation between humans and autonomous systems by leveraging the social identity approach. This approach establishes that group membership drives human behaviour. Identity and group membership are crucial during emergencies, as they influence cooperation among survivors. IDEA systems infer the social identity of surrounding humans, thereby establishing their group membership. By reasoning about groups, we limit the number of cooperation strategies the system needs to explore. IDEA systems select a strategy from the equilibrium analysis of game-theoretic models that represent interactions between group members and the IDEA system. We demonstrate our approach using a search-and-rescue scenario, in which an IDEA rescue robot optimises evacuation by collaborating with survivors. Using an empirically validated agent-based model, we show that deploying the IDEA system can reduce median evacuation time by 13.6%.
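To illustrate the equilibrium-analysis step, here is a minimal sketch with invented payoffs: a two-player normal-form game between the robot and a survivor group, whose pure-strategy Nash equilibria are enumerated by brute force. The strategies and payoff values are illustrative assumptions, not the paper's model.

```python
# A minimal sketch of the equilibrium analysis step, with invented payoffs: a
# two-player normal-form game between the IDEA robot (rows) and a survivor
# group (columns). We enumerate pure-strategy Nash equilibria by brute force;
# the strategies and payoff numbers are illustrative, not the paper's model.
import numpy as np

# robot_payoff[i][j], group_payoff[i][j] for robot strategy i, group strategy j
robot_payoff = np.array([[3, 0],   # robot: "guide group" vs group: cooperate/ignore
                         [1, 1]])  # robot: "act alone"
group_payoff = np.array([[3, 1],
                         [0, 1]])

def pure_nash_equilibria(a, b):
    """Return (i, j) pairs where neither player gains by deviating alone."""
    eq = []
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            if a[i, j] >= a[:, j].max() and b[i, j] >= b[i, :].max():
                eq.append((i, j))
    return eq

print(pure_nash_equilibria(robot_payoff, group_payoff))  # [(0, 0), (1, 1)]
```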
{"title":"The IDEA of Us: An Identity-Aware Architecture for Autonomous Systems","authors":"Carlos Gavidia-Calderon, Anastasia Kordoni, Amel Bennaceur, Mark Levine, Bashar Nuseibeh","doi":"10.1145/3654439","DOIUrl":"https://doi.org/10.1145/3654439","url":null,"abstract":"<p>Autonomous systems, such as drones and rescue robots, are increasingly used during emergencies. They deliver services and provide situational awareness that facilitate emergency management and response. To do so, they need to interact and cooperate with humans in their environment. Human behaviour is uncertain and complex, so it can be difficult to reason about it formally. In this paper, we propose IDEA: an adaptive software architecture that enables cooperation between humans and autonomous systems, by leveraging in the social identity approach. This approach establishes that group membership drives human behaviour. Identity and group membership are crucial during emergencies, as they influence cooperation among survivors. IDEA systems infer the social identity of surrounding humans, thereby establishing their group membership. By reasoning about groups, we limit the number of cooperation strategies the system needs to explore. IDEA systems select a strategy from the equilibrium analysis of game-theoretic models, that represent interactions between group members and the IDEA system. We demonstrate our approach using a search-and-rescue scenario, in which an IDEA rescue robot optimises evacuation by collaborating with survivors. Using an empirically validated agent-based model, we show that the deployment of the IDEA system can reduce median evacuation time by (13.6% ).</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"14 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140325149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fault localization (FL) and automated program repair (APR) are two main tasks of automatic software debugging. Compared with traditional methods, deep learning-based approaches have been demonstrated to achieve better performance in FL and APR tasks. However, existing deep learning-based FL methods ignore deep semantic features or consider only simple code representations. For APR, existing template-based methods are weak at selecting the correct fix templates for effective program repair, and they cannot synthesize patches using end-to-end code-modification knowledge learned by training models on large-scale bug-fix code pairs. Moreover, in most FL and APR methods, model design and training are performed separately, leading to ineffective sharing of updated parameters and extracted knowledge during training. This limitation hinders further improvement of FL and APR performance. To solve these problems, we propose a novel approach called MTL-TRANSFER, which leverages a multi-task learning strategy to extract deep semantic features and transferred knowledge from different perspectives. First, we construct a large-scale dataset of open-source bugs and implement 11 multi-task learning models for bug detection and patch generation sub-tasks on 11 commonly used bug types, as well as one multi-classifier to learn the semantics relevant to the subsequent fix template selection task. Second, an MLP-based ranking model fuses spectrum-based, mutation-based, and semantic-based features to generate a sorted list of suspicious statements. Third, we combine the patches generated by the neural patch generation sub-task of the multi-task learning strategy with the optimized fix template selection order obtained from the multi-classifier. Finally, the more accurate FL results, the optimized fix template selection order, and the expanded patch candidates are combined to further enhance the overall performance of APR. Our extensive experiments on the widely used Defects4J benchmark show that MTL-TRANSFER outperforms all baselines in FL and APR tasks, proving the effectiveness of our approach. Compared with our previously proposed FL method TRANSFER-FL (also the state-of-the-art statement-level FL method), MTL-TRANSFER increases the faults hit by 8/11/12 on Top-1/3/5 metrics (92/159/183 in total). On APR tasks, the number of bugs successfully repaired by MTL-TRANSFER under the perfect localization setting reaches 75, which is 8 more than our previous APR method TRANSFER-PR. Furthermore, another experiment simulating actual repair scenarios shows that MTL-TRANSFER successfully repairs 15 and 9 more bugs (56 in total) than TBar and TRANSFER, respectively, demonstrating the effectiveness of combining our optimized FL and APR components.
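The following sketch illustrates the second step only, an MLP that scores statements from fused suspiciousness features; it is not MTL-TRANSFER itself, and the feature dimensions and layer sizes are illustrative assumptions.

```python
# A minimal sketch of the MLP-based ranking step, not MTL-TRANSFER itself:
# each program statement is described by a vector concatenating spectrum-
# based, mutation-based, and semantic suspiciousness features, and an MLP
# scores it; statements are then sorted by score. Dimensions are invented.
import torch
import torch.nn as nn

class StatementRanker(nn.Module):
    def __init__(self, n_spectrum=4, n_mutation=4, n_semantic=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_spectrum + n_mutation + n_semantic, 32),
            nn.ReLU(),
            nn.Linear(32, 1),  # scalar suspiciousness score per statement
        )

    def forward(self, features):             # features: (num_statements, dim)
        return self.net(features).squeeze(-1)

ranker = StatementRanker()
features = torch.randn(100, 16)               # 100 statements, 16 fused features
scores = ranker(features)
ranked = torch.argsort(scores, descending=True)  # most suspicious first
```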
{"title":"MTL-TRANSFER: Leveraging Multi-task Learning and Transferred Knowledge for Improving Fault Localization and Program Repair","authors":"Xu Wang, Hongwei Yu, Xiangxin Meng, Hongliang Cao, Hongyu Zhang, Hailong Sun, Xudong Liu, Chunming Hu","doi":"10.1145/3654441","DOIUrl":"https://doi.org/10.1145/3654441","url":null,"abstract":"<p>Fault localization (FL) and automated program repair (APR) are two main tasks of automatic software debugging. Compared with traditional methods, deep learning-based approaches have been demonstrated to achieve better performance in FL and APR tasks. However, the existing deep learning-based FL methods ignore the deep semantic features or only consider simple code representations. And for APR tasks, existing template-based APR methods are weak in selecting the correct fix templates for more effective program repair, which are also not able to synthesize patches via the embedded end-to-end code modification knowledge obtained by training models on large-scale bug-fix code pairs. Moreover, in most of FL and APR methods, the model designs and training phases are performed separately, leading to ineffective sharing of updated parameters and extracted knowledge during the training process. This limitation hinders the further improvement in the performance of FL and APR tasks. To solve the above problems, we propose a novel approach called MTL-TRANSFER, which leverages a multi-task learning strategy to extract deep semantic features and transferred knowledge from different perspectives. First, we construct a large-scale open-source bug datasets and implement 11 multi-task learning models for bug detection and patch generation sub-tasks on 11 commonly used bug types, as well as one multi-classifier to learn the relevant semantics for the subsequent fix template selection task. Second, an MLP-based ranking model is leveraged to fuse spectrum-based, mutation-based and semantic-based features to generate a sorted list of suspicious statements. Third, we combine the patches generated by the neural patch generation sub-task from the multi-task learning strategy with the optimized fix template selecting order gained from the multi-classifier mentioned above. Finally, the more accurate FL results, the optimized fix template selecting order, and the expanded patch candidates are combined together to further enhance the overall performance of APR tasks. Our extensive experiments on widely-used benchmark Defects4J show that MTL-TRANSFER outperforms all baselines in FL and APR tasks, proving the effectiveness of our approach. Compared with our previously proposed FL method TRANSFER-FL (which is also the state-of-the-art statement-level FL method), MTL-TRANSFER increases the faults hit by 8/11/12 on Top-1/3/5 metrics (92/159/183 in total). And on APR tasks, the number of successfully repaired bugs of MTL-TRANSFER under the perfect localization setting reaches 75, which is 8 more than our previous APR method TRANSFER-PR. 
Furthermore, another experiment to simulate the actual repair scenarios shows that MTL-TRANSFER can successfully repair 15 and 9 more bugs (56 in total) compared with TBar and TRANSFER, which demonstrates the effectiveness of the combination of our optimized FL and APR components.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"26 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140312358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the rate of discovered and disclosed vulnerabilities escalating, researchers have been experimenting with machine learning to predict whether a vulnerability will be exploited. Existing solutions leverage information unavailable when a CVE is created, making them unsuitable just after disclosure. This paper experiments with early exploitability prediction models driven exclusively by the initial CVE record, i.e., the original description and the linked online discussions. Leveraging NVD and the Exploit Database, we evaluate 72 prediction models trained using six traditional machine learning classifiers, four feature representation schemas, and three data balancing algorithms. We also experiment with five pre-trained large language models (LLMs). The models leverage seven different corpora made by combining three data sources, i.e., the CVE description, Security Focus, and BugTraq. The models are evaluated in a realistic, time-aware fashion by removing the training and test instances that cannot be labeled "neutral" with sufficient confidence. The validation reveals that CVE descriptions and Security Focus discussions are the best data to train on. Pre-trained LLMs do not show the expected performance and require further pre-training in the security domain. We distill new research directions, identify possible room for improvement, and envision automated systems assisting security experts in assessing exploitability.
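The following sketch shows what one of the traditional configurations might look like: TF-IDF features over CVE description text, a class-balanced logistic regression, and a time-aware split that trains only on older records. The record layout and split ratio are illustrative assumptions, not the paper's exact setup.

```python
# A minimal sketch of one traditional configuration evaluated in studies like
# this: TF-IDF features over CVE description text with a class-balanced
# logistic regression, trained time-aware (fit on older CVEs, test on newer
# ones). The record layout and the data-loading step are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_early_exploitability_model(records):
    """records: list of (published_date, description_text, exploited_label)."""
    records = sorted(records, key=lambda r: r[0])    # chronological order
    split = int(0.8 * len(records))                  # no future data in training
    train, test = records[:split], records[split:]
    model = make_pipeline(
        TfidfVectorizer(max_features=20000, ngram_range=(1, 2)),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
    model.fit([r[1] for r in train], [r[2] for r in train])
    accuracy = model.score([r[1] for r in test], [r[2] for r in test])
    return model, accuracy
```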
{"title":"Early and Realistic Exploitability Prediction of Just-Disclosed Software Vulnerabilities: How Reliable Can It Be?","authors":"Emanuele Iannone, Giulia Sellitto, Emanuele Iaccarino, Filomena Ferrucci, Andrea De Lucia, Fabio Palomba","doi":"10.1145/3654443","DOIUrl":"https://doi.org/10.1145/3654443","url":null,"abstract":"<p>With the rate of discovered and disclosed vulnerabilities escalating, researchers have been experimenting with machine learning to predict whether a vulnerability will be exploited. Existing solutions leverage information unavailable when a CVE is created, making them unsuitable just after the disclosure. This paper experiments with <i>early</i> exploitability prediction models driven exclusively by the initial CVE record, i.e., the original description and the linked online discussions. Leveraging NVD and Exploit Database, we evaluate 72 prediction models trained using six traditional machine learning classifiers, four feature representation schemas, and three data balancing algorithms. We also experiment with five pre-trained large language models (LLMs). The models leverage seven different corpora made by combining three data sources, i.e., CVE description, <span>Security Focus</span>, and <span>BugTraq</span>. The models are evaluated in a <i>realistic</i>, time-aware fashion by removing the training and test instances that cannot be labeled <i>“neutral”</i> with sufficient confidence. The validation reveals that CVE descriptions and <span>Security Focus</span> discussions are the best data to train on. Pre-trained LLMs do not show the expected performance, requiring further pre-training in the security domain. We distill new research directions, identify possible room for improvement, and envision automated systems assisting security experts in assessing the exploitability.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"6 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140312351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The increase in software supply chain threats has underscored the necessity for robust security mechanisms, among which the Software Bill of Materials (SBOM) stands out as a promising solution. By providing a machine-readable inventory of software composition details, SBOMs play a crucial role in enhancing transparency and traceability within software supply chains. This empirical study delves into the practical challenges and solutions associated with the adoption of SBOMs through an analysis of 4,786 GitHub discussions across 510 SBOM-related projects. Through repository mining and analysis, this research delineates key topics, challenges, and solutions intrinsic to the effective utilization of SBOMs. Furthermore, we shed light on commonly used tools and frameworks for SBOM generation, exploring their respective strengths and limitations. The study yields a set of findings: for example, the SBOM life cycle comprises four phases, each with its own development activities and issues. It also emphasizes the role SBOMs play in ensuring resilient software development practices and the imperative of their widespread adoption and integration to bolster supply chain security. The insights of our study provide vital input for future work and practical advancements in this topic.
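For readers unfamiliar with the format, the following sketch assembles a minimal CycloneDX-style SBOM in Python; the component and all field values are invented for illustration, and real SBOMs are normally produced by the generation tools discussed above.

```python
# A minimal, illustrative SBOM in the spirit of the CycloneDX JSON format,
# assembled in Python; the component and its field values are invented for
# illustration, and real SBOMs are normally emitted by generation tools.
import json

sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "version": 1,
    "components": [
        {
            "type": "library",
            "name": "example-parser",            # hypothetical dependency
            "version": "2.3.1",
            "purl": "pkg:pypi/example-parser@2.3.1",
            "licenses": [{"license": {"id": "MIT"}}],
        }
    ],
}

print(json.dumps(sbom, indent=2))  # machine-readable inventory of components
```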
{"title":"On the Way to SBOMs: Investigating Design Issues and Solutions in Practice","authors":"Tingting Bi, Boming Xia, Zhenchang Xing, Qinghua Lu, Liming Zhu","doi":"10.1145/3654442","DOIUrl":"https://doi.org/10.1145/3654442","url":null,"abstract":"<p>The increase of software supply chain threats has underscored the necessity for robust security mechanisms, among which the Software Bill of Materials (SBOM) stands out as a promising solution. SBOMs, by providing a machine-readable inventory of software composition details, play a crucial role in enhancing transparency and traceability within software supply chains. This empirical study delves into the practical challenges and solutions associated with the adoption of SBOMs, through an analysis of 4,786 GitHub discussions across 510 SBOM-related projects. Through repository mining and analysis, this research delineates key topics, challenges, and solutions intrinsic to the effective utilization of SBOMs. Furthermore, we shed light on commonly used tools and frameworks for SBOM generation, exploring their respective strengths and limitations. This study underscores a set of findings, for example, there are four phases of the SBOM life cycle, and each phase has a set of SBOM development activities and issues; in addition, this study emphasizes that SBOM play in ensuring resilient software development practices and the imperative of their widespread adoption and integration to bolster supply chain security. The insights of our study provide vital input for future work and practical advancements in this topic.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"1 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140300659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The multi-objective testing resource allocation problem (MOTRAP) is concerned with how to plan the testing time of software testers so as to save cost and improve reliability as much as possible. The feasible solution space of a MOTRAP is determined by its variables (i.e., the time invested in each component) and constraints (e.g., the pre-specified reliability, cost, or time). Although a variety of state-of-the-art constrained multi-objective optimisers can be used to find individual solutions in this space, their search remains inefficient and expensive because this space is tiny compared with the overall search space. The decision maker may often suffer a prolonged but unsuccessful search that fails to return a feasible solution. In this work, we first formulate a heavily constrained MOTRAP on the basis of an architecture-based model, in which reliability, cost, and time are optimised under pre-specified constraints on all three. Then, to estimate the feasible solution space of this specific MOTRAP, we develop theoretical and algorithmic approaches that deduce new, tighter lower and upper bounds on the variables from the constraints. Importantly, our approach can help the decision maker identify whether their constraint settings are practicable, and the derived bounds tightly enclose the small feasible solution space, helping off-the-shelf constrained multi-objective optimisers keep their search within it as much as possible. Additionally, to make further use of these bounds, we propose a generalised bound constraint handling method that can be readily employed by constrained multi-objective optimisers to pull infeasible solutions back into the estimated space with a theoretical guarantee. Finally, we evaluate our approach on application and empirical cases. Experimental results reveal that our approach significantly enhances the efficiency, effectiveness, and robustness of off-the-shelf constrained multi-objective optimisers and state-of-the-art bound constraint handling methods at finding high-quality solutions for the decision maker. These improvements may take the stress out of setting constraints and selecting constrained multi-objective optimisers, and make test planning more efficient and effective.
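As a worked illustration of the kind of bound deduction described above (ours, not the paper's derivation), consider a single budget constraint on total testing time with per-component lower bounds:

```latex
% An illustrative (not the paper's) bound deduction: if total testing time is
% capped at T and each component k needs at least l_k time, then any feasible
% t_k satisfies a bound tighter than the naive t_k <= T:
\[
  \sum_{i=1}^{n} t_i \le T, \qquad t_i \ge l_i \;\; (i = 1,\dots,n)
  \;\Longrightarrow\;
  t_k \;\le\; T - \sum_{i \ne k} l_i .
\]
% The search box for component k thus shrinks from [l_k, T] to
% [l_k, T - sum_{i != k} l_i], excluding infeasible regions up front.
```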
{"title":"On Estimating the Feasible Solution Space of Multi-Objective Testing Resource Allocation","authors":"Guofu Zhang, Lei Li, Zhaopin Su, Feng Yue, Yang Chen, Miqing Li, Xin Yao","doi":"10.1145/3654444","DOIUrl":"https://doi.org/10.1145/3654444","url":null,"abstract":"<p>The multi-objective testing resource allocation problem (MOTRAP) is concerned on how to reasonably plan the testing time of software testers to save the cost and improve the reliability as much as possible. The feasible solution space of a MOTRAP is determined by its variables (i.e., the time invested in each component) and constraints (e.g., the pre-specified reliability, cost, or time). Although a variety of state-of-the-art constrained multi-objective optimisers can be used to find individual solutions in this space, their search remains inefficient and expensive due to the fact that this space is very tiny compare to the large search space. The decision maker may often suffer a prolonged but unsuccessful search that fails to return a feasible solution. In this work, we first formulate a heavily constrained MOTRAP on the basis of an architecture-based model, in which reliability, cost, and time are optimised under the pre-specified multiple constraints on reliability, cost, and time. Then, to estimate the feasible solution space of this specific MOTRAP, we develop theoretical and algorithmic approaches to deduce new tighter lower and upper bounds on variables from constraints. Importantly, our approach can help the decision maker identify whether their constraint settings are practicable, and meanwhile, the derived bounds can just enclose the tiny feasible solution space and help off-the-shelf constrained multi-objective optimisers make the search within the feasible solution space as much as possible. Additionally, to further make good use of these bounds, we propose a generalised bound constraint handling method that can be readily employed by constrained multi-objective optimisers to pull infeasible solutions back into the estimated space with theoretical guarantee. Finally, we evaluate our approach on application and empirical cases. Experimental results reveal that our approach significantly enhances the efficiency, effectiveness, and robustness of off-the-shelf constrained multi-objective optimisers and state-of-the-art bound constraint handling methods at finding high-quality solutions for the decision maker. These improvements may help the decision maker take the stress out of setting constraints and selecting constrained multi-objective optimisers and facilitate the testing planning more efficiently and effectively.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"84 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140312248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Greybox fuzzing is a powerful testing technique. Given a set of initial seeds, greybox fuzzing continuously generates new test inputs to execute the program under test and drives executions with code coverage as feedback. Seed prioritization is an important step that helps a greybox fuzzer choose promising seeds for input generation. However, mainstream greybox fuzzers like AFL++ and Zest tend to neglect the importance of seed prioritization. They may pick seeds simply in the order they were queued, or in a randomly produced order, which can degrade their performance in exploring code and exposing bugs. Meanwhile, existing state-of-the-art techniques like Alphuzz and K-Scheduler adopt complex strategies to schedule seeds. Although powerful, such strategies inevitably incur significant overhead and reduce the scalability of the proposed technique.
In this paper, we propose a novel distance-based seed prioritization approach named DiPri to facilitate greybox fuzzing.
Specifically, DiPri evaluates the queued seeds according to seed distances and prioritizes the outlier seeds, i.e., those farthest from the others, to improve the probability of discovering previously unexplored code regions. To evaluate DiPri thoroughly, we prototype it on AFL++ and conduct large-scale experiments with four baselines and 24 C/C++ fuzz targets, of which eight are from widely adopted real-world projects, eight from the coverage-based benchmark FuzzBench, and eight from the bug-based benchmark Magma. The results, obtained from fuzzing campaigns exceeding 50,000 CPU hours, suggest that DiPri can (1) leave the host fuzzer's code coverage largely unaffected, slightly improving branch coverage on the eight real-world targets and slightly reducing it on the eight FuzzBench targets, and (2) improve the host fuzzer's bug-finding capability, triggering five more Magma bugs. Besides the evaluation on the three C/C++ benchmarks, we integrate DiPri into the Java fuzzer Zest and conduct experiments on a Java benchmark composed of five real-world programs for more than 8,000 CPU hours to empirically study DiPri's scalability. The results on the Java benchmark demonstrate that DiPri is scalable and can help the host fuzzer find bugs more consistently.
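A minimal sketch of the idea, though not DiPri's exact algorithm: represent each queued seed as a feature vector and prioritize the seed whose average distance to all other seeds is largest. The vector encoding of seeds is an illustrative assumption.

```python
# A minimal sketch of distance-based seed prioritization in the spirit of
# DiPri (not its exact algorithm): represent each queued seed as a feature
# vector, then pick the "outlier" seed whose average distance to all other
# seeds is largest. The feature encoding of seeds is an assumption.
import numpy as np

def pick_outlier_seed(seed_vectors):
    """seed_vectors: (num_seeds, dim) array; returns index of farthest seed."""
    diffs = seed_vectors[:, None, :] - seed_vectors[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)         # pairwise Euclidean distances
    avg_dist = dists.sum(axis=1) / (len(seed_vectors) - 1)
    return int(np.argmax(avg_dist))                # farthest-from-the-rest seed

seeds = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
print(pick_outlier_seed(seeds))  # 3: the isolated seed is fuzzed first
```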
{"title":"DiPri: Distance-based Seed Prioritization for Greybox Fuzzing","authors":"Ruixiang Qian, Quanjun Zhang, Chunrong Fang, Ding Yang, Shun Li, Binyu Li, Zhenyu Chen","doi":"10.1145/3654440","DOIUrl":"https://doi.org/10.1145/3654440","url":null,"abstract":"<p>Greybox fuzzing is a powerful testing technique. Given a set of initial seeds, greybox fuzzing continuously generates new test inputs to execute the program under test and drives executions with code coverage as feedback. Seed prioritization is an important step of greybox fuzzing that helps greybox fuzzing choose promising seeds for input generation in priority. However, mainstream greybox fuzzers like AFL++ and Zest tend to neglect the importance of seed prioritization. They may pick seeds plainly according to the sequential order of the seeds being queued or an order produced with a random-based approach, which may consequently degrade their performance in exploring code and exposing bugs. In the meantime, existing state-of-the-art techniques like Alphuzz and K-Scheduler adopt complex strategies to schedule seeds. Although powerful, such strategies also inevitably incur great overhead and will reduce the scalability of the proposed technique. </p><p>In this paper, we propose a novel distance-based seed prioritization approach named <span>DiPri</span> to facilitate greybox fuzzing. </p><p>Specifically, <span>DiPri</span> evaluates the queued seeds according to seed distances and chooses the outlier ones, which are the farthest from the others, in priority to improve the probabilities of discovering previously unexplored code regions. To make a profound evaluation of <span>DiPri</span>, we prototype <span>DiPri</span> on AFL++ and conduct large-scale experiments with four baselines and 24 C/C++ fuzz targets, where eight are from widely adopted real-world projects, eight are from the coverage-based benchmark FuzzBench, and eight are from the bug-based benchmark Magma. The results obtained through a fuzzing exceeding 50,000 CPU hours suggest that <span>DiPri</span> can (1) insignificantly influence the host fuzzer’s capability of code coverage by slightly improving the branch coverage on the eight targets from real-world projects and slightly reducing the branch coverage on the eight targets from FuzzBench, and (2) improve the host fuzzer’s capability of finding bugs by triggering five more Magma bugs. Besides the evaluation with the three C/C++ benchmarks, we integrate <span>DiPri</span> into the Java fuzzer Zest and conduct experiments on a Java benchmark composed of five real-world programs for more than 8,000 CPU hours to empirically study the scalability of <span>DiPri</span>. The results with the Java benchmark demonstrate that <span>DiPri</span> is pretty scalable and can help the host fuzzer find bugs more consistently.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"30 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140300916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The large demand for mobile devices creates significant concerns about the quality of mobile applications (apps). Developers need to guarantee the quality of mobile apps before they are released to the market. Many approaches use different strategies to test the GUIs of mobile apps, but they still need improvement due to their limited effectiveness. In this paper, we propose DinoDroid, an approach based on deep Q-networks to automate testing of Android apps. DinoDroid learns a behavior model from a set of existing apps, and the learned model can be used to explore and generate tests for new apps. DinoDroid is able to capture fine-grained details of GUI events (e.g., the content of GUI widgets) and use them as features fed into a deep neural network, which acts as the agent that guides app exploration. DinoDroid automatically adapts the learned model during exploration without the need for any modeling strategies or pre-defined rules. We conduct experiments on 64 open-source Android apps. The results show that DinoDroid outperforms existing Android testing tools in terms of code coverage and bug detection.
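A minimal sketch of the deep-Q-network core in this spirit (not DinoDroid's implementation): a network maps a feature vector describing each candidate GUI event to a Q-value, and the agent selects among events epsilon-greedily. The feature size and architecture are invented.

```python
# A minimal sketch of a DQN-style event selector in the spirit of DinoDroid
# (not its implementation): a network maps a feature vector describing each
# candidate GUI event (widget text encoding, type, visit counts, ...) to a
# Q-value, and the agent picks an event epsilon-greedily. Sizes are invented.
import random
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(24, 64), nn.ReLU(), nn.Linear(64, 1))

def choose_event(event_features, epsilon=0.1):
    """event_features: (num_candidate_events, 24) tensor of event encodings."""
    if random.random() < epsilon:                  # explore occasionally
        return random.randrange(len(event_features))
    with torch.no_grad():
        q_values = q_net(event_features).squeeze(-1)
    return int(torch.argmax(q_values))             # greedy: best predicted event

candidates = torch.randn(5, 24)                    # 5 clickable widgets on screen
print(choose_event(candidates))
```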
{"title":"DinoDroid: Testing Android Apps Using Deep Q-Networks","authors":"Yu Zhao, Brent Harrison, Tingting Yu","doi":"10.1145/3652150","DOIUrl":"https://doi.org/10.1145/3652150","url":null,"abstract":"<p>The large demand of mobile devices creates significant concerns about the quality of mobile applications (apps). Developers need to guarantee the quality of mobile apps before it is released to the market. There have been many approaches using different strategies to test the GUI of mobile apps. However, they still need improvement due to their limited effectiveness. In this paper, we propose DinoDroid, an approach based on deep Q-networks to automate testing of Android apps. DinoDroid learns a behavior model from a set of existing apps and the learned model can be used to explore and generate tests for new apps. DinoDroid is able to capture the fine-grained details of GUI events (e.g., the content of GUI widgets) and use them as features that are fed into deep neural network, which acts as the agent to guide app exploration. DinoDroid automatically adapts the learned model during the exploration without the need of any modeling strategies or pre-defined rules. We conduct experiments on 64 open-source Android apps. The results showed that DinoDroid outperforms existing Android testing tools in terms of code coverage and bug detection.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"25 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140127676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}