
2023 IEEE/ACM International Conference on Automation of Software Test (AST): Latest Publications

AST 2023 Steering Committee
Pub Date : 2023-05-01 DOI: 10.1109/ast58925.2023.00026
Citations: 0
Evaluating the Trade-offs of Text-based Diversity in Test Prioritisation
Pub Date : 2023-05-01 DOI: 10.1109/AST58925.2023.00021
Ranim Khojah, Chi Hong Chao, F. D. O. Neto
Diversity-based techniques (DBT) have been cost-effective by prioritizing the most dissimilar test cases to detect faults at earlier stages of test execution. Diversity is measured on test specifications to convey how different test cases are from one another. However, there is little research on the trade-offs of diversity measures based on different types of text-based specification (lexicographical or semantic), particularly because the text content in test scripts varies widely from unit level (e.g., code) to system level (e.g., natural language). This paper compares and evaluates the cost-effectiveness, in terms of coverage and failures, of different text-based diversity measures for different levels of tests. We perform an experiment on the test suites of 7 open source projects on the unit level, and 2 industry projects on the integration and system level. Our results show that test suites prioritised using semantic-based diversity measures cause a small improvement in requirements coverage, as opposed to lexical diversity, which showed less coverage than random for system-level artefacts. In contrast, using lexical-based measures such as Jaccard or Levenshtein to prioritise code artefacts yields better failure coverage across all levels of tests. We summarise our findings into a list of recommendations for using semantic or lexical diversity on different levels of testing.
Citations: 0
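To make the lexical side of the test prioritisation entry above more concrete, the following Python sketch orders test cases by Jaccard distance over their word tokens. The abstract does not spell out the selection algorithm, so the greedy max-min ordering, the whitespace tokenisation, and the names `jaccard_distance` and `prioritise` are assumptions made for illustration, not the authors' implementation.

```python
def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |A & B| / |A | B|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def prioritise(tests: dict[str, str]) -> list[str]:
    """Greedy max-min ordering (an assumed strategy): repeatedly pick the
    test whose closest already-selected neighbour is farthest away."""
    # Hypothetical tokenisation: lower-cased whitespace-separated words.
    tokens = {name: set(text.lower().split()) for name, text in tests.items()}
    remaining = sorted(tokens)  # sorted so that ties break deterministically
    if not remaining:
        return []
    # Seed with the test that is most distant from all others in total.
    first = max(
        remaining,
        key=lambda t: sum(jaccard_distance(tokens[t], tokens[o]) for o in remaining),
    )
    order = [first]
    remaining.remove(first)
    while remaining:
        nxt = max(
            remaining,
            key=lambda t: min(jaccard_distance(tokens[t], tokens[c]) for c in order),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    suite = {
        "t1": "login with valid user",
        "t2": "login with invalid user",
        "t3": "export report as pdf",
    }
    print(prioritise(suite))  # ['t3', 't1', 't2']
```

In this toy suite the export test shares no words with the two login tests, so it is scheduled first; the near-duplicate login tests are pushed apart in the order. A semantic measure would replace `jaccard_distance` with a distance over embeddings, which is outside the scope of this sketch.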
Orchestration Strategies for Regression Test Suites
Pub Date : 2023-05-01 DOI: 10.1109/AST58925.2023.00020
Renan Greca, Breno Miranda, A. Bertolino
Regression testing is widely studied in the literature, although most research on the topic is concerned with improving specific sub-challenges of a wider goal. Test suite orchestration proposes a more comprehensive view of the challenge of regression testing, by merging and combining different techniques with a variety of objectives, including prioritizing, selecting, reducing and amplifying tests, detecting flaky tests and potentially more. This paper presents the key approaches and techniques that form test suite orchestration, along with common evaluation metrics, and discusses how they can be used together to ultimately provide an efficient and effective regression testing strategy. To illustrate the benefits of orchestration, we provide some examples of existing papers that take steps towards this goal, even if the specific terminology is not yet used. Orchestrated strategies utilizing existing regression testing techniques provide a pathway to practicality and real-world usage of the academic literature.
Citations: 0
FlakyCat: Predicting Flaky Tests Categories using Few-Shot Learning
Pub Date : 2023-05-01 DOI: 10.1109/AST58925.2023.00018
Amal Akli, Guillaume Haben, Sarra Habchi, Mike Papadakis, Yves Le Traon
Flaky tests are tests that yield different outcomes when run on the same version of a program. This non-deterministic behaviour plagues continuous integration with false signals, wasting developers’ time and reducing their trust in test suites. Studies highlighted the importance of keeping tests flakiness-free. Recently, the research community has been pushing towards the detection of flaky tests by suggesting many static and dynamic approaches. While promising, those approaches mainly focus on classifying tests as flaky or not and, even when high performances are reported, it remains challenging to understand the cause of flakiness. This part is crucial for researchers and developers that aim to fix it. To help with the comprehension of a given flaky test, we propose FlakyCat, the first approach to classify flaky tests based on their root cause category. FlakyCat relies on CodeBERT for code representation and leverages Siamese networks to train a multi-class classifier. We train and evaluate FlakyCat on a set of 451 flaky tests collected from open-source Java projects. Our evaluation shows that FlakyCat categorises flaky tests accurately, with an F1 score of 73%. Furthermore, we investigate the performance of our approach for each category, revealing that Async waits, Unordered collections and Time-related flaky tests are accurately classified, while Concurrency-related flaky tests are more challenging to predict. Finally, to facilitate the comprehension of FlakyCat’s predictions, we present a new technique for CodeBERT-based model interpretability that highlights code statements influencing the categorization.
Citations: 1
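The definition of flakiness in the FlakyCat entry above (different outcomes on the same version of a program) can be illustrated with a simple rerun check. This is a sketch of rerun-based detection only, not of FlakyCat's CodeBERT and Siamese-network classifier; the helper names `outcomes` and `is_flaky` and the two toy tests are hypothetical.

```python
from typing import Callable


def outcomes(test: Callable[[], None], runs: int = 10) -> list[bool]:
    """Run the same test repeatedly and record pass (True) or fail (False)."""
    results = []
    for _ in range(runs):
        try:
            test()
            results.append(True)
        except AssertionError:
            results.append(False)
    return results


def is_flaky(test: Callable[[], None], runs: int = 10) -> bool:
    """A test is flagged as flaky if its reruns did not all agree."""
    return len(set(outcomes(test, runs))) > 1


# Toy example: shared mutable state makes the outcome depend on call history.
_calls = {"n": 0}


def state_dependent_test() -> None:
    _calls["n"] += 1
    assert _calls["n"] % 3 != 0  # fails on every third call


def stable_test() -> None:
    assert 1 + 1 == 2


if __name__ == "__main__":
    print(is_flaky(state_dependent_test))
    print(is_flaky(stable_test))
```

A rerun check like this only answers whether a test is flaky; the point of the entry above is the harder follow-up question of which root-cause category the flakiness belongs to.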
Structural Test Input Generation for 3-Address Code Coverage Using Path-Merged Symbolic Execution
Pub Date : 2023-05-01 DOI: 10.1109/AST58925.2023.00012
Soha Hussein, Stephen McCamant, Elena Sherman, Vaibhav Sharma, M. Whalen
Test input generation is one of the key applications of symbolic execution (SE). However, being a path-sensitive technique, SE often faces path explosion even when creating a branch-adequate test suite. Path-merging symbolic execution (PM-SE) alleviates the path explosion problem by summarizing regions of code into disjunctive constraints, thus traversing at once a set of paths with the same prefixes. Previous work has shown that PM-SE can reduce run-time up to 38%, though these improvements can be impaired if the summarized code results in complex constraints or introduces additional symbols that increase the number of branching points in the later execution. Considering these trade-offs, examining the ability of PM-SE to generate branch-adequate test inputs is an open research problem. This paper investigates it by developing a technique that extracts structural coverage-related queries from disjoint constraints. Using this approach, we extend PM-SE to generate branch-adequate test inputs. Experiments compare the effectiveness and efficiency of test input generation by SE and PM-SE techniques. Results show that those techniques are complementary. For some programs, PM-SE yields faster coverage, with fewer generated tests, while for others, SE performs better. In addition, each technique covers branches that the other fails to discover.
Citations: 0
Towards a Review on Simulated ADAS/AD Testing
Pub Date : 2023-05-01 DOI: 10.1109/AST58925.2023.00015
Yavuz Köroglu, F. Wotawa
Vehicle and traffic simulation is a common practice for testing and evaluating advanced driver-assistance systems (ADAS) and autonomous driving (AD). As a result, the literature mentions numerous simulators capable of simulating ADAS/AD implementations. In this study, we investigate previous surveys that cover multiple scenarios and initiate a systematic review targeting simulators for testing ADAS/AD. Our results show that the literature mentions, in total, 181 simulators capable of evaluating one or more ADAS/AD implementations. Furthermore, according to previous surveys and reviews, the most popular simulators are CARLA, Airsim, and SUMO. Finally, our results uncover that every five years, the number of novel simulators added to the literature grows at least quadratically, showing that further review is necessary to address the differences between these simulators and understand the simulator landscape from an ADAS/AD testing perspective.
Citations: 0
Cross-Project setting using Deep learning Architectures in Just-In-Time Software Fault Prediction: An Investigation
Pub Date : 2023-05-01 DOI: 10.1109/AST58925.2023.00007
Sushant Kumar Pandey, A. Tripathi
Just-In-Time Software Fault Prediction (JIT-SFP) is the study of predicting, with various learning methods, whether a software change is fault-inducing. Building such a prediction model requires adequate training data. However, training data is scarce at the beginning of a software project. A Cross-Project (CP) setting can overcome this challenge by employing data from different software projects, and it can achieve predictive performance similar to Within-Project (WP) fault prediction. It is still undetermined to what extent CP training data can be useful in such a situation. Furthermore, it remains to be discovered whether CP data are helpful in the initial phase of fault detection and whether, when the WP training set is inadequate, extending it with CP data could be beneficial. This article deals with such investigations in real software projects. We propose a new method, called JITCP-Predictor, that leverages a deep belief network and long short-term memory. Across ten projects, the proposed model significantly outperforms the benchmark methods, with improvements ranging from 10.63% to 136.36% in MCC and from 7.04% to 35.71% in F-Measure. The mean values of MCC and F-Measure produced by JITCP-Predictor are 0.52 ± 0.021 and 0.76 ± 0.76, respectively. We also found that the proposed model is more suitable for large and moderate-size projects. The proposed model avoids class imbalance and overfitting problems and incurs reasonable training costs.
Citations: 0
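The fault prediction entry above reports its results as MCC and F-Measure. For readers unfamiliar with these scores, the sketch below computes both from a binary confusion matrix using their standard definitions. The function names and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and recall, 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in [-1, 1].

    Returns 0.0 when a row or column of the confusion matrix is empty,
    which is a common convention for the undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts: 40 fault-inducing changes caught, 10 missed,
    # 45 clean changes correctly passed, 5 false alarms.
    print(round(f_measure(tp=40, fp=5, fn=10), 3))
    print(round(mcc(tp=40, tn=45, fp=5, fn=10), 3))
```

MCC uses all four cells of the confusion matrix, which is why it is often preferred over F-Measure when fault-inducing changes are a small minority of the data.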
AutoMetric: Towards Measuring Open-Source Software Quality Metrics Automatically
Pub Date : 2023-05-01 DOI: 10.1109/AST58925.2023.00009
Taejun Lee, Heewon Park, Heejo Lee
In modern software development, open-source software (OSS) plays a crucial role. Although some methods exist to verify the safety of OSS, the current automation technologies fall short. To address this problem, we propose AutoMetric, an automatic technique for measuring security metrics for OSS at the repository level. Because AutoMetric only collects the repository addresses of the projects, it can inspect many projects simultaneously regardless of their size and scope. AutoMetric contains five metrics: Mean Time to Update (MU), Mean Time to Commit (MC), Number of Contributors (NC), Inactive Period (IP), and Branch Protection (BP). These metrics can be calculated quickly even if the source code changes. Comparing the metrics in AutoMetric with 2,675 reported vulnerabilities in the GitHub Advisory Database (GAD) shows that the more frequent the updates and commits and the shorter the inactivity period, the more vulnerabilities were found.
Citations: 0
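The AutoMetric entry above names its metrics but the abstract does not define them. The sketch below shows one plausible reading of two of them, computed from commit timestamps alone: Mean Time to Commit as the average gap between consecutive commits, and Inactive Period as the time elapsed since the most recent commit. Both definitions, and the function names, are assumptions for illustration and may differ from the paper's.

```python
from datetime import datetime, timedelta
from typing import Optional


def mean_time_to_commit(commit_times: list[datetime]) -> Optional[timedelta]:
    """Average gap between consecutive commits (assumed definition of MC)."""
    ordered = sorted(commit_times)
    if len(ordered) < 2:
        return None  # a gap needs at least two commits
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


def inactive_period(commit_times: list[datetime], now: datetime) -> Optional[timedelta]:
    """Time since the most recent commit (assumed definition of IP)."""
    if not commit_times:
        return None
    return now - max(commit_times)


if __name__ == "__main__":
    # Hypothetical commit history for a single repository.
    commits = [
        datetime(2023, 1, 1),
        datetime(2023, 1, 3),
        datetime(2023, 1, 9),
    ]
    print(mean_time_to_commit(commits))                    # 4 days, 0:00:00
    print(inactive_period(commits, datetime(2023, 1, 19)))  # 10 days, 0:00:00
```

In practice the timestamps would come from the repository's commit log; that retrieval step is omitted here to keep the sketch self-contained.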
Cross-coverage testing of functionally equivalent programs
Pub Date : 2023-04-28 DOI: 10.1109/AST58925.2023.00014
A. Bertolino, G. D. Angelis, F. Giandomenico, F. Lonetti
Cross-coverage of a program P refers to the test coverage measured over a different program Q that is functionally equivalent to P. The novel concept of cross-coverage can find useful applications in the testing of redundant software. Here we apply cross-coverage to test suite augmentation and show that additional test cases generated from the coverage of an equivalent program, referred to as cross tests, can increase the coverage of a program in a more effective way than a random baseline. We also observe that, contrary to traditional coverage testing, cross-coverage could help in finding (artificially created) missing-functionality faults.
Citations: 0
On the Effect of Instrumentation on Test Flakiness
Pub Date : 2023-03-17 DOI: 10.1109/AST58925.2023.00016
Shawn Rasheed, Jens Dietrich, Amjed Tahir
Test flakiness is a problem that affects testing and processes that rely on it. Several factors cause or influence the flakiness of test outcomes. Test execution order, randomness and concurrency are some of the more common and well-studied causes. Some studies mention code instrumentation as a factor that causes or affects test flakiness. However, evidence for this issue is scarce. In this study, we attempt to systematically collect evidence for the effects of instrumentation on test flakiness. We experiment with common types of instrumentation for Java programs, namely application performance monitoring, coverage and profiling instrumentation. We then study the effects of instrumentation on a set of nine programs obtained from an existing dataset used to study test flakiness, consisting of popular GitHub projects written in Java. We observe cases where real-world instrumentation causes flakiness in a program. However, this effect is rare. We also discuss a related issue: how instrumentation may interfere with flakiness detection and prevention.
Citations: 1