2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE): Latest Papers, Page 2

AUTOTRAINER: An Automatic DNN Training Problem Detection and Repair System

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-05-01 DOI: 10.1109/ICSE43902.2021.00043

Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen

With machine learning models, especially Deep Neural Network (DNN) models, becoming an integral part of new intelligent software, new tools to support their engineering process are in high demand. Existing DNN debugging tools are either post-training, which wastes a lot of time training a buggy model and requires expertise, or limited to collecting training logs without analyzing the problems, let alone fixing them. In this paper, we propose AUTOTRAINER, a DNN training monitoring and automatic repair tool that supports detecting and automatically repairing five commonly seen training problems. During training, it periodically checks the training status and detects potential problems. Once a problem is found, AUTOTRAINER tries to fix it using built-in state-of-the-art solutions. It supports various model structures and input data types, such as Convolutional Neural Networks (CNNs) for images and Recurrent Neural Networks (RNNs) for text. Our evaluation on 6 datasets and 495 models shows that AUTOTRAINER can effectively detect all potential problems, with a 100% detection rate and no false positives. Among all models with problems, it can fix 97.33% of them, increasing the accuracy by 47.08% on average.
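
The periodic status check described above can be pictured with a minimal sketch. The abstract does not name the five problems or the detection rules, so the symptom names, the thresholds, and the function `check_training_status` below are all illustrative assumptions, not AUTOTRAINER's actual logic.

```python
import math


def check_training_status(losses, grad_norms, window=5,
                          vanish_eps=1e-7, explode_max=1e3):
    """Return suspected training problems based on the most recent history.

    Symptom names and thresholds are assumptions made for illustration.
    """
    problems = []
    recent_losses = losses[-window:]
    recent_grads = grad_norms[-window:]

    # A NaN or infinite loss means training has broken down numerically.
    if any(math.isnan(x) or math.isinf(x) for x in recent_losses):
        problems.append("non_finite_loss")
    # A very large gradient norm suggests exploding gradients.
    if any(g > explode_max for g in recent_grads):
        problems.append("exploding_gradient")
    # Gradients that stay near zero for a full window suggest vanishing gradients.
    if len(recent_grads) == window and all(g < vanish_eps for g in recent_grads):
        problems.append("vanishing_gradient")
    return problems


# Hypothetical usage: call this every few iterations from the training loop.
history_loss = [2.3, 2.3, 2.3, 2.3, 2.3]
history_grad = [1e-9, 1e-9, 1e-10, 1e-9, 1e-10]
print(check_training_status(history_loss, history_grad))
```

A monitor of this shape would be invoked from a training callback, and a non-empty result would trigger a candidate repair followed by a re-check.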

Citations: 30

Identifying Key Features from App User Reviews

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-05-01 DOI: 10.1109/ICSE43902.2021.00088

Huayao Wu, Wenjun Deng, Xintao Niu, Changhai Nie

Due to the rapid growth of and strong competition in the mobile application (app) market, app developers should not only offer users attractive new features, but also carefully maintain and improve existing features based on user feedback. User reviews are a rich source of information for planning such feature maintenance activities, and it could be of great benefit for developers to evaluate and magnify the contribution of specific features to the overall success of their apps. In this study, we refer to the features that are highly correlated with app ratings as key features, and we present KEFE, a novel approach that leverages app descriptions and user reviews to identify the key features of a given app. KEFE relies on natural language processing, a deep machine learning classifier, and regression analysis, and involves three main steps: 1) extracting feature-describing phrases from the app description; 2) matching each app feature with its relevant user reviews; and 3) building a regression model to identify features that have significant relationships with app ratings. To train and evaluate KEFE, we collect 200 app descriptions and 1,108,148 user reviews from the Chinese Apple App Store. Experimental results demonstrate the effectiveness of KEFE in feature extraction, where an average F-measure of 78.13% is achieved. The key features identified are also likely to provide hints for successful app releases: for the releases that receive higher app ratings, 70% of feature improvements are related to key features.
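
To make step 3 concrete, here is a deliberately simplified sketch. It swaps the paper's regression model for a per-feature Pearson correlation between a review-derived score and the app rating, and every name and number in it (`pearson`, `key_features`, the 0.8 threshold, the sample data) is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def key_features(feature_scores, ratings, threshold=0.8):
    """Return features whose per-release score tracks the rating closely.

    The threshold is an arbitrary illustration, not a value from the paper.
    """
    return sorted(name for name, scores in feature_scores.items()
                  if abs(pearson(scores, ratings)) >= threshold)


# Made-up data: one rating and one review-derived score per release.
ratings = [3.0, 3.5, 4.0, 4.5]
feature_scores = {
    "dark mode": [0.1, 0.2, 0.3, 0.4],
    "stickers": [0.5, 0.1, 0.4, 0.2],
}
print(key_features(feature_scores, ratings))
```

A regression model, as used in the paper, additionally accounts for several features at once and yields significance estimates, which a pairwise correlation does not.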

Citations: 23

On Indirectly Dependent Documentation in the Context of Code Evolution: A Study

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-05-01 DOI: 10.1109/ICSE43902.2021.00134

Devika Sondhi, Avyakt Gupta, Salil Purandare, A. Rana, Deepanshu Kaushal, Rahul Purandare

A software system evolves over time due to factors such as bug fixes, enhancements, optimizations, and deprecation. As entities interact in a software repository, alterations made at one point may need to be reflected at various other points to maintain consistency. However, less attention is often given to making appropriate changes to the documentation associated with the functions. Inconsistent documentation is undesirable, since documentation serves as a useful source of information about the functionality. This paper presents a study on the prevalence of function documentation that is indirectly or implicitly dependent on entities other than the associated function. We observe a substantial presence of such documentation: 62% of the studied Javadoc comments, drawn from 11 open-source repositories implemented in Java, depend on other entities. We comprehensively analyze the nature of documentation updates made in 1288 commit logs and study patterns to reason about the cause of dependency in the documentation. Our findings from the observed patterns may be applied to suggest documentation that should be updated when a change is made in the repository.

Citations: 1

Supporting Quality Assurance with Automated Process-Centric Quality Constraints Checking

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-05-01 DOI: 10.1109/ICSE43902.2021.00118

Christoph Mayr-Dorn, Michael Vierhauser, Stefan Bichler, Felix Keplinger, J. Cleland-Huang, Alexander Egyed, Thomas Mehofer

Regulations, standards, and guidelines for safety-critical systems stipulate stringent traceability but do not prescribe the corresponding, detailed software engineering process. Given the industrial practice of using only semi-formal notations to describe engineering processes, processes are rarely "executable" and developers have to spend significant manual effort ensuring that they follow the steps mandated by quality assurance. The size and complexity of systems and regulations make manual, timely feedback from Quality Assurance (QA) engineers infeasible. In this paper, we propose a novel framework for tracking processes in the background, automatically checking QA constraints depending on process progress, and informing the developer of unfulfilled QA constraints. We evaluate our approach by applying it to two different case studies: one open-source community system and a safety-critical system in the air-traffic control domain. Results from the analysis show that trace links are often corrected or completed after the fact, and thus timely and automated constraint checking support has significant potential for reducing rework.
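
The idea of checking QA constraints as a function of process progress can be sketched roughly as follows. The abstract does not describe the framework's data model, so the `WorkItem` class, the state names, the link types, and the two constraints are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed process states, in the order a work item passes through them.
STATE_ORDER = ["open", "in_review", "done"]


@dataclass
class WorkItem:
    item_id: str
    state: str
    # Maps a trace link type (e.g. "tested_by") to a list of target ids.
    trace_links: dict = field(default_factory=dict)


# Each constraint: (description, state from which it applies, predicate).
CONSTRAINTS = [
    ("requirement traced to a test", "in_review",
     lambda w: bool(w.trace_links.get("tested_by"))),
    ("requirement traced to a design element", "done",
     lambda w: bool(w.trace_links.get("designed_in"))),
]


def unfulfilled(item):
    """List the constraints that apply at the item's progress but do not hold."""
    reached = STATE_ORDER.index(item.state)
    return [name for name, state, holds in CONSTRAINTS
            if STATE_ORDER.index(state) <= reached and not holds(item)]


item = WorkItem("REQ-1", "done", {"tested_by": ["TC-7"]})
print(unfulfilled(item))
```

Running such a check whenever a work item changes state is one way to surface a missing trace link at the moment it becomes relevant rather than at a later audit.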

Citations: 8

Sustainable Solving: Reducing the Memory Footprint of IFDS-Based Data Flow Analyses Using Intelligent Garbage Collection

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-05-01 DOI: 10.1109/ICSE43902.2021.00102

Steven Arzt

Static data flow analysis is an integral building block for many applications, ranging from compile-time code optimization to security and privacy analysis. When assessing whether a mobile app is trustworthy, for example, analysts need to identify which of the user's personal data is sent to external parties such as the app developer or cloud providers. Since accessing and sending data is usually done via API calls, tracking the data flow between source and sink APIs is often the method of choice. Precise algorithms such as IFDS help reduce the number of false positives, but also introduce significant performance penalties. With its fixpoint iteration over the program's entire exploded supergraph, IFDS is particularly memory-intensive, consuming hundreds of megabytes or even several gigabytes for medium-sized apps. In this paper, we present a technique called CleanDroid for reducing the memory footprint of a precise IFDS-based data flow analysis and demonstrate its effectiveness in the popular FlowDroid open-source data flow solver. CleanDroid efficiently removes edges from the path edge table used for the IFDS fixpoint iteration without affecting termination. As we show on 600 real-world Android apps from the Google Play Store, CleanDroid reduces the average per-app memory consumption by around 63% to 78%. At the same time, CleanDroid speeds up the analysis by up to 66%.

Citations: 5

AutoCCAG: An Automated Approach to Constrained Covering Array Generation

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-05-01 DOI: 10.1109/ICSE43902.2021.00030

Chuan Luo, Jinkun Lin, Shaowei Cai, Xin Chen, Bing He, Bo Qiao, Pu Zhao, Qingwei Lin, Hongyu Zhang, Wei Wu, S. Rajmohan, Dongmei Zhang

Combinatorial interaction testing (CIT) is an important technique for testing highly configurable software systems, with demonstrated effectiveness in practice. The goal of CIT is to generate test cases covering the interactions of configuration options under certain hard constraints. In this context, constrained covering arrays (CCAs) are frequently used as test cases in CIT. Constrained Covering Array Generation (CCAG) is an NP-hard combinatorial optimization problem, and solving it requires an effective method for generating small CCAs. In particular, effectively solving t-way CCAG with t >= 4 is even more challenging. Inspired by the success of automated algorithm configuration and automated algorithm selection in solving combinatorial optimization problems, in this paper we investigate the efficacy of automated algorithm configuration and automated algorithm selection for the CCAG problem, and propose a novel, automated CCAG approach called AutoCCAG. Extensive experiments on public benchmarks show that AutoCCAG can find much smaller CCAs than current state-of-the-art approaches, indicating the effectiveness of AutoCCAG. More encouragingly, to the best of our knowledge, our paper reports the first results for CCAG with a high coverage strength (i.e., 5-way CCAG) on public benchmarks. Our results demonstrate that AutoCCAG can bring considerable benefits in testing highly configurable software systems.
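
For readers unfamiliar with constrained covering arrays, the brute-force checker below illustrates the definition only: every t-way value combination that occurs in at least one valid configuration must appear in some row of the array. It is not the AutoCCAG algorithm, it enumerates all configurations and so suits toy models only, and the function name and example model are assumptions.

```python
from itertools import combinations, product


def uncovered_interactions(array, domains, t, constraints=()):
    """Return the valid t-way interactions that no row of `array` covers.

    An interaction is a pair (column indices, values); it counts as valid
    if some full configuration satisfying all constraints contains it.
    """
    k = len(domains)
    valid_configs = [cfg for cfg in product(*domains)
                     if all(c(cfg) for c in constraints)]
    required = {(cols, tuple(cfg[i] for i in cols))
                for cols in combinations(range(k), t)
                for cfg in valid_configs}
    covered = {(cols, tuple(row[i] for i in cols))
               for cols in combinations(range(k), t)
               for row in array}
    return required - covered


# Toy model: three binary options; options 0 and 1 must not both be enabled.
domains = [(0, 1), (0, 1), (0, 1)]
constraints = [lambda cfg: not (cfg[0] == 1 and cfg[1] == 1)]
array = [(0, 0, 0), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1)]
print(uncovered_interactions(array, domains, 2, constraints))  # prints set()
```

The five rows form a 2-way CCA for this toy model. Dropping the first row leaves the interaction ((0, 1), (0, 0)) uncovered, which is why no four-row array suffices here.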

Citations: 9

Semantic Patches for Adaptation of JavaScript Programs to Evolving Libraries

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-05-01 DOI: 10.1109/ICSE43902.2021.00020

Benjamin Barslev Nielsen, Martin Toldam Torp, Anders Møller

JavaScript libraries are often updated and sometimes breaking changes are introduced in the process, resulting in the client developers having to adapt their code to the changes. In addition to locating the affected parts of their code, the client developers must apply suitable patches, which is a tedious, error-prone, and entirely manual process. To reduce the manual effort, we present JSFIX. Given a collection of semantic patches, which are formalized descriptions of the breaking changes, the tool detects the locations affected by breaking changes and then transforms those parts of the code to become compatible with the new library version. JSFIX relies on an existing static analysis to approximate the set of affected locations, and an interactive process where the user answers questions about the client code to filter away false positives. An evaluation involving 12 popular JavaScript libraries and 203 clients shows that our notion of semantic patches can accurately express most of the breaking changes that occur in practice, and that JSFIX can successfully adapt most of the clients to the changes. In particular, 31 clients have accepted pull requests made by JSFIX, indicating that the code quality is good enough for practical usage. It takes JSFIX only a few seconds to patch, on average, 3.8 source locations affected by breaking changes in each client, with only 2.7 questions to the user, which suggests that the approach can significantly reduce the manual effort required when adapting JavaScript programs to evolving libraries.

Citations: 8

JUSTGen: Effective Test Generation for Unspecified JNI Behaviors on JVMs

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-05-01 DOI: 10.1109/ICSE43902.2021.00151

Sungjae Hwang, Sungho Lee, Jihoon Kim, Sukyoung Ryu

Java Native Interface (JNI) provides a way for Java applications to access native libraries, but it is difficult to develop correct JNI programs. By leveraging native code, the JNI enables Java developers to implement efficient applications and to reuse code written in other programming languages such as C and C++. In addition, the core Java libraries already use the JNI to provide system features such as a graphical user interface. As a result, many mainstream Java Virtual Machines (JVMs) support the JNI. However, due to the complex interoperation semantics between different programming languages, implementing correct JNI programs is not trivial. Moreover, because of the performance overhead, JVMs do not validate erroneous JNI interoperations by default; they validate them only when the debug feature, the -Xcheck:jni option, is enabled. Therefore, the correctness of JNI programs relies heavily on the checks performed by the -Xcheck:jni option of JVMs. Questions remain, however, about the quality of the checks provided by the feature. Are there any properties that the -Xcheck:jni option fails to validate? If so, what potential issues can arise from the lack of such validation? To the best of our knowledge, no research has explored these questions in depth. In this paper, we empirically study the validation quality and impact of the -Xcheck:jni option on mainstream JVMs using unspecified corner cases in the JNI specification. Such unspecified cases may lead to unexpected run-time behaviors because their semantics are not defined in the specification. For a systematic study, we propose JUSTGEN, a semi-automated approach to identify unspecified cases from a specification and generate test programs. JUSTGEN receives the JNI specification written in our domain-specific language (DSL) and automatically discovers unspecified cases using an SMT solver. It then generates test programs that trigger the behaviors of unspecified cases. Using the generated tests, we empirically study the validation ability of the -Xcheck:jni option. Our experimental results show that the JNI debug feature does not validate thousands of unspecified cases on JVMs, and that these cases can cause critical run-time errors such as violations of the Java type system and memory corruption. We reported 792 unspecified cases that are not validated by JVMs to their corresponding JVM vendors. Among them, 563 cases have been fixed and the remaining cases will be fixed in the near future. Based on our empirical study, we believe that the JNI specification should clearly specify the semantics of the missing cases and that the debug feature should be supported completely.
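The idea of discovering unspecified cases from a specification can be illustrated with a small sketch. It is not JUSTGEN's implementation: the abstract only says that a DSL specification is analysed with an SMT solver, so the argument domains, the rule format, and the brute-force enumeration below are all assumptions chosen to keep the example self-contained.

```python
from itertools import product

# Hypothetical abstract argument states for one JNI-style function.
# These names are illustrative only and are not taken from the JNI specification.
DOMAINS = {
    "obj": ["null", "valid_ref", "wrong_type_ref"],
    "method_id": ["valid", "invalid"],
}

# Toy "specification": each rule pairs a predicate over the argument states
# with the behaviour the specification would define for the matching cases.
SPEC_RULES = [
    (lambda c: c["obj"] == "valid_ref" and c["method_id"] == "valid", "defined: invoke"),
    (lambda c: c["obj"] == "null", "defined: report error"),
]

def find_unspecified(domains, rules):
    """Return every argument combination that no specification rule covers."""
    names = list(domains)
    uncovered = []
    for values in product(*(domains[name] for name in names)):
        case = dict(zip(names, values))
        if not any(predicate(case) for predicate, _ in rules):
            uncovered.append(case)
    return uncovered

unspecified = find_unspecified(DOMAINS, SPEC_RULES)
for case in unspecified:
    # A test generator would emit one test program per uncovered case.
    print("unspecified:", case)
```

Exhaustive enumeration works only for tiny finite domains like this one; a solver-based search, as described in the abstract, is the natural choice once the constraints become richer.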


Citations: 12

MuDelta: Delta-Oriented Mutation Testing at Commit Time

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-05-01 DOI: 10.1109/ICSE43902.2021.00086

Wei Ma, T. Chekam, Mike Papadakis, M. Harman

To effectively test program changes using mutation testing, one needs to use mutants that are relevant to the altered program behaviours. In view of this, we introduce MuDelta, an approach that identifies commit-relevant mutants, that is, mutants that affect and are affected by the changed program behaviours. Our approach applies machine learning to a combined scheme of graph and vector-based representations of static code features. Our results, from 50 commits in 21 Coreutils programs, demonstrate the strong prediction ability of our approach, yielding AUC values of 0.80 (ROC) and 0.50 (PR curve), with precision and recall of 0.63 and 0.32, respectively. These predictions are significantly better than random guesses (0.20 PR-curve AUC, 0.21 precision and 0.21 recall), and subsequently lead to strong relevant tests that kill 45% more relevant mutants than randomly sampled mutants (either sampled from those residing on the changed component(s) or from the changed lines). Our results also show that MuDelta selects mutants with 27% higher fault-revealing ability in fault-introducing commits. Taken together, our results corroborate the conclusion that commit-based mutation testing is suitable and promising for evolving software.
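The precision and recall figures quoted above follow the usual set-based definitions. A minimal sketch, with hypothetical mutant identifiers and made-up predictions (none of these values come from the paper), shows how the two numbers would be computed for a set of mutants predicted to be commit-relevant:

```python
def precision_recall(predicted, relevant):
    """Compute precision and recall of a predicted set against the true set."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: mutants a model flags as commit-relevant, and the
# mutants that really are commit-relevant for the same commit.
predicted = {"m1", "m2", "m3", "m4"}
relevant = {"m2", "m3", "m5", "m6", "m7", "m8"}

p, r = precision_recall(predicted, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four flagged mutants are truly relevant, and two of the six relevant mutants are found, so precision is higher than recall, the same direction as the 0.63 and 0.32 reported in the abstract.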


Citations: 11

Data-Oriented Differential Testing of Object-Relational Mapping Systems

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Pub Date : 2021-05-01 DOI: 10.1109/ICSE43902.2021.00137

Thodoris Sotiropoulos, Stefanos Chaliasos, Vaggelis Atlidakis, Dimitris Mitropoulos, D. Spinellis

We introduce what is, to the best of our knowledge, the first approach for systematically testing Object-Relational Mapping (ORM) systems. Our approach leverages differential testing to establish a test oracle for ORM-specific bugs. Specifically, we first generate random relational database schemas and set up the respective databases; we then query these databases using the APIs of the ORM systems under test. To tackle the challenge that ORMs lack a common input language, we generate queries written in an abstract query language. These abstract queries are translated into concrete, executable ORM queries, which are ultimately used to differentially test the correctness of the target implementations. The effectiveness of our method relies heavily on the data inserted into the underlying databases. Therefore, we employ a solver-based approach for producing targeted database records with respect to the constraints of the generated queries. We implement our approach as a tool, called CYNTHIA, which found 28 bugs in five popular ORM systems. The vast majority of these bugs were confirmed (25 of 28), more than half were fixed (20 of 28), and three were marked as release blockers by the corresponding developers.
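The differential-testing workflow described above can be sketched in a few lines. This is not CYNTHIA's code: the dictionary-based abstract query, the two translator functions standing in for ORM backends, and the deliberately seeded bug are all assumptions made for illustration. Only Python's standard `sqlite3` module is used.

```python
import sqlite3

def setup_database():
    """Create an in-memory database with a small, hand-picked data set."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, price INTEGER)")
    conn.executemany(
        "INSERT INTO item (id, price) VALUES (?, ?)",
        [(1, 5), (2, 10), (3, 15)],  # price 10 sits exactly on the query boundary
    )
    return conn

# Hypothetical abstract query: "items whose price is >= 10".
ABSTRACT_QUERY = {"table": "item", "field": "price", "op": "gte", "value": 10}

def backend_a(q):
    """Stand-in for one ORM: translates the abstract operator correctly."""
    ops = {"gte": ">=", "lt": "<"}
    return f"SELECT id FROM {q['table']} WHERE {q['field']} {ops[q['op']]} ? ORDER BY id"

def backend_b(q):
    """Stand-in for another ORM, with a seeded bug: 'gte' becomes strict '>'."""
    ops = {"gte": ">", "lt": "<"}
    return f"SELECT id FROM {q['table']} WHERE {q['field']} {ops[q['op']]} ? ORDER BY id"

def differential_test(q):
    """Run the same abstract query through every backend and collect the results."""
    conn = setup_database()
    results = {}
    for name, translate in (("a", backend_a), ("b", backend_b)):
        results[name] = conn.execute(translate(q), (q["value"],)).fetchall()
    conn.close()
    return results

results = differential_test(ABSTRACT_QUERY)
mismatch = results["a"] != results["b"]
print(results, "mismatch:", mismatch)
```

The discrepancy shows up only because a record with price 10 exists; with prices 5 and 15 alone, both backends would agree. That is the reason the abstract stresses generating targeted records that satisfy the constraints of each query.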


Citations: 13