D. Spadini, Fabio Palomba, T. Baum, Stefan Hanenberg, M. Bruntink, Alberto Bacchelli
Test-Driven Code Review (TDR) is a code review practice in which a reviewer inspects a patch by examining the changed test code before the changed production code. Although this practice has been mentioned positively by practitioners in informal literature and interviews, there is no systematic knowledge of its effects, prevalence, problems, and advantages. In this paper, we aim at empirically understanding whether this practice has an effect on code review effectiveness and how developers' perceive TDR. We conduct (i) a controlled experiment with 93 developers that perform more than 150 reviews, and (ii) 9 semi-structured interviews and a survey with 103 respondents to gather information on how TDR is perceived. Key results from the experiment show that developers adopting TDR find the same proportion of defects in production code, but more in test code, at the expenses of fewer maintainability issues in production code. Furthermore, we found that most developers prefer to review production code as they deem it more critical and tests should follow from it. Moreover, general poor test code quality and no tool support hinder the adoption of TDR. Public preprint: https://doi.org/10.5281/zenodo.2551217, data and materials: https://doi.org/10.5281/zenodo.2553139
{"title":"Test-Driven Code Review: An Empirical Study","authors":"D. Spadini, Fabio Palomba, T. Baum, Stefan Hanenberg, M. Bruntink, Alberto Bacchelli","doi":"10.1109/ICSE.2019.00110","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00110","url":null,"abstract":"Test-Driven Code Review (TDR) is a code review practice in which a reviewer inspects a patch by examining the changed test code before the changed production code. Although this practice has been mentioned positively by practitioners in informal literature and interviews, there is no systematic knowledge of its effects, prevalence, problems, and advantages. In this paper, we aim at empirically understanding whether this practice has an effect on code review effectiveness and how developers' perceive TDR. We conduct (i) a controlled experiment with 93 developers that perform more than 150 reviews, and (ii) 9 semi-structured interviews and a survey with 103 respondents to gather information on how TDR is perceived. Key results from the experiment show that developers adopting TDR find the same proportion of defects in production code, but more in test code, at the expenses of fewer maintainability issues in production code. Furthermore, we found that most developers prefer to review production code as they deem it more critical and tests should follow from it. Moreover, general poor test code quality and no tool support hinder the adoption of TDR. Public preprint: https://doi.org/10.5281/zenodo.2551217, data and materials: https://doi.org/10.5281/zenodo.2553139","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"423 1","pages":"1061-1072"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75047860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many mobile applications (i.e., apps) include UI widgets to use or collect users' sensitive data. Thus, to identify suspicious sensitive data usage such as UI-permission mismatch, it is crucial to understand the intentions of UI widgets. However, many UI widgets leverage icons of specific shapes (object icons) and icons embedded with text (text icons) to express their intentions, posing challenges for existing detection techniques that analyze only textual data to identify sensitive UI widgets. In this work, we propose a novel app analysis framework, ICONINTENT, that synergistically combines program analysis and icon classification to identify sensitive UI widgets in Android apps. ICONINTENT automatically associates UI widgets and icons via static analysis on app's UI layout files and code, and then adapts computer vision techniques to classify the associated icons into eight categories of sensitive data. Our evaluations of ICONINTENT on 150 apps from Google Play show that ICONINTENT can detect 248 sensitive UI widgets in 97 apps, achieving a precision of 82.4%. When combined with SUPOR, the state-of-the-art sensitive UI widget identification technique based on text analysis, SUPOR +ICONINTENT can detect 487 sensitive UI widgets (101.2% improvement over SUPOR only), and reduces suspicious permissions to be inspected by 50.7% (129.4% improvement over SUPOR only).
{"title":"IconIntent: Automatic Identification of Sensitive UI Widgets Based on Icon Classification for Android Apps","authors":"Xusheng Xiao, Xiaoyin Wang, Zhihao Cao, Hanlin Wang, Peng Gao","doi":"10.1109/ICSE.2019.00041","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00041","url":null,"abstract":"Many mobile applications (i.e., apps) include UI widgets to use or collect users' sensitive data. Thus, to identify suspicious sensitive data usage such as UI-permission mismatch, it is crucial to understand the intentions of UI widgets. However, many UI widgets leverage icons of specific shapes (object icons) and icons embedded with text (text icons) to express their intentions, posing challenges for existing detection techniques that analyze only textual data to identify sensitive UI widgets. In this work, we propose a novel app analysis framework, ICONINTENT, that synergistically combines program analysis and icon classification to identify sensitive UI widgets in Android apps. ICONINTENT automatically associates UI widgets and icons via static analysis on app's UI layout files and code, and then adapts computer vision techniques to classify the associated icons into eight categories of sensitive data. Our evaluations of ICONINTENT on 150 apps from Google Play show that ICONINTENT can detect 248 sensitive UI widgets in 97 apps, achieving a precision of 82.4%. When combined with SUPOR, the state-of-the-art sensitive UI widget identification technique based on text analysis, SUPOR +ICONINTENT can detect 487 sensitive UI widgets (101.2% improvement over SUPOR only), and reduces suspicious permissions to be inspected by 50.7% (129.4% improvement over SUPOR only).","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"126 1","pages":"257-268"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75826196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chenguang Zhu, Owolabi Legunsen, A. Shi, Miloš Gligorić
Regression test selection (RTS) reduces regression testing costs by re-running only tests that can change behavior due to code changes. Researchers and large software organizations recently developed and adopted several RTS tools to deal with the rapidly growing costs of regression testing. As RTS tools gain adoption, it becomes critical to check that they are correct and efficient. Unfortunately, checking RTS tools currently relies solely on limited tests that RTS tool developers manually write. We present RTSCheck, the first framework for checking RTS tools. RTSCheck feeds evolving programs (i.e., sequences of program revisions) to an RTS tool and checks the output against rules inspired by existing RTS test suites. Violations of these rules are likely due to deviations from expected RTS tool behavior, and indicative of bugs in the tool. RTSCheck uses three components to obtain evolving programs: (1) AutoEP automatically generates evolving programs and corresponding tests, (2) DefectsEP uses buggy and fixed program revisions from bug databases, and (3) EvoEP uses sequences of program revisions from actual open-source projects' histories. We used RTSCheck to check three recently developed RTS tools for Java: Clover, Ekstazi, and STARTS. RTSCheck discovered 27 bugs in these three tools.
{"title":"A Framework for Checking Regression Test Selection Tools","authors":"Chenguang Zhu, Owolabi Legunsen, A. Shi, Miloš Gligorić","doi":"10.1109/ICSE.2019.00056","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00056","url":null,"abstract":"Regression test selection (RTS) reduces regression testing costs by re-running only tests that can change behavior due to code changes. Researchers and large software organizations recently developed and adopted several RTS tools to deal with the rapidly growing costs of regression testing. As RTS tools gain adoption, it becomes critical to check that they are correct and efficient. Unfortunately, checking RTS tools currently relies solely on limited tests that RTS tool developers manually write. We present RTSCheck, the first framework for checking RTS tools. RTSCheck feeds evolving programs (i.e., sequences of program revisions) to an RTS tool and checks the output against rules inspired by existing RTS test suites. Violations of these rules are likely due to deviations from expected RTS tool behavior, and indicative of bugs in the tool. RTSCheck uses three components to obtain evolving programs: (1) AutoEP automatically generates evolving programs and corresponding tests, (2) DefectsEP uses buggy and fixed program revisions from bug databases, and (3) EvoEP uses sequences of program revisions from actual open-source projects' histories. We used RTSCheck to check three recently developed RTS tools for Java: Clover, Ekstazi, and STARTS. RTSCheck discovered 27 bugs in these three tools.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"13 1","pages":"430-441"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78797011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ying Wang, Ming Wen, Rongxin Wu, Zhenwei Liu, Shin Hwei Tan, Zhiliang Zhu, Hai Yu, S. Cheung
Intensive use of libraries in Java projects brings potential risk of dependency conflicts, which occur when a project directly or indirectly depends on multiple versions of the same library or class. When this happens, JVM loads one version and shadows the others. Runtime exceptions can occur when methods in the shadowed versions are referenced. Although project management tools such as Maven are able to give warnings of potential dependency conflicts when a project is built, developers often ask for crashing stack traces before examining these warnings. It motivates us to develop Riddle, an automated approach that generates tests and collects crashing stack traces for projects subject to risk of dependency conflicts. Riddle, built on top of Asm and Evosuite, combines condition mutation, search strategies and condition restoration. We applied Riddle on 19 real-world Java projects with duplicate libraries or classes. We reported 20 identified dependency conflicts including their induced crashing stack traces and the details of generated tests. Among them, 15 conflicts were confirmed by developers as real issues, and 10 were readily fixed. The evaluation results demonstrate the effectiveness and usefulness of Riddle.
{"title":"Could I Have a Stack Trace to Examine the Dependency Conflict Issue?","authors":"Ying Wang, Ming Wen, Rongxin Wu, Zhenwei Liu, Shin Hwei Tan, Zhiliang Zhu, Hai Yu, S. Cheung","doi":"10.1109/ICSE.2019.00068","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00068","url":null,"abstract":"Intensive use of libraries in Java projects brings potential risk of dependency conflicts, which occur when a project directly or indirectly depends on multiple versions of the same library or class. When this happens, JVM loads one version and shadows the others. Runtime exceptions can occur when methods in the shadowed versions are referenced. Although project management tools such as Maven are able to give warnings of potential dependency conflicts when a project is built, developers often ask for crashing stack traces before examining these warnings. It motivates us to develop Riddle, an automated approach that generates tests and collects crashing stack traces for projects subject to risk of dependency conflicts. Riddle, built on top of Asm and Evosuite, combines condition mutation, search strategies and condition restoration. We applied Riddle on 19 real-world Java projects with duplicate libraries or classes. We reported 20 identified dependency conflicts including their induced crashing stack traces and the details of generated tests. Among them, 15 conflicts were confirmed by developers as real issues, and 10 were readily fixed. The evaluation results demonstrate the effectiveness and usefulness of Riddle.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"572-583"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90403364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Nguyen, Juri Di Rocco, D. D. Ruscio, Lina Ochoa, Thomas Degueule, M. D. Penta
Software developers interact with APIs on a daily basis and, therefore, often face the need to learn how to use new APIs suitable for their purposes. Previous work has shown that recommending usage patterns to developers facilitates the learning process. Current approaches to usage pattern recommendation, however, still suffer from high redundancy and poor run-time performance. In this paper, we reformulate the problem of usage pattern recommendation in terms of a collaborative-filtering recommender system. We present a new tool, FOCUS, which mines open-source project repositories to recommend API method invocations and usage patterns by analyzing how APIs are used in projects similar to the current project. We evaluate FOCUS on a large number of Java projects extracted from GitHub and Maven Central and find that it outperforms the state-of-the-art approach PAM with regards to success rate, accuracy, and execution time. Results indicate the suitability of context-aware collaborative-filtering recommender systems to provide API usage patterns.
{"title":"FOCUS: A Recommender System for Mining API Function Calls and Usage Patterns","authors":"P. Nguyen, Juri Di Rocco, D. D. Ruscio, Lina Ochoa, Thomas Degueule, M. D. Penta","doi":"10.1109/ICSE.2019.00109","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00109","url":null,"abstract":"Software developers interact with APIs on a daily basis and, therefore, often face the need to learn how to use new APIs suitable for their purposes. Previous work has shown that recommending usage patterns to developers facilitates the learning process. Current approaches to usage pattern recommendation, however, still suffer from high redundancy and poor run-time performance. In this paper, we reformulate the problem of usage pattern recommendation in terms of a collaborative-filtering recommender system. We present a new tool, FOCUS, which mines open-source project repositories to recommend API method invocations and usage patterns by analyzing how APIs are used in projects similar to the current project. We evaluate FOCUS on a large number of Java projects extracted from GitHub and Maven Central and find that it outperforms the state-of-the-art approach PAM with regards to success rate, accuracy, and execution time. Results indicate the suitability of context-aware collaborative-filtering recommender systems to provide API usage patterns.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"1050-1060"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89975688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Heradio, David Fernández-Amorós, Christoph Mayr-Dorn, Alexander Egyed
Variability models are broadly used to specify the configurable features of highly customizable software. In practice, they can be large, defining thousands of features with their dependencies and conflicts. In such cases, visualization techniques and automated analysis support are crucial for understanding the models. This paper contributes to this line of research by presenting a novel, probabilistic foundation for statistical reasoning about variability models. Our approach not only provides a new way to visualize, describe and interpret variability models, but it also supports the improvement of additional state-of-the-art methods for software product lines; for instance, providing exact computations where only approximations were available before, and increasing the sensitivity of existing analysis operations for variability models. We demonstrate the benefits of our approach using real case studies with up to 17,365 features, and written in two different languages (KConfig and feature models).
{"title":"Supporting the Statistical Analysis of Variability Models","authors":"R. Heradio, David Fernández-Amorós, Christoph Mayr-Dorn, Alexander Egyed","doi":"10.1109/ICSE.2019.00091","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00091","url":null,"abstract":"Variability models are broadly used to specify the configurable features of highly customizable software. In practice, they can be large, defining thousands of features with their dependencies and conflicts. In such cases, visualization techniques and automated analysis support are crucial for understanding the models. This paper contributes to this line of research by presenting a novel, probabilistic foundation for statistical reasoning about variability models. Our approach not only provides a new way to visualize, describe and interpret variability models, but it also supports the improvement of additional state-of-the-art methods for software product lines; for instance, providing exact computations where only approximations were available before, and increasing the sensitivity of existing analysis operations for variability models. We demonstrate the benefits of our approach using real case studies with up to 17,365 features, and written in two different languages (KConfig and feature models).","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"28 1","pages":"843-853"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86140773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ankit Agrawal, S. Khoshmanesh, Michael Vierhauser, Mona Rahimi, J. Cleland-Huang, R. Lutz
Safety Assurance Cases (SACs) are increasingly used to guide and evaluate the safety of software-intensive systems. They are used to construct a hierarchically organized set of claims, arguments, and evidence in order to provide a structured argument that a system is safe for use. However, as the system evolves and grows in size, a SAC can be difficult to maintain. In this paper we utilize design science to develop a novel solution for identifying areas of a SAC that are affected by changes to the system. Moreover, we generate actionable recommendations for updating the SAC, including its underlying artifacts and trace links, in order to evolve an existing safety case for use in a new version of the system. Our approach, Safety Artifact Forest Analysis (SAFA), leverages traceability to automatically compare software artifacts from a previously approved or certified version with a new version of the system. We identify, visualize, and explain changes in a Delta Tree. We evaluate our approach using the Dronology system for monitoring and coordinating the actions of cooperating, small Unmanned Aerial Vehicles. Results from a user study show that SAFA helped users to identify changes that potentially impacted system safety and provided information that could be used to help maintain and evolve a SAC.
{"title":"Leveraging Artifact Trees to Evolve and Reuse Safety Cases","authors":"Ankit Agrawal, S. Khoshmanesh, Michael Vierhauser, Mona Rahimi, J. Cleland-Huang, R. Lutz","doi":"10.1109/ICSE.2019.00124","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00124","url":null,"abstract":"Safety Assurance Cases (SACs) are increasingly used to guide and evaluate the safety of software-intensive systems. They are used to construct a hierarchically organized set of claims, arguments, and evidence in order to provide a structured argument that a system is safe for use. However, as the system evolves and grows in size, a SAC can be difficult to maintain. In this paper we utilize design science to develop a novel solution for identifying areas of a SAC that are affected by changes to the system. Moreover, we generate actionable recommendations for updating the SAC, including its underlying artifacts and trace links, in order to evolve an existing safety case for use in a new version of the system. Our approach, Safety Artifact Forest Analysis (SAFA), leverages traceability to automatically compare software artifacts from a previously approved or certified version with a new version of the system. We identify, visualize, and explain changes in a Delta Tree. We evaluate our approach using the Dronology system for monitoring and coordinating the actions of cooperating, small Unmanned Aerial Vehicles. Results from a user study show that SAFA helped users to identify changes that potentially impacted system safety and provided information that could be used to help maintain and evolve a SAC.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"25 1","pages":"1222-1233"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78117406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Max Lillack, Stefan Stanciulescu, Wilhelm Hedman, T. Berger, A. Wąsowski
Cloning is a simple way to create new variants of a system. While cheap at first, it increases maintenance cost in the long term. Eventually, the cloned variants need to be integrated into a configurable platform. Such an integration is challenging: it involves merging the usual code improvements between the variants, and also integrating the variable code (features) into the platform. Thus, variant integration differs from traditional soft- ware merging, which does not produce or organize configurable code, but creates a single system that cannot be configured into variants. In practice, variant integration requires fine-grained code edits, performed in an exploratory manner, in multiple iterations. Unfortunately, little tool support exists for integrating cloned variants. In this work, we show that fine-grained code edits needed for integration can be alleviated by a small set of integration intentions-domain-specific actions declared over code snippets controlling the integration. Developers can interactively explore the integration space by declaring (or revoking) intentions on code elements. We contribute the intentions (e.g., 'keep functionality' or 'keep as a configurable feature') and the IDE tool INCLINE, which implements the intentions and five editable views that visualize the integration process and allow declaring intentions producing a configurable integrated platform. In a series of experiments, we evaluated the completeness of the pro- posed intentions, the correctness and performance of INCLINE, and the benefits of using intentions for variant integration. The experiments show that INCLINE can handle complex integration tasks, that views help to navigate the code, and that it consistently reduces mistakes made by developers during variant integration.
{"title":"Intention-Based Integration of Software Variants","authors":"Max Lillack, Stefan Stanciulescu, Wilhelm Hedman, T. Berger, A. Wąsowski","doi":"10.1109/ICSE.2019.00090","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00090","url":null,"abstract":"Cloning is a simple way to create new variants of a system. While cheap at first, it increases maintenance cost in the long term. Eventually, the cloned variants need to be integrated into a configurable platform. Such an integration is challenging: it involves merging the usual code improvements between the variants, and also integrating the variable code (features) into the platform. Thus, variant integration differs from traditional soft- ware merging, which does not produce or organize configurable code, but creates a single system that cannot be configured into variants. In practice, variant integration requires fine-grained code edits, performed in an exploratory manner, in multiple iterations. Unfortunately, little tool support exists for integrating cloned variants. In this work, we show that fine-grained code edits needed for integration can be alleviated by a small set of integration intentions-domain-specific actions declared over code snippets controlling the integration. Developers can interactively explore the integration space by declaring (or revoking) intentions on code elements. We contribute the intentions (e.g., 'keep functionality' or 'keep as a configurable feature') and the IDE tool INCLINE, which implements the intentions and five editable views that visualize the integration process and allow declaring intentions producing a configurable integrated platform. In a series of experiments, we evaluated the completeness of the pro- posed intentions, the correctness and performance of INCLINE, and the benefits of using intentions for variant integration. The experiments show that INCLINE can handle complex integration tasks, that views help to navigate the code, and that it consistently reduces mistakes made by developers during variant integration.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"13 1","pages":"831-842"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74431010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xie Wang, Huaijin Wang, Z. Su, Enyi Tang, Xin Chen, Weijun Shen, Zhenyu Chen, Linzhang Wang, Xianpei Zhang, Xuandong Li
Numerical code is often applied in the safety-critical, but resource-limited areas. Hence, it is crucial for it to be correct and efficient, both of which are difficult to ensure. On one hand, accumulated rounding errors in numerical programs can cause system failures. On the other hand, arbitrary/infinite-precision arithmetic, although accurate, is infeasible in practice and especially in resource-limited scenarios because it performs thousands of times slower than floating-point arithmetic. Thus, it has been a significant challenge to obtain high-precision, easy-to-maintain, and efficient numerical code. This paper introduces a novel global optimization framework to tackle this challenge. Using our framework, a developer simply writes the infinite-precision numerical program directly following the problem's mathematical requirement specification. The resulting code is correct and easy-to-maintain, but inefficient. Our framework then optimizes the program in a global fashion (i.e., considering the whole program, rather than individual expressions or statements as in prior work), the key technical difficulty this work solves. To this end, it analyzes the program's numerical value flows across different statements through a symbolic trace extraction algorithm, and generates optimized traces via stochastic algebraic transformations guided by effective rule selection. We first evaluate our technique on numerical benchmarks from the literature; results show that our global optimization achieves significantly higher worst-case accuracy than the state-of-the-art numerical optimization tool. Second, we show that our framework is also effective on benchmarks having complicated program structures, which are challenging for numerical optimization. Finally, we apply our framework on real-world code to successfully detect numerical bugs that have been confirmed by developers.
{"title":"Global Optimization of Numerical Programs Via Prioritized Stochastic Algebraic Transformations","authors":"Xie Wang, Huaijin Wang, Z. Su, Enyi Tang, Xin Chen, Weijun Shen, Zhenyu Chen, Linzhang Wang, Xianpei Zhang, Xuandong Li","doi":"10.1109/ICSE.2019.00116","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00116","url":null,"abstract":"Numerical code is often applied in the safety-critical, but resource-limited areas. Hence, it is crucial for it to be correct and efficient, both of which are difficult to ensure. On one hand, accumulated rounding errors in numerical programs can cause system failures. On the other hand, arbitrary/infinite-precision arithmetic, although accurate, is infeasible in practice and especially in resource-limited scenarios because it performs thousands of times slower than floating-point arithmetic. Thus, it has been a significant challenge to obtain high-precision, easy-to-maintain, and efficient numerical code. This paper introduces a novel global optimization framework to tackle this challenge. Using our framework, a developer simply writes the infinite-precision numerical program directly following the problem's mathematical requirement specification. The resulting code is correct and easy-to-maintain, but inefficient. Our framework then optimizes the program in a global fashion (i.e., considering the whole program, rather than individual expressions or statements as in prior work), the key technical difficulty this work solves. To this end, it analyzes the program's numerical value flows across different statements through a symbolic trace extraction algorithm, and generates optimized traces via stochastic algebraic transformations guided by effective rule selection. We first evaluate our technique on numerical benchmarks from the literature; results show that our global optimization achieves significantly higher worst-case accuracy than the state-of-the-art numerical optimization tool. Second, we show that our framework is also effective on benchmarks having complicated program structures, which are challenging for numerical optimization. Finally, we apply our framework on real-world code to successfully detect numerical bugs that have been confirmed by developers.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"15 1","pages":"1131-1141"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81507014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Naji Dmeiri, David A. Tomassi, Yichen Wang, Antara Bhowmick, Yen-Chuan Liu, Premkumar T. Devanbu, Bogdan Vasilescu, Cindy Rubio-Gonz'alez
Fault-detection, localization, and repair methods are vital to software quality; but it is difficult to evaluate their generality, applicability, and current effectiveness. Large, diverse, realistic datasets of durably-reproducible faults and fixes are vital to good experimental evaluation of approaches to software quality, but they are difficult and expensive to assemble and keep current. Modern continuous-integration (CI) approaches, like TRAVIS-CI, which are widely used, fully configurable, and executed within custom-built containers, promise a path toward much larger defect datasets. If we can identify and archive failing and subsequent passing runs, the containers will provide a substantial assurance of durable future reproducibility of build and test. Several obstacles, however, must be overcome to make this a practical reality. We describe BUGSWARM, a toolset that navigates these obstacles to enable the creation of a scalable, diverse, realistic, continuously growing set of durably reproducible failing and passing versions of real-world, open-source systems. The BUGSWARM toolkit has already gathered 3,091 fail-pass pairs, in Java and Python, all packaged within fully reproducible containers. Furthermore, the toolkit can be run periodically to detect fail-pass activities, thus growing the dataset continually.
{"title":"BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes","authors":"Naji Dmeiri, David A. Tomassi, Yichen Wang, Antara Bhowmick, Yen-Chuan Liu, Premkumar T. Devanbu, Bogdan Vasilescu, Cindy Rubio-Gonz'alez","doi":"10.1109/ICSE.2019.00048","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00048","url":null,"abstract":"Fault-detection, localization, and repair methods are vital to software quality; but it is difficult to evaluate their generality, applicability, and current effectiveness. Large, diverse, realistic datasets of durably-reproducible faults and fixes are vital to good experimental evaluation of approaches to software quality, but they are difficult and expensive to assemble and keep current. Modern continuous-integration (CI) approaches, like TRAVIS-CI, which are widely used, fully configurable, and executed within custom-built containers, promise a path toward much larger defect datasets. If we can identify and archive failing and subsequent passing runs, the containers will provide a substantial assurance of durable future reproducibility of build and test. Several obstacles, however, must be overcome to make this a practical reality. We describe BUGSWARM, a toolset that navigates these obstacles to enable the creation of a scalable, diverse, realistic, continuously growing set of durably reproducible failing and passing versions of real-world, open-source systems. The BUGSWARM toolkit has already gathered 3,091 fail-pass pairs, in Java and Python, all packaged within fully reproducible containers. Furthermore, the toolkit can be run periodically to detect fail-pass activities, thus growing the dataset continually.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"32 1","pages":"339-349"},"PeriodicalIF":0.0,"publicationDate":"2019-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90948882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}