S. Yatish, Jirayus Jiarpakdee, Patanamon Thongtanunam, C. Tantithamthavorn
With the rise of the Mining Software Repositories (MSR) field, defect datasets extracted from software repositories play a foundational role in many empirical studies related to software quality. At the core of defect data preparation is the identification of post-release defects. Prior studies leverage many heuristics (e.g., keywords and issue IDs) to identify post-release defects. However, such the heuristic approach is based on several assumptions, which pose common threats to the validity of many studies. In this paper, we set out to investigate the nature of the difference of defect datasets generated by the heuristic approach and the realistic approach that leverages the earliest affected release that is realistically estimated by a software development team for a given defect. In addition, we investigate the impact of defect identification approaches on the predictive accuracy and the ranking of defective modules that are produced by defect models. Through a case study of defect datasets of 32 releases, we find that that the heuristic approach has a large impact on both defect count datasets and binary defect datasets. Surprisingly, we find that the heuristic approach has a minimal impact on defect count models, suggesting that future work should not be too concerned about defect count models that are constructed using heuristic defect datasets. On the other hand, using defect datasets generated by the realistic approach lead to an improvement in the predictive accuracy of defect classification models.
{"title":"Mining Software Defects: Should We Consider Affected Releases?","authors":"S. Yatish, Jirayus Jiarpakdee, Patanamon Thongtanunam, C. Tantithamthavorn","doi":"10.1109/ICSE.2019.00075","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00075","url":null,"abstract":"With the rise of the Mining Software Repositories (MSR) field, defect datasets extracted from software repositories play a foundational role in many empirical studies related to software quality. At the core of defect data preparation is the identification of post-release defects. Prior studies leverage many heuristics (e.g., keywords and issue IDs) to identify post-release defects. However, such the heuristic approach is based on several assumptions, which pose common threats to the validity of many studies. In this paper, we set out to investigate the nature of the difference of defect datasets generated by the heuristic approach and the realistic approach that leverages the earliest affected release that is realistically estimated by a software development team for a given defect. In addition, we investigate the impact of defect identification approaches on the predictive accuracy and the ranking of defective modules that are produced by defect models. Through a case study of defect datasets of 32 releases, we find that that the heuristic approach has a large impact on both defect count datasets and binary defect datasets. Surprisingly, we find that the heuristic approach has a minimal impact on defect count models, suggesting that future work should not be too concerned about defect count models that are constructed using heuristic defect datasets. On the other hand, using defect datasets generated by the realistic approach lead to an improvement in the predictive accuracy of defect classification models.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"46 1","pages":"654-665"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77259891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. Hellendoorn, Sebastian Proksch, H. Gall, Alberto Bacchelli
Code completion is commonly used by software developers and is integrated into all major IDE's. Good completion tools can not only save time and effort but may also help avoid incorrect API usage. Many proposed completion tools have shown promising results on synthetic benchmarks, but these benchmarks make no claims about the realism of the completions they test. This lack of grounding in real-world data could hinder our scientific understanding of developer needs and of the efficacy of completion models. This paper presents a case study on 15,000 code completions that were applied by 66 real developers, which we study and contrast with artificial completions to inform future research and tools in this area. We find that synthetic benchmarks misrepresent many aspects of real-world completions; tested completion tools were far less accurate on real-world data. Worse, on the few completions that consumed most of the developers' time, prediction accuracy was less than 20% -- an effect that is invisible in synthetic benchmarks. Our findings have ramifications for future benchmarks, tool design and real-world efficacy: Benchmarks must account for completions that developers use most, such as intra-project APIs; models should be designed to be amenable to intra-project data; and real-world developer trials are essential to quantifying performance on the least predictable completions, which are both most time-consuming and far more typical than artificial data suggests. We publicly release our preprint [https://doi.org/10.5281/zenodo.2565673] and replication data and materials [https://doi.org/10.5281/zenodo.2562249].
{"title":"When Code Completion Fails: A Case Study on Real-World Completions","authors":"V. Hellendoorn, Sebastian Proksch, H. Gall, Alberto Bacchelli","doi":"10.1109/ICSE.2019.00101","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00101","url":null,"abstract":"Code completion is commonly used by software developers and is integrated into all major IDE's. Good completion tools can not only save time and effort but may also help avoid incorrect API usage. Many proposed completion tools have shown promising results on synthetic benchmarks, but these benchmarks make no claims about the realism of the completions they test. This lack of grounding in real-world data could hinder our scientific understanding of developer needs and of the efficacy of completion models. This paper presents a case study on 15,000 code completions that were applied by 66 real developers, which we study and contrast with artificial completions to inform future research and tools in this area. We find that synthetic benchmarks misrepresent many aspects of real-world completions; tested completion tools were far less accurate on real-world data. Worse, on the few completions that consumed most of the developers' time, prediction accuracy was less than 20% -- an effect that is invisible in synthetic benchmarks. Our findings have ramifications for future benchmarks, tool design and real-world efficacy: Benchmarks must account for completions that developers use most, such as intra-project APIs; models should be designed to be amenable to intra-project data; and real-world developer trials are essential to quantifying performance on the least predictable completions, which are both most time-consuming and far more typical than artificial data suggests. We publicly release our preprint [https://doi.org/10.5281/zenodo.2565673] and replication data and materials [https://doi.org/10.5281/zenodo.2562249].","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"8 1","pages":"960-970"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89302549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yibiao Yang, Yuming Zhou, Hao Sun, Z. Su, Zhiqiang Zuo, Lei Xu, Baowen Xu
Reliable code coverage tools are critically important as it is heavily used to facilitate many quality assurance activities, such as software testing, fuzzing, and debugging. However, little attention has been devoted to assessing the reliability of code coverage tools. In this study, we propose a randomized differential testing approach to hunting for bugs in the most widely used C code coverage tools. Specifically, by generating random input programs, our approach seeks for inconsistencies in code coverage reports produced by different code coverage tools, and then identifies inconsistencies as potential code coverage bugs. To effectively report code coverage bugs, we addressed three specific challenges: (1) How to filter out duplicate test programs as many of them triggering the same bugs in code coverage tools; (2) how to automatically reduce large test programs to much smaller ones that have the same properties; and (3) how to determine which code coverage tools have bugs? The extensive evaluations validate the effectiveness of our approach, resulting in 42 and 28 confirmed/fixed bugs for gcov and llvm-cov, respectively. This case study indicates that code coverage tools are not as reliable as it might have been envisaged. It not only demonstrates the effectiveness of our approach, but also highlights the need to continue improving the reliability of code coverage tools. This work opens up a new direction in code coverage validation which calls for more attention in this area.
{"title":"Hunting for Bugs in Code Coverage Tools via Randomized Differential Testing","authors":"Yibiao Yang, Yuming Zhou, Hao Sun, Z. Su, Zhiqiang Zuo, Lei Xu, Baowen Xu","doi":"10.1109/ICSE.2019.00061","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00061","url":null,"abstract":"Reliable code coverage tools are critically important as it is heavily used to facilitate many quality assurance activities, such as software testing, fuzzing, and debugging. However, little attention has been devoted to assessing the reliability of code coverage tools. In this study, we propose a randomized differential testing approach to hunting for bugs in the most widely used C code coverage tools. Specifically, by generating random input programs, our approach seeks for inconsistencies in code coverage reports produced by different code coverage tools, and then identifies inconsistencies as potential code coverage bugs. To effectively report code coverage bugs, we addressed three specific challenges: (1) How to filter out duplicate test programs as many of them triggering the same bugs in code coverage tools; (2) how to automatically reduce large test programs to much smaller ones that have the same properties; and (3) how to determine which code coverage tools have bugs? The extensive evaluations validate the effectiveness of our approach, resulting in 42 and 28 confirmed/fixed bugs for gcov and llvm-cov, respectively. This case study indicates that code coverage tools are not as reliable as it might have been envisaged. It not only demonstrates the effectiveness of our approach, but also highlights the need to continue improving the reliability of code coverage tools. This work opens up a new direction in code coverage validation which calls for more attention in this area.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"8 1","pages":"488-499"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86713090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Raffi Khatchadourian, Yiming Tang, M. Bagherzadeh, Syed Ahmed
Streaming APIs are becoming more pervasive in mainstream Object-Oriented programming languages. For example, the Stream API introduced in Java 8 allows for functional-like, MapReduce-style operations in processing both finite and infinite data structures. However, using this API efficiently involves subtle considerations like determining when it is best for stream operations to run in parallel, when running operations in parallel can be less efficient, and when it is safe to run in parallel due to possible lambda expression side-effects. In this paper, we present an automated refactoring approach that assists developers in writing efficient stream code in a semantics-preserving fashion. The approach, based on a novel data ordering and typestate analysis, consists of preconditions for automatically determining when it is safe and possibly advantageous to convert sequential streams to parallel and unorder or de-parallelize already parallel streams. The approach was implemented as a plug-in to the Eclipse IDE, uses the WALA and SAFE analysis frameworks, and was evaluated on 11 Java projects consisting of ?642K lines of code. We found that 57 of 157 candidate streams (36.31%) were refactorable, and an average speedup of 3.49 on performance tests was observed. The results indicate that the approach is useful in optimizing stream code to their full potential.
{"title":"Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams","authors":"Raffi Khatchadourian, Yiming Tang, M. Bagherzadeh, Syed Ahmed","doi":"10.1109/ICSE.2019.00072","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00072","url":null,"abstract":"Streaming APIs are becoming more pervasive in mainstream Object-Oriented programming languages. For example, the Stream API introduced in Java 8 allows for functional-like, MapReduce-style operations in processing both finite and infinite data structures. However, using this API efficiently involves subtle considerations like determining when it is best for stream operations to run in parallel, when running operations in parallel can be less efficient, and when it is safe to run in parallel due to possible lambda expression side-effects. In this paper, we present an automated refactoring approach that assists developers in writing efficient stream code in a semantics-preserving fashion. The approach, based on a novel data ordering and typestate analysis, consists of preconditions for automatically determining when it is safe and possibly advantageous to convert sequential streams to parallel and unorder or de-parallelize already parallel streams. The approach was implemented as a plug-in to the Eclipse IDE, uses the WALA and SAFE analysis frameworks, and was evaluated on 11 Java projects consisting of ?642K lines of code. We found that 57 of 157 candidate streams (36.31%) were refactorable, and an average speedup of 3.49 on performance tests was observed. The results indicate that the approach is useful in optimizing stream code to their full potential.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"7 1","pages":"619-630"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81946310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tianxiao Gu, Chengnian Sun, Xiaoxing Ma, Chun Cao, Chang Xu, Yuan Yao, Qirun Zhang, Jian Lu, Z. Su
This paper introduces a new, fully automated modelbased approach for effective testing of Android apps. Different from existing model-based approaches that guide testing with a static GUI model (i.e., the model does not evolve its abstraction during testing, and is thus often imprecise), our approach dynamically optimizes the model by leveraging the runtime information during testing. This capability of model evolution significantly improves model precision, and thus dramatically enhances the testing effectiveness compared to existing approaches, which our evaluation confirms.We have realized our technique in a practical tool, APE. On 15 large, widely-used apps from the Google Play Store, APE outperforms the state-of-the-art Android GUI testing tools in terms of both testing coverage and the number of detected unique crashes. To further demonstrate APE’s effectiveness and usability, we conduct another evaluation of APE on 1,316 popular apps, where it found 537 unique crashes. Out of the 38 reported crashes, 13 have been fixed and 5 have been confirmed.
本文介绍了一种新的、全自动的基于模型的方法来有效地测试Android应用程序。与使用静态GUI模型指导测试的现有基于模型的方法不同(即,模型在测试期间不会发展其抽象,因此通常是不精确的),我们的方法通过在测试期间利用运行时信息来动态优化模型。这种模型演化能力显著提高了模型精度,与现有方法相比显著提高了测试效率,我们的评估证实了这一点。我们已经在一个实用的工具APE中实现了我们的技术。在来自Google Play Store的15款大型且广泛使用的应用中,APE在测试覆盖率和检测到的独特崩溃数量方面都优于最先进的Android GUI测试工具。为了进一步证明APE的有效性和可用性,我们对1,316个流行应用程序进行了另一次APE评估,发现了537个独特的崩溃。在38起报告的撞车事故中,13起已经修复,5起已经确认。
{"title":"Practical GUI Testing of Android Applications Via Model Abstraction and Refinement","authors":"Tianxiao Gu, Chengnian Sun, Xiaoxing Ma, Chun Cao, Chang Xu, Yuan Yao, Qirun Zhang, Jian Lu, Z. Su","doi":"10.1109/ICSE.2019.00042","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00042","url":null,"abstract":"This paper introduces a new, fully automated modelbased approach for effective testing of Android apps. Different from existing model-based approaches that guide testing with a static GUI model (i.e., the model does not evolve its abstraction during testing, and is thus often imprecise), our approach dynamically optimizes the model by leveraging the runtime information during testing. This capability of model evolution significantly improves model precision, and thus dramatically enhances the testing effectiveness compared to existing approaches, which our evaluation confirms.We have realized our technique in a practical tool, APE. On 15 large, widely-used apps from the Google Play Store, APE outperforms the state-of-the-art Android GUI testing tools in terms of both testing coverage and the number of detected unique crashes. To further demonstrate APE’s effectiveness and usability, we conduct another evaluation of APE on 1,316 popular apps, where it found 537 unique crashes. Out of the 38 reported crashes, 13 have been fixed and 5 have been confirmed.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"5 1","pages":"269-280"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72645971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
End-user programming, particularly trigger-action programming (TAP), is a popular method of letting users express their intent for how smart devices and cloud services interact. Unfortunately, sometimes it can be challenging for users to correctly express their desires through TAP. This paper presents AutoTap, a system that lets novice users easily specify desired properties for devices and services. AutoTap translates these properties to linear temporal logic (LTL) and both automatically synthesizes property-satisfying TAP rules from scratch and repairs existing TAP rules. We designed AutoTap based on a user study about properties users wish to express. Through a second user study, we show that novice users made significantly fewer mistakes when expressing desired behaviors using AutoTap than using TAP rules. Our experiments show that AutoTap is a simple and effective option for expressive end-user programming.
{"title":"AutoTap: Synthesizing and Repairing Trigger-Action Programs Using LTL Properties","authors":"Lefan Zhang, Weijia He, Jesse Martinez, Noah Brackenbury, Shan Lu, Blase Ur","doi":"10.1109/ICSE.2019.00043","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00043","url":null,"abstract":"End-user programming, particularly trigger-action programming (TAP), is a popular method of letting users express their intent for how smart devices and cloud services interact. Unfortunately, sometimes it can be challenging for users to correctly express their desires through TAP. This paper presents AutoTap, a system that lets novice users easily specify desired properties for devices and services. AutoTap translates these properties to linear temporal logic (LTL) and both automatically synthesizes property-satisfying TAP rules from scratch and repairs existing TAP rules. We designed AutoTap based on a user study about properties users wish to express. Through a second user study, we show that novice users made significantly fewer mistakes when expressing desired behaviors using AutoTap than using TAP rules. Our experiments show that AutoTap is a simple and effective option for expressive end-user programming.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"32 1","pages":"281-291"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74800658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xin Xia, Zhiyuan Wan, Pavneet Singh Kochhar, D. Lo
Coding proficiency is essential to software practitioners. Unfortunately, our understanding on coding proficiency often translates to vague stereotypes, e.g., "able to write good code". The lack of specificity hinders employers from measuring a software engineer's coding proficiency, and software engineers from improving their coding proficiency skills. This raises an important question: what skills matter to improve one's coding proficiency. To answer this question, we perform an empirical study by surveying 340 software practitioners from 33 countries across 5 continents. We first identify 38 coding proficiency skills grouped into nine categories by interviewing 15 developers from three companies. We then ask our survey respondents to rate the level of importance for these skills, and provide rationales of their ratings. Our study highlights a total of 21 important skills that receive an average rating of 4.0 and above (important and very important), along with rationales given by proponents and dissenters. We discuss implications of our findings to researchers, educators, and practitioners.
{"title":"How Practitioners Perceive Coding Proficiency","authors":"Xin Xia, Zhiyuan Wan, Pavneet Singh Kochhar, D. Lo","doi":"10.1109/ICSE.2019.00098","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00098","url":null,"abstract":"Coding proficiency is essential to software practitioners. Unfortunately, our understanding on coding proficiency often translates to vague stereotypes, e.g., \"able to write good code\". The lack of specificity hinders employers from measuring a software engineer's coding proficiency, and software engineers from improving their coding proficiency skills. This raises an important question: what skills matter to improve one's coding proficiency. To answer this question, we perform an empirical study by surveying 340 software practitioners from 33 countries across 5 continents. We first identify 38 coding proficiency skills grouped into nine categories by interviewing 15 developers from three companies. We then ask our survey respondents to rate the level of importance for these skills, and provide rationales of their ratings. Our study highlights a total of 21 important skills that receive an average rating of 4.0 and above (important and very important), along with rationales given by proponents and dissenters. We discuss implications of our findings to researchers, educators, and practitioners.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"54 1","pages":"924-935"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84502542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dehai Zhao, Zhenchang Xing, Chunyang Chen, Xin Xia, Guoqiang Li
Programming screencasts have two important applications in software engineering context: study developer behaviors, information needs and disseminate software engineering knowledge. Although programming screencasts are easy to produce, they are not easy to analyze or index due to the image nature of the data. Existing techniques extract only content from screencasts, but ignore workflow actions by which developers accomplish programming tasks. This significantly limits the effective use of programming screencasts in downstream applications. In this paper, we are the first to present a novel technique for recognizing workflow actions in programming screencasts. Our technique exploits image differencing and Convolutional Neural Network (CNN) to analyze the correspondence and change of consecutive frames, based on which nine classes of frequent developer actions can be recognized from programming screencasts. Using programming screencasts from Youtube, we evaluate different configurations of our CNN model and the performance of our technique for developer action recognition across developers, working environments and programming languages. Using screencasts of developers’ real work, we demonstrate the usefulness of our technique in a practical application for actionaware extraction of key-code frames in developers’ work.
{"title":"ActionNet: Vision-Based Workflow Action Recognition From Programming Screencasts","authors":"Dehai Zhao, Zhenchang Xing, Chunyang Chen, Xin Xia, Guoqiang Li","doi":"10.1109/ICSE.2019.00049","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00049","url":null,"abstract":"Programming screencasts have two important applications in software engineering context: study developer behaviors, information needs and disseminate software engineering knowledge. Although programming screencasts are easy to produce, they are not easy to analyze or index due to the image nature of the data. Existing techniques extract only content from screencasts, but ignore workflow actions by which developers accomplish programming tasks. This significantly limits the effective use of programming screencasts in downstream applications. In this paper, we are the first to present a novel technique for recognizing workflow actions in programming screencasts. Our technique exploits image differencing and Convolutional Neural Network (CNN) to analyze the correspondence and change of consecutive frames, based on which nine classes of frequent developer actions can be recognized from programming screencasts. Using programming screencasts from Youtube, we evaluate different configurations of our CNN model and the performance of our technique for developer action recognition across developers, working environments and programming languages. Using screencasts of developers’ real work, we demonstrate the usefulness of our technique in a practical application for actionaware extraction of key-code frames in developers’ work.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"77 1","pages":"350-361"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83993841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Java is one of the most widely used programming languages. However, the absence of explicit support for architectural constructs, such as software components, in the programming language itself has prevented software developers from achieving the many benefits that come with architecture-based development. To address this issue, Java 9 has introduced the Java Platform Module System (JPMS), resulting in the first instance of encapsulation of modules with rich software architectural interfaces added to a mainstream programming language. The primary goal of JPMS is to construct and maintain large applications efficiently-as well as improve the encapsulation, security, and maintainability of Java applications in general and the JDK itself. A challenge, however, is that module declarations do not necessarily reflect actual usage of modules in an application, allowing developers to mistakenly specify inconsistent dependencies among the modules. In this paper, we formally define 8 inconsistent modular dependencies that may arise in Java-9 applications. We also present DARCY, an approach that leverages these definitions and static program analyses to automatically (1) detect the specified inconsistent dependencies within Java applications and (2) repair those identified inconsistencies. The results of our experiments, conducted over 38 open-source Java-9 applications, indicate that architectural inconsistencies are widespread and demonstrate the benefits of DARCY in automated detection and repair of these inconsistencies.
{"title":"Detection and Repair of Architectural Inconsistencies in Java","authors":"Negar Ghorbani, Joshua Garcia, S. Malek","doi":"10.1109/ICSE.2019.00067","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00067","url":null,"abstract":"Java is one of the most widely used programming languages. However, the absence of explicit support for architectural constructs, such as software components, in the programming language itself has prevented software developers from achieving the many benefits that come with architecture-based development. To address this issue, Java 9 has introduced the Java Platform Module System (JPMS), resulting in the first instance of encapsulation of modules with rich software architectural interfaces added to a mainstream programming language. The primary goal of JPMS is to construct and maintain large applications efficiently-as well as improve the encapsulation, security, and maintainability of Java applications in general and the JDK itself. A challenge, however, is that module declarations do not necessarily reflect actual usage of modules in an application, allowing developers to mistakenly specify inconsistent dependencies among the modules. In this paper, we formally define 8 inconsistent modular dependencies that may arise in Java-9 applications. We also present DARCY, an approach that leverages these definitions and static program analyses to automatically (1) detect the specified inconsistent dependencies within Java applications and (2) repair those identified inconsistencies. The results of our experiments, conducted over 38 open-source Java-9 applications, indicate that architectural inconsistencies are widespread and demonstrate the benefits of DARCY in automated detection and repair of these inconsistencies.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"10 1","pages":"560-571"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83004484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christian Kaltenecker, A. Grebhahn, Norbert Siegmund, Jianmei Guo, S. Apel
Configurable software systems provide a multitude of configuration options to adjust and optimize their functional and non-functional properties. For instance, to find the fastest configuration for a given setting, a brute-force strategy measures the performance of all configurations, which is typically intractable. Addressing this challenge, state-of-the-art strategies rely on machine learning, analyzing only a few configurations (i.e., a sample set) to predict the performance of other configurations. However, to obtain accurate performance predictions, a representative sample set of configurations is required. Addressing this task, different sampling strategies have been proposed, which come with different advantages (e.g., covering the configuration space systematically) and disadvantages (e.g., the need to enumerate all configurations). In our experiments, we found that most sampling strategies do not achieve a good coverage of the configuration space with respect to covering relevant performance values. That is, they miss important configurations with distinct performance behavior. Based on this observation, we devise a new sampling strategy, called distance-based sampling, that is based on a distance metric and a probability distribution to spread the configurations of the sample set according to a given probability distribution across the configuration space. This way, we cover different kinds of interactions among configuration options in the sample set. To demonstrate the merits of distance-based sampling, we compare it to state-of-the-art sampling strategies, such as t-wise sampling, on $10$ real-world configurable software systems. Our results show that distance-based sampling leads to more accurate performance models for medium to large sample sets.
{"title":"Distance-Based Sampling of Software Configuration Spaces","authors":"Christian Kaltenecker, A. Grebhahn, Norbert Siegmund, Jianmei Guo, S. Apel","doi":"10.1109/ICSE.2019.00112","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00112","url":null,"abstract":"Configurable software systems provide a multitude of configuration options to adjust and optimize their functional and non-functional properties. For instance, to find the fastest configuration for a given setting, a brute-force strategy measures the performance of all configurations, which is typically intractable. Addressing this challenge, state-of-the-art strategies rely on machine learning, analyzing only a few configurations (i.e., a sample set) to predict the performance of other configurations. However, to obtain accurate performance predictions, a representative sample set of configurations is required. Addressing this task, different sampling strategies have been proposed, which come with different advantages (e.g., covering the configuration space systematically) and disadvantages (e.g., the need to enumerate all configurations). In our experiments, we found that most sampling strategies do not achieve a good coverage of the configuration space with respect to covering relevant performance values. That is, they miss important configurations with distinct performance behavior. Based on this observation, we devise a new sampling strategy, called distance-based sampling, that is based on a distance metric and a probability distribution to spread the configurations of the sample set according to a given probability distribution across the configuration space. This way, we cover different kinds of interactions among configuration options in the sample set. To demonstrate the merits of distance-based sampling, we compare it to state-of-the-art sampling strategies, such as t-wise sampling, on $10$ real-world configurable software systems. Our results show that distance-based sampling leads to more accurate performance models for medium to large sample sets.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"22 1","pages":"1084-1094"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76515171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}