Towards Better Understanding Developer Perception of Refactoring
E. Alomar
Pub Date: 2019-09-01 | DOI: 10.1109/ICSME.2019.00100
Refactoring is a critical task in software maintenance. It is typically performed to enforce best design practices or to cope with design defects. Research in refactoring has been driven by the need to improve system structures. However, recent studies have shown that developers may incorporate refactoring strategies into other development-related activities that go beyond improving the design. Unfortunately, these studies are limited to developer interviews and a small set of projects. In this context, we aim to explore how developers document their refactoring activities during the software life cycle, an activity we call Self-Affirmed Refactoring (SAR), by understanding developers' perception of refactoring, so that we can bridge the gap between refactoring and automation in general. We aim to more accurately mimic human decision-making when recommending better software refactoring and remodularization.
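One way to study documented refactoring activity at scale is to scan commit messages for refactoring-related phrasing. The sketch below is a minimal, hypothetical illustration of that idea; the keyword list and function name are assumptions for illustration, not the method proposed in the paper.

```python
# Hypothetical SAR detector: flag commits whose messages explicitly
# document refactoring. The keyword list is an illustrative assumption.
SAR_KEYWORDS = ("refactor", "restructure", "rename", "extract method",
                "move class", "clean up", "simplify")

def is_self_affirmed_refactoring(commit_message: str) -> bool:
    """Return True if the commit message explicitly documents refactoring."""
    text = commit_message.lower()
    return any(keyword in text for keyword in SAR_KEYWORDS)

messages = [
    "Refactor session handling into its own module",
    "Fix off-by-one error in pagination",
]
flags = [is_self_affirmed_refactoring(m) for m in messages]
```

A real study would need a far richer vocabulary and manual validation, since words like "simplify" also appear in non-refactoring commits.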
DeepEvolution: A Search-Based Testing Approach for Deep Neural Networks
Houssem Ben Braiek, Foutse Khomh
Pub Date: 2019-09-01 | DOI: 10.1109/ICSME.2019.00078
The increasing inclusion of Deep Learning (DL) models in safety-critical systems such as autonomous vehicles has led to the development of multiple model-based DL testing techniques. One common denominator of these techniques is the automated generation of test cases, e.g., new inputs transformed from the original training data with the aim of optimizing some test adequacy criterion. So far, the effectiveness of these approaches has been hindered by their reliance on random fuzzing or on transformations that do not always produce test cases with good diversity. To overcome these limitations, we propose DeepEvolution, a novel search-based approach for testing DL models that relies on metaheuristics to maximize the diversity of generated test cases. We assessed the effectiveness of DeepEvolution in testing computer-vision DL models and found that it significantly increases the neuronal coverage of generated test cases. Moreover, using DeepEvolution, we successfully found several corner-case behaviors. Finally, DeepEvolution outperformed TensorFuzz (a coverage-guided fuzzing tool developed at Google Brain) in detecting latent defects introduced during the quantization of models. These results suggest that search-based approaches can help build effective testing tools for DL systems.
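The search-based core of such approaches can be pictured as a loop that mutates input-transformation parameters and keeps mutations that improve a test-adequacy fitness. The following is a self-contained sketch under stated assumptions: the fitness function is a stand-in, whereas DeepEvolution's actual fitness involves the neuronal coverage of the model under test, and its metaheuristics are more sophisticated than this simple hill climber.

```python
import random

def fitness(params):
    # Stand-in for a test-adequacy score (e.g., neuronal coverage) of the
    # input produced by these transformation parameters; peaks at (15, 1.5).
    rotation, brightness = params
    return -abs(rotation - 15.0) - abs(brightness - 1.5)

def hill_climb(seed_params, iterations=200, seed=0):
    """Keep randomly mutated parameters whenever they improve the fitness."""
    rng = random.Random(seed)
    best, best_fit = seed_params, fitness(seed_params)
    for _ in range(iterations):
        candidate = (best[0] + rng.uniform(-2.0, 2.0),
                     best[1] + rng.uniform(-0.2, 0.2))
        cand_fit = fitness(candidate)
        if cand_fit > best_fit:          # accept only improving mutations
            best, best_fit = candidate, cand_fit
    return best

best = hill_climb((0.0, 1.0))
```

In a real setting the candidate parameters would drive image transformations (rotation, brightness, shearing, and so on) applied to seed inputs from the training data.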
Towards Generating Transformation Rules without Examples for Android API Replacement
Ferdian Thung, Hong Jin Kang, Lingxiao Jiang, D. Lo
Pub Date: 2019-09-01 | DOI: 10.1109/ICSME.2019.00032
Deprecation of APIs in software libraries is common: library maintainers make changes to a library and will no longer support certain APIs in the future. When deprecation occurs, developers whose programs depend on those APIs need to replace the usages of the deprecated APIs sooner or later. Often, software documentation specifies which new APIs developers should use in place of a deprecated API. However, replacing the usages of a deprecated API remains a challenge, since developers may not know exactly how to use the new APIs: they first need to understand the API changes before they can replace the deprecated API correctly. Previous work has proposed an approach that assists developers in deprecated Android API replacement by learning from examples. In this work, we also focus on Android APIs and propose an approach named No Example API Transformation (NEAT) to generate transformation rules that can assist developers in deprecated API replacement even when code examples are not available (e.g., when the deprecation has happened recently). We have validated the effectiveness of NEAT in generating transformation rules for deprecated Android APIs. Using NEAT, we can generate transformation rules for 37 out of a selection of 100 deprecated APIs, and we have validated these rules to be correct.
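A transformation rule of the kind described above can be pictured as a pattern over the deprecated call plus a rewrite template. The sketch below uses Android's `Resources.getDrawable(int)`, which documentation points to `getDrawable(int, Theme)` as replacing; the textual rule format is an assumption for illustration, not NEAT's actual rule representation.

```python
import re

# One hypothetical rule: (pattern matching the deprecated call,
# rewrite template inserting the extra Theme argument).
RULES = [
    # Resources.getDrawable(int) -> Resources.getDrawable(int, Theme)
    (re.compile(r"\.getDrawable\((?P<id>[\w.]+)\)"),
     r".getDrawable(\g<id>, getTheme())"),
]

def apply_rules(source: str) -> str:
    """Rewrite every deprecated call matched by a rule."""
    for pattern, template in RULES:
        source = pattern.sub(template, source)
    return source

migrated = apply_rules("Drawable d = res.getDrawable(R.drawable.icon);")
```

A purely textual rewrite like this is only a sketch; a tool operating on real code would match on the resolved API signature rather than on surface syntax.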
Impact of Switching Bug Trackers: A Case Study on a Medium-Sized Open Source Project
Théo Zimmermann, Annalí Casanueva Artís
Pub Date: 2019-09-01 | DOI: 10.1109/ICSME.2019.00011
For most software projects, the bug tracker is an essential tool. In open source development, this tool plays an even more central role, as it is generally open to all users, who are encouraged to test the software and report bugs. Previous studies have highlighted the act of reporting a bug as a first step leading a user to become an active contributor. The impact of the bug reporting environment on bug tracking activity is difficult to assess because of the lack of comparison points. In this paper, we take advantage of the switch of the bug tracker of Coq, a medium-sized open source project, from Bugzilla to GitHub, to evaluate and interpret the impact that such a change can have. We first report on the switch itself, including the migration of preexisting issues. Then we analyze data from before and after the switch using a regression discontinuity design, an econometric methodology imported from quantitative policy analysis. We complement this quantitative analysis with qualitative data from interviews with developers. We show that the switch induces an increase in bug reporting, particularly from the principal developers themselves, and more generally an increased engagement with the bug tracking platform, with more comments by developers and also more external commenters.
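The regression discontinuity idea can be illustrated on synthetic data: fit the trend separately on each side of the switch date, then read off the jump at the cutoff as the estimated effect. The weekly counts below are made up for illustration; they are not the paper's data.

```python
def ols(xs, ys):
    """Ordinary least squares fit for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

cutoff = 0                       # week 0 = the bug-tracker switch
weeks = list(range(-8, 8))
# synthetic weekly bug-report counts with a jump of 5 at the switch
reports = [10 + 0.5 * w + (5 if w >= cutoff else 0) for w in weeks]

left = [(w, r) for w, r in zip(weeks, reports) if w < cutoff]
right = [(w, r) for w, r in zip(weeks, reports) if w >= cutoff]
a_l, b_l = ols([w for w, _ in left], [r for _, r in left])
a_r, b_r = ols([w for w, _ in right], [r for _, r in right])

# discontinuity at the cutoff = estimated effect of the switch
effect = (a_r + b_r * cutoff) - (a_l + b_l * cutoff)
```

Fitting both sides separately is what distinguishes this design from a naive before/after comparison: a smooth pre-existing trend does not get mistaken for an effect.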
What Factors Make SQL Test Cases Understandable for Testers? A Human Study of Automated Test Data Generation Techniques
A. Alsharif, G. M. Kapfhammer, Phil McMinn
Pub Date: 2019-09-01 | DOI: 10.1109/ICSME.2019.00076
Since relational databases are a key component of software systems ranging from small mobile to large enterprise applications, there are well-studied methods that automatically generate test cases for database-related functionality. Yet, there has been no research analyzing how well testers, who must often serve as an "oracle", both understand tests involving SQL and decide whether they reveal flaws. This paper reports on a human study of test comprehension in the context of automatically generated tests that assess the correct specification of the integrity constraints in a relational database schema. In this domain, a tool generates INSERT statements with data values designed either to satisfy the schema (i.e., be accepted into the database) or to violate it (i.e., be rejected from the database). The study reveals two key findings. First, the choice of data values in INSERTs influences human understandability: the use of default values for elements not involved in the test (but necessary for adhering to SQL's syntax rules) aided participants, allowing them to easily identify and understand the important test values. Yet, negative numbers and "garbage" strings hindered this process. The second finding is more far-reaching: humans found the outcome of test cases very difficult to predict when NULL was used in conjunction with foreign keys and CHECK constraints. This suggests that, while including NULLs can surface the confusing semantics of database schemas, their use makes tests less understandable for humans.
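The satisfy/violate pairing can be reproduced on a toy schema: one INSERT crafted to be accepted and one crafted to trip a CHECK constraint. The schema and values below are illustrative assumptions, not the study's generated data; the 'default' filler value for the column not under test echoes the finding that such defaults help readers spot the important value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        level INTEGER CHECK (level BETWEEN 1 AND 5)
    )
""")

# satisfying test case: expected to be accepted by the database
conn.execute("INSERT INTO account VALUES (1, 'default', 3)")

# violating test case: the CHECK constraint should reject level = 99
try:
    conn.execute("INSERT INTO account VALUES (2, 'default', 99)")
    violated = False
except sqlite3.IntegrityError:
    violated = True
```

A tester acting as the oracle must predict, from the INSERT alone, which of these two outcomes the database will produce; that prediction is exactly what the study measured.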
Challenges in re-Platforming Mixed Language PL/I and COBOL IS to an Open Systems Platform
Thomas Wagner, Christian Brem, S. Strobl, T. Grechenig
Pub Date: 2019-09-01 | DOI: 10.1109/ICSME.2019.00056
Re-platforming, the transfer of a legacy system to a new target platform, is often seen as a first cost-reduction step that enables a long-term reengineering effort for a legacy system. However, recreating a functionally equivalent ecosystem on a different platform entails significant technical and organisational challenges. This extended abstract describes challenges, their solutions, and an important lesson learned during a re-platforming project of 40-year-old PL/I and COBOL applications.
Aiding Code Change Understanding with Semantic Change Impact Analysis
Quinn Hanam, A. Mesbah, Reid Holmes
Pub Date: 2019-09-01 | DOI: 10.1109/ICSME.2019.00031
Code reviews are often used as a means for developers to manually examine source code changes to ensure that the behavioural effects of a change are well understood. Unfortunately, the behavioural impact of a change can include parts of the system outside the area syntactically affected by the change. In the context of code reviews this can be problematic, as the impact of a change can extend beyond the diff that is presented to the reviewer. Change impact analysis is a promising technique that could assist developers by helping surface parts of the code not present in the diff but potentially affected by the change. In this work we investigate the utility of change impact analysis as a tool for helping developers understand the effects of code changes. While we find that traditional techniques may not benefit developers, more precise techniques may reduce time and increase accuracy. Specifically, we propose and study a novel technique that extracts semantic, rather than syntactic, change impact relations from JavaScript commits. We (1) define four novel semantic change impact relations and (2) implement an analysis tool that interprets structural changes over partial JavaScript programs to extract these relations. In a study of 2,000 commits from the version history of three popular NodeJS applications, our tool reduced false positives by 9–37% and further reduced the size of change impact sets by 19–91% by splitting up unrelated semantic relations, compared to change impact sets computed with Unix diff and control and data dependencies. Additionally, through a user study in which developers performed code review tasks with our tool, we found that reducing false positives and providing stronger semantics had a meaningful impact on their ability to find defects within code change diffs.
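At the heart of any change impact analysis, semantic or syntactic, is a transitive closure over dependence edges. The toy sketch below walks a hand-written dependence graph to compute the impact set of a changed function; real tools like the one in the paper derive these edges from control and data dependencies in the code, and this graph is a hypothetical example.

```python
from collections import deque

# edges: function -> functions that directly depend on it
# (a hypothetical dependence graph, hand-written for illustration)
dependents = {
    "parseConfig": ["loadApp"],
    "loadApp": ["main", "reload"],
    "render": ["main"],
}

def impact_set(changed):
    """Everything transitively affected by a change to `changed` (BFS)."""
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

affected = impact_set("parseConfig")
```

The precision work in the paper amounts to pruning and labeling these edges: the fewer spurious edges in the graph, the smaller and more trustworthy the impact set shown to a reviewer.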
Characterizing Performance Regression Introducing Code Changes
Deema Alshoaibi
Pub Date: 2019-09-01 | DOI: 10.1109/ICSME.2019.00102
Performance regression testing is highly expensive, as it delays system development when it is, optimally, conducted after each code change. As a result, performance regression testing should be devoted to the code changes that are most likely to introduce a regression. In this context, recent studies have focused on the early identification of potentially problematic code changes by characterizing them with static and dynamic metrics. The aim of my research thesis is to support performance regression testing by better identifying and characterizing regression-introducing code changes. Our first contribution tackled the detection of these changes as an optimization problem: using a combination of static and dynamic metrics, we built, through evolutionary computation, a detection rule that was shown to outperform recent state-of-the-art approaches. To extend this research, we plan to enlarge the set of metrics used in order to better profile problematic code changes. We also plan to reduce the identification cost by searching for a trade-off that limits the use of dynamic metrics while maintaining detection performance. In addition, we would like to prioritize, based on code change characteristics, the test cases to be executed when a regression is predicted.
A Qualitative Study on Framework Debugging
Zack Coker, D. Widder, Claire Le Goues, C. Bogart, Joshua Sunshine
Pub Date: 2019-09-01 | DOI: 10.1109/ICSME.2019.00091
Features of frameworks, such as inversion of control and the structure of framework applications, require developers to adjust their programming and debugging strategies compared to sequential programs. However, the benefits and challenges of framework debugging are not fully understood, and gaining this knowledge could provide guidance for debugging strategies and framework tool design. To gain insight into the framework application debugging process, we performed two human studies investigating how developers fix applications that use a framework API incorrectly. These studies focused on the Android Fragment class and the ROS framework. We analyzed the results using a mixed-methods approach that draws on qualitative techniques. Our analysis found that participants benefited from the structure of frameworks and from the pre-made solutions to common problems in the domain. Participants encountered challenges in understanding framework abstractions, and had particular difficulty with inversion of control and object protocol issues. When compared to prior work on debugging, these results show that framework applications pose unique debugging challenges.
The Impact of Rare Failures on Statistical Fault Localization: The Case of the Defects4J Suite
Yigit Küçük, Tim A. D. Henderson, Andy Podgurski
Pub Date: 2019-09-01 | DOI: 10.1109/ICSME.2019.00012
Statistical Fault Localization (SFL) uses coverage profiles (or "spectra") collected from passing and failing tests, together with statistical metrics, which are typically composed of simple estimators, to identify which elements of a program are most likely to have caused observed failures. Previous SFL research has not thoroughly examined how the effectiveness of SFL metrics is related to the proportion of failures in test suites and related quantities. To address this issue, we studied the Defects4J benchmark suite of programs and test suites and found that if a test suite has very few failures, SFL performs poorly. To better understand this phenomenon, we investigated the precision of some statistical estimators of which SFL metrics are composed, as measured by their coefficients of variation. The precision of an embedded estimator, which depends on the dataset, was found to correlate with the effectiveness of a metric containing it: low precision is associated with poor effectiveness. Boosting precision by adding test cases was found to improve overall SFL effectiveness. We present our findings and discuss their implications for the evaluation and use of SFL metrics.
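To make the role of the embedded estimators concrete, here is the widely used Ochiai metric, one example of an SFL metric built from the per-element coverage counts; the paper studies several such metrics, and the counts below are hypothetical, chosen to show how a faulty element can still be ranked above a benign one.

```python
import math

def ochiai(ef, nf, ep, np_):
    """Ochiai suspiciousness from a coverage spectrum.
    ef/nf: failing tests that do / do not cover the element;
    ep/np_: passing tests that do / do not cover the element."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

# hypothetical suite with a single failing test and 99 passing tests:
# the faulty element is covered by the failure and 4 passing tests,
# a benign element by the failure and 90 passing tests
score_faulty = ochiai(ef=1, nf=0, ep=4, np_=95)
score_benign = ochiai(ef=1, nf=0, ep=90, np_=9)
```

With so few failures, `ef` and `nf` carry almost no information, so the ranking rests mostly on the passing-test counts; adding failing tests sharpens the estimators inside the metric, which is the precision effect the paper measures via coefficients of variation.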