Ferhat Erata, Arda Goknil, B. Tekinerdogan, G. Kardas
We present Tarski, a tool for specifying configurable trace semantics to facilitate automated reasoning about traces. Software development projects require that various types of traces be modeled between and within development artifacts. For any given artifact (e.g., requirements, architecture models, and source code), Tarski allows the user to specify new trace types and their configurable semantics; using these semantics, it automatically infers new traces from the traces the user provides and checks their consistency. It has been evaluated on three industrial case studies in the automotive domain (https://modelwriter.github.io/Tarski/).
{"title":"A tool for automated reasoning about traces based on configurable formal semantics","authors":"Ferhat Erata, Arda Goknil, B. Tekinerdogan, G. Kardas","doi":"10.1145/3106237.3122825","DOIUrl":"https://doi.org/10.1145/3106237.3122825","url":null,"abstract":"We present Tarski, a tool for specifying configurable trace semantics to facilitate automated reasoning about traces. Software development projects require that various types of traces be modeled between and within development artifacts. For any given artifact (e.g., requirements, architecture models and source code), Tarski allows the user to specify new trace types and their configurable semantics, while, using the semantics, it automatically infers new traces based on existing traces provided by the user, and checks the consistency of traces. It has been evaluated on three industrial case studies in the automotive domain (https://modelwriter.github.io/Tarski/).","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115835061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Programmable logic controllers (PLCs) are specialized computers for automating a wide range of cyber-physical systems. Since these systems are often safety-critical, software running on PLCs needs to be free of programming errors. However, automated tools for testing PLC software are lacking despite the pervasive use of PLCs in industry. We propose a symbolic-execution-based method, named SymPLC, for automatically testing PLC software written in the programming languages specified in the IEC 61131-3 standard. SymPLC takes the PLC source code as input and translates it into C before applying symbolic execution, to systematically generate test inputs that cover both the paths within each periodic task and the interleavings of these tasks. Toward this end, we propose a number of PLC-specific reduction techniques for identifying and eliminating redundant interleavings. We have evaluated SymPLC on a large set of benchmark programs with both single and multiple tasks. Our experiments show that SymPLC handles these programs efficiently, and for multi-task PLC programs, our new reduction techniques outperform the state-of-the-art partial-order reduction technique by more than two orders of magnitude.
{"title":"Symbolic execution of programmable logic controller code","authors":"Shengjian Guo, Meng Wu, Chao Wang","doi":"10.1145/3106237.3106245","DOIUrl":"https://doi.org/10.1145/3106237.3106245","url":null,"abstract":"Programmable logic controllers (PLCs) are specialized computers for automating a wide range of cyber-physical systems. Since these systems are often safety-critical, software running on PLCs need to be free of programming errors. However, automated tools for testing PLC software are lacking despite the pervasive use of PLCs in industry. We propose a symbolic execution based method, named SymPLC, for automatically testing PLC software written in programming languages specified in the IEC 61131-3 standard. SymPLC takes the PLC source code as input and translates it into C before applying symbolic execution, to systematically generate test inputs that cover both paths in each periodic task and interleavings of these tasks. Toward this end, we propose a number of PLC-specific reduction techniques for identifying and eliminating redundant interleavings. We have evaluated SymPLC on a large set of benchmark programs with both single and multiple tasks. Our experiments show that SymPLC can handle these programs efficiently, and for multi-task PLC programs, our new reduction techniques outperform the state-of-the-art partial order reduction technique by more than two orders of magnitude.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131367434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The most criticized problem in the Android ecosystem is fragmentation: 24,093 distinct Android devices in the wild, made by 1,294 manufacturers, run heavily customized operating systems. The existence of so many different active versions of Android makes security updates and vulnerability responses across the whole range of Android devices difficult. In this paper, we address unpatched kernel vulnerabilities on fragmented Android devices. Specifically, we propose and implement LaChouTi, an automated kernel security update framework consisting of a cloud service and an end-device update application. LaChouTi first tracks and identifies the exposed vulnerabilities of the target Android kernels according to a CVE-Patch map. It then generates differential binary patches for the identified vulnerabilities and finally pushes and applies the patches to the kernels. We evaluate LaChouTi using 12 Nexus Android devices with different Android versions, kernel versions, series, and manufacturers, and find 1,922 unpatched kernel vulnerabilities in these devices. The results show that (1) the security risk of unpatched vulnerabilities caused by fragmentation is serious, and (2) LaChouTi is effective in responding to this risk. Finally, we deploy LaChouTi on new commercial devices in collaboration with four internationally renowned manufacturers. The results demonstrate that LaChouTi is effective for the manufacturers' security updates.
{"title":"LaChouTi: kernel vulnerability responding framework for the fragmented Android devices","authors":"Jingzheng Wu, Mutian Yang","doi":"10.1145/3106237.3117768","DOIUrl":"https://doi.org/10.1145/3106237.3117768","url":null,"abstract":"The most criticized problem in the Android ecosystem is fragmentation, i.e., 24,093 Android devices in the wild are made by 1,294 manufacturers and installed with extremely customized operating systems. The existence of so many different active versions of Android makes security updates and vulnerability responses across the whole range of Android devices difficult. In this paper, we seek to respond to the unpatched kernel vulnerabilities for the fragmented Android devices. Specifically, we propose and implement LaChouTi, which is an automated kernel security update framework consisting of cloud service and end application update. LaChouTi first tracks and identifies the exposed vulnerabilities according to the CVE-Patch map for the target Android kernels. Then, it generates differential binary patches for the identified results. Finally, it pushes and applies the patches to the kernels. We evaluate LaChouTi using 12 Nexus Android devices that have different Android versions, different kernel versions, different series and different manufacturers, and find 1922 unpatched kernel vulnerabilities in these devices. The results show that: (1) the security risk of unpatched vulnerabilities caused by fragmentation is serious; and (2) the proposed LaChouTi is effective in responding to such security risk. Finally, we implement LaChouTi on new commercial devices by collaborating with four internationally renowned manufacturers. The results demonstrate that LaChouTi is effective for the manufacturers'security updates.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126845829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vladimir Ivanov, Alan Rogers, G. Succi, Jooyong Yi, V. Zorin
It is a cliché to say that there is a gap between research and practice. As interest in the practical impact of research has been growing, the gap between research and practice would be expected to be narrowing. However, our study reveals that a wide gap still seems to exist. We surveyed software engineers about what they care about when developing software. We then compared our survey results with the research topics of the papers recently published at ICSE/FSE. We found the following discrepancy: while software engineers care more about software development productivity than about software quality, papers on research areas closely related to software productivity, such as software development process management and software development techniques, are published significantly less often than papers on software verification and validation, which account for more than half of the publications. We also found that software engineers are in great need of techniques for accurate effort estimation, and that they are not necessarily knowledgeable about the techniques they can use to meet their needs.
{"title":"What do software engineers care about? gaps between research and practice","authors":"Vladimir Ivanov, Alan Rogers, G. Succi, Jooyong Yi, V. Zorin","doi":"10.1145/3106237.3117778","DOIUrl":"https://doi.org/10.1145/3106237.3117778","url":null,"abstract":"It is a cliche to say that there is a gap between research and practice. As the interest and importance in the practical impact of research has been growing, the gap between research and practice is expected to be narrowing. However, our study reveals that there still seems to be a wide gap. We survey so ware engineers about what they care about when developing so ware. We then compare our survey results with the research topics of the papers published in ICSE/FSE recently. We found the following discrepancy: while so ware engineers care more about so ware development productivity than the quality of so ware, papers on research areas closely related to so ware productivity--such as so ware development process management and so ware development techniques--are significantly less published than papers on so ware verification and validation that account for more than half of publications. We also found that so ware engineers are in great need for techniques for accurate effort estimation, and they are not necessarily knowledgable about techniques they can use to meet their needs.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125288449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mobile applications may benefit from context awareness, since they encounter context changes during execution and their success depends on user-perceived quality. Context awareness requires context monitoring and system adaptation, two tasks that are expensive, especially in mobile applications. Our research aims at developing a methodology that enables effective context-awareness techniques for mobile applications, allowing a mobile app to adapt to context changes so that the desired system quality properties and user satisfaction are maximized. Here, effective means selecting a minimal set of context variables to monitor and a minimal set of adaptation tactics to inject into the application that together guarantee the required software quality and maximize user satisfaction. In this paper, we illustrate the devised methodology on a motivating example, detailing the ongoing work.
{"title":"User- and analysis-driven context aware software development in mobile computing","authors":"Mai Abusair","doi":"10.1145/3106237.3119873","DOIUrl":"https://doi.org/10.1145/3106237.3119873","url":null,"abstract":"Mobile applications may benefit from context awareness since they incur to context changes during their execution and their success depends on the user perceived quality. Context awareness requires context monitoring and system adaptation, these two tasks are very expensive especially in mobile applications. Our research aims at developing a methodology that enables effective context awareness techniques for mobile applications that allows adaptations of the mobile app to context changes so that the desired system quality properties and user satisfaction is maximized. Here effective means selecting a minimum set of context variables to monitor and a minimum set of adaptive tactics to inject into mobile applications that allows to guarantee the required software quality and to maximize the user satisfaction. In this paper, we show the devised methodology on a motivating example, detailing the ongoing work.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"46 11-12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114011847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variability models are often enriched with attributes, such as performance, that encode the influence of features on the respective attribute. Despite their importance, only a few attributed variability models are available whose attribute values were obtained from empirical, real-world observations and which cover interactions between features. But what does it mean for research and practice to stay in the comfort zone of developing algorithms and tools in a setting where artificial attribute values are used and interactions are neglected? This is the central question we want to answer here. To leave the comfort zone, we use a combination of kernel density estimation and a genetic algorithm to rescale a given (real-world) attribute-value profile to a given variability model. To demonstrate the influence and relevance of realistic attribute values and interactions, we present a replication of a widely recognized, third-party study into which we introduce realistic attribute values and interactions. We found statistically significant differences between the original study and the replication. We infer lessons learned for conducting experiments that involve attributed variability models, and we provide the accompanying tool Thor for generating attribute values, including interactions. Our solution is agnostic to the given input distribution and scales to large variability models.
{"title":"Attributed variability models: outside the comfort zone","authors":"Norbert Siegmund, Stefan Sobernig, S. Apel","doi":"10.1145/3106237.3106251","DOIUrl":"https://doi.org/10.1145/3106237.3106251","url":null,"abstract":"Variability models are often enriched with attributes, such as performance, that encode the influence of features on the respective attribute. In spite of their importance, there are only few attributed variability models available that have attribute values obtained from empirical, real-world observations and that cover interactions between features. But, what does it mean for research and practice when staying in the comfort zone of developing algorithms and tools in a setting where artificial attribute values are used and where interactions are neglected? This is the central question that we want to answer here. To leave the comfort zone, we use a combination of kernel density estimation and a genetic algorithm to rescale a given (real-world) attribute-value profile to a given variability model. To demonstrate the influence and relevance of realistic attribute values and interactions, we present a replication of a widely recognized, third-party study, into which we introduce realistic attribute values and interactions. We found statistically significant differences between the original study and the replication. We infer lessons learned to conduct experiments that involve attributed variability models. We also provide the accompanying tool Thor for generating attribute values including interactions. Our solution is shown to be agnostic about the given input distribution and to scale to large variability models.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114507844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thanhvu Nguyen, Timos Antonopoulos, Andrew Ruef, M. Hicks
Numerical invariants, e.g., relationships among the numerical variables of a program, represent a useful class of properties for analyzing programs. General polynomial invariants represent more complex numerical relations and are often required in scientific and engineering applications. We present NumInv, a tool that implements a counterexample-guided invariant generation (CEGIR) technique to automatically discover numerical invariants that are polynomial equality and inequality relations among numerical variables. This CEGIR technique infers candidate invariants from program traces and then checks them against the program source code using the KLEE test-input generation tool. If the invariants are incorrect, KLEE returns counterexample traces, which help the dynamic inference obtain better results. Whereas existing CEGIR approaches often require sound invariants, NumInv sacrifices soundness and produces results that KLEE cannot refute within certain time bounds. This design, and the use of KLEE as a verifier, allows NumInv to discover useful and important numerical invariants for many challenging programs. Preliminary results show that NumInv generates the invariants required for understanding and verifying the correctness of programs involving complex arithmetic. We also show that NumInv discovers polynomial invariants that capture precise complexity bounds of programs used to benchmark existing static complexity analysis techniques. Finally, we show that NumInv performs competitively compared to state-of-the-art numerical invariant analysis tools.
{"title":"Counterexample-guided approach to finding numerical invariants","authors":"Thanhvu Nguyen, Timos Antonopoulos, Andrew Ruef, M. Hicks","doi":"10.1145/3106237.3106281","DOIUrl":"https://doi.org/10.1145/3106237.3106281","url":null,"abstract":"Numerical invariants, e.g., relationships among numerical variables in a program, represent a useful class of properties to analyze programs. General polynomial invariants represent more complex numerical relations, but they are often required in many scientific and engineering applications. We present NumInv, a tool that implements a counterexample-guided invariant generation (CEGIR) technique to automatically discover numerical invariants, which are polynomial equality and inequality relations among numerical variables. This CEGIR technique infers candidate invariants from program traces and then checks them against the program source code using the KLEE test-input generation tool. If the invariants are incorrect KLEE returns counterexample traces, which help the dynamic inference obtain better results. Existing CEGIR approaches often require sound invariants, however NumInv sacrifices soundness and produces results that KLEE cannot refute within certain time bounds. This design and the use of KLEE as a verifier allow NumInv to discover useful and important numerical invariants for many challenging programs. Preliminary results show that NumInv generates required invariants for understanding and verifying correctness of programs involving complex arithmetic. We also show that NumInv discovers polynomial invariants that capture precise complexity bounds of programs used to benchmark existing static complexity analysis techniques. Finally, we show that NumInv performs competitively comparing to state of the art numerical invariant analysis tools.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115572542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
During the software development process, Chinese developers often seek solutions to the technical problems they encounter by searching relevant questions on Q&A sites. When developers fail to find solutions on Chinese Q&A sites, they can translate their query and search English Q&A sites. However, Chinese developers who are not native English speakers are often uncomfortable asking or searching questions in English, as they do not know the proper English translations of Chinese technical terms. Furthermore, manually formulating cross-language queries and determining the importance of query words is a tedious and time-consuming process. To help Chinese developers take advantage of the rich knowledge base of the English version of Stack Overflow and to simplify the retrieval process, we propose XSearch, an automated cross-language relevant-question retrieval tool that retrieves relevant English questions on Stack Overflow for a given Chinese question. This tool addresses developers' increasing need to solve technical problems by retrieving cross-language relevant Q&A resources. Demo tool website: http://172.93.36.10:8080/XSearch Demo video: https://goo.gl/h57sed
{"title":"XSearch: a domain-specific cross-language relevant question retrieval tool","authors":"Bowen Xu, Zhenchang Xing, Xin Xia, D. Lo, X. Le","doi":"10.1145/3106237.3122820","DOIUrl":"https://doi.org/10.1145/3106237.3122820","url":null,"abstract":"During software development process, Chinese developers often seek solutions to the technical problems they encounter by searching relevant questions on Q&A sites. When developers fail to find solutions on Q&A sites in Chinese, they could translate their query and search on the English Q&A sites. However, Chinese developers who are non-native English speakers often are not comfortable to ask or search questions in English, as they do not know the proper translation of the Chinese technical words into the English technical words. Furthermore, the process of manually formulating cross-language queries and determining the importance of query words is a tedious and time-consuming process. For the purpose of helping Chinese developers take advantages of the rich knowledge base of the English version of Stack Overflow and simplify the retrieval process, we propose an automated cross-language relevant question retrieval tool (XSearch) to retrieve relevant English questions on Stack Overflow for a given Chinese question. This tool can address the increasing need for developer to solve technical problems by retrieving cross-language relevant Q&A resources. Demo Tool Website: http://172.93.36.10:8080/XSearch Demo Video: https://goo.gl/h57sed","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129159077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The demand for high-quality mobile applications is constantly rising, especially in mission-critical settings. Thus, new software engineering methodologies are needed to ensure the desired quality of an application. The presented research proposes a quality assurance methodology for mobile applications based on test automation and test suite optimization. The goal is to find a minimal test suite while maintaining efficiency and reducing execution cost. Furthermore, to avoid invalidating an optimized test suite as the system under test evolves, the approach proposes to extract patterns from the changes applied to an application. The evaluation plan combines an empirical study and an industrial case study, based on open-source projects and an industrial project in the healthcare domain. The presented approach is expected to support the testing process on mobile application platforms.
{"title":"Application of search-based software engineering methodologies for test suite optimization and evolution in mission critical mobile application development","authors":"Andreas Schuler","doi":"10.1145/3106237.3119876","DOIUrl":"https://doi.org/10.1145/3106237.3119876","url":null,"abstract":"The demand for high quality mobile applications is constantly rising, especially in mission critical settings. Thus, new software engineering methodologies are needed in order to ensure the desired quality of an application. The research presented proposes a quality assurance methodology for mobile applications through test automation by optimizing test suites. The desired goal is to find a minimal test suite while maintaining efficiency and reducing execution cost. Furthermore to avoid invalidating an optimized test suite as the system under test evolves, the approach further proposes to extract patterns from the applied changes to an application. The evaluation plan comprises a combination of an empirical and an industrial case study based on open source projects and an industrial project in the healthcare domain. It is expected that the presented approach supports the testing process on mobile application platforms.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116220924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adriaan Labuschagne, Laura Inozemtseva, Reid Holmes
Software defects cost time and money to diagnose and fix. Consequently, developers use a variety of techniques to avoid introducing defects into their systems. However, these techniques have costs of their own; the benefit of using a technique must outweigh the cost of applying it. In this paper we investigate the costs and benefits of automated regression testing in practice. Specifically, we studied 61 projects that use Travis CI, a cloud-based continuous integration tool, in order to examine real test failures that were encountered by the developers of those projects. We determined how the developers resolved the failures they encountered and used this information to classify the failures as being caused by a flaky test, by a bug in the system under test, or by a broken or obsolete test. We consider that test failures caused by bugs represent a benefit of the test suite, while failures caused by broken or obsolete tests represent a test suite maintenance cost. We found that 18% of test suite executions fail and that 13% of these failures are flaky. Of the non-flaky failures, only 74% were caused by a bug in the system under test; the remaining 26% were due to incorrect or obsolete tests. In addition, we found that, in the failed builds, only 0.38% of the test case executions failed and 64% of failed builds contained more than one failed test. Our findings contribute to a wider understanding of the unforeseen costs that can impact the overall cost effectiveness of regression testing in practice. They can also inform research into test case selection techniques, as we have provided an approximate empirical bound on the practical value that could be extracted from such techniques. This value appears to be large, as the 61 systems under study contained nearly 3 million lines of test code and yet over 99% of test case executions could have been eliminated with a perfect oracle.
{"title":"Measuring the cost of regression testing in practice: a study of Java projects using continuous integration","authors":"Adriaan Labuschagne, Laura Inozemtseva, Reid Holmes","doi":"10.1145/3106237.3106288","DOIUrl":"https://doi.org/10.1145/3106237.3106288","url":null,"abstract":"Software defects cost time and money to diagnose and fix. Consequently, developers use a variety of techniques to avoid introducing defects into their systems. However, these techniques have costs of their own; the benefit of using a technique must outweigh the cost of applying it. In this paper we investigate the costs and benefits of automated regression testing in practice. Specifically, we studied 61 projects that use Travis CI, a cloud-based continuous integration tool, in order to examine real test failures that were encountered by the developers of those projects. We determined how the developers resolved the failures they encountered and used this information to classify the failures as being caused by a flaky test, by a bug in the system under test, or by a broken or obsolete test. We consider that test failures caused by bugs represent a benefit of the test suite, while failures caused by broken or obsolete tests represent a test suite maintenance cost. We found that 18% of test suite executions fail and that 13% of these failures are flaky. Of the non-flaky failures, only 74% were caused by a bug in the system under test; the remaining 26% were due to incorrect or obsolete tests. In addition, we found that, in the failed builds, only 0.38% of the test case executions failed and 64% of failed builds contained more than one failed test. Our findings contribute to a wider understanding of the unforeseen costs that can impact the overall cost effectiveness of regression testing in practice. They can also inform research into test case selection techniques, as we have provided an approximate empirical bound on the practical value that could be extracted from such techniques. This value appears to be large, as the 61 systems under study contained nearly 3 million lines of test code and yet over 99% of test case executions could have been eliminated with a perfect oracle.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121658312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}