"Synergistic debug-repair of heap manipulations" by Sahil Verma and Subhajit Roy. ESEC/FSE 2017, DOI: 10.1145/3106237.3106263.
We present Wolverine, an integrated debug-repair environment for heap-manipulating programs. Wolverine facilitates stepping through a concrete program execution, provides visualizations of the abstract program states (as box-and-arrow diagrams), and integrates a novel, proof-directed repair algorithm to synthesize repair patches. To provide a seamless environment, Wolverine supports "hot-patching" of the generated repair patches, enabling the programmer to continue the debug session without an abort-compile-debug cycle. We also propose two new debug-repair possibilities enabled by Wolverine: "specification refinement" and "specification slicing". We evaluate our framework on 1600 buggy programs (generated using fault injection) over a variety of data structures, including singly, doubly, and circular linked lists, binary search trees, AVL trees, red-black trees, and splay trees; Wolverine repaired all the buggy instances within a reasonable time (less than 5 seconds in most cases). We also evaluate Wolverine on 247 buggy student submissions; Wolverine repaired more than 80% of the programs in which the student had made a reasonable attempt.
{"title":"Synergistic debug-repair of heap manipulations","authors":"Sahil Verma, Subhajit Roy","doi":"10.1145/3106237.3106263","DOIUrl":"https://doi.org/10.1145/3106237.3106263","url":null,"abstract":"We present Wolverine, an integrated Debug-Repair environment for heap manipulating programs. Wolverine facilitates stepping through a concrete program execution, provides visualizations of the abstract program states (as box-and-arrow diagrams) and integrates a novel, proof-directed repair algorithm to synthesize repair patches. To provide a seamless environment, Wolverine supports \"hot-patching\" of the generated repair patches, enabling the programmer to continue the debug session without requiring an abort-compile-debug cycle. We also propose new debug-repair possibilities, \"specification refinement\" and \"specification slicing\" made possible by Wolverine. We evaluate our framework on 1600 buggy programs (generated using fault injection) on a variety of data-structures like singly, doubly and circular linked-lists, Binary Search Trees, AVL trees, Red-Black trees and Splay trees; Wolverine could repair all the buggy instances within reasonable time (less than 5 sec in most cases). We also evaluate Wolverine on 247 (buggy) student submissions; Wolverine could repair more than 80% of programs where the student had made a reasonable attempt.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123787073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Understanding misunderstandings in source code" by Dan Gopstein, J. Iannacone, Yu Yan, Lois DeLong, Yanyan Zhuang, M. Yeh, and Justin Cappos. ESEC/FSE 2017, DOI: 10.1145/3106237.3106264.
Humans often mistake the meaning of source code and so misjudge a program's true behavior. These mistakes can be caused by extremely small, isolated patterns in code, which can lead to significant runtime errors. These patterns are used in large, popular software projects and are even recommended in style guides. To identify code patterns that may confuse programmers, we extracted a preliminary set of 'atoms of confusion' from known confusing code. We show empirically, in an experiment with 73 participants, that these code patterns lead to a significantly increased rate of misunderstanding compared with equivalent code without the patterns. We then take larger confusing programs and measure, in an experiment with 43 participants, the impact of removing these confusing patterns on programmer confusion. All of our instruments, analysis code, and data are publicly available online for replication, experimentation, and feedback.
{"title":"Understanding misunderstandings in source code","authors":"Dan Gopstein, J. Iannacone, Yu Yan, Lois DeLong, Yanyan Zhuang, M. Yeh, Justin Cappos","doi":"10.1145/3106237.3106264","DOIUrl":"https://doi.org/10.1145/3106237.3106264","url":null,"abstract":"Humans often mistake the meaning of source code, and so misjudge a program's true behavior. These mistakes can be caused by extremely small, isolated patterns in code, which can lead to significant runtime errors. These patterns are used in large, popular software projects and even recommended in style guides. To identify code patterns that may confuse programmers we extracted a preliminary set of `atoms of confusion' from known confusing code. We show empirically in an experiment with 73 participants that these code patterns can lead to a significantly increased rate of misunderstanding versus equivalent code without the patterns. We then go on to take larger confusing programs and measure (in an experiment with 43 participants) the impact, in terms of programmer confusion, of removing these confusing patterns. All of our instruments, analysis code, and data are publicly available online for replication, experimentation, and feedback.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121894560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"QTEP: quality-aware test case prioritization" by Song Wang, Jaechang Nam, and Lin Tan. ESEC/FSE 2017, DOI: 10.1145/3106237.3106258.
Test case prioritization (TCP) is a practical activity in software testing aimed at exposing faults earlier. Researchers have proposed many TCP techniques to reorder test cases; among them, coverage-based techniques have been widely investigated. Coverage-based TCP approaches leverage coverage information between source code and test cases, i.e., static and dynamic code coverage, to schedule test cases. Existing coverage-based TCP techniques mainly focus on maximizing coverage and often do not consider the likely distribution of faults in the source code. However, software faults are not equally distributed across source code; for example, around 80% of faults are located in about 20% of the code. Intuitively, test cases that cover the faulty source code should have higher priority, since they are more likely to find faults. In this paper, we present a quality-aware test case prioritization technique, QTEP, to address this limitation of existing coverage-based TCP algorithms. QTEP leverages code inspection techniques, namely a typical statistical defect prediction model and a typical static bug finder, to detect fault-prone source code, and then adapts existing coverage-based TCP algorithms by weighting source code by its fault-proneness. Our evaluation with 16 variants of QTEP on 33 versions of 7 open-source Java projects shows that QTEP improves existing coverage-based TCP techniques for both regression and new test cases. Specifically, the improvement of the best QTEP variant for regression test cases is up to 15.0% (7.6% on average), and for all test cases (both regression and new), up to 10.0% (5.0% on average).
{"title":"QTEP: quality-aware test case prioritization","authors":"Song Wang, Jaechang Nam, Lin Tan","doi":"10.1145/3106237.3106258","DOIUrl":"https://doi.org/10.1145/3106237.3106258","url":null,"abstract":"Test case prioritization (TCP) is a practical activity in software testing for exposing faults earlier. Researchers have proposed many TCP techniques to reorder test cases. Among them, coverage-based TCPs have been widely investigated. Specifically, coverage-based TCP approaches leverage coverage information between source code and test cases, i.e., static code coverage and dynamic code coverage, to schedule test cases. Existing coverage-based TCP techniques mainly focus on maximizing coverage while often do not consider the likely distribution of faults in source code. However, software faults are not often equally distributed in source code, e.g., around 80% faults are located in about 20% source code. Intuitively, test cases that cover the faulty source code should have higher priorities, since they are more likely to find faults. In this paper, we present a quality-aware test case prioritization technique, QTEP, to address the limitation of existing coverage-based TCP algorithms. In QTEP, we leverage code inspection techniques, i.e., a typical statistic defect prediction model and a typical static bug finder, to detect fault-prone source code and then adapt existing coverage-based TCP algorithms by considering the weighted source code in terms of fault-proneness. Our evaluation with 16 variant QTEP techniques on 33 different versions of 7 open source Java projects shows that QTEP could improve existing coverage-based TCP techniques for both regression and new test cases. Specifically, the improvement of the best variant of QTEP for regression test cases could be up to 15.0% and on average 7.6%, and for all test cases (both regression and new test cases), the improvement could be up to 10.0% and on average 5.0%.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129083073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Natural language querying in SAP-ERP platform" by Diptikalyan Saha, Neelamadhav Gantayat, Senthil Mani, and Barry Mitchell. ESEC/FSE 2017, DOI: 10.1145/3106237.3117765.
With the omnipresence of mobile devices, coupled with recent advances in automatic speech recognition, there has been a growing demand for natural language query (NLQ) interfaces for retrieving information from knowledge bases. Business users find this particularly useful, as an NLQ interface enables them to ask questions without knowing the query language or the data schema. In this paper, we apply an existing research technology, "ATHENA: An Ontology-Driven System for Natural Language Querying over Relational Data Stores", in the industrial domain of SAP-ERP systems. The goal is to enable users to query SAP-ERP data using natural language. We present the challenges of such a technology transfer and our solutions to them, and we report the effectiveness of the natural language query interface on a set of questions provided by SAP practitioners.
{"title":"Natural language querying in SAP-ERP platform","authors":"Diptikalyan Saha, Neelamadhav Gantayat, Senthil Mani, Barry Mitchell","doi":"10.1145/3106237.3117765","DOIUrl":"https://doi.org/10.1145/3106237.3117765","url":null,"abstract":"With the omnipresence of mobile devices coupled with recent advances in automatic speech recognition capabilities, there has been a growing demand for natural language query (NLQ) interface to retrieve information from the knowledge bases. Business users particularly find this useful as NLQ interface enables them to ask questions without the knowledge of the query language or the data schema. In this paper, we apply an existing research technology called ``ATHENA: An Ontology-Driven System for Natural Language Querying over Relational Data Stores'' in the industry domain of SAP-ERP systems. The goal is to enable users to query SAP-ERP data using natural language. We present the challenges and their solutions of such a technology transfer. We present the effectiveness of the natural language query interface on a set of questions given by a set of SAP practitioners.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130393673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Failure-directed program trimming" by Kostas Ferles, Valentin Wüstholz, M. Christakis, and Işıl Dillig. ESEC/FSE 2017, DOI: 10.1145/3106237.3106249.
This paper describes a new program simplification technique called program trimming that aims to improve the scalability and precision of safety checking tools. Given a program P, program trimming generates a new program P' such that P and P' are equi-safe (i.e., P' has a bug if and only if P has a bug), but P' has fewer execution paths than P. Since many program analyzers are sensitive to the number of execution paths, program trimming has the potential to improve the effectiveness of safety checking tools. In addition to introducing the concept of program trimming, this paper also presents a lightweight static analysis that can be used as a pre-processing step to remove program paths while retaining equi-safety. We have implemented the proposed technique in a tool called Trimmer and evaluate it in the context of two program analysis techniques, namely abstract interpretation and dynamic symbolic execution. Our experiments show that program trimming significantly improves the effectiveness of both techniques.
{"title":"Failure-directed program trimming","authors":"Kostas Ferles, Valentin Wüstholz, M. Christakis, Işıl Dillig","doi":"10.1145/3106237.3106249","DOIUrl":"https://doi.org/10.1145/3106237.3106249","url":null,"abstract":"This paper describes a new program simplification technique called program trimming that aims to improve the scalability and precision of safety checking tools. Given a program P, program trimming generates a new program P' such that P and P' are equi-safe (i.e., P' has a bug if and only if P has a bug), but P' has fewer execution paths than P. Since many program analyzers are sensitive to the number of execution paths, program trimming has the potential to improve the effectiveness of safety checking tools. In addition to introducing the concept of program trimming, this paper also presents a lightweight static analysis that can be used as a pre-processing step to remove program paths while retaining equi-safety. We have implemented the proposed technique in a tool called Trimmer and evaluate it in the context of two program analysis techniques, namely abstract interpretation and dynamic symbolic execution. Our experiments show that program trimming significantly improves the effectiveness of both techniques.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126812243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Serverless computing: economic and architectural impact" by Gojko Adzic and Robert Chatley. ESEC/FSE 2017, DOI: 10.1145/3106237.3117767.
Amazon Web Services unveiled its "Lambda" platform in late 2014. Since then, each of the major cloud computing infrastructure providers has released services supporting a similar style of deployment and operation, where rather than deploying and running monolithic services or dedicated virtual machines, users deploy individual functions and pay only for the time their code is actually executing. These technologies are gathered under the marketing term "serverless", and the providers suggest that they have the potential to significantly change how client/server applications are designed, developed, and operated. This paper presents two industrial case studies of early adopters, showing how migrating an application to the Lambda deployment architecture reduced hosting costs by between 66% and 95%, and discusses how further adoption of this trend might influence common software architecture design practices.
{"title":"Serverless computing: economic and architectural impact","authors":"Gojko Adzic, Robert Chatley","doi":"10.1145/3106237.3117767","DOIUrl":"https://doi.org/10.1145/3106237.3117767","url":null,"abstract":"Amazon Web Services unveiled their \"Lambda\" platform in late 2014. Since then, each of the major cloud computing infrastructure providers has released services supporting a similar style of deployment and operation, where rather than deploying and running monolithic services, or dedicated virtual machines, users are able to deploy individual functions, and pay only for the time that their code is actually executing. These technologies are gathered together under the marketing term \"serverless\" and the providers suggest that they have the potential to significantly change how client/server applications are designed, developed and operated. This paper presents two case industrial studies of early adopters, showing how migrating an application to the Lambda deployment architecture reduced hosting costs - by between 66% and 95% - and discusses how further adoption of this trend might influence common software architecture design practices.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113957422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Software engineering research results in industrial practice: a tale of two projects (invited talk)" by W. Emmerich. ESEC/FSE 2017, DOI: 10.1145/3106237.3121273.
In this talk, I will discuss the use of software engineering research results in industrial practice, based on two projects I have been involved with. The first project addressed the challenge that manipulations of financial market data had to be expressed precisely for a large number of different financial markets. The challenge was addressed by defining a functional domain-specific language (DSL) geared towards expressing these manipulations at a high level of abstraction. An environment that implements the DSL was built on the Eclipse platform, together with a compiler that generates a Java-based reference implementation of these manipulations. The implementation is used as a test oracle to generate test cases, which are in turn used to validate a soft real-time system that implements these manipulations. In another project that is still ongoing, I have proposed the use of software product line research to engineer a family of mobile banking applications. I will reflect on the experience of integrating software product line principles with modern agile development practices. I will then discuss a few areas of software engineering research that I have personally been involved in but have found not to be very useful in practice. I will conclude by outlining some topics where novel research results would be very beneficial from an industrial point of view.
"Constraint normalization and parameterized caching for quantitative program analysis" by Tegan Brennan, Nestan Tsiskaridze, Nicolás Rosner, Abdulbaki Aydin, and T. Bultan. ESEC/FSE 2017, DOI: 10.1145/3106237.3106303.
Symbolic program analysis techniques rely on satisfiability-checking constraint solvers, while quantitative program analysis techniques rely on model-counting constraint solvers. Hence, the efficiency of satisfiability checking and model counting is crucial for the efficiency of modern program analysis techniques. In this paper, we present a constraint caching framework to expedite potentially expensive satisfiability and model-counting queries. Integral to this framework is our new constraint normalization procedure, under which the cardinality of the solution set of a constraint, but not necessarily the solution set itself, is preserved. We extend these constraint normalization techniques to string constraints in order to support analysis of string-manipulating code. A group-theoretic framework that generalizes earlier results on constraint normalization is used to express our normalization techniques. We also present a parameterized caching approach where, in addition to storing the result of a model-counting query, we store a model-counter object in the constraint store that allows us to efficiently recount the number of satisfying models for different maximum bounds. We implement our caching framework in our tool Cashew, which is built as an extension of the Green caching framework, and integrate it with the symbolic execution tool Symbolic PathFinder (SPF) and the model-counting constraint solver ABC. Our experiments show that constraint caching can significantly improve the performance of symbolic and quantitative program analyses. For instance, Cashew can normalize the 10,104 unique constraints in the SMC/Kaluza benchmark down to 394 normal forms, achieving a 10x speedup on the SMC/Kaluza-Big dataset and an average 3x speedup in our SPF-based side-channel analysis experiments.
{"title":"Constraint normalization and parameterized caching for quantitative program analysis","authors":"Tegan Brennan, Nestan Tsiskaridze, Nicolás Rosner, Abdulbaki Aydin, T. Bultan","doi":"10.1145/3106237.3106303","DOIUrl":"https://doi.org/10.1145/3106237.3106303","url":null,"abstract":"Symbolic program analysis techniques rely on satisfiability-checking constraint solvers, while quantitative program analysis techniques rely on model-counting constraint solvers. Hence, the efficiency of satisfiability checking and model counting is crucial for efficiency of modern program analysis techniques. In this paper, we present a constraint caching framework to expedite potentially expensive satisfiability and model-counting queries. Integral to this framework is our new constraint normalization procedure under which the cardinality of the solution set of a constraint, but not necessarily the solution set itself, is preserved. We extend these constraint normalization techniques to string constraints in order to support analysis of string-manipulating code. A group-theoretic framework which generalizes earlier results on constraint normalization is used to express our normalization techniques. We also present a parameterized caching approach where, in addition to storing the result of a model-counting query, we also store a model-counter object in the constraint store that allows us to efficiently recount the number of satisfying models for different maximum bounds. We implement our caching framework in our tool Cashew, which is built as an extension of the Green caching framework, and integrate it with the symbolic execution tool Symbolic PathFinder (SPF) and the model-counting constraint solver ABC. Our experiments show that constraint caching can significantly improve the performance of symbolic and quantitative program analyses. For instance, Cashew can normalize the 10,104 unique constraints in the SMC/Kaluza benchmark down to 394 normal forms, achieve a 10x speedup on the SMC/Kaluza-Big dataset, and an average 3x speedup in our SPF-based side-channel analysis experiments.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128912420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Discovering relational specifications" by Calvin Smith, G. Ferns, and Aws Albarghouthi. ESEC/FSE 2017, DOI: 10.1145/3106237.3106279.
Formal specifications of library functions play a critical role in a number of program analysis and development tasks. We present Bach, a technique for discovering likely relational specifications from data describing the input-output behavior of a set of functions comprising a library or a program. Relational specifications correlate different executions of different functions; examples include commutativity, transitivity, and the equivalence of two functions. Bach combines novel insights from program synthesis and databases to discover a rich array of specifications. We apply Bach to learn specifications from data generated for a number of standard libraries. Our experimental evaluation demonstrates Bach's ability to learn useful and deep specifications in a small amount of time.
{"title":"Discovering relational specifications","authors":"Calvin Smith, G. Ferns, Aws Albarghouthi","doi":"10.1145/3106237.3106279","DOIUrl":"https://doi.org/10.1145/3106237.3106279","url":null,"abstract":"Formal specifications of library functions play a critical role in a number of program analysis and development tasks. We present Bach, a technique for discovering likely relational specifications from data describing input-output behavior of a set of functions comprising a library or a program. Relational specifications correlate different executions of different functions; for instance, commutativity, transitivity, equivalence of two functions, etc. Bach combines novel insights from program synthesis and databases to discover a rich array of specifications. We apply Bach to learn specifications from data generated for a number of standard libraries. Our experimental evaluation demonstrates Bach's ability to learn useful and deep specifications in a small amount of time.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115433704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Automatically analyzing groups of crashes for finding correlations" by M. Castelluccio, Carlo Sansone, L. Verdoliva, and G. Poggi. ESEC/FSE 2017, DOI: 10.1145/3106237.3106306.
We devised an algorithm, inspired by contrast-set mining algorithms such as STUCCO, to automatically find statistically significant properties (correlations) in crash groups. Many earlier works focused on improving the clustering of crashes, but, to the best of our knowledge, the problem of automatically describing the properties of a cluster of crashes has so far been unexplored. This means that developers currently spend a fair amount of time analyzing the groups themselves, which in turn means that (a) they are not spending their time actually developing a fix for the crash, and (b) they might miss something in their exploration of the crash data (crash reports contain a large number of attributes, and it is hard and error-prone to analyze everything manually). Our algorithm helps developers and release managers understand crash reports more easily and in an automated way, helping to pinpoint the root cause of the crash. The tool implementing the algorithm has been deployed on Mozilla's crash reporting service.
{"title":"Automatically analyzing groups of crashes for finding correlations","authors":"M. Castelluccio, Carlo Sansone, L. Verdoliva, G. Poggi","doi":"10.1145/3106237.3106306","DOIUrl":"https://doi.org/10.1145/3106237.3106306","url":null,"abstract":"We devised an algorithm, inspired by contrast-set mining algorithms such as STUCCO, to automatically find statistically significant properties (correlations) in crash groups. Many earlier works focused on improving the clustering of crashes but, to the best of our knowledge, the problem of automatically describing properties of a cluster of crashes is so far unexplored. This means developers currently spend a fair amount of time analyzing the groups themselves, which in turn means that a) they are not spending their time actually developing a fix for the crash; and b) they might miss something in their exploration of the crash data (there is a large number of attributes in crash reports and it is hard and error-prone to manually analyze everything). Our algorithm helps developers and release managers understand crash reports more easily and in an automated way, helping in pinpointing the root cause of the crash. The tool implementing the algorithm has been deployed on Mozilla's crash reporting service.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"208 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115750008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}