NEURODIFF: Scalable Differential Verification of Neural Networks using Fine-Grained Approximation
Brandon Paulsen, Jingbo Wang, Jiawei Wang, Chao Wang
DOI: 10.1145/3324884.3416560

As neural networks make their way into safety-critical systems, where misbehavior can lead to catastrophes, there is a growing interest in certifying the equivalence of two structurally similar neural networks - a problem known as differential verification. For example, compression techniques are often used in practice for deploying trained neural networks on computationally- and energy-constrained devices, which raises the question of how faithfully the compressed network mimics the original network. Unfortunately, existing methods either focus on verifying a single network or rely on loose approximations to prove the equivalence of two networks. Due to overly conservative approximation, differential verification lacks scalability in terms of both accuracy and computational cost. To overcome these problems, we propose NEURODIFF, a symbolic and fine-grained approximation technique that drastically increases the accuracy of differential verification on feed-forward ReLU networks while achieving many orders-of-magnitude speedup. NEURODIFF makes two key contributions. The first is new convex approximations that more accurately bound the difference of two networks under all possible inputs. The second is the judicious use of symbolic variables to represent neurons whose difference bounds have accumulated significant error. We find that these two techniques are complementary, i.e., when combined, the benefit is greater than the sum of their individual benefits. We have evaluated NEURODIFF on a variety of differential verification tasks. Our results show that NEURODIFF is up to 1000X faster and 5X more accurate than the state-of-the-art tool.

Finding Ethereum Smart Contracts Security Issues by Comparing History Versions
Jiachi Chen
DOI: 10.1145/3324884.3418923

Smart contracts are Turing-complete programs running on the blockchain. They cannot be modified, even when bugs are detected. The selfdestruct function is the only way to destroy a contract on the blockchain and transfer all the Ether in the contract balance. Thus, many developers use this function to destroy a contract and redeploy a new one when bugs are detected. In this paper, we propose a deep learning-based method to find security issues in Ethereum smart contracts by finding the updated version of a destructed contract. After finding the updated versions, we use open card sorting to identify the security issues.

Generating Concept based API Element Comparison Using a Knowledge Graph
Yang Liu, Mingwei Liu, Xin Peng, Christoph Treude, Zhenchang Xing, Xiaoxin Zhang
DOI: 10.1145/3324884.3416628

Developers are concerned with the comparison of similar APIs in terms of their commonalities and (often subtle) differences. Our empirical study of Stack Overflow questions and API documentation confirms that API comparison questions are common and can often be answered by knowledge contained in API reference documentation. Our study also identifies eight types of API statements that are useful for API comparison. Based on these findings, we propose a knowledge graph based approach APIComp that automatically extracts API knowledge from API reference documentation to support the comparison of a pair of API classes or methods from different aspects. Our approach includes an offline phase for constructing an API knowledge graph, and an online phase for generating an API comparison result for a given pair of API elements. Our evaluation shows that the quality of different kinds of extracted knowledge in the API knowledge graph is generally high. Furthermore, the comparison results generated by APIComp are significantly better than those generated by a baseline approach based on heuristic rules and text similarity, and our generated API comparison results are useful for helping developers in API selection tasks.

Stay Professional and Efficient: Automatically Generate Titles for Your Bug Reports
Songqiang Chen, Xiaoyuan Xie, Bangguo Yin, Yuanxiang Ji, Lin Chen, Baowen Xu
DOI: 10.1145/3324884.3416538

Bug reports in a repository are generally organized line by line in a list view, with their titles and other meta-data displayed. In this list view, a concise and precise title plays an important role: it enables project practitioners to quickly and correctly digest the core idea of the bug without carefully reading the corresponding details. However, the quality of bug report titles varies in open-source communities, which may be due to the limited time and lack of professionalism of the authors. To help report authors efficiently draft good-quality titles, we propose a method, named iTAPE, to automatically generate titles for their bug reports. iTAPE formulates title generation as a one-sentence summarization task. By properly tackling two domain-specific challenges (i.e., the lack of an off-the-shelf dataset and the handling of low-frequency human-named tokens), iTAPE then generates titles using a Seq2Seq-based model. A comprehensive experimental study shows that iTAPE obtains fairly satisfactory results, both in comparison with three recent one-sentence summarization approaches and according to feedback from a human evaluation.

Automated Implementation of Windows-related Security-Configuration Guides
Patrick Stöckle, Bernd Grobauer, A. Pretschner
DOI: 10.1145/3324884.3416540

Hardening is the process of configuring IT systems to ensure the security of the systems' components and the data they process or store. The complexity of contemporary IT infrastructures, however, renders manual security hardening and maintenance a daunting task. In many organizations, security-configuration guides expressed in SCAP (the Security Content Automation Protocol) are used as a basis for hardening, but these guides by themselves provide no means for automatically implementing the required configurations. In this paper, we propose an approach to automatically extract the relevant information from publicly available security-configuration guides for Windows operating systems using natural language processing. In a second step, the extracted information is verified against the available settings stored in the Windows Administrative Template files, in which the majority of Windows configuration settings are defined. We show that our implementation of this approach can extract and implement 83% of the rules without any manual effort and 96% with minimal manual effort. Furthermore, we conduct a study with 12 state-of-the-art guides consisting of 2014 rules with automatic checks and show that our tooling can implement at least 97% of them correctly. We have thus significantly reduced the effort of securing systems based on existing security-configuration guides.
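
To make the pipeline concrete, here is a heavily simplified Python sketch: a regular expression pulls the setting name and desired value out of a natural-language rule, a hand-written table stands in for the information that NLP extraction plus the Administrative Template files would provide, and the output is a PowerShell command. The rule format, the registry mapping, and the helper names are all hypothetical.

    import re

    RULE = re.compile(r"[Ss]et '(?P<setting>[^']+)' to '(?P<value>[^']+)'")

    REGISTRY_MAP = {   # hypothetical excerpt of a setting-to-registry mapping
        "Interactive logon: Do not display last user name": (
            r"HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System",
            "DontDisplayLastUserName",
            {"Enabled": 1, "Disabled": 0},
        ),
    }

    def implement(rule_text):
        """Turn one rule sentence into a PowerShell command, or None if unsupported."""
        match = RULE.search(rule_text)
        entry = REGISTRY_MAP.get(match["setting"]) if match else None
        if entry is None:
            return None                        # rule needs manual treatment
        path, name, values = entry
        return (f"Set-ItemProperty -Path '{path}' -Name '{name}' "
                f"-Value {values[match['value']]}")

    print(implement("Set 'Interactive logon: Do not display last user name' to 'Enabled'"))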

Towards Programming and Verification for Activity-Oriented Smart Home Systems
Xuansong Li, Wei Song, X. Zhang
DOI: 10.1145/3324884.3418906

Smart home systems are becoming increasingly popular. Software engineering of such systems hence becomes a prominent challenge. In this engineering paradigm, users are often interested in considering sensor states while they are performing various activities. Existing work has made initial efforts toward an incremental development method for activity-oriented requirements. However, there is no systematic way of ensuring the reliability and security of such systems, which may be developed by various developers and may execute in complex environments. Some properties, especially those including timing constraints, need to be satisfied. In this paper, we introduce Actom, a framework for the identification of activity-oriented requirements and runtime verification. Actom supports the development of the mapping between activities and required sensor readings (activity-sensor mapping). At runtime, Actom receives results of activity recognition and is able to trigger actuators to provide the required physical conditions for the activities, as determined by the activity-sensor mapping. Moreover, Actom continuously monitors whether the activity-sensor mapping holds over a time period during the activity. We also discuss the evaluation plan to demonstrate the effectiveness and efficiency of Actom. The end product will be a systematic framework to facilitate the development of activity-oriented requirements and monitor properties with timing constraints to improve reliability and security.

SAT-Based Arithmetic Support for Alloy
César Cornejo
DOI: 10.1145/3324884.3415285

Formal specifications in Alloy are organized around user-defined data domains, associated with signatures, with almost no support for built-in datatypes. This minimality in the built-in datatypes provided by the language is one of its main features, as it contributes to the automated analyzability of models. One of the few built-in datatypes available in Alloy specifications is integers, whose SAT-based treatment allows only for small bit-widths. In many contexts, where relational datatypes dominate, the use of integers may be auxiliary, e.g., in the use of cardinality constraints and other features. However, as the applications of Alloy increase, e.g., with the use of the language and its tool support as a backend engine for different analysis tasks, the provision of efficient support for numerical datatypes becomes a necessity. In this work, we present our current preliminary approach to providing an efficient, scalable and user-friendly extension to Alloy with arithmetic support for numerical datatypes. Our extension allows for arithmetic with varying precisions and is implemented via standard Alloy constructions, thus resorting to SAT solving for resolving arithmetic constraints in models.

Learning to Handle Exceptions
Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Yanjun Pu, Xudong Liu
DOI: 10.1145/3324884.3416568

Exception handling is an important built-in feature of many modern programming languages such as Java. It allows developers to use try-catch blocks to deal in advance with abnormal or unexpected conditions that may occur at runtime. Missing or improper implementation of exception handling can cause catastrophic consequences such as system crashes. However, previous studies reveal that developers are unwilling to adopt the exception handling mechanism, or find it hard to do so, and tend to ignore it until a system failure forces them to act. To help developers with exception handling, existing work produces recommendations such as code examples and exception types, which still requires developers to localize the try blocks and modify the catch block code to fit the context. In this paper, we propose a novel neural approach to automated exception handling, which can predict the locations of try blocks and automatically generate the complete catch blocks. We collect a large number of Java methods from GitHub and conduct experiments to evaluate our approach. The evaluation results, including quantitative measurement and human evaluation, show that our approach is highly effective and outperforms all baselines. Our work takes one step further towards automated exception handling.

No Strings Attached: An Empirical Study of String-related Software Bugs
A. Eghbali, Michael Pradel
DOI: 10.1145/3324884.3416576

Strings play many roles in programming because they often contain complex and semantically rich information. For example, programmers use strings to filter inputs via regular expression matching, to express the names of program elements accessed through some form of reflection, to embed code written in another formal language, and to assemble textual output produced by a program. The omnipresence of strings leads to a wide range of mistakes that developers may make, yet little is currently known about these mistakes. The lack of knowledge about string-related bugs leads to developers repeating the same mistakes again and again, and to poor support for finding and fixing such bugs. This paper presents the first empirical study of the root causes, consequences, and other properties of string-related bugs. We systematically study 204 string-related bugs in a diverse set of projects written in JavaScript, a language where strings play a particularly important role. Our findings include (i) that many string-related mistakes are caused by a recurring set of root cause patterns, such as incorrect string literals and regular expressions, (ii) that string-related bugs have a diverse set of consequences, including incorrect output or silent omission of expected behavior, (iii) that fixing string-related bugs often requires changing just a single line, with many of the required repair ingredients available in the surrounding code, (iv) that string-related bugs occur across all parts of applications, including the core components, and (v) that almost none of these bugs are detected by existing static analyzers. Our findings not only show the importance and prevalence of string-related bugs, but they help developers to avoid common mistakes and tool builders to tackle the challenge of finding and fixing string-related bugs.

Trace-Checking Signal-based Temporal Properties: A Model-Driven Approach
Chaima Boufaied, C. Menghi, D. Bianculli, Yago Isasi Parache
DOI: 10.1145/3324884.3416631

Signal-based temporal properties (SBTPs) characterize the behavior of a system when its inputs and outputs are signals over time; they are very common for the requirements specification of cyber-physical systems. Although there exist several specification languages for expressing SBTPs, such languages either do not easily allow the specification of important types of properties (such as spike or oscillatory behaviors), or are not supported by (efficient) trace-checking procedures. In this paper, we propose SB-TemPsy, a novel model-driven trace-checking approach for SBTPs. SB-TemPsy provides (i) SB-TemPsy-DSL, a domain-specific language that allows the specification of SBTPs covering the most frequent requirement types in cyber-physical systems, and (ii) SB-TemPsy-Check, an efficient, model-driven trace-checking procedure. This procedure reduces the problem of checking an SB-TemPsy-DSL property over an execution trace to the problem of evaluating an Object Constraint Language constraint on a model of the execution trace. We evaluated our contributions by assessing the expressiveness of SB-TemPsy-DSL and the applicability of SB-TemPsy-Check using a representative industrial case study in the satellite domain. SB-TemPsy-DSL could express 97% of the requirements of our case study and SB-TemPsy-Check yielded a trace-checking verdict in 87% of the cases, with an average checking time of 48.7 s. From a practical standpoint and compared to state-of-the-art alternatives, our approach strikes a better trade-off between expressiveness and performance as it supports a large set of property types that can be checked, in most cases, within practical time limits.