Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering: Latest Papers

Edge4Real
Di Shao, Xiao Liu, Ben Cheng, Owen Wang, Thuong N. Hoang
Recognition of human behaviours, including body motions and facial expressions, plays a significant role in human-centric software engineering. However, because human behaviour recognition through video analytics is data- and computation-intensive, expensive, powerful machines are often required, which can hinder research and application in human-centric software engineering. To address this issue, this paper proposes a cost-effective human behaviour recognition system named Edge4Real, which can be easily deployed in an edge computing environment with commodity machines. Compared with existing centralised solutions, Edge4Real has three major advantages: cost-effectiveness, ease of use, and real-time operation. Specifically, Edge4Real adopts a distributed architecture in which components such as motion capturing, human behaviour recognition, data decoding and extraction, and the application of the recognition result can be deployed on separate end devices and edge nodes in an edge computing environment. Using a virtual reality application that captures a user's motion and translates it into the motion of a 3D avatar in real time, we successfully validate the effectiveness of the system and demonstrate its promising value to the research and application of human-centric software engineering. The demo video can be found at https://youtu.be/tnEshD8j-kA.
DOI: https://doi.org/10.1145/3324884.3415297 (published 2020-12-21)
Citations: 4
FlashRegex
Yeting Li, Zhiwu Xu, Jialun Cao, Haiming Chen, Tingjian Ge, S. Cheung, Haoren Zhao
Regular expressions (regexes) are widely used in different fields of computer science such as programming languages, string processing, and databases. However, existing tools for synthesizing or repairing regexes were not designed to be resilient to Regex Denial of Service (ReDoS) attacks. Specifically, if a regex has super-linear (SL) worst-case complexity, an attacker can provide carefully crafted inputs to launch ReDoS attacks. Therefore, in this paper, we propose a programming-by-example framework, FlashRegex, for generating anti-ReDoS regexes by either synthesizing or repairing them from given examples. It is the first framework that integrates regex synthesis and repair with awareness of ReDoS vulnerabilities. We present novel algorithms to deduce anti-ReDoS regexes by reducing the ambiguity of these regexes and by using Boolean Satisfiability (SAT) or Neighborhood Search (NS) techniques. We evaluate FlashRegex against five related state-of-the-art tools. The evaluation results show that our work can effectively and efficiently generate anti-ReDoS regexes from given examples, and also reveal that existing synthesis and repair tools have neglected the ReDoS vulnerabilities of regexes. Specifically, the existing synthesis and repair tools generated up to 394 ReDoS-vulnerable regexes, taking from a few seconds to more than one hour, while FlashRegex generated no SL regexes and took around five seconds. Furthermore, the evaluation results on ReDoS-vulnerable regex repair also show that FlashRegex is more capable than existing repair tools and even human experts, achieving 4 more ReDoS-invulnerable regexes after repair without trimming and resorting, highlighting the usefulness of FlashRegex in terms of generality, automation, and user-friendliness.
DOI: https://doi.org/10.1145/3324884.3416556 (published 2020-12-21)
Citations: 14
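The FlashRegex abstract ties ReDoS risk to regex ambiguity. The following is a minimal sketch of that idea only, not the paper's SAT or NS algorithms: it counts how many ways the nested-quantifier regex `(a+)+` can decompose a run of `n` letters (for `n >= 1`), and checks on short strings that the unambiguous `a+` accepts the same language. The two example regexes and both function names are assumptions chosen for illustration.

```python
import re
from itertools import product

AMBIGUOUS = re.compile(r"(a+)+")   # nested quantifier: many ways to match
UNAMBIGUOUS = re.compile(r"a+")    # same language, exactly one way to match


def count_parses(n: int) -> int:
    """Ways to split 'a' * n (n >= 1) into consecutive non-empty runs.

    Each split is one way (a+)+ can consume the string, so this measures
    how ambiguous the regex is on that input.
    """
    ways = [1] + [0] * n              # ways[0] = 1: the empty prefix
    for i in range(1, n + 1):
        ways[i] = sum(ways[:i])       # the last run may start at any j < i
    return ways[n]


def same_language(max_len: int = 6) -> bool:
    """Compare both regexes on every string over {a, b} up to max_len."""
    for length in range(max_len + 1):
        for chars in product("ab", repeat=length):
            s = "".join(chars)
            if bool(AMBIGUOUS.fullmatch(s)) != bool(UNAMBIGUOUS.fullmatch(s)):
                return False
    return True
```

The count doubles with every extra character, which is the kind of blow-up a backtracking engine can run into when a match ultimately fails, whereas `a+` offers a single decomposition for the same inputs.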
JISET
Jihyeok Park, Jihee Park, Seungmin An, Sukyoung Ryu
JavaScript was initially designed for client-side programming in web browsers, but its engine is now embedded in various kinds of host software. Despite this popularity, understanding and reasoning about JavaScript programs are challenging tasks because the JavaScript semantics is complex, especially due to its dynamic nature. Thus, researchers have made several attempts to define the formal semantics of JavaScript based on ECMAScript, the official JavaScript specification. However, the existing approaches are manual, labor-intensive, and error-prone, and all of their formal semantics target ECMAScript 5.1 (ES5.1, 2011) or earlier versions. Therefore, they are not suitable for understanding modern JavaScript language features introduced since ECMAScript 6 (ES6, 2015). Moreover, ECMAScript has been updated annually since ES6, and five releases have already followed ES5.1. To alleviate the problem, we propose JISET, a JavaScript IR-based Semantics Extraction Toolchain. It is the first tool that automatically synthesizes parsers and AST-IR translators directly from a given language specification, ECMAScript. For syntax, we develop a parser generation technique with lookahead parsing for BNFES, a variant of the extended BNF used in ECMAScript. For semantics, JISET synthesizes AST-IR translators using forward-compatible rule-based compilation. Compile rules describe how to convert each step of abstract algorithms written in a structured natural language into IRES, an Intermediate Representation that we designed for ECMAScript. For the four most recent ECMAScript versions, JISET automatically synthesized parsers for all versions and compiled 95.03% of the algorithm steps on average. After we completed the missing parts manually, the extracted core semantics of the latest ECMAScript (ES10, 2019) passed all 18,064 applicable tests. Using this first formal semantics of modern JavaScript, we found nine specification errors in ES10, which were all confirmed by the Ecma Technical Committee 39. Furthermore, we showed that JISET is forward compatible by applying it to nine feature proposals ready for inclusion in the next ECMAScript, which let us find three errors in the BigInt proposal.
DOI: https://doi.org/10.1145/3324884.3416632 (published 2020-12-21)
Citations: 15
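The rule-based compilation that the JISET abstract describes can be pictured with a toy example. The sketch below is an assumption-laden illustration: the three rules, the output syntax, and the sample steps are all hypothetical and are not taken from JISET or IRES. It only shows the general shape of matching structured-English steps against patterns and reporting how many steps were compiled automatically.

```python
import re

# Hypothetical compile rules: (pattern over a spec step, output template).
RULES = [
    (re.compile(r"^Let (\w+) be (.+)\.$"), r"let \1 = \2"),
    (re.compile(r"^If (.+), return (.+)\.$"), r"if (\1) return \2"),
    (re.compile(r"^Return (.+)\.$"), r"return \1"),
]

# Hypothetical algorithm steps written in spec-like structured English.
STEPS = [
    "Let x be ToNumber(value).",
    "If x is NaN, return 0.",
    "Return x.",
    "Perform ? Foo(x).",          # no rule covers this step
]


def compile_step(step):
    """Return the toy IR for one step, or None if no rule applies."""
    for pattern, template in RULES:
        match = pattern.match(step)
        if match:
            return match.expand(template)
    return None


def compile_algorithm(steps):
    """Compile every step and report the percentage compiled automatically."""
    compiled = [compile_step(s) for s in steps]
    done = sum(c is not None for c in compiled)
    return compiled, 100.0 * done / len(steps)
```

Steps that return `None` correspond to the parts that, per the abstract, have to be completed manually.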
MetPurity
Runze Yu, Youzhe Zhang, J. Xuan
In object-oriented programming, a method is pure if calling the method does not change object states that exist in the pre-states of the method call. Pure methods are widely used in automatic techniques, including test generation, compiler optimization, and program repair. Due to source code dependencies, it is infeasible to completely and accurately identify all pure methods. Instead, existing techniques such as ReImInfer are designed to identify a subset of pure methods accurately and mark the other methods as unknown. In this paper, we design and implement MetPurity, a learning-based tool for pure method identification. Given all methods in a project, MetPurity labels a training set via automatic program analysis and builds a binary classifier (implemented with the random forest classifier) based on the training set. This classifier is used to predict the purity of all the other methods (i.e., the unknown ones) in the same project. A preliminary evaluation on four open-source Java projects shows that MetPurity can provide a list of identified pure methods with a low error rate. Applying MetPurity to EvoSuite can increase the number of assertions that EvoSuite generates for regression testing.
DOI: https://doi.org/10.1145/3324884.3415292 (published 2020-12-21)
Citations: 2
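The purity definition in the MetPurity abstract can be made concrete with a small example. This is a sketch of the definition only, written in Python rather than Java, and it does not reproduce MetPurity's static analysis or its random forest classifier. The `Account` class and the `changes_state` helper are hypothetical names introduced for illustration.

```python
import copy


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.history = []

    def projected(self, rate):
        # Pure: only reads existing state and returns a new value.
        return self.balance * (1 + rate)

    def deposit(self, amount):
        # Impure: mutates state that existed before the call.
        self.balance += amount
        self.history.append(amount)


def changes_state(obj, method_name, *args):
    """Call a method and report whether the object's fields changed."""
    before = copy.deepcopy(vars(obj))
    getattr(obj, method_name)(*args)
    return vars(obj) != before
```

A single observed run like this can expose impurity for one input but cannot prove purity in general, which is one reason the abstract describes many methods as left unknown by existing analyses.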
TestMC
Muhammad Usman, Wenxi Wang, S. Khurshid
Model counting is the problem of finding the number of solutions to a formula over a bounded universe. This is a classic problem in computer science that has seen many recent advances in techniques and tools that tackle it. These advances have led to applications of model counting in many domains, e.g., quantitative program analysis, reliability, and security. Given the sheer complexity of the underlying problem, today's model counters employ sophisticated algorithms and heuristics, which result in complex tools that must be heavily optimized. Therefore, establishing the correctness of implementations of model counters necessitates rigorous testing. This experience paper presents an empirical study on testing industrial-strength model counters by applying the principles of differential and metamorphic testing together with bounded exhaustive input generation and input minimization. We embody these principles in the TestMC framework and apply it to test four model counters, including three state-of-the-art model counters from three different classes. Specifically, we test the exact model counters projMC and dSharp, the probabilistic exact model counter Ganak, and the probabilistic approximate model counter ApproxMC. As subjects, we use three complementary test suites of input formulas. The first suite consists of larger formulas that are derived from a wide range of real-world software design problems. The second suite consists of a bounded exhaustive set of small formulas that TestMC generated. The third suite consists of formulas generated using an off-the-shelf CNF fuzzer. TestMC found bugs in three of the four subject model counters. The bugs led to crashes, segmentation faults, incorrect model counts, and resource exhaustion by the solvers. Two of the tools were corrected following the bug reports we submitted based on our study, whereas the bugs we reported in the third tool were deemed by the tool authors not to require a fix.
DOI: https://doi.org/10.1145/3324884.3416563 (published 2020-12-21)
Citations: 6
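The testing principles named in the TestMC abstract can be sketched on two toy model counters. Everything below is an illustrative assumption rather than TestMC's implementation: formulas are lists of clauses with DIMACS-style integer literals, `count_brute` and `count_split` stand in for the tools under test, and the metamorphic relation used (adding one unconstrained variable doubles the count) is just one simple relation that holds for any correct counter.

```python
from itertools import combinations, product


def count_brute(clauses, n_vars):
    """Reference counter: enumerate all 2**n_vars assignments."""
    total = 0
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            total += 1
    return total


def count_split(clauses, n_vars):
    """Second counter: branch on variables 1..n_vars and simplify."""
    def go(cls, var):
        if any(len(c) == 0 for c in cls):
            return 0                      # an empty clause is unsatisfiable
        if var > n_vars:
            return 1                      # every clause has been satisfied
        total = 0
        for lit in (var, -var):
            # Drop satisfied clauses; remove the falsified literal elsewhere.
            reduced = [[l for l in c if l != -lit] for c in cls if lit not in c]
            total += go(reduced, var + 1)
        return total
    return go([list(c) for c in clauses], 1)


def small_formulas(n_vars=2, max_clauses=2):
    """Bounded exhaustive generation of every small CNF formula."""
    lits = [v * s for v in range(1, n_vars + 1) for s in (1, -1)]
    clauses = [c for k in range(1, len(lits) + 1) for c in combinations(lits, k)]
    for k in range(max_clauses + 1):
        for formula in combinations(clauses, k):
            yield [list(c) for c in formula]


def check(counter, reference=count_brute, n_vars=2):
    """Run differential and metamorphic checks; return the failing inputs."""
    failures = []
    for f in small_formulas(n_vars):
        if counter(f, n_vars) != reference(f, n_vars):
            failures.append(("differential", f))
        if counter(f, n_vars + 1) != 2 * counter(f, n_vars):
            failures.append(("metamorphic", f))
    return failures
```

Calling `check(count_split)` compares the branching counter against brute-force enumeration on all 121 generated formulas; because the inputs are small, any failure it reports is already close to minimal.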
PerfCI
Omar Javed, J. H. Dawes, Marta Han, G. Franzoni, A. Pfeiffer, Giles Reger, Walter Binder
Software performance testing is an essential quality assurance mechanism that can identify optimization opportunities. Automating this process requires strong tool support, especially in the case of Continuous Integration (CI), where tests need to run completely automatically and it is desirable to provide developers with actionable feedback. A lack of existing tools means that performance testing is normally left out of the scope of CI. In this paper, we propose a toolchain - PerfCI - to pave the way for developers to easily set up and carry out automated performance testing under CI. Our toolchain allows users to (1) specify performance testing tasks, (2) analyze unit tests on a variety of Python projects ranging from scripts to full-blown Flask-based web services, by extending a performance analysis framework (VyPR), and (3) evaluate performance data to get feedback on the code. We demonstrate the feasibility of our toolchain by using it on a web service running at the Compact Muon Solenoid (CMS) experiment at the world's largest particle physics laboratory - CERN. Package: source code, examples, and documentation of PerfCI are available at https://gitlab.cern.ch/omjaved/PerfCI. The tool demonstration can be viewed on YouTube: https://youtu.be/RDmXMKAlv7g. We also provide the data set used in the analysis: https://gitlab.cern.ch/omjaved/PerfCI-dataset.
DOI: https://doi.org/10.1145/3324884.3415288 (published 2020-12-21)
Citations: 13
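The specify, analyze, and evaluate workflow in the PerfCI abstract can be pictured with a very small performance check of the kind a CI job might run. This sketch does not use PerfCI or VyPR and makes no claim about their APIs; `measure`, `evaluate`, and the time budget are hypothetical names, and the clock is injectable only so that the behaviour can be checked deterministically.

```python
import time


def measure(func, repeats=5, clock=time.perf_counter):
    """Run func several times and return the elapsed time of each run."""
    samples = []
    for _ in range(repeats):
        start = clock()
        func()
        samples.append(clock() - start)
    return samples


def evaluate(samples, budget_seconds):
    """Turn raw timings into pass/fail feedback against a time budget."""
    worst = max(samples)
    return {
        "worst": worst,
        "budget": budget_seconds,
        "passed": worst <= budget_seconds,
    }
```

In a CI job, the `passed` flag could decide the exit status, for example `evaluate(measure(handler), budget_seconds=0.2)`, where `handler` is whatever unit of work the team wants to keep within budget.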
SCDetector
Yueming Wu, Deqing Zou, Shihan Dou, Sirui Yang, Wei Yang, Feng Cheng, Hong Liang, Hai Jin
Code clone detection aims to find code fragments with similar functionality and has become increasingly important in software engineering. Many approaches have been proposed to detect code clones. Among them, token-based methods are the most scalable but cannot handle semantic clones because they do not consider program semantics. To address this issue, researchers conduct program analysis to distill the program semantics into a graph representation and detect clones by matching the graphs. However, such approaches suffer from low scalability since graph matching is typically time-consuming. In this paper, we propose SCDetector to combine the scalability of token-based methods with the accuracy of graph-based methods for software functional clone detection. Given the source code of a function, we first extract the control flow graph by static analysis. Instead of using traditional heavyweight graph matching, we treat the graph as a social network and apply social-network-centrality analysis to compute the centrality of each basic block. Then we assign the centrality to each token in a basic block and sum the centrality of the same token in different basic blocks. In this way, a graph is turned into a set of tokens carrying graph details (i.e., centrality), called semantic tokens. Finally, these semantic tokens are fed into a Siamese architecture neural network to train a code clone detector. We evaluate SCDetector on two large datasets of functionally similar code. Experimental results indicate that our system is superior to four state-of-the-art methods (i.e., SourcererCC, Deckard, RtvNN, and ASTNN) and that the time cost of SCDetector is 14 times less than that of a traditional graph-based method (i.e., CCSharp) in detecting semantic clones.
DOI: https://doi.org/10.1145/3324884.3416562 (published 2020-12-21)
Citations: 39
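The semantic-token step in the SCDetector abstract (compute a centrality per basic block, give it to every token in the block, then sum per token) can be sketched as follows. The abstract does not say which centrality measure is used, so degree centrality over combined in- and out-edges is an assumption here, as are the tiny control flow graph, its tokens, and the function names. The Siamese network that consumes these tokens is not shown.

```python
from collections import defaultdict

# Hypothetical control flow graph of a small loop: blocks and their tokens.
BLOCK_TOKENS = {
    "B0": ["int", "i", "=", "0"],
    "B1": ["while", "i", "<", "n"],
    "B2": ["i", "+=", "1"],
    "B3": ["return", "i"],
}
EDGES = [("B0", "B1"), ("B1", "B2"), ("B2", "B1"), ("B1", "B3")]


def degree_centrality(edges, nodes):
    """Degree (in plus out) of each node, scaled by the node count minus 1."""
    nodes = list(nodes)
    degree = {n: 0 for n in nodes}
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    scale = max(len(nodes) - 1, 1)
    return {n: d / scale for n, d in degree.items()}


def semantic_tokens(block_tokens, edges):
    """Sum, per token, the centrality of every block the token occurs in."""
    centrality = degree_centrality(edges, block_tokens)
    weights = defaultdict(float)
    for block, tokens in block_tokens.items():
        for tok in tokens:
            weights[tok] += centrality[block]
    return dict(weights)
```

In this example the token `i` accumulates weight from all four blocks, while `while` only receives the weight of the loop header, so the resulting token weights reflect where in the graph each token sits.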
CCGraph
Y. Zou, B. Ban, Yinxing Xue, Yun Xu
Software clone detection is an active research area that is very important for software maintenance, bug detection, etc. Two pieces of cloned code reflect some similarity or equivalence in the syntax or structure of their code representations. There are many representations of code, such as AST, token, and PDG. The PDG (Program Dependency Graph) of source code can contain both syntactic and structural information. However, most existing PDG-based tools are quite time-consuming and miss many clones because they detect code clones with exact graph matching by using subgraph isomorphism. In this paper, we propose a novel PDG-based code clone detector, CCGraph, that uses graph kernels. First, we normalize the structure of PDGs and design a two-stage filtering strategy by measuring the characteristic vectors of code. Then we detect the code clones by using an approximate graph matching algorithm based on the reforming WL (Weisfeiler-Lehman) graph kernel. Experimental results show that CCGraph retains high accuracy, achieves better recall and F1-score values, and detects more semantic clones than two other related state-of-the-art tools. In addition, CCGraph is much more efficient than the existing PDG-based tools.
DOI: https://doi.org/10.1145/3324884.3416541 (published 2020-12-21)
Citations: 30
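The CCGraph abstract above mentions approximate graph matching with a Weisfeiler-Lehman (WL) graph kernel. As a rough illustration of the general idea only, here is a minimal sketch of a plain WL subtree kernel in Python. It is not the authors' reformed kernel: the graph encoding (successor lists plus a label per node), the function names `wl_features` and `wl_similarity`, and the cosine normalisation are all assumptions made for this example.

```python
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Count the node labels seen across WL refinement rounds.

    adj maps every node to a list of its successors (hypothetical encoding);
    labels maps every node to an initial string label such as a statement kind.
    """
    current = dict(labels)
    feats = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node, neighbours in adj.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neigh = sorted(current[n] for n in neighbours)
            refined[node] = current[node] + "(" + ",".join(neigh) + ")"
        current = refined
        feats.update(current.values())
    return feats


def wl_similarity(g1, g2, iterations=2):
    """Cosine-normalised WL subtree kernel between two labelled graphs."""
    f1 = wl_features(g1[0], g1[1], iterations)
    f2 = wl_features(g2[0], g2[1], iterations)
    dot = sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())
    norm = (sum(v * v for v in f1.values()) * sum(v * v for v in f2.values())) ** 0.5
    return dot / norm if norm else 0.0


# Two toy "dependency graphs" with the same shape and labels but different node names.
g_a = ({"a": ["b"], "b": ["c"], "c": []}, {"a": "decl", "b": "assign", "c": "return"})
g_b = ({"x": ["y"], "y": ["z"], "z": []}, {"x": "decl", "y": "assign", "z": "return"})
# A structurally different graph that shares only the "decl" label.
g_c = ({"p": ["q"], "q": []}, {"p": "decl", "q": "call"})

same = wl_similarity(g_a, g_b)
different = wl_similarity(g_a, g_c)
```

Because the score depends only on labels and neighbourhood structure, renaming nodes does not change it, and partially overlapping graphs receive an intermediate score instead of the all-or-nothing answer of exact subgraph isomorphism.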
OSLDetector
Dan Zhang, Ping Luo, Wei Tang, Min Zhou
Using open-source libraries provides rich functionality and reduces development cost. However, it also causes critical issues such as license conflicts and vulnerability risks. In this paper, we design and implement OSLDetector, an open-source library detection tool that uses feature matching to detect third-party libraries in binaries of multi-platform software. To cope with the challenge of feature duplication, we apply a series of methods, such as filtering features and building a novel internal clone forest. The tool can also report license conflicts and identify possible corresponding vulnerabilities, so that these potential risks can be resolved or avoided. To evaluate the efficiency of OSLDetector, we collect 5K libraries comprising 9K versions and record their respective license types and existing vulnerabilities. The experimental results, with a precision of 96% and a recall of 92.3%, show that OSLDetector is effective and outperforms similar tools.
DOI: https://doi.org/10.1145/3324884.3415303 (published 2020-12-21)
Citations: 6
BugPecker
Junming Cao, Shouliang Yang, Wenhui Jiang, Hushuang Zeng, Beijun Shen, Hao Zhong
Given a bug report for a project, the task of locating the faults it describes is called fault localization. To help programmers in this process, many approaches have been proposed, and they have achieved promising results in locating faulty files. However, locating faulty methods remains challenging, because many methods are short and do not contain sufficient detail to determine whether they are faulty. In this paper, we present BugPecker, a novel approach that locates faulty methods using deep learning on revision graphs. Its key ideas are (1) building revision graphs that capture the details of past fixes as fully as possible, and (2) discovering relations inside the revision graphs to expand the details available for methods, and calculating various features to assist ranking. We have implemented BugPecker and evaluated it on three open source projects. The early results show that BugPecker achieves a mean average precision (MAP) of 0.263 and a mean reciprocal rank (MRR) of 0.291, a significant improvement over prior approaches. For example, BugPecker improves the MAP values on all three projects fivefold compared with two recent approaches, DNNLoc-m and BLIA 1.5.
DOI: https://doi.org/10.1145/3324884.3418934 (published 2020-12-21)
Citations: 6
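The BugPecker abstract above reports results as mean average precision (MAP) and mean reciprocal rank (MRR). For readers unfamiliar with these ranking metrics, the following Python sketch shows one common way to compute them. The method identifiers and the helper names are hypothetical, and the sketch assumes average precision is normalised by the total number of truly faulty methods; the paper's exact evaluation protocol may differ.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def map_and_mrr(queries):
    """Mean of both metrics over (ranked_list, relevant_set) pairs."""
    aps = [average_precision(ranked, rel) for ranked, rel in queries]
    rrs = [reciprocal_rank(ranked, rel) for ranked, rel in queries]
    return sum(aps) / len(aps), sum(rrs) / len(rrs)


# Two hypothetical bug reports: ranked candidate methods and the truly faulty ones.
queries = [
    (["m1", "m2", "m3"], {"m2"}),        # first hit at rank 2
    (["m4", "m5", "m6"], {"m4", "m6"}),  # hits at ranks 1 and 3
]
mean_ap, mean_rr = map_and_mrr(queries)
```

In this toy input the first report contributes an average precision of 1/2 and the second (1/1 + 2/3) / 2 = 5/6, giving a MAP of 2/3. The reciprocal ranks are 1/2 and 1, giving an MRR of 0.75.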