How not to Structure Your Database-Backed Web Applications: A Study of Performance Bugs in the Wild
Junwen Yang, Pranav Subramaniam, Shan Lu, Cong Yan, Alvin Cheung
ICSE 2018, pp. 800–810. DOI: 10.1145/3180155.3180194

Many web applications use databases for persistent data storage, and Object-Relational Mapping (ORM) frameworks are a common way to develop such database-backed web applications. Unfortunately, developing efficient ORM applications is challenging, as the ORM framework hides the underlying database query generation and execution. This problem is becoming more severe as these applications need to process an increasingly large amount of persistent data. Recent research has targeted specific aspects of performance problems in ORM applications. However, there has not been any systematic study of the common performance anti-patterns in such real-world applications, how they affect application performance, and how to remedy them. In this paper, we answer these questions through a comprehensive study of 12 representative real-world ORM applications. We generalize 9 ORM performance anti-patterns from more than 200 performance issues that we obtain by studying their bug-tracking systems and profiling their latest versions. To demonstrate the impact of these anti-patterns, we manually fix 64 performance issues in the applications' latest versions and obtain a median speedup of 2× (up to 39×), with fewer than 5 lines of code changed in most cases. Many of the issues we found have been confirmed by developers, and we have implemented techniques to identify other code fragments with similar issues.
{"title":"How not to Structure Your Database-Backed Web Applications: A Study of Performance Bugs in the Wild","authors":"Junwen Yang, Pranav Subramaniam, Shan Lu, Cong Yan, Alvin Cheung","doi":"10.1145/3180155.3180194","DOIUrl":"https://doi.org/10.1145/3180155.3180194","url":null,"abstract":"Many web applications use databases for persistent data storage, and using Object Relational Mapping (ORM) frameworks is a common way to develop such database-backed web applications. Unfortunately, developing efficient ORM applications is challenging, as the ORM framework hides the underlying database query generation and execution. This problem is becoming more severe as these applications need to process an increasingly large amount of persistent data. Recent research has targeted specific aspects of performance problems in ORM applications. However, there has not been any systematic study to identify common performance anti-patterns in real-world such applications, how they affect resulting application performance, and remedies for them. In this paper, we try to answer these questions through a comprehensive study of 12 representative real-world ORM applications. We generalize 9 ORM performance anti-patterns from more than 200 performance issues that we obtain by studying their bug-tracking systems and profiling their latest versions. To prove our point, we manually fix 64 performance issues in their latest versions and obtain a median speedup of 2× (and up to 39× max) with fewer than 5 lines of code change in most cases. Many of the issues we found have been confirmed by developers, and we have implemented ways to identify other code fragments with similar issues as well.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"71 1","pages":"800-810"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74745491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DeFlaker: Automatically Detecting Flaky Tests
Jonathan Bell, Owolabi Legunsen, Michael C. Hilton, Lamyaa Eloussi, Tifany Yung, D. Marinov
ICSE 2018, pp. 433–444. DOI: 10.1145/3180155.3180164
Developers often run tests to check that their latest changes to a code repository did not break any previously working functionality. Ideally, any new test failure would indicate a regression caused by the latest changes. However, some test failures may be due not to the latest changes but to non-determinism in the tests, popularly called flaky tests. The typical way to detect flaky tests is to rerun failing tests repeatedly. Unfortunately, rerunning failing tests can be costly and can slow down the development cycle. We present the first extensive evaluation of rerunning failing tests and propose a new technique, called DeFlaker, that detects whether a test failure is due to a flaky test, without reruns and with very low runtime overhead. DeFlaker monitors the coverage of the latest code changes and marks as flaky any newly failing test that did not execute any of those changes. We deployed DeFlaker live, in the build process of 96 Java projects on Travis CI, and found 87 previously unknown flaky tests in 10 of these projects. We also ran experiments on project histories, where DeFlaker detected 1,874 flaky tests from 4,846 failures, with a low false-alarm rate (1.5%). DeFlaker had a higher recall of confirmed flaky tests (95.5% vs. 23%) than Maven's default flaky-test detector.
{"title":"DeFlaker: Automatically Detecting Flaky Tests","authors":"Jonathan Bell, Owolabi Legunsen, Michael C Hilton, Lamyaa Eloussi, Tifany Yung, D. Marinov","doi":"10.1145/3180155.3180164","DOIUrl":"https://doi.org/10.1145/3180155.3180164","url":null,"abstract":"Developers often run tests to check that their latest changes to a code repository did not break any previously working functionality. Ideally, any new test failures would indicate regressions caused by the latest changes. However, some test failures may not be due to the latest changes but due to non-determinism in the tests, popularly called flaky tests. The typical way to detect flaky tests is to rerun failing tests repeatedly. Unfortunately, rerunning failing tests can be costly and can slow down the development cycle. We present the first extensive evaluation of rerunning failing tests and propose a new technique, called DeFlaker, that detects if a test failure is due to a flaky test without rerunning and with very low runtime overhead. DeFlaker monitors the coverage of latest code changes and marks as flaky any newly failing test that did not execute any of the changes. We deployed DeFlaker live, in the build process of 96 Java projects on TravisCI, and found 87 previously unknown flaky tests in 10 of these projects. We also ran experiments on project histories, where DeFlaker detected 1,874 flaky tests from 4,846 failures, with a low false alarm rate (1.5%). DeFlaker had a higher recall (95.5% vs. 23%) of confirmed flaky tests than Maven's default flaky test detector.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"88 1","pages":"433-444"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78174893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate and Efficient Refactoring Detection in Commit History
Nikolaos Tsantalis, Matin Mansouri, L. Eshkevari, D. Mazinanian, Danny Dig
ICSE 2018, pp. 483–494. DOI: 10.1145/3180155.3180206
Refactoring detection algorithms have been crucial to a variety of applications: (i) empirical studies of the evolution of code, tests, and faults, (ii) tools for library API migration, (iii) improving the comprehension of changes and code reviews, etc. However, recent research has questioned the accuracy of the state-of-the-art refactoring detection tools, which threatens the reliability of these applications. Moreover, previous refactoring detection tools are very sensitive to user-provided similarity thresholds, which further reduces their practical accuracy. In addition, their requirement to build the project versions/revisions under analysis makes them inapplicable in many real-world scenarios. To reinvigorate a previously fruitful line of research that has stalled, we designed, implemented, and evaluated RMiner, a technique that overcomes the above limitations. At the heart of RMiner is an AST-based statement matching algorithm that determines refactoring candidates without requiring user-defined thresholds. To empirically evaluate RMiner, we created the most comprehensive oracle to date, using triangulation to build a dataset with considerably reduced bias that represents 3,188 refactorings from 185 open-source projects. Using this oracle, we found that RMiner has a precision of 98% and a recall of 87%, a significant improvement over the previous state of the art.
{"title":"Accurate and Efficient Refactoring Detection in Commit History","authors":"Nikolaos Tsantalis, Matin Mansouri, L. Eshkevari, D. Mazinanian, Danny Dig","doi":"10.1145/3180155.3180206","DOIUrl":"https://doi.org/10.1145/3180155.3180206","url":null,"abstract":"Refactoring detection algorithms have been crucial to a variety of applications: (i) empirical studies about the evolution of code, tests, and faults, (ii) tools for library API migration, (iii) improving the comprehension of changes and code reviews, etc. However, recent research has questioned the accuracy of the state-of-the-art refactoring detection tools, which poses threats to the reliability of their application. Moreover, previous refactoring detection tools are very sensitive to user-provided similarity thresholds, which further reduces their practical accuracy. In addition, their requirement to build the project versions/revisions under analysis makes them inapplicable in many real-world scenarios. To reinvigorate a previously fruitful line of research that has stifled, we designed, implemented, and evaluated RMiner, a technique that overcomes the above limitations. At the heart of RMiner is an AST-based statement matching algorithm that determines refactoring candidates without requiring user-defined thresholds. To empirically evaluate RMiner, we created the most comprehensive oracle to date that uses triangulation to create a dataset with considerably reduced bias, representing 3,188 refactorings from 185 open-source projects. Using this oracle, we found that RMiner has a precision of 98% and recall of 87%, which is a significant improvement over the previous state-of-the-art.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"176 1","pages":"483-494"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73960138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Repairing Crashes in Android Apps
Shin Hwei Tan, Zhen Dong, Xiang Gao, Abhik Roychoudhury
ICSE 2018, pp. 187–198. DOI: 10.1145/3180155.3180243

Android apps are omnipresent and frequently suffer from crashes, leading to poor user experience and economic loss. Past work has focused on automated test generation to detect crashes in Android apps; automated repair of those crashes, however, has not been studied. In this paper, we propose the first approach to automatically repairing Android apps, specifically a technique for fixing crashes. Unlike most test-based repair approaches, we do not need a test suite; instead, a single failing test is meticulously analyzed for crash locations and the reasons behind the crash. Our approach hinges on a careful empirical study that establishes common root causes of crashes in Android apps and distills their remedies into eight generic transformation operators. These operators are applied using a search-based repair framework embodied in our repair tool Droix. We also prepare a benchmark, DroixBench, capturing reproducible crashes in Android apps. Our evaluation of Droix on DroixBench reveals that the automatically produced patches are often syntactically identical to the human patch, and on some rare occasions even better than the human patch (in terms of avoiding regressions). These results confirm our intuition that our proposed transformations form a sufficient set of operators to patch crashes in Android.
{"title":"Repairing Crashes in Android Apps","authors":"Shin Hwei Tan, Zhen Dong, Xiang Gao, Abhik Roychoudhury","doi":"10.1145/3180155.3180243","DOIUrl":"https://doi.org/10.1145/3180155.3180243","url":null,"abstract":"Android apps are omnipresent, and frequently suffer from crashes — leading to poor user experience and economic loss. Past work focused on automated test generation to detect crashes in Android apps. However, automated repair of crashes has not been studied. In this paper, we propose the first approach to automatically repair Android apps, specifically we propose a technique for fixing crashes in Android apps. Unlike most test-based repair approaches, we do not need a test-suite; instead a single failing test is meticulously analyzed for crash locations and reasons behind these crashes. Our approach hinges on a careful empirical study which seeks to establish common root-causes for crashes in Android apps, and then distills the remedy of these root-causes in the form of eight generic transformation operators. These operators are applied using a search-based repair framework embodied in our repair tool Droix. We also prepare a benchmark DroixBench capturing reproducible crashes in Android apps. Our evaluation of Droix on DroixBench reveals that the automatically produced patches are often syntactically identical to the human patch, and on some rare occasion even better than the human patch (in terms of avoiding regressions). These results confirm our intuition that our proposed transformations form a sufficient set of operators to patch crashes in Android.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"14 1","pages":"187-198"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80474148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Program Splicing
Yanxin Lu, Swarat Chaudhuri, C. Jermaine, David Melski
ICSE 2018, pp. 338–349. DOI: 10.1145/3180155.3180190
We introduce program splicing, a programming methodology that aims to automate the workflow of copying, pasting, and modifying code available online. Here, the programmer starts by writing a "draft" that mixes unfinished code, natural language comments, and correctness requirements. A program synthesizer that interacts with a large, searchable database of program snippets is used to automatically complete the draft into a program that meets the requirements. The synthesis process happens in two stages. First, the synthesizer identifies a small number of programs in the database that are relevant to the synthesis task. Next, it uses an enumerative search to systematically fill the draft with expressions and statements from these relevant programs. The resulting program is returned to the programmer, who can modify it and possibly invoke additional rounds of synthesis. We present an implementation of program splicing, called Splicer, for the Java programming language. Splicer uses a corpus of over 3.5 million procedures from an open-source software repository. Our evaluation uses the system in a suite of everyday programming tasks, and includes a comparison with a state-of-the-art competing approach as well as a user study. The results point to the broad scope and scalability of program splicing and indicate that the approach can significantly boost programmer productivity.
{"title":"Program Splicing","authors":"Yanxin Lu, Swarat Chaudhuri, C. Jermaine, David Melski","doi":"10.1145/3180155.3180190","DOIUrl":"https://doi.org/10.1145/3180155.3180190","url":null,"abstract":"We introduce program splicing, a programming methodology that aims to automate the work ow of copying, pasting, and modifying code available online. Here, the programmer starts by writing a \"draft\" that mixes un nished code, natural language comments, and correctness requirements. A program synthesizer that interacts with a large, searchable database of program snippets is used to automatically complete the draft into a program that meets the re-quirements. The synthesis process happens in two stages. First, the synthesizer identi es a small number of programs in the database that are relevant to the synthesis task. Next it uses an enumerative search to systematically ll the draft with expressions and statements from these relevant programs. The resulting program is returned to the programmer, who can modify it and possibly invoke additional rounds of synthesis. We present an implementation of program splicing, called Splicer, for the Java programming language. Splicer uses a corpus of over 3.5 million procedures from an open-source software repository. Our evaluation uses the system in a suite of everyday programming tasks, and includes a comparison with a state-of-the-art competing approach as well as a user study. The results point to the broad scope and scalability of program splicing and indicate that the approach can signi cantly boost programmer productivity.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"49 1","pages":"338-349"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91324104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifying Design Problems in the Source Code: A Grounded Theory
L. Sousa, Anderson Oliveira, W. Oizumi, Simone Diniz Junqueira Barbosa, Alessandro F. Garcia, Jaejoon Lee, Marcos Kalinowski, R. Mello, B. Neto, R. Oliveira, C. Lucena, R. Paes
ICSE 2018, pp. 921–931. DOI: 10.1145/3180155.3180239
The prevalence of design problems may lead to the re-engineering or even the discontinuation of a software system. Due to missing, informal, or outdated design documentation, developers often have to rely on the source code to identify design problems. They must therefore analyze various symptoms that manifest across several code elements, which can quickly become a complex task. Although researchers have been investigating techniques to help developers identify design problems, there is little knowledge of how developers actually proceed when doing so. To tackle this problem, we conducted a multi-trial industrial experiment with professionals from 5 software companies to build a grounded theory. The resulting theory offers explanations of how developers identify design problems in practice. For instance, it reveals the characteristics of symptoms that developers consider helpful; moreover, developers often combine different types of symptoms to identify a single design problem. This knowledge serves as a basis for further understanding the phenomenon and advancing towards more effective identification techniques.
{"title":"Identifying Design Problems in the Source Code: A Grounded Theory","authors":"L. Sousa, Anderson Oliveira, W. Oizumi, Simone Diniz Junqueira Barbosa, Alessandro F. Garcia, Jaejoon Lee, Marcos Kalinowski, R. Mello, B. Neto, R. Oliveira, C. Lucena, R. Paes","doi":"10.1145/3180155.3180239","DOIUrl":"https://doi.org/10.1145/3180155.3180239","url":null,"abstract":"The prevalence of design problems may cause re-engineering or even discontinuation of the system. Due to missing, informal or outdated design documentation, developers often have to rely on the source code to identify design problems. Therefore, developers have to analyze different symptoms that manifest in several code elements, which may quickly turn into a complex task. Although researchers have been investigating techniques to help developers in identifying design problems, there is little knowledge on how developers actually proceed to identify design problems. In order to tackle this problem, we conducted a multi-trial industrial experiment with professionals from 5 software companies to build a grounded theory. The resulting theory offers explanations on how developers identify design problems in practice. For instance, it reveals the characteristics of symptoms that developers consider helpful. Moreover, developers often combine different types of symptoms to identify a single design problem. This knowledge serves as a basis to further understand the phenomena and advance towards more effective identification techniques.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"14 2 1","pages":"921-931"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90266951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Testing Vision-Based Control Systems Using Learnable Evolutionary Algorithms
Raja Ben Abdessalem, S. Nejati, L. Briand, Thomas Stifter
ICSE 2018, pp. 1016–1026. DOI: 10.1145/3180155.3180160
Vision-based control systems are key enablers of many autonomous vehicular systems, including self-driving cars. Testing such systems is complicated by their complex and multidimensional input spaces. We propose an automated testing algorithm that builds on learnable evolutionary algorithms, which rely on machine learning, or a combination of machine learning and Darwinian genetic operators, to guide the generation of new solutions (test scenarios in our context). Our approach combines multiobjective population-based search algorithms and decision-tree classification models to achieve two goals: first, the classification models guide the search-based generation of tests faster towards critical test scenarios (i.e., test scenarios leading to failures); second, the search algorithms refine the classification models so that they accurately characterize critical regions (i.e., the regions of a test input space that are likely to contain the most critical test scenarios). Our evaluation on an industrial automotive system shows that: (1) our algorithm outperforms a baseline evolutionary search algorithm, generating 78% more distinct, critical test scenarios; and (2) our algorithm accurately characterizes critical regions of the system under test, thus identifying the conditions that are likely to lead to system failures.
{"title":"Testing Vision-Based Control Systems Using Learnable Evolutionary Algorithms","authors":"Raja Ben Abdessalem, S. Nejati, L. Briand, Thomas Stifter","doi":"10.1145/3180155.3180160","DOIUrl":"https://doi.org/10.1145/3180155.3180160","url":null,"abstract":"Vision-based control systems are key enablers of many autonomous vehicular systems, including self-driving cars. Testing such systems is complicated by complex and multidimensional input spaces. We propose an automated testing algorithm that builds on learnable evolutionary algorithms. These algorithms rely on machine learning or a combination of machine learning and Darwinian genetic operators to guide the generation of new solutions (test scenarios in our context). Our approach combines multiobjective population-based search algorithms and decision tree classification models to achieve the following goals: First, classification models guide the search-based generation of tests faster towards critical test scenarios (i.e., test scenarios leading to failures). Second, search algorithms refine classification models so that the models can accurately characterize critical regions (i.e., the regions of a test input space that are likely to contain most critical test scenarios). Our evaluation performed on an industrial automotive automotive system shows that: (1) Our algorithm outperforms a baseline evolutionary search algorithm and generates 78% more distinct, critical test scenarios compared to the baseline algorithm. (2) Our algorithm accurately characterizes critical regions of the system under test, thus identifying the conditions that are likely to lead to system failures.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"22 1","pages":"1016-1026"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89139603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Does the Propagation of Artifact Changes Across Tasks Reflect Work Dependencies?
Christoph Mayr-Dorn, Alexander Egyed
ICSE 2018, pp. 397–407. DOI: 10.1145/3180155.3180185

Developers commonly define tasks to help coordinate software development efforts, whether feature implementation, refactoring, or bug fixes. Developers establish links between tasks to express implicit dependencies that need explicit handling: dependencies that often require the developers responsible for a given task to assess how changes in a linked task affect their own work, and vice versa (i.e., change propagation). While seemingly useful, it is unknown whether change propagation indeed coincides with task links. No study has investigated to what extent change propagation actually occurs between task pairs and whether it can serve as a metric for characterizing the underlying task dependency. In this paper, we study the temporal relationship between developers' reading and changing of source code in relation to task links. We identify seven situations that explain the varying correlation of change propagation with linked task pairs, and six motifs describing when change propagation occurs between non-linked task pairs. Our paper demonstrates that task links are indeed useful for recommending which artifacts to monitor for changes, which developers to involve in a task, or which tasks to inspect.
{"title":"Does the Propagation of Artifact Changes Across Tasks Reflect Work Dependencies?","authors":"Christoph Mayr-Dorn, Alexander Egyed","doi":"10.1145/3180155.3180185","DOIUrl":"https://doi.org/10.1145/3180155.3180185","url":null,"abstract":"Developers commonly define tasks to help coordinate software development efforts---whether they be feature implementation, refactoring, or bug fixes. Developers establish links between tasks to express implicit dependencies that needs explicit handling---dependencies that often require the developers responsible for a given task to assess how changes in a linked task affect their own work and vice versa (i.e., change propagation). While seemingly useful, it is unknown if change propagation indeed coincides with task links. No study has investigated to what extent change propagation actually occurs between task pairs and whether it is able to serve as a metric for characterizing the underlying task dependency. In this paper, we study the temporal relationship between developer reading and changing of source code in relationship to task links We identify seven situations that explain the varying correlation of change propagation with linked task pairs and find six motifs describing when change propagation occurs between non-linked task pairs. Our paper demonstrates that task links are indeed useful for recommending which artifacts to monitor for changes, which developers to involve in a task, or which tasks to inspect.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"116 1","pages":"397-407"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78230861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-Hiding Behavior in Android Apps: Detection and Characterization
Zhiyong Shan, Iulian Neamtiu, Raina Samuel
ICSE 2018, pp. 728–739. DOI: 10.1145/3180155.3180214

Applications (apps) that conceal their activities are fundamentally deceptive; app marketplaces and end users should treat such apps as suspicious. However, by its nature and intent, activity concealing is not disclosed up front, which puts users at risk. In this paper, we focus on the characterization and detection of such techniques, e.g., hiding the app or removing traces, which we call "self-hiding behavior" (SHB). SHB has not been studied per se; rather, it has been reported only as a byproduct of malware investigations. We address this gap via a study and a suite of static analyses targeted at SHB in Android apps. Specifically, we present (1) a detailed characterization of SHB, (2) a suite of static analyses to detect such behavior, and (3) a set of detectors that employ SHB to distinguish between benign and malicious apps. We show that SHB ranges from hiding the app's presence or activity to covering an app's traces, e.g., by blocking phone calls/text messages or removing calls and messages from logs. Using our static analysis tools on a large dataset of 9,452 Android apps (benign as well as malicious), we expose the frequency of 12 such SH behaviors. Our approach is effective: it has revealed that malicious apps employ 1.5 SHBs per app on average. Surprisingly, SH behavior is also employed by legitimate ("benign") apps, which can affect users negatively in multiple ways. When separating malicious from benign apps, our approach has high precision and recall (combined F-measure = 87.19%). It is also efficient, with analysis typically taking just 37 seconds per app. We believe that our findings and analysis tool are beneficial to both app marketplaces and end users.
{"title":"Self-Hiding Behavior in Android Apps: Detection and Characterization","authors":"Zhiyong Shan, Iulian Neamtiu, Raina Samuel","doi":"10.1145/3180155.3180214","DOIUrl":"https://doi.org/10.1145/3180155.3180214","url":null,"abstract":"Applications (apps) that conceal their activities are fundamentally deceptive; app marketplaces and end-users should treat such apps as suspicious. However, due to its nature and intent, activity concealing is not disclosed up-front, which puts users at risk. In this paper, we focus on characterization and detection of such techniques, e.g., hiding the app or removing traces, which we call \"self hiding behavior\" (SHB). SHB has not been studied per se – rather it has been reported on only as a byproduct of malware investigations. We address this gap via a study and suite of static analyses targeted at SH in Android apps. Specifically, we present (1) a detailed characterization of SHB, (2) a suite of static analyses to detect such behavior, and (3) a set of detectors that employ SHB to distinguish between benign and malicious apps. We show that SHB ranges from hiding the app's presence or activity to covering an app's traces, e.g., by blocking phone calls/text messages or removing calls and messages from logs. Using our static analysis tools on a large dataset of 9,452 Android apps (benign as well as malicious) we expose the frequency of 12 such SH behaviors. Our approach is effective: it has revealed that malicious apps employ 1.5 SHBs per app on average. Surprisingly, SH behavior is also employed by legitimate (\"benign\") apps, which can affect users negatively in multiple ways. When using our approach for separating malicious from benign apps, our approach has high precision and recall (combined F-measure = 87.19%). Our approach is also efficient, with analysis typically taking just 37 seconds per app. We believe that our findings and analysis tool are beneficial to both app marketplaces and end-users.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"43 1","pages":"728-739"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77118010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Code Search
Xiaodong Gu, Hongyu Zhang, Sunghun Kim
ICSE 2018, pp. 933–944. DOI: 10.1145/3180155.3180167

To implement a desired functionality, developers can reuse previously written code snippets by searching through a large-scale codebase. Over the years, many code search tools have been proposed to help developers. The existing approaches often treat source code as textual documents and utilize information retrieval models to retrieve relevant code snippets that match a given query. These approaches mainly rely on the textual similarity between source code and natural language queries; they lack a deep understanding of the semantics of queries and source code. In this paper, we propose a novel deep neural network named CODEnn (Code-Description Embedding Neural Network). Instead of matching textual similarity, CODEnn jointly embeds code snippets and natural language descriptions into a high-dimensional vector space, in such a way that a code snippet and its corresponding description have similar vectors. Using this unified vector representation, code snippets related to a natural language query can be retrieved according to their vectors, semantically related words can be recognized, and irrelevant or noisy keywords in queries can be handled. As a proof-of-concept application, we implement a code search tool named DeepCS using the proposed CODEnn model. We empirically evaluate DeepCS on a large-scale codebase collected from GitHub. The experimental results show that our approach effectively retrieves relevant code snippets and outperforms previous techniques.
{"title":"Deep Code Search","authors":"Xiaodong Gu, Hongyu Zhang, Sunghun Kim","doi":"10.1145/3180155.3180167","DOIUrl":"https://doi.org/10.1145/3180155.3180167","url":null,"abstract":"To implement a program functionality, developers can reuse previously written code snippets by searching through a large-scale codebase. Over the years, many code search tools have been proposed to help developers. The existing approaches often treat source code as textual documents and utilize information retrieval models to retrieve relevant code snippets that match a given query. These approaches mainly rely on the textual similarity between source code and natural language query. They lack a deep understanding of the semantics of queries and source code. In this paper, we propose a novel deep neural network named CODEnn (Code-Description Embedding Neural Network). Instead of matching text similarity, CODEnn jointly embeds code snippets and natural language descriptions into a high-dimensional vector space, in such a way that code snippet and its corresponding description have similar vectors. Using the unified vector representation, code snippets related to a natural language query can be retrieved according to their vectors. Semantically related words can also be recognized and irrelevant/noisy keywords in queries can be handled. As a proof-of-concept application, we implement a code search tool named DeepCS using the proposed CODEnn model. We empirically evaluate DeepCS on a large scale codebase collected from GitHub. The experimental results show that our approach can effectively retrieve relevant code snippets and outperforms previous techniques.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"71 1","pages":"933-944"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81848528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}