[Research Paper] Which Method-Stereotype Changes are Indicators of Code Smells? (doi: 10.1109/SCAM.2018.00017)
M. J. Decker, Christian D. Newman, Natalia Dragan, M. Collard, Jonathan I. Maletic, Nicholas A. Kraft
A study of how method roles evolve during the lifetime of a software system is presented. Evolution is examined by analyzing when the stereotype of a method changes. Stereotypes provide a high-level categorization of a method's behavior and role, and also provide insight into how a method interacts with its environment and carries out tasks. The study covers 50 open-source systems and 6 closed-source systems. Results show that method behavior with respect to stereotype is highly stable over time. Overall, across all of the history examined, only about 10% of changes to methods result in a change to their stereotype. Examples of methods that change stereotype are examined further, and a select few of these kinds of changes are indicators of code smells.
{"title":"[Research Paper] Which Method-Stereotype Changes are Indicators of Code Smells?","authors":"M. J. Decker, Christian D. Newman, Natalia Dragan, M. Collard, Jonathan I. Maletic, Nicholas A. Kraft","doi":"10.1109/SCAM.2018.00017","DOIUrl":"https://doi.org/10.1109/SCAM.2018.00017","url":null,"abstract":"A study of how method roles evolve during the lifetime of a software system is presented. Evolution is examined by analyzing when the stereotype of a method changes. Stereotypes provide a high-level categorization of a method's behavior and role, and also provide insight into how a method interacts with its environment and carries out tasks. The study covers 50 open-source systems and 6 closed-source systems. Results show that method behavior with respect to stereotype is highly stable and constant over time. Overall, out of all the history examined, only about 10% of changes to methods result in a change in their stereotype. Examples of methods that change stereotype are further examined. A select number of these types of changes are indicators of code smells.","PeriodicalId":127335,"journal":{"name":"2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131089570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
[Engineering Paper] A Tool for Optimizing Java 8 Stream Software via Automated Refactoring (doi: 10.1109/SCAM.2018.00011)
Raffi Khatchadourian, Yiming Tang, M. Bagherzadeh, Syed Ahmed
Streaming APIs are pervasive in mainstream Object-Oriented languages and platforms. For example, the Java 8 Stream API allows for functional-like, MapReduce-style operations in processing both finite, e.g., collections, and infinite data structures. However, using this API efficiently involves subtle considerations like determining when it is best for stream operations to run in parallel, when running operations in parallel can be less efficient, and when it is safe to run in parallel due to possible lambda expression side-effects. In this paper, we describe the engineering aspects of an open source automated refactoring tool called Optimize Streams that assists developers in writing optimal stream software in a semantics-preserving fashion. Based on a novel ordering and typestate analysis, the tool is implemented as a plug-in to the popular Eclipse IDE, using both the WALA and SAFE frameworks. The tool was evaluated on 11 Java projects consisting of ~642 thousand lines of code, where we found that 36.31% of candidate streams were refactorable, and an average speedup of 1.55 on a performance suite was observed. We also describe experiences gained from integrating three very different static analysis frameworks to provide developers with an easy-to-use interface for optimizing their stream code to its full potential.
{"title":"[Engineering Paper] A Tool for Optimizing Java 8 Stream Software via Automated Refactoring","authors":"Raffi Khatchadourian, Yiming Tang, M. Bagherzadeh, Syed Ahmed","doi":"10.1109/SCAM.2018.00011","DOIUrl":"https://doi.org/10.1109/SCAM.2018.00011","url":null,"abstract":"Streaming APIs are pervasive in mainstream Object-Oriented languages and platforms. For example, the Java 8 Stream API allows for functional-like, MapReduce-style operations in processing both finite, e.g., collections, and infinite data structures. However, using this API efficiently involves subtle considerations like determining when it is best for stream operations to run in parallel, when running operations in parallel can be less efficient, and when it is safe to run in parallel due to possible lambda expression side-effects. In this paper, we describe the engineering aspects of an open source automated refactoring tool called Optimize Streams that assists developers in writing optimal stream software in a semantics-preserving fashion. Based on a novel ordering and typestate analysis, the tool is implemented as a plug-in to the popular Eclipse IDE, using both the WALA and SAFE frameworks. The tool was evaluated on 11 Java projects consisting of ~642 thousand lines of code, where we found that 36.31% of candidate streams were refactorable, and an average speedup of 1.55 on a performance suite was observed. We also describe experiences gained from integrating three very different static analysis frameworks to provide developers with an easy-to-use interface for optimizing their stream code to its full potential.","PeriodicalId":127335,"journal":{"name":"2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127553756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
[Engineering Paper] Identifying Feature Clones in a Suite of Systems (doi: 10.1109/SCAM.2018.00024)
Muslim Chochlov, M. English, J. Buckley, D. Ilie, Maria Scanlon
As part of a module re-unification project on an industrial partner's code, spanning one system and two derivative systems, the feature-clone variants across these systems have to be extracted so that they can later be re-unified into singular code elements for reuse. To assist developers with this task, the CoRA (Code Re-unification Application) tool was designed and implemented. The approach, and the subsequent design of the tool, were derived from reflection on manual feature-location/clone-detection efforts on the company's systems, in the first phase of an action-research cycle in which the approach and implementation will be iteratively trialled, and subsequently refined, in situ. A pilot study that led to the proposed tool is discussed. The tool combines a hybrid (textual-static) feature-location technique with a textual clone-detection technique for feature-clone identification. In this paper, the rationale behind the CoRA tool is presented, followed by a tool overview and its implementation details. Finally, an example use case shows how the tool is used to locate clones of a particular feature.
{"title":"[Engineering Paper] Identifying Feature Clones in a Suite of Systems","authors":"Muslim Chochlov, M. English, J. Buckley, D. Ilie, Maria Scanlon","doi":"10.1109/SCAM.2018.00024","DOIUrl":"https://doi.org/10.1109/SCAM.2018.00024","url":null,"abstract":"As part of a module re-unification project of an industrial partner's code, spanning one systems and two derivative systems, the feature-clone variants across these systems have to be extracted, to be later re-unified as singular code elements for re-use. To assist developers with this task, the CoRA (The Code Re-unification Application) tool was designed and implemented. An approach, and the subsequent design of the tool was derived from reflection on manual feature-location/clonedetection efforts on the company's systems, in the first phase of an action research cycle where the approach/implementation will be iteratively trialled, and subsequently refined, in-situ. A pilot study is discussed that leads to the proposed tool. The tool combines a hybrid (textual-static) feature location technique and a textual clone detection technique for featureclone identification. In this paper, the rationale behind the CoRA tool is presented, followed by a tool overview and its implementation details. Finally, an example use case shows how the tool is used to locate clones of a particular feature.","PeriodicalId":127335,"journal":{"name":"2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"679 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128337027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
[Research Paper] Combining Obfuscation and Optimizations in the Real World (doi: 10.1109/SCAM.2018.00010)
S. Guelton, A. Guinet, Pierrick Brunet, J. Caamaño, F. Dagnat, Nicolas Szlifierski
Code obfuscation is the de facto standard for protecting intellectual property when delivering code in an unmanaged environment. It relies on additive layers of code-tangling techniques, white-box encryption calls, and platform- or tool-specific countermeasures to make it harder for a reverse engineer to access critical pieces of data or to understand core algorithms. The literature provides plenty of obfuscation techniques that can be applied at compile time to transform data or control flow and thereby provide some protection against different reverse-engineering scenarios. Scheduling code transformations to optimize a given metric is known as the pass scheduling problem; it is NP-hard, but is solved in practice using hard-coded sequences that are generally satisfactory. Adding code obfuscation introduces two new dimensions. First, because a code obfuscator needs to find a balance between obfuscation and performance, pass scheduling becomes a multi-criteria optimization problem. Second, obfuscation passes transform their inputs in unconventional ways, which means some pass combinations may not be desirable or even valid. This paper highlights several issues met when blindly chaining different kinds of obfuscation and optimization passes, emphasizing the need for a formal model to combine them. It proposes a non-intrusive formalism that builds on sequential pass-management techniques. The model is validated on real-world scenarios gathered during the development of an industrial-strength obfuscator on top of the LLVM compiler infrastructure.
{"title":"[Research Paper] Combining Obfuscation and Optimizations in the Real World","authors":"S. Guelton, A. Guinet, Pierrick Brunet, J. Caamaño, F. Dagnat, Nicolas Szlifierski","doi":"10.1109/SCAM.2018.00010","DOIUrl":"https://doi.org/10.1109/SCAM.2018.00010","url":null,"abstract":"Code obfuscation is the de facto standard to protect intellectual property when delivering code in an unmanaged environment. It relies on additive layers of code tangling techniques, white-box encryption calls and platform-specific or tool-specific countermeasures to make it harder for a reverse engineer to access critical pieces of data or to understand core algorithms. The literature provides plenty of different obfuscation techniques that can be used at compile time to transform data or control flow in order to provide some kind of protection against different reverse engineering scenarii. Scheduling code transformations to optimize a given metric is known as the pass scheduling problem, a problem known to be NP-hard, but solved in a practical way using hard-coded sequences that are generally satisfactory. Adding code obfuscation to the problem introduces two new dimensions. First, as a code obfuscator needs to find a balance between obfuscation and performance, pass scheduling becomes a multi-criteria optimization problem. Second, obfuscation passes transform their inputs in unconventional ways, which means some pass combinations may not be desirable or even valid. This paper highlights several issues met when blindly chaining different kind of obfuscation and optimization passes, emphasizing the need of a formal model to combine them. It proposes a non-intrusive formalism to leverage on sequential pass management techniques. The model is validated on real-world scenarii gathered during the development of an industrial-strength obfuscator on top of the LLVM compiler infrastructure.","PeriodicalId":127335,"journal":{"name":"2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134431343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
[Research Paper] Static JavaScript Call Graphs: A Comparative Study (doi: 10.1109/SCAM.2018.00028)
Gábor Antal, Péter Hegedüs, Z. Tóth, R. Ferenc, T. Gyimóthy
The popularity and wide adoption of JavaScript, on both the client and the server side, makes its code analysis more important than ever before. Most algorithms for vulnerability analysis, coding-issue detection, or type inference rely on the call graph representation of the underlying program. Despite some obvious advantages of dynamic analysis, static algorithms should also be considered for call graph construction, as they do not require extensive test beds for programs and their costly execution and tracing. In this paper, we systematically compare five widely adopted static algorithms - implemented by the npm call graph, IBM WALA, Google Closure Compiler, Approximate Call Graph (ACG), and Type Analyzer for JavaScript (TAJS) tools - for building JavaScript call graphs on 26 WebKit SunSpider benchmark programs and 6 real-world Node.js modules. We provide a performance analysis as well as a quantitative and qualitative evaluation of the results. We found a relatively large intersection of call edges among the algorithms, which proved to be 100% precise. However, most of the tools found edges that were missed by all the others. ACG had the highest precision, followed immediately by TAJS, but ACG found significantly more call edges. As for combinations of tools, ACG and TAJS together covered 99% of the true edges found by all algorithms, while maintaining a precision as high as 98%. Only two of the tools were able to analyze up-to-date multi-file Node.js modules, due to incomplete support for recent language features. They agreed on almost 60% of the call edges, but each found valid edges that the other missed.
{"title":"[Research Paper] Static JavaScript Call Graphs: A Comparative Study","authors":"Gábor Antal, Péter Hegedüs, Z. Tóth, R. Ferenc, T. Gyimóthy","doi":"10.1109/SCAM.2018.00028","DOIUrl":"https://doi.org/10.1109/SCAM.2018.00028","url":null,"abstract":"The popularity and wide adoption of JavaScript both at the client and server side makes its code analysis more important than ever before. Most of the algorithms for vulnerability analysis, coding issue detection, or type inference rely on the call graph representation of the underlying program. Despite some obvious advantages of dynamic analysis, static algorithms should also be considered for call graph construction as they do not require extensive test beds for programs and their costly execution and tracing. In this paper, we systematically compare five widely adopted static algorithms - implemented by the npm call graph, IBM WALA, Google Closure Compiler, Approximate Call Graph, and Type Analyzer for JavaScript tools - for building JavaScript call graphs on 26 WebKit SunSpider benchmark programs and 6 real-world Node.js modules. We provide a performance analysis as well as a quantitative and qualitative evaluation of the results. We found that there was a relatively large intersection of the found call edges among the algorithms, which proved to be 100% precise. However, most of the tools found edges that were missed by all others. ACG had the highest precision followed immediately by TAJS, but ACG found significantly more call edges. As for the combination of tools, ACG and TAJS together covered 99% of the found true edges by all algorithms, while maintaining a precision as high as 98%. Only two of the tools were able to analyze up-to-date multi-file Node.js modules due to incomplete language features support. They agreed on almost 60% of the call edges, but each of them found valid edges that the other missed.","PeriodicalId":127335,"journal":{"name":"2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116582159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
[Research Paper] POI: Skew-Aware Parallel Race Detection (doi: 10.1109/SCAM.2018.00033)
Yoshitaka Sakurai, Yoshitaka Arahori, K. Gondow
Multithreaded programs are prone to data races. Data races are known to be hard to detect and reproduce by manual effort, although they often have detrimental effects on program reliability. Automated techniques are thus in demand for detecting data races efficiently and precisely. Many data-race detectors have been proposed so far, among which dynamic detectors are promising because of their precision. However, existing dynamic race detectors incur high race-checking overheads. Even a state-of-the-art dynamic race detector, called Parallel FastTrack, fails to detect races efficiently under certain conditions, despite its attempt to parallelize race detection. In this paper, we propose an efficient and precise parallel race detector. We first experimentally reveal that the load-distribution policy of Parallel FastTrack tends to skew race-checking loads toward a few detection threads. We then present a simple but effective technique, called POI, for balancing race-checking loads among detection threads. POI takes the race-checking load of each detection thread into account and reduces the skew by making each detection thread manage almost the same number of memory addresses to be checked. Experiments on several real multithreaded data-processing applications show that POI reduced, on average, about 37% of the race-detection overhead that the load-distribution policy of Parallel FastTrack would impose.
[Research Paper] Towards Anticipation of Architectural Smells Using Link Prediction Techniques (doi: 10.1109/SCAM.2018.00015)
J. A. D. Pace, Antonela Tommasel, D. Godoy
Software systems naturally evolve, and this evolution often brings design problems that cause system degradation. Architectural smells are typical symptoms of such problems, and several of these smells are related to undesired dependencies among modules. The early detection of these smells is important for developers, because they can plan ahead for maintenance or refactoring efforts, thus preventing system degradation. Existing tools for identifying architectural smells can detect the smells once they exist in the source code. This means that their undesired dependencies are already created. In this work, we explore a forward-looking approach that is able to infer groups of likely module dependencies that can anticipate architectural smells in a future system version. Our approach considers the current module structure as a network, along with information from previous versions, and applies link prediction techniques (from the field of social network analysis). In particular, we focus on dependency-related smells, such as Cyclic Dependency and Hub-like Dependency, which fit well with the link prediction model. An initial evaluation with two open-source projects shows that, under certain considerations, the predictions of our approach are satisfactory. Furthermore, the approach can be extended to other types of dependency-based smells or metrics.