2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM)

Compositional Information Flow Analysis for WebAssembly Programs
Quentin Stiévenart, Coen De Roover
Pub Date: 2020-09-01. DOI: 10.1109/SCAM51674.2020.00007
WebAssembly is a new W3C standard providing a portable compilation target for various languages. All major browsers can run WebAssembly programs, and its use extends beyond the web: there is interest in compiling cross-platform desktop applications, server applications, IoT and embedded applications to WebAssembly because of the performance and security guarantees it aims to provide. Indeed, WebAssembly has been carefully designed with security in mind. In particular, WebAssembly applications are sandboxed from their host environment. However, recent works have brought to light several limitations that expose WebAssembly to traditional attack vectors. Visitors to websites using WebAssembly have been exposed to malicious code as a result. In this paper, we propose an automated static program analysis to address these security concerns. Our analysis is focused on information flow and is compositional. For every WebAssembly function, it first computes a summary that describes in a sound manner where the information from its parameters and the global program state can flow to. These summaries can then be applied during the subsequent analysis of function calls. Through a classical fixed-point formulation, one obtains an approximation of the information flow in the WebAssembly program. This results in the first compositional static analysis for WebAssembly. On a set of 34 benchmark programs spanning 196 kLOC of WebAssembly, we compute at least 64% of the function summaries precisely in less than a minute in total.
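The summary-based, fixed-point approach the abstract describes can be illustrated with a minimal sketch. Everything below (the flow facts, the function names, the "p0"/"ret"/"global" encoding) is hypothetical; the real analysis works on WebAssembly bytecode, not on this toy representation.

```python
# Direct flows observed in each function body: (source, sink) pairs,
# where sources are parameters ("p0", "p1", ...) and sinks are either
# "global" (a write to the global state) or "ret" (the return value).
direct = {
    "sanitize": [("p0", "ret")],
    "log":      [("p0", "global")],
    "handler":  [],                      # flows only through its calls
}
# Calls made by each function: (callee, [arguments]), where each
# argument is one of the caller's parameters, in callee-parameter order.
calls = {
    "sanitize": [],
    "log":      [],
    "handler":  [("sanitize", ["p0"]), ("log", ["p1"])],
}

def compute_summaries(direct, calls):
    """Iterate to a fixpoint: a summary maps each parameter of a
    function to the set of sinks it may reach, including flows that
    pass through callees (a deliberate simplification of soundness)."""
    summaries = {f: {} for f in direct}
    changed = True
    while changed:
        changed = False
        for f in direct:
            new = {}
            for src, sink in direct[f]:
                new.setdefault(src, set()).add(sink)
            # Apply the callee's summary at each call site.
            for callee, args in calls[f]:
                for i, arg in enumerate(args):
                    for sink in summaries[callee].get(f"p{i}", set()):
                        new.setdefault(arg, set()).add(sink)
            if new != summaries[f]:
                summaries[f] = new
                changed = True
    return summaries

summaries = compute_summaries(direct, calls)
print(summaries["handler"])  # p0 may reach the return value, p1 the global state
```

The compositional payoff is that `sanitize` and `log` are analyzed once; `handler` only applies their summaries instead of re-analyzing their bodies.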
DCT: A Scalable Multi-Objective Module Clustering Tool
Ana Paula M. Tarchetti, L. Amaral, M. Oliveira, R. Bonifácio, G. Pinto, D. Lo
Pub Date: 2020-09-01. DOI: 10.1109/SCAM51674.2020.00024
Maintaining complex software systems is a time-consuming and challenging task. Practitioners must have a general understanding of the system's decomposition and how the system's developers have implemented the software features (which probably cut across different modules). Re-engineering practices are imperative to tackle these challenges. Previous research has shown the benefits of using software module clustering (SMC) to aid developers during re-engineering tasks (e.g., revealing the architecture of the system, identifying how concerns are spread among the modules, recommending refactorings, and so on). Nonetheless, although the literature on software module clustering has substantially evolved in the last 20 years, only a few tools are publicly available. Moreover, these available tools do not scale to large scenarios, in particular when optimizing multiple objectives. In this paper we present the Draco Clustering Tool (DCT), a new software module clustering tool. DCT's design decisions make multi-objective software clustering feasible, even for software systems comprising up to 1,000 modules. We report an empirical study that compares DCT with another available multi-objective tool (HD-NSGA-II), and both DCT and HD-NSGA-II with mono-objective tools (BUNCH and HD-LNS). We provide evidence that DCT solves the scalability issue when clustering medium-sized projects in a multi-objective mode. In a more extreme case, DCT was able to cluster Druid (an analytics data store) 221 times faster than HD-NSGA-II.
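Tools in this line of work optimize fitness functions over candidate decompositions; one classic objective, used by the Bunch family of tools mentioned in the abstract, is the "modularization quality" (TurboMQ) score that rewards intra-cluster edges and penalizes inter-cluster ones. The module names and dependency graph below are invented for illustration; this is not DCT's implementation.

```python
def turbo_mq(edges, clusters):
    """edges: set of (src, dst) dependencies between modules.
    clusters: list of sets of module names.
    Returns the sum over clusters of the cluster factor
    CF_i = 2*intra_i / (2*intra_i + inter_i)."""
    mq = 0.0
    for cluster in clusters:
        intra = sum(1 for s, d in edges if s in cluster and d in cluster)
        inter = sum(1 for s, d in edges
                    if (s in cluster) != (d in cluster))
        if intra > 0:
            mq += 2 * intra / (2 * intra + inter)
    return mq

deps = {("a", "b"), ("b", "a"), ("c", "d"), ("b", "c")}
good = [{"a", "b"}, {"c", "d"}]   # follows the dependency structure
bad  = [{"a", "c"}, {"b", "d"}]   # cuts across it
print(turbo_mq(deps, good) > turbo_mq(deps, bad))
```

A multi-objective tool optimizes several such metrics at once (e.g., MQ together with cluster-size balance), which is where the scalability pressure the paper addresses comes from.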
Automated Identification of On-hold Self-admitted Technical Debt
Rungroj Maipradit, B. Lin, Csaba Nagy, G. Bavota, Michele Lanza, Hideaki Hata, Ken-ichi Matsumoto
Pub Date: 2020-09-01. DOI: 10.1109/SCAM51674.2020.00011
Modern software is developed under considerable time pressure, which implies that developers more often than not have to compromise between code that is well written and code that just does the job. Over the past decades this has led to the concept of "technical debt": a short-term hack that potentially generates long-term maintenance problems. Self-admitted technical debt (SATD) is a particular form of technical debt: developers consciously perform the hack but also document it in the code by adding comments as a reminder (or as an admission of guilt). We focus on a specific type of SATD, namely "On-hold" SATD, in which developers document in their comments the need to halt an implementation task due to conditions outside their scope of work (e.g., an open issue must be closed before a function can be implemented). We present an approach, based on regular expressions and machine learning, which is able to detect issues referenced in code comments, and to automatically classify the detected instances as either "On-hold" (the issue is referenced to indicate the need to wait for its resolution before completing a task) or as "cross-reference" (the issue is referenced to document the code, for example to explain the rationale behind an implementation choice). Our approach also mines the projects' issue trackers to check whether On-hold SATD instances are "superfluous" and can be removed (i.e., the referenced issue has been closed, but the SATD is still in the code). Our evaluation confirms that our approach can indeed identify relevant instances of On-hold SATD. We illustrate its usefulness by identifying superfluous On-hold SATD instances in open source projects, as confirmed by the original developers.
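The two-stage idea in the abstract can be sketched as follows. The regular expression and the keyword cues below are hypothetical stand-ins: the paper uses machine learning for the On-hold vs. cross-reference decision, not a keyword list.

```python
import re

# A toy pattern for issue references: JIRA-style keys (e.g. ABC-123)
# or "bug"/"issue"/"#" followed by a number.
ISSUE_REF = re.compile(r"(?:[A-Z][A-Z0-9]+-\d+|(?:bug|issue|#)\s*\d+)",
                       re.IGNORECASE)
# Toy lexical cues suggesting the comment is waiting on the issue.
ONHOLD_CUES = ("todo", "wait", "until", "once", "when", "blocked", "remove")

def classify_comment(comment):
    """Return (issue_reference, label) pairs found in a comment."""
    results = []
    lowered = comment.lower()
    for match in ISSUE_REF.finditer(comment):
        label = ("on-hold" if any(cue in lowered for cue in ONHOLD_CUES)
                 else "cross-reference")
        results.append((match.group(), label))
    return results

print(classify_comment("// TODO: enable this path once JDK-8042345 is fixed"))
print(classify_comment("// see issue 1234 for the rationale behind this choice"))
```

Pairing the detected reference with the issue tracker's state (open vs. closed) is then what allows flagging superfluous On-hold instances.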
Engineering a Converter Between Two Domain-Specific Languages for Sorting
J. Fabry, Ynès Jaradin, Aynel Gül
Pub Date: 2020-09-01. DOI: 10.1109/SCAM51674.2020.00030
Part of the ecosystem of applications running on mainframe computers is the DFSORT program. It is responsible for sorting and reformatting data (amongst other functionalities) and is configured by specifications written in a Domain-Specific Language (DSL). When migrating such sort workloads off the mainframe, the SyncSort product is an attractive alternative. It is also configured by specifications written in a DSL, but this language is structured in a radically different way: whereas the DFSORT DSL uses an explicit fixed pipeline for processing, the SyncSort DSL does not. To allow DFSORT workloads to run on SyncSort, we have therefore built a source-to-source translator from the DFSORT DSL to the SyncSort DSL. Our language converter performs abstract interpretation of the DFSORT specification, considering the different steps in the DFSORT pipeline at translation time. This is done by building a graph of objects; key to the construction of this graph is the reification of the records being sorted. In this paper we report on the design and implementation of the converter, describing how it treats the DFSORT pipeline. We also show how its design allowed for the straightforward implementation of unexpected changes in requirements for the generated output.
Out of Sight, Out of Place: Detecting and Assessing Swapped Arguments
Roger Scott, Joseph Ranieri, Lucja Kot, Vineeth Kashyap
Pub Date: 2020-09-01. DOI: 10.1109/SCAM51674.2020.00031
Programmers often add meaningful information about program semantics when naming program entities such as variables, functions, and macros. However, static analysis tools typically discount this information when they look for bugs in a program. In this work, we describe the design and implementation of a static analysis checker called SWAPD, which uses the natural language information in programs to warn about mistakenly swapped arguments at call sites. SWAPD combines two independent detection strategies to improve the effectiveness of the overall checker. We present the results of a comprehensive evaluation of SWAPD over a large corpus of C and C++ programs totaling 417 million lines of code. In this evaluation, SWAPD found 154 manually vetted real-world cases of mistakenly swapped arguments, suggesting that such errors, while not pervasive in released code, are a real problem and a worthwhile target for static analysis.
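One name-based strategy of this kind can be sketched as follows: if each argument's name matches the *other* parameter's name better than the one at its own position, the call site is suspicious. This is only an illustration of the general idea; SWAPD combines two strategies and its actual scoring is not shown here, and the signature and names below are invented.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude lexical similarity between two identifiers, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def looks_swapped(params, args, margin=0.2):
    """params, args: names of two adjacent parameters and the argument
    expressions passed for them. Flags the pair when crossing the
    arguments improves the name match by more than `margin`."""
    in_place = similarity(params[0], args[0]) + similarity(params[1], args[1])
    crossed  = similarity(params[0], args[1]) + similarity(params[1], args[0])
    return crossed - in_place > margin

# memcpy-style signature: copy(dest, src)
print(looks_swapped(("dest", "src"), ("src_buf", "dest_buf")))   # suspicious
print(looks_swapped(("dest", "src"), ("dest_buf", "src_buf")))   # fine
```

The `margin` threshold is the usual precision/recall dial for such checkers: a larger margin yields fewer but more confident warnings.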
MUTAMA: An Automated Multi-label Tagging Approach for Software Libraries on Maven
Camilo Velázquez-Rodríguez, Coen De Roover
Pub Date: 2020-09-01. DOI: 10.1109/SCAM51674.2020.00034
Recent studies show that the Maven ecosystem alone already contains over 2 million library artefacts including their source code, byte code, and documentation. To help developers cope with this information, several websites overlay configurable views on the ecosystem: for instance, views in which similar libraries are grouped into categories, or views showing all libraries that have been tagged with tags corresponding to coarse-grained library features. The MVNRepository overlay website offers both category-based and tag-based views. Unfortunately, several libraries have not been categorised or are missing relevant tags. Some initial approaches to the automated categorisation of Maven libraries have already been proposed. However, no such approach exists for the problem of tagging libraries in a multi-label setting. This paper proposes MUTAMA, a multi-label classification approach to the Maven library tagging problem based on information extracted from the byte code of each library. We analysed 4088 randomly selected libraries from the Maven software ecosystem. MUTAMA trains and deploys five multi-label classifiers using feature vectors obtained from class and method names of the tagged libraries. Our results indicate that classifiers based on ensemble methods achieve the best performance. Finally, we propose directions for future work in this area.
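The feature extraction step described in the abstract (feature vectors from class and method names) can be sketched as follows. The identifier list and tokenization rules are invented for illustration; MUTAMA's actual pipeline and classifiers are not reproduced here.

```python
import re
from collections import Counter

def tokenize(identifier):
    """Split camelCase / PascalCase / snake_case identifiers into
    lowercase word tokens (e.g. "parseJsonResponse" -> parse, json,
    response)."""
    parts = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+",
                       identifier)
    return [p.lower() for p in parts]

def feature_vector(member_names):
    """Bag-of-words vector over all class/method name tokens of a
    library; this is what a multi-label classifier would consume."""
    bag = Counter()
    for name in member_names:
        bag.update(tokenize(name))
    return bag

library_members = ["HttpClient", "sendRequest", "parseJsonResponse", "JsonParser"]
print(feature_vector(library_members))
```

In a multi-label setting, each such vector is associated with a *set* of tags (e.g. both "http" and "json"), rather than a single category.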
The Role of Implicit Conversions in Erroneous Function Argument Swapping in C++
Richárd Szalay, Ábel Sinkovics, Z. Porkoláb
Pub Date: 2020-09-01. DOI: 10.1109/SCAM51674.2020.00028
Argument selection defects, in which the programmer chooses the wrong argument for a function call, are a widely investigated problem. In statically typed programming languages, the compiler can detect such misuse of arguments based on the argument and parameter types. When adjacent parameters have the same type, or their types can be converted between one another, the potential error will not be diagnosed. Related research is usually confined to exact type equivalence, often ignoring potential implicit or explicit conversions. However, in current mainstream languages like C++, built-in conversions between numeric types and user-defined conversions may significantly increase the number of mistakes that go unnoticed. We investigated the situation for the C and C++ languages, where functions are defined with multiple adjacent parameters that allow arguments to be passed in the wrong order. When implicit conversions are taken into account, the number of mistake-prone function declarations increases significantly compared to strict type equivalence. We analysed the outcome and categorised the offending parameter types. The empirical results should further encourage the language and library development community to emphasise the importance of strong typing and the restriction of implicit conversions.
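The property being measured can be sketched with a tiny checker that flags adjacent parameter pairs a caller could swap without a type error. The conversion relation below is a deliberately simplified stand-in for the real C++ implicit-conversion rules, and the example signature is invented.

```python
# Simplified stand-in for C++ arithmetic types that implicitly convert
# between one another (the real rules also cover user-defined
# conversions, enums, pointers, etc.).
CONVERTIBLE = {"int", "long", "double", "float", "bool", "char", "unsigned"}

def swap_prone_pairs(param_types):
    """Return indices (i, i+1) of adjacent parameters that could be
    swapped silently: identical types, or two arithmetic types related
    by implicit conversion."""
    prone = []
    for i in range(len(param_types) - 1):
        a, b = param_types[i], param_types[i + 1]
        if a == b or (a in CONVERTIBLE and b in CONVERTIBLE):
            prone.append((i, i + 1))
    return prone

# void setRect(int x, int y, double width, std::string name)
print(swap_prone_pairs(["int", "int", "double", "std::string"]))
```

Restricting to exact type equivalence would report only the `(0, 1)` pair; admitting implicit conversions also flags `(1, 2)`, which is the increase the paper quantifies.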
Techniques for Efficient Automated Elimination of False Positives
Tukaram Muske, A. Serebrenik
Pub Date: 2020-09-01. DOI: 10.1109/SCAM51674.2020.00035
Static analysis tools are useful for detecting common programming errors. However, they generate a large number of false positives. Postprocessing these alarms using a model checker has been proposed to automatically eliminate false positives from them. To scale up automated false positives elimination (AFPE), several techniques, e.g., program slicing, are used. However, these techniques increase the time taken by AFPE, and the increased time is a major concern when applying AFPE to alarms generated on large systems. To reduce the time taken by AFPE, we propose two techniques. They achieve the reduction by identifying and skipping redundant calls to the slicer and model checker. The first technique is based on our observation that (a) the combination of application-level slicing, verification with incremental context, and context-level slicing helps to eliminate more false positives; but (b) doing so can result in redundant calls to the slicer. In this technique, we use data dependencies to compute these redundant calls. The second technique is based on our observation that (a) code partitioning is commonly used by static analysis tools to analyze very large systems, and (b) applying AFPE to alarms generated on partitioned code can result in repeated calls to both the slicer and model checker. We use memoization to identify the repeated calls and skip them. The first technique is currently under evaluation. Our initial evaluation of the second technique indicates that it reduces AFPE time by up to 56%, with a median reduction of 12.15%.
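The memoization idea in the second technique can be sketched as follows: when the same verification task recurs across partitions, reuse the cached verdict instead of re-running the expensive tool. The "model checker" below is a made-up stand-in, as are the slice and property names; the real technique keys the cache on the analyzed slice.

```python
calls_made = 0

def expensive_model_check(sliced_program, property_id):
    """Stand-in for a model-checker run that may take minutes."""
    global calls_made
    calls_made += 1
    return hash((sliced_program, property_id)) % 2 == 0  # dummy verdict

cache = {}

def check_with_memo(sliced_program, property_id):
    """Skip the model checker when an identical task was seen before."""
    key = (sliced_program, property_id)
    if key not in cache:
        cache[key] = expensive_model_check(sliced_program, property_id)
    return cache[key]

# Two code partitions raise the same alarm on the same shared slice:
check_with_memo("slice_of_shared_code", "overflow@line42")
check_with_memo("slice_of_shared_code", "overflow@line42")
print(calls_made)  # only one real model-checker run happened
```

The reported savings (up to 56% of AFPE time) come from exactly this effect at scale: partitioned code makes identical slicer and model-checker tasks common.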
Can Refactorings Indicate Design Tradeoffs?
Thomas Schweizer, Vassilis Zafeiris, Marios Fokaefs, Michalis Famelis
Pub Date: 2020-09-01. DOI: 10.1109/SCAM51674.2020.00013
Refactoring does not always monotonically improve the quality of software. In this exploratory study, we analyze the revision history of JFreeChart to see whether fluctuations in internal quality metrics in commits containing refactorings can be used as indicators of design tradeoffs. We present qualitative and quantitative results suggesting that, in the context of refactoring, tradeoffs in internal quality metrics can be used to find design tradeoffs.
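The fluctuation signal studied here can be sketched as follows: a commit where one internal quality metric improves while another degrades is a candidate design tradeoff. The metric names and values below are invented for illustration and do not come from the study.

```python
def tradeoff_commits(history):
    """history: list of (commit_id, {metric: value}) in chronological
    order, where lower is better for every metric. Returns commits in
    which at least one metric improved and another worsened."""
    flagged = []
    for (_, before), (commit, after) in zip(history, history[1:]):
        deltas = {m: after[m] - before[m] for m in after}
        improved = any(d < 0 for d in deltas.values())
        worsened = any(d > 0 for d in deltas.values())
        if improved and worsened:
            flagged.append(commit)
    return flagged

history = [
    ("c1", {"coupling": 10, "complexity": 30}),
    ("c2", {"coupling": 7,  "complexity": 35}),  # better coupling, worse complexity
    ("c3", {"coupling": 6,  "complexity": 33}),  # improves both metrics
]
print(tradeoff_commits(history))  # flags only c2
```

Intersecting such flagged commits with those known to contain refactorings yields the candidate set the study examines qualitatively.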