Free the Bugs: Disclosing Blocking Violations in Reactive Programming
Pub Date: 2020-09-01 | DOI: 10.1109/SCAM51674.2020.00025
Felix Dobslaw, Morgan Vallin, Robin Sundström
In programming, concurrency allows threads to share processing units through interleaved, seemingly simultaneous execution, improving resource utilization and performance. Previous research has found that concurrency faults are hard to avoid and hard to find, and that they often lead to undesired and unpredictable behavior. Further, with the growing availability of multi-core devices and the adoption of concurrency features in high-level languages, concurrency faults are reported often, which is why countermeasures must be investigated to limit harm. Reactive programming provides an abstraction that simplifies complex concurrent and asynchronous tasks through reactive language extensions such as the RxJava and Project Reactor libraries for Java. Still, blocking violations may result in concurrency faults without any Java compiler warnings. BlockHound is a tool that detects incorrect blocking by wrapping the original code and intercepting blocking calls to raise appropriate runtime errors. In this study, we seek an understanding of how common blocking violations are and whether a tool such as BlockHound can give us insight into their root causes, so that they can be highlighted as pitfalls to developers. The investigated systems are Java-based open-source projects using reactive frameworks, selected based on high star ratings and large fork counts that indicate high adoption. We activated BlockHound in the projects' test suites and analyzed log files for common patterns, revealing blocking violations in 7/29 investigated open-source projects with 5024 stars and 1437 forks. A small number of system calls could be identified as root causes. We present countermeasures that successfully removed the uncertainty of blocking violations. The code's intended logic was retained in all validated projects, as demonstrated by passing unit tests.
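As a minimal sketch of the kind of violation BlockHound reports (an illustration, not code from the paper), the snippet below installs BlockHound and then performs a blocking Thread.sleep on one of Reactor's non-blocking scheduler threads, which BlockHound intercepts at runtime (on newer JDKs the -XX:+AllowRedefinitionToAddDeleteMethods flag may be required):

```java
import java.time.Duration;

import reactor.blockhound.BlockHound;
import reactor.core.publisher.Mono;

public class BlockingViolationDemo {
    public static void main(String[] args) {
        // Instrument blocking calls (Thread.sleep, blocking I/O, ...) at runtime.
        BlockHound.install();

        Mono.delay(Duration.ofMillis(10))   // continues on a non-blocking parallel scheduler thread
            .doOnNext(tick -> {
                try {
                    Thread.sleep(100);      // blocking call on a non-blocking thread:
                } catch (InterruptedException e) {  // BlockHound raises a BlockingOperationError here
                    Thread.currentThread().interrupt();
                }
            })
            .block();                       // blocking on the main thread is permitted
    }
}
```

A typical countermeasure, in the spirit of the fixes the paper validates through passing unit tests, is to move blocking work onto a scheduler intended for it, e.g. via publishOn(Schedulers.boundedElastic()).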
{"title":"Free the Bugs: Disclosing Blocking Violations in Reactive Programming","authors":"Felix Dobslaw, Morgan Vallin, Robin Sundström","doi":"10.1109/SCAM51674.2020.00025","DOIUrl":"https://doi.org/10.1109/SCAM51674.2020.00025","url":null,"abstract":"In programming, concurrency allows threads to share processing units interleaving and seemingly simultaneous to improve resource utilization and performance. Previous research has found that concurrency faults are hard to avoid, hard to find, often leading to undesired and unpredictable behavior. Further, with the growing availability of multi-core devices and adaptation of concurrency features in high-level languages, concurrency faults occur reportedly often, which is why countermeasures must be investigated to limit harm. Reactive programming provides an abstraction to simplify complex concurrent and asynchronous tasks through reactive language extensions such as the RxJava and Project Reactor libraries for Java. Still, blocking violations are possibly resulting in concurrency faults with no Java compiler warnings. BlockHound is a tool that detects incorrect blocking by wrapping the original code and intercepting blocking calls to provide appropriate runtime errors. In this study, we seek an understanding of how common blocking violations are and whether a tool such as BlockHound can give us insight into the root-causes to highlight them as pitfalls to developers. The investigated Softwares are Java-based open-source projects using reactive frameworks selected based on high star ratings and large fork quantities that indicate high adoption. We activated BlockHound in the project’s test-suites and analyzed log files for common patterns to reveal blocking violations in 7/29 investigated open-source projects with 5024 stars and 1437 forks. A small number of system calls could be identified as root-causes. We here present countermeasures that successfully removed the uncertainty of blocking violations. The code’s intentional logic was retained in all validated projects through passing unit-tests.","PeriodicalId":410351,"journal":{"name":"2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128776000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DCT: A Scalable Multi-Objective Module Clustering Tool
Pub Date: 2020-09-01 | DOI: 10.1109/SCAM51674.2020.00024
Ana Paula M. Tarchetti, L. Amaral, M. Oliveira, R. Bonifácio, G. Pinto, D. Lo
Maintaining complex software systems is a time-consuming and challenging task. Practitioners must have a general understanding of the system's decomposition and of how the system's developers have implemented the software features (which probably cut across different modules). Re-engineering practices are imperative to tackle these challenges. Previous research has shown the benefits of using software module clustering (SMC) to aid developers during re-engineering tasks (e.g., revealing the architecture of a system, identifying how concerns are spread among its modules, recommending refactorings, and so on). Nonetheless, although the literature on software module clustering has evolved substantially over the last 20 years, only a few tools are publicly available. Moreover, these available tools do not scale to large scenarios, in particular when optimizing multiple objectives. In this paper we present the Draco Clustering Tool (DCT), a new software module clustering tool. DCT's design decisions make multi-objective software clustering feasible, even for software systems comprising up to 1,000 modules. We report an empirical study that compares DCT with another available multi-objective tool (HD-NSGA-II), and both DCT and HD-NSGA-II with mono-objective tools (BUNCH and HD-LNS). We show that DCT solves the scalability issue when clustering medium-sized projects in a multi-objective mode. In the most extreme case, DCT clustered Druid (an analytics data store) 221 times faster than HD-NSGA-II.
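To make concrete what SMC tools optimize, here is a hedged sketch (not DCT's actual objective code) of one objective commonly used in this line of work, the cluster-factor-based modularization quality (MQ) popularized by Bunch: cohesion (intra-cluster edges) is rewarded and coupling (inter-cluster edges) is penalized.

```java
import java.util.List;

/** Hedged sketch of the Turbo-MQ objective used by Bunch-style module clustering. */
public class ModularizationQuality {

    /**
     * @param edges   directed module-dependency edges as {from, to} module-index pairs
     * @param cluster cluster id assigned to each module index
     * @param k       number of clusters
     */
    public static double mq(List<int[]> edges, int[] cluster, int k) {
        long[] intra = new long[k];   // edges inside each cluster
        long[] inter = new long[k];   // edges entering or leaving each cluster
        for (int[] e : edges) {
            int cFrom = cluster[e[0]], cTo = cluster[e[1]];
            if (cFrom == cTo) {
                intra[cFrom]++;
            } else {
                inter[cFrom]++;
                inter[cTo]++;
            }
        }
        double mq = 0.0;
        for (int i = 0; i < k; i++) {
            if (intra[i] > 0) {       // clusters without internal edges contribute 0
                mq += (2.0 * intra[i]) / (2.0 * intra[i] + inter[i]);
            }
        }
        return mq;                    // higher is better: cohesive, loosely coupled clusters
    }
}
```

Search-based clustering tools explore the space of cluster assignments to maximize such objectives, which is exactly where multi-objective variants become computationally expensive.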
{"title":"DCT: An Scalable Multi-Objective Module Clustering Tool","authors":"Ana Paula M. Tarchetti, L. Amaral, M. Oliveira, R. Bonifácio, G. Pinto, D. Lo","doi":"10.1109/SCAM51674.2020.00024","DOIUrl":"https://doi.org/10.1109/SCAM51674.2020.00024","url":null,"abstract":"Maintaining complex software systems is a timeconsuming and challenging task. Practitioners must have a general understanding of the system’s decomposition and how the system’s developers have implemented the software features (probably cutting across different modules). Re-engineering practices are imperative to tackle these challenges. Previous research has shown the benefits of using software module clustering (SMC) to aid developers during re-engineering tasks (e.g., revealing the architecture of the systems, identifying how the concerns are spread among the modules of the systems, recommending refactorings, and so on). Nonetheless, although the literature on software module clustering has substantially evolved in the last 20 years, there are just a few tools publicly available. Still, these available tools do not scale to large scenarios, in particular, when optimizing multi-objectives. In this paper we present the Draco Clustering Tool (DCT), a new software module clustering tool. DCT design decisions make multi-objective software clusterization feasible, even for software systems comprising up to 1,000 modules. We report an empirical study that compares DCT with other available multi-objective tool (HD-NSGA-II), and both DCT and HD-NSGA-II with mono-objective tools (BUNCH and HD-LNS). We evidence that DCT solves the scalability issue when clustering medium size projects in a multi-objective mode. In a more extreme case, DCT was able to cluster Druid (an analytics data store) 221 times faster than HD-NSGA-II.","PeriodicalId":410351,"journal":{"name":"2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128609490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Identification of On-hold Self-admitted Technical Debt
Pub Date: 2020-09-01 | DOI: 10.1109/SCAM51674.2020.00011
Rungroj Maipradit, B. Lin, Csaba Nagy, G. Bavota, Michele Lanza, Hideaki Hata, Ken-ichi Matsumoto
Modern software is developed under considerable time pressure, which implies that developers more often than not have to resort to compromises when it comes to code that is well written and code that just does the job. This has led over the past decades to the concept of "technical debt", a short-term hack that potentially generates long-term maintenance problems. Self-admitted technical debt (SATD) is a particular form of technical debt: developers consciously perform the hack but also document it in the code by adding comments as a reminder (or as an admission of guilt). We focus on a specific type of SATD, namely "On-hold" SATD, in which developers document in their comments the need to halt an implementation task due to conditions outside of their scope of work (e.g., an open issue must be closed before a function can be implemented). We present an approach, based on regular expressions and machine learning, which is able to detect issues referenced in code comments, and to automatically classify the detected instances as either "On-hold" (the issue is referenced to indicate the need to wait for its resolution before completing a task) or as "cross-reference" (the issue is referenced to document the code, for example to explain the rationale behind an implementation choice). Our approach also mines the issue tracker of the projects to check whether the On-hold SATD instances are "superfluous" and can be removed (i.e., the referenced issue has been closed, but the SATD is still in the code). Our evaluation confirms that our approach can indeed identify relevant instances of On-hold SATD. We illustrate its usefulness by identifying superfluous On-hold SATD instances in open-source projects, as confirmed by the original developers.
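As a hedged sketch of the detection step (the patterns below are illustrative, not the paper's actual expressions), a first pass can locate issue references in comments with regular expressions and use surrounding "wait" vocabulary as a signal that a comment is on hold rather than a plain cross-reference; the paper replaces such heuristics with a trained classifier:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OnHoldSatdSketch {
    // Matches GitHub-style (#123) and JIRA-style (ABC-123) issue references.
    private static final Pattern ISSUE_REF =
            Pattern.compile("(#\\d+|\\b[A-Z][A-Z0-9]+-\\d+\\b)");
    // Illustrative "on-hold" vocabulary; a learned classifier would replace this list.
    private static final Pattern ON_HOLD_HINT =
            Pattern.compile("\\b(wait(ing)?|until|blocked by|once .* (is )?(fixed|closed|resolved))\\b",
                    Pattern.CASE_INSENSITIVE);

    public static void classify(String comment) {
        Matcher m = ISSUE_REF.matcher(comment);
        while (m.find()) {
            String label = ON_HOLD_HINT.matcher(comment).find() ? "On-hold" : "cross-reference";
            System.out.println(m.group() + " -> " + label);
        }
    }

    public static void main(String[] args) {
        classify("// TODO: remove this workaround once #1234 is fixed");  // On-hold
        classify("// see JIRA-42 for the rationale behind this choice");  // cross-reference
    }
}
```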
{"title":"Automated Identification of On-hold Self-admitted Technical Debt","authors":"Rungroj Maipradit, B. Lin, Csaba Nagy, G. Bavota, Michele Lanza, Hideaki Hata, Ken-ichi Matsumoto","doi":"10.1109/SCAM51674.2020.00011","DOIUrl":"https://doi.org/10.1109/SCAM51674.2020.00011","url":null,"abstract":"Modern software is developed under considerable time pressure, which implies that developers more often than not have to resort to compromises when it comes to code that is well written and code that just does the job. This has led over the past decades to the concept of “technical debt”, a short-term hack that potentially generates long-term maintenance problems. Self-admitted technical debt (SATD) is a particular form of technical debt: developers consciously perform the hack but also document it in the code by adding comments as a reminder (or as an admission of guilt). We focus on a specific type of SATD, namely “On-hold” SATD, in which developers document in their comments the need to halt an implementation task due to conditions outside of their scope of work (e.g., an open issue must be closed before a function can be implemented).We present an approach, based on regular expressions and machine learning, which is able to detect issues referenced in code comments, and to automatically classify the detected instances as either “On-hold” (the issue is referenced to indicate the need to wait for its resolution before completing a task), or as “cross-reference”, (the issue is referenced to document the code, for example to explain the rationale behind an implementation choice). Our approach also mines the issue tracker of the projects to check if the On-hold SATD instances are “superfluous” and can be removed (i.e., the referenced issue has been closed, but the SATD is still in the code). Our evaluation confirms that our approach can indeed identify relevant instances of On-hold SATD. We illustrate its usefulness by identifying superfluous On-hold SATD instances in open source projects as confirmed by the original developers.","PeriodicalId":410351,"journal":{"name":"2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"8 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128782013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Engineering a Converter Between Two Domain-Specific Languages for Sorting
Pub Date: 2020-09-01 | DOI: 10.1109/SCAM51674.2020.00030
J. Fabry, Ynès Jaradin, Aynel Gül
Part of the ecosystem of applications running on mainframe computers is the DFSORT program. It is responsible for sorting and reformatting data (amongst other functionalities) and is configured by specifications written in a Domain-Specific Language (DSL). When migrating such sort workloads off the mainframe, the SyncSort product is an attractive alternative. It is also configured by specifications written in a DSL, but this language is structured in a radically different way: whereas the DFSORT DSL uses an explicit fixed pipeline for processing, the SyncSort DSL does not. To allow DFSORT workloads to run on SyncSort, we have therefore built a source-to-source translator from the DFSORT DSL to the SyncSort DSL. Our language converter performs abstract interpretation of the DFSORT specification, considering the different steps of the DFSORT pipeline at translation time. It does so by building a graph of objects; key to the construction of this graph is the reification of the records being sorted. In this paper we report on the design and implementation of the converter, describing how it treats the DFSORT pipeline. We also show how its design allowed for the straightforward implementation of unexpected changes in requirements for the generated output.
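As a hedged sketch of the reification idea (the types below are invented for illustration and are not the converter's actual design), each pipeline stage can be modeled as an object that describes its effect on a symbolic record layout, so a translator can track field positions through the fixed DFSORT pipeline at translation time rather than at run time:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Symbolic record layout: field name -> {offset, length}, tracked at translation time. */
class RecordShape {
    final Map<String, int[]> fields = new LinkedHashMap<>();
    RecordShape field(String name, int offset, int length) {
        fields.put(name, new int[]{offset, length});
        return this;
    }
}

/** A reified pipeline stage: maps an incoming record shape to an outgoing one. */
interface Stage {
    RecordShape apply(RecordShape in);
}

public class PipelineSketch {
    public static void main(String[] args) {
        RecordShape input = new RecordShape().field("key", 1, 10).field("payload", 11, 20);

        // An illustrative reformatting stage that keeps only the key field.
        Stage keepKeyOnly = in -> new RecordShape().field("key", 1, in.fields.get("key")[1]);

        // Abstractly interpret the fixed pipeline: each stage transforms the shape.
        RecordShape shape = input;
        for (Stage s : List.of(keepKeyOnly)) {
            shape = s.apply(shape);
        }
        System.out.println(shape.fields.keySet()); // fields visible to later stages: [key]
    }
}
```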
{"title":"Engineering a Converter Between Two Domain-Specific Languages for Sorting","authors":"J. Fabry, Ynès Jaradin, Aynel Gül","doi":"10.1109/SCAM51674.2020.00030","DOIUrl":"https://doi.org/10.1109/SCAM51674.2020.00030","url":null,"abstract":"Part of the ecosystem of applications running on mainframe computers is the DFSORT program. It is responsible for sorting and reformatting data (amongst other functionalities) and is configured by specifications written in a Domain-Specific Language (DSL). When migrating such sort workloads off from the mainframe, the SyncSort product is an attractive alternative. It is also configured by specifications written in a DSL but this language is structured in a radically different way. Whereas the DFSORT DSL uses an explicit fixed pipeline for processing, the SyncSort DSL does not. To allow DFSORT workloads to run on SyncSort we have therefore built a source-to-source translator from the DFSORT DSL to the SyncSort DSL. Our language converter performs abstract interpretation of the DFSORT specification, considering the different steps in the DFSORT pipeline at translation time. This is done by building a graph of objects and key to the construction of this graph is the reification of the records being sorted. In this paper we report on the design and implementation of the converter, describing how it treats the DFSORT pipeline. We also show how its design allowed for the straightforward implementation of unexpected changes in requirements for the generated output.","PeriodicalId":410351,"journal":{"name":"2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"209 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128700822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Out of Sight, Out of Place: Detecting and Assessing Swapped Arguments
Pub Date: 2020-09-01 | DOI: 10.1109/SCAM51674.2020.00031
Roger Scott, Joseph Ranieri, Lucja Kot, Vineeth Kashyap
Programmers often add meaningful information about program semantics when naming program entities such as variables, functions, and macros. However, static analysis tools typically discount this information when they look for bugs in a program. In this work, we describe the design and implementation of a static analysis checker called SWAPD, which uses the natural language information in programs to warn about mistakenly-swapped arguments at call sites. SWAPD combines two independent detection strategies to improve the effectiveness of the overall checker. We present the results of a comprehensive evaluation of SWAPD over a large corpus of C and C++ programs totaling 417 million lines of code. In this evaluation, SWAPD found 154 manually-vetted real-world cases of mistakenly-swapped arguments, suggesting that such errors, while not pervasive in released code, are a real problem and a worthwhile target for static analysis.
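As a hedged sketch of the name-matching intuition (the scoring below is simplistic string matching, not SWAPD's actual algorithm), a checker can warn when the argument names at a call site fit the parameter list strictly better after swapping two adjacent arguments:

```java
public class SwapCheckSketch {
    // Crude name-affinity score: 1.0 for equality, 0.5 for substring containment.
    static double affinity(String arg, String param) {
        String a = arg.toLowerCase(), p = param.toLowerCase();
        if (a.equals(p)) return 1.0;
        if (a.contains(p) || p.contains(a)) return 0.5;
        return 0.0;
    }

    /** Warn if swapping two adjacent arguments matches the parameter names better. */
    static void check(String callee, String[] params, String[] args) {
        for (int i = 0; i + 1 < args.length; i++) {
            double asIs    = affinity(args[i], params[i])     + affinity(args[i + 1], params[i + 1]);
            double swapped = affinity(args[i], params[i + 1]) + affinity(args[i + 1], params[i]);
            if (swapped > asIs) {
                System.out.printf("warning: arguments '%s' and '%s' to %s() look swapped%n",
                        args[i], args[i + 1], callee);
            }
        }
    }

    public static void main(String[] args) {
        // The cross-wise name match triggers a warning here.
        check("memcpy", new String[]{"dest", "src"}, new String[]{"srcBuf", "destBuf"});
    }
}
```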
{"title":"Out of Sight, Out of Place: Detecting and Assessing Swapped Arguments","authors":"Roger Scott, Joseph Ranieri, Lucja Kot, Vineeth Kashyap","doi":"10.1109/SCAM51674.2020.00031","DOIUrl":"https://doi.org/10.1109/SCAM51674.2020.00031","url":null,"abstract":"Programmers often add meaningful information about program semantics when naming program entities such as variables, functions, and macros. However, static analysis tools typically discount this information when they look for bugs in a program. In this work, we describe the design and implementation of a static analysis checker called SWAPD, which uses the natural language information in programs to warn about mistakenly-swapped arguments at call sites. SWAPD combines two independent detection strategies to improve the effectiveness of the overall checker. We present the results of a comprehensive evaluation of SWAPD over a large corpus of C and C++ programs totaling 417 million lines of code. In this evaluation, SWAPD found 154 manually-vetted real-world cases of mistakenly-swapped arguments, suggesting that such errors— while not pervasive in released code—are a real problem and a worthwhile target for static analysis.","PeriodicalId":410351,"journal":{"name":"2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121342039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MUTAMA: An Automated Multi-label Tagging Approach for Software Libraries on Maven
Pub Date: 2020-09-01 | DOI: 10.1109/SCAM51674.2020.00034
Camilo Velázquez-Rodríguez, Coen De Roover
Recent studies show that the Maven ecosystem alone already contains over 2 million library artefacts, including their source code, byte code, and documentation. To help developers cope with this information, several websites overlay configurable views on the ecosystem: for instance, views in which similar libraries are grouped into categories, or views showing all libraries tagged with tags corresponding to coarse-grained library features. The MVNRepository overlay website offers both category-based and tag-based views. Unfortunately, several libraries have not been categorised or are missing relevant tags. Some initial approaches to the automated categorisation of Maven libraries have already been proposed. However, no such approach exists for the problem of tagging libraries in a multi-label setting. This paper proposes MUTAMA, a multi-label classification approach to the Maven library tagging problem based on information extracted from the byte code of each library. We analysed 4088 randomly selected libraries from the Maven software ecosystem. MUTAMA trains and deploys five multi-label classifiers using feature vectors obtained from class and method names of the tagged libraries. Our results indicate that classifiers based on ensemble methods achieve the best performance. Finally, we propose directions for future work in this area.
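As a hedged sketch of the feature-extraction step (the tokenization shown is a common choice, not necessarily MUTAMA's exact procedure), class and method names taken from a library's byte code can be split on camel-case boundaries and mapped onto a vocabulary-indexed feature vector for the multi-label classifiers:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NameFeaturesSketch {
    // Split identifiers such as "HttpClientBuilder" into lower-cased tokens.
    static List<String> tokens(String identifier) {
        return Arrays.stream(identifier.split("(?<=[a-z0-9])(?=[A-Z])|_"))
                .map(String::toLowerCase)
                .toList();
    }

    public static void main(String[] args) {
        // Class and method names as they might be extracted from a library's byte code.
        List<String> names = List.of("HttpClientBuilder", "sendRequest", "JsonParser");

        Set<String> libraryTokens = new HashSet<>();
        names.forEach(n -> libraryTokens.addAll(tokens(n)));

        // Binary feature vector over a fixed vocabulary shared by all libraries.
        List<String> vocabulary = List.of("http", "client", "json", "xml", "parser", "crypto");
        int[] features = vocabulary.stream()
                .mapToInt(t -> libraryTokens.contains(t) ? 1 : 0)
                .toArray();
        System.out.println(Arrays.toString(features)); // [1, 1, 1, 0, 1, 0]
    }
}
```

A multi-label classifier then predicts, per tag, whether such a vector belongs to a library that should carry that tag.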
{"title":"MUTAMA: An Automated Multi-label Tagging Approach for Software Libraries on Maven","authors":"Camilo Velázquez-Rodríguez, Coen De Roover","doi":"10.1109/SCAM51674.2020.00034","DOIUrl":"https://doi.org/10.1109/SCAM51674.2020.00034","url":null,"abstract":"Recent studies show that the Maven ecosystem alone already contains over 2 million library artefacts including their source code, byte code, and documentation. To help developers cope with this information, several websites overlay configurable views on the ecosystem. For instance, views in which similar libraries are grouped into categories or views showing all libraries that have been tagged with tags corresponding to coarse-grained library features. The MVNRepository overlay website offers both category-based and tag-based views. Unfortunately, several libraries have not been categorised or are missing relevant tags. Some initial approaches to the automated categorisation of Maven libraries have already been proposed. However, no such approach exists for the problem of tagging of libraries in a multi-label setting.This paper proposes MUTAMA, a multi-label classification approach to the Maven library tagging problem based on information extracted from the byte code of each library. We analysed 4088 randomly selected libraries from the Maven software ecosystem. MUTAMA trains and deploys five multi-label classifiers using feature vectors obtained from class and method names of the tagged libraries. Our results indicate that classifiers based on ensemble methods achieve the best performances. Finally, we propose directions to follow in this area.","PeriodicalId":410351,"journal":{"name":"2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114451231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Role of Implicit Conversions in Erroneous Function Argument Swapping in C++
Pub Date: 2020-09-01 | DOI: 10.1109/SCAM51674.2020.00028
Richárd Szalay, Ábel Sinkovics, Z. Porkoláb
Argument selection defects, in which the programmer passes the wrong argument to a function call, are a widely investigated problem. In statically typed programming languages, the compiler can detect such misuse of arguments based on the argument and parameter types. When adjacent parameters have the same type, or can be converted to one another, the potential error will not be diagnosed. Related research is usually confined to exact type equivalence, often ignoring potential implicit or explicit conversions. However, in current mainstream languages like C++, built-in conversions between numeric types and user-defined conversions may significantly increase the number of mistakes that go unnoticed. We investigated the situation for the C and C++ languages, where functions are defined with multiple adjacent parameters that allow arguments to be passed in the wrong order. When implicit conversions are taken into account, the number of mistake-prone function declarations increases significantly compared to strict type equivalence. We analysed the outcome and categorised the offending parameter types. The empirical results should further encourage the language and library development communities to emphasise the importance of strong typing and the restriction of implicit conversions.
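The paper studies C++, but the phenomenon has a direct Java analogue that the sketch below illustrates (as an analogy, not the paper's subject language): an implicit widening conversion from int to long lets swapped arguments compile without any diagnostic.

```java
public class ImplicitSwapDemo {
    // Adjacent parameters of convertible types: int widens implicitly to long.
    static void allocate(int count, long bytesEach) {
        System.out.println("allocating " + count + " blocks of " + bytesEach + " bytes");
    }

    public static void main(String[] args) {
        int bytesEach = 4096;
        int count = 8;
        // Swapped arguments: bytesEach fits the int parameter, and count widens to long.
        // The compiler accepts this silently; only the program's behavior reveals the bug.
        allocate(bytesEach, count); // prints "allocating 4096 blocks of 8 bytes"
    }
}
```

In C++ the space of such silent swaps is larger still, because user-defined conversion operators and converting constructors also participate in overload resolution.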
{"title":"The Role of Implicit Conversions in Erroneous Function Argument Swapping in C++","authors":"Richárd Szalay, Ábel Sinkovics, Z. Porkoláb","doi":"10.1109/SCAM51674.2020.00028","DOIUrl":"https://doi.org/10.1109/SCAM51674.2020.00028","url":null,"abstract":"Argument selection defects, in which the programmer has chosen the wrong argument to a function call is a widely investigated problem. The compiler can detect such misuse of arguments based on the argument and parameter type in case of statically typed programming languages. When adjacent parameters have the same type, or they can be converted between one another, the potential error will not be diagnosed. Related research is usually confined to exact type equivalence, often ignoring potential implicit or explicit conversions. However, in current mainstream languages, like C++, built-in conversions between numerics and user-defined conversions may significantly increase the number of mistakes to go unnoticed. We investigated the situation for C and C++ languages where functions are defined with multiple adjacent parameters that allow arguments to pass in the wrong order. When implicit conversions are taken into account, the number of mistake-prone function declarations significantly increases compared to strict type equivalence. We analysed the outcome and categorised the offending parameter types. The empirical results should further encourage the language and library development community to emphasise the importance of strong typing and the restriction of implicit conversion.","PeriodicalId":410351,"journal":{"name":"2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"38 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131234856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Investigation into the Effect of Control and Data Dependence Paths on Predicate Testability
Pub Date: 2020-09-01 | DOI: 10.1109/SCAM51674.2020.00023
D. Binkley, James R. Glenn, A. Alsharif, Phil McMinn
The squeeziness of a sequence of program statements captures the loss of information (loss of entropy) caused by its execution. This information loss leads to problems such as failed error propagation. Intuitively, longer, more complex statement sequences (more formally, longer dependence paths) bring greater squeeze. Using the cost of search-based test data generation as a measure of lost information, we investigate this intuition. Unexpectedly, we find virtually no correlation between dependence path length and information loss. Thus our study represents an (unexpected) negative result. Moreover, looking through the literature, this finding is in agreement with recent work by Masri and Podgurski. As such, our work replicates a negative result. More precisely, it provides a conceptual, generalization-and-extension replication. The replication falls into the category of a conceptual replication in that different methods are used to address a common problem, and into the category of generalization and extension in that we sample a different population of subjects and consider the resulting data more rigorously. Specifically, while Masri and Podgurski only informally observed the lack of a connection, we rigorously assess it using a range of statistical models.
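For readers unfamiliar with the measure: squeeziness, as introduced by Clark and Hierons, is the entropy a deterministic function f destroys when mapping its inputs to outputs (our paraphrase of the standard definition, not a formula taken from this abstract):

$$\mathrm{Sq}(f) = H(I) - H(f(I)), \qquad H(X) = -\sum_{x \in X} p(x)\,\log_2 p(x)$$

Since a deterministic f can only merge inputs, $H(f(I)) \le H(I)$ and so $\mathrm{Sq}(f) \ge 0$; a long dependence path composes many such functions, which is why greater squeeze, and thus more fault masking, was the intuitive expectation this paper tests.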
{"title":"An Investigation into the Effect of Control and Data Dependence Paths on Predicate Testability","authors":"D. Binkley, James R. Glenn, A. Alsharif, Phil McMinn","doi":"10.1109/SCAM51674.2020.00023","DOIUrl":"https://doi.org/10.1109/SCAM51674.2020.00023","url":null,"abstract":"The squeeziness of a sequence of program statements captures the loss of information (loss of entropy) caused by its execution. This information loss leads to problems such as failed error propagation. Intuitively, longer more complex statement sequences (more formally, longer paths of dependencies) bring greater squeeze. Using the cost of search-based test data generation as a measure of lost information, we investigate this intuition. Unexpectedly, we find virtually no correlation between dependence path length and information loss. Thus our study represents an (unexpected) negative result.Moreover, looking through the literature, this finding is in agreement with recent work of Masri and Podgurski. As such, our work replicates a negative result. More precisely, it provides a conceptual, generalization and extension replication. The replication falls into the category of a conceptual replication in that different methods are used to address a common problem, and into the category of generalization and extension in that we sample a different population of subjects and more rigorously consider the resulting data. Specifically, while Masri and Podgurski only informally observed the lack of a connection, we rigorously assess it using a range of statistical models.","PeriodicalId":410351,"journal":{"name":"2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130105557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Techniques for Efficient Automated Elimination of False Positives
Pub Date: 2020-09-01 | DOI: 10.1109/SCAM51674.2020.00035
Tukaram Muske, A. Serebrenik
Static analysis tools are useful for detecting common programming errors. However, they generate a large number of false positives. Postprocessing these alarms using a model checker has been proposed to automatically eliminate the false positives among them. To scale up automated false-positive elimination (AFPE), several techniques, e.g., program slicing, are used. However, these techniques increase the time taken by AFPE, and the increased time is a major concern when applying AFPE to alarms generated on large systems. To reduce the time taken by AFPE, we propose two techniques. They achieve the reduction by identifying and skipping redundant calls to the slicer and model checker. The first technique is based on our observation that (a) the combination of application-level slicing, verification with incremental context, and context-level slicing helps to eliminate more false positives; but (b) doing so can result in redundant calls to the slicer. In this technique, we use data dependencies to compute these redundant calls. The second technique is based on our observation that (a) code partitioning is commonly used by static analysis tools to analyze very large systems, and (b) applying AFPE to alarms generated on partitioned code can result in repeated calls to both the slicer and the model checker. We use memoization to identify the repeated calls and skip them. The first technique is currently under evaluation. Our initial evaluation of the second technique indicates that it reduces AFPE time by up to 56%, with a median reduction of 12.15%.
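As a hedged sketch of the memoization idea (the key structure below is invented for illustration; the paper's cache key will differ), results of slicer and model-checker invocations can be cached by the inputs that determine them, so repeated queries arising from overlapping code partitions are answered without re-running the tools:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class AfpeMemoSketch {
    enum Verdict { FALSE_POSITIVE, POSSIBLY_REAL }

    /** Illustrative cache key: what determines a model-checking query's outcome. */
    record Query(String sliceHash, String property) {}

    private final Map<Query, Verdict> cache = new ConcurrentHashMap<>();

    /** Run the model checker only on cache misses; repeated queries are skipped. */
    Verdict verify(Query q, Function<Query, Verdict> modelChecker) {
        return cache.computeIfAbsent(q, modelChecker);
    }

    public static void main(String[] args) {
        AfpeMemoSketch memo = new AfpeMemoSketch();
        Function<Query, Verdict> expensiveChecker = q -> Verdict.FALSE_POSITIVE; // stand-in
        Query q = new Query("sha1-of-slice", "no-overflow-at-alarm-site");
        memo.verify(q, expensiveChecker); // runs the checker
        memo.verify(q, expensiveChecker); // cache hit: checker skipped
    }
}
```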
{"title":"Techniques for Efficient Automated Elimination of False Positives","authors":"Tukaram Muske, A. Serebrenik","doi":"10.1109/SCAM51674.2020.00035","DOIUrl":"https://doi.org/10.1109/SCAM51674.2020.00035","url":null,"abstract":"Static analysis tools are useful to detect common programming errors. However, they generate a large number of false positives. Postprocessing of these alarms using a model checker has been proposed to automatically eliminate false positives from them. To scale up the automated false positives elimination (AFPE), several techniques, e.g., program slicing, are used. However, these techniques increase the time taken by AFPE, and the increased time is a major concern during application of AFPE to alarms generated on large systems.To reduce the time taken by AFPE, we propose two techniques. The techniques achieve the reduction by identifying and skipping redundant calls to the slicer and model checker. The first technique is based on our observation that, (a) combination of application-level slicing, verification with incremental context, and the context-level slicing helps to eliminate more false positives; (b) however, doing so can result in redundant calls to the slicer. In this technique, we use data dependencies to compute these redundant calls. The second technique is based on our observation that (a) code partitioning is commonly used by static analysis tools to analyze very large systems, and (b) applying AFPE to alarms generated on partitioned-code can result in repeated calls to both the slicer and model checker. We use memoization to identify the repeated calls and skip them.The first technique is currently under evaluation. Our initial evaluation of the second technique indicates that it reduces AFPE time by up to 56%, with median reduction of 12.15%.","PeriodicalId":410351,"journal":{"name":"2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115030655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title Page iii
Pub Date: 2020-09-01 | DOI: 10.1109/scam51674.2020.00002
{"title":"Title Page iii","authors":"","doi":"10.1109/scam51674.2020.00002","DOIUrl":"https://doi.org/10.1109/scam51674.2020.00002","url":null,"abstract":"","PeriodicalId":410351,"journal":{"name":"2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129138482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}