
2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM): Latest Papers

Towards Better Symbol Resolution for C/C++ Programs: A Cluster-Based Solution
Richárd Szalay, Z. Porkoláb, Dániel Krupp
Resolving symbol references is an important part of many application areas, from development environments to various static analysis tools, especially when it serves code comprehension. Different occurrences of the same program element should be connected: function definitions and their call sites, variable declarations and their uses, or type definitions and their applications. For C++, most current tools use mangled names to correlate symbols, e.g., when implementing actions such as "go to definition" or "list all references". However, in large projects that build multiple binaries, symbol resolution based on mangled names can be, and usually is, ambiguous. This leads to inaccurate behaviour even in major development tools. In this paper we explore the reasons for this ambiguity and propose a clustering algorithm, based on essential build information, that improves the accuracy of symbol resolution. We implemented our method as part of CodeCompass, an open-source code comprehension tool, and measured its efficiency.
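To make the ambiguity concrete, the Python sketch below contrasts a mangled-name-only lookup with a build-aware one. It assumes, for illustration only, that a "cluster" is the set of binaries a translation unit is linked into; this is not the CodeCompass implementation, and the build layout, file names and symbol index are hypothetical.

```python
from collections import defaultdict

# Hypothetical build description: binary -> translation units (TUs) linked into it.
BUILD = {
    "server": ["main_server.cpp", "util_a.cpp"],
    "client": ["main_client.cpp", "util_b.cpp"],
}

# Hypothetical index: mangled name -> (TU, kind) occurrences. util_a.cpp and
# util_b.cpp each define their own `void helper()`, so both share _Z6helperv.
OCCURRENCES = {
    "_Z6helperv": [
        ("util_a.cpp", "definition"),
        ("util_b.cpp", "definition"),
        ("main_server.cpp", "reference"),
        ("main_client.cpp", "reference"),
    ],
}


def binaries_per_tu(build):
    """Invert the build description: TU -> set of binaries containing it."""
    result = defaultdict(set)
    for binary, tus in build.items():
        for tu in tus:
            result[tu].add(binary)
    return result


def naive_definitions(mangled):
    """Mangled-name-only lookup: every same-named definition is a match."""
    return sorted(tu for tu, kind in OCCURRENCES[mangled] if kind == "definition")


def clustered_definitions(mangled, ref_tu, build):
    """Keep only definitions that share at least one binary with the reference."""
    clusters = binaries_per_tu(build)
    wanted = clusters[ref_tu]
    return sorted(
        tu
        for tu, kind in OCCURRENCES[mangled]
        if kind == "definition" and clusters[tu] & wanted
    )


print(naive_definitions("_Z6helperv"))                                # ['util_a.cpp', 'util_b.cpp']
print(clustered_definitions("_Z6helperv", "main_server.cpp", BUILD))  # ['util_a.cpp']
```

Restricting candidates to translation units linked into the same binary is what removes the false match in this toy case; in practice the grouping would be derived from the project's real build information.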
DOI: 10.1109/SCAM.2017.15 (published 2017-09-01)
Citations: 6
On the Relationships Between Stability and Bug-Proneness of Code Clones: An Empirical Study
M. S. Rahman, C. Roy
Exact or similar copies of code fragments in a code base are known as code clones. Code clones are considered a serious code smell. Stability is a widely investigated perspective for assessing the impact of clones on software systems. A number of existing studies show that clones are often less stable than non-cloned code. This suggests that clones change more frequently than non-cloned code and thus may require comparatively more maintenance effort. Moreover, frequent changes to clones may increase the likelihood of failing to propagate a change to all co-change candidates, leading to inconsistencies or bugs. However, no existing study investigates whether the stability of clones is related to their bug-proneness. In this paper, we present an empirical study that analyzes the relationships between the stability and bug-proneness of clones. We identify bug-fix commits by analyzing the commit messages in software repositories, and we classify the clones that are changed in bug-fix commits as bug-prone clones. We then compare the stability of buggy and non-buggy clones, considering fine-grained syntactic change types and their significance.
Our experimental results on five open-source Java systems of different sizes and application domains show that (1) the stability and bug-proneness of code clones are related, and this relationship is statistically significant; (2) for both exact (Type 1) and near-miss (Type 2 and Type 3) clones, buggy clones tend to change more frequently than non-buggy clones; (3) the bug-proneness of Type 2 and Type 3 clones tends to be more strongly related to their stability than that of Type 1 clones; and (4) with respect to fine-grained change types, the relation between the stability and bug-proneness of clones is likely to be influenced by changes of low to medium significance. We believe that our findings are important and potentially useful for identifying and prioritizing candidate clones for management.
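The identification and comparison steps described above can be sketched as follows. This is a simplified Python illustration, assuming a keyword regex over commit messages and a plain count of changes per clone as the stability measure; the paper also weighs fine-grained syntactic change types and their significance, which this sketch omits. The commits and clone ids are made up.

```python
import re
from statistics import mean

# Assumed keyword heuristic for spotting bug-fix commits from their messages.
BUGFIX = re.compile(r"\b(fix(e[sd])?|bug|defect|patch)\b", re.IGNORECASE)


def is_bugfix(message):
    return BUGFIX.search(message) is not None


def compare_stability(commits, clones):
    """commits: list of (message, set of clone ids changed by that commit).

    Returns (mean changes of buggy clones, mean changes of non-buggy clones),
    where a clone counts as buggy if any bug-fix commit touched it.
    """
    changes = {clone: 0 for clone in clones}
    buggy = set()
    for message, changed in commits:
        for clone in changed:
            changes[clone] += 1
        if is_bugfix(message):
            buggy |= set(changed)

    def avg(group):
        return mean(changes[c] for c in group) if group else 0.0

    return avg(buggy), avg(set(clones) - buggy)


COMMITS = [
    ("Fix NPE in parser", {"c1"}),
    ("Refactor parser", {"c1", "c2"}),
    ("Add export feature", {"c3"}),
    ("Bug 42: wrong loop bound", {"c1"}),
]
buggy_avg, clean_avg = compare_stability(COMMITS, ["c1", "c2", "c3"])
print(f"buggy: {buggy_avg:.1f} changes on average, non-buggy: {clean_avg:.1f}")
```

Here clone `c1` is touched by both bug-fix commits and changes three times, while the other clones change once, mirroring (in miniature) finding (2) above.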
DOI: 10.1109/SCAM.2017.26 (published 2017-09-01)
Citations: 14
Working Around Loops for Infeasible Path Detection in Binary Programs
Jordy Ruiz, H. Cassé, M. D. Michiel
Research on safe Worst-Case Execution Time (WCET) estimation is necessary to build reliable hard, critical real-time systems. Infeasible paths are a major cause of WCET overestimation: without data flow constraints, static analysis by implicit path enumeration takes into account semantically impossible, potentially expensive execution paths, making the computed worst-case execution path unreachable in practice. In this paper we present an approach that significantly tightens the WCET by identifying infeasible paths, particularly in loops, and injecting them as additional Integer Linear Programming (ILP) constraints into the WCET computation. Although platform-independent, our entire analysis works directly on binary programs in order to obtain the tightest, most reliable WCET. Impactful infeasible paths are largely found within (often nested) loops; therefore, an efficient, exploitable and reasonably scalable representation of program state within loops is a key challenge of infeasible path analysis. We show that our approach yields clearly significant results on a selection of benchmarks from actual hard real-time applications as well as the classic Mälardalen suite.
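A toy version of the idea, assuming an implicit-path-enumeration (IPET) style formulation: block execution counts are the variables, the objective is the total cost, and an infeasible path becomes one extra linear constraint. The loop shape and cycle costs are hypothetical, and a brute-force search stands in for a real ILP solver.

```python
from itertools import product

# Hypothetical loop body: two consecutive if/else statements, A then B,
# executed `bound` times. Costs are made-up cycle counts per block.
COSTS = {"A_then": 10, "A_else": 2, "B_then": 8, "B_else": 3}


def wcet(costs, bound, infeasible=()):
    """IPET-style WCET: maximize sum(cost * count) over block execution counts.

    The ILP is solved by brute force, which is fine for this toy size.
    `infeasible` lists pairs of blocks that never run in the same iteration,
    encoded as the extra constraint count(u) + count(v) <= bound.
    """
    best = 0
    for a_then, b_then in product(range(bound + 1), repeat=2):
        counts = {
            "A_then": a_then,
            "A_else": bound - a_then,  # flow constraint: one branch per iteration
            "B_then": b_then,
            "B_else": bound - b_then,
        }
        if any(counts[u] + counts[v] > bound for u, v in infeasible):
            continue
        best = max(best, sum(costs[b] * n for b, n in counts.items()))
    return best


print(wcet(COSTS, 10))                          # 180: both costly branches every iteration
print(wcet(COSTS, 10, [("A_then", "B_then")]))  # 130: infeasible combination excluded
```

Excluding the single infeasible branch combination lowers the bound from 180 to 130 cycles in this example, which is the kind of tightening the approach targets.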
DOI: 10.1109/SCAM.2017.13 (published 2017-09-01)
Citations: 4
An Exploratory Study of Functional Redundancy in Code Repositories
Marcelo Suzuki, A. C. D. Paula, E. Guerra, C. Lopes, Otávio Augusto Lazzarini Lemos
In large code repositories, functions are likely to be repeated across projects. This type of functional redundancy (FR) is desirable for recent code reuse and repair approaches. Yet FR is hard to measure because it is closely related to program equivalence, which is undecidable. This is one of the reasons most studies of redundancy focus on syntactic replication (e.g., cloning) rather than semantic replication. In this paper we evaluate the extent of FR in a code repository of 68 Java projects randomly sampled from SourceForge. Our technique approximates function similarity by first searching for methods that have similar interfaces (return type, name, and parameter types). We then execute these methods to check which candidate pairs produce matching outputs for a given sample of inputs. Some recent studies have also focused on this type of semantic replication, but our detection approach is generally cheaper and more precise, because it focuses on methods and uses interfaces to reduce the search space. Although our scope is restricted to static methods, which makes our results conservative, our findings are promising. In particular, we found 984 pairs of redundant methods, and 28 of the 68 projects (41.17%) in the repository exhibited redundancy. Moreover, the majority of redundant methods whose source code we could access were not textual clones (only one redundant method pair consisted of replicated code). Our study also indicates that the proposed redundancy detection approach has high precision and is generally inexpensive (only four executions per method were required to attain 100% precision).
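The two-step detection (interface filter, then execution on sample inputs) might look like the Python sketch below. It assumes type annotations play the role of Java signatures and compares only parameter and return types; the name-similarity criterion is left out, and all functions and inputs are illustrative. Agreement on samples only approximates equivalence, so pairs found this way are candidates, not proofs.

```python
import inspect
from itertools import combinations


def interface(fn):
    """(parameter types, return type) taken from the function's annotations."""
    sig = inspect.signature(fn)
    return tuple(p.annotation for p in sig.parameters.values()), sig.return_annotation


def redundant_pairs(functions, samples):
    """Pairs with identical interfaces that agree on every sampled input.

    samples maps a tuple of parameter types to a list of argument tuples.
    """
    pairs = []
    for f, g in combinations(functions, 2):
        if interface(f) != interface(g):
            continue  # cheap filter: only execute same-interface candidates
        inputs = samples.get(interface(f)[0], [])
        if inputs and all(f(*args) == g(*args) for args in inputs):
            pairs.append((f.__name__, g.__name__))
    return pairs


def max_of_two(a: int, b: int) -> int:
    return a if a >= b else b


def larger(x: int, y: int) -> int:
    return max(x, y)


def add(a: int, b: int) -> int:
    return a + b


def shout(s: str) -> str:
    return s.upper()


SAMPLES = {(int, int): [(1, 2), (5, -3), (0, 0), (7, 7)]}
print(redundant_pairs([max_of_two, larger, add, shout], SAMPLES))
# [('max_of_two', 'larger')]
```

The interface filter means `shout` is never executed against the integer functions, which is where the cost savings of this style of search come from.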
DOI: 10.1109/SCAM.2017.21 (published 2017-09-01)
Citations: 12
Harvesting the Wisdom of the Crowd to Infer Method Nullness in Java
Manuel Leuenberger, Haidar Osman, Mohammad Ghafari, Oscar Nierstrasz
Null pointer exceptions are common bugs in Java projects. Previous research has shown that dereferencing the results of method calls is the main source of these bugs, as developers do not anticipate that some methods return null. To make matters worse, we find that whether a method returns null (its nullness) is rarely documented. We argue that method nullness is a vital piece of information that can help developers avoid this category of bugs. This is especially important for external APIs, where developers may not even have access to the code.
In this paper, we study the method nullness of Apache Lucene, the de facto standard library for text processing in Java. In particular, we investigate how often the result of each Lucene method is checked against null in Lucene clients. We call this measure method nullability; it can serve as a proxy for method nullness. Analyzing Lucene's internal and external usage, we find that most methods are never checked for null, and that external clients check more methods than Lucene does internally. Manually inspecting our dataset reveals that some null checks are unnecessary. We present an IDE plugin that complements existing documentation, makes up for missing documentation on method nullness, and generates nullness annotations so that static analysis can pinpoint potentially missing or unnecessary null checks.
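Method nullability as described above can be read as a per-method ratio: null-checked call sites divided by all call sites of that method. A minimal Python sketch, assuming call sites have already been extracted from client code; the method names and the 0.5 threshold for proposing a nullable annotation are hypothetical.

```python
from collections import defaultdict

NULLABLE_THRESHOLD = 0.5  # hypothetical cut-off, not taken from the paper


def nullability(call_sites):
    """call_sites: iterable of (method, result_was_null_checked) pairs.

    Returns method -> fraction of its call sites that null-check the result.
    """
    checked = defaultdict(int)
    total = defaultdict(int)
    for method, was_checked in call_sites:
        total[method] += 1
        checked[method] += int(was_checked)
    return {m: checked[m] / total[m] for m in total}


def nullable_candidates(scores, threshold=NULLABLE_THRESHOLD):
    """Methods whose results are null-checked often enough to suggest @Nullable."""
    return sorted(m for m, score in scores.items() if score >= threshold)


# Hypothetical call sites gathered from client code.
SITES = [
    ("Index.lookup", True),
    ("Index.lookup", True),
    ("Index.lookup", False),
    ("Index.lookup", True),
    ("Index.size", False),
    ("Index.size", False),
]
scores = nullability(SITES)
print(scores)                       # {'Index.lookup': 0.75, 'Index.size': 0.0}
print(nullable_candidates(scores))  # ['Index.lookup']
```

Aggregating over many independent clients is what makes this a "wisdom of the crowd" signal: a single unchecked call site says little, but a method that is checked most of the time likely can return null.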
DOI: 10.1109/SCAM.2017.22 (published 2017-09-01)
Citations: 3
Investigating the Use of Code Analysis and NLP to Promote a Consistent Usage of Identifiers
B. Lin, Simone Scalabrino, Andrea Mocci, R. Oliveto, G. Bavota, Michele Lanza
Meaningless identifiers, as well as inconsistent use of identifiers in source code, may hinder code readability and increase software maintenance effort. Over the past years, effort has been devoted to promoting a consistent usage of identifiers across different parts of a system through approaches that exploit static code analysis and Natural Language Processing (NLP). These techniques have been evaluated in small-scale studies, but it is unclear how they compare with and complement each other. Furthermore, a full-fledged, larger-scale empirical evaluation is still missing.
We aim to bridge this gap. We asked developers of five projects to assess the meaningfulness of the recommendations generated by three techniques: two from the literature (one exploiting static analysis, one using NLP) and a novel one that we propose. With a total of 922 rename refactorings evaluated, this is, to the best of our knowledge, the largest empirical study assessing and comparing rename refactoring tools that promote a consistent use of identifiers. Our study sheds light on the current state of the art in rename refactoring recommenders and indicates directions for future work.
DOI: 10.1109/SCAM.2017.17 (published 2017-09-01)
Citations: 22
A Static Code Smell Detector for SQL Queries Embedded in Java Code
Csaba Nagy, Anthony Cleve
A database plays a central role in the architecture of an information system, and the way it stores data delimits its main features. However, it is not just the data that matters. The way the data is handled, i.e., how the application communicates with the database, is of critical importance too. Therefore, the implementation of such a communication layer has to be reliable and efficient. SQL is a popular language for querying databases, and modern technologies rely on it (or its dialects) in the form of query strings embedded in the application code. In many languages (e.g., Java), an embedded query is typically constructed through several string operations, which makes it hard for developers to understand the statement finally sent to the database. This is a potential source of fault-prone and inefficient database usage, i.e., code smells. In this paper, we present a tool for identifying code smells in SQL queries embedded in Java code. Our tool implements a combined static analysis of the SQL statements embedded in the source code, the database schema, and the data in the database. We use a lightweight query extraction algorithm to extract SQL code from the Java code and implement smell detectors on the abstract semantic graph (ASG) of our fault-tolerant SQL parser. The severity of each smell is also determined, depending on its context. Developers can examine the identified issues with the help of an Eclipse plug-in or through command-line interfaces.
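The extraction-then-detection pipeline could be approximated as below. This Python sketch assumes the query is built by a single `+` concatenation, replaces non-literal operands with `?`, and then runs regex checks. The two smells shown (`SELECT *` and a leading-wildcard `LIKE`) are common SQL anti-patterns chosen for illustration, not necessarily ones from the paper, whose detectors work on a parsed ASG rather than on regexes.

```python
import re

# A Java string literal, or any other operand of a `+` concatenation.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[^+\s][^+]*')

# Assumed example smells; the paper's actual smell catalogue may differ.
SMELLS = {
    "select-star": re.compile(r"\bSELECT\s+\*", re.IGNORECASE),
    "leading-wildcard-like": re.compile(r"\bLIKE\s+'%", re.IGNORECASE),
}


def extract_query(java_expr):
    """Rebuild a query from a Java concatenation, using '?' for runtime values.

    Escape sequences inside literals are kept verbatim (no unescaping).
    """
    parts = []
    for match in TOKEN.finditer(java_expr):
        token = match.group(0).strip()
        parts.append(token[1:-1] if token.startswith('"') else "?")
    return "".join(parts)


def detect_smells(query):
    """Names of all example smells whose pattern occurs in the query."""
    return sorted(name for name, pattern in SMELLS.items() if pattern.search(query))


q1 = extract_query('"SELECT * FROM users WHERE id = " + userId')
print(q1)                 # SELECT * FROM users WHERE id = ?
print(detect_smells(q1))  # ['select-star']

q2 = extract_query("\"SELECT name FROM users WHERE name LIKE '%\" + term + \"'\"")
print(q2)                 # SELECT name FROM users WHERE name LIKE '%?'
print(detect_smells(q2))  # ['leading-wildcard-like']
```

Replacing unknown operands with a placeholder keeps the reconstructed statement syntactically recognizable, which is the point of extracting queries before analyzing them.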
DOI: 10.1109/SCAM.2017.19 (published 2017-09-01)
Citations: 21
Revisiting Exception Handling Practices with Exception Flow Analysis
G. B. D. Pádua, Weiyi Shang
Modern programming languages, such as Java and C#, typically provide features for handling exceptions. These features separate error-handling code from regular source code and aim to assist software comprehension and maintenance. Despite the acknowledged advantages of exception handling features, their misuse can still cause reliability degradation or even catastrophic software failures. Prior studies on exception handling have aimed to understand exception handling practices in terms of their different components, such as the origin of the exceptions and the code that handles them. Yet the observed findings were scattered and diverse. In this paper, to complement prior research findings on exception handling, we study its features by enriching the knowledge of handling code with a flow analysis of exceptions. Our case study covers over 10K exception handling blocks and over 77K related exception flows from 16 open-source Java and C# (.NET) libraries and applications. Our results show that each try block has up to 12 potentially recoverable yet propagated exceptions. More importantly, 22% of the distinct possible exceptions can be traced back to multiple methods (1.39 methods on average, 34 at most). Such results highlight the additional challenge of writing high-quality exception handling code. To make matters worse, we confirm that documentation of the possible exceptions and their sources is lacking. However, such critical information can be identified by exception flow analysis on well-documented API calls (e.g., using the JRE and .NET documentation). Finally, we observe different exception handling strategies in Java and C# code. Our findings highlight opportunities to leverage automated software analysis to assist exception handling practice and signify the need for further in-depth studies of exception handling practice.
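Exception flow analysis of the kind described above can be approximated by propagating (exception, origin) pairs up a call graph until nothing changes. In this Python sketch the call graph, method names and exceptions are hypothetical, a catch applies to a whole method body, and exception subtyping is ignored; it only illustrates how one exception can be traced back to several origin methods.

```python
def exception_flows(calls, throws, catches):
    """Fixpoint propagation of (exception, origin method) pairs up a call graph.

    calls:   method -> callees (every method must appear as a key)
    throws:  method -> exceptions it raises directly
    catches: method -> exception types it catches (simplified: whole body,
             exact type match, no subclass hierarchy)
    Returns method -> set of (exception, origin) pairs that may escape it.
    """
    def own(m):
        caught = set(catches.get(m, ()))
        return {(e, m) for e in throws.get(m, ()) if e not in caught}

    escapes = {m: own(m) for m in calls}
    changed = True
    while changed:  # sets only grow, so this terminates even with recursion
        changed = False
        for m, callees in calls.items():
            caught = set(catches.get(m, ()))
            new = own(m)
            for c in callees:
                new |= {(e, o) for e, o in escapes[c] if e not in caught}
            if new != escapes[m]:
                escapes[m] = new
                changed = True
    return escapes


def origins(pairs):
    """Group escaping exceptions by type: exception -> sorted origin methods."""
    grouped = {}
    for exc, origin in pairs:
        grouped.setdefault(exc, set()).add(origin)
    return {exc: sorted(o) for exc, o in grouped.items()}


CALLS = {
    "handler": ["parse", "load"],
    "parse": ["readInt"],
    "load": ["open", "readInt"],
    "readInt": [],
    "open": [],
}
THROWS = {
    "readInt": ["NumberFormatException"],
    "open": ["IOException"],
    "parse": ["NumberFormatException"],
}
CATCHES = {"load": ["IOException"]}

flows = exception_flows(CALLS, THROWS, CATCHES)
print(origins(flows["handler"]))  # {'NumberFormatException': ['parse', 'readInt']}
print(origins(flows["load"]))     # {'NumberFormatException': ['readInt']}
```

A handler around `handler` would have to deal with a `NumberFormatException` that may come from two different methods, while the `IOException` from `open` never reaches it because `load` catches it.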
{"title":"Revisiting Exception Handling Practices with Exception Flow Analysis","authors":"G. B. D. Pádua, Weiyi Shang","doi":"10.1109/SCAM.2017.16","DOIUrl":"https://doi.org/10.1109/SCAM.2017.16","url":null,"abstract":"Modern programming languages, such as Java and C#, typically provide features that handle exceptions. These features separate error-handling code from regular source code and aim to assist in the practice of software comprehension and maintenance. Having acknowledged the advantages of exception handling features, their misuse can still cause reliability degradation or even catastrophic software failures. Prior studies on exception handling aim to understand the practices of exception handling in its different components, such as the origin of the exceptions and the handling code of the exceptions. Yet, the observed findings were scattered and diverse. In this paper, to complement prior research findings on exception handling, we study its features by enriching the knowledge of handling code with a flow analysis of exceptions. Our case study is conducted with over 10K exception handling blocks, and over 77K related exception flows from 16 open-source Java and C# (.NET) libraries and applications. Our case study results show that each try block has up to 12 possible potentially recoverable yet propagated exceptions. More importantly, 22% of the distinct possible exceptions can be traced back to multiple methods (average of 1.39 and max of 34). Such results highlight the additional challenge of composing quality exception handling code. To make it worse, we confirm that there is a lack of documentation of the possible exceptions and their sources. However, such critical information can be identified by exception flow analysis on well-documented API calls (e.g., JRE and.NET documentation). Finally, we observe different strategies in exception handling code between Java and C#. Our findings highlight the opportunities of leveraging automated software analysis to assist in exception handling practices and signify the need of more further in-depth studies on exception handling practice.","PeriodicalId":306744,"journal":{"name":"2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131069799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM)