
2010 10th IEEE Working Conference on Source Code Analysis and Manipulation: Latest Publications

Language-Independent Clone Detection Applied to Plagiarism Detection
Pub Date: 2010-09-12 DOI: 10.1109/SCAM.2010.19
Romain Brixtel, Mathieu Fontaine, Boris Lesner, Cyril Bazin, R. Robbes
Clone detection is usually applied in the context of detecting small- to medium-scale fragments of duplicated code in large software systems. In this paper, we address the problem of clone detection applied to plagiarism detection in the context of source code assignments done by computer science students. Plagiarism detection comes with a distinct set of constraints compared to usual clone detection approaches, which influenced the design of the approach we present in this paper. For instance, the source code can be heavily changed at a superficial level (in an attempt to look genuine), yet be functionally very similar. Since assignments turned in by computer science students can be in a variety of languages, we work at the syntactic level and do not consider the source-code semantics. Consequently, the approach we propose is endogenous and makes no assumption about the programming language being analysed. It is based on an alignment method using the parallel principle at local resolution (character level) to compute similarities between documents. We tested our framework on hundreds of real source files, involving a wide array of programming languages (Java, C, Python, PHP, Haskell, bash). Our approach allowed us to discover previously undetected frauds, and to empirically evaluate its accuracy and robustness.
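The character-level alignment the abstract describes can be made concrete with a small stand-in. The sketch below is not the authors' algorithm; it merely shows, assuming Python's standard difflib, how an alignment-based similarity at character resolution stays high when a submission is only superficially disguised:

```python
# A minimal sketch of character-level similarity between two submissions,
# using difflib as a stand-in for the paper's alignment method.
from difflib import SequenceMatcher

def similarity(doc_a: str, doc_b: str) -> float:
    """Alignment-based similarity at character resolution, in [0, 1]."""
    return SequenceMatcher(None, doc_a, doc_b).ratio()

# Superficial edits (renamed identifiers) barely lower the score:
original = "for (int i = 0; i < n; i++) total += grades[i];"
disguised = "for (int k = 0; k < n; k++) sum += marks[k];"
print(similarity(original, disguised))  # still high despite renaming
```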
Citations: 61
How Good is Static Analysis at Finding Concurrency Bugs?
Pub Date: 2010-09-12 DOI: 10.1109/SCAM.2010.26
Devin Kester, Martin Mwebesa, J. S. Bradbury
Detecting bugs in concurrent software is challenging due to the many different thread interleavings. Dynamic analysis and testing solutions to bug detection are often costly, as they need to provide coverage of the interleaving space in addition to traditional black-box or white-box coverage. An alternative to dynamic detection of concurrency bugs is the use of static analysis. This paper examines the use of three static analysis tools (FindBugs, Jlint and Chord) in order to assess each tool's ability to find concurrency bugs and to identify the percentage of spurious results produced. The empirical data presented is based on an experiment involving 12 concurrent Java programs.
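The two quantities the study measures, detection ability and the share of spurious results, reduce to simple set arithmetic once a tool's warnings are matched against a program's known bugs. A minimal sketch with hypothetical file/line data:

```python
# Score one tool's warnings against a program's known concurrency bugs.
# All file names and line numbers here are hypothetical.
known_bugs = {("Account.java", 42), ("Pool.java", 17)}        # known faults
tool_warnings = {("Account.java", 42), ("Main.java", 3), ("Pool.java", 99)}

detected = known_bugs & tool_warnings      # warnings matching real bugs
spurious = tool_warnings - known_bugs      # warnings matching nothing

print(f"detection rate: {len(detected) / len(known_bugs):.0%}")       # 50%
print(f"spurious results: {len(spurious) / len(tool_warnings):.0%}")  # 67%
```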
Citations: 19
MemSafe: Ensuring the Spatial and Temporal Memory Safety of C at Runtime
Pub Date: 2010-09-12 DOI: 10.1002/spe.2105
Matthew S. Simpson, R. Barua
Memory access violations are a leading source of unreliability in C programs. As evidence of this problem, a variety of methods exist that retrofit C with software checks to detect memory errors at runtime. However, these methods generally suffer from one or more drawbacks, including the inability to detect all errors, the use of incompatible metadata, the need for manual code modifications, and high runtime overheads. In this paper, we present a compiler analysis and transformation for ensuring the memory safety of C called MemSafe. MemSafe makes several novel contributions that improve upon previous work and lower the cost of safety. These include (1) a method for modeling temporal errors as spatial errors, (2) a metadata representation that combines features of both object- and pointer-based approaches, and (3) a dataflow representation that simplifies optimizations for removing unneeded checks. MemSafe is capable of detecting real errors with lower overheads than previous efforts. Experimental results show that MemSafe detects all memory errors in 6 programs with known violations and ensures complete safety with an average overhead of 87% on 30 large programs widely used in evaluating error detection tools.
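Contribution (1), modeling temporal errors as spatial errors, can be illustrated conceptually: if freeing an object collapses its bounds metadata to an empty range, a later dereference through a dangling pointer fails the ordinary spatial check. The Python sketch below simulates the idea; MemSafe itself implements it as a compiler transformation over C, so this is an analogy, not the actual mechanism.

```python
# Conceptual simulation: per-pointer (base, bound) metadata; free() empties
# the range, so a temporal error is caught by the same spatial check.
class CheckedPointer:
    def __init__(self, base: int, size: int):
        self.addr, self.base, self.bound = base, base, base + size

    def free(self):
        self.base = self.bound = 0   # empty range: every access is now out of bounds

    def deref(self, nbytes: int = 1):
        if not (self.base <= self.addr and self.addr + nbytes <= self.bound):
            raise MemoryError(f"access at {self.addr} outside [{self.base}, {self.bound})")
        return "ok"

p = CheckedPointer(base=0x1000, size=16)
print(p.deref())                 # ok: spatial check passes
p.free()
try:
    p.deref()
except MemoryError as e:
    print("caught:", e)          # temporal error surfaces as a spatial violation
```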
Citations: 87
Parallel Reachability and Escape Analyses
Pub Date: 2010-09-12 DOI: 10.1109/SCAM.2010.10
Marcus Edvinsson, Jonas Lundberg, Welf Löwe
Static program analysis usually consists of a number of steps, each producing partial results. For example, the points-to analysis step, calculating object references in a program, usually just provides the input for larger client analyses like reachability and escape analyses. All these analyses are computationally intensive, and it is therefore vital to create parallel approaches that make use of the processing power offered by the multiple cores of modern desktop computers. This paper presents two parallel approaches to increase the efficiency of reachability analysis and escape analysis, based on a parallel points-to analysis. The experiments show that the two parallel approaches achieve a speed-up of 1.5 for reachability analysis and 3.8 for escape analysis on 8 cores for a benchmark suite of Java programs.
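A sketch of the kind of parallelism involved: reachability queries from distinct roots over a (hypothetical) points-to graph are independent, so they can be dispatched to a worker pool. This is an illustrative decomposition, not the authors' algorithm; for CPU-bound graphs in Python a process pool would be needed for a real speed-up.

```python
# Independent reachability queries over a points-to graph, one root per task.
from concurrent.futures import ThreadPoolExecutor

points_to = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}  # hypothetical

def reachable(root: str) -> frozenset:
    """All nodes reachable from `root` by a simple worklist traversal."""
    seen, work = set(), [root]
    while work:
        node = work.pop()
        if node not in seen:
            seen.add(node)
            work.extend(points_to.get(node, ()))
    return frozenset(seen)

with ThreadPoolExecutor(max_workers=8) as pool:
    results = dict(zip(points_to, pool.map(reachable, points_to)))
print(results["a"])   # frozenset({'a', 'b', 'c', 'd'})
```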
Citations: 3
Deriving Coupling Metrics from Call Graphs
Pub Date: 2010-09-12 DOI: 10.1109/SCAM.2010.25
Simon Allier, S. Vaucher, Bruno Dufour, H. Sahraoui
Coupling metrics play an important role in empirical software engineering research as well as in industrial measurement programs. The existing coupling metrics have usually been defined in a way that they can be computed from a static analysis of the source code. However, modern programs extensively use dynamic language features such as polymorphism and dynamic class loading that are difficult to capture by static analysis. Consequently, the derived metric values might not accurately reflect the state of a program. In this paper, we express existing definitions of coupling metrics using call graphs. We then compare the results of four different call graph construction algorithms with standard tool implementations of these metrics in an empirical study. Our results show important variations in coupling between standard and call graph-based calculations due to the support of dynamic features.
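For instance, a CBO-style coupling value falls straight out of a call graph: count the distinct other classes whose methods a class may call. The sketch below uses a hypothetical graph and naming scheme; as the paper argues, the resulting numbers shift with the call graph construction algorithm, since different algorithms resolve dynamic dispatch differently.

```python
# Derive a CBO-style coupling value from a call graph. Edges map a method
# to the methods it may call; all names are hypothetical.
from collections import defaultdict

call_graph = {
    "Order.total": ["Item.price", "Tax.rate"],
    "Order.save":  ["Db.insert"],
    "Item.price":  [],
}

def coupling_between_objects(graph: dict) -> dict:
    owner = lambda method: method.split(".")[0]   # "Order.total" -> "Order"
    coupled = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            if owner(caller) != owner(callee):    # ignore intra-class calls
                coupled[owner(caller)].add(owner(callee))
    return {cls: len(others) for cls, others in coupled.items()}

print(coupling_between_objects(call_graph))   # {'Order': 3}
```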
Citations: 18
Recovering the Memory Behavior of Executable Programs
Pub Date: 2010-09-12 DOI: 10.1109/SCAM.2010.18
A. Ketterlin, P. Clauss
This paper deals with the binary analysis of executable programs, with the goal of understanding how they access memory. It explains how to statically build a formal model of all memory accesses. Starting with a control-flow graph of each procedure, well-known techniques are used to structure this graph into a hierarchy of loops in all cases. The paper shows that much more information can be extracted by performing a complete data-flow analysis over machine registers after the program has been put in static single assignment (SSA) form. By using the SSA form, registers used in addressing memory can be symbolically expressed in terms of other, previously set registers. By including the loop structures in the analysis, loop indices and trip counts can also often be expressed symbolically. The whole process produces a formal model made of loops where memory accesses are linear expressions of loop counters and registers. The paper provides a quantitative evaluation of the results when applied to several dozen SPEC benchmark programs. Because static analysis is often incomplete, the paper ends by describing a lightweight instrumentation strategy that collects at run time enough information to complete the program's symbolic description.
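The model's output form, accesses as linear expressions of loop counters, can be shown with a toy example. Assuming a (hypothetical) two-deep loop nest whose access was recovered as addr = base + 400*i + 4*j with known trip counts, enumerating the expression reproduces the access stream:

```python
# One recovered access: addr = base + 400*i + 4*j, i in [0, 3), j in [0, 5).
# The base, coefficients, and trip counts here are hypothetical.
from itertools import product

access = {"base": 0x6010, "coeffs": {"i": 400, "j": 4}}
trips = {"i": 3, "j": 5}

def addresses(access, trips):
    names = list(trips)
    for values in product(*(range(trips[n]) for n in names)):
        env = dict(zip(names, values))
        yield access["base"] + sum(c * env[v] for v, c in access["coeffs"].items())

print([hex(a) for a in addresses(access, trips)][:6])   # first six addresses
```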
Citations: 2
Reconstruction of Composite Types for Decompilation
Pub Date: 2010-09-12 DOI: 10.1109/SCAM.2010.24
K. Troshina, Yegor Derevenets, A. Chernov
Decompilation is the reconstruction of a program in a high-level language from a program in a low-level language. This paper presents a method for the automatic reconstruction of composite types (structures, arrays, and combinations of them) in a high-level program during decompilation. Assembly code is obtained by disassembling a binary code or traces collected by a simulator. The proposed method is based on expressing memory access operations as (base, offset) pairs, then building equivalence classes for the bases used in the program and accumulating offsets for each equivalence class. For strictly conforming C programs, our approach is substantiated by the C language semantics as defined in the international standard. However, experimental results have revealed that it is applicable to real-world programs as well. Experimental results are obtained for a number of open-source programs, as well as for traces collected from them. The method is an essential part of the program decompilation tool TyDec being developed by the authors. The TyDec decompiler can be used as a standalone tool or as a plug-in for the Interactive Trace Explorer TrEx being developed at the Institute for System Programming, Russian Academy of Sciences.
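The core grouping step admits a compact sketch: unify base registers known to hold the same value (e.g., through copies), then accumulate the offsets observed for each equivalence class; the offset set of a class hints at a structure's field layout. Everything below, the trace included, is hypothetical:

```python
# Union-find over bases, then offset accumulation per equivalence class.
from collections import defaultdict

parent = {}
def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x
def union(a, b):
    parent[find(a)] = find(b)

union("rax", "rbx")                     # e.g., "mov rbx, rax": same base
accesses = [("rax", 0), ("rbx", 8), ("rax", 16), ("rcx", 0)]   # (base, offset)

offsets = defaultdict(set)
for base, off in accesses:
    offsets[find(base)].add(off)
print(dict(offsets))   # {'rbx': {0, 8, 16}, 'rcx': {0}} -> a three-field struct?
```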
Citations: 18
Subclass Instantiation Distribution
Pub Date: 2010-09-12 DOI: 10.1109/SCAM.2010.12
Amy Wheeler, D. Binkley
During execution, an object-oriented program typically creates a large number of objects. This research considers the distribution of those objects that share a common superclass. If this distribution is uniform, then all subclasses are equally likely to be instantiated. However, if not, then the lack of uniformity can be exploited by giving preferential treatment to the dominant class (or classes). For example, a tester might spend greater testing resources on the dominant class, while an engineer refactoring the code might begin with a more dominant class. An experiment designed to investigate the distribution of subclass instantiations was performed using eight Java programs containing almost half a million lines of code and just over three thousand classes. The results show that aside from a few infrequent instances, most distributions are heavily skewed.
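Measuring such a distribution is straightforward to sketch: have the common superclass count which concrete subclass each new object belongs to, then report per-class shares. The classes and counts below are hypothetical, chosen to echo the paper's finding of heavy skew:

```python
# The base class records which concrete subclass each new object belongs to.
from collections import Counter

class Shape:
    instantiations = Counter()
    def __init__(self):
        Shape.instantiations[type(self).__name__] += 1

class Circle(Shape): pass
class Square(Shape): pass

for _ in range(97): Circle()
for _ in range(3):  Square()

total = sum(Shape.instantiations.values())
for name, n in Shape.instantiations.most_common():
    print(f"{name}: {n / total:.0%}")   # Circle: 97%, Square: 3% -- heavily skewed
```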
Citations: 0
AMBIDEXTER: Practical Ambiguity Detection
Pub Date: 2010-09-12 DOI: 10.1109/SCAM.2010.21
Bas Basten, T. Storm
Ambiguity detection tools try to statically track down ambiguities in context-free grammars. Current ambiguity detection tools, however, either are too slow for large realistic cases, or produce incomprehensible ambiguity reports. AmbiDexter is the ambiguity tool to have your cake and eat it too.
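What such tools search for can be made concrete with a toy (and deliberately naive) check: a grammar is ambiguous if some sentence has two distinct parse trees. The brute-force enumeration below is nothing like AmbiDexter's static techniques; it only illustrates the property being detected.

```python
# Naive ambiguity check: enumerate bounded derivations of a toy grammar and
# look for one sentence with two different parse trees.
from itertools import product

grammar = {"E": [["E", "+", "E"], ["a"]]}   # classic ambiguous expression grammar

def derive(symbol, depth):
    """Yield (sentence, tree) pairs derivable from `symbol` within `depth` expansions."""
    if symbol not in grammar:               # terminal symbol
        yield (symbol,), symbol
        return
    if depth == 0:
        return
    for rule in grammar[symbol]:
        for parts in product(*(list(derive(s, depth - 1)) for s in rule)):
            sentence = tuple(tok for sent, _ in parts for tok in sent)
            tree = (symbol, tuple(t for _, t in parts))
            yield sentence, tree

def find_ambiguity(start, depth=4):
    seen = {}
    for sentence, tree in derive(start, depth):
        if sentence in seen and seen[sentence] != tree:
            return sentence, seen[sentence], tree
        seen.setdefault(sentence, tree)
    return None

print(find_ambiguity("E"))   # 'a + a + a' parses two ways: left- and right-nested
```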
Citations: 10
Estimating the Optimal Number of Latent Concepts in Source Code Analysis
Pub Date: 2010-09-12 DOI: 10.1109/SCAM.2010.22
Scott Grant, J. Cordy
The optimal number of latent topics required to model the most accurate latent substructure for a source code corpus is an open question in source code analysis. Most estimates about the number of latent topics that exist in a software corpus are based on the assumption that the data is similar to natural language, but there is little empirical evidence to support this. In order to help determine the appropriate number of topics needed to accurately represent the source code, we generate a series of Latent Dirichlet Allocation models with varying topic counts. We use a heuristic to evaluate the ability of the model to identify related source code blocks, and demonstrate the consequences of choosing too few or too many latent topics.
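The sweep the abstract describes can be sketched with an off-the-shelf LDA implementation. The snippet below assumes scikit-learn and a toy corpus of token streams; it uses held-out perplexity as the selection criterion, whereas the paper evaluates each model with a heuristic over related source code blocks, so the criterion here is a stand-in.

```python
# Sweep LDA topic counts over a toy corpus and compare held-out perplexity.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical corpus: each string is the token stream of one code block.
blocks = ["open read close file buffer", "socket bind listen accept",
          "open write close file flush", "thread lock acquire release"] * 10

X = CountVectorizer().fit_transform(blocks)
train, held_out = X[: X.shape[0] // 2], X[X.shape[0] // 2 :]

for k in (2, 4, 8, 16):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(train)
    print(k, lda.perplexity(held_out))   # lower is better; pick the elbow
```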
Citations: 69