Internet-scale Real-time Code Clone Search Via Multi-level Indexing
I. Keivanloo, J. Rilling, P. Charland. DOI: 10.1109/WCRE.2011.13

Finding lines of code similar to a code fragment across large knowledge bases in fractions of a second is a new branch of code clone research, also known as real-time code clone search. Among the requirements real-time code clone search has to meet are scalability, short response time, scalable incremental corpus updates, and support for type-1, type-2, and type-3 clones. We conducted a set of empirical studies on a large open source code corpus to gain insight into its characteristics. We used these results to design and optimize a multi-level indexing approach that combines hash table-based lookup and binary search to improve Internet-scale real-time code clone search response time. Finally, we performed an evaluation on an Internet-scale corpus (1.5 million Java files and 266 MLOC). Our approach keeps the response time for 99.9% of clone searches in the microsecond range, while supporting the aforementioned requirements.
Stealthy Profiling and Debugging of Malware Trampolining from User to Kernel Space
J. Raber. DOI: 10.1109/WCRE.2011.62

A reverse engineer trying to understand protected malware binaries must avoid detection by anti-debugging protections. Advanced protection systems may even load specialized drivers that can re-flash firmware and change the privileges of running applications, significantly increasing the penalty of detection. Hades is a Windows kernel driver designed to aid reverse engineering endeavors. It avoids detection by employing intelligent instrumentation via instruction rerouting in both user and kernel space. This technique allows a reverse engineer to easily debug and profile binaries without fear of triggering protection penalties.
Reasoning over the Evolution of Source Code Using Quantified Regular Path Expressions
Andy Kellens, Coen De Roover, Carlos Noguera, Reinout Stevens, V. Jonckers. DOI: 10.1109/WCRE.2011.54

Version control systems (VCS) have become indispensable for developing software. Besides their immediate advantages, they also offer information about the evolution of software and its development process. Despite this wealth of information, it has so far been leveraged only by tools dedicated to a specific software engineering task, such as predicting bugs or identifying hotspots. General-purpose tool support for reasoning about the information contained in a version control system is limited. In this paper, we introduce the logic-based program query language ABSINTHE. It supports querying versioned software systems using logic queries in which quantified regular path expressions are embedded. These expressions lend themselves to specifying the properties that each individual version in a sequence of successive software versions ought to exhibit.
On the Effectiveness of Simhash for Detecting Near-Miss Clones in Large Scale Software Systems
M. Uddin, C. Roy, Kevin A. Schneider, Abram Hindle. DOI: 10.1109/WCRE.2011.12

Clone detection techniques essentially cluster textually, syntactically, and/or semantically similar code fragments within or across software systems. For large datasets, similarity identification is costly in both time and memory, especially when detecting near-miss clones, where lines may be modified, added, and/or deleted in the copied fragments. The capability and effectiveness of a clone detection tool depend largely on the code similarity measurement technique it uses. A variety of similarity measurement approaches have been used for clone detection, including fingerprint-based approaches, which have had varying degrees of success notwithstanding some limitations. In this paper, we investigate the effectiveness of simhash, a state-of-the-art fingerprint-based data similarity measurement technique, for detecting both exact and near-miss clones in large-scale software systems. Our experimental data show that simhash is indeed effective in identifying various types of clones in a software system despite wide variations in experimental circumstances. The approach is also suitable as a core capability for building other tools, such as tools for incremental clone detection, code searching, and clone management.
Reverse Engineering Feature Models from Programs' Feature Sets
Evelyn Nicole Haslinger, R. Lopez-Herrejon, Alexander Egyed. DOI: 10.1109/WCRE.2011.45

Successful software is less and less often developed as a one-of-a-kind system. Instead, different system variants are built from a common set of assets and customized to cater to the different functionality or technology needs of distinct clients and users. The Software Product Line Engineering (SPLE) paradigm has proven effective at coping with the variability described in this scenario. However, evolving a Software Product Line (SPL) from a family of systems is not a simple endeavor. A crucial requirement is accurately capturing the variability present in the family of systems and representing it with Feature Models (FMs), the de facto standard for variability modeling. Current research has focused on extracting FMs from configuration scripts, propositional logic expressions, or natural language. In contrast, in this short paper we present an algorithm that reverse engineers a basic feature model from the feature sets that describe the features each system provides. We evaluate our approach using several case studies and outline the issues that still need to be addressed.
Recommending People in Developers' Collaboration Network
Didi Surian, Nian Liu, D. Lo, Hanghang Tong, Ee-Peng Lim, C. Faloutsos. DOI: 10.1109/WCRE.2011.53

Much software development involves collaboration among developers across the globe. This is true for both open-source and closed-source development efforts. Developers collaborate on different projects of various types. As with any other teamwork endeavor, finding compatible members for a development team helps the team realize its goal. Compatible members tend to share a similar programming style and naming strategy, communicate well with one another, etc. However, finding the right person to work with is not an easy task. In this work, we extract information available from SourceForge.net, the largest database of open source software, and build a developer collaboration network comprising information on developers, projects, and project properties. Given an input developer, we then recommend a list of the top developers who are most compatible based on their programming language skills, past projects, and the project categories they have worked on before, via a random walk with restart procedure. Our quantitative and qualitative experiments show that we are able to recommend reasonable developer candidates from snapshots of SourceForge.net consisting of tens of thousands of developers and projects, and hundreds of project properties.
iProblems - An Integrated Instrument for Reporting Design Flaws, Vulnerabilities and Defects
Mihai Codoban, C. Marinescu, Radu Marinescu. DOI: 10.1109/WCRE.2011.65

In this demonstration, we present a new instrument that provides, for each class in an analyzed system, information about the problems the class exhibits. We associate different types of problems with a class: design flaws, vulnerabilities, and defects. To validate its usefulness, we performed experiments on a suite of object-oriented systems; some results are briefly presented in the last part of the demo.
Reconstructing Traceability between Bugs and Test Cases: An Experimental Study
Nilam Kaushik, L. Tahvildari, Mark Moore. DOI: 10.1109/WCRE.2011.58

In manual testing, testers typically follow the steps listed in the bug report to verify whether a bug has been fixed. Depending on time and the availability of resources, a tester may execute additional test cases to ensure test coverage. The process of finding the most relevant manual test cases to run is itself largely manual and relies on tester expertise. From a usability standpoint, the task is tedious, as the tester typically has to switch between the defect management tool and the test case management tool to search for test cases relevant to the bug at hand. In this paper, we use IR techniques to recover traceability between bugs and test cases with the aim of recommending test cases for bugs. We report on our experience recovering traceability between bugs and test cases using techniques such as Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA) through a small industrial case study.
Towards the Extraction of Domain Concepts from the Identifiers
S. Abebe, P. Tonella. DOI: 10.1109/WCRE.2011.19

Program identifiers represent an invaluable source of information for developers who are not familiar with the code to be evolved. Domain concepts and inter-concept relationships can be automatically extracted by means of natural language processing techniques applied to program identifiers. However, the ontology produced by this approach tends to be very large and to include implementation details that reduce its usefulness for understanding domain concepts. In this paper, we analyze the effectiveness of information retrieval-based techniques for filtering domain concepts and relations from implementation details, so as to obtain a smaller, more informative domain ontology. In particular, we show that fully automated techniques based on keywords or topics perform quite poorly, while a semi-automated approach requiring limited user involvement can greatly improve the filtering of domain concepts.
Locating the Meaning of Terms in Source Code: Research on "Term Introduction"
Jan Nonnen, D. Speicher, Paul Imhoff. DOI: 10.1109/WCRE.2011.21

Software developers often face the challenge of understanding a large code base. Program comprehension is achieved not only by looking at object interactions, but also by considering the meaning of identifiers and the terms they contain. Ideally, the source code should exemplify this meaning. We propose to call the source code location that defines the meaning of a term its term introduction. With the help of an explorative study, we further derive a heuristic to determine the introduction location. This study was performed on 8,000 manually evaluated samples from 30 open source projects. To support reproducibility, all samples and classifications are also available online. The results show a precision of 75% for the heuristic.