
Latest publications: 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

Benchmarks for software clone detection: A ten-year retrospective
C. Roy, J. Cordy
A great many methods and tools have been proposed for software clone detection. While some work has been done on assessing and comparing the performance of these tools, very little empirical evaluation has been carried out. In particular, accuracy measures such as precision and recall have only been roughly estimated, due both to the difficulty of creating a validated clone benchmark against which tools can be compared, and to the manual effort required to hand-check large numbers of candidate clones. To cope with this issue, over the last ten years we have been working towards building cloning benchmarks for objectively evaluating clone detection tools. Beginning with our WCRE 2008 paper, where we conducted a modestly large empirical study with the NiCad clone detection tool, we have extended and grown our work to include several languages, much larger datasets, and model clones in languages such as Simulink. From a modest set of 15 C and Java systems comprising a total of 7 million lines in 2008, our work has progressed to a benchmark called BigCloneBench, with eight million manually validated clone pairs in a large inter-project source dataset of more than 25,000 projects and 365 million lines of code. In this paper, we present a history and overview of software clone detection benchmarks, and review the steps that we and others have taken to reach this stage. We outline a future for clone detection benchmarks and hope to encourage researchers both to use existing benchmarks and to contribute to building the benchmarks of the future.
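To make the evaluation concrete, here is a minimal Java sketch of how precision and recall are typically computed against a validated benchmark, treating the benchmark's validated pairs as complete ground truth — exactly the assumption that makes constructing such benchmarks hard. The ClonePair representation and the example data are illustrative, not BigCloneBench's actual format.

```java
import java.util.HashSet;
import java.util.Set;

public class BenchmarkScoring {
    // A clone pair is identified here by the two fragment IDs it connects
    // (illustrative; real benchmarks key pairs on file and line ranges).
    record ClonePair(String fragmentA, String fragmentB) {}

    // recall = detected true pairs / all validated pairs in the benchmark
    static double recall(Set<ClonePair> detected, Set<ClonePair> benchmark) {
        long hits = benchmark.stream().filter(detected::contains).count();
        return benchmark.isEmpty() ? 0.0 : (double) hits / benchmark.size();
    }

    // precision = detected true pairs / all pairs the tool reported,
    // valid only if the benchmark really covers all true clones.
    static double precision(Set<ClonePair> detected, Set<ClonePair> benchmark) {
        long hits = detected.stream().filter(benchmark::contains).count();
        return detected.isEmpty() ? 0.0 : (double) hits / detected.size();
    }

    public static void main(String[] args) {
        Set<ClonePair> benchmark = new HashSet<>(Set.of(
                new ClonePair("f1", "f2"), new ClonePair("f1", "f3")));
        Set<ClonePair> detected = new HashSet<>(Set.of(
                new ClonePair("f1", "f2"), new ClonePair("f4", "f5")));
        System.out.printf("precision=%.2f recall=%.2f%n",
                precision(detected, benchmark), recall(detected, benchmark));
    }
}
```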
{"title":"Benchmarks for software clone detection: A ten-year retrospective","authors":"C. Roy, J. Cordy","doi":"10.1109/SANER.2018.8330194","DOIUrl":"https://doi.org/10.1109/SANER.2018.8330194","url":null,"abstract":"There have been a great many methods and tools proposed for software clone detection. While some work has been done on assessing and comparing performance of these tools, very little empirical evaluation has been done. In particular, accuracy measures such as precision and recall have only been roughly estimated, due both to problems in creating a validated clone benchmark against which tools can be compared, and to the manual effort required to hand check large numbers of candidate clones. In order to cope with this issue, over the last 10 years we have been working towards building cloning benchmarks for objectively evaluating clone detection tools. Beginning with our WCRE 2008 paper, where we conducted a modestly large empirical study with the NiCad clone detection tool, over the past ten years we have extended and grown our work to include several languages, much larger datasets, and model clones in languages such as Simulink. From a modest set of 15 C and Java systems comprising a total of 7 million lines in 2008, our work has progressed to a benchmark called BigCloneBench with eight million manually validated clone pairs in a large inter-project source dataset of more than 25,000 projects and 365 million lines of code. In this paper, we present a history and overview of software clone detection benchmarks, and review the steps of ourselves and others to come to this stage. We outline a future for clone detection benchmarks and hope to encourage researchers to both use existing benchmarks and to contribute to building the benchmarks of the future.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"10 1","pages":"26-37"},"PeriodicalIF":0.0,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84647222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 34
Automated quality assessment for crowdsourced test reports of mobile applications
Xin Chen, He Jiang, Xiaochen Li, Tieke He, Zhenyu Chen
In crowdsourced mobile application testing, crowd workers help developers perform testing and submit test reports for unexpected behaviors. These submitted test reports usually provide critical information for developers to understand and reproduce the bugs. However, due to the poor performance of workers and the inconvenience of editing on mobile devices, the quality of test reports may vary sharply. At times developers have to spend a significant portion of their available resources handling low-quality test reports, which heavily decreases their efficiency. In this paper, to help developers predict whether a test report should be selected for inspection within limited resources, we propose a new framework named TERQAF to automatically model the quality of test reports. TERQAF defines a series of quantifiable indicators to measure the desirable properties of test reports, and uses step transformation functions to aggregate the numerical values of all indicators into a quality verdict for each report. Experiments conducted over five crowdsourced test report datasets of mobile applications show that TERQAF can correctly predict the quality of test reports with accuracy of up to 88.06% and outperform baselines by up to 23.06%. Meanwhile, the experimental results also demonstrate that the four categories of measurable indicators have positive impacts on TERQAF in evaluating the quality of test reports.
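A minimal Java sketch of the aggregation idea described above: each indicator value is mapped through a step transformation function onto a discrete score, and the scores are summed into a quality estimate that drives the inspection decision. The indicator names, thresholds, and cutoff here are invented for illustration; TERQAF's actual indicators and functions are defined in the paper.

```java
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

public class ReportQuality {
    // Step transformation: values below the threshold score 0, values at or
    // above it score 1 (illustrative; TERQAF defines its own functions).
    static DoubleUnaryOperator step(double threshold) {
        return v -> v >= threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // Hypothetical indicator values measured on one test report.
        Map<String, Double> indicators = Map.of(
                "textLength", 120.0,      // characters of description
                "screenshotCount", 2.0,   // attached screenshots
                "stepsToReproduce", 0.0,  // enumerated reproduction steps
                "readabilityScore", 0.7); // normalized 0..1

        // One step function per indicator (thresholds are assumptions).
        Map<String, DoubleUnaryOperator> steps = Map.of(
                "textLength", step(80),
                "screenshotCount", step(1),
                "stepsToReproduce", step(1),
                "readabilityScore", step(0.5));

        double quality = indicators.entrySet().stream()
                .mapToDouble(e -> steps.get(e.getKey()).applyAsDouble(e.getValue()))
                .sum();

        // Reports scoring below the cutoff would be deprioritized for inspection.
        System.out.println("quality=" + quality + " inspect=" + (quality >= 3));
    }
}
```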
{"title":"Automated quality assessment for crowdsourced test reports of mobile applications","authors":"Xin Chen, He Jiang, Xiaochen Li, Tieke He, Zhenyu Chen","doi":"10.1109/SANER.2018.8330224","DOIUrl":"https://doi.org/10.1109/SANER.2018.8330224","url":null,"abstract":"In crowdsourced mobile application testing, crowd workers help developers perform testing and submit test reports for unexpected behaviors. These submitted test reports usually provide critical information for developers to understand and reproduce the bugs. However, due to the poor performance of workers and the inconvenience of editing on mobile devices, the quality of test reports may vary sharply. At times developers have to spend a significant portion of their available resources to handle the low-quality test reports, thus heavily decreasing their efficiency. In this paper, to help developers predict whether a test report should be selected for inspection within limited resources, we propose a new framework named TERQAF to automatically model the quality of test reports. TERQAF defines a series of quantifiable indicators to measure the desirable properties of test reports and aggregates the numerical values of all indicators to determine the quality of test reports by using step transformation functions. Experiments conducted over five crowdsourced test report datasets of mobile applications show that TERQAF can correctly predict the quality of test reports with accuracy of up to 88.06% and outperform baselines by up to 23.06%. Meanwhile, the experimental results also demonstrate that the four categories of measurable indicators have positive impacts on TERQAF in evaluating the quality of test reports.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"1 1","pages":"368-379"},"PeriodicalIF":0.0,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83091898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 10
An extensible approach for taming the challenges of JavaScript dead code elimination
N. Obbink, I. Malavolta, Gian Luca Scoccia, P. Lago
JavaScript is becoming the de-facto programming language of the Web. Large-scale web applications (web apps) written in JavaScript are commonplace nowadays, with big technology players (e.g., Google, Facebook) using it in their core flagship products. Today, it is common practice to reuse existing JavaScript code, usually in the form of third-party libraries and frameworks. While this practice helps speed up development, it also comes with the risk of bringing in dead code, i.e., JavaScript code that is never executed but is still downloaded from the network and parsed in the browser. This overhead can negatively impact the overall performance and energy consumption of the web app. In this paper we present Lacuna, an approach for JavaScript dead code elimination in which existing JavaScript analysis techniques are applied in combination. The proposed approach supports both static and dynamic analyses, is extensible, and is independent of the specificities of the JavaScript analysis techniques used. Lacuna can be applied to any JavaScript code base without imposing any constraints on the developer, e.g., on her coding style or on the use of some specific JavaScript feature (e.g., modules). Lacuna has been evaluated on a suite of 29 publicly-available web apps, composed of 15,946 JavaScript functions and built with different JavaScript frameworks (e.g., Angular, Vue.js, jQuery). Despite being a prototype, Lacuna obtained promising results in terms of analysis execution time and precision.
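The common core of any dead-code eliminator is reachability over a call graph: mark every function reachable from the app's entry points and flag the rest as removal candidates. Below is a minimal Java sketch of that step only; the call-graph representation and entry points are assumptions, since Lacuna builds its graph from pluggable static and dynamic JavaScript analyses.

```java
import java.util.*;

public class DeadFunctionFinder {
    // Call graph: function name -> functions it may call (assumed given;
    // in Lacuna this comes from static and/or dynamic JavaScript analysis).
    static Set<String> reachable(Map<String, List<String>> callGraph,
                                 Collection<String> entryPoints) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(entryPoints);
        while (!work.isEmpty()) {
            String fn = work.pop();
            if (seen.add(fn)) { // first visit: enqueue callees
                work.addAll(callGraph.getOrDefault(fn, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> callGraph = Map.of(
                "main", List.of("render", "fetchData"),
                "render", List.of("formatDate"),
                "fetchData", List.of(),
                "formatDate", List.of(),
                "legacyHelper", List.of("formatDate")); // never called
        Set<String> live = reachable(callGraph, List.of("main"));
        callGraph.keySet().stream()
                .filter(fn -> !live.contains(fn))
                .forEach(fn -> System.out.println("dead candidate: " + fn));
    }
}
```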
{"title":"An extensible approach for taming the challenges of JavaScript dead code elimination","authors":"N. Obbink, I. Malavolta, Gian Luca Scoccia, P. Lago","doi":"10.1109/SANER.2018.8330226","DOIUrl":"https://doi.org/10.1109/SANER.2018.8330226","url":null,"abstract":"JavaScript is becoming the de-facto programming language of the Web. Large-scale web applications (web apps) written in Javascript are commonplace nowadays, with big technology players (e.g., Google, Facebook) using it in their core flagship products. Today, it is common practice to reuse existing JavaScript code, usually in the form of third-party libraries and frameworks. If on one side this practice helps in speeding up development time, on the other side it comes with the risk of bringing dead code, i.e., JavaScript code which is never executed, but still downloaded from the network and parsed in the browser. This overhead can negatively impact the overall performance and energy consumption of the web app. In this paper we present Lacuna, an approach for JavaScript dead code elimination, where existing JavaScript analysis techniques are applied in combination. The proposed approach supports both static and dynamic analyses, it is extensible, and independent of the specificities of the used JavaScript analysis techniques. Lacuna can be applied to any JavaScript code base, without imposing any constraints to the developer, e.g., on her coding style or on the use of some specific JavaScript feature (e.g., modules). Lacuna has been evaluated on a suite of 29 publicly-available web apps, composed of 15,946 JavaScript functions, and built with different JavaScript frameworks (e.g., Angular, Vue.js, jQuery). Despite being a prototype, Lacuna obtained promising results in terms of analysis execution time and precision.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"14 1","pages":"291-401"},"PeriodicalIF":0.0,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78257347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 23
Grammatical inference from data exchange files: An experiment on engineering software
Markus Exler, M. Moser, J. Pichler, Günter Fleck, B. Dorninger
Complex engineering problems are typically solved by running a batch of software programs. Data exchange between these software programs is frequently based on semi-structured text files. These files are edited in text editors that provide basic input support, but without proper input validation prior to program execution. Consequently, even minor lexical or syntactic errors cause software programs to stop without delivering a result. To tackle these problems, more specific editor support, aware of the language concepts of data exchange files, needs to be provided. In this paper, we investigate whether, and at what quality, a language grammar can be inferred from a set of existing text files, in order to provide a basis for the desired editing support. For this experiment, we chose a Minimal Adequate Teacher (MAT) method together with specific preprocessing of the existing text files. Thereby, we were able to construct complete grammar rules for most of the language constructs found in a corpus of semi-structured text files. The inferred grammar, however, requires refactoring towards a suitable and maintainable basis for the desired editor support.
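In the MAT setting, the learner drives two oracles: a membership query (is this string in the language?) and an equivalence query (does this hypothesis accept the language, and if not, what is a counterexample?). Below is a minimal Java sketch of that interface, with a corpus of existing exchange files standing in as an approximate teacher; the corpus-based approximation and all names are illustrative assumptions, not the paper's actual setup.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class CorpusTeacher {
    private final Set<String> corpus; // lines harvested from existing exchange files

    CorpusTeacher(Set<String> corpus) { this.corpus = corpus; }

    // Membership query: the corpus approximates the target language, so a
    // candidate is accepted iff it was observed in a real file (a deliberate
    // simplification; a real teacher could also apply hand-written checks).
    boolean member(String candidate) {
        return corpus.contains(candidate);
    }

    // Equivalence query: return a counterexample the hypothesis rejects,
    // or empty when the hypothesis accepts every corpus line.
    Optional<String> equivalent(Predicate<String> hypothesis) {
        return corpus.stream().filter(line -> !hypothesis.test(line)).findFirst();
    }

    public static void main(String[] args) {
        CorpusTeacher teacher = new CorpusTeacher(Set.of("KEY=1", "KEY=2", "MODE=fast"));
        // Hypothesis grammar: every line is "NAME=<number>".
        Optional<String> cex = teacher.equivalent(s -> s.matches("\\w+=\\d+"));
        cex.ifPresent(c -> System.out.println("refine grammar using: " + c));
        System.out.println("member(KEY=1): " + teacher.member("KEY=1"));
    }
}
```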
{"title":"Grammatical inference from data exchange files: An experiment on engineering software","authors":"Markus Exler, M. Moser, J. Pichler, Günter Fleck, B. Dorninger","doi":"10.1109/SANER.2018.8330259","DOIUrl":"https://doi.org/10.1109/SANER.2018.8330259","url":null,"abstract":"Complex engineering problems are typically solved by running a batch of software programs. Data exchange between these software programs is frequently based on semi-structured text files. These files are edited by text editors providing basic input support, however without proper input validation prior program execution. Consequently, even minor lexical or syntactic errors cause software programs to stop without delivering a result. To tackle these problems a more specific editor support, which is aware of language concepts of data exchange files, needs to be provided. In this paper, we investigate if and in what quality a language grammar can be inferred from a set of existing text files, in order to provide a basis for the desired editing support. For this experiment, we chose a Minimal Adequate Teacher (MAT) method together with specific preprocessing of the existing text files. Thereby, we were able to construct complete grammar rules for most of the language constructs found in a corpus of semi-structured text files. The inferred grammar, however, requires refactoring towards a suitable and maintainable basis for the desired editor support.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"5 1","pages":"557-561"},"PeriodicalIF":0.0,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76631822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 5
Micro-clones in evolving software
Manishankar Mondal, C. Roy, Kevin A. Schneider
Detection, tracking, and refactoring of code clones (i.e., identical or nearly similar code fragments in the code-base of a software system) have been extensively investigated by a great many studies. Code clones have often been considered bad smells. While clone refactoring is important for removing code clones from the code-base, clone tracking is important for consistently updating code clones that are not suitable for refactoring. In this research we investigate the importance of micro-clones (i.e., code clones of fewer than five lines of code) in consistent updating of the code-base. While existing clone detectors and trackers have ignored micro-clones, our investigation of thousands of commits from six subject systems implies that around 80% of all consistent updates during system evolution occur in micro-clones. The percentage of consistent updates occurring in micro-clones is significantly higher than that in regular clones according to our statistical significance tests. Also, the consistent updates occurring in micro-clones can account for up to 23% of all updates during the whole period of evolution. According to our manual analysis, around 83% of the consistent updates in micro-clones are non-trivial. As micro-clones require consistent updates just as regular clones do, tracking or refactoring micro-clones can help us considerably minimize the effort of consistently updating such clones. Thus, micro-clones should also be taken into proper consideration when making clone management decisions.
{"title":"Micro-clones in evolving software","authors":"Manishankar Mondal, C. Roy, Kevin A. Schneider","doi":"10.1109/SANER.2018.8330196","DOIUrl":"https://doi.org/10.1109/SANER.2018.8330196","url":null,"abstract":"Detection, tracking, and refactoring of code clones (i.e., identical or nearly similar code fragments in the code-base of a software system) have been extensively investigated by a great many studies. Code clones have often been considered bad smells. While clone refactoring is important for removing code clones from the code-base, clone tracking is important for consistently updating code clones that are not suitable for refactoring. In this research we investigate the importance of micro-clones (i.e., code clones of less than five lines of code) in consistent updating of the code-base. While the existing clone detectors and trackers have ignored micro clones, our investigation on thousands of commits from six subject systems imply that around 80% of all consistent updates during system evolution occur in micro clones. The percentage of consistent updates occurring in micro clones is significantly higher than that in regular clones according to our statistical significance tests. Also, the consistent updates occurring in micro-clones can be up to 23% of all updates during the whole period of evolution. According to our manual analysis, around 83% of the consistent updates in micro-clones are non-trivial. As micro-clones also require consistent updates like the regular clones, tracking or refactoring micro-clones can help us considerably minimize effort for consistently updating such clones. Thus, micro-clones should also be taken into proper consideration when making clone management decisions.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"15 1","pages":"50-60"},"PeriodicalIF":0.0,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85461957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 19
Structured random differential testing of instruction decoders
Nathan Jay, B. Miller
Decoding binary executable files is a critical facility for software analysis, including debugging, performance monitoring, malware detection, cyber forensics, and sandboxing, among other techniques. As a foundational capability, binary decoding must be consistently correct for the techniques that rely on it to be viable. Unfortunately, modern instruction sets are huge and their encodings are complex; as a result, modern binary decoders are buggy. In this paper, we present a testing methodology that automatically infers structural information for an instruction set and uses the inferred structure to efficiently generate structured-random test cases independent of the instruction set being tested. Our testing methodology includes automatic output verification using differential analysis and reassembly to generate error reports. This testing methodology requires little instruction-set-specific knowledge, allowing rapid testing of decoders for new architectures and extensions to existing ones. We have implemented our testing procedure in a tool named Fleece and used it to test multiple binary decoders (Intel XED, libopcodes, LLVM, Dyninst and Capstone) on multiple architectures (x86, ARM and PowerPC). Our testing efficiently covered thousands of instruction format variations for each instruction set and uncovered decoding bugs in every decoder we tested.
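A minimal Java sketch of the differential core: decode the same byte sequence with two independent decoders and report disagreements. The Decoder interface and the stub implementations are placeholders; Fleece drives real decoders (XED, libopcodes, LLVM, Dyninst, Capstone), additionally verifies outputs by reassembly, and biases generated bytes toward the inferred instruction formats rather than using the plain random bytes shown here.

```java
import java.util.Arrays;
import java.util.Random;

public class DifferentialDecoderTest {
    // Placeholder for a real decoder binding (XED, Capstone, ...).
    interface Decoder {
        String decode(byte[] instructionBytes); // textual disassembly or "INVALID"
    }

    static void fuzz(Decoder a, Decoder b, int trials, long seed) {
        Random rng = new Random(seed);
        for (int i = 0; i < trials; i++) {
            // Structured-random generation would bias these bytes toward
            // legal encodings of the inferred instruction formats; plain
            // random bytes are used here only to keep the sketch short.
            byte[] insn = new byte[1 + rng.nextInt(15)]; // x86: up to 15 bytes
            rng.nextBytes(insn);
            String outA = a.decode(insn);
            String outB = b.decode(insn);
            if (!outA.equals(outB)) { // disagreement: at least one decoder is wrong
                System.out.printf("disagreement on %s: %s vs %s%n",
                        Arrays.toString(insn), outA, outB);
            }
        }
    }

    public static void main(String[] args) {
        // Stub decoders that disagree on some inputs, to exercise the harness.
        Decoder a = bytes -> bytes[0] == (byte) 0x90 ? "nop" : "INVALID";
        Decoder b = bytes -> "INVALID";
        fuzz(a, b, 1_000, 42L);
    }
}
```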
{"title":"Structured random differential testing of instruction decoders","authors":"Nathan Jay, B. Miller","doi":"10.1109/SANER.2018.8330199","DOIUrl":"https://doi.org/10.1109/SANER.2018.8330199","url":null,"abstract":"Decoding binary executable files is a critical facility for software analysis, including debugging, performance monitoring, malware detection, cyber forensics, and sandboxing, among other techniques. As a foundational capability, binary decoding must be consistently correct for the techniques that rely on it to be viable. Unfortunately, modern instruction sets are huge and the encodings are complex, so as a result, modern binary decoders are buggy. In this paper, we present a testing methodology that automatically infers structural information for an instruction set and uses the inferred structure to efficiently generate structured-random test cases independent of the instruction set being tested. Our testing methodology includes automatic output verification using differential analysis and reassembly to generate error reports. This testing methodology requires little instruction-set-specific knowledge, allowing rapid testing of decoders for new architectures and extensions to existing ones. We have implemented our testing procedure in a tool name Fleece and used it to test multiple binary decoders (Intel XED, libopcodes, LLVM, Dyninst and Capstone) on multiple architectures (x86, ARM and PowerPC). Our testing efficiently covered thousands of instruction format variations for each instruction set and uncovered decoding bugs in every decoder we tested.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"11 9","pages":"84-94"},"PeriodicalIF":0.0,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91496192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 5
Detecting third-party libraries in Android applications with high precision and recall
Yuan Zhang, Jiarun Dai, Xiaohan Zhang, S. Huang, Zhemin Yang, Min Yang, Hao Chen
Third-party libraries are widely used in Android applications to ease development and enhance functionality. However, the incorporated libraries also bring new security and privacy issues to the host application, and blur the accounting between application code and library code. In this situation, a precise and reliable library detector is highly desirable. In practice, library code may be customized by developers during integration, and dead library code may be eliminated by code obfuscators during the application build process. Existing research on library detection has not gracefully handled these problems, and thus faces severe limitations in practice. In this paper, we propose LibPecker, an obfuscation-resilient, highly precise and reliable library detector for Android applications. LibPecker adopts signature matching to give a similarity score between a given library and an application. By fully utilizing the internal class dependencies inside a library, LibPecker generates a strict signature for each class. To tolerate library code customization and elimination as much as possible, LibPecker introduces an adaptive class similarity threshold and a weighted class similarity score when calculating library similarity. To quantitatively evaluate the precision and recall of LibPecker, we perform the first such experiment (to the best of our knowledge) with a large number of libraries and applications. Results show that LibPecker significantly outperforms the state-of-the-art tools in both recall and precision (91% and 98.1% respectively).
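A minimal Java sketch of the scoring idea: each library class receives a best-match similarity against the app, a match counts only above an adaptive threshold, and accepted matches are weighted before aggregation. The class representation, threshold rule, and weights are illustrative assumptions; LibPecker's actual signatures encode internal class dependencies as described in the paper.

```java
import java.util.List;

public class LibrarySimilarity {
    // A class profile: its signature weight (e.g., member count) and a
    // precomputed similarity against its best app-side match.
    record ClassMatch(String className, double weight, double similarity) {}

    // Adaptive threshold: larger classes must match more strictly
    // (an assumption standing in for LibPecker's adaptive rule).
    static double threshold(double weight) {
        return Math.min(0.9, 0.5 + weight / 100.0);
    }

    static double libraryScore(List<ClassMatch> matches) {
        double total = matches.stream().mapToDouble(ClassMatch::weight).sum();
        double credited = matches.stream()
                .filter(m -> m.similarity() >= threshold(m.weight()))
                .mapToDouble(m -> m.weight() * m.similarity())
                .sum();
        return total == 0 ? 0 : credited / total; // weighted score in [0,1]
    }

    public static void main(String[] args) {
        List<ClassMatch> matches = List.of(
                new ClassMatch("okhttp3/Request", 40, 0.97),
                new ClassMatch("okhttp3/Dispatcher", 25, 0.88),
                new ClassMatch("okhttp3/internal/Util", 10, 0.40)); // customized away
        System.out.printf("library similarity = %.2f%n", libraryScore(matches));
    }
}
```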
{"title":"Detecting third-party libraries in Android applications with high precision and recall","authors":"Yuan Zhang, Jiarun Dai, Xiaohan Zhang, S. Huang, Zhemin Yang, Min Yang, Hao Chen","doi":"10.1109/SANER.2018.8330204","DOIUrl":"https://doi.org/10.1109/SANER.2018.8330204","url":null,"abstract":"Third-party libraries are widely used in Android applications to ease development and enhance functionalities. However, the incorporated libraries also bring new security & privacy issues to the host application, and blur the accounting between application code and library code. Under this situation, a precise and reliable library detector is highly desirable. In fact, library code may be customized by developers during integration and dead library code may be eliminated by code obfuscators during application build process. However, existing research on library detection has not gracefully handled these problems, thus facing severe limitations in practice. In this paper, we propose LibPecker, an obfuscation-resilient, highly precise and reliable library detector for Android applications. LibPecker adopts signature matching to give a similarity score between a given library and an application. By fully utilizing the internal class dependencies inside a library, LibPecker generates a strict signature for each class. To tolerate library code customization and elimination as much as possible, LibPecker introduces adaptive class similarity threshold and weighted class similarity score when calculating library similarity. To quantitatively evaluate the precision and the recall of LibPecker, we perform the first such experiment (to the best of our knowledge) with a large number of libraries and applications. Results show that LibPecker significantly outperforms the state-of-the-art tools in both recall and precision (91% and 98.1% respectively).","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"53 1","pages":"141-152"},"PeriodicalIF":0.0,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75946663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 49
Reconciling the past and the present: An empirical study on the application of source code transformations to automatically rejuvenate Java programs
Reno Dantas, Antonio Carvalho, Diego Marcilio, Luisa Fantin, Uriel Silva, Walter Lucas, R. Bonifácio
Software systems change frequently over time, due either to new business requirements or to technology pressures. Programming languages evolve in a similarly constant fashion, and when a language release introduces new programming constructs, older constructs and idioms might become obsolete. The coexistence of newer and older constructs leads to several problems, such as increased maintenance effort and a steeper learning curve for developers. In this paper we present a RASCAL Java transformation library that evolves legacy systems to use more recent programming language constructs (such as multi-catch and lambda expressions). In order to understand how relevant automatic software rejuvenation is, we submitted 2,462 transformations to 40 open source projects via the GitHub pull request mechanism. Initial results show that simple transformations, for instance the introduction of the diamond operator, are more likely to be accepted than transformations that change the code substantially, such as refactoring enhanced for loops to the newer functional style.
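Two of the transformations mentioned above, shown as before/after Java; the surrounding class is scaffolding for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RejuvenationExamples {
    public static void main(String[] args) {
        // Before: pre-Java-7 style with repeated type arguments.
        Map<String, List<Integer>> before = new HashMap<String, List<Integer>>();
        // After: diamond operator (Java 7+) infers the type arguments.
        Map<String, List<Integer>> after = new HashMap<>();

        List<String> names = List.of("saner", "icse", "fse");

        // Before: enhanced for loop accumulating results imperatively.
        List<String> upper1 = new ArrayList<>();
        for (String n : names) {
            upper1.add(n.toUpperCase());
        }

        // After: the same logic in the newer functional style (Java 8+).
        List<String> upper2 = names.stream().map(String::toUpperCase).toList();

        System.out.println(upper1.equals(upper2)); // true: behavior is preserved
    }
}
```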
{"title":"Reconciling the past and the present: An empirical study on the application of source code transformations to automatically rejuvenate Java programs","authors":"Reno Dantas, Antonio Carvalho, Diego Marcilio, Luisa Fantin, Uriel Silva, Walter Lucas, R. Bonifácio","doi":"10.1109/SANER.2018.8330247","DOIUrl":"https://doi.org/10.1109/SANER.2018.8330247","url":null,"abstract":"Software systems change frequently over time, either due to new business requirements or technology pressures. Programming languages evolve in a similar constant fashion, though when a language release introduces new programming constructs, older constructs and idioms might become obsolete. The coexistence between newer and older constructs leads to several problems, such as increased maintenance efforts and steeper learning curve for developers. In this paper we present a RASCAL Java transformation library that evolves legacy systems to use more recent programming language constructs (such as multi-catch and lambda expressions). In order to understand how relevant automatic software rejuvenation is, we submitted 2462 transformations to 40 open source projects via the GitHub pull request mechanism. Initial results show that simple transformations, for instance the introduction of the diamond operator, are more likely to be accepted than transformations that change the code substantially, such as refactoring enhanced for loops to the newer functional style.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"23 1","pages":"497-501"},"PeriodicalIF":0.0,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82177113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 10
Extracting features from requirements: Achieving accuracy and automation with neural networks
Y. Li, Sandro Schulze, G. Saake
Analyzing and extracting features and variability from different artifacts is an indispensable activity to support the systematic integration of single software systems into a Software Product Line (SPL). Beyond manually extracting variability, a variety of approaches, such as feature location in source code and feature extraction in requirements, have been proposed for automating the identification of features and their variation points. While requirements contain more complete variability information and provide traceability links to other artifacts, current techniques exhibit a lack of accuracy as well as a limited degree of automation. In this paper, we propose an unsupervised learning structure to overcome the abovementioned limitations. In particular, our technique consists of two steps: first, we apply Laplacian Eigenmaps, an unsupervised dimensionality reduction technique, to embed text requirements into compact binary codes. Second, requirements are transformed into a matrix representation by looking up a pre-trained word embedding. The matrix is then fed into a CNN to learn the linguistic characteristics of the requirements. Furthermore, we train the CNN by matching its output with the pre-trained binary codes. Initial results show that accuracy is still limited, but that our approach allows the entire process to be automated.
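A minimal Java sketch of two pieces of this pipeline: looking up tokens in a (toy) pre-trained word embedding to form the matrix representation, and comparing a predicted binary code against the Laplacian-Eigenmaps code with Hamming distance as the matching signal. The embedding values and code length are fabricated for illustration; the real pipeline uses full pre-trained embeddings and a CNN.

```java
import java.util.List;
import java.util.Map;

public class RequirementEncoding {
    // Toy 3-dimensional embedding (real pipelines use pre-trained vectors
    // of a few hundred dimensions).
    static final Map<String, double[]> EMBEDDING = Map.of(
            "user", new double[]{0.1, 0.7, 0.2},
            "login", new double[]{0.8, 0.1, 0.3},
            "password", new double[]{0.7, 0.2, 0.4});

    // Step 2 of the pipeline: requirement tokens -> matrix (one row per token).
    static double[][] toMatrix(List<String> tokens) {
        return tokens.stream()
                .map(t -> EMBEDDING.getOrDefault(t, new double[3])) // OOV -> zeros
                .toArray(double[][]::new);
    }

    // Matching signal: distance between the CNN's predicted code and the
    // binary code produced by Laplacian Eigenmaps in step 1.
    static int hamming(int[] predicted, int[] target) {
        int d = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] != target[i]) d++;
        }
        return d;
    }

    public static void main(String[] args) {
        double[][] matrix = toMatrix(List.of("user", "login", "password"));
        System.out.println("matrix rows: " + matrix.length);
        System.out.println("hamming: " + hamming(new int[]{1, 0, 1, 1}, new int[]{1, 1, 1, 0}));
    }
}
```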
{"title":"Extracting features from requirements: Achieving accuracy and automation with neural networks","authors":"Y. Li, Sandro Schulze, G. Saake","doi":"10.1109/SANER.2018.8330243","DOIUrl":"https://doi.org/10.1109/SANER.2018.8330243","url":null,"abstract":"Analyzing and extracting features and variability from different artifacts is an indispensable activity to support systematic integration of single software systems and Software Product Line (SPL). Beyond manually extracting variability, a variety of approaches, such as feature location in source code and feature extraction in requirements, has been proposed for automating the identification of features and their variation points. While requirements contain more complete variability information and provide traceability links to other artifacts, current techniques exhibit a lack of accuracy as well as a limited degree of automation. In this paper, we propose an unsupervised learning structure to overcome the abovementioned limitations. In particular, our technique consists of two steps: First, we apply Laplacian Eigenmaps, an unsupervised dimensionality reduction technique, to embed text requirements into compact binary codes. Second, requirements are transformed into a matrix representation by looking up a pre-trained word embedding. Then, the matrix is fed into CNN to learn linguistic characteristics of the requirements. Furthermore, we train CNN by matching the output of CNN with the pre-trained binary codes. Initial results show that accuracy is still limited, but that our approach allows to automate the entire process.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"16 1","pages":"477-481"},"PeriodicalIF":0.0,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81890656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 7
Generating descriptions for screenshots to assist crowdsourced testing
Di Liu, Xiaofang Zhang, Yang Feng, James A. Jones
Crowdsourced software testing has been shown to be capable of detecting many bugs and simulating real usage scenarios. As such, it is popular in mobile-application testing. However, in mobile testing, test reports often consist of only a few screenshots and short text descriptions. Inspecting and understanding the overwhelming number of mobile crowdsourced test reports becomes a time-consuming but inevitable task. The paucity and potential inaccuracy of textual information, together with the well-defined screenshots of activity views within mobile applications, motivate us to propose a novel technique to assist developers in understanding crowdsourced test reports by automatically describing the screenshots. To reach this goal, in this paper, we propose a fully automatic technique to generate descriptive words for such well-defined screenshots. We employ the test reports written by professional testers to build language models, and we use a computer-vision technique, namely Spatial Pyramid Matching (SPM), to measure similarities and extract features from the screenshot images. The experimental results, based on more than 1,000 test reports from four industrial crowdsourced projects, show that our proposed technique is promising for helping developers better understand mobile crowdsourced test reports.
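A minimal Java sketch of the Spatial Pyramid Matching comparison: histogram intersection over corresponding grid cells at each pyramid level, with the standard weighting that favors finer levels. The per-cell histograms are assumed precomputed from quantized local features of the two screenshots; the example values are fabricated.

```java
public class SpatialPyramidMatch {
    // Histogram intersection: overlap between two normalized histograms.
    static double intersect(double[] h1, double[] h2) {
        double s = 0;
        for (int i = 0; i < h1.length; i++) s += Math.min(h1[i], h2[i]);
        return s;
    }

    // histsA[level][cell] and histsB[level][cell]: one histogram per grid
    // cell, with level l split into 4^l cells (assumed precomputed from
    // quantized local features of the two screenshots).
    static double similarity(double[][][] histsA, double[][][] histsB) {
        int levels = histsA.length;
        double score = 0;
        for (int l = 0; l < levels; l++) {
            // Standard SPM weighting: finer levels count more.
            double w = (l == 0) ? 1.0 / (1 << (levels - 1))
                                : 1.0 / (1 << (levels - l));
            double levelScore = 0;
            for (int c = 0; c < histsA[l].length; c++) {
                levelScore += intersect(histsA[l][c], histsB[l][c]);
            }
            score += w * levelScore;
        }
        return score;
    }

    public static void main(String[] args) {
        // Two pyramid levels: 1 cell at level 0, 4 cells at level 1.
        double[][][] a = {{{0.5, 0.5}},
                {{0.2, 0.05}, {0.1, 0.15}, {0.2, 0.05}, {0.1, 0.15}}};
        double[][][] b = {{{0.6, 0.4}},
                {{0.25, 0.0}, {0.1, 0.15}, {0.15, 0.1}, {0.1, 0.15}}};
        System.out.printf("SPM similarity = %.3f%n", similarity(a, b));
    }
}
```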
{"title":"Generating descriptions for screenshots to assist crowdsourced testing","authors":"Di Liu, Xiaofang Zhang, Yang Feng, James A. Jones","doi":"10.1109/SANER.2018.8330246","DOIUrl":"https://doi.org/10.1109/SANER.2018.8330246","url":null,"abstract":"Crowdsourced software testing has been shown to be capable of detecting many bugs and simulating real usage scenarios. As such, it is popular in mobile-application testing. However in mobile testing, test reports often consist of only some screenshots and short text descriptions. Inspecting and under-standing the overwhelming number of mobile crowdsourced test reports becomes a time-consuming but inevitable task. The paucity and potential inaccuracy of textual information and the well-defined screenshots of activity views within mobile applications motivate us to propose a novel technique to assist developers in understanding crowdsourced test reports by automatically describing the screenshots. To reach this goal, in this paper, we propose a fully automatic technique to generate descriptive words for the well-defined screenshots. We employ the test reports written by professional testers to build up language models. We use the computer-vision technique, namely Spatial Pyramid Matching (SPM), to measure similarities and extract features from the screenshot images. The experimental results, based on more than 1000 test reports from 4 industrial crowdsourced projects, show that our proposed technique is promising for developers to better understand the mobile crowdsourced test reports.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"1 1","pages":"492-496"},"PeriodicalIF":0.0,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89277915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 14