
Latest publications: 2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)

[Engineering Paper] Challenges of Implementing Cross Translation Unit Analysis in Clang Static Analyzer
G. Horváth, Péter Szécsi, Zoltán Gera, Dániel Krupp, Norbert Pataki
Static analysis is an effective approach to finding bugs and code smells. Some errors span multiple translation units. Unfortunately, separate compilation makes cross translation unit analysis challenging for C family languages. In this paper, we describe a model and an implementation of cross translation unit symbolic execution for C family languages. We were able to extend the scope of the analysis without modifying any of the existing checkers. The analysis is implemented in the open source Clang compiler. We also measured the performance of the approach and the quality of the reports. The solution proved to be scalable to large codebases, and the number of findings increased significantly for the evaluated projects. The implementation is already accepted into mainline Clang.
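The class of defect the paper targets can be illustrated with a minimal, hypothetical two-file example (shown concatenated here so it compiles as one unit). Analyzing main.c in isolation, a single-TU checker cannot see that the callee may return NULL; cross translation unit analysis imports the callee's definition and exposes the error path:

```c
/* Sketch of a bug that spans two translation units (illustrative names). */
#include <assert.h>
#include <stddef.h>

/* --- lib.c (first translation unit) --- */
int *find_item(int key) {
    static int item = 42;
    if (key < 0)
        return NULL;        /* error path, visible only in this TU */
    return &item;
}

/* --- main.c (second translation unit) --- */
int use(int key) {
    int *p = find_item(key);
    return *p;              /* null dereference when key < 0, invisible
                               when main.c is analyzed on its own */
}
```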
DOI: 10.1109/SCAM.2018.00027 · Published: 2018-09-01
Citations: 11
[Engineering Paper] An IDE for Easy Programming of Simple Robotics Tasks
David C. Shepherd, Patrick Francis, David Weintrop, Diana Franklin, Boyang Li, Afsoon Afzal
Many robotic tasks in small manufacturing sites are quite simple. For example, a pick and place task requires only a few common commands. Unfortunately, the standard languages and programming environments for industrial robots are complex, making even these simple tasks nearly impossible for novices. To enable novices to program simple tasks we created a block-based programming language and environment focused on usability, learnability, and understandability and embedded its programming environment in a state-of-the-art robot simulator. By using this high-fidelity prototype over the course of a year in a case study, a user study, and for countless demonstrations we have gained many concrete insights. In this paper we discuss the details of the language, the design of its programming environment, and concrete insights gained via longitudinal usage.
DOI: 10.1109/SCAM.2018.00032 · Published: 2018-09-01
Citations: 8
[Engineering Paper] RECKA and RPromF: Two Frama-C Plug-ins for Optimizing Registers Usage in CUDA, OpenACC and OpenMP Programs
R. Diarra, A. Mérigot, B. Vincke
Pointer aliasing still hinders compiler optimizations. The ISO C99 standard added the restrict keyword, which lets the programmer assert non-aliasing as an aid to the compiler's optimizer. Annotating pointers with the restrict keyword is still left to the programmer, and the task is, in general, tedious and error-prone. Scalar replacement is an optimization widely used by compilers. In this paper, we present two new Frama-C plug-ins: RECKA, for automatic annotation of CUDA kernel arguments with the restrict keyword, and RPromF, for scalar replacement in OpenACC and OpenMP 4.0/4.5 GPU codes. More specifically, RECKA works as follows: (i) an alias analysis is performed on CUDA kernels and their callers; (ii) if no alias is found, the CUDA kernels are cloned, the clones are renamed, and their arguments are annotated with the restrict qualifier; and (iii) instructions are added at kernel call sites to perform, at run time, a less-than overlap check on the kernel's actual parameters and decide whether the clone or the original must be called. RPromF includes five main steps: (i) OpenACC/OpenMP offloading regions are identified; (ii) the functions containing these offloading codes and their callers are analyzed to check that there is no alias; (iii) if there is no alias, the offloading codes are cloned; (iv) the clone's instructions are analyzed to retrieve data-reuse information and perform scalar replacement; and (v) instructions are added so that the optimized clone is used whenever possible. We evaluated the two plug-ins on the PolyBench benchmark suite. The results show that both scalar replacement and the use of the restrict keyword are effective at improving the overall performance of OpenACC, OpenMP 4.0/4.5 and CUDA codes.
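RECKA's clone-and-dispatch scheme (steps (ii)-(iii)) can be sketched in plain C; the function names and the scalar kernel below are illustrative, not the plug-in's actual output, and the pointer comparison mirrors the paper's runtime "less-than" overlap check:

```c
/* Sketch (under stated assumptions) of a restrict-qualified clone plus a
   runtime overlap check at the call site deciding which version to run. */
#include <assert.h>

/* Clone: restrict promises disjoint ranges, so the optimizer may vectorize. */
static void scale_restrict(float *restrict dst, const float *restrict src, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = 2.0f * src[i];
}

/* Original: no aliasing assumption, always safe. */
static void scale_orig(float *dst, const float *src, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = 2.0f * src[i];
}

void scale(float *dst, const float *src, int n) {
    /* less-than check: ranges [dst, dst+n) and [src, src+n) are disjoint */
    if (dst + n <= src || src + n <= dst)
        scale_restrict(dst, src, n);
    else
        scale_orig(dst, src, n);
}

float demo(void) {
    float src[4] = {1, 2, 3, 4}, dst[4];
    scale(dst, src, 4);          /* disjoint arrays: the clone may be taken */
    return dst[3];
}
```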
DOI: 10.1109/SCAM.2018.00029 · Published: 2018-09-01
Citations: 0
[Research Paper] Semantics-Based Code Search Using Input/Output Examples
Renhe Jiang, Zhengzhao Chen, Zejun Zhang, Yu Pei, Minxue Pan, Tian Zhang
As the quality and quantity of open source code increase, semantics-based code search has become an emerging need for software developers to retrieve and reuse existing source code. We present an approach to semantics-based code search using input/output examples for the Java language. Our approach encodes Java methods in code repositories into path constraints via symbolic analysis and leverages SMT solvers to find the methods whose path constraints can satisfy the given input/output examples. Compared with existing methods, our approach extends the applicability of semantics-based search to more general Java code. To evaluate our approach, we encoded 1228 methods from GitHub and applied semantics-based code search to 35 queries extracted from Stack Overflow. Correct method code was obtained for 29 of the queries, and the average search time was about 48 seconds.
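The search idea can be illustrated with a toy, execution-based analogue: filter a pool of candidates by the given input/output examples. The real approach encodes methods as path constraints and asks an SMT solver rather than executing candidates, and all names below are hypothetical:

```c
/* Toy analogue of code search by input/output examples (hypothetical
   candidates; the paper uses symbolic path constraints + SMT instead). */
#include <assert.h>

static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }
static int negate(int x) { return -x; }

typedef int (*candidate)(int);

/* Return index of the first candidate consistent with all examples, or -1. */
int search(candidate *cands, int ncands,
           const int *in, const int *out, int nexamples) {
    for (int c = 0; c < ncands; c++) {
        int ok = 1;
        for (int e = 0; e < nexamples; e++)
            if (cands[c](in[e]) != out[e]) { ok = 0; break; }
        if (ok) return c;
    }
    return -1;
}

int demo(void) {
    candidate cands[] = { negate, twice, square };
    int in[]  = { 2, 3 };
    int out[] = { 4, 9 };                   /* examples pin down square() */
    return search(cands, 3, in, out, 2);
}
```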
DOI: 10.1109/SCAM.2018.00018 · Published: 2018-09-01
Citations: 9
[Research Paper] Detecting Evolutionary Coupling Using Transitive Association Rules
Md. Anaytul Islam, Md. Moksedul Islam, Manishankar Mondal, B. Roy, C. Roy, Kevin A. Schneider
If two or more program entities (such as files, classes, or methods) co-change (i.e., change together) frequently during software evolution, then it is likely that these entities are coupled (i.e., related). Such a coupling is termed evolutionary coupling in the literature. The traditional concept of evolutionary coupling restricts us to assuming coupling only among entities that changed together in the past. Entities that did not co-change in the past might also be coupled; however, such couplings cannot be retrieved using the current concept of evolutionary coupling detection in the literature. In this paper, we investigate whether we can detect such couplings by applying transitive rules to the evolutionary couplings detected using the traditional mechanism. We call the couplings detected by our proposed mechanism transitive evolutionary couplings.
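The transitive rule can be sketched as a reachability closure over directly observed co-change pairs; this is a simplified toy (Warshall-style closure over a boolean matrix), not the paper's association-rule mining with support and confidence:

```c
/* Sketch: derive transitive couplings from direct co-change observations. */
#include <assert.h>
#define N 4   /* toy system with entities 0..3 */

/* direct[i][j] = 1 if entities i and j co-changed in past commits */
int coupled(int direct[N][N], int a, int b) {
    int reach[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            reach[i][j] = direct[i][j];
    /* transitive rule: i~k and k~j implies i~j */
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (reach[i][k] && reach[k][j])
                    reach[i][j] = 1;
    return reach[a][b];
}

int demo(void) {
    /* files 0-1 co-changed, files 1-2 co-changed; 0-2 never did */
    int direct[N][N] = {0};
    direct[0][1] = direct[1][0] = 1;
    direct[1][2] = direct[2][1] = 1;
    /* 0-2 is not a direct coupling, but it is a transitive one */
    return !direct[0][2] && coupled(direct, 0, 2);
}
```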
DOI: 10.1109/SCAM.2018.00020 · Published: 2018-09-01
Citations: 7
Title Page i
DOI: 10.1109/scam.2018.00001 · Published: 2018-09-01
Citations: 0
[Engineering Paper] Built-in Clone Detection in Meta Languages
R. Koschke, Urs-Bjorn Schmidt, Bernhard J. Berger
Developers often practice reuse by copying and pasting code. Copied and pasted code is also known as clones. Clones may be found in all programming languages. Automated clone detection may help to detect clones in order to support software maintenance and language design. Syntax-based clone detectors find similar syntax subtrees and, hence, are guaranteed to yield only syntactic clones. They are also known to have high precision and good recall. Developing a syntax-based clone detector for each language from scratch may be an expensive task. In this paper, we explore the idea of integrating syntax-based clone detection into workbenches for language engineering. Such workbenches allow developers to create their own domain-specific language or to create parsers for existing languages. With clone detection integrated into these workbenches, a clone detector comes as a free byproduct of the grammar specification. The effort is spent only once for the workbench, not once per language built with the workbench. We report our lessons learned in applying this idea to three language workbenches: the popular parser generator ANTLR and two language workbenches for domain-specific languages, namely, MPS, developed by JetBrains, and Xtext, which is based on the Eclipse Modeling Framework.
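The core of syntax-based clone detection, finding similar syntax subtrees, can be sketched with structural hashing over a toy AST: subtrees with equal hashes are clone candidates. This is a common technique offered as illustration; the paper's detectors may differ in detail:

```c
/* Sketch: structural hashing of syntax subtrees for clone candidates
   (toy AST; node kinds are made-up labels). */
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    int kind;                 /* node label, e.g. ID, NUM, PLUS */
    struct Node *left, *right;
} Node;

/* Identical subtrees get identical hashes; clones hash to the same bucket. */
unsigned long subtree_hash(const Node *n) {
    if (!n) return 7;
    return (unsigned long)n->kind * 31
         + subtree_hash(n->left) * 131
         + subtree_hash(n->right) * 137;
}

int demo(void) {
    /* two occurrences of (a + b) and one occurrence of (a * b) */
    Node a1 = {1, NULL, NULL}, b1 = {2, NULL, NULL}, plus1 = {10, &a1, &b1};
    Node a2 = {1, NULL, NULL}, b2 = {2, NULL, NULL}, plus2 = {10, &a2, &b2};
    Node mul = {11, &a1, &b1};
    return subtree_hash(&plus1) == subtree_hash(&plus2)   /* clone pair  */
        && subtree_hash(&plus1) != subtree_hash(&mul);    /* not a clone */
}
```

In a language workbench, the node kinds and tree shape come for free from the grammar specification, which is why the detector can be a byproduct of it.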
DOI: 10.1109/SCAM.2018.00026 · Published: 2018-09-01
Citations: 2
[Research Paper] CroLSim: Cross Language Software Similarity Detector Using API Documentation
Kawser Wazed Nafi, B. Roy, C. Roy, Kevin A. Schneider
In today's open source era, developers look for similar software applications in source code repositories for a number of reasons, including exploring alternative implementations, reusing source code, or looking for a better application. However, while there are a great many studies on finding similar applications written in the same programming language, there is a marked lack of studies on finding similar software applications written in different languages. In this paper, we fill the gap by proposing a novel model, CroLSim, which is able to detect similar software applications across different programming languages. In our approach, we use API documentation to find relationships among the API calls used by the different programming languages. We adopt a deep learning based word-vector learning method to identify semantic relationships among the API documentation, which we then use to detect cross-language similar software applications. For evaluating CroLSim, we formed a repository consisting of 8,956 Java, 7,658 C#, and 10,232 Python applications collected from GitHub. We observed that CroLSim can successfully detect similar software applications across different programming languages with a mean average precision of 0.65 and an average confidence rating of 3.6 (out of 5), with 75% of queries rated as highly successful, outperforming all related existing approaches with a significant performance improvement.
DOI: 10.1109/SCAM.2018.00023 · Published: 2018-09-01
Citations: 14
[Engineering Paper] Enabling the Continuous Analysis of Security Vulnerabilities with VulData7
Matthieu Jimenez, Yves Le Traon, Mike Papadakis
Studies on security vulnerabilities require the analysis, investigation and comprehension of real vulnerable code instances. However, collecting and experimenting with a sufficient number of such instances is challenging. To cope with this issue, we developed VulData7, an extensible framework and dataset of real vulnerabilities automatically collected from software archives. The current version of the dataset contains all vulnerabilities reported in the NVD database for four security-critical open source systems: Linux Kernel, WireShark, OpenSSL, and SystemD. For each vulnerability, VulData7 provides the vulnerability report data (description, CVE number, CWE number, CVSS severity score and others), the vulnerable code instance (list of versions), and, when available, its corresponding patches (list of fixing commits) and the affected files (before and after the fix). VulData7 is automated, flexible and easily extensible. Once configured, it extracts and links information from the related software archives (through Git and NVD reports) to create a dataset that is continuously updated with the latest information available. Currently, VulData7 retrieves fixes for 1,600 of the 2,800 reported vulnerabilities of the four systems. The framework also supports the collection of additional software defects and aims at easing empirical studies and analyses. We believe that our framework is a valuable resource for both developers and researchers interested in secure software development. VulData7 can also serve educational purposes and trigger research on source code analysis.
VulData7 is publicly available at: https://github.com/electricalwind/data7
DOI: 10.1109/SCAM.2018.00014 · Published: 2018-09-01
Citations: 9
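The linking step the VulData7 abstract describes — matching NVD vulnerability reports to their fixing commits in a project's Git history — can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not VulData7's actual implementation: the helper name `link_fixing_commits` and the sample commit log are invented, and a real run would feed it `(sha, message)` pairs obtained from `git log`.

```python
import re

# CVE identifiers have the form CVE-YYYY-NNNN (four or more trailing digits).
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")

def link_fixing_commits(commit_log):
    """Map each CVE id mentioned in commit messages to the commits mentioning it.

    commit_log: iterable of (sha, message) pairs, e.g. parsed from `git log`.
    Returns a dict {cve_id: [sha, ...]} in commit order.
    """
    fixes = {}
    for sha, message in commit_log:
        for cve in CVE_PATTERN.findall(message):
            fixes.setdefault(cve, []).append(sha)
    return fixes

# Invented sample log for illustration only.
log = [
    ("a1b2c3", "ssl: fix heap overflow (CVE-2016-2108)"),
    ("d4e5f6", "refactor parser"),
    ("0718ab", "Fix CVE-2016-2108 follow-up and CVE-2017-3735"),
]
print(link_fixing_commits(log))
```

A dataset built this way only finds fixes whose commit messages name the CVE, which is consistent with the abstract's note that fixes were retrieved for 1,600 of the 2,800 reported vulnerabilities.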
From Start-ups to Scale-ups: Opportunities and Open Problems for Static and Dynamic Program Analysis
M. Harman, P. O'Hearn
This paper describes some of the challenges and opportunities when deploying static and dynamic analysis at scale, drawing on the authors' experience with the Infer and Sapienz Technologies at Facebook, each of which started life as a research-led start-up that was subsequently deployed at scale, impacting billions of people worldwide. The paper identifies open problems that have yet to receive significant attention from the scientific community, yet which have potential for profound real world impact, formulating these as research questions that, we believe, are ripe for exploration and that would make excellent topics for research projects. Note: This paper accompanies the authors' joint keynote at the 18th IEEE International Working Conference on Source Code Analysis and Manipulation, September 23rd-24th, 2018 - Madrid, Spain.
{"title":"From Start-ups to Scale-ups: Opportunities and Open Problems for Static and Dynamic Program Analysis","authors":"M. Harman, P. O'Hearn","doi":"10.1109/SCAM.2018.00009","DOIUrl":"https://doi.org/10.1109/SCAM.2018.00009","url":null,"abstract":"This paper describes some of the challenges and opportunities when deploying static and dynamic analysis at scale, drawing on the authors' experience with the Infer and Sapienz Technologies at Facebook, each of which started life as a research-led start-up that was subsequently deployed at scale, impacting billions of people worldwide. The paper identifies open problems that have yet to receive significant attention from the scientific community, yet which have potential for profound real world impact, formulating these as research questions that, we believe, are ripe for exploration and that would make excellent topics for research projects. Note: This paper accompanies the authors' joint keynote at the 18th IEEE International Working Conference on Source Code Analysis and Manipulation, September 23rd-24th, 2018 - Madrid, Spain.","PeriodicalId":127335,"journal":{"name":"2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132574807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 124
Journal
2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)