CodeKoan: A Source Code Pattern Search Engine Extracting Crowd Knowledge

C. Schramm, Yingding Wang, François Bry
Published in: 2018 IEEE/ACM 5th International Workshop on Crowd Sourcing in Software Engineering (CSI-SE)
Publication date: 2018-05-27
DOI: 10.1145/3195863.3195864 (https://doi.org/10.1145/3195863.3195864)

Abstract

Source code search is a frequent and important task in software development. Keyword search is widely used for it, but it is a limited approach. This paper presents CodeKoan, a scalable engine for searching millions of online code examples written by the worldwide programming community. The engine uses data-parallel processing to achieve horizontal scalability. It relies on a token-based, programming-language-independent algorithm and, as a proof of concept, indexes all Stack Overflow code examples for two programming languages: Java and Python. The paper demonstrates the benefits of extracting crowd knowledge from Stack Overflow by analyzing well-known open source repositories such as OpenNLP and Elasticsearch: up to one third of the source code in the examined repositories reuses code patterns from Stack Overflow. It also shows that the proposed approach recognizes similar source code and is resilient to modifications such as insertion, deletion, and swapping of statements. Furthermore, it gives evidence that the approach returns very few false positives among the search results.
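
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way a token-based, language-independent code search can work, not CodeKoan's actual method. It assumes a regex tokenizer, token 3-grams as the indexed unit, and a containment score (the fraction of a snippet's n-grams that also occur in the query). All names here (`tokenize`, `shingles`, `SnippetIndex`, the `so-1` style ids) are hypothetical and chosen for illustration.

```python
import re
from collections import defaultdict

# Assumption: identifiers, integer literals, and single non-space characters
# are the tokens. No language-specific grammar is used.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    """Split code into identifiers, numbers, and single punctuation characters."""
    return TOKEN_RE.findall(code)


def shingles(tokens, n=3):
    """Return the set of all windows of n consecutive tokens (token n-grams)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


class SnippetIndex:
    """Toy inverted index from token n-grams to snippet ids."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> ids of snippets containing it
        self.sizes = {}                   # snippet id -> number of distinct n-grams

    def add(self, snippet_id, code):
        grams = shingles(tokenize(code), self.n)
        self.sizes[snippet_id] = len(grams)
        for gram in grams:
            self.postings[gram].add(snippet_id)

    def search(self, code, min_containment=0.5):
        """Rank snippets by the fraction of their n-grams found in the query."""
        hits = defaultdict(int)
        for gram in shingles(tokenize(code), self.n):
            for snippet_id in self.postings.get(gram, ()):
                hits[snippet_id] += 1
        scored = [(count / self.sizes[sid], sid) for sid, count in hits.items()]
        return sorted((pair for pair in scored if pair[0] >= min_containment),
                      reverse=True)


if __name__ == "__main__":
    index = SnippetIndex()
    index.add("so-1", "for (int i = 0; i < n; i++) { sum += a[i]; }")
    index.add("so-2", "String s = reader.readLine(); System.out.println(s);")
    # The query wraps the first snippet in extra code and inserts a statement.
    query = "int sum = 0; for (int i = 0; i < n; i++) { log(i); sum += a[i]; }"
    for score, snippet_id in index.search(query):
        print(snippet_id, round(score, 2))
```

Because only the n-grams that span an edit are lost, an inserted or deleted statement lowers the score slightly instead of breaking the match, which is the kind of resilience the abstract describes. Each snippet is indexed independently of the others, so in principle the indexing step could be split across workers, though how CodeKoan parallelizes its processing is not stated here.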