A Linked Data platform for mining software repositories

I. Keivanloo, C. Forbes, Aseel Hmood, Mostafa Erfani, Christopher Neal, George Peristerakis, J. Rilling
{"title":"A Linked Data platform for mining software repositories","authors":"I. Keivanloo, C. Forbes, Aseel Hmood, Mostafa Erfani, Christopher Neal, George Peristerakis, J. Rilling","doi":"10.1109/MSR.2012.6224296","DOIUrl":null,"url":null,"abstract":"The mining of software repositories involves the extraction of both basic and value-added information from existing software repositories. The repositories will be mined to extract facts by different stakeholders (e.g. researchers, managers) and for various purposes. To avoid unnecessary pre-processing and analysis steps, sharing and integration of both basic and value-added facts are needed. In this research, we introduce SeCold, an open and collaborative platform for sharing software datasets. SeCold provides the first online software ecosystem Linked Data platform that supports data extraction and on-the-fly inter-dataset integration from major version control, issue tracking, and quality evaluation systems. In its first release, the dataset contains about two billion facts, such as source code statements, software licenses, and code clones from 18 000 software projects. In its second release the SeCold project will contain additional facts mined from issue trackers and versioning systems. Our approach is based on the same fundamental principle as Wikipedia: researchers and tool developers share analysis results obtained from their tools by publishing them as part of the SeCold portal and therefore make them an integrated part of the global knowledge domain. The SeCold project is an official member of the Linked Data dataset cloud and is currently the eighth largest online dataset available on the Web.","PeriodicalId":383774,"journal":{"name":"2012 9th IEEE Working Conference on Mining Software Repositories (MSR)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"59","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 9th IEEE Working Conference on Mining Software Repositories (MSR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MSR.2012.6224296","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 59

Abstract

The mining of software repositories involves the extraction of both basic and value-added information from existing software repositories. The repositories will be mined to extract facts by different stakeholders (e.g. researchers, managers) and for various purposes. To avoid unnecessary pre-processing and analysis steps, sharing and integration of both basic and value-added facts are needed. In this research, we introduce SeCold, an open and collaborative platform for sharing software datasets. SeCold provides the first online software ecosystem Linked Data platform that supports data extraction and on-the-fly inter-dataset integration from major version control, issue tracking, and quality evaluation systems. In its first release, the dataset contains about two billion facts, such as source code statements, software licenses, and code clones from 18 000 software projects. In its second release the SeCold project will contain additional facts mined from issue trackers and versioning systems. Our approach is based on the same fundamental principle as Wikipedia: researchers and tool developers share analysis results obtained from their tools by publishing them as part of the SeCold portal and therefore make them an integrated part of the global knowledge domain. The SeCold project is an official member of the Linked Data dataset cloud and is currently the eighth largest online dataset available on the Web.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
用于挖掘软件存储库的关联数据平台
软件存储库的挖掘包括从现有的软件存储库中提取基本信息和增值信息。存储库将被挖掘,以由不同的利益相关者(例如研究人员、管理人员)为各种目的提取事实。为了避免不必要的预处理和分析步骤,需要共享和集成基本事实和增值事实。在本研究中,我们介绍了一个开放和协作的软件数据集共享平台SeCold。SeCold提供首个在线软件生态系统关联数据平台,支持主要版本控制、问题跟踪和质量评估系统的数据提取和实时数据集集成。在第一个版本中,该数据集包含大约20亿个事实,例如源代码语句、软件许可证和来自18000个软件项目的代码克隆。在第二个版本中,SeCold项目将包含从问题跟踪器和版本控制系统中挖掘的额外事实。我们的方法基于与维基百科相同的基本原则:研究人员和工具开发人员共享从他们的工具中获得的分析结果,并将其作为第二门户网站的一部分发布,从而使其成为全球知识领域的一个组成部分。SeCold项目是关联数据数据集云的官方成员,目前是网络上第八大可用在线数据集。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
MINCE: Mining change history of Android project Co-evolution of logical couplings and commits for defect estimation Analysis of customer satisfaction survey data Do faster releases improve software quality? An empirical case study of Mozilla Firefox Why do software packages conflict?
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1