Data collection and analysis of GitHub repositories and users

Fragkiskos Chatziasimidis, I. Stamelos
{"title":"Data collection and analysis of GitHub repositories and users","authors":"Fragkiskos Chatziasimidis, I. Stamelos","doi":"10.1109/IISA.2015.7388026","DOIUrl":null,"url":null,"abstract":"In this paper, we present the collection and mining of GitHub data, aiming to understand GitHub user behavior and project success factors. We collected information about approximately 100K projects and 10K GitHub users//owners of these projects, via GitHub API. Subsequently, we statistically analyzed such data, discretized values of features via k-means algorithm, and finally we applied apriori algorithm via weka in order to find out association rules. Having assumed that project success could be measured by the cardinality of downloads we kept only the rules which had as right par a download cardinality higher than a threshold of 1000 downloads. The results provide intersting insight in the GitHub ecosystem and seven success rules for GitHub projects.","PeriodicalId":433872,"journal":{"name":"2015 6th International Conference on Information, Intelligence, Systems and Applications (IISA)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 6th International Conference on Information, Intelligence, Systems and Applications (IISA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISA.2015.7388026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

In this paper, we present the collection and mining of GitHub data, aiming to understand GitHub user behavior and project success factors. We collected information about approximately 100K projects and 10K GitHub users//owners of these projects, via GitHub API. Subsequently, we statistically analyzed such data, discretized values of features via k-means algorithm, and finally we applied apriori algorithm via weka in order to find out association rules. Having assumed that project success could be measured by the cardinality of downloads we kept only the rules which had as right par a download cardinality higher than a threshold of 1000 downloads. The results provide intersting insight in the GitHub ecosystem and seven success rules for GitHub projects.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
GitHub存储库和用户的数据收集和分析
在本文中,我们介绍了对GitHub数据的收集和挖掘,旨在了解GitHub用户行为和项目成功因素。我们通过GitHub API收集了大约100K个项目和10K个GitHub用户//这些项目的所有者的信息。随后,我们对这些数据进行统计分析,通过k-means算法对特征值进行离散化,最后通过weka应用apriori算法找出关联规则。假设项目成功可以通过下载量基数来衡量,我们只保留那些下载量基数高于1000下载量阈值的规则。研究结果为GitHub生态系统提供了有趣的见解,并为GitHub项目提供了七条成功法则。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A learning approach for strategic consumers in smart electricity markets On the construction of increasing-chord graphs on convex point sets A braided routing mechanism to reduce traffic load's local variance in wireless sensor networks Monitoring people with MCI: Deployment in a real scenario for low-budget smartphones MicroCAS: Design and implementation of proposed standards in micro-learning on mobile devices
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1