The Open-Closed Principle of Modern Machine Learning Frameworks

Houssem Ben Braiek, Foutse Khomh, Bram Adams
Published in: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), pp. 353-363
Publication date: 2018-05-28
DOI: 10.1145/3196398.3196445 (https://doi.org/10.1145/3196398.3196445)
Citations: 30

Abstract

Recent advances in computing technologies and the availability of huge volumes of data have sparked a new machine learning (ML) revolution, where almost every day a new headline touts the demise of human experts by ML models on some task. Open source software development is rumoured to play a significant role in this revolution, with both academics and large corporations such as Google and Microsoft releasing their ML frameworks under an open source license. This paper takes a step back to examine and understand the role of open source development in modern ML, by examining the growth of the open source ML ecosystem on GitHub, its actors, and the adoption of frameworks over time. By mining LinkedIn and Google Scholar profiles, we also examine driving factors behind this growth (paid vs. voluntary contributors), as well as the major players who promote its democratization (companies vs. communities), and the composition of ML development teams (engineers vs. scientists). According to the technology adoption lifecycle, we find that ML is in between the stages of early adoption and early majority. Furthermore, companies are the main drivers behind open source ML, while the majority of development teams are hybrid teams comprising both engineers and professional scientists. The latter correspond to scientists employed by a company, and by far represent the most active profiles in the development of ML applications, which reflects the importance of a scientific background for the development of ML frameworks to complement coding skills. The large influence of cloud computing companies on the development of open source ML frameworks raises the risk of vendor lock-in. These frameworks, while open source, could be optimized for specific commercial cloud offerings.