Characterizing the Usage of CI Tools in ML Projects

D. Rzig, Foyzul Hassan, Chetan Bansal, Nachiappan Nagappan
{"title":"Characterizing the Usage of CI Tools in ML Projects","authors":"D. Rzig, Foyzul Hassan, Chetan Bansal, Nachiappan Nagappan","doi":"10.1145/3544902.3546237","DOIUrl":null,"url":null,"abstract":"Background: Continuous Integration (CI) has become widely adopted to enable faster code change integration. Meanwhile, Machine Learning (ML) is being used by software applications for previously unsolvable real-world scenarios. ML projects employ development processes different from those of traditional software projects, but they too require multiple iterations in their development, and may benefit from CI. Aims: While there are many works covering CI within traditional software, none of them empirically explored the adoption of CI and its associated issues within ML projects. To address this knowledge gap, we performed an empirical analysis comparing CI adoption between ML and Non-ML projects. Method: We developed TraVanalyzer, the first Travis CI configuration analyzer, to analyze the CI practices of ML projects, and developed a CI log analyzer to identify the different CI problems of ML projects. Results: We found that Travis CI is the most popular CI tool for ML projects, and that their CI adoption lags behind that of Non-ML projects, but that ML projects which adopted CI, used it for building, testing, code analysis, and automatic deployment more than Non-ML projects. Furthermore, while CI in ML projects is as likely to experience problems as CI in Non-ML projects, it has more varied reasons for build-breakage. The most frequent CI failures of ML projects are due to testing-related problems, similar to Non-ML and OSS CI failures. Conclusion: To the best of our knowledge, this is the first work that has analyzed ML projects’ CI usage, practices, and issues, and contextualized its results by comparing them with similar Non-ML projects. It provides findings for researchers and ML developers to identify possible improvement scopes for CI in ML projects.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3544902.3546237","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Background: Continuous Integration (CI) has become widely adopted to enable faster code change integration. Meanwhile, Machine Learning (ML) is being used by software applications for previously unsolvable real-world scenarios. ML projects employ development processes different from those of traditional software projects, but they too require multiple iterations in their development, and may benefit from CI. Aims: While there are many works covering CI within traditional software, none of them empirically explored the adoption of CI and its associated issues within ML projects. To address this knowledge gap, we performed an empirical analysis comparing CI adoption between ML and Non-ML projects. Method: We developed TraVanalyzer, the first Travis CI configuration analyzer, to analyze the CI practices of ML projects, and developed a CI log analyzer to identify the different CI problems of ML projects. Results: We found that Travis CI is the most popular CI tool for ML projects, and that their CI adoption lags behind that of Non-ML projects, but that ML projects which adopted CI, used it for building, testing, code analysis, and automatic deployment more than Non-ML projects. Furthermore, while CI in ML projects is as likely to experience problems as CI in Non-ML projects, it has more varied reasons for build-breakage. The most frequent CI failures of ML projects are due to testing-related problems, similar to Non-ML and OSS CI failures. Conclusion: To the best of our knowledge, this is the first work that has analyzed ML projects’ CI usage, practices, and issues, and contextualized its results by comparing them with similar Non-ML projects. It provides findings for researchers and ML developers to identify possible improvement scopes for CI in ML projects.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
描述机器学习项目中CI工具的使用
背景:持续集成(CI)已被广泛采用,以实现更快的代码更改集成。与此同时,机器学习(ML)正在被软件应用程序用于解决以前无法解决的现实世界场景。ML项目采用的开发过程与传统的软件项目不同,但是它们在开发过程中也需要多次迭代,并且可能从CI中受益。目标:虽然有很多作品涉及传统软件中的持续集成,但没有一个是经验性地探讨持续集成的采用及其在ML项目中的相关问题。为了解决这一知识差距,我们对ML和非ML项目之间的CI采用进行了实证分析。方法:开发了第一个Travis CI配置分析器TraVanalyzer来分析ML项目的CI实践,并开发了一个CI日志分析器来识别ML项目的不同CI问题。结果:我们发现Travis CI是ML项目中最流行的CI工具,并且他们的CI采用落后于非ML项目,但是采用CI的ML项目比非ML项目更多地使用它进行构建、测试、代码分析和自动部署。此外,虽然ML项目中的CI与非ML项目中的CI一样可能遇到问题,但它有更多不同的构建破坏原因。ML项目中最常见的CI失败是由于与测试相关的问题,类似于非ML和OSS CI失败。结论:据我们所知,这是第一个分析机器学习项目的CI使用、实践和问题的工作,并通过将其与类似的非机器学习项目进行比较,将其结果置于环境中。它为研究人员和机器学习开发人员提供了发现,以确定机器学习项目中CI的可能改进范围。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Analyzing the Relationship between Community and Design Smells in Open-Source Software Projects: An Empirical Study A Preliminary Investigation of MLOps Practices in GitHub PG-VulNet: Detect Supply Chain Vulnerabilities in IoT Devices using Pseudo-code and Graphs On the Relationship Between Story Points and Development Effort in Agile Open-Source Software DevOps Practitioners’ Perceptions of the Low-code Trend
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1