Characterizing the Usage of CI Tools in ML Projects

Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement. Pub Date: 2022-09-19. DOI: 10.1145/3544902.3546237

D. Rzig, Foyzul Hassan, Chetan Bansal, Nachiappan Nagappan


Citations: 2

Abstract

Background: Continuous Integration (CI) has been widely adopted to enable faster integration of code changes. Meanwhile, Machine Learning (ML) is being used by software applications to address real-world scenarios that were previously unsolvable. ML projects employ development processes different from those of traditional software projects, but they too require multiple iterations during development and may benefit from CI.

Aims: While many works cover CI within traditional software, none has empirically explored the adoption of CI and its associated issues within ML projects. To address this knowledge gap, we performed an empirical analysis comparing CI adoption between ML and Non-ML projects.

Method: We developed TraVanalyzer, the first Travis CI configuration analyzer, to analyze the CI practices of ML projects, and developed a CI log analyzer to identify the different CI problems of ML projects.

Results: We found that Travis CI is the most popular CI tool for ML projects and that CI adoption in ML projects lags behind that of Non-ML projects. However, ML projects that adopted CI used it for building, testing, code analysis, and automatic deployment more than Non-ML projects did. Furthermore, while CI in ML projects is as likely to experience problems as CI in Non-ML projects, it has more varied reasons for build breakage. The most frequent CI failures of ML projects are due to testing-related problems, similar to Non-ML and OSS CI failures.

Conclusion: To the best of our knowledge, this is the first work that has analyzed ML projects' CI usage, practices, and issues, and contextualized its results by comparing them with similar Non-ML projects. It provides findings that researchers and ML developers can use to identify possible areas of improvement for CI in ML projects.
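To make the Method step more concrete, the following is a minimal Python sketch of what analyzing a Travis CI configuration for usage categories could look like. It is not TraVanalyzer and does not reproduce the authors' approach: the function name `classify_ci_usage`, the tool lists in `CATEGORY_TOOLS`, and the token-matching heuristic are all assumptions made for illustration. The sketch takes an already-parsed `.travis.yml` (a plain dictionary) and reports which of the four activities named in the abstract (building, testing, code analysis, deployment) appear to be configured.

```python
from typing import Any, Dict, List, Set

# Hypothetical tool lists chosen for illustration; they are not from the paper.
CATEGORY_TOOLS: Dict[str, Set[str]] = {
    "building": {"make", "cmake", "mvn", "gradle"},
    "testing": {"pytest", "nosetests", "tox"},
    "code_analysis": {"flake8", "pylint", "mypy", "codecov", "coveralls"},
}

# Travis CI job phases whose shell commands are scanned.
SCRIPT_PHASES = ("before_install", "install", "before_script", "script", "after_success")


def _commands(config: Dict[str, Any], phase: str) -> List[str]:
    """Return the commands of one phase as a list of strings."""
    value = config.get(phase)
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


def classify_ci_usage(config: Dict[str, Any]) -> Set[str]:
    """Guess which CI activities a parsed Travis CI config performs."""
    found: Set[str] = set()
    for phase in SCRIPT_PHASES:
        for command in _commands(config, phase):
            tokens = set(command.split())
            for category, tools in CATEGORY_TOOLS.items():
                if tokens & tools:
                    found.add(category)
    # A top-level `deploy` key is how Travis CI declares deployment.
    if "deploy" in config:
        found.add("deployment")
    return found


# Example input: a hand-written config, already parsed into a dictionary.
example_config = {
    "language": "python",
    "install": "pip install -r requirements.txt",
    "script": ["flake8 src", "python -m pytest tests"],
    "deploy": {"provider": "pypi"},
}
print(sorted(classify_ci_usage(example_config)))
# prints ['code_analysis', 'deployment', 'testing']
```

Token matching of this kind is deliberately crude: it misses commands wrapped in scripts and can misfire on unrelated tokens, so a real analyzer would need a more careful treatment of the configuration and its build stages.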



