How good are machine learning clouds? Benchmarking two snapshots over 5 years

Jiawei Jiang, Yi Wei, Yu Liu, Wentao Wu, Chuang Hu, Zhigao Zheng, Ziyi Zhang, Yingxia Shao, Ce Zhang
{"title":"机器学习云有多好?对 5 年内的两个快照进行基准测试","authors":"Jiawei Jiang, Yi Wei, Yu Liu, Wentao Wu, Chuang Hu, Zhigao Zheng, Ziyi Zhang, Yingxia Shao, Ce Zhang","doi":"10.1007/s00778-024-00842-3","DOIUrl":null,"url":null,"abstract":"<p>We conduct an empirical study of machine learning functionalities provided by major cloud service providers, which we call <i>machine learning clouds</i>. Machine learning clouds hold the promise of hiding all the sophistication of running large-scale machine learning: Instead of specifying <i>how</i> to run a machine learning task, users only specify <i>what</i> machine learning task to run and the cloud figures out the rest. Raising the level of abstraction, however, rarely comes free—a performance penalty is possible. <i>How good, then, are current machine learning clouds on real-world machine learning workloads?</i> We study this question by conducting benchmark on the mainstream machine learning clouds. Since these platforms continue to innovate, our benchmark tries to reflect their evolvement. Concretely, this paper consists of two sub-benchmarks—<span>mlbench</span> and <span>automlbench</span>. When we first started this work in 2016, only two cloud platforms provide machine learning services and limited themselves to model training and simple hyper-parameter tuning. We then focus on binary classification problems and present <span>mlbench</span>, a novel benchmark constructed by harvesting datasets from Kaggle competitions. We then compare the performance of the top winning code available from Kaggle with that of running machine learning clouds from both Azure and Amazon on <span>mlbench</span>. In the recent few years, more cloud providers support machine learning and include automatic machine learning (AutoML) techniques in their machine learning clouds. Their AutoML services can ease manual tuning on the whole machine learning pipeline, including but not limited to data preprocessing, feature selection, model selection, hyper-parameter, and model ensemble. To reflect these advancements, we design <span>automlbench</span> to assess the AutoML performance of four machine learning clouds using different kinds of workloads. Our comparative study reveals the strength and weakness of existing machine learning clouds and points out potential future directions for improvement.\n</p>","PeriodicalId":501532,"journal":{"name":"The VLDB Journal","volume":"19 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"How good are machine learning clouds? Benchmarking two snapshots over 5 years\",\"authors\":\"Jiawei Jiang, Yi Wei, Yu Liu, Wentao Wu, Chuang Hu, Zhigao Zheng, Ziyi Zhang, Yingxia Shao, Ce Zhang\",\"doi\":\"10.1007/s00778-024-00842-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>We conduct an empirical study of machine learning functionalities provided by major cloud service providers, which we call <i>machine learning clouds</i>. Machine learning clouds hold the promise of hiding all the sophistication of running large-scale machine learning: Instead of specifying <i>how</i> to run a machine learning task, users only specify <i>what</i> machine learning task to run and the cloud figures out the rest. Raising the level of abstraction, however, rarely comes free—a performance penalty is possible. 
<i>How good, then, are current machine learning clouds on real-world machine learning workloads?</i> We study this question by conducting benchmark on the mainstream machine learning clouds. Since these platforms continue to innovate, our benchmark tries to reflect their evolvement. Concretely, this paper consists of two sub-benchmarks—<span>mlbench</span> and <span>automlbench</span>. When we first started this work in 2016, only two cloud platforms provide machine learning services and limited themselves to model training and simple hyper-parameter tuning. We then focus on binary classification problems and present <span>mlbench</span>, a novel benchmark constructed by harvesting datasets from Kaggle competitions. We then compare the performance of the top winning code available from Kaggle with that of running machine learning clouds from both Azure and Amazon on <span>mlbench</span>. In the recent few years, more cloud providers support machine learning and include automatic machine learning (AutoML) techniques in their machine learning clouds. Their AutoML services can ease manual tuning on the whole machine learning pipeline, including but not limited to data preprocessing, feature selection, model selection, hyper-parameter, and model ensemble. To reflect these advancements, we design <span>automlbench</span> to assess the AutoML performance of four machine learning clouds using different kinds of workloads. Our comparative study reveals the strength and weakness of existing machine learning clouds and points out potential future directions for improvement.\\n</p>\",\"PeriodicalId\":501532,\"journal\":{\"name\":\"The VLDB Journal\",\"volume\":\"19 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The VLDB Journal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s00778-024-00842-3\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The VLDB Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00778-024-00842-3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We conduct an empirical study of the machine learning functionalities provided by major cloud service providers, which we call machine learning clouds. Machine learning clouds hold the promise of hiding all the sophistication of running large-scale machine learning: instead of specifying how to run a machine learning task, users only specify what task to run, and the cloud figures out the rest. Raising the level of abstraction, however, rarely comes free: a performance penalty is possible. How good, then, are current machine learning clouds on real-world machine learning workloads? We study this question by benchmarking the mainstream machine learning clouds. Since these platforms continue to innovate, our benchmark tries to reflect their evolution. Concretely, this paper consists of two sub-benchmarks, mlbench and automlbench. When we first started this work in 2016, only two cloud platforms provided machine learning services, and they limited themselves to model training and simple hyper-parameter tuning. We therefore focus on binary classification problems and present mlbench, a novel benchmark constructed by harvesting datasets from Kaggle competitions. We compare the performance of the top winning code available from Kaggle with that of the machine learning clouds from Azure and Amazon on mlbench. In recent years, more cloud providers have come to support machine learning and include automatic machine learning (AutoML) techniques in their machine learning clouds. Their AutoML services can ease manual tuning of the whole machine learning pipeline, including but not limited to data preprocessing, feature selection, model selection, hyper-parameter tuning, and model ensembling. To reflect these advancements, we design automlbench to assess the AutoML performance of four machine learning clouds on different kinds of workloads. Our comparative study reveals the strengths and weaknesses of existing machine learning clouds and points out potential directions for future improvement.
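To make the mlbench-style comparison concrete, below is a minimal, hypothetical sketch (not the authors' actual harness) of scoring a local baseline on a Kaggle-style binary classification dataset with AUC, a common quality metric for such tasks. The CSV path, the "label" column name, the assumption of numeric features, and the logistic-regression baseline are all illustrative assumptions.

```python
# Minimal sketch of an mlbench-style local baseline (illustrative only).
# Assumptions: a Kaggle-style CSV at "train.csv" with numeric features and a
# binary "label" column; logistic regression stands in for a competition-winning
# model or a cloud-trained model whose predictions would be scored the same way.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("train.csv")                    # hypothetical dataset path
X, y = df.drop(columns=["label"]), df["label"]   # hypothetical label column

# Hold out a stratified test split so all candidates are scored on the same data.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Local baseline: standardized features followed by logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)

# Report AUC on the held-out split.
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"baseline AUC: {auc:.4f}")
```

A cloud or AutoML service would be compared by training it on the same training split, collecting its predicted probabilities for the held-out split, and scoring them with the same metric.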
