D-SmartML: A Distributed Automated Machine Learning Framework

A. Elrahman, M. Elhelw, Radwa El Shawi, S. Sakr
Published in: 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), 2020-11-01
DOI: 10.1109/ICDCS47774.2020.00115 (https://doi.org/10.1109/ICDCS47774.2020.00115)
Citations: 6

Abstract

Machine learning now plays a crucial role in harnessing the value of the massive amounts of data produced every day. Building a high-quality machine learning model is an iterative, complex, and time-consuming process that requires solid knowledge of the various machine learning algorithms as well as experience in tuning their hyper-parameters effectively. With the booming demand for machine learning applications, it has been recognized that the number of knowledgeable data scientists cannot scale with the growing data volumes and application needs of our digital world. Several automated machine learning (AutoML) frameworks have therefore been developed recently to automate Combined Algorithm Selection and Hyper-parameter tuning (CASH). However, a main limitation of these frameworks is that they are built on top of centralized machine learning libraries (e.g., scikit-learn) that run only on a single node, so they do not scale to large data volumes. To tackle this challenge, we demonstrate D-SmartML, a distributed AutoML framework built on top of Apache Spark, a distributed data processing framework. Our framework is equipped with a meta-learning mechanism for automated algorithm selection and supports three automated hyper-parameter tuning techniques: distributed grid search, distributed random search, and distributed Hyperband optimization. We will demonstrate the scalability of our framework in handling large datasets. In addition, we will show how our framework outperforms TransmogrifAI, the state-of-the-art framework for distributed AutoML optimization.
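
As a rough illustration of the third tuning technique named in the abstract, the following is a minimal single-machine sketch of successive halving, the subroutine at the core of Hyperband. It is not D-SmartML's implementation and does not use Spark. The objective `toy_loss`, the `lr` parameter, the helper `sample_configs`, and the defaults `min_budget=1` and `eta=3` are all hypothetical choices made for the example. In a distributed setting, the evaluations within one round are independent and could be run in parallel.

```python
import math
import random


def toy_loss(config, budget):
    """Hypothetical objective: lowest near lr = 0.1, and smaller with more budget."""
    return abs(math.log10(config["lr"]) + 1.0) + 1.0 / budget


def successive_halving(configs, min_budget=1, eta=3, evaluate=toy_loss):
    """Return the configuration that survives repeated pruning.

    Each round scores the survivors at the current budget, keeps the best
    1/eta of them, and multiplies the budget by eta for the next round.
    """
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Rank surviving configurations by loss at the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        keep = max(1, len(ranked) // eta)
        survivors = ranked[:keep]
        budget *= eta
    return survivors[0]


def sample_configs(n, seed=0):
    """Draw n hypothetical learning rates log-uniformly from [1e-4, 1]."""
    rng = random.Random(seed)
    return [{"lr": 10 ** rng.uniform(-4, 0)} for _ in range(n)]


if __name__ == "__main__":
    # 27 configurations are pruned to 9, then 3, then 1.
    print(successive_halving(sample_configs(27)))
```

Random search corresponds to evaluating every sampled configuration once at a fixed budget. Successive halving instead spends most of the budget on the configurations that look promising early, which is why it tends to need fewer total training steps.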