D-SmartML: A Distributed Automated Machine Learning Framework

2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS). Pub Date: 2020-11-01. DOI: 10.1109/ICDCS47774.2020.00115

A. Elrahman, M. Elhelw, Radwa El Shawi, S. Sakr

{"title":"D-SmartML: A Distributed Automated Machine Learning Framework","authors":"A. Elrahman, M. Elhelw, Radwa El Shawi, S. Sakr","doi":"10.1109/ICDCS47774.2020.00115","DOIUrl":null,"url":null,"abstract":"Nowadays, machine learning is playing a crucial role in harnessing the value of massive data amount currently produced every day. The process of building a high-quality machine learning model is an iterative, complex and time-consuming process that requires solid knowledge about the various machine learning algorithms in addition to having a good experience with effectively tuning their hyper-parameters. With the booming demand for machine learning applications, it has been recognized that the number of knowledgeable data scientists can not scale with the growing data volumes and application needs in our digital world. Therefore, recently, several automated machine learning (AutoML) frameworks have been developed by automating the process of Combined Algorithm Selection and Hyper-parameter tuning (CASH). However, a main limitation of these frameworks is that they have been built on top of centralized machine learning libraries (e.g. scikit-learn) that can only work on a single node and thus they are not scalable to process and handle large data volumes. To tackle this challenge, we demonstrate D-SmartML, a distributed AutoML framework on top of Apache Spark, a distributed data processing framework. Our framework is equipped with a meta learning mechanism for automated algorithm selection and supports three different automated hyper-parameter tuning techniques: distributed grid search, distributed random search and distributed hyperband optimization. We will demonstrate the scalability of our framework on handling large datasets. 
In addition, we will show how our framework outperforms the-state-of-the-art framework for distributed AutoML optimization, TransmogrifAI.","PeriodicalId":158630,"journal":{"name":"2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS47774.2020.00115","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Citations: 6

Abstract

Machine learning plays a crucial role in harnessing the value of the massive amounts of data produced every day. Building a high-quality machine learning model is an iterative, complex, and time-consuming process that requires solid knowledge of the various machine learning algorithms as well as experience in tuning their hyper-parameters effectively. With the booming demand for machine learning applications, it has been recognized that the number of knowledgeable data scientists cannot scale with the growing data volumes and application needs of our digital world. Several automated machine learning (AutoML) frameworks have therefore been developed to automate Combined Algorithm Selection and Hyper-parameter tuning (CASH). However, a main limitation of these frameworks is that they are built on top of centralized machine learning libraries (e.g., scikit-learn) that work only on a single node, so they do not scale to large data volumes. To tackle this challenge, we demonstrate D-SmartML, a distributed AutoML framework built on top of Apache Spark, a distributed data processing framework. Our framework is equipped with a meta-learning mechanism for automated algorithm selection and supports three automated hyper-parameter tuning techniques: distributed grid search, distributed random search, and distributed Hyperband optimization. We will demonstrate the scalability of our framework in handling large datasets. In addition, we will show how our framework outperforms TransmogrifAI, the state-of-the-art framework for distributed AutoML optimization.
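To make the CASH and Hyperband ideas in the abstract concrete, the following is a minimal single-process sketch, not D-SmartML's actual API or implementation. The search space, the hyper-parameter names, and the `toy_loss` function are all hypothetical stand-ins; a real system would train a model with the given budget and return its validation loss. The bracket schedule follows the general shape of Hyperband, with a simplified survivor rule (keep the best 1/eta of the current configurations).

```python
import random

# Hypothetical CASH search space: each algorithm has its own hyper-parameter sampler.
SEARCH_SPACE = {
    "logistic_regression": lambda rng: {"reg_param": 10 ** rng.uniform(-4, 0)},
    "random_forest": lambda rng: {"num_trees": rng.randint(10, 200),
                                  "max_depth": rng.randint(2, 12)},
    "gbt": lambda rng: {"step_size": 10 ** rng.uniform(-3, 0),
                        "max_depth": rng.randint(2, 8)},
}


def sample_config(rng):
    """Jointly sample an algorithm and its hyper-parameters (one CASH candidate)."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    return {"algorithm": algorithm, "params": SEARCH_SPACE[algorithm](rng)}


def toy_loss(config, resource):
    """Synthetic stand-in for 'train with `resource` budget units, return validation loss'."""
    base = {"logistic_regression": 0.30, "random_forest": 0.22, "gbt": 0.20}
    depth_penalty = 0.01 * abs(config["params"].get("max_depth", 6) - 6)
    return base[config["algorithm"]] + depth_penalty + 1.0 / resource


def hyperband(sample, evaluate, max_resource=81, eta=3, seed=0):
    """Simplified Hyperband: several successive-halving brackets with different budgets."""
    rng = random.Random(seed)

    # s_max = floor(log_eta(max_resource)), computed with integers to avoid float error.
    s_max, r = 0, max_resource
    while r >= eta:
        r //= eta
        s_max += 1

    best_config, best_loss = None, float("inf")
    for s in range(s_max, -1, -1):
        numerator = (s_max + 1) * eta ** s
        n = (numerator + s) // (s + 1)        # ceil(numerator / (s + 1)) initial configs
        resource = max_resource // eta ** s   # starting budget per config in this bracket
        configs = [sample(rng) for _ in range(n)]

        for i in range(s + 1):
            budget = resource * eta ** i
            # The evaluations in one rung are independent of each other, so a
            # distributed runtime could run them in parallel.
            scored = sorted(
                ((evaluate(c, budget), idx, c) for idx, c in enumerate(configs)),
                key=lambda item: item[:2],
            )
            if scored[0][0] < best_loss:
                best_loss, _, best_config = scored[0]
            keep = max(1, len(scored) // eta)  # survivors promoted to the next rung
            configs = [c for _, _, c in scored[:keep]]

    return best_config, best_loss


if __name__ == "__main__":
    config, loss = hyperband(sample_config, toy_loss)
    print(config["algorithm"], config["params"], round(loss, 4))
```

Grid search and random search fit the same interface: they would evaluate every sampled configuration at the full budget instead of discarding weak candidates early.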


