Mrs: High Performance MapReduce for Iterative and Asynchronous Algorithms in Python

Jeffrey Lund, C. Ashcraft, Andrew W. McNabb, Kevin Seppi
Published in: 2016 6th Workshop on Python for High-Performance and Scientific Computing (PyHPC), 2016-11-13
DOI: 10.1109/PYHPC.2016.10 (https://doi.org/10.1109/PYHPC.2016.10)
Citations: 0

Abstract

Mrs [1] is a lightweight Python-based MapReduce implementation designed to make MapReduce programs easy to write and quick to run, which makes it particularly useful for research and academia. Iterative algorithms, like those frequently found in machine learning, are a common class of algorithms that would benefit from Mrs; however, iterative algorithms typically perform poorly in the MapReduce framework, meaning potentially poor performance in Mrs as well. Therefore, we propose four modifications to the original Mrs with the intent to improve its ability to perform iterative algorithms. First, we use direct task-to-task communication for most iterations and only occasionally write to a distributed file system to preserve fault tolerance. Second, we combine the reduce and map tasks which span successive iterations to eliminate unnecessary communication and scheduling latency. Third, we propose a generator-callback programming model to allow for greater flexibility in the scheduling of tasks. Finally, some iterative algorithms are naturally expressed in terms of asynchronous message passing, so we propose a fully asynchronous variant of MapReduce. We then demonstrate Mrs' enhanced performance in the context of two iterative applications: particle swarm optimization (PSO) and expectation maximization (EM).
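The second modification, combining the reduce task of one iteration with the map task of the next, can be illustrated with a small single-process sketch. This is not the Mrs API: every name below (`diffuse_map`, `sum_reduce`, `fuse`, `iterate_fused`, the ring-diffusion toy problem) is a hypothetical stand-in chosen for illustration, and the sketch ignores distribution, scheduling, and fault tolerance. It only shows that fusing reduce and map across an iteration boundary yields the same result while skipping the intermediate hand-off of reduce output.

```python
from collections import defaultdict

RING_SIZE = 4  # hypothetical toy problem: values diffusing around a ring


def diffuse_map(key, value):
    """Map: keep half of the value, pass the other half to the next node."""
    yield key, value / 2
    yield (key + 1) % RING_SIZE, value / 2


def sum_reduce(key, values):
    """Reduce: sum everything that arrived at this node."""
    yield sum(values)


def group(pairs):
    """Shuffle step: collect emitted (key, value) pairs by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def map_phase(records, map_fn):
    return group(pair for k, v in records for pair in map_fn(k, v))


def reduce_phase(groups, reduce_fn):
    return [(k, out) for k in sorted(groups) for out in reduce_fn(k, groups[k])]


def iterate_unfused(records, iterations):
    """Classic loop: every iteration runs a map task, then a reduce task."""
    for _ in range(iterations):
        records = reduce_phase(map_phase(records, diffuse_map), sum_reduce)
    return records


def fuse(reduce_fn, map_fn):
    """Combine iteration i's reduce with iteration i+1's map into one task."""
    def reducemap_fn(key, values):
        for reduced in reduce_fn(key, values):
            yield from map_fn(key, reduced)
    return reducemap_fn


def iterate_fused(records, iterations):
    """Same computation (assumes iterations >= 1), but the intermediate
    reduce output is consumed inside the fused task instead of being
    handed off between separate reduce and map tasks."""
    reducemap_fn = fuse(sum_reduce, diffuse_map)
    groups = map_phase(records, diffuse_map)
    for _ in range(iterations - 1):
        groups = group(
            pair for k in sorted(groups) for pair in reducemap_fn(k, groups[k])
        )
    return reduce_phase(groups, sum_reduce)
```

In this sketch an n-iteration run needs one map phase, n - 1 fused reduce-map phases, and one final reduce phase, instead of 2n separate phases. Both `iterate_unfused` and `iterate_fused` return the same list of (key, value) pairs for the same input.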