Driving big data with big compute

C. Byun, W. Arcand, David Bestor, Bill Bergeron, M. Hubbell, J. Kepner, A. McCabe, P. Michaleas, J. Mullen, David O'Gwynn, Andrew Prout, A. Reuther, Antonio Rosa, Charles Yee
Published in: 2012 IEEE Conference on High Performance Extreme Computing
Publication date: 2012-09-01
DOI: 10.1109/HPEC.2012.6408678 (https://doi.org/10.1109/HPEC.2012.6408678)
Citations: 48

Abstract

Big Data (as embodied by Hadoop clusters) and Big Compute (as embodied by MPI clusters) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community, and MPI clusters provide high parallel efficiency for compute-intensive workloads. Bringing the big data and big compute communities together is an active area of research. The LLGrid team has developed and deployed a number of technologies that aim to provide the best of both worlds. LLGrid MapReduce allows the map/reduce parallel programming model to be used quickly and efficiently in any language on any compute cluster. D4M (Dynamic Distributed Dimensional Data Model) provides a high-level distributed arrays interface to the Apache Accumulo database. The accessibility of these technologies is assessed by measuring the effort required to use them, which is typically a few lines of code. Performance is assessed by measuring the insert rate into the Accumulo database. Using these tools, a database insert rate of 4M inserts/second has been achieved on an 8-node cluster.
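
To make the map/reduce model named in the abstract concrete, here is a minimal Python sketch. It is not the LLGrid MapReduce interface or the D4M API. The function names (mapper, reducer, map_reduce, per_node_rate) are hypothetical and chosen only for illustration. The mapper processes each input chunk independently, which is what lets a scheduler run one mapper per file in parallel; the reducer then merges the partial results. The last helper does the arithmetic behind the headline figure, assuming (the abstract does not say so) that load is spread evenly across all 8 nodes.

```python
from collections import Counter
from functools import reduce


def mapper(text: str) -> Counter:
    """Map step: count words in one chunk, independently of all other chunks."""
    return Counter(text.lower().split())


def reducer(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial word counts into one."""
    return left + right


def map_reduce(chunks: list[str]) -> Counter:
    """Apply the mapper to every chunk, then fold the results with the reducer.

    On a cluster each mapper call could run as a separate job; here the
    calls run sequentially so the sketch stays self-contained.
    """
    partials = [mapper(chunk) for chunk in chunks]
    return reduce(reducer, partials, Counter())


def per_node_rate(total_inserts_per_second: float, nodes: int) -> float:
    """Average insert rate per node, assuming an even spread across nodes."""
    return total_inserts_per_second / nodes


if __name__ == "__main__":
    counts = map_reduce(["big data", "big compute"])
    print(counts["big"], counts["data"], counts["compute"])
    # Figures from the abstract: 4M inserts/second on an 8-node cluster.
    print(per_node_rate(4_000_000, 8))
```

Under the even-spread assumption, the reported aggregate rate corresponds to an average of 500,000 inserts/second per node.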