Driving big data with big compute

2012 IEEE Conference on High Performance Extreme Computing. Pub Date: 2012-09-01. DOI: 10.1109/HPEC.2012.6408678

C. Byun, W. Arcand, David Bestor, Bill Bergeron, M. Hubbell, J. Kepner, A. McCabe, P. Michaleas, J. Mullen, David O'Gwynn, Andrew Prout, A. Reuther, Antonio Rosa, Charles Yee


Citations: 48

Abstract

Big Data (as embodied by Hadoop clusters) and Big Compute (as embodied by MPI clusters) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community, and MPI clusters provide high parallel efficiency for compute-intensive workloads. Bringing the big data and big compute communities together is an active area of research. The LLGrid team has developed and deployed a number of technologies that aim to provide the best of both worlds. LLGrid MapReduce allows the map/reduce parallel programming model to be used quickly and efficiently in any language on any compute cluster. D4M (Dynamic Distributed Dimensional Data Model) provides a high-level distributed arrays interface to the Apache Accumulo database. The accessibility of these technologies is assessed by measuring the effort required to use them, which is typically a few lines of code. Performance is assessed by measuring the insert rate into the Accumulo database. Using these tools, a database insert rate of 4M inserts/second has been achieved on an 8-node cluster.
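
For readers unfamiliar with the map/reduce programming model mentioned in the abstract, the following is a minimal single-process sketch in Python. It is not LLGrid MapReduce and does not use its interface; the function names (`mapper`, `shuffle`, `reducer`, `map_reduce`) and the word-count task are assumptions chosen purely for illustration. On a real cluster, the map calls would run as independent jobs over separate input files and the reduce step would combine their outputs.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a map/reduce runtime does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def map_reduce(lines):
    """Run map, shuffle, and reduce sequentially (hypothetical driver, no cluster)."""
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


counts = map_reduce(["big data big compute", "big compute"])
print(counts)
```

Because each `mapper` call depends only on its own input line, the map phase can be distributed across nodes without coordination, which is the property that makes the model easy to schedule on a general-purpose compute cluster.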


