Upgrading a high performance computing environment for massive data processing

Journal of Internet Services and Applications | IF 2.4 | Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2019-10-16 | DOI: 10.1186/s13174-019-0118-7
Lucas M. Ponce, Walter dos Santos, Wagner Meira, Dorgival Guedes, Daniele Lezzi, Rosa M. Badia
Citations: 7

Abstract

High-performance computing (HPC) and massive data processing (Big Data) are two trends that are beginning to converge. In that process, aspects of hardware architectures, systems support and programming paradigms are being revisited from both perspectives. This paper presents our experience on this path of convergence and proposes a framework that addresses some of the programming issues derived from such integration. Our contribution is the development of an integrated environment that combines (i) COMPSs, a programming framework for the development and execution of parallel applications for distributed infrastructures; (ii) Lemonade, a data mining and analysis tool; and (iii) HDFS, the most widely used distributed file system for Big Data systems. To validate our framework, we used Lemonade to create COMPSs applications that access data through HDFS, and compared them with equivalent applications built with Spark, a popular Big Data framework. The results show that the HDFS integration benefits COMPSs by simplifying data access and by rearranging data transfer, which reduces execution time. The integration with Lemonade facilitates the use of COMPSs and may help its popularization in the Data Science community, by providing efficient algorithm implementations for experts from the data domain who want to develop applications at a higher level of abstraction.
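The idea of rearranging data transfer can be illustrated with a minimal sketch. It does not use the COMPSs, Lemonade or HDFS APIs, and it is not the authors' implementation; it relies only on the Python standard library, and the names `Block`, `list_blocks`, `count_newlines` and `count_lines` are hypothetical. The assumption is that each task receives a small block descriptor and reads its own byte range directly from storage, so the driver never loads and ships the whole file.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """Hypothetical descriptor of one storage block: file path plus byte range."""
    path: str
    offset: int
    length: int


def list_blocks(path: str, block_size: int) -> List[Block]:
    """Split a file into fixed-size byte ranges without reading its contents."""
    size = os.path.getsize(path)
    return [
        Block(path, offset, min(block_size, size - offset))
        for offset in range(0, size, block_size)
    ]


def count_newlines(block: Block) -> int:
    """Task body: open the file and read only this block's byte range."""
    with open(block.path, "rb") as handle:
        handle.seek(block.offset)
        return handle.read(block.length).count(b"\n")


def count_lines(path: str, block_size: int, workers: int = 4) -> int:
    """Driver: hand out block descriptors and merge the partial counts."""
    blocks = list_blocks(path, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_newlines, blocks))


if __name__ == "__main__":
    # Build a small local file that stands in for a distributed dataset.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"a,b\n" * 1000)
    try:
        print(count_lines(tmp.name, block_size=256))
    finally:
        os.remove(tmp.name)
```

Counting newline bytes keeps the example correct even when a block boundary falls in the middle of a record; an operation that needs whole records would also have to handle records that span two blocks.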
Source journal: Journal of Internet Services and Applications (COMPUTER SCIENCE, INFORMATION SYSTEMS)
CiteScore: 3.70
Self-citation rate: 0.00%
Articles published: 2
Review time: 13 weeks