Exploring Scalability in Pattern Finding in Galactic Structure Using MapReduce

A. Vulpe, M. Frîncu
Published in: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Publication date: 2016-05-16
DOI: 10.1109/CCGrid.2016.46 (https://doi.org/10.1109/CCGrid.2016.46)
Citations: 1

Abstract

Astrophysical applications are known to be data- and compute-intensive, with telescopes generating large numbers of images every day. To analyze these images, data mining, statistical, and image processing techniques are applied to the raw data. Big data platforms such as MapReduce are ideal candidates for processing and storing astrophysical data because of their ability to process loosely coupled parallel tasks. These platforms are usually deployed in clouds; however, most astrophysical applications are legacy applications that are not optimized for cloud computing. While some work exists on exploiting the benefits of Hadoop to store astrophysical data and to process large datasets, little research has assessed the scalability of cloud-enabled astrophysical applications. In this work we analyze the data and resource scalability of MapReduce applications for astrophysical problems related to cluster detection and inter-cluster spatial pattern search. The maximum level of parallelism is bounded by the number of clusters and by the number of (cluster, subcluster) pairs in the pattern search. We perform scale-up tests on Google Compute Engine and Amazon EC2. We show that while data scalability is achieved, resource scalability (scale-up) is bounded and, moreover, seems to depend on the underlying cloud platform. In future work we also plan to investigate scale-out on tens of instances with large input files of several GB.
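The bound on parallelism stated in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the input layout (points already labelled with a cluster id), the centroid reducer, and the helper names `map_points`, `shuffle`, `reduce_centroid`, and `ideal_speedup` are all assumptions made for illustration, and the speedup model assumes independent tasks of equal cost. It shows only why adding workers beyond the number of per-cluster tasks cannot help.

```python
from collections import defaultdict
from math import ceil


def map_points(points):
    """Map step: emit (cluster_id, (x, y)) for each labelled point."""
    for cluster_id, x, y in points:
        yield cluster_id, (x, y)


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce shuffle stage would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_centroid(members):
    """Reduce step: one independent task per cluster (here, its centroid)."""
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)


def ideal_speedup(num_tasks, num_workers):
    """Best-case speedup for equal-cost independent tasks.

    Work proceeds in ceil(tasks / workers) rounds, so the speedup can
    never exceed the number of tasks, however many workers are added.
    """
    rounds = ceil(num_tasks / num_workers)
    return num_tasks / rounds


# Hypothetical input: (cluster_id, x, y); cluster labels assumed precomputed.
points = [(0, 0.0, 0.0), (0, 2.0, 2.0), (1, 5.0, 5.0), (2, 9.0, 1.0)]
groups = shuffle(map_points(points))
centroids = {cid: reduce_centroid(members) for cid, members in groups.items()}

# With three clusters there are three reduce tasks, so speedup plateaus at 3.
for workers in (1, 2, 4, 8):
    print(workers, ideal_speedup(len(groups), workers))
```

Under these assumptions the same reasoning applies to the pattern search stage, with the number of (cluster, subcluster) pairs taking the place of the number of clusters as the task count.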