Exploring Scalability in Pattern Finding in Galactic Structure Using MapReduce
A. Vulpe, M. Frîncu
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Published: 2016-05-16 · DOI: 10.1109/CCGrid.2016.46
Astrophysical applications are known to be data- and compute-intensive, with large volumes of images generated by telescopes on a daily basis. To analyze these images, data mining, statistical, and image-processing techniques are applied to the raw data. Big data platforms such as MapReduce are ideal candidates for processing and storing astrophysical data due to their ability to handle loosely coupled parallel tasks. These platforms are usually deployed in clouds; however, most astrophysical applications are legacy codes that are not optimized for cloud computing. While some work exists on exploiting Hadoop to store astrophysical data and process large datasets, little research has assessed the scalability of cloud-enabled astrophysical applications. In this work we analyze the data and resource scalability of MapReduce applications for astrophysical problems related to cluster detection and inter-cluster spatial pattern search. The maximum level of parallelism is bounded by the number of clusters and by the number of (cluster, subcluster) pairs in the pattern search. We perform scale-up tests on Google Compute Engine and Amazon EC2. We show that while data scalability is achieved, resource scalability (scale-up) is bounded and, moreover, appears to depend on the underlying cloud platform. As future work, we plan to investigate scale-out across tens of instances with large input files of several GB.
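The abstract's claim that parallelism is bounded by the number of (cluster, subcluster) pairs follows from how MapReduce partitions work: each distinct key yields one reduce task, so no more reduce tasks can run concurrently than there are keys. A minimal single-process sketch of this keying scheme (the field names and sample data are hypothetical, not taken from the paper):

```python
from collections import defaultdict

def map_phase(points):
    """Map: emit each sky point under its (cluster, subcluster) key.
    The keys stand in for the paper's detected galaxy clusters and
    subclusters; the field names here are illustrative placeholders."""
    pairs = []
    for p in points:
        key = (p["cluster"], p["subcluster"])
        pairs.append((key, p["coord"]))
    return pairs

def shuffle(pairs):
    """Group intermediate pairs by key, as the MapReduce runtime would
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: one task per (cluster, subcluster) key. The number of
    distinct keys is therefore the upper bound on reduce parallelism."""
    return {key: len(values) for key, values in groups.items()}

points = [
    {"cluster": 0, "subcluster": 0, "coord": (0.1, 0.2)},
    {"cluster": 0, "subcluster": 1, "coord": (0.3, 0.4)},
    {"cluster": 1, "subcluster": 0, "coord": (0.5, 0.6)},
    {"cluster": 1, "subcluster": 0, "coord": (0.7, 0.8)},
]
counts = reduce_phase(shuffle(map_phase(points)))
# Three distinct keys here, so at most three reduce tasks could run in
# parallel regardless of how many machines the cluster provides.
```

Adding more instances beyond the key count leaves workers idle, which is one way the bounded scale-up the authors observe can arise.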