Exploring Scalability in Pattern Finding in Galactic Structure Using MapReduce
A. Vulpe, M. Frîncu
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Published: 2016-05-16 · DOI: 10.1109/CCGrid.2016.46
Astrophysical applications are known to be data- and compute-intensive, with large volumes of images generated by telescopes on a daily basis. To analyze these images, data mining, statistical, and image-processing techniques are applied to the raw data. Big data platforms such as MapReduce are ideal candidates for processing and storing astrophysical data due to their ability to handle loosely coupled parallel tasks. These platforms are usually deployed in clouds; however, most astrophysical applications are legacy codes that are not optimized for cloud computing. While some work exists on exploiting Hadoop to store astrophysical data and process large datasets, little research has assessed the scalability of cloud-enabled astrophysical applications. In this work we analyze the data and resource scalability of MapReduce applications for astrophysical problems related to cluster detection and inter-cluster spatial pattern search. The maximum level of parallelism is bounded by the number of clusters and by the number of (cluster, subcluster) pairs in the pattern search. We perform scale-up tests on Google Compute Engine and Amazon EC2. We show that while data scalability is achieved, resource scalability (scale-up) is bounded and, moreover, appears to depend on the underlying cloud platform. As future work, we plan to investigate scale-out across tens of instances with large input files of several GB.
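The abstract's claim that parallelism is bounded by the number of (cluster, subcluster) pairs follows from how MapReduce partitions work: each distinct key yields one reduce task, so no more reduce tasks can run concurrently than there are keys. A minimal single-process sketch of this keying scheme (the field names and sample data are hypothetical, not taken from the paper):

```python
from collections import defaultdict

def map_phase(points):
    """Map: emit each sky point under its (cluster, subcluster) key.
    The keys stand in for the paper's detected galaxy clusters and
    subclusters; the field names here are illustrative placeholders."""
    pairs = []
    for p in points:
        key = (p["cluster"], p["subcluster"])
        pairs.append((key, p["coord"]))
    return pairs

def shuffle(pairs):
    """Group intermediate pairs by key, as the MapReduce runtime would
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: one task per (cluster, subcluster) key. The number of
    distinct keys is therefore the upper bound on reduce parallelism."""
    return {key: len(values) for key, values in groups.items()}

points = [
    {"cluster": 0, "subcluster": 0, "coord": (0.1, 0.2)},
    {"cluster": 0, "subcluster": 1, "coord": (0.3, 0.4)},
    {"cluster": 1, "subcluster": 0, "coord": (0.5, 0.6)},
    {"cluster": 1, "subcluster": 0, "coord": (0.7, 0.8)},
]
counts = reduce_phase(shuffle(map_phase(points)))
# Three distinct keys here, so at most three reduce tasks could run in
# parallel regardless of how many machines the cluster provides.
```

Adding more instances beyond the key count leaves workers idle, which is one way the bounded scale-up the authors observe can arise.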