
Latest Publications from the 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Optimal Footprint Symbiosis in Shared Cache
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.153
Xiaolin Wang, Yechen Li, Yingwei Luo, Xiameng Hu, Jacob Brock, C. Ding, Zhenlin Wang
On multicore processors, co-running applications share the cache. This paper presents an online optimization that co-locates applications to minimize cache interference and maximize performance. The paper formulates the optimization problem and its solution, presents a new sampling technique for locality analysis, and evaluates it in an exhaustive test of 12,870 cases. For locality analysis, previous sampling was two orders of magnitude faster than full-trace analysis; the new sampling reduces the cost by another two orders of magnitude. The best prior work improves co-run performance by 56% on average; the new optimization improves it by another 29%. When sampling and optimization are combined, the paper shows that it takes less than 0.1 seconds of analysis per program to obtain a co-run schedule that is within 1.5% of the best possible performance.
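To illustrate the idea of footprint-based co-location (this is a minimal sketch, not the authors' implementation), the code below pairs programs onto shared caches so that the predicted interference is minimized; the program names, footprint values, and cache size are all hypothetical placeholders.

```python
# Minimal sketch, not the authors' implementation: pair four programs onto two
# shared caches so that the predicted interference (combined footprint minus
# cache capacity) is minimized. All numbers are hypothetical placeholders.

CACHE_SIZE = 8 * 1024  # shared cache capacity in KB (assumed)

# Assumed average footprints (KB), as would be measured by sampling each program alone.
footprints = {"astar": 5200, "mcf": 7100, "bzip2": 1900, "gobmk": 2600}

def interference(pair):
    """Excess of the pair's combined footprint over the cache size;
    0 means the pair is predicted to co-run without significant contention."""
    return max(0, footprints[pair[0]] + footprints[pair[1]] - CACHE_SIZE)

def best_pairing(programs):
    """Enumerate the three ways to split four programs into two pairs and
    return the grouping with the least total predicted interference."""
    a, b, c, d = programs
    groupings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(groupings, key=lambda g: sum(interference(p) for p in g))

print(best_pairing(list(footprints)))   # -> (('astar', 'gobmk'), ('mcf', 'bzip2'))
```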
Citations: 15
Cloud-Based OLAP over Big Data: Application Scenarios and Performance Analysis
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.174
A. Cuzzocrea, Rim Moussa, Guandong Xu, G. Grasso
Following our previous research results, in this paper we provide two authoritative application scenarios that build on top of OLAP*, a middleware for parallel processing of OLAP queries that realizes effective and efficient OLAP over Big Data. We provide two authoritative case studies, namely parallel OLAP data cube processing and virtual OLAP data cube design, for which we also propose a comprehensive performance evaluation and analysis. The derived analysis clearly confirms the benefits of our proposed framework.
Citations: 4
Highly Available Cloud-Based Cluster Management
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.125
Dmitry Duplyakin, Matthew Haney, H. Tufo
We present an architecture that increases the persistence and reliability of automated infrastructure management in the context of hybrid, cluster-cloud environments. We describe our highly available implementation, which builds upon the Chef configuration management system and infrastructure-as-a-service cloud resources from Amazon Web Services. We summarize our experience with managing a 20-node Linux cluster using this implementation. Our analysis of the utilization and cost of the necessary cloud resources indicates that the designed system is a low-cost alternative to acquiring additional physical hardware for hardening cluster management.
Citations: 2
BigDataDIRAC: Deploying Distributed Big Data Applications
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.109
Víctor Fernández, V. Muñoz, T. F. Pena
The Distributed Infrastructure with Remote Agent Control (DIRAC) software framework allows a user community to manage computing activities in different grid and cloud environments. Many communities from several fields (LHCb, Belle II, Creatis, the DIRAC4EGI multi-community portal, etc.) use DIRAC to run jobs in distributed environments. Google created the MapReduce programming model, offering an efficient way of performing distributed computation over large data sets. Several enterprises are providing Hadoop cloud-based resources to their users and are trying to simplify the usage of Hadoop in the cloud. Based on these two robust technologies, we have created BigDataDIRAC, a solution that gives users the opportunity to access multiple Big Data resources scattered across different geographical areas, much like access to grid resources. This approach opens the possibility of offering users not only grid and cloud resources but also Big Data resources from the same DIRAC environment. A proof of concept is shown using three computing centers in two countries and four Hadoop clusters. Our results demonstrate the ability of BigDataDIRAC to manage jobs driven by dataset location stored in the Hadoop File System (HDFS) of the Hadoop distributed clusters. DIRAC is used to monitor the execution, collect the necessary statistical data, and upload the results from the remote HDFS to the SandBox Storage machine. The tests produced the equivalent of 5 days of continuous processing.
Citations: 2
A Data Placement Strategy for Data-Intensive Scientific Workflows in Cloud
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.72
Qing Zhao, Congcong Xiong, Xi Zhao, Ce Yu, Jian Xiao
With the arrival of cloud computing and Big Data, many scientific applications with large amounts of data can be abstracted as scientific workflows and run in a cloud environment. Distributing these datasets intelligently can efficiently reduce data transfers during the workflow's execution. In this paper, we propose a 2-stage data placement strategy. In the initial stage, we cluster the datasets based on their correlation and allocate these clusters onto data centers. Compared with existing work, we incorporate the data size into the correlation calculation and propose a new type of data correlation for intermediate data, named the "first-order conduction correlation". Hence the data transmission cost can be measured more reasonably. In the runtime stage, the re-distribution algorithm can adjust the data layout according to changed factors, and the overhead of the re-layout itself is also measured. Simulation results show that, compared with previous work, our proposed strategy can effectively reduce the time consumed by data movement during workflow execution.
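As a rough illustration of correlation-driven placement (a sketch under my own assumptions, not the paper's exact algorithm), the code below places each dataset on the data center holding the data it is most correlated with, weighting correlation by data size; the dataset sizes, task sets, and capacities are hypothetical.

```python
# Minimal sketch (assumptions, not the paper's algorithm): greedily place each
# dataset on the data center that already holds its most correlated partners,
# where correlation = shared-task count weighted by the smaller dataset size.

sizes = {"d1": 40, "d2": 10, "d3": 25, "d4": 5}           # dataset sizes in GB (assumed)
used_by = {"d1": {"t1", "t2"}, "d2": {"t1"},
           "d3": {"t2", "t3"}, "d4": {"t3"}}               # tasks reading each dataset
capacity = {"dc1": 60, "dc2": 60}                          # free space per data center (GB)

def correlation(a, b):
    """Shared-task count weighted by the smaller dataset: separating two
    datasets that many tasks read together forces that much data to move."""
    return len(used_by[a] & used_by[b]) * min(sizes[a], sizes[b])

placement = {dc: [] for dc in capacity}
# Place larger datasets first so they anchor their correlated partners.
for d in sorted(sizes, key=sizes.get, reverse=True):
    def affinity(dc):
        return sum(correlation(d, other) for other in placement[dc])
    candidates = [dc for dc in capacity if capacity[dc] >= sizes[d]]
    target = max(candidates, key=affinity)
    placement[target].append(d)
    capacity[target] -= sizes[d]

print(placement)   # -> {'dc1': ['d1', 'd2'], 'dc2': ['d3', 'd4']}
```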
Citations: 27
Log-Structured Global Array for Efficient Multi-Version Snapshots
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.80
H. Fujita, N. Dun, Z. Rubenstein, A. Chien
In exascale systems, the increasing error rate -- particularly silent data corruption -- is a major concern. The Global View Resilience (GVR) system builds a new model of application resilience on versioned global arrays. These arrays can be exploited for flexible, application-specific error checking and recovery. We explore a fundamental challenge to the GVR model -- the cost of versioning. We propose a novel log-structured implementation that appends new data to an update log, simultaneously tracking modified regions and versioning incrementally. We compare the performance of log-structured arrays to traditional flat arrays using micro-benchmarks and three full applications, and show that versioning can be more than 10x faster and can reduce memory cost significantly. Further, in future systems with NVRAM, a log-structured approach is more tolerant of limitations such as write bandwidth and wear-out.
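The sketch below is a minimal, single-process illustration of the log-structured versioning idea, not the GVR implementation: writes append to a per-version log, a snapshot merely seals the current log, and reads replay the logs newest-first. The class name and the toy usage at the end are my own.

```python
# Minimal sketch (assumptions, not GVR): a log-structured versioned array.
# Writes append (index, value) records to the current version's log; taking a
# snapshot only seals that log, so its cost is independent of the array size,
# and any past version can be read back by replaying the logs.

class LogStructuredArray:
    def __init__(self, size):
        self.base = [0] * size        # version 0 contents
        self.logs = [[]]              # logs[v] holds the updates made in version v

    def write(self, index, value):
        self.logs[-1].append((index, value))   # append-only update record

    def snapshot(self):
        self.logs.append([])                   # seal current version, start a new log
        return len(self.logs) - 2              # id of the version just sealed

    def read(self, index, version=None):
        """Read `index` as of `version` (default: latest) by scanning the logs
        newest-first down to the base array."""
        last = len(self.logs) - 1 if version is None else version
        for v in range(last, -1, -1):
            for i, val in reversed(self.logs[v]):
                if i == index:
                    return val
        return self.base[index]

arr = LogStructuredArray(8)
arr.write(3, 42)
v0 = arr.snapshot()       # version 0 captures index 3 == 42
arr.write(3, 99)
assert arr.read(3, v0) == 42 and arr.read(3) == 99
```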
Citations: 8
A Parallel Algorithm for Clipping Polygons with Improved Bounds and a Distributed Overlay Processing System Using MPI
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.43
S. Puri, S. Prasad
Clipping arbitrary polygons is one of the complex operations in computer graphics and computational geometry. It is applied in many fields such as Geographic Information Systems (GIS) and VLSI CAD. We have two significant results to report. Our first result is the effective parallelization of the classic, highly sequential Greiner-Hormann algorithm, which yields the first output-sensitive CREW PRAM algorithm for a pair of simple polygons and can perform clipping in O(log n) time using O(n+k) processors, where n is the total number of vertices and k is the number of edge intersections. This improves upon our previous clipping algorithm based on the parallelization of Vatti's sweepline algorithm, which requires O(n+k+k') processors to achieve logarithmic time complexity, where k' can be O(n^2). It also improves upon another O(log n) time algorithm by Karinthi, Srinivas, and Almasi which, unlike our algorithm, does not handle self-intersecting polygons, is not output-sensitive, and must employ O(n^2) processors to achieve O(log n) time. We also study multi-core and many-core implementations of our parallel Greiner-Hormann algorithm. Our second result is a practical, parallel GIS system, namely MPI-GIS, for polygon overlay processing of two GIS layers containing a large number of polygons over a cluster of compute nodes. It employs an R-tree for efficient indexing and identification of potentially intersecting sets of polygons across the two input GIS layers. Spatial data files tend to be large (in GBs), and the underlying overlay computation is highly irregular and compute intensive. This system achieves a 44x speedup on NERSC's 32-node CARVER cluster while processing about 600K polygons in two GIS layers within 19 seconds, a workload that takes over 13 minutes on the state-of-the-art ArcGIS system.
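As a minimal illustration of the filter step that precedes exact clipping in an overlay system (a sketch, not the MPI-GIS code), the code below reduces each polygon to its bounding rectangle and keeps only pairs whose rectangles overlap; the toy layers are made up, and in a real system an R-tree replaces the nested loop.

```python
# Minimal sketch (not MPI-GIS): the filter step of overlay processing. Each
# polygon is reduced to its minimum bounding rectangle (MBR); only pairs whose
# MBRs overlap are passed to the exact, expensive clipping stage.

def mbr(polygon):
    """Axis-aligned bounding box of a polygon given as [(x, y), ...]."""
    xs, ys = zip(*polygon)
    return min(xs), min(ys), max(xs), max(ys)

def mbrs_overlap(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

def candidate_pairs(layer1, layer2):
    """Return index pairs (i, j) of polygons whose MBRs overlap."""
    boxes1 = [mbr(p) for p in layer1]
    boxes2 = [mbr(p) for p in layer2]
    return [(i, j)
            for i, b1 in enumerate(boxes1)
            for j, b2 in enumerate(boxes2)
            if mbrs_overlap(b1, b2)]

layer1 = [[(0, 0), (4, 0), (4, 4), (0, 4)]]
layer2 = [[(3, 3), (6, 3), (6, 6), (3, 6)], [(10, 10), (12, 10), (12, 12)]]
print(candidate_pairs(layer1, layer2))   # -> [(0, 0)]; only the first pair overlaps
```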
Citations: 35
A Priority-Based Scheduling Heuristic to Maximize Parallelism of Ready Tasks for DAG Applications
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.97
Wei Zheng, Lu Tang, R. Sakellariou
In practical Cloud/Grid computing systems, DAG scheduling may face challenges arising from severe uncertainty about the underlying platform. For instance, it may be hard to obtain explicit information about task execution times and/or the availability of resources, both of which may change dynamically in ways that are difficult to predict. In such a setting, just-in-time scheduling schemes, which aim to maximize the parallelism of the ready tasks of a DAG, are a promising approach to cope with the lack of environment information and achieve efficient DAG execution. Although many attempts have been made to develop such just-in-time scheduling heuristics, most of them are based on DAG decomposition, which results in complicated and suboptimal solutions for general DAGs. This paper presents a priority-based heuristic which is not only easy to apply to arbitrary DAGs but also exhibits comparable or better performance than existing solutions.
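A minimal sketch of a just-in-time, priority-based dispatcher is given below; the priority rule (most unscheduled children first) and the toy DAG are assumptions for illustration, not the paper's heuristic.

```python
# Minimal sketch with an assumed priority rule, not the paper's heuristic:
# a just-in-time scheduler keeps only the set of currently ready tasks and,
# whenever a resource frees, dispatches the ready task with the most children,
# on the intuition that finishing it grows the ready pool fastest.

dag = {"A": ["B", "C", "D"], "B": ["E"], "C": ["E"], "D": [], "E": []}  # task -> children

pending_parents = {t: 0 for t in dag}
for children in dag.values():
    for child in children:
        pending_parents[child] += 1

ready = [t for t, n in pending_parents.items() if n == 0]
order = []
while ready:
    # Priority: the ready task whose completion is expected to release the
    # most successors (here approximated by its number of children).
    nxt = max(ready, key=lambda t: len(dag[t]))
    ready.remove(nxt)
    order.append(nxt)                      # dispatch to the next free resource
    for child in dag[nxt]:                 # completing nxt may release children
        pending_parents[child] -= 1
        if pending_parents[child] == 0:
            ready.append(child)

print(order)   # -> ['A', 'B', 'C', 'D', 'E'] for this toy DAG
```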
Citations: 11
A Deep Learning Prediction Process Accelerator Based FPGA
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.114
Qi Yu, Chao Wang, Xiang Ma, Xi Li, Xuehai Zhou
Recently, machine learning has become widely used in applications and cloud services. As an emerging field of machine learning, deep learning shows an excellent ability to solve complex learning problems. To give users a better experience, high-performance implementations of deep learning applications are very important. As a common means of accelerating algorithms, FPGAs offer high performance, low power consumption, and small size, among other characteristics. We therefore use an FPGA to design a deep learning accelerator; the accelerator focuses on the implementation of the prediction process, data access optimization, and the pipeline structure. Compared with a 2.3 GHz Core 2 CPU, our accelerator achieves promising results.
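For readers unfamiliar with the "prediction process", the toy forward pass below shows the multiply-accumulate loop that such an accelerator pipelines in hardware; it is purely illustrative, unrelated to the authors' FPGA design, and all weights and inputs are placeholders.

```python
# Illustrative only (not the authors' design): the core of neural-network
# prediction is, per layer, a loop of multiply-accumulate (MAC) operations
# followed by an activation -- exactly the loop an FPGA pipeline streams.

def dense_layer(inputs, weights, biases):
    """One fully connected layer: out[j] = relu(sum_i in[i] * w[i][j] + b[j])."""
    outputs = []
    for j in range(len(biases)):
        acc = biases[j]
        for i, x in enumerate(inputs):      # the MAC loop an FPGA pipelines
            acc += x * weights[i][j]
        outputs.append(max(0.0, acc))       # ReLU activation
    return outputs

x = [0.5, -1.0, 2.0]                        # placeholder input vector
w = [[0.2, -0.3], [0.4, 0.1], [-0.5, 0.6]]  # placeholder weights
b = [0.1, 0.0]                              # placeholder biases
print(dense_layer(x, w, b))                 # -> [0.0, 0.95]
```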
Citations: 54
A Resource Allocation Model for Hybrid Storage Systems
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.132
Hui Wang, P. Varman
Providing QoS guarantees for hybrid storage systems made up of both solid-state drives (SSDs) and hard disks (HDs) is a challenging problem. Since HDs and SSDs have widely different IOPS capacities, it is not sensible to treat the storage system as a monolithic black box; instead, a useful QoS model must differentiate the IOs made to different device types. Traditional storage resource allocation models have largely been designed to provide QoS for a single resource type, and they result in poor utilization and fairness when applied to multiple coupled resources. In this paper, we present a new resource allocation model for hybrid storage systems using a multi-resource framework. The model supports reservations and shares for clients sharing the storage system. Reservations specify the minimum throughput (IOPS) that a client must receive, while shares reflect a client's weight relative to other clients that are bottlenecked on the same device. We present a formal multi-resource allocation model to allocate IOPS to clients, together with an IO scheduling algorithm to maximize system throughput. The model and algorithms are validated with empirical results.
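A minimal sketch of the reservation-plus-shares idea on a single device follows; it is a simplification for illustration, not the paper's multi-resource model, and the client names and IOPS figures are hypothetical.

```python
# Minimal sketch (my simplification, not the paper's model): per-device IOPS
# allocation that first meets each client's reservation, then splits the
# device's remaining capacity in proportion to client shares.

def allocate(capacity, clients):
    """clients: {name: (reservation_iops, share_weight)} on one device."""
    alloc = {name: res for name, (res, _) in clients.items()}
    spare = capacity - sum(alloc.values())
    if spare < 0:
        raise ValueError("reservations exceed device capacity")
    total_shares = sum(w for _, w in clients.values())
    for name, (_, w) in clients.items():
        alloc[name] += spare * w / total_shares
    return alloc

# Hypothetical SSD with 100K IOPS shared by three clients.
ssd = allocate(100_000, {"web": (20_000, 1), "db": (30_000, 3), "batch": (0, 1)})
print(ssd)   # reservations first, then the 50K spare IOPS split 1:3:1
```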
Citations: 7