A Heterogeneity-Aware Region-Level Data Layout for Hybrid Parallel File Systems
Shuibing He, Xian-He Sun, Yang Wang, Antonios Kougkas, Adnan Haider
DOI: 10.1109/ICPP.2015.43

Parallel file systems (PFS) are commonly used in high-end computing systems. With the emergence of solid state drives (SSD), hybrid PFSs, which consist of both HDD and SSD servers, provide a practical I/O system solution for data-intensive applications. However, most existing PFS layout schemes are inefficient for hybrid PFSs because they are unaware of the performance differences between heterogeneous servers and of the workload changes between different parts of a file. This lack of awareness can result in severe I/O performance degradation. In this study, we propose a heterogeneity-aware region-level (HARL) data layout scheme to improve the data distribution of a hybrid PFS. HARL first divides a file into fine-grained, variably sized regions according to the changes in an application's I/O workload, then chooses appropriate file stripe sizes on the heterogeneous servers, based on server performance, for each file region. Experimental results with representative benchmarks show that HARL can greatly improve I/O system performance.
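
The core of HARL is matching each region's stripe sizes to the relative throughput of each server type. As a hedged illustration only (the paper's actual cost model is more detailed), the sketch below assigns stripe sizes proportional to assumed per-server bandwidths; the bandwidth figures and function names are hypothetical.

```python
def region_stripe_sizes(region_size, server_bw, granularity=4096):
    """Split one file region into per-server stripe sizes that are
    roughly proportional to each server's sustained bandwidth.

    server_bw: dict mapping server id -> bandwidth (MB/s).
    The proportional model is an illustrative assumption, not the
    paper's exact cost model; stripe totals may round off slightly.
    """
    total_bw = sum(server_bw.values())
    stripes = {}
    for server, bw in server_bw.items():
        share = region_size * bw / total_bw
        # Round each stripe down to the allocation granularity.
        stripes[server] = max(granularity, int(share // granularity) * granularity)
    return stripes

# Example: two SSD servers and two HDD servers with assumed bandwidths.
bw = {"ssd0": 400, "ssd1": 400, "hdd0": 100, "hdd1": 100}
print(region_stripe_sizes(64 * 1024 * 1024, bw))  # larger stripes land on SSDs
```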
{"title":"A Heterogeneity-Aware Region-Level Data Layout for Hybrid Parallel File Systems","authors":"Shuibing He, Xian-He Sun, Yang Wang, Antonios Kougkas, Adnan Haider","doi":"10.1109/ICPP.2015.43","DOIUrl":"https://doi.org/10.1109/ICPP.2015.43","url":null,"abstract":"Parallel file systems (PFS) are commonly used in high-end computing systems. With the emergence of solid state drives (SSD), hybrid PFSs, which consist of both HDD and SSD servers, provide a practical I/O system solution for data-intensive applications. However, most existing PFS layout schemes are inefficient for hybrid PFSs due to their lack of awareness of the performance differences between heterogeneous servers and the workload changes between different parts of a file. This lack of recognition can result in severe I/O performance degradation. In this study, we propose a heterogeneity-aware region-level (HARL) data layout scheme to improve the data distribution of a hybrid PFS. HARL first divides a file into fine-grained, varying sized regions according to the changes of an application's I/O workload, then chooses appropriate file stripe sizes on heterogeneous servers based on the server performance for each file region. Experimental results of representative benchmarks show that HARL can greatly improve the I/O system performance.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128355501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A Penalty Aware Memory Allocation Scheme for Key-Value Cache
Jianqiang Ou, Marc Patton, M. D. Moore, Yuehai Xu, Song Jiang
DOI: 10.1109/ICPP.2015.62

Key-value caches, represented by Memcached, play a critical role in data centers. Their efficacy can significantly impact users' perceived service time and back-end systems' workloads. A central issue in in-memory cache management is memory allocation, that is, how the limited space is distributed for storing key-value items of various sizes. When a cache is full, the allocation issue becomes how to conduct replacement operations on items of different sizes. To address the issue effectively, a practitioner must simultaneously consider three factors: access locality, item size, and miss penalty. Existing designs consider only one or two of the first two factors and pay little attention to miss penalty. This inadequacy can substantially compromise cache space utilization and request service time. In this paper we propose a Penalty Aware Memory Allocation scheme (PAMA) that takes all three factors into account. While the three factors cannot be directly compared to each other quantitatively, PAMA uses their impacts on service time to determine where a unit of memory space should be (de)allocated. The impacts are quantified as the decrease (or increase) of service time if a unit of space is allocated (or deallocated). PAMA efficiently tracks access patterns and memory usage, and speculatively evaluates these impacts to enable penalty-aware memory allocation for KV caches. Our evaluation with real-world Memcached workload traces demonstrates that PAMA can significantly reduce request service time compared to other representative KV cache management schemes.
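
PAMA's allocation decision reduces to comparing, per size class, the estimated change in aggregate service time from gaining or losing one unit (e.g., one slab) of memory. A minimal sketch of that comparison follows; the miss-rate estimates, the symmetric grow/shrink assumption, and all names are illustrative, not the paper's exact estimator.

```python
def pick_reassignment(classes):
    """Choose the (take_from, give_to) pair of size classes that yields
    the largest estimated net reduction in total service time when one
    slab of memory moves between them.

    classes: dict mapping class id -> {
        'miss_delta':   estimated extra misses/s if the class loses one slab
                        (assumed symmetric: the same misses/s saved if it gains one),
        'miss_penalty': average penalty per miss (seconds),
    }
    All fields are illustrative; PAMA derives its impacts from tracked
    access patterns and speculative evaluation.
    """
    # Service-time cost of shrinking a class: added misses * per-miss penalty.
    cost = {c: v['miss_delta'] * v['miss_penalty'] for c, v in classes.items()}
    victim = min(cost, key=cost.get)       # cheapest class to shrink
    beneficiary = max(cost, key=cost.get)  # class that benefits the most
    gain = cost[beneficiary] - cost[victim]
    return (victim, beneficiary) if gain > 0 else None

classes = {
    'small':  {'miss_delta': 120.0, 'miss_penalty': 0.002},
    'medium': {'miss_delta':  40.0, 'miss_penalty': 0.010},
    'large':  {'miss_delta':  10.0, 'miss_penalty': 0.050},
}
print(pick_reassignment(classes))  # ('small', 'large') with these numbers
```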
{"title":"A Penalty Aware Memory Allocation Scheme for Key-Value Cache","authors":"Jianqiang Ou, Marc Patton, M. D. Moore, Yuehai Xu, Song Jiang","doi":"10.1109/ICPP.2015.62","DOIUrl":"https://doi.org/10.1109/ICPP.2015.62","url":null,"abstract":"Key-value caches, represented by Mem cached, play a critical role in data centers. Its efficacy can significantly impact users' perceived service time and back-end systems' workloads. A central issue in the in-memory cache's management is memory allocation, or how the limited space is distributed for storing key-value items of various sizes. When a cache is full, the allocation issue is how to conduct replacement operations on items of different sizes. To effectively address the issue, a practitioner must simultaneously consider three factors, which are access locality, item size, and miss penalty. Existing designs consider only one or two of the first two factors, and pay little attention on miss penalty. This inadequacy can substantially compromise utilization of cache space and request service time. In this paper we propose a Penalty Aware Memory Allocation scheme (PAMA) that takes all three factors into account. While the three different factors cannot be directly compared to each other in a quantitative manner, PAMA uses their impacts on service time to determine where a unit of memory space should be (de)allocated. The impacts are quantified as the decrease (or increase) of service time if a unit of space is allocated (or deal located). PAMA efficiently tracks access pattern and use of memory, and speculatively evaluates the impacts to enable penalty-aware memory allocation for KV caches. Our evaluation with real-world Mem cached workload traces demonstrates that PAMA can significantly reduce request service time compared to other representative KV cache management schemes.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128429219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

REED: A Reliable Energy-Efficient RAID
Shu Yin, Xuewu Li, Kenli Li, Jianzhong Huang, X. Ruan, Xiaomin Zhu, Wei Cao, X. Qin
DOI: 10.1109/ICPP.2015.74

Recent studies indicate that the energy cost and carbon footprint of data centers have become exorbitant. Reducing energy consumption in the large-scale storage systems of modern data centers is a demanding and challenging task, and most energy conservation techniques inevitably have adverse impacts on parallel disk systems. To address the reliability issues of energy-efficient parallel disks, we propose a reliable energy-efficient RAID system called REED, which aims to improve both the energy efficiency and the reliability of RAID systems by seamlessly integrating HDDs and SSDs. At the heart of REED is a high-performance cache mechanism powered by SSDs, which serve popular data. Under light workloads, REED spins HDDs down into the low-power mode, thereby conserving energy. Importantly, during I/O turbulence (i.e., when the I/O load changes dynamically and frequently), REED reduces the number of disk power-state transitions by keeping HDDs in the low-power mode while serving requests from SSDs. We build a model to show quantitatively that REED is capable of improving the reliability of energy-efficient RAIDs. We implement the REED prototype in a real-world RAID-0 system. Our experimental results demonstrate that REED improves the energy efficiency of conventional RAID-0 by up to 73% while maintaining good reliability.
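
The mechanism REED describes is essentially a routing policy: while HDDs sit in low-power mode, requests for popular data are served from the SSD cache rather than forcing a spin-up. A toy sketch of that policy follows; the class structure and names are assumptions for illustration, not the prototype's code.

```python
class ReedRouter:
    """Toy model of a REED-style request path: serve popular blocks from
    the SSD cache so HDDs can stay in low-power mode; only a cache miss
    forces a spin-up. Structure and names are illustrative assumptions."""

    def __init__(self, ssd_cache):
        self.ssd_cache = ssd_cache   # ids of popular blocks held on SSD
        self.hdd_powered = False
        self.transitions = 0         # fewer power-state transitions -> better reliability

    def read(self, block_id):
        if block_id in self.ssd_cache:
            return ("ssd", block_id)         # HDDs stay in low-power mode
        if not self.hdd_powered:
            self.hdd_powered = True          # pay a spin-up only on a miss
            self.transitions += 1
        return ("hdd", block_id)

router = ReedRouter(ssd_cache={1, 2, 3})
print(router.read(2), router.read(9), router.transitions)  # ('ssd', 2) ('hdd', 9) 1
```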
{"title":"REED: A Reliable Energy-Efficient RAID","authors":"Shu Yin, Xuewu Li, Kenli Li, Jianzhong Huang, X. Ruan, Xiaomin Zhu, Wei Cao, X. Qin","doi":"10.1109/ICPP.2015.74","DOIUrl":"https://doi.org/10.1109/ICPP.2015.74","url":null,"abstract":"Recent studies indicate that the energy cost and carbon footprint of data centers have become exorbitant. It is a demanding and challenging task to reduce energy consumption in large-scale storage systems in modern data centers. Most energy conservation techniques inevitably have adverse impacts on parallel disk systems. To address the reliability issues of energy-efficient parallel disks, we propose a reliable energy-efficient RAID system called REED, which aims at improving both energy efficiency and reliability of RAID systems by seamlessly integrating HDDs and SSDs. At the heart of REED is a high-performance cache mechanism powered by SSDs, which are serving popular data. Under light workload conditions, REED spins down HDDs into the low-power mode, thereby offering energy conservation. Importantly, during an I/O access turbulence (i.e., I/O load is dynamically and frequently changing), REED is conducive to reducing the number of disk power-state transitions by keeping HDDs in the low-power mode while serving requests with SSDs. We build a model to quantitatively show that REED is capable of improving the reliability of energy-efficient RAIDs. We implement the REED prototype in a real-world RAID-0 system. Our experimental results demonstrate that REED improves the energy-efficiency of conventional RAID-0 by up to 73% while maintaining good reliability.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130678704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A Responsive Knapsack-Based Algorithm for Resource Provisioning and Scheduling of Scientific Workflows in Clouds
M. A. Rodriguez, R. Buyya
DOI: 10.1109/ICPP.2015.93

Scientific workflows are used to process vast amounts of data and to conduct large-scale experiments and simulations. They are time-consuming, resource-intensive applications that benefit from running on distributed platforms. In particular, scientific workflows can greatly leverage the ease of access, affordability, and scalability offered by cloud computing. To achieve this, innovative and efficient ways of orchestrating the workflow tasks and managing the compute resources in a cost-conscious manner need to be developed. We propose an adaptive resource provisioning and scheduling algorithm for scientific workflows deployed in Infrastructure-as-a-Service clouds. Our algorithm was designed to address challenges specific to clouds, such as the pay-as-you-go model, the performance variation of resources, and on-demand access to unlimited, heterogeneous virtual machines. It is capable of responding to the dynamics of the cloud infrastructure and generates efficient solutions that meet a user-defined deadline while minimising the overall cost of the used infrastructure. Our simulation experiments demonstrate that it performs better than other state-of-the-art algorithms.
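
The stated scheduling goal, meet a user-defined deadline at minimum cost, can be illustrated with a simple per-task rule: pick the cheapest VM type whose estimated runtime still fits the task's share of the deadline. This is a generic sketch, not the paper's knapsack formulation; the VM prices, speedups, and hourly billing model are assumptions.

```python
def cheapest_vm_within_deadline(task_runtime, vm_types, slack):
    """Return the cheapest VM type whose estimated runtime for the task
    fits within the remaining slack (seconds) toward the deadline.

    vm_types: list of (name, price_per_hour, speedup) tuples; runtime on
    a VM is task_runtime / speedup. All numbers are illustrative.
    """
    feasible = []
    for name, price, speedup in vm_types:
        runtime = task_runtime / speedup
        if runtime <= slack:
            hours = -(-runtime // 3600)   # ceiling: bill whole hours, pay-as-you-go
            feasible.append((price * hours, name))
    if not feasible:
        return None                        # deadline cannot be met
    return min(feasible)[1]

vms = [("small", 0.05, 1.0), ("medium", 0.10, 1.8), ("large", 0.20, 3.5)]
print(cheapest_vm_within_deadline(task_runtime=7200, vm_types=vms, slack=3000))  # 'large'
```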
{"title":"A Responsive Knapsack-Based Algorithm for Resource Provisioning and Scheduling of Scientific Workflows in Clouds","authors":"M. A. Rodriguez, R. Buyya","doi":"10.1109/ICPP.2015.93","DOIUrl":"https://doi.org/10.1109/ICPP.2015.93","url":null,"abstract":"Scientific workflows are used to process vast amounts of data and to conduct large-scale experiments and simulations. They are time consuming and resource intensive applications that benefit from running in distributed platforms. In particular, scientific workflows can greatly leverage the ease-of-access, affordability, and scalability offered by cloud computing. To achieve this, innovative and efficient ways of orchestrating the workflow tasks and managing the compute resources in a cost-conscious manner need to be developed. We propose an adaptive, resource provisioning and scheduling algorithm for scientific workflows deployed in Infrastructure as a Service clouds. Our algorithm was designed to address challenges specific to clouds such as the pay-as-you-go model, the performance variation of resources and the on-demand access to unlimited, heterogeneous virtual machines. It is capable of responding to the dynamics of the cloud infrastructure and is successful in generating efficient solutions that meet a user-defined deadline and minimise the overall cost of the used infrastructure. Our simulation experiments demonstrate that it performs better than other state-of-the-art algorithms.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131925582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Bit Flipping Errors in High Performance Linpack at Exascale and Beyond
Erlin Yao, Guangming Tan
DOI: 10.1109/ICPP.2015.51

For the High Performance Linpack (HPL) benchmark at the coming Exascale and beyond, silent errors such as bit flips in memory are expected to become inevitable. However, since bit flipping errors are difficult to detect and locate, their impact on the numerical correctness of HPL has not been evaluated thoroughly and quantitatively, and the problem is especially pressing at Exascale. In this paper, we present an initial quantitative analysis of the impact of bit flipping errors on the numerical correctness of HPL. To validate the numerical correctness of the computed solution, HPL performs a residual check after the approximate solution is obtained. This paper shows that, in the case of a single bit flip in any element of the original data matrix, if the flipped bit is not the leading bit of the exponent, the residual check in HPL will almost surely pass at the scale of exaflops and beyond. Experiments on a modified HPL in single precision at small scales have verified the theoretical results for double precision at Exascale. The results obtained in this paper provide a better understanding of the impact of bit flipping errors on the numerical correctness of scientific computing applications.
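
For reference, the residual check mentioned above is a scaled backward-error test; in recent HPL versions it takes (up to the exact scaling a given release uses) the form below, where \(\varepsilon\) is machine epsilon, \(n\) is the problem size, and the solution is accepted when the ratio falls under a fixed threshold (16 in the standard configuration):

```latex
\[
r \;=\; \frac{\lVert A x - b \rVert_{\infty}}
             {\varepsilon \,\bigl(\lVert A \rVert_{\infty}\,\lVert x \rVert_{\infty}
              + \lVert b \rVert_{\infty}\bigr)\, n}
\;<\; 16 .
\]
```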
{"title":"Bit Flipping Errors in High Performance Linpack at Exascale and Beyond","authors":"Erlin Yao, Guangming Tan","doi":"10.1109/ICPP.2015.51","DOIUrl":"https://doi.org/10.1109/ICPP.2015.51","url":null,"abstract":"For the High Performance Linpack (HPL) benchmark at the coming Exascale and beyond, silent errors like bit flipping in memory are expected to become inevitable. However, since bit flipping errors are difficult to be detected and located, their impact to the numerical correctness of HPL has not been evaluated thoroughly and quantitatively, while the impact at Exascale is especially susceptible. In this paper, an initial quantitative analysis of the impact of bit flipping errors to the numerical correctness of HPL has been investigated. To validate the numerical correctness of computed solution using HPL, there is a residual check after the approximate solution obtained. This paper has shown that in the case of only one bit flipping to any element in the original data matrix, if the flipped position is not the leading position of exponent, the residual check in HPL will almost surely pass at the scale of Exa flops and beyond. Experiments on modified HPL in single precision at small scales have verified the theoretical results in double precision at Exascale. The results obtained in this paper can provide a better understanding to the impact of bit flipping errors to numerical correctness of scientific computing applications.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131416342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Optimal Node Selection for Data Regeneration in Heterogeneous Distributed Storage Systems
Qingyuan Gong, Jiaqi Wang, Dongsheng Wei, Jin Wang, Xin Wang
DOI: 10.1109/ICPP.2015.48

Distributed storage systems introduce redundancy to protect data from node failures. After a storage node fails, the lost data should be regenerated at a replacement node as soon as possible to maintain the same level of redundancy. Minimizing this regeneration time is critical to the reliability of distributed storage systems. Existing work seeks to reduce the regeneration time either by minimizing the regenerating traffic or by adjusting the regenerating traffic patterns, while the nodes participating in the regeneration are generally assumed to be given beforehand. However, real-world distributed storage systems usually exhibit heterogeneous link capacities, and the regeneration time depends heavily on the selection of the participating nodes. In this paper, we consider minimizing the regeneration time by selecting the participating nodes in heterogeneous networks. We propose optimal node selection algorithms for two cases: 1) the newcomer is not given, and 2) neither the newcomer nor the providers are given. Analysis shows that the optimal regeneration time can be achieved in each case. We then consider the effect of allowing a flexible amount of data blocks from each provider on the regeneration time, and apply this observation to enhance our schemes. Experimental results show that, compared with a scheme based on random node selection, our node selection schemes can significantly reduce the regeneration time, especially in practical networks with heterogeneous link capacities.
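
To make the selection problem concrete, suppose every provider sends a fixed amount of data to the newcomer and transfers proceed in parallel, so regeneration time is governed by the slowest provider-to-newcomer link. Under that bottleneck model (an assumption for illustration; the paper's model and algorithms are more general), picking the newcomer reduces to the following.

```python
def pick_newcomer(candidates, providers, capacity, block_size):
    """Pick the replacement (newcomer) node that minimizes regeneration time.

    capacity[(u, v)] is the link capacity from u to v. With parallel
    transfers of block_size from every provider, a candidate's regeneration
    time is set by its slowest incoming provider link. This bottleneck
    model is an illustrative assumption.
    """
    def regen_time(newcomer):
        return max(block_size / capacity[(p, newcomer)] for p in providers)

    return min(candidates, key=regen_time)

cap = {("p1", "n1"): 100, ("p2", "n1"): 20,   # MB/s; illustrative numbers
       ("p1", "n2"): 60,  ("p2", "n2"): 50}
print(pick_newcomer(["n1", "n2"], ["p1", "p2"], cap, block_size=1000))  # 'n2'
```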
{"title":"Optimal Node Selection for Data Regeneration in Heterogeneous Distributed Storage Systems","authors":"Qingyuan Gong, Jiaqi Wang, Dongsheng Wei, Jin Wang, Xin Wang","doi":"10.1109/ICPP.2015.48","DOIUrl":"https://doi.org/10.1109/ICPP.2015.48","url":null,"abstract":"Distributed storage systems introduce redundancy to protect data from node failures. After a storage node fails, the lost data should be regenerated at a replacement storage node as soon as possible to maintain the same level of redundancy. Minimizing such a regeneration time is critical to the reliability of distributed storage systems. Existing work commits to reduce the regeneration time by either minimizing the regenerating traffic, or adjusting the regenerating traffic patterns, whereas nodes participating the regeneration are generally assumed to be given beforehand. However, real-world distributed storage systems usually exhibit heterogeneous link capacities, and the regeneration time is highly related to the selection of the participating nodes. In this paper, we consider the minimization of the regeneration time by selecting the participating nodes in heterogeneous networks. We propose optimal node selection algorithms respectively for two cases: 1) the newcomer is not given, 2) both the newcomer and the providers are not given. Analysis shows that the optimal regeneration time can be achieved in each case. We then consider the effect of flexible amount of data blocks from each provider on the regeneration time, and apply this observation to enhance our schemes. Experiment results show that our node selection schemes can significantly reduce the regeneration time, especially in practical networks with heterogeneous link capacities, compared with the scheme based on random node selection.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131494629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Efficient Use of Hardware Transactional Memory for Parallel Mesh Generation
Tetsu Kobayashi, Shigeyuki Sato, H. Iwasaki
DOI: 10.1109/ICPP.2015.69

Efficient transactional execution is desirable for parallel implementations of algorithms with graph refinements. Hardware transactional memory (HTM) is promising for easy yet efficient transactional execution. Long HTM transactions, however, abort with high probability because of hardware limitations. Unfortunately, Delaunay mesh refinement (DMR), an algorithm with graph refinements used for mesh generation, produces long transactions, so a parallel implementation naively based on HTM leads to poor performance. To utilize HTM efficiently in a parallel implementation of DMR, we present an approach to shortening transactions. Our HTM-based implementations of DMR achieved significantly higher throughput and better scalability than a naive HTM-based implementation and lock-based ones. On a quad-core Haswell processor, the absolute speedup of one of our implementations reached up to 2.64 with 16 threads.
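
A common way to shorten DMR transactions is to move the expensive cavity computation out of the transaction and keep only a brief validate-and-commit step inside it. The sketch below illustrates that split, with a lock standing in for the HTM transaction (a real implementation would use hardware transactions, e.g., RTM with a lock fallback); the callback names are placeholders, and this is not necessarily the paper's exact scheme.

```python
import threading

commit_lock = threading.Lock()   # stand-in for a short HTM transaction

def refine(bad_triangle, mesh, compute_cavity, still_valid, apply_fix):
    """Shortened-transaction pattern for Delaunay mesh refinement: do the
    long cavity computation outside the transaction, then re-validate and
    commit inside a short critical section. The three callbacks are
    placeholders for the real DMR operations."""
    while True:
        cavity = compute_cavity(bad_triangle, mesh)  # long, nontransactional
        with commit_lock:                            # short "transaction"
            if still_valid(cavity, mesh):            # re-validate reads
                apply_fix(cavity, mesh)              # few writes, quick commit
                return
        # A concurrent refinement invalidated the cavity; retry.
```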
{"title":"Efficient Use of Hardware Transactional Memory for Parallel Mesh Generation","authors":"Tetsu Kobayashi, Shigeyuki Sato, H. Iwasaki","doi":"10.1109/ICPP.2015.69","DOIUrl":"https://doi.org/10.1109/ICPP.2015.69","url":null,"abstract":"Efficient transactional executions are desirable for parallel implementations of algorithms with graph refinements. Hardware transactional memory (HTM) is promising for easy yet efficient transactional executions. Long HTM transactions, however, abort with high probability because of hardware limitations. Unfortunately, Delaunay mesh refinement (DMR), which is an algorithm with graph refinements for mesh generation, causes long transactions. Its parallel implementation naively based on HTM therefore leads to poor performance. To utilize HTM efficiently for parallel implementation of DMR, we present an approach to shortening transactions. Our HTM based implementations of DMR achieved significantly higher throughput and better scalability than a naive HTM-based one and lock-based ones. On a quad-core Has well processor, the absolute speedup of one of our implementations was up to 2.64 with 16 threads.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133122618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Reflex-Tree: A Biologically Inspired Parallel Architecture for Future Smart Cities
Jason Kane, Bo Tang, Zhen Chen, Jun Yan, Tao Wei, Haibo He, Qing Yang
DOI: 10.1109/ICPP.2015.45

We introduce a new parallel computing and communication architecture, Reflex-Tree, with massive sensing, data processing, and control functions suitable for future smart cities. The central feature of the proposed Reflex-Tree architecture is inspired by a fundamental element of the human nervous system: reflex arcs, the neuromuscular reactions and instinctive motions of a part of the body in response to urgent situations. At the bottom level of the Reflex-Tree (layer 4), novel sensing devices controlled by low-power processing elements are proposed. These "leaf" nodes are connected to new classification engines based on machine learning techniques, including support vector machines (SVM), to form the third layer. The next layer up consists of servers that provide accurate control decisions via multi-layer adaptive learning and spatio-temporal association, before connecting to the top-level cloud, where complex system behavior analysis is performed. Our multi-layered architecture mimics human neural circuits to achieve the high levels of parallelization and scalability required for efficient city-wide monitoring and feedback. To demonstrate the utility of our architecture, we present the design, implementation, and experimental evaluation of a prototype Reflex-Tree. City power supply network and gas pipeline management scenarios are used as case studies to drive our prototype. We show the effectiveness of several levels of the architecture and discuss the feasibility of implementation.
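
The reflex-arc analogy amounts to urgent events being classified and acted on near the edge (layers 4 and 3), while only summaries travel up to the cloud. A minimal sketch of that short-circuit path follows; the layer interfaces, the sklearn-style predict call, and the event labels are hypothetical assumptions, not the prototype's API.

```python
def leaf_sample(sensor):
    """Layer 4: a low-power sensing element produces a feature vector."""
    return sensor.read()                     # e.g., [pressure, flow_rate]

def classify(features, svm):
    """Layer 3: SVM-based classification engine near the edge."""
    return svm.predict([features])[0]        # e.g., 'normal' or 'leak'

def reflex(features, svm, actuator, cloud):
    """Reflex path: urgent classes trigger local actuation immediately,
    like a reflex arc; every event is still reported upward for the
    layer-2/layer-1 analysis. All interfaces are illustrative."""
    label = classify(features, svm)
    if label == "leak":                      # urgent: act without the cloud
        actuator.close_valve()
    cloud.report(features, label)            # non-urgent path up the tree
    return label
```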
{"title":"Reflex-Tree: A Biologically Inspired Parallel Architecture for Future Smart Cities","authors":"Jason Kane, Bo Tang, Zhen Chen, Jun Yan, Tao Wei, Haibo He, Qing Yang","doi":"10.1109/ICPP.2015.45","DOIUrl":"https://doi.org/10.1109/ICPP.2015.45","url":null,"abstract":"We introduce a new parallel computing and communication architecture, Reflex-Tree, with massive sensing, data processing, and control functions suitable for future smart cities. The central feature of the proposed Reflex-Tree architecture is inspired by a fundamental element of the human nervous system: reflex arcs, the neuromuscular reactions and instinctive motions of a part of the body in response to urgent situations. At the bottom level of the Reflex-Tree (layer 4), novel sensing devices are proposed that are controlled by low power processing elements. These \"leaf\" nodes are then connected to new classification engines based on machine learning techniques, including support vector machines (SVM), to form the third layer. The next layer up consists of servers that provide accurate control decisions via multi-layer adaptive learning and spatial-temporal association, before they are connected to the top level cloud where complex system behavior analysis is performed. Our multi-layered architecture mimics human neural circuits to achieve the high levels of parallelization and scalability required for efficient city-wide monitoring and feedback. To demonstrate the utility of our architecture, we present the design, implementation, and experimental evaluation of a prototype Reflex-Tree. City power supply network and gas pipeline management scenarios are used to drive our prototype as case studies. We show the effectiveness for several levels of the architecture and discuss the feasibility of implementation.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123350581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Connecting the Dots: Reconstructing Network Behavior with Individual and Lossy Logs
Jiliang Wang, Xiaolong Zheng, Xufei Mao, Zhichao Cao, Daibo Liu, Yunhao Liu
DOI: 10.1109/ICPP.2015.26

In distributed networks such as wireless ad hoc networks, local and lossy logs are often available on individual nodes. We propose REFILL, which analyzes lossy and unsynchronized logs collected from individual nodes and reconstructs the network's behavior. We design an inference engine based on protocol semantics to abstract states on each node. Further, we leverage inherent and implicit event correlations within and between nodes to connect inference engines and analyze logs from different nodes. From unsynchronized and incomplete logs, REFILL can reconstruct network behavior, recover the network scenario, and understand what has happened in the network. We show that the results of REFILL can be used to guide protocol design, network management, diagnosis, and more. We implement REFILL and apply it to a large-scale wireless sensor network project. REFILL provides detailed per-packet tracing information based on event flows. We show that REFILL can reveal and verify fundamental issues, such as locating packet-loss positions and root causes. Finally, we present implications and demonstrate how to leverage REFILL to enhance network performance.
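
One concrete form of the inter-node event correlation described above: a send record in one node's log can be joined with a receive record in a neighbor's log for the same packet identifier, even when clocks are unsynchronized and entries are missing. A hedged sketch with assumed log-record fields (not REFILL's actual log format):

```python
def correlate(send_log, recv_logs):
    """Join send events with receive events across nodes by (src, seq),
    tolerating lossy, unsynchronized logs. Record fields are assumed for
    illustration: {'src', 'dst', 'seq', 'local_ts'}.
    """
    received = {(r["src"], r["seq"]): r for log in recv_logs for r in log}
    flows, lost = [], []
    for s in send_log:
        match = received.get((s["src"], s["seq"]))
        if match:
            flows.append((s, match))   # reconstructed per-packet event flow
        else:
            lost.append(s)             # candidate packet-loss position
    return flows, lost

sends = [{"src": "A", "dst": "B", "seq": 1, "local_ts": 10},
         {"src": "A", "dst": "B", "seq": 2, "local_ts": 12}]
recvs = [[{"src": "A", "seq": 1, "local_ts": 903}]]   # B's log missed seq 2
print(correlate(sends, recvs))  # seq 1 matched; seq 2 flagged as possible loss
```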
{"title":"Connecting the Dots: Reconstructing Network Behavior with Individual and Lossy Logs","authors":"Jiliang Wang, Xiaolong Zheng, Xufei Mao, Zhichao Cao, Daibo Liu, Yunhao Liu","doi":"10.1109/ICPP.2015.26","DOIUrl":"https://doi.org/10.1109/ICPP.2015.26","url":null,"abstract":"In distributed networks such as wireless ad hoc networks, local and lossy logs are often available on individual nodes. We propose REFILL, which analyzes lossy and unsynchronized logs collected from individual nodes and reconstructs the network behaviors. We design an inference engine based on protocol semantics to abstract states on each node. Further we leverage inherent and implicit event correlations in and between nodes to connect interference engines and analyze logs from different nodes. Based on unsynchronized and incomplete logs, REFILL can reconstruct network behavior, recover the network scenario and understand what has happened in the network. We show that the result of REFILL can be used to guide protocol design, network management, diagnosis, etc. We implement REFILL and apply it to a large-scale wireless sensor network project. REFILL provides a detailed per-packet tracing information based on event flows. We show that REFILL can reveal and verify fundamental issues, like locating packet loss positions and root causes. Further, we present implications and demonstrate how to leverage REFILL to enhance network performance.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122679391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Accelerating I/O Performance of Big Data Analytics on HPC Clusters through RDMA-Based Key-Value Store
Nusrat S. Islam, D. Shankar, Xiaoyi Lu, Md. Wasi-ur-Rahman, D. Panda
DOI: 10.1109/ICPP.2015.79

The Hadoop Distributed File System (HDFS) is the underlying storage engine of many Big Data processing frameworks such as Hadoop MapReduce, HBase, Hive, and Spark. Even though HDFS is well known for its scalability and reliability, its requirement for a large amount of local storage space makes HDFS deployment challenging on HPC clusters. Moreover, HPC clusters usually have a large installation of a parallel file system such as Lustre. In this study, we propose a novel design that integrates HDFS with Lustre through a high-performance key-value store. We design a burst buffer system using RDMA-based Memcached and present three schemes for integrating HDFS with Lustre through this buffer layer, considering different aspects of I/O, data locality, and fault tolerance. Our proposed schemes ensure performance improvement for Big Data applications on HPC clusters while reducing the local storage requirement. Performance evaluations show that our design can improve the write performance of TestDFSIO by up to 2.6x over HDFS and 1.5x over Lustre; the gain in read throughput is up to 8x. Sort execution time is reduced by up to 28% over Lustre and 19% over HDFS. Our design can also significantly benefit I/O-intensive workloads compared to both HDFS and Lustre.
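
The burst-buffer idea above reduces to a simple write path: land each block in the key-value layer first (fast, RDMA-accessible), then drain it to Lustre in the background. The sketch below uses a plain dict in place of RDMA-based Memcached and a local directory in place of a Lustre mount; both substitutions, and all names, are illustrative rather than the paper's design in detail.

```python
import os
import queue
import threading

class BurstBuffer:
    """Toy HDFS-over-Lustre burst buffer: writes land in a key-value layer
    (a dict standing in for RDMA-based Memcached) and a background thread
    drains them to the parallel file system (a local directory standing in
    for a Lustre mount)."""

    def __init__(self, pfs_dir):
        self.kv = {}                          # stand-in for Memcached
        self.pfs_dir = pfs_dir                # stand-in for Lustre
        self.pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write_block(self, block_id, data):
        self.kv[block_id] = data              # fast path: buffered write
        self.pending.put(block_id)            # schedule a flush to the PFS

    def read_block(self, block_id):
        if block_id in self.kv:               # hit in the buffer layer
            return self.kv[block_id]
        with open(os.path.join(self.pfs_dir, block_id), "rb") as f:
            return f.read()                   # fall back to the PFS copy

    def _drain(self):
        while True:
            block_id = self.pending.get()
            with open(os.path.join(self.pfs_dir, block_id), "wb") as f:
                f.write(self.kv[block_id])    # persist to the PFS
```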
{"title":"Accelerating I/O Performance of Big Data Analytics on HPC Clusters through RDMA-Based Key-Value Store","authors":"Nusrat S. Islam, D. Shankar, Xiaoyi Lu, Md. Wasi-ur-Rahman, D. Panda","doi":"10.1109/ICPP.2015.79","DOIUrl":"https://doi.org/10.1109/ICPP.2015.79","url":null,"abstract":"Hadoop Distributed File System (HDFS) is the underlying storage engine of many Big Data processing frameworks such as Hadoop MapReduce, HBase, Hive, and Spark. Even though HDFS is well-known for its scalability and reliability, the requirement of large amount of local storage space makes HDFS deployment challenging on HPC clusters. Moreover, HPC clusters usually have large installation of parallel file system like Lustre. In this study, we propose a novel design to integrate HDFS with Lustre through a high performance key-value store. We design a burst buffer system using RDMA-based Mem cached and present three schemes to integrate HDFS with Lustre through this buffer layer, considering different aspects of I/O, data-locality, and fault-tolerance. Our proposed schemes can ensure performance improvement for Big Data applications on HPC clusters. At the same time, they lead to reduced local storage requirement. Performance evaluations show that, our design can improve the write performance of Test DFSIO by up to 2.6x over HDFS and 1.5x over Lustre. The gain in read throughput is up to 8x. Sort execution time is reduced by up to 28% over Lustre and 19% over HDFS. Our design can also significantly benefit I/O-intensive workloads compared to both HDFS and Lustre.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124966517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}