CULZSS-Bit: A Bit-Vector Algorithm for Lossless Data Compression on GPGPUs
Adnan Ozsoy. 2014 International Workshop on Data Intensive Scalable Computing Systems. doi:10.1109/DISCS.2014.9

In this paper, we describe an algorithm to improve dictionary-based lossless data compression on GPGPUs. The presented algorithm uses bit-wise computations and leverages bit parallelism for the core of the algorithm: the longest-prefix-match calculation. Bit parallelism, also known as the bit-vector approach, is a fundamentally new approach to data compression and promises strong performance in hybrid CPU-GPU environments. The GPU implementation of the new compression algorithm outperforms previous attempts. Moreover, the bit-vector approach opens new opportunities for improvement and broadens the applicability of popular heterogeneous environments.
{"title":"CULZSS-Bit: A Bit-Vector Algorithm for Lossless Data Compression on GPGPUs","authors":"Adnan Ozsoy","doi":"10.1109/DISCS.2014.9","DOIUrl":"https://doi.org/10.1109/DISCS.2014.9","url":null,"abstract":"In this paper, we describe an algorithm to improve dictionary based lossless data compression on GPGPUs. The presented algorithm uses bit-wise computations and leverages bit parallelism for the core part of the algorithm which is the longest prefix match calculations. Using bit parallelism, also known as bit-vector approach, is a fundamentally new approach for data compression and promising in performance for hybrid CPU-GPU environments.The implementation of the new compression algorithm on GPUs improves the performance of the compression process compared to the previous attempts. Moreover, the bit-vector approach opens new opportunities for improvement and increases the applicability of popular heterogeneous environments.","PeriodicalId":278119,"journal":{"name":"2014 International Workshop on Data Intensive Scalable Computing Systems","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115084649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A Caching Approach to Reduce Communication in Graph Search Algorithms
Pietro Cicotti, L. Carrington. 2014 International Workshop on Data Intensive Scalable Computing Systems. doi:10.1109/DISCS.2014.8

In many scientific and computational domains, graphs are used to represent and analyze data. Such graphs often exhibit the characteristics of small-world networks: a few high-degree vertices connect many low-degree vertices. Despite the randomness of a graph search, it is possible to capitalize on this characteristic and cache relevant information about high-degree vertices. We applied this idea by caching remote vertex ids in a parallel breadth-first search implementation, and demonstrated a 1.6x to 2.4x speedup over the reference implementation on 64 to 1024 cores. We also proposed a system design in which resources are dedicated exclusively to caching and shared among a set of nodes; our evaluation demonstrates that this design has the potential to reduce communication and improve performance on large-scale systems. Finally, we used a memcached system as the cache server and found that a generic protocol that does not match the usage semantics may hinder the potential performance improvements.
{"title":"A Caching Approach to Reduce Communication in Graph Search Algorithms","authors":"Pietro Cicotti, L. Carrington","doi":"10.1109/DISCS.2014.8","DOIUrl":"https://doi.org/10.1109/DISCS.2014.8","url":null,"abstract":"In many scientific and computational domains, graphs are used to represent and analyze data. Such graphs often exhibit the characteristics of small-world networks: few high-degree vertexes connect many low-degree vertexes. Despite the randomness in a graph search, it is possible to capitalize on this characteristic and cache relevant information in high-degree vertexes. We applied this idea by caching remote vertex ids in a parallel breadth-first search implementation, and demonstrated 1.6x to 2.4x speedup over the reference implementation on 64 to 1024 cores. We proposed a system design in which resources are dedicated exclusively to caching, and shared among a set of nodes. Our evaluation demonstrates that this design has the potential to reduce communication and improve performance over large scale systems. Finally, we used a memcached system as the cache server finding that a generic protocol that does not match the usage semantics may hinder the potential performance improvements.","PeriodicalId":278119,"journal":{"name":"2014 International Workshop on Data Intensive Scalable Computing Systems","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126875567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Mapping of RAID Controller Performance Data to the Job History on Large Computing Systems
Marc Hartung, Michael Kluge. 2014 International Workshop on Data Intensive Scalable Computing Systems. doi:10.1109/DISCS.2014.7

For systems executing a mixture of data-intensive applications in parallel, there is always the question of what impact each application has on the storage subsystem. From the perspective of storage, I/O is typically anonymous: it carries no user identifiers or similar information. This paper focuses on the analysis of performance data collected on shared system components, such as global file systems, that cannot be mapped back to user activities immediately. Our approach classifies user jobs into classes based on their properties and correlates these classes with global timelines. In the paper we show details of the clustering algorithm, describe our measurement environment, and present first results. The results are valuable for tuning HPC storage systems to achieve optimized behavior at the global system level, or for separating users into classes with different I/O demands.
{"title":"Mapping of RAID Controller Performance Data to the Job History on Large Computing Systems","authors":"Marc Hartung, Michael Kluge","doi":"10.1109/DISCS.2014.7","DOIUrl":"https://doi.org/10.1109/DISCS.2014.7","url":null,"abstract":"For systems executing a mixture of different data intensive applications in parallel there is always the question about the impact that each application has on the storage subsystem. From the perspective of storage, I/O is typically anonymous as it does not contain user identifiers or similar information. This paper focuses on the analysis of performance data collected on shared system components like global file systems that can not be mapped back to user activities immediately. Our approach classifies user jobs based on their properties into classes and correlates these classes with global timelines. Within the paper we will show details of the clustering algorithm, depict our measurement environment and present first results. The results are valuable for tuning HPC storage system to achieve an optimized behavior on a global system level or to separate users into classes with different I/O demands.","PeriodicalId":278119,"journal":{"name":"2014 International Workshop on Data Intensive Scalable Computing Systems","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117084940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution
Teng Wang, K. Vasko, Zhuo Liu, Hui Chen, Weikuan Yu. 2014 International Workshop on Data Intensive Scalable Computing Systems. doi:10.1109/DISCS.2014.6
In today's "Big Data" era, developers have adopted I/O techniques such as MPI-IO, Parallel NetCDF and HDF5 to garner enough performance to manage the vast amount of data that scientific applications require. These I/O techniques offer parallel access to shared datasets and together with a set of optimizations such as data sieving and two-phase I/O to boost I/O throughput. While most of these techniques focus on optimizing the access pattern on a single file or file extent, few of these techniques consider cross-file I/O optimizations. This paper aims to explore the potential benefit from cross-file I/O aggregation. We propose a Bundle-based PARallel Aggregation framework (BPAR) and design three partitioning schemes under such framework that targets at improving the I/O performance of a mission-critical application GEOS-5, as well as a broad range of other scientific applications. The results of our experiments reveal that BPAR can achieve on average 2.1× performance improvement over the baseline GEOS-5.
{"title":"BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution","authors":"Teng Wang, K. Vasko, Zhuo Liu, Hui Chen, Weikuan Yu","doi":"10.1109/DISCS.2014.6","DOIUrl":"https://doi.org/10.1109/DISCS.2014.6","url":null,"abstract":"In today's \"Big Data\" era, developers have adopted I/O techniques such as MPI-IO, Parallel NetCDF and HDF5 to garner enough performance to manage the vast amount of data that scientific applications require. These I/O techniques offer parallel access to shared datasets and together with a set of optimizations such as data sieving and two-phase I/O to boost I/O throughput. While most of these techniques focus on optimizing the access pattern on a single file or file extent, few of these techniques consider cross-file I/O optimizations. This paper aims to explore the potential benefit from cross-file I/O aggregation. We propose a Bundle-based PARallel Aggregation framework (BPAR) and design three partitioning schemes under such framework that targets at improving the I/O performance of a mission-critical application GEOS-5, as well as a broad range of other scientific applications. The results of our experiments reveal that BPAR can achieve on average 2.1× performance improvement over the baseline GEOS-5.","PeriodicalId":278119,"journal":{"name":"2014 International Workshop on Data Intensive Scalable Computing Systems","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129058836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Efficient, Failure Resilient Transactions for Parallel and Distributed Computing
J. Lofstead, Jai Dayal, I. Jimenez, C. Maltzahn. 2014 International Workshop on Data Intensive Scalable Computing Systems. doi:10.1109/DISCS.2014.13

Scientific simulations are moving away from centralized persistent storage for intermediate data between workflow steps towards an all-online model. This shift is motivated by I/O bandwidth growing far more slowly than compute speed. The challenge this shift poses for Integrated Application Workflows is the loss of persistent-storage semantics for node-to-node communication. One step towards closing this semantics gap is using transactions to logically delineate a data set moving from 100,000s of processes to 1000s of servers as an atomic unit. Our previously demonstrated Doubly Distributed Transactions (D2T) protocol offered a high-performance solution, but did not explore how to detect and recover from faults; the focus was on demonstrating performance in the typical, fault-free case. The research presented here addresses fault detection and recovery based on an enhanced protocol design. The total overhead for a full transaction with multiple operations at 65,536 processes is on average 0.055 seconds. The fault detection and recovery mechanisms perform similarly to the success case, requiring only the addition of appropriate timeouts for the system. This paper explores the challenges of designing a recoverable protocol for doubly distributed transactions, particularly in parallel computing environments.
{"title":"Efficient, Failure Resilient Transactions for Parallel and Distributed Computing","authors":"J. Lofstead, Jai Dayal, I. Jimenez, C. Maltzahn","doi":"10.1109/DISCS.2014.13","DOIUrl":"https://doi.org/10.1109/DISCS.2014.13","url":null,"abstract":"Scientific simulations are moving away from using centralized persistent storage for intermediate data between workflow steps towards an all online model. This shift is motivated by the relatively slow IO bandwidth growth compared with compute speed increases. The challenges presented by this shift to Integrated Application Workflows are motivated by the loss of persistent storage semantics for node-to-node communication. One step towards addressing this semantics gap is using transactions to logically delineate a data set from 100,000s of processes to 1000s of servers as an atomic unit. Our previously demonstrated Doubly Distributed Transactions (D2T) protocol showed a high-performance solution, but had not explored how to detect and recover from faults. Instead, the focus was on demonstrating high-performance typical case performance. The research presented here addresses fault detection and recovery based on the enhanced protocol design. The total overhead for a full transaction with multiple operations at 65,536 processes is on average 0.055 seconds. Fault detection and recovery mechanisms demonstrate similar performance to the success case with only the addition of appropriate timeouts for the system. This paper explores the challenges in designing a recoverable protocol for doubly distributed transactions, particularly for parallel computing environments.","PeriodicalId":278119,"journal":{"name":"2014 International Workshop on Data Intensive Scalable Computing Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133574163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Distributed Multipath Routing Algorithm for Data Center Networks
Eun-Sung Jung, V. Vishwanath, R. Kettimuthu. 2014 International Workshop on Data Intensive Scalable Computing Systems. doi:10.1109/DISCS.2014.14

Multipath routing has been studied in diverse contexts, such as wide-area networks and wireless networks, to minimize the finish time of data transfers or the latency of message sending. The fast adoption of cloud computing for various applications, including high-performance computing, has drawn more attention to efficient network utilization through adaptive or multipath routing. However, previous studies have not exploited multiple paths in an optimized way while also scaling to a large number of hosts, in part because of the high time complexity of their algorithms. In this paper, we propose a scalable distributed flow-scheduling algorithm that exploits multiple paths in data center networks. We develop the algorithm based on linear programming and evaluate it on FatTree topologies, one of several advanced data center network topologies. The results show that the distributed algorithm runs much faster than the centralized algorithm, while the data transfer finish times it produces stay within 10% of the centralized algorithm's.
{"title":"Distributed Multipath Routing Algorithm for Data Center Networks","authors":"Eun-Sung Jung, V. Vishwanath, R. Kettimuthu","doi":"10.1109/DISCS.2014.14","DOIUrl":"https://doi.org/10.1109/DISCS.2014.14","url":null,"abstract":"Multipath routing has been studied in diverse contexts such as wide-area networks and wireless networks in order to minimize the finish time of data transfer or the latency of message sending. The fast adoption of cloud computing for various applications including high-performance computing applications has drawn more attention to efficient network utilization through adaptive or multipath routing methods. However, the previous studies have not exploited multiple paths in an optimized way while scaling well with a large number of hosts for some reasons such as high time complexity of algorithms.In this paper, we propose a scalable distributed flow scheduling algorithm that can exploit multiple paths in data center networks. We develop our algorithm based on linear programming and evaluate the algorithm in FatTree network topologies, one of several advanced data center network topologies. The results show that the distributed algorithm performs much better than the centralized algorithm in terms of running time and is comparable to the centralized algorithm within 10% increased finish time in terms of data transfer time.","PeriodicalId":278119,"journal":{"name":"2014 International Workshop on Data Intensive Scalable Computing Systems","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127984220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

PSA: A Performance and Space-Aware Data Layout Scheme for Hybrid Parallel File Systems
Shuibing He, Yan Liu, Xian-He Sun. 2014 International Workshop on Data Intensive Scalable Computing Systems. doi:10.1109/DISCS.2014.10

The underlying storage of a hybrid parallel file system (PFS) is composed of both SSD-based file servers (SServers) and HDD-based file servers (HServers). Unlike a traditional HServer, an SServer consistently provides higher storage performance but offers limited storage space. However, most current data layout schemes do not consider these differences in performance and space between heterogeneous servers, and may significantly degrade the performance of a hybrid PFS. In this paper, we propose PSA, a novel data layout scheme that maximizes hybrid PFS performance by applying adaptive, varied-size file stripes. PSA dispatches data to heterogeneous file servers based not only on storage performance but also on storage space. We have implemented PSA within OrangeFS, a popular parallel file system in the HPC domain. Our extensive experiments with a representative benchmark show that PSA provides higher I/O throughput than the default and performance-aware file data layout schemes.
{"title":"PSA: A Performance and Space-Aware Data Layout Scheme for Hybrid Parallel File Systems","authors":"Shuibing He, Yan Liu, Xian-He Sun","doi":"10.1109/DISCS.2014.10","DOIUrl":"https://doi.org/10.1109/DISCS.2014.10","url":null,"abstract":"The underlying storage of hybrid parallel file systems (PFS) is composed of both SSD-based file servers (SServer) and HDD-based file servers (HServer). Unlike a traditional HServer, an SServer consistently provides improved storage performance but lacks storage space. However, most current data layout schemes do not consider the differences in performance and space between heterogeneous servers, and may significantly degrade the performance of the hybrid PFSs. In this paper, we propose PSA, a novel data layout scheme, which maximizes the hybrid PFSs performance by applying adaptive varied-size file stripes. PSA dispatches data on heterogeneous file servers not only based on storage performance but also storage space. We have implemented PSA within OrangeFS, a popular parallel file system in the HPC domain. Our extensive experiments using a representative benchmark show that PSA provides superior I/O throughput than the default and performance-aware file data layout schemes.","PeriodicalId":278119,"journal":{"name":"2014 International Workshop on Data Intensive Scalable Computing Systems","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122578336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}