SHAD:可扩展的高性能算法和数据结构库

2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) Pub Date : 2018-05-01 DOI:10.1109/CCGRID.2018.00071

Vito Giovanni Castellana, Marco Minutoli

{"title":"SHAD:可扩展的高性能算法和数据结构库","authors":"Vito Giovanni Castellana, Marco Minutoli","doi":"10.1109/CCGRID.2018.00071","DOIUrl":null,"url":null,"abstract":"The unprecedented amount of data that needs to be processed in emerging data analytics applications poses novel challenges to industry and academia. Scalability and high performance become more than a desirable feature because, due to the scale and the nature of the problems, they draw the line between what is achievable and what is unfeasible. In this paper, we propose SHAD, the Scalable High-performance Algorithms and Data-structures library. SHAD adopts a modular design that confines low level details and promotes reuse. SHAD's core is built on an Abstract Runtime Interface which enhances portability and identifies the minimal set of features of the underlying system required by the framework. The core library includes common data-structures such as: Array, Vector, Map and Set. These are designed to accommodate significant amount of data which can be accessed in massively parallel environments, and used as building blocks for SHAD extensions, i.e. higher level software libraries. We have validated and evaluated our design with a performance and scalability study of the core components of the library. We have validated the design flexibility by proposing a Graph Library as an example of SHAD extension, which implements two different graph data-structures; we evaluate their performance with a set of graph applications. Experimental results show that the approach is promising in terms of both performance and scalability. On a distributed system with 320 cores, SHAD Arrays are able to sustain a throughput of 65 billion operations per second, while SHAD Maps sustain 1 billion of operations per second. Algorithms implemented using the Graph Library exhibit performance and scalability comparable to a custom solution, but with smaller development effort.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"SHAD: The Scalable High-Performance Algorithms and Data-Structures Library\",\"authors\":\"Vito Giovanni Castellana, Marco Minutoli\",\"doi\":\"10.1109/CCGRID.2018.00071\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The unprecedented amount of data that needs to be processed in emerging data analytics applications poses novel challenges to industry and academia. Scalability and high performance become more than a desirable feature because, due to the scale and the nature of the problems, they draw the line between what is achievable and what is unfeasible. In this paper, we propose SHAD, the Scalable High-performance Algorithms and Data-structures library. SHAD adopts a modular design that confines low level details and promotes reuse. SHAD's core is built on an Abstract Runtime Interface which enhances portability and identifies the minimal set of features of the underlying system required by the framework. The core library includes common data-structures such as: Array, Vector, Map and Set. These are designed to accommodate significant amount of data which can be accessed in massively parallel environments, and used as building blocks for SHAD extensions, i.e. higher level software libraries. We have validated and evaluated our design with a performance and scalability study of the core components of the library. We have validated the design flexibility by proposing a Graph Library as an example of SHAD extension, which implements two different graph data-structures; we evaluate their performance with a set of graph applications. Experimental results show that the approach is promising in terms of both performance and scalability. On a distributed system with 320 cores, SHAD Arrays are able to sustain a throughput of 65 billion operations per second, while SHAD Maps sustain 1 billion of operations per second. Algorithms implemented using the Graph Library exhibit performance and scalability comparable to a custom solution, but with smaller development effort.\",\"PeriodicalId\":321027,\"journal\":{\"name\":\"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)\",\"volume\":\"38 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGRID.2018.00071\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2018.00071","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

新兴的数据分析应用程序需要处理前所未有的数据量，这给工业界和学术界带来了新的挑战。可伸缩性和高性能不仅仅是一个理想的特性，因为由于问题的规模和性质，它们在可实现和不可实现之间划清了界限。在本文中，我们提出了SHAD，可扩展的高性能算法和数据结构库。SHAD采用模块化设计，限制底层细节并促进重用。SHAD的核心是建立在一个抽象运行时接口上的，它增强了可移植性，并识别了框架所需的底层系统的最小特性集。核心库包括常用的数据结构，如:Array、Vector、Map和Set。它们的设计是为了容纳大量的数据，这些数据可以在大规模并行环境中访问，并用作SHAD扩展的构建块，即更高级别的软件库。我们通过对库的核心组件进行性能和可扩展性研究来验证和评估我们的设计。我们通过提出一个图库作为SHAD扩展的一个例子来验证设计的灵活性，它实现了两种不同的图数据结构;我们用一组图形应用程序来评估它们的性能。实验结果表明，该方法在性能和可扩展性方面都是有希望的。在具有320核的分布式系统上，SHAD阵列能够维持每秒650亿次操作的吞吐量，而SHAD映射能够维持每秒10亿次操作。使用Graph Library实现的算法表现出与定制解决方案相当的性能和可伸缩性，但开发工作量更小。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

SHAD: The Scalable High-Performance Algorithms and Data-Structures Library

The unprecedented amount of data that needs to be processed in emerging data analytics applications poses novel challenges to industry and academia. Scalability and high performance become more than a desirable feature because, due to the scale and the nature of the problems, they draw the line between what is achievable and what is unfeasible. In this paper, we propose SHAD, the Scalable High-performance Algorithms and Data-structures library. SHAD adopts a modular design that confines low level details and promotes reuse. SHAD's core is built on an Abstract Runtime Interface which enhances portability and identifies the minimal set of features of the underlying system required by the framework. The core library includes common data-structures such as: Array, Vector, Map and Set. These are designed to accommodate significant amount of data which can be accessed in massively parallel environments, and used as building blocks for SHAD extensions, i.e. higher level software libraries. We have validated and evaluated our design with a performance and scalability study of the core components of the library. We have validated the design flexibility by proposing a Graph Library as an example of SHAD extension, which implements two different graph data-structures; we evaluate their performance with a set of graph applications. Experimental results show that the approach is promising in terms of both performance and scalability. On a distributed system with 320 cores, SHAD Arrays are able to sustain a throughput of 65 billion operations per second, while SHAD Maps sustain 1 billion of operations per second. Algorithms implemented using the Graph Library exhibit performance and scalability comparable to a custom solution, but with smaller development effort.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)

自引率

0.00%

发文量