Big data processing with 1D-Crosspoint Arrays

International Journal of Parallel Emergent and Distributed Systems, vol. 38, no. 1, pp. 249-274 (IF 0.7, Q4, Computer Science, Theory & Methods). Pub Date: 2023-02-27. DOI: 10.1080/17445760.2023.2172574

Taeyoung An, A. Oruç


Citations: 0

Abstract

Increased chip densities offer massive computation power to deal with fundamental big data operations such as searching and sorting. At the same time, the proliferation of processing elements (PEs) in such multicore chips, together with the employment of more aggressive parallel algorithms, causes the space needed for interprocessor communications to dominate the overall chip space, potentially reducing computational efficiency. To overcome this issue, this paper introduces a new architecture that uses simple crosspoint switches to pair PEs instead of a complex interconnection network. This new architecture may be viewed as a ‘quadratic’ array of processors, as it uses a number of PEs quadratic in n rather than linear in n, as in linear array processor models. The switches between adjacent PEs are created using a cyclic permutation wiring idea with PEs and as many crosspoints. We demonstrate the versatility of this new parallel architecture by designing fast algorithms to sort and search a list of n elements with it. In particular, we show that finding a minimum or a maximum of a list of n elements, and searching such a list, can all be performed on this parallel architecture in time with additional elementary logic gates with fan-in and in time with fan-in. We further show that sorting a list of n elements can also be carried out in time using additional elementary logic gates with fan-in and threshold logic gates on the same parallel architecture. The sorting time increases to if only elementary logic gates with fan-in are used. In addition, we establish how similar queries can be handled within the same order of time complexities. We use this new parallel architecture to perform sorting and searching on big data on three different models. The first of these models provides an efficient implementation of enumeration sorting and searching for moderate-size big data sets. The second model offers increased parallelism by replicating the new parallel architecture, but its hardware complexity also limits its use to moderate-size big data sets. The third model removes this limitation by introducing a tradeoff parameter between the time and hardware complexity of the overall computation, thereby providing an optimal use of available resources within a given chip-set space.
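The enumeration sorting mentioned in the abstract can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the authors' hardware design: the function names `enumeration_ranks` and `enumeration_sort` are hypothetical, and organising the comparisons into rounds where position i is paired with position (i + k) mod n is one plausible reading of the "cyclic permutation" pairing, not a detail confirmed by the abstract.

```python
def enumeration_ranks(values):
    """Rank each element by counting the elements that must precede it."""
    n = len(values)
    ranks = [0] * n
    # Assumption: round k pairs position i with position (i + k) mod n,
    # i.e. a cyclic shift by k. On parallel hardware the n comparisons of a
    # round could run at the same time; here they are simulated one by one.
    for k in range(1, n):
        for i in range(n):
            j = (i + k) % n
            # values[j] precedes values[i] if it is smaller, or if it is
            # equal and sits at a lower index (this keeps ranks distinct).
            if values[j] < values[i] or (values[j] == values[i] and j < i):
                ranks[i] += 1
    return ranks


def enumeration_sort(values):
    """Place every element directly at the position given by its rank."""
    ranks = enumeration_ranks(values)
    result = [None] * len(values)
    for value, rank in zip(values, ranks):
        result[rank] = value
    return result


if __name__ == "__main__":
    data = [5, 3, 8, 3, 1]
    ranks = enumeration_ranks(data)
    print(ranks)
    print(enumeration_sort(data))
    # The minimum and maximum are the elements of rank 0 and rank n - 1.
    print(data[ranks.index(0)], data[ranks.index(len(data) - 1)])
```

Because every element is compared with every other element, the simulation performs n(n - 1) comparisons in total, which hints at why a quadratic number of PEs lets such comparisons proceed in parallel. Minimum and maximum queries fall out of the same ranks without a separate pass.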




Source journal

International Journal of Parallel Emergent and Distributed Systems (COMPUTER SCIENCE, THEORY & METHODS)

CiteScore: 2.30

Self-citation rate: 0.00%

Articles published