一种快速拓扑并行遍历大型数据集的算法

International Journal of Information Technology and Computer Science Pub Date : 2023-02-08 DOI:10.5815/ijitcs.2023.01.01

T. N. Rodrigues

{"title":"一种快速拓扑并行遍历大型数据集的算法","authors":"T. N. Rodrigues","doi":"10.5815/ijitcs.2023.01.01","DOIUrl":null,"url":null,"abstract":"This work presents a parallel implementation of a graph-generating algorithm designed to be straightforwardly adapted to traverse large datasets. This new approach has been validated in a correlated scenario known as the word ladder problem. The new parallel algorithm induces the same topological structure proposed by its serial version and also builds the shortest path between any pair of words to be connected by a ladder of words. The implemented parallelism paradigm is the Multiple Instruction Stream - Multiple Data Stream (MIMD) and the test suite embraces 23-word ladder instances whose intermediate words were extracted from a dictionary of 183,719 words (dataset). The word morph quality (the shortest path between two input words) and the word morph performance (CPU time) were evaluated against a serial implementation of the original algorithm. The proposed parallel algorithm generated the optimal solution for each pair of words tested, that is, the minimum word ladder connecting an initial word to a final word was found. Thus, there was no negative impact on the quality of the solutions comparing them with those obtained through the serial ANG algorithm. However, there was an outstanding improvement considering the CPU time required to build the word ladder solutions. In fact, the time improvement was up to 99.85%, and speedups greater than 2.0X were achieved with the parallel algorithm.","PeriodicalId":130361,"journal":{"name":"International Journal of Information Technology and Computer Science","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Fast Topological Parallel Algorithm for Traversing Large Datasets\",\"authors\":\"T. N. Rodrigues\",\"doi\":\"10.5815/ijitcs.2023.01.01\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This work presents a parallel implementation of a graph-generating algorithm designed to be straightforwardly adapted to traverse large datasets. This new approach has been validated in a correlated scenario known as the word ladder problem. The new parallel algorithm induces the same topological structure proposed by its serial version and also builds the shortest path between any pair of words to be connected by a ladder of words. The implemented parallelism paradigm is the Multiple Instruction Stream - Multiple Data Stream (MIMD) and the test suite embraces 23-word ladder instances whose intermediate words were extracted from a dictionary of 183,719 words (dataset). The word morph quality (the shortest path between two input words) and the word morph performance (CPU time) were evaluated against a serial implementation of the original algorithm. The proposed parallel algorithm generated the optimal solution for each pair of words tested, that is, the minimum word ladder connecting an initial word to a final word was found. Thus, there was no negative impact on the quality of the solutions comparing them with those obtained through the serial ANG algorithm. However, there was an outstanding improvement considering the CPU time required to build the word ladder solutions. In fact, the time improvement was up to 99.85%, and speedups greater than 2.0X were achieved with the parallel algorithm.\",\"PeriodicalId\":130361,\"journal\":{\"name\":\"International Journal of Information Technology and Computer Science\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-02-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Information Technology and Computer Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5815/ijitcs.2023.01.01\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Information Technology and Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5815/ijitcs.2023.01.01","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

这项工作提出了一个图形生成算法的并行实现，旨在直接适应遍历大型数据集。这种新方法已经在一个被称为单词阶梯问题的相关场景中得到了验证。新的并行算法引入了与串行算法相同的拓扑结构，并通过单词阶梯建立了任意一对单词之间的最短路径。实现的并行性范例是多指令流-多数据流(MIMD)，测试套件包含23个单词的阶梯实例，其中间单词是从183,719个单词的字典(数据集)中提取的。根据原始算法的串行实现，评估了词形质量(两个输入词之间的最短路径)和词形性能(CPU时间)。提出的并行算法为每对被测单词生成最优解，即找到连接初始单词和最终单词的最小单词阶梯。因此，与串行ANG算法得到的解相比，对解的质量没有负面影响。然而，考虑到构建单词阶梯解决方案所需的CPU时间，这是一个显著的改进。实际上，该并行算法的时间改进可达99.85%，速度提升大于2.0X。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A Fast Topological Parallel Algorithm for Traversing Large Datasets

This work presents a parallel implementation of a graph-generating algorithm designed to be straightforwardly adapted to traverse large datasets. This new approach has been validated in a correlated scenario known as the word ladder problem. The new parallel algorithm induces the same topological structure proposed by its serial version and also builds the shortest path between any pair of words to be connected by a ladder of words. The implemented parallelism paradigm is the Multiple Instruction Stream - Multiple Data Stream (MIMD) and the test suite embraces 23-word ladder instances whose intermediate words were extracted from a dictionary of 183,719 words (dataset). The word morph quality (the shortest path between two input words) and the word morph performance (CPU time) were evaluated against a serial implementation of the original algorithm. The proposed parallel algorithm generated the optimal solution for each pair of words tested, that is, the minimum word ladder connecting an initial word to a final word was found. Thus, there was no negative impact on the quality of the solutions comparing them with those obtained through the serial ANG algorithm. However, there was an outstanding improvement considering the CPU time required to build the word ladder solutions. In fact, the time improvement was up to 99.85%, and speedups greater than 2.0X were achieved with the parallel algorithm.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Journal of Information Technology and Computer Science

自引率

0.00%

发文量