Distributed parallel generation of indices for very large text databases

Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing. Pub Date: 1997-12-10. DOI: 10.1109/ICAPP.1997.651539

João Paulo W. Kitajima, M. D. Resende, B. Ribeiro-Neto, N. Ziviani

Citations: 8

Abstract

We propose a new algorithm for the parallel generation of suffix arrays for large text databases on high-bandwidth computer networks. Suffix arrays are full-text index structures that support very powerful query languages. Our algorithm is based on a parallel indirect mergesort, which is more than a simple mergesort procedure, and we compare it with a well-known sequential algorithm that is very efficient on a single machine. Although network-bound, the parallel version is, both theoretically and experimentally, a much better alternative than the sequential version, which is bound by disk I/O.
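
To make the terms in the abstract concrete, here is a minimal single-process sketch in Python. It is not the paper's algorithm, which the abstract describes as more than a simple mergesort and which runs across networked machines. The sketch only illustrates what a suffix array is and what "indirect" sorting means: the sorted items are text positions, compared by the suffixes they point to. The function names (`local_suffix_array`, `merged_suffix_array`, `find`) and the block-splitting scheme are assumptions made for illustration.

```python
import heapq


def local_suffix_array(text, start, end):
    """Indirect sort: order positions in [start, end) by the suffix each points to."""
    return sorted(range(start, end), key=lambda i: text[i:])


def merged_suffix_array(text, parts):
    """Sort `parts` blocks of positions separately, then k-way merge the sorted runs.

    Each block stands in for the work one machine might do (an assumption);
    here everything runs in one process and all blocks see the whole text.
    """
    n = len(text)
    bounds = [n * p // parts for p in range(parts + 1)]
    runs = [local_suffix_array(text, bounds[p], bounds[p + 1]) for p in range(parts)]
    return list(heapq.merge(*runs, key=lambda i: text[i:]))


def find(text, sa, pattern):
    """Return the sorted positions where `pattern` occurs, by binary search on `sa`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


sa = merged_suffix_array("banana", parts=3)
print(sa)                         # [5, 3, 1, 0, 4, 2]
print(find("banana", sa, "ana"))  # [1, 3]
```

Comparing whole suffixes with `text[i:]` keeps the sketch short but copies text on every comparison. A real index over a very large database would compare in place and bound the comparison length.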


