Selection of equifrequent word fragments for information retrieval

Information Storage and Retrieval. Pub Date: 1973-12-01. Epub Date: 2002-08-28. DOI: 10.1016/0020-0271(73)90011-9

E.J. Schuegraf, H.S. Heaps

Citations: 46

Abstract

The design of programs to search large document data bases is discussed with regard to the use of compression coding combined with adoption of word fragments as the basic language elements. An algorithm is described for determination of a set of almost equifrequent fragments. Its efficiency is tested for a sample data base formed from the MARC tapes. A certain threshold frequency acts as a parameter whose value determines the number of distinct fragments. The selection algorithm is designed to give some preference to choice of the longest fragments and hence allow compact coding of the data base by concatenation of non-overlapping fragments.
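The abstract does not give the selection procedure itself, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a simple heuristic: count all fragments up to a maximum length, accept fragments from longest to shortest when their count reaches the threshold, and discount the counts of the shorter fragments contained in each accepted fragment. Single characters are always kept so that every word can be coded. The names `count_fragments`, `select_fragments`, `encode`, and the parameter `max_len` are hypothetical.

```python
from collections import Counter


def count_fragments(words, max_len):
    """Count every substring of length 1..max_len over all word occurrences."""
    counts = Counter()
    for word in words:
        for n in range(1, max_len + 1):
            for i in range(len(word) - n + 1):
                counts[word[i:i + n]] += 1
    return counts


def select_fragments(words, threshold, max_len=4):
    """Pick fragments with count >= threshold, preferring longer ones.

    Assumed heuristic (not taken from the paper): when a fragment is
    accepted, its count is subtracted from every shorter fragment it
    contains, so shorter fragments are judged on their residual use.
    """
    counts = count_fragments(words, max_len)
    selected = set()
    for n in range(max_len, 1, -1):  # longest fragments first
        # sorted() materialises the candidates before counts are adjusted
        for frag in sorted(f for f in counts if len(f) == n):
            freq = counts[frag]
            if freq >= threshold:
                selected.add(frag)
                for m in range(1, n):
                    for i in range(n - m + 1):
                        counts[frag[i:i + m]] -= freq
    # Single characters are always available as a fallback.
    selected.update(f for f in counts if len(f) == 1)
    return selected


def encode(word, fragments, max_len=4):
    """Greedy longest-match split of a word into non-overlapping fragments."""
    pieces, i = [], 0
    while i < len(word):
        for n in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if n == 1 or piece in fragments:
                pieces.append(piece)
                i += n
                break
    return pieces


sample = ["information", "retrieval", "formation", "format", "reform"]
fragments = select_fragments(sample, threshold=3)
coded = [encode(w, fragments) for w in sample]
```

Lowering `threshold` admits more multi-character fragments, which mirrors the role of the threshold frequency described in the abstract. The greedy left-to-right split is one simple way to concatenate non-overlapping fragments; it is not guaranteed to give the shortest possible coding.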



