Selection of equifrequent word fragments for information retrieval
E.J. Schuegraf, H.S. Heaps
Information Storage and Retrieval, vol. 9, no. 12, pp. 697-711 (1973-12-01)
DOI: 10.1016/0020-0271(73)90011-9
Citations: 46
Abstract
The design of programs to search large document data bases is discussed with regard to the use of compression coding combined with adoption of word fragments as the basic language elements. An algorithm is described for determination of a set of almost equifrequent fragments. Its efficiency is tested for a sample data base formed from the MARC tapes. A certain threshold frequency acts as a parameter whose value determines the number of distinct fragments. The selection algorithm is designed to give some preference to choice of the longest fragments and hence allow compact coding of the data base by concatenation of non-overlapping fragments.
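
The abstract gives only the outline of the selection procedure, so the following is a minimal sketch of the general idea and not the authors' algorithm. It assumes a simple greedy rule: count every substring up to a maximum length, accept fragments from longest to shortest when their count reaches the threshold, and discount the counts of shorter fragments contained in an accepted one. The function names, the `max_len` parameter, the discounting rule, and the decision to always keep single characters are all assumptions made for illustration.

```python
from collections import Counter


def count_fragments(words, max_len):
    """Count every substring of length 1..max_len over all words."""
    counts = Counter()
    for word in words:
        for n in range(1, max_len + 1):
            for i in range(len(word) - n + 1):
                counts[word[i:i + n]] += 1
    return counts


def select_fragments(words, threshold, max_len=3):
    """Greedy selection, longest fragments first (illustrative rule only)."""
    counts = count_fragments(words, max_len)
    # Assumption: keep every single character so any word stays encodable.
    selected = {f for f in counts if len(f) == 1}
    for n in range(max_len, 1, -1):
        for frag in sorted(f for f in counts if len(f) == n):
            freq = counts[frag]
            if freq < threshold:
                continue
            selected.add(frag)
            # Rough discount: occurrences now covered by the longer fragment
            # no longer count toward its shorter sub-fragments.
            for m in range(1, n):
                for i in range(n - m + 1):
                    sub = frag[i:i + m]
                    counts[sub] = max(0, counts[sub] - freq)
    return selected


def encode(word, fragments):
    """Code a word as a concatenation of non-overlapping fragments,
    taking the longest matching fragment at each position."""
    max_len = max(len(f) for f in fragments)
    out, i = [], 0
    while i < len(word):
        for n in range(min(max_len, len(word) - i), 0, -1):
            if word[i:i + n] in fragments:
                out.append(word[i:i + n])
                i += n
                break
        else:
            raise ValueError(f"no fragment covers {word[i]!r}")
    return out


words = ["the", "then", "there", "other"]
frags = select_fragments(words, threshold=3)
print(encode("there", frags))  # ['the', 'r', 'e']
```

In this sketch, lowering `threshold` admits more multi-character fragments, which mirrors the role of the threshold frequency as the parameter controlling the number of distinct fragments. The discounting step is what pushes the surviving counts toward one another; a faithful implementation would follow the procedure given in the paper itself.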