Pub Date: 1974-07-01 DOI: 10.1016/0020-0271(74)90020-5
George W. Adamson, Jillian Boreham
An automatic classification technique has been developed, based on the character structure of words. Dice's Similarity Coefficient is computed from the number of matching digrams in pairs of character strings, and used to cluster sets of character strings. A sample of words from a chemical data base was chosen to contain certain stems derived from the names of chemical elements. They were successfully clustered into groups of semantically related words. Each cluster is characterised by the root word from which all its members are derived. A second sample of titles from Mathematical Reviews was clustered into well-defined classes, which compare favourably with the subject groupings of Mathematical Reviews.
Title: The use of an association measure based on character structure to identify semantically related pairs of words and document titles
Journal: Information Storage and Retrieval, Volume 10, Issue 7, Pages 253-260
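The digram matching described in the Adamson and Boreham abstract can be illustrated with a short sketch. It assumes that each string is reduced to its set of unique adjacent character pairs and that case is ignored; the abstract does not spell out either detail, and the function names `digrams` and `dice` are hypothetical. The clustering step is not shown.

```python
def digrams(text):
    """Return the set of unique adjacent character pairs in text (assumed convention)."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice(a, b):
    """Dice's similarity coefficient: 2 * |shared digrams| / (|digrams of a| + |digrams of b|)."""
    da, db = digrams(a), digrams(b)
    total = len(da) + len(db)
    if total == 0:
        # Strings shorter than two characters have no digrams to compare.
        return 0.0
    return 2 * len(da & db) / total


if __name__ == "__main__":
    # Two words sharing the stem "brom" share four of their six digrams each.
    print(dice("bromide", "bromine"))
    # Words with a single common digram ("ht") score much lower.
    print(dice("night", "nacht"))
```

A coefficient of 1.0 means the two digram sets are identical and 0.0 means they share nothing, so word pairs built on the same stem score high regardless of their endings.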
Pub Date: 1974-07-01 DOI: 10.1016/0020-0271(74)90031-X
Kjell Samuelson
Title: Computers and the problems of society
Journal: Information Storage and Retrieval, Volume 10, Issue 7, Pages 286-287
Pub Date: 1974-07-01 DOI: 10.1016/0020-0271(74)90034-5
Ronald W. Eaves
Title: Security, accuracy and privacy in computer systems
Journal: Information Storage and Retrieval, Volume 10, Issue 7, Pages 289-290
Pub Date: 1974-07-01 DOI: 10.1016/0020-0271(74)90023-0
L.E. Stanfel
The paper addresses the problem of finding doubly-chained tree structures for data storage which are best in the sense of minimizing maximum search time as opposed to the usual objective of minimizing average search time.
The feasibility of pursuing the latter invariably rests upon assuming a uniform distribution of inquiries, which is often not a valid assumption. As a result, some situations might be treated more appropriately by seeking solutions that minimize maximum search times. It is shown that for the case of equally costly horizontal and vertical search steps, the solution found for minimizing the average is at the same time a minimax solution. In the more general case, that is not necessarily so, but a minimax solution is easily found.
Title: A brief note on minimax optimal trees
Journal: Information Storage and Retrieval, Volume 10, Issue 7, Page 279
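The difference between the average and the maximum search time in the Stanfel abstract can be made concrete with a small sketch. The cost model here is an assumption, not taken from the paper: a tree is a nested Python list, reaching the i-th sibling in a chain costs `i` horizontal steps, descending to a child chain costs one vertical step, and every stored key is equally likely to be requested. The names `leaf_costs`, `h` and `v` are hypothetical.

```python
def leaf_costs(siblings, h=1.0, v=1.0, base=0.0):
    """Return the search cost of every key in a doubly-chained tree.

    siblings: list whose items are either keys (leaves) or nested lists (subtrees).
    h, v: assumed cost of one horizontal and one vertical search step.
    base: cost already paid to reach the first node of this sibling chain.
    """
    costs = []
    for position, node in enumerate(siblings):
        here = base + position * h          # walk along the sibling chain
        if isinstance(node, list):
            costs.extend(leaf_costs(node, h, v, here + v))  # follow the child pointer
        else:
            costs.append(here)
    return costs


def average_and_max(tree, h=1.0, v=1.0):
    """Average (uniform inquiries assumed) and worst-case search cost."""
    costs = leaf_costs(tree, h, v)
    return sum(costs) / len(costs), max(costs)


if __name__ == "__main__":
    flat = ["a", "b", "c", "d"]            # one long sibling chain
    two_level = [["a", "b"], ["c", "d"]]   # two chains of two keys each
    print(average_and_max(flat))
    print(average_and_max(two_level))
    # Cheaper vertical steps change which shape has the smaller maximum.
    print(average_and_max(two_level, h=1.0, v=0.5))
```

Evaluating candidate shapes this way only compares given trees; it does not construct the optimal tree that the paper addresses.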
Pub Date: 1974-07-01 DOI: 10.1016/0020-0271(74)90026-6
Title: The skyline of information processing
Journal: Information Storage and Retrieval, Volume 10, Issue 7, Pages 284-285
Pub Date: 1974-07-01 DOI: 10.1016/0020-0271(74)90021-7
Krystyna Laus, Miroslaw Dabrowski
The paper presents a model of an information retrieval system for the case in which a hierarchical relation is defined on the set of descriptors. It builds on the model of an information retrieval system presented in [2] and [3].
Title: A model of information retrieval process for hierarchical set of descriptors
Journal: Information Storage and Retrieval, Volume 10, Issue 7, Pages 261-265
Pub Date: 1974-07-01 DOI: 10.1016/0020-0271(74)90022-9
Julie Bichteler, Ronald G. Parsons
The applicability of an automatic classification technique to information retrieval was investigated. A modified version of the Schiminovich algorithm was used to classify articles in the data base, utilizing citations found in their bibliographies and a “triggering file” of bibliographically related papers. Classes of articles were formed by comparing the citations in the articles with members of the triggering file. Those papers with similar citing patterns formed a group; citations occurring sufficiently often within the papers of a group formed a bibliography. Bibliographies in turn became new triggering files in an iterative procedure.
Results of this nontraditional method were compared with those of retrieval by standard American Institute of Physics (AIP) subject analysis of the same material. A combination of the two methods was also investigated to test the hypothesis that one could be used to augment or refine the other.
Nine cooperating physicists defined queries in terms of (1) the AIP classification scheme and (2) a list of articles likely to be cited in current relevant literature. The data base was 18 months of 1971–1972 physics journal articles (AIP SPIN tapes). The chief means of evaluating the results of the three retrieval approaches was a comparison of recall and precision values obtained from relevance judgments by the physicists.
Average precision for the AIP subject analysis was 17 per cent and for the citation processing 62 per cent. Based on an assumption of 100 per cent recall for the AIP analysis, average recall for the citation processing was 45 per cent. All but one physicist would prefer to have both the citation and subject approaches available for information retrieval. Sometimes, only a few relevant articles are desired; at other times comprehensiveness is necessary. However, if only one method were available, all except one participant would choose the subject approach. They felt that one must be able to retrieve all the relevant articles, even if that would mean examining 25–30 irrelevant notices for every relevant one. Several expressed the notion that looking at a file with a very high percentage of relevant articles made one wonder what was missing.
Title: Document retrieval by means of an automatic classification algorithm for citations
Journal: Information Storage and Retrieval, Volume 10, Issue 7, Pages 267-278
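The precision and recall figures in the Bichteler and Parsons abstract can be related to the "irrelevant notices per relevant one" phrasing with a little arithmetic. The sketch below uses the standard definitions of precision and recall; the function names are hypothetical, and the percentages fed in are the ones quoted in the abstract, not new data.

```python
def precision(relevant_retrieved, retrieved):
    """Fraction of retrieved items that are relevant."""
    return relevant_retrieved / retrieved if retrieved else 0.0


def recall(relevant_retrieved, relevant_total):
    """Fraction of all relevant items that were retrieved."""
    return relevant_retrieved / relevant_total if relevant_total else 0.0


def irrelevant_per_relevant(p):
    """Number of irrelevant items examined for each relevant one at precision p (0 < p <= 1)."""
    return (1.0 - p) / p


def precision_from_ratio(irrelevant_for_each_relevant):
    """Inverse of irrelevant_per_relevant: the precision implied by a given ratio."""
    return 1.0 / (1.0 + irrelevant_for_each_relevant)


if __name__ == "__main__":
    # Reported average precision values: subject analysis and citation processing.
    for label, p in [("AIP subject analysis", 0.17), ("citation processing", 0.62)]:
        print(label, round(irrelevant_per_relevant(p), 2))
    # The 25-30 irrelevant notices the physicists said they would tolerate.
    for ratio in (25, 30):
        print(ratio, round(precision_from_ratio(ratio), 4))
```

At 17 per cent precision a reader examines about five irrelevant notices per relevant one, whereas tolerating 25–30 irrelevant notices per relevant one corresponds to a precision of only about 3 to 4 per cent.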
Pub Date: 1974-07-01 DOI: 10.1016/0020-0271(74)90018-7
Aaron Tenenbein, Jay-Louise Weldon
One property of search schemes used frequently as an indication of search efficiency is the expected number of passes (or read operations) required to locate a record in computer memory (or secondary storage). However, other properties related to the variation in the number of passes required may be equally important in determining the most efficient search technique for a given file (or set of records). These properties can be obtained from the distribution of the number of passes.
In this paper, the binary search scheme is discussed within this framework. The probability distribution of the number of passes required is derived, along with its mean, standard deviation and percentiles. The mean and standard deviation of the random sequential search scheme are also presented. The application of these results to common search problems in record access and retrieval is discussed, and possible extensions involving a combination of binary and random search techniques are indicated.
Title: Probability distributions and search schemes
Journal: Information Storage and Retrieval, Volume 10, Issue 7, Pages 237-242
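The distribution of the number of passes discussed in the Tenenbein and Weldon abstract can be tabulated by brute force for a small file. This sketch is not the paper's derivation: it assumes successful searches only, every record equally likely to be requested, and a textbook midpoint binary search. The sequential formulas treat the target's position as uniform on 1..n, which is one reading of "random sequential search". All function names are hypothetical.

```python
import math
from collections import Counter


def binary_search_passes(n, target):
    """Count the probes a midpoint binary search needs to find index target in 0..n-1."""
    lo, hi, passes = 0, n - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        passes += 1
        if mid == target:
            return passes
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    raise ValueError("target is not in the file")


def pass_distribution(n):
    """Probability of each pass count when all n records are equally likely."""
    counts = Counter(binary_search_passes(n, t) for t in range(n))
    return {k: counts[k] / n for k in sorted(counts)}


def mean_and_std(pmf):
    mean = sum(k * p for k, p in pmf.items())
    variance = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, math.sqrt(variance)


def percentile(pmf, q):
    """Smallest pass count whose cumulative probability reaches q."""
    cumulative = 0.0
    for k in sorted(pmf):
        cumulative += pmf[k]
        if cumulative >= q - 1e-12:
            return k
    return max(pmf)


def sequential_mean_and_std(n):
    """Mean and standard deviation of a uniform position on 1..n."""
    return (n + 1) / 2, math.sqrt((n * n - 1) / 12)


if __name__ == "__main__":
    pmf = pass_distribution(1000)
    print(mean_and_std(pmf), percentile(pmf, 0.95))
    print(sequential_mean_and_std(1000))
```

Comparing the two standard deviations shows why the mean alone can mislead: binary search has both a smaller mean and a much tighter spread of pass counts than a sequential scan of the same file.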