Document retrieval by means of an automatic classification algorithm for citations
Julie Bichteler, Ronald G. Parsons
Information Storage and Retrieval, 10(7), pages 267-278, published 1974-07-01
DOI: 10.1016/0020-0271(74)90022-9
https://www.sciencedirect.com/science/article/pii/0020027174900229
Citations: 16
Abstract
The applicability of an automatic classification technique to information retrieval was investigated. A modified version of the Schiminovich algorithm was used to classify articles in the data base, utilizing citations found in their bibliographies and a “triggering file” of bibliographically related papers. Classes of articles were formed by comparing the citations in the articles with members of the triggering file. Those papers with similar citing patterns formed a group; citations occurring sufficiently often within the papers of a group formed a bibliography. Bibliographies in turn became new triggering files in an iterative procedure.
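The iterative procedure described above can be illustrated with a minimal Python sketch. It is not the modified Schiminovich algorithm itself: the abstract does not state how "similar citing patterns" or "sufficiently often" were measured, so the function name `classify_by_citations`, the thresholds `min_overlap` and `min_share`, the stopping rule, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def classify_by_citations(articles, trigger, min_overlap=2, min_share=0.5, max_rounds=10):
    """Illustrative sketch only; thresholds and stopping rule are assumptions.

    articles: dict mapping an article id to the set of references it cites.
    trigger:  initial triggering file (a set of reference ids).
    Returns (group, bibliography).
    """
    trigger = set(trigger)
    group = set()
    for _ in range(max_rounds):
        # Group: articles whose citations overlap the triggering file enough.
        group = {a for a, cites in articles.items()
                 if len(cites & trigger) >= min_overlap}
        if not group:
            return group, set()
        # Bibliography: citations occurring sufficiently often within the group.
        counts = Counter(c for a in group for c in articles[a])
        bibliography = {c for c, n in counts.items()
                        if n / len(group) >= min_share}
        if bibliography == trigger:
            break  # the triggering file no longer changes
        # The bibliography becomes the new triggering file.
        trigger = bibliography
    return group, trigger


# Hypothetical data, not taken from the study.
articles = {
    "A1": {"r1", "r2", "r3"},
    "A2": {"r1", "r2", "r4"},
    "A3": {"r2", "r3", "r4"},
    "A4": {"r8", "r9"},
}
group, bibliography = classify_by_citations(articles, {"r1", "r2"})
```

In this toy run the first pass groups A1 and A2, whose bibliography adds r3 and r4 to the triggering file; the second pass then pulls in A3, while A4 never joins because it shares no citations with the file.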
Results of this nontraditional method were compared with those of retrieval by standard American Institute of Physics (AIP) subject analysis of the same material. A combination of the two methods was also investigated to test the hypothesis that one could be used to augment or refine the other.
Nine cooperating physicists defined queries in terms of (1) the AIP classification scheme and (2) a list of articles likely to be cited in current relevant literature. The data base was 18 months of 1971–1972 physics journal articles (AIP SPIN tapes). The chief means of evaluating the results of the three retrieval approaches was a comparison of recall and precision values obtained from relevance judgments by the physicists.
Average precision for the AIP subject analysis was 17 per cent and for the citation processing 62 per cent. Based on an assumption of 100 per cent recall for the AIP analysis, average recall for the citation processing was 45 per cent. All but one physicist would prefer to have both the citation and subject approaches available for information retrieval. Sometimes, only a few relevant articles are desired; at other times comprehensiveness is necessary. However, if only one method were available, all except one participant would choose the subject approach. They felt that one must be able to retrieve all the relevant articles, even if that would mean examining 25–30 irrelevant notices for every relevant one. Several expressed the notion that looking at a file with a very high percentage of relevant articles made one wonder what was missing.
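The evaluation measures can be made concrete with a short Python sketch. Precision is the share of retrieved articles judged relevant; recall for the citation method is computed here relative to the relevant articles found by the subject analysis, mirroring the stated assumption of 100 per cent recall for that analysis. The function names and the document ids are hypothetical and do not come from the study.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved articles that were judged relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def relative_recall(retrieved, baseline_retrieved, relevant):
    """Recall measured against the relevant articles in a baseline result set.

    Assumption: the baseline (here, subject analysis) is treated as having
    retrieved every relevant article, i.e. 100 per cent recall.
    """
    pool = set(baseline_retrieved) & set(relevant)
    return len(set(retrieved) & pool) / len(pool) if pool else 0.0


# Hypothetical relevance judgments for one query.
subject_hits = {f"d{i}" for i in range(1, 11)}   # d1 .. d10
citation_hits = {"d1", "d2", "d11"}
judged_relevant = {"d1", "d2", "d3", "d4"}

subject_precision = precision(subject_hits, judged_relevant)      # 4 of 10
citation_precision = precision(citation_hits, judged_relevant)    # 2 of 3
citation_recall = relative_recall(citation_hits, subject_hits, judged_relevant)  # 2 of 4
```

The toy numbers show the same trade-off the physicists described: the citation result is a smaller, cleaner set, but it misses half of the relevant articles that the subject search returns.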