Document retrieval by means of an automatic classification algorithm for citations

Julie Bichteler, Ronald G. Parsons
Information Storage and Retrieval, Volume 10, Issue 7, Pages 267–278, July 1974. DOI: 10.1016/0020-0271(74)90022-9
Citations: 16

Abstract

The applicability of an automatic classification technique to information retrieval was investigated. A modified version of the Schiminovich algorithm was used to classify articles in the data base, utilizing citations found in their bibliographies and a “triggering file” of bibliographically related papers. Classes of articles were formed by comparing the citations in the articles with members of the triggering file. Those papers with similar citing patterns formed a group; citations occurring sufficiently often within the papers of a group formed a bibliography. Bibliographies in turn became new triggering files in an iterative procedure.
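The iterative procedure described above can be sketched as a small fixed-point loop. This is a minimal illustration under stated assumptions, not the authors' modified Schiminovich algorithm: the abstract does not give the similarity criterion or cut-offs, so the function name `citation_classify`, the thresholds `min_overlap` (how many triggering-file items an article must cite to join the group), `min_share` (the fraction of group members that must cite a reference for it to enter the bibliography), `max_iter`, and the convergence test are all hypothetical.

```python
from collections import Counter


def citation_classify(articles, triggering_file, min_overlap=2, min_share=0.5, max_iter=10):
    """Iteratively form a group of articles and its bibliography.

    articles: dict mapping a (hypothetical) article id to the set of references it cites.
    triggering_file: initial set of bibliographically related references.
    min_overlap, min_share, max_iter: hypothetical thresholds, not taken from the paper.
    """
    trigger = set(triggering_file)
    group, bibliography = set(), set()
    for _ in range(max_iter):
        # Group: articles sharing at least `min_overlap` citations with the triggering file.
        new_group = {aid for aid, refs in articles.items() if len(refs & trigger) >= min_overlap}
        if not new_group:
            break
        # Count, for each reference, how many group members cite it.
        counts = Counter(ref for aid in new_group for ref in articles[aid])
        # Bibliography: references cited by at least `min_share` of the group.
        new_bibliography = {ref for ref, n in counts.items() if n >= min_share * len(new_group)}
        group, bibliography = new_group, new_bibliography
        # Assumed stopping rule: the bibliography reproduces the triggering file.
        if new_bibliography == trigger:
            break
        # Otherwise the bibliography becomes the next triggering file.
        trigger = new_bibliography
    return group, bibliography


# Hypothetical toy data base: article id -> cited references.
articles = {
    "A": {"r1", "r2", "r3"},
    "B": {"r1", "r2", "r4"},
    "C": {"r2", "r3", "r4"},
    "D": {"r7", "r8"},
}
group, bib = citation_classify(articles, {"r1", "r2"})
print(sorted(group), sorted(bib))  # ['A', 'B', 'C'] ['r1', 'r2', 'r3', 'r4']
```

In the toy run, the first pass groups only A and B; their pooled citations form a larger bibliography, which on the second pass also pulls in C, while D (with unrelated citations) never joins. That growth from a small seed is the point of feeding each bibliography back in as a new triggering file.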

Results of this nontraditional method were compared with those of retrieval by standard American Institute of Physics (AIP) subject analysis of the same material. A combination of the two methods was also investigated to test the hypothesis that one could be used to augment or refine the other.
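The abstract does not say how the two methods were combined. One natural reading, offered here only as an assumption, is that "augment" means taking the union of the two result sets and "refine" means taking their intersection. The helper `combine` and its `mode` values are hypothetical.

```python
def combine(subject_hits: set, citation_hits: set, mode: str) -> set:
    """Combine two retrieval result sets (hypothetical helper, not from the paper)."""
    if mode == "augment":
        # Augment: citation processing adds articles the subject search missed.
        return subject_hits | citation_hits
    if mode == "refine":
        # Refine: keep only subject hits that citation processing also retrieved.
        return subject_hits & citation_hits
    raise ValueError(f"unknown mode: {mode!r}")


subject_hits = {"a", "b", "c", "d"}
citation_hits = {"b", "d", "e"}
print(sorted(combine(subject_hits, citation_hits, "augment")))  # ['a', 'b', 'c', 'd', 'e']
print(sorted(combine(subject_hits, citation_hits, "refine")))   # ['b', 'd']
```

Under this reading, augmenting can only raise recall at some cost in precision, while refining can only raise precision at some cost in recall.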

Nine cooperating physicists defined queries in terms of (1) the AIP classification scheme and (2) a list of articles likely to be cited in current relevant literature. The data base was 18 months of 1971–1972 physics journal articles (AIP SPIN tapes). The chief means of evaluating the results of the three retrieval approaches was a comparison of recall and precision values obtained from relevance judgments by the physicists.

Average precision for the AIP subject analysis was 17 per cent and for the citation processing 62 per cent. Based on an assumption of 100 per cent recall for the AIP analysis, average recall for the citation processing was 45 per cent. All but one physicist would prefer to have both the citation and subject approaches available for information retrieval. Sometimes, only a few relevant articles are desired; at other times comprehensiveness is necessary. However, if only one method were available, all except one participant would choose the subject approach. They felt that one must be able to retrieve all the relevant articles, even if that would mean examining 25–30 irrelevant notices for every relevant one. Several expressed the notion that looking at a file with a very high percentage of relevant articles made one wonder what was missing.
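The precision and recall figures above can be illustrated with the standard set-based definitions. The abstract does not give the exact formulas, so this is a sketch: it assumes that "100 per cent recall for the AIP analysis" means the relevant articles retrieved by the AIP search are treated as the complete set of relevant articles, against which the citation method's recall is measured. The function names and all data below are hypothetical and are not the study's results.

```python
def precision(retrieved: set, relevant: set) -> float:
    """Fraction of retrieved articles judged relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def relative_recall(retrieved: set, reference_relevant: set) -> float:
    """Recall against a reference set assumed to contain every relevant article."""
    if not reference_relevant:
        return 0.0
    return len(retrieved & reference_relevant) / len(reference_relevant)


# Illustrative data only (not from the study).
judged_relevant = {"p0", "p1", "p2", "p3"}
aip_retrieved = {f"p{i}" for i in range(20)}  # 20 notices, 4 of them relevant
citation_retrieved = {"p0", "p1", "p2", "x1", "x2"}

# Assumption: AIP's relevant hits are treated as all the relevant articles.
reference = aip_retrieved & judged_relevant

print(precision(aip_retrieved, judged_relevant))       # 0.2
print(precision(citation_retrieved, judged_relevant))  # 0.6
print(relative_recall(citation_retrieved, reference))  # 0.75
```

The toy numbers mirror the qualitative pattern reported: the subject search returns many notices with a low share of relevant ones, while citation processing is far more precise but finds only part of the reference set.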
