Improve precategorized collection retrieval by using supervised term weighting schemes

Proceedings. International Conference on Information Technology: Coding and Computing
Pub date: 2002-04-08
DOI: 10.1109/ITCC.2002.1000353

Ying Zhao, G. Karypis


Citations: 10

Abstract


The emergence of the World Wide Web has led to an increased interest in methods for searching for information. A key characteristic of many online document collections is that the documents have pre-defined category information, such as the variety of scientific articles accessible via digital libraries (e.g. ACM, IEEE, etc.), medical articles, newswires and various directories (e.g. Yahoo, OpenDirectory Project, etc.). However, most previous information retrieval systems have not taken the pre-existing category information into account. In this paper, we present weight adjustment schemes based upon the category information in the vector-space model, which are able to select the most content-specific and discriminating features. Our experimental results on TREC data sets show that the pre-existing category information does provide additional beneficial information to improve retrieval. The proposed weight adjustment schemes perform better than the vector-space model with the inverse document frequency (IDF) weighting scheme when queries are less specific. The proposed weighting schemes can also benefit retrieval when clusters are used as an approximation to categories.
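The abstract does not spell out the weighting formulas, so what follows is only a minimal sketch of the setting it describes, not the authors' method. It implements the baseline the paper compares against (tf·idf term weights with cosine ranking in the vector-space model) and then multiplies each term's IDF by a hypothetical supervised factor: the share of the term's occurrences that fall in its most frequent category. The function names, that factor, and the toy corpus are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def idf_weights(docs):
    """Baseline IDF weight log(N / df(t)) for every term in the collection."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    return {t: math.log(n / df[t]) for t in df}


def category_factor(docs, labels):
    """Hypothetical supervised factor (an assumption, NOT the paper's scheme):
    the fraction of a term's occurrences that fall in its most frequent category.
    1.0 means the term is confined to one category; smaller means it is spread out."""
    totals = Counter()
    by_cat = defaultdict(Counter)
    for doc, cat in zip(docs, labels):
        for term, tf in Counter(doc).items():
            totals[term] += tf
            by_cat[term][cat] += tf
    return {t: max(by_cat[t].values()) / totals[t] for t in totals}


def supervised_weights(idf, factor):
    """Scale each term's IDF by its (hypothetical) category factor."""
    return {t: idf[t] * factor.get(t, 1.0) for t in idf}


def vectorize(tokens, weights):
    """Sparse tf * weight vector; terms unseen in the collection get weight 0."""
    return {t: tf * weights.get(t, 0.0) for t, tf in Counter(tokens).items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs, weights):
    """Return document indices sorted by descending cosine similarity to the query."""
    q = vectorize(query, weights)
    scores = [cosine(q, vectorize(d, weights)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


if __name__ == "__main__":
    # Toy, made-up corpus with two categories.
    docs = [
        ["gene", "protein", "cell"],
        ["gene", "sequence", "cell"],
        ["market", "stock", "price"],
        ["market", "cell", "phone"],
    ]
    labels = ["bio", "bio", "finance", "finance"]
    idf = idf_weights(docs)
    sup = supervised_weights(idf, category_factor(docs, labels))
    query = ["gene", "cell"]
    print("IDF ranking:       ", rank(query, docs, idf))
    print("Supervised ranking:", rank(query, docs, sup))
```

On this toy corpus the term `cell` appears in both categories, so its supervised weight falls below its plain IDF, and the finance document `docs[3]` scores lower against the query `["gene", "cell"]` than it does under IDF alone. Cluster IDs produced by any clustering algorithm could presumably be passed as `labels` in place of true categories, which corresponds to the cluster-approximation setting the abstract mentions.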


Source journal

Proceedings. International Conference on Information Technology: Coding and Computing

Self-citation rate

0.00%


Latest articles in this journal

Parallel execution of relational algebra operator under distributed database systems
Enhancing watermark robustness through mixture of watermarked digital objects
Improving precision and recall for Soundex retrieval
Performance driven circuit clustering and partitioning
Experimental results towards content-based sub-image retrieval