Pseudo-Relevance Feedback Driven for XML Query Expansion

J. Convergence Inf. Technol. Pub Date : 2010-11-30 DOI:10.4156/JCIT.VOL5.ISSUE9.15

Minjuan Zhong, Changxuan Wan


Citations: 4

Abstract

Pseudo-relevance feedback is widely regarded as an effective approach to automatic query expansion. However, a recent study has shown that traditional pseudo-relevance feedback can introduce topic drift and thereby harm retrieval performance. It is therefore crucial to identify good feedback documents from which useful expansion terms can be drawn. Unlike traditional query expansion, XML query expansion requires structural expansion as well as content expansion. This paper presents a solution for both identifying related documents and selecting good expansion information with new content and path constraints. First, a naive document similarity measure that incorporates XML semantic features is proposed, and on this basis a k-median clustering algorithm is applied to find related documents. Second, query expansion is performed only within the set of related documents, in two steps: a key phrase extraction algorithm expands the original query, and structural expansion is then carried out based on the expanded key phrases. Finally, a full-fledged content-and-structure query expression that represents the user's intention is formulated. Experimental results on the IEEE CS collection show that the proposed method effectively reduces topic drift and achieves better retrieval quality.
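
The content side of the pipeline described in the abstract (cluster the feedback documents, then expand the query only from the related cluster) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' method: Jaccard overlap of term sets stands in for the paper's XML-aware similarity measure, a simple medoid-style loop stands in for the k-median step, raw term frequency stands in for key phrase extraction, and structural (path) expansion is omitted. The function names `jaccard`, `k_median_clusters`, and `expansion_terms` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity (assumption); the paper's measure is not given here."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def k_median_clusters(docs: List[Set[str]], seeds: List[int],
                      max_iter: int = 20) -> Dict[int, List[int]]:
    """Group documents around representative documents (medians).

    Each document joins its most similar median; each median is then replaced
    by the cluster member with the highest total similarity to the members.
    Stops when the medians no longer change. Seeds are assumed to be distinct.
    """
    medians = list(seeds)
    clusters: Dict[int, List[int]] = {}
    for _ in range(max_iter):
        clusters = {m: [] for m in medians}
        for i, doc in enumerate(docs):
            best = max(medians, key=lambda m: jaccard(doc, docs[m]))
            clusters[best].append(i)
        new_medians = []
        for m, members in clusters.items():
            if not members:  # keep the old median if its cluster emptied
                new_medians.append(m)
                continue
            new_medians.append(max(
                members,
                key=lambda c: sum(jaccard(docs[c], docs[j]) for j in members),
            ))
        if new_medians == medians:
            break
        medians = new_medians
    return clusters


def expansion_terms(query: Set[str], docs: List[Set[str]],
                    clusters: Dict[int, List[int]], n_terms: int = 2) -> List[str]:
    """Pick the cluster closest to the query and return its most frequent new terms."""
    best = max(clusters, key=lambda m: jaccard(query, docs[m]))
    counts = Counter(t for i in clusters[best] for t in docs[i] if t not in query)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_terms]]


# Toy feedback set: three documents about XML retrieval, two off-topic ones.
docs = [
    {"xml", "query", "expansion", "feedback"},
    {"xml", "query", "feedback", "structure"},
    {"xml", "query", "expansion", "path"},
    {"image", "compression", "wavelet"},
    {"image", "wavelet", "coding"},
]
clusters = k_median_clusters(docs, seeds=[0, 3])
print(expansion_terms({"xml", "query"}, docs, clusters))
```

Restricting term selection to the cluster nearest the query is what keeps the off-topic documents from contributing expansion terms, which is the topic-drift concern the abstract raises.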



