Pseudo-Relevance Feedback Driven for XML Query Expansion

Minjuan Zhong, Changxuan Wan
J. Convergence Inf. Technol. (Journal Article), published 2010-11-30
DOI: 10.4156/JCIT.VOL5.ISSUE9.15 (https://doi.org/10.4156/JCIT.VOL5.ISSUE9.15)
Citations: 4

Abstract

Pseudo-relevance feedback is widely regarded as an effective technique for automatic query expansion. However, a recent study has shown that traditional pseudo-relevance feedback can introduce topic drift and thereby harm retrieval performance. It is therefore crucial to identify the good feedback documents from which useful expansion terms can be drawn. Unlike traditional query expansion, XML query expansion requires structural expansion as well as content expansion. This paper presents a solution for both identifying related documents and selecting good expansion information, yielding new content and path constraints. A naive document similarity measure that incorporates XML semantic features is proposed. Based on this measure, a k-median clustering algorithm is first applied to find a set of related documents. Query expansion is then performed, in two steps, only within that set: the first step runs a key phrase extraction algorithm to expand the original query, and the second step performs structural expansion based on the expanded key phrases. Finally, a full-fledged content-and-structure query expression that represents the user's intention is formalized. Experimental results on the IEEE CS collection show that the proposed method effectively reduces topic drift and achieves better retrieval quality.
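
The content-expansion idea in the abstract (cluster the feedback documents, then expand only from the related cluster) can be illustrated with a minimal Python sketch. It is not the paper's method: plain cosine similarity over term counts stands in for the XML-aware similarity measure, a small k-medoids loop stands in for the k-median clustering, raw term frequency stands in for key phrase extraction, and the structural (path) expansion step is omitted. The function names `cosine`, `k_medoids`, and `expand_query` and the sample documents are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def k_medoids(vectors, k, iterations=10):
    """Tiny k-medoids loop (assumed stand-in for k-median); needs k <= len(vectors)."""
    medoids = [0]
    while len(medoids) < k:
        # Farthest-first seeding: add the document least similar to the current medoids.
        rest = [i for i in range(len(vectors)) if i not in medoids]
        medoids.append(min(rest, key=lambda i: max(cosine(vectors[i], vectors[m])
                                                   for m in medoids)))
    clusters = {}
    for _ in range(iterations):
        # Assign every document to its most similar medoid.
        clusters = {m: [] for m in medoids}
        for i, vec in enumerate(vectors):
            clusters[max(medoids, key=lambda m: cosine(vec, vectors[m]))].append(i)
        # Re-pick each medoid as the member most similar to its own cluster.
        new_medoids = [
            max(members, key=lambda c: sum(cosine(vectors[c], vectors[j]) for j in members))
            for members in clusters.values() if members
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters  # maps medoid index -> member document indices


def expand_query(query, feedback_docs, k=2, n_terms=3):
    """Append the most frequent new terms from the cluster closest to the query."""
    vectors = [Counter(doc) for doc in feedback_docs]
    q_vec = Counter(query)
    clusters = k_medoids(vectors, k)
    # Treat the cluster whose medoid best matches the query as the "related" set.
    best = max(clusters, key=lambda m: cosine(q_vec, vectors[m]))
    pooled = Counter()
    for i in clusters[best]:
        pooled.update(vectors[i])
    new_terms = [t for t, _ in pooled.most_common() if t not in q_vec]
    return list(query) + new_terms[:n_terms]


# Hypothetical top-ranked feedback documents, already tokenized.
docs = [
    ["xml", "query", "expansion", "structure", "path"],
    ["xml", "retrieval", "structure", "path", "element"],
    ["football", "league", "match", "goal"],
    ["xml", "query", "structure", "element", "path"],
    ["football", "match", "score", "goal"],
]
query = ["xml", "query"]
expanded = expand_query(query, docs)
print(expanded)
```

In this toy run the two off-topic documents fall into their own cluster, so none of their terms reach the expanded query, which is the topic-drift effect the abstract aims to avoid.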