Leveraging conceptual lexicon: query disambiguation using proximity information for patent retrieval

Parvaz Mahdabi, Shima Gerani, Xiangji Huang, F. Crestani
{"title":"Leveraging conceptual lexicon: query disambiguation using proximity information for patent retrieval","authors":"Parvaz Mahdabi, Shima Gerani, Xiangji Huang, F. Crestani","doi":"10.1145/2484028.2484056","DOIUrl":null,"url":null,"abstract":"Patent prior art search is a task in patent retrieval where the goal is to rank documents which describe prior art work related to a patent application. One of the main properties of patent retrieval is that the query topic is a full patent application and does not represent a focused information need. This query by document nature of patent retrieval introduces new challenges and requires new investigations specific to this problem. Researchers have addressed this problem by considering different information resources for query reduction and query disambiguation. However, previous work has not fully studied the effect of using proximity information and exploiting domain specific resources for performing query disambiguation. In this paper, we first reduce the query document by taking the first claim of the document itself. We then build a query-specific patent lexicon based on definitions of the International Patent Classification (IPC). We study how to expand queries by selecting expansion terms from the lexicon that are focused on the query topic. The key problem is how to capture whether an expansion term is focused on the query topic or not. We address this problem by exploiting proximity information. We assign high weights to expansion terms appearing closer to query terms based on the intuition that terms closer to query terms are more likely to be related to the query topic. Experimental results on two patent retrieval datasets show that the proposed method is effective and robust for query expansion, significantly outperforming the standard pseudo relevance feedback (PRF) and existing baselines in patent retrieval.","PeriodicalId":178818,"journal":{"name":"Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval","volume":"55 5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2484028.2484056","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 30

Abstract

Patent prior art search is a task in patent retrieval where the goal is to rank documents which describe prior art work related to a patent application. One of the main properties of patent retrieval is that the query topic is a full patent application and does not represent a focused information need. This query by document nature of patent retrieval introduces new challenges and requires new investigations specific to this problem. Researchers have addressed this problem by considering different information resources for query reduction and query disambiguation. However, previous work has not fully studied the effect of using proximity information and exploiting domain specific resources for performing query disambiguation. In this paper, we first reduce the query document by taking the first claim of the document itself. We then build a query-specific patent lexicon based on definitions of the International Patent Classification (IPC). We study how to expand queries by selecting expansion terms from the lexicon that are focused on the query topic. The key problem is how to capture whether an expansion term is focused on the query topic or not. We address this problem by exploiting proximity information. We assign high weights to expansion terms appearing closer to query terms based on the intuition that terms closer to query terms are more likely to be related to the query topic. Experimental results on two patent retrieval datasets show that the proposed method is effective and robust for query expansion, significantly outperforming the standard pseudo relevance feedback (PRF) and existing baselines in patent retrieval.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用概念词典:使用接近信息进行专利检索的查询消歧
专利现有技术检索是专利检索中的一项任务,其目标是对描述与专利申请相关的现有技术工作的文档进行排序。专利检索的主要特性之一是查询主题是完整的专利申请,而不代表集中的信息需求。这种按文档性质查询专利检索带来了新的挑战,需要针对这个问题进行新的调查。研究人员通过考虑不同的信息资源进行查询约简和查询消歧来解决这个问题。然而,以前的工作并没有充分研究使用接近信息和利用特定领域资源来执行查询消歧的效果。在本文中,我们首先通过对文档本身的第一权利要求来减少查询文档。然后,我们基于国际专利分类(IPC)的定义构建一个特定于查询的专利词典。我们研究如何通过从词典中选择与查询主题相关的扩展术语来扩展查询。关键问题是如何捕获展开项是否关注于查询主题。我们通过利用邻近信息来解决这个问题。我们给看起来更接近查询词的扩展词分配了高权重,这是基于这样一种直觉:更接近查询词的词更有可能与查询主题相关。在两个专利检索数据集上的实验结果表明,该方法对查询扩展具有良好的鲁棒性,显著优于标准的伪相关反馈(PRF)和现有的专利检索基线。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Search engine switching detection based on user personal preferences and behavior patterns Workshop on benchmarking adaptive retrieval and recommender systems: BARS 2013 A test collection for entity search in DBpedia Sentiment analysis of user comments for one-class collaborative filtering over ted talks A document rating system for preference judgements
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1