{"title":"Diversified relevance feedback","authors":"Matt Crane","doi":"10.1145/2484028.2484227","DOIUrl":null,"url":null,"abstract":"The need for a search engine to deal with ambiguous queries has been known for a long time (diversification). However, it is only recently that this need has become a focus within information retrieval research. How to respond to indications that a result is relevant to a query (relevance feedback) has also been a long focus of research. When thinking about the results for a query as being clustered by topic, these two areas of information retrieval research appear to be opposed to each other. Interestingly though, they both appear to improve the performance of search engines, raising the question: they can be combined or made to work with each other? When presented with an ambiguous query there are a number of techniques that can be employed to better select results. The primary technique being researched now is diversification, which aims to populate the results with a set of documents that cover different possible interpretations for the query, while maintaining a degree of relevance, as determined by the search engine. For example, given a query of \"java\" it is unclear whether the user, without any other information, means the programming language, the coffee, the island of Indonesia or a multitude of other meanings. In order to do this the assumption that documents are independent of each other when assessing potential relevance has to be broken. That is, a documents relevance, as calculated by the search engine, is no longer dependent only on the query, but also the other documents that have been selected. How a document is identified as being similar to previously selected documents, and the trade off between estimated relevance and topic coverage are current areas for information retrieval research. For unambiguous queries, or for search engines that do not perform diversification, it is possible to improve the results selected by reacting to information identifying a given result as truly relevant or not. This mechanism is known as relevance feedback. The most common response to relevance feedback is to investigate the documents for their most content-bearing terms, and either add, or subtract, their influence to a newly formed query which is then re-run on the remaining documents to re-order them. There has been a scant amount of research into the combination of these methods. However, Carbonell et al. [1] show that an initially diverse result set can provide a better approach for identifying the topic a user is interested in for a relevance feedback style approach. This approach was further extended by Raman et al. [4]. An important aspect of relevance feedback is the selection of documents to use. In the 2008 TREC relevance feedback track, Meij et al. [3] generated a diversified result set which outperformed other rankings as a source of feedback documents. The use of pseudo-relevance feedback (assuming the top ranked documents are relevant) to extract sub-topics for use in diversification was explored by Santos et al. [5]. These previous approaches suggest that these two ideas are more linked than expected. The ATIRE search engine [6] will be used to further explore the relationship between diversification and relevance feedback. ATIRE was selected because it is developed locally, and is designed to be small and fast. 
ATIRE also produces a competitive baseline, which would have placed 6th in the 2011 TREC diversity task while performing no diversification and index-time spam filtering [2], although we concede this is not equivalent to submitting a run.","PeriodicalId":178818,"journal":{"name":"Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2484028.2484227","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

The need for a search engine to deal with ambiguous queries (diversification) has been recognised for a long time, but it is only recently that this need has become a focus within information retrieval research. How to respond to indications that a result is relevant to a query (relevance feedback) has also long been a focus of research. When the results for a query are thought of as being clustered by topic, these two areas of information retrieval research appear to be opposed to each other. Interestingly, though, both appear to improve the performance of search engines, raising the question: can they be combined or made to work with each other?

When presented with an ambiguous query, a number of techniques can be employed to better select results. The primary technique being researched now is diversification, which aims to populate the results with a set of documents that cover different possible interpretations of the query while maintaining a degree of relevance, as determined by the search engine. For example, given the query "java", it is unclear without any other information whether the user means the programming language, the coffee, the Indonesian island, or a multitude of other things. To diversify, the assumption that documents are independent of each other when assessing potential relevance has to be broken. That is, a document's relevance, as calculated by the search engine, no longer depends only on the query, but also on the other documents that have already been selected. How a document is identified as being similar to previously selected documents, and the trade-off between estimated relevance and topic coverage, are current areas of information retrieval research.

For unambiguous queries, or for search engines that do not perform diversification, it is possible to improve the results selected by reacting to information identifying a given result as truly relevant or not. This mechanism is known as relevance feedback. The most common response to relevance feedback is to examine the judged documents for their most content-bearing terms and either add or subtract their influence in a newly formed query, which is then re-run over the remaining documents to re-order them.

There has been little research into combining these methods. However, Carbonell et al. [1] show that an initially diverse result set can better identify the topic a user is interested in for a relevance-feedback-style approach. This approach was further extended by Raman et al. [4]. An important aspect of relevance feedback is the selection of documents to use: in the 2008 TREC relevance feedback track, Meij et al. [3] generated a diversified result set that outperformed other rankings as a source of feedback documents. The use of pseudo-relevance feedback (assuming the top-ranked documents are relevant) to extract sub-topics for use in diversification was explored by Santos et al. [5]. These previous approaches suggest that the two ideas are more closely linked than expected.

The ATIRE search engine [6] will be used to further explore the relationship between diversification and relevance feedback. ATIRE was selected because it is developed locally and is designed to be small and fast. ATIRE also produces a competitive baseline, which would have placed 6th in the 2011 TREC diversity task while performing no diversification, only index-time spam filtering [2], although we concede this is not equivalent to submitting a run.
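The trade-off the abstract describes, between a document's estimated relevance and its similarity to documents already selected, is classically operationalised as Maximal Marginal Relevance (MMR), in the spirit of Carbonell et al. [1]. The following is a minimal Python sketch of greedy MMR-style re-ranking; the function name `mmr_rerank`, the `lam` trade-off parameter, and the use of cosine similarity over term vectors are illustrative assumptions, not a description of ATIRE's diversification.

```python
import numpy as np

def mmr_rerank(query_scores, doc_vectors, k=10, lam=0.7):
    """Greedily select k documents by Maximal Marginal Relevance.

    query_scores : array of shape (n_docs,), the engine's relevance
                   estimate for each candidate document.
    doc_vectors  : array of shape (n_docs, n_terms), term vectors used
                   only to measure similarity between candidates and
                   documents that have already been selected.
    lam          : trade-off between estimated relevance (lam) and
                   novelty with respect to the selected set (1 - lam).
    Returns the indices of the selected documents, in selection order.
    """
    query_scores = np.asarray(query_scores, dtype=float)
    doc_vectors = np.asarray(doc_vectors, dtype=float)

    # Pairwise cosine similarity between all candidate documents.
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    unit = doc_vectors / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T

    selected = []
    candidates = list(range(len(query_scores)))
    while candidates and len(selected) < k:
        def mmr(d):
            # Penalise a candidate by its closest already-selected document.
            max_sim = max((sim[d, s] for s in selected), default=0.0)
            return lam * query_scores[d] - (1.0 - lam) * max_sim
        best = max(candidates, key=mmr)
        candidates.remove(best)
        selected.append(best)
    return selected
```

With lam = 1.0 the selection reduces to the engine's original ranking; lower values trade estimated relevance for coverage of more interpretations of the query.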
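The "add or subtract the influence of content-bearing terms" response to feedback mentioned in the abstract corresponds to Rocchio-style query reweighting. The sketch below illustrates that idea under stated assumptions: bag-of-words term-frequency Counters for documents and hypothetical weights alpha, beta, and gamma. It is not ATIRE's relevance feedback mechanism.

```python
from collections import Counter

def rocchio_feedback(query, relevant, nonrelevant,
                     alpha=1.0, beta=0.75, gamma=0.15, top_n=20):
    """Form a new weighted query from relevance feedback, Rocchio-style.

    query       : Counter mapping term -> weight for the original query.
    relevant    : list of Counters (term frequencies) judged relevant.
    nonrelevant : list of Counters judged non-relevant.
    Terms from relevant documents have their influence added; terms from
    non-relevant documents have their influence subtracted. The top_n
    positively weighted terms form the query that is re-run over the
    remaining documents to re-order them.
    """
    new_query = Counter()
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for doc in relevant:
        for term, tf in doc.items():
            new_query[term] += beta * tf / len(relevant)
    for doc in nonrelevant:
        for term, tf in doc.items():
            new_query[term] -= gamma * tf / len(nonrelevant)
    # Keep only the strongest positively weighted terms.
    return {t: w for t, w in new_query.most_common(top_n) if w > 0}
```

For the "java" example, feeding back documents about the programming language would raise the weight of terms that co-occur with it in those documents, steering the re-ordered results toward that interpretation.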