Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval: latest papers


Sentiment diversification with different biases

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Pub Date : 2013-07-28 DOI: 10.1145/2484028.2484060

Elif Aktolga, James Allan

Prior search result diversification work focuses on achieving topical variety in a ranked list, typically equally across all aspects. In this paper, we diversify with sentiments according to an explicit bias. We want to allow users to switch the result perspective to better grasp the polarity of opinionated content, such as during a literature review. For this, we first infer the prior sentiment bias inherent in a controversial topic -- the 'Topic Sentiment'. Then, we utilize this information in 3 different ways to diversify results according to various sentiment biases: (1) Equal diversification to achieve a balanced and unbiased representation of all sentiments on the topic; (2) Diversification towards the Topic Sentiment, in which the actual sentiment bias in the topic is mirrored to emphasize the general perception of the topic; (3) Diversification against the Topic Sentiment, in which documents about the 'minority' or outlying sentiment(s) are boosted and those with the popular sentiment are demoted. Since sentiment classification is an essential tool for this task, we experiment by gradually degrading the accuracy of a perfect classifier down to 40%, and show which diversification approaches prove most stable in this setting. The results reveal that the proportionality-based methods and our SCSF model, considering sentiment strength and frequency in the diversified list, yield the highest gains. Further, in case the Topic Sentiment cannot be reliably estimated, we show how performance is affected by equal diversification when actually an emphasis either towards or against the Topic Sentiment is desired: in the former case, an average of 6.48% is lost across all evaluation measures, whereas in the latter case this is 16.23%, confirming that bias-specific sentiment diversification is crucial.
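The three biases described in the abstract can be illustrated with a small greedy re-ranker. This is only a sketch under stated assumptions, not the authors' SCSF model or any of the proportionality-based methods they evaluate: the function names, the `(1 - share)` inversion used for the "against" bias, and the deficit-based selection rule are all hypothetical choices made for illustration.

```python
from collections import Counter

def target_distribution(topic_sentiment, bias):
    """Turn an estimated Topic Sentiment into a target share per sentiment."""
    if bias == "equal":
        weights = {s: 1.0 for s in topic_sentiment}
    elif bias == "towards":
        weights = dict(topic_sentiment)
    elif bias == "against":
        # Assumption: invert the bias by weighting each sentiment by (1 - share).
        weights = {s: 1.0 - share for s, share in topic_sentiment.items()}
    else:
        raise ValueError(f"unknown bias: {bias}")
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def diversify(docs, target, k):
    """Greedy proportional re-ranking (hypothetical selection rule).

    docs: list of (doc_id, relevance_score, sentiment_label).
    At each rank, pick the sentiment that is most under-represented
    relative to its target share, then take its best remaining document.
    Ties between sentiments are broken by the relevance score.
    """
    remaining = sorted(docs, key=lambda d: d[1], reverse=True)
    picked, counts = [], Counter()
    while remaining and len(picked) < k:
        best_doc = {}
        for d in remaining:  # sorted by score, so the first hit per label is the best
            best_doc.setdefault(d[2], d)
        n = len(picked) + 1
        label = max(
            best_doc,
            key=lambda s: (target.get(s, 0.0) * n - counts[s], best_doc[s][1]),
        )
        doc = best_doc[label]
        remaining.remove(doc)
        picked.append(doc)
        counts[label] += 1
    return picked

# Toy data: a topic whose prevailing sentiment is positive.
topic = {"pos": 0.5, "neg": 0.25, "neu": 0.25}
docs = [("p1", 0.9, "pos"), ("p2", 0.8, "pos"), ("p3", 0.7, "pos"),
        ("n1", 0.6, "neg"), ("n2", 0.5, "neg"),
        ("u1", 0.4, "neu"), ("u2", 0.3, "neu")]
for bias in ("equal", "towards", "against"):
    ranking = diversify(docs, target_distribution(topic, bias), k=4)
    print(bias, [doc_id for doc_id, _, _ in ranking])
```

With the toy data, the "towards" ranking mirrors the topic's positive majority, while the "against" ranking leads with the minority sentiments.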


Citations: 22

Summaries, ranked retrieval and sessions: a unified framework for information access evaluation

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Pub Date : 2013-07-28 DOI: 10.1145/2484028.2484031

T. Sakai, Zhicheng Dou

We introduce a general information access evaluation framework that can potentially handle summaries, ranked document lists and even multi query sessions seamlessly. Our framework first builds a trailtext which represents a concatenation of all the texts read by the user during a search session, and then computes an evaluation metric called U-measure over the trailtext. Instead of discounting the value of a retrieved piece of information based on ranks, U-measure discounts it based on its position within the trailtext. U-measure takes the document length into account just like Time-Biased Gain (TBG), and has the diminishing return property. It is therefore more realistic than rank-based metrics. Furthermore, it is arguably more flexible than TBG, as it is free from the linear traversal assumption (i.e., that the user scans the ranked list from top to bottom), and can handle information access tasks other than ad hoc retrieval. This paper demonstrates the validity and versatility of the U-measure framework. Our main conclusions are: (a) For ad hoc retrieval, U-measure is at least as reliable as TBG in terms of rank correlations with traditional metrics and discriminative power; (b) For diversified search, our diversity versions of U-measure are highly correlated with state-of-the-art diversity metrics; (c) For multi-query sessions, U-measure is highly correlated with Session nDCG; and (d) Unlike rank-based metrics such as DCG, U-measure can quantify the differences between linear and nonlinear traversals in sessions. We argue that our new framework is useful for understanding the user's search behaviour and for comparison across different information access styles (e.g. examining a direct answer vs. examining a ranked list of web pages).
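The idea of discounting by position within the trailtext, rather than by rank, can be sketched as follows. The abstract does not give the discount function, so this sketch assumes a linear decay that reaches zero at a maximum trailtext length; the function name, the `max_len` default, and the `norm` factor are illustrative assumptions, not values confirmed by the text above.

```python
def u_measure(trail, max_len=132000, norm=1.0):
    """Position-discounted gain over a trailtext (illustrative sketch).

    trail: list of (text_length_in_chars, gain) in the order the user
           read each piece of text during the session.
    max_len: assumed trailtext length at which the value of information
             decays to zero.
    """
    pos = 0
    total = 0.0
    for length, gain in trail:
        pos += length  # offset of the end of this piece within the trailtext
        discount = max(0.0, 1.0 - pos / max_len)
        total += gain * discount
    return total / norm

# The same two texts read in different orders give different scores,
# which is how a non-linear traversal can be told apart from a linear one.
linear = [(200, 0.0), (1000, 1.0)]     # non-relevant snippet first
nonlinear = [(1000, 1.0), (200, 0.0)]  # user jumps to the relevant text first
print(u_measure(linear, max_len=2000), u_measure(nonlinear, max_len=2000))
```

Because the discount depends on how much text was read before a relevant piece, longer documents push later gains further down the decay curve, which is the document-length sensitivity the abstract mentions.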


Citations: 90

Modeling the uniqueness of the user preferences for recommendation systems

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Pub Date : 2013-07-28 DOI: 10.1145/2484028.2484102

Haggai Roitman, David Carmel, Y. Mass, I. Eiron

In this paper we propose a novel framework for modeling the uniqueness of user preferences in recommendation systems. User uniqueness is determined by learning the extent to which the user's item preferences deviate from those of an "average user" in the system. Based on this framework, we suggest three different recommendation strategies that trade off uniqueness against conformity. Using two real item datasets, we demonstrate the effectiveness of our uniqueness-based recommendation framework.
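One simple way to picture "deviation from an average user" is to compare each user's preference vector with the mean vector over all users. The sketch below uses one minus cosine similarity as the uniqueness score; this measure, and every name in the code, is an assumption made for illustration, since the abstract does not say how the deviation is learned.

```python
import math

def mean_profile(profiles):
    """Average preference weight per item over all users."""
    items = {i for p in profiles.values() for i in p}
    n = len(profiles)
    return {i: sum(p.get(i, 0.0) for p in profiles.values()) / n for i in items}

def cosine(a, b):
    dot = sum(a.get(i, 0.0) * b.get(i, 0.0) for i in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def uniqueness(user_profile, average):
    """0 means the user matches the average user; values near 1 mean very unusual tastes."""
    return 1.0 - cosine(user_profile, average)

# Toy data: two users with mainstream tastes and one outlier.
profiles = {
    "u1": {"item_a": 1.0, "item_b": 1.0},
    "u2": {"item_a": 1.0, "item_b": 1.0},
    "u3": {"item_c": 1.0},
}
avg = mean_profile(profiles)
for user, profile in profiles.items():
    print(user, round(uniqueness(profile, avg), 3))
```

A recommender could then use such a score to decide how much weight to give popular items versus items matching the user's own history.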


Citations: 1

Time-aware structured query suggestion

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Pub Date : 2013-07-28 DOI: 10.1145/2484028.2484143

Taiki Miyanishi, T. Sakai

Most commercial search engines have a query suggestion feature, which is designed to capture various possible search intents behind the user's original query. However, even though different search intents behind a given query may have been popular at different time periods in the past, existing query suggestion methods neither utilize nor present such information. In this study, we propose Time-aware Structured Query Suggestion (TaSQS) which clusters query suggestions along a timeline so that the user can narrow down his search from a temporal point of view. Moreover, when a suggested query is clicked, TaSQS presents web pages from query-URL bipartite graphs after ranking them according to the click counts within a particular time period. Our experiments using data from a commercial search engine log show that the time-aware clustering and the time-aware document ranking features of TaSQS are both effective.
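The time-aware document ranking step, ordering the URLs clicked for a query by their click counts within a chosen time period, can be sketched as below. The log format, the function name, and the example URLs are hypothetical; the abstract does not describe how TaSQS stores its query-URL bipartite graphs.

```python
from collections import Counter
from datetime import date

def rank_urls(click_log, query, start, end):
    """Rank URLs clicked for `query` by click count within [start, end].

    click_log: iterable of (query, url, click_date) tuples, an assumed
               flat representation of query-URL click edges.
    """
    counts = Counter(
        url for q, url, day in click_log
        if q == query and start <= day <= end
    )
    # Most-clicked first; ties are broken by URL for a deterministic order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

log = [
    ("olympics", "https://example.com/beijing-2008", date(2008, 8, 10)),
    ("olympics", "https://example.com/beijing-2008", date(2008, 8, 12)),
    ("olympics", "https://example.com/london-2012", date(2012, 7, 28)),
    ("olympics", "https://example.com/london-2012", date(2012, 8, 1)),
    ("olympics", "https://example.com/london-2012", date(2012, 8, 3)),
    ("olympics", "https://example.com/beijing-2008", date(2012, 8, 2)),
]
print(rank_urls(log, "olympics", date(2012, 7, 1), date(2012, 8, 31)))
```

The same query yields a different ordering for a 2008 window than for a 2012 window, which is the behaviour the time-aware ranking is meant to expose.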


Citations: 29

Modeling click-through based word-pairs for web search

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Pub Date : 2013-07-28 DOI: 10.1145/2484028.2484082

Jagadeesh Jagarlamudi, Jianfeng Gao

Statistical translation models and latent semantic analysis (LSA) are two effective approaches to exploiting click-through data for Web search ranking. While the former learns semantic relationships between query terms and document terms directly, the latter maps a document and the queries for which it has been clicked to vectors in a lower dimensional semantic space. This paper presents two document ranking models that combine the strengths of both the approaches by explicitly modeling word-pairs. The first model, called PairModel, is a monolingual ranking model based on word-pairs derived from click-through data. It maps queries and documents into a concept space spanned by these word-pairs. The second model, called Bilingual Paired Topic Model (BPTM), uses bilingual word translations and can jointly model query-document collections written in multiple languages. This model uses topics to capture term dependencies and maps queries and documents in multiple languages into a lower dimensional semantic sub-space spanned by the topics. These models are evaluated on the Web search task using real world data sets in three different languages. Results show that they consistently outperform various state-of-the-art baseline models, and the best result is obtained by interpolating PairModel and BPTM.
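The final step, interpolating the scores of the two rankers, might look like the sketch below. The abstract does not state the interpolation form, so a linear mix with a single weight is assumed here; `lam`, the function names, and the toy scores are hypothetical, and the scoring functions of PairModel and BPTM themselves are not reproduced.

```python
def interpolate(pair_score, bptm_score, lam=0.5):
    """Linear interpolation of two model scores (assumed form)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * pair_score + (1.0 - lam) * bptm_score

def rank(docs, lam=0.5):
    """docs: dict mapping doc_id -> (pair_score, bptm_score)."""
    return sorted(
        docs,
        key=lambda d: interpolate(docs[d][0], docs[d][1], lam),
        reverse=True,
    )

scores = {"a": (0.9, 0.1), "b": (0.4, 0.8), "c": (0.5, 0.5)}
print(rank(scores, lam=1.0))  # first model only
print(rank(scores, lam=0.0))  # second model only
print(rank(scores, lam=0.5))  # equal mix
```

In practice the weight would be tuned on held-out data, and the two scores would need to be on comparable scales before mixing.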


Citations: 5

Live nuggets extractor: a semi-automated system for text extraction and test collection creation

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Pub Date : 2013-07-28 DOI: 10.1145/2484028.2484211

Matthew Ekstrand-Abueg, Virgil Pavlu, J. Aslam

The Live Nugget Extractor system provides users with a method of efficiently and accurately collecting relevant information for any web query, rather than providing a simple ranked list of documents. The system uses an online learning procedure to infer the relevance of unjudged documents while extracting and ranking information from judged documents. This creates a set of judged and inferred relevance scores for both documents and text fragments, which can be used for test collections, summarization, and other tasks that need high accuracy and large collections with minimal human effort.


Citations: 5

Finding impressive social content creators: searching for SNS illustrators using feedback on motifs and impressions

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Pub Date : 2013-07-28 DOI: 10.1145/2484028.2484133

Yohei Seki, Kiyoto Miyajima

We propose a method for finding impressive creators in online social network sites (SNSs). Many users are actively engaged in publishing their own works, sharing visual content on sites such as YouTube or Flickr. In this paper, we focus on the Japanese illustration-sharing SNS, Pixiv. We implement an illustrator search system based on user impression categories. The impressions of illustrators are estimated from clues in the crowdsourced social-tag annotations on their illustrations. We evaluated our system in terms of normalized discounted cumulative gain and found that using feedback on motifs and impressions for illustrations of relevant illustrators improved illustrator search by 11%.
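For readers unfamiliar with the evaluation measure named above, normalized discounted cumulative gain can be computed as in this sketch. It uses the common log2 rank discount and derives the ideal ranking from the same list of graded judgments; the abstract does not say which nDCG variant or cutoff the authors used, so both choices are assumptions.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, k=None):
    """nDCG at cutoff k for graded relevance values listed in ranked order.

    Assumption: the ideal ranking is a re-sorting of the same judged items.
    """
    k = len(gains) if k is None else k
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0

# Graded relevance of six retrieved illustrators, in ranked order (toy data).
print(round(ndcg([3, 2, 3, 0, 1, 2]), 4))
```

A relative improvement such as the 11% reported above would be the ratio of the nDCG with feedback to the nDCG without it, minus one.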


Citations: 1

Internet advertising: theory and practice

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Pub Date : 2013-07-28 DOI: 10.1145/2484028.2484221

Bin Gao, Jun Yan, Dou Shen, Tie-Yan Liu

Internet advertising, a form of advertising that uses the Internet to deliver marketing messages and attract customers, has seen exponential growth since its inception around twenty years ago; it has been pivotal to the success of the World Wide Web. The dramatic growth of Internet advertising poses great challenges to information retrieval, machine learning, data mining and game theory, and it calls for novel technologies to be developed. The main purpose of this workshop is to bring together researchers and practitioners in the area of Internet advertising and enable them to share their latest research results, express their opinions, and discuss future directions.


Citations: 3

A geolinguistic web application based on linked open data

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Pub Date : 2013-07-28 DOI: 10.1145/2484028.2484219

E. D. Buccio, Giorgio Maria Di Nunzio, G. Silvello

Digital geolinguistic systems encourage collaboration between linguists, historians, archaeologists, and ethnographers as they explore the relationship between language and cultural adaptation and change. In this demo, we propose a Linked Open Data approach for increasing the interoperability of geolinguistic applications and the reuse of their data. We present a case study of a geolinguistic project named Atlante Sintattico d'Italia (ASIt), the Syntactic Atlas of Italy.


Citations: 3

Diversified relevance feedback

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Pub Date : 2013-07-28 DOI: 10.1145/2484028.2484227

Matt Crane

The need for a search engine to deal with ambiguous queries (diversification) has been known for a long time. However, it is only recently that this need has become a focus within information retrieval research. How to respond to indications that a result is relevant to a query (relevance feedback) has also long been a focus of research. When thinking about the results for a query as being clustered by topic, these two areas of information retrieval research appear to be opposed to each other. Interestingly though, they both appear to improve the performance of search engines, raising the question: can they be combined or made to work with each other? When presented with an ambiguous query, there are a number of techniques that can be employed to better select results. The primary technique being researched now is diversification, which aims to populate the results with a set of documents that cover different possible interpretations of the query, while maintaining a degree of relevance, as determined by the search engine. For example, given a query of "java" it is unclear whether the user, without any other information, means the programming language, the coffee, the Indonesian island or a multitude of other meanings. In order to do this, the assumption that documents are independent of each other when assessing potential relevance has to be broken. That is, a document's relevance, as calculated by the search engine, is no longer dependent only on the query, but also on the other documents that have been selected. How a document is identified as being similar to previously selected documents, and the trade-off between estimated relevance and topic coverage, are current areas of information retrieval research. For unambiguous queries, or for search engines that do not perform diversification, it is possible to improve the results selected by reacting to information identifying a given result as truly relevant or not. This mechanism is known as relevance feedback. The most common response to relevance feedback is to examine the documents for their most content-bearing terms, and either add or subtract their influence in a newly formed query, which is then re-run on the remaining documents to re-order them. There has been only a scant amount of research into the combination of these methods. However, Carbonell et al. [1] show that an initially diverse result set can provide a better basis for identifying the topic a user is interested in for a relevance feedback style approach. This approach was further extended by Raman et al. [4]. An important aspect of relevance feedback is the selection of documents to use. In the 2008 TREC relevance feedback track, Meij et al. [3] generated a diversified result set which outperformed other rankings as a source of feedback documents. The use of pseudo-relevance feedback (assuming the top ranked documents are relevant) to extract sub-topics for use in diversification was explored by Santos et al. [5]. These previous approaches suggest that the two ideas are more closely linked than expected. This work will further explore the relationship between diversification and relevance feedback using the ATIRE search engine [6]. ATIRE was chosen because it is developed locally and is designed to be small and fast. ATIRE also produces a competitive baseline: it would have placed 6th in the TREC 2011 diversity task while performing neither diversification nor index-time spam filtering [2], although we acknowledge that this is not equivalent to submitting a run.
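The point that a document's score depends on the documents already selected can be made concrete with a greedy selection loop in the style of maximal marginal relevance. This is a swapped-in, well-known technique used purely for illustration, not the method this abstract proposes; the Jaccard similarity over term sets, the weight `lam`, and the toy "java" documents are all assumptions.

```python
def jaccard(a, b):
    """Term-set overlap, used here as a stand-in document similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mmr_select(relevance, terms, k, lam=0.7):
    """Greedy MMR-style selection.

    relevance: dict doc_id -> query relevance score from the search engine.
    terms:     dict doc_id -> set of terms in the document.
    lam:       trade-off between relevance (1.0) and novelty (0.0).
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def marginal(d):
            # Penalise similarity to the closest already-selected document.
            redundancy = max(
                (jaccard(terms[d], terms[s]) for s in selected), default=0.0
            )
            return lam * relevance[d] - (1.0 - lam) * redundancy
        best = max(sorted(candidates), key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy results for the ambiguous query "java".
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.6, "d4": 0.5}
terms = {
    "d1": {"java", "programming", "language"},
    "d2": {"java", "programming", "language", "code"},
    "d3": {"java", "coffee", "bean"},
    "d4": {"java", "island", "indonesia"},
}
print(mmr_select(relevance, terms, k=3, lam=1.0))  # relevance only
print(mmr_select(relevance, terms, k=3, lam=0.5))  # relevance traded against novelty
```

With `lam=1.0` the two programming-language documents take the top ranks; lowering `lam` lets the coffee and island interpretations displace the near-duplicate.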


Citations: 1
