
Latest papers from the Proceedings of the 2015 International Conference on The Theory of Information Retrieval

Terms, Topics & Tasks: Enhanced User Modelling for Better Personalization
Rishabh Mehrotra, Emine Yilmaz
Given the distinct preferences of different users while using search engines, search personalization has become an important problem in information retrieval. Most approaches to search personalization are based on identifying topics a user may be interested in and personalizing search results based on this information. While information about users' topical interests can be highly valuable in personalizing search results and improving user experience, it ignores the fact that two users with similar topical interests may still want to accomplish very different tasks with respect to a topic (e.g. the tasks a broker is likely to perform related to finance are likely to be very different from those of a regular investor). Hence, considering users' topical interests jointly with the types of tasks they are likely to be interested in could result in better personalisation. We present an approach that uses search task information embedded in search logs to represent users by their actions over a task space as well as over their topical-interest space. In particular, we describe a tensor-based approach that represents each user in terms of (i) the user's topical interests and (ii) the user's search task behaviours in a coupled fashion, and we use these representations for personalization. Additionally, we integrate users' historic search behavior in a coupled matrix-tensor factorization framework to learn user representations. Through extensive evaluation via query recommendations and user cohort analysis, we demonstrate the value of considering topic-specific task information while developing user models.
DOI: 10.1145/2808194.2809467 (https://doi.org/10.1145/2808194.2809467). Published 2015-09-27.
Citations: 26
Towards Less Biased Web Search
Xitong Liu, Hui Fang, Deng Cai
Web search engines now serve as an essential assistant, helping users make decisions in many areas. Delivering correct and impartial information is a crucial function of search engines, as any false information may lead to unwise decisions and thus undesirable consequences. Unfortunately, a recent study revealed that Web search engines tend to provide biased information, with most results supporting the beliefs users convey in their queries regardless of the truth. In this paper we propose to alleviate bias in Web search by predicting the topical polarity of documents, that is, the overall tendency of a document to support or reject the belief expressed in the query. By applying the prediction to balance search results, users would receive less biased information and therefore make wiser decisions. To achieve this goal, we propose a novel textual segment extraction method to distill and generate document feature representations, and leverage a convolutional neural network, an effective deep learning approach, to predict the topical polarity of documents. We conduct extensive experiments on a set of queries with medical intents and demonstrate that our model identifies topical polarity with satisfactory accuracy. To the best of our knowledge, our work is the first to investigate the mitigation of bias in Web search and could provide directions for future research.
DOI: 10.1145/2808194.2809476 (https://doi.org/10.1145/2808194.2809476). Published 2015-09-27.
Citations: 5
Development and Evaluation of Search Tasks for IIR Experiments using a Cognitive Complexity Framework
D. Kelly, Jaime Arguello, A. Edwards, Wan-Ching Wu
One of the most challenging aspects of designing interactive information retrieval (IIR) experiments with users is the development of search tasks. We describe an evaluation of 20 search tasks that were designed for use in IIR experiments and developed using a cognitive complexity framework from educational theory. The search tasks represent five levels of cognitive complexity and four topical domains. The tasks were evaluated in the context of a laboratory IIR experiment with 48 participants. Behavioral and self-report data were used to characterize and understand differences among tasks. Results showed more cognitively complex tasks required significantly more search activity from participants (e.g., more queries, clicks, and time to complete). However, participants did not evaluate more cognitively complex tasks as more difficult and were equally satisfied with their performances across tasks. Our work makes four contributions: (1) it adds to what is known about the relationship among task, search behaviors and user experience; (2) it presents a framework for task creation and evaluation; (3) it provides tasks and questionnaires that can be reused by others and (4) it raises questions about findings and assumptions of many recent studies that only use behavioral signals from search logs as evidence for task difficulty and searcher satisfaction, as many of our results directly contradict these findings.
DOI: 10.1145/2808194.2809465 (https://doi.org/10.1145/2808194.2809465). Published 2015-09-27.
Citations: 135
Searching for Twitter Posts by Location
Ariana S. Minot, Andrew Heier, Davis E. King, O. Simek, N. Stanisha
The microblogging service Twitter is an increasingly popular platform for sharing information worldwide. This motivates the potential to mine information from Twitter, which can serve as a valuable resource for applications such as event localization and location-specific recommendation systems. Geolocation of Twitter messages is integral to such applications. However, only a small percentage of Twitter posts are accompanied by a GPS location. Recent works have begun exploring ways to estimate the unknown location of Twitter users based on the content of their posts and various available metadata. This presents interesting challenges for natural language processing and multi-objective optimization. We propose a new method for estimating the home location of users based on both the content of their posts and their social connections on Twitter. Our method achieves an accuracy of 77% within 10 km in exchange for a reduction in coverage of 76% with respect to techniques that use only social connections.
DOI: 10.1145/2808194.2809480 (https://doi.org/10.1145/2808194.2809480). Published 2015-09-27.
Citations: 7
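The geolocation abstract above reports accuracy within 10 km alongside coverage. As a rough illustration of how two such figures could be computed, here is a minimal sketch. It is not the authors' evaluation code: the function names, the dictionary layout, and the convention that `None` means "no prediction made" are all assumptions made for this example.

```python
import math
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0  # mean Earth radius
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * earth_radius_km * math.asin(math.sqrt(min(1.0, h)))


def accuracy_and_coverage(predicted: Dict[str, Optional[Coord]],
                          actual: Dict[str, Coord],
                          radius_km: float = 10.0) -> Tuple[float, float]:
    """Return (accuracy, coverage) over the users listed in `actual`.

    Coverage (assumed definition): share of users that received a prediction.
    Accuracy (assumed definition): share of those predictions that fall
    within `radius_km` of the true home location.
    """
    answered = [u for u in actual if predicted.get(u) is not None]
    coverage = len(answered) / len(actual) if actual else 0.0
    hits = sum(1 for u in answered
               if haversine_km(predicted[u], actual[u]) <= radius_km)
    accuracy = hits / len(answered) if answered else 0.0
    return accuracy, coverage
```

Under these definitions a method that abstains on uncertain users can raise accuracy while lowering coverage, which is the kind of trade-off the abstract describes.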
Random Walks on the Reputation Graph
Sabir Ribas, B. Ribeiro-Neto, Rodrygo L. T. Santos, E. D. S. E. Silva, A. Ueda, N. Ziviani
The identification of reputable entities is an important task in business, education, and many other fields. On the other hand, as an arguably subjective, multi-faceted concept, quantifying reputation is challenging. In this paper, instead of relying on a single, precise definition of reputation, we propose to exploit the transference of reputation among entities in order to identify the most reputable ones. To this end, we propose a novel random walk model to infer the reputation of a target set of entities with respect to suitable sources of reputation. We instantiate our model in an academic search setting, by modeling research groups as reputation sources and publication venues as reputation targets. By relying on publishing behavior as a reputation signal, we demonstrate the effectiveness of our model in contrast to standard citation-based approaches for identifying reputable venues as well as researchers in the broad area of computer science. In addition, we demonstrate the robustness of our model to perturbations in the selection of reputation sources. Finally, we show that effective reputation sources can be chosen via the proposed model itself in a semi-automatic fashion.
DOI: 10.1145/2808194.2809462 (https://doi.org/10.1145/2808194.2809462). Published 2015-09-27.
Citations: 8
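The reputation-graph abstract above describes transferring reputation from source entities (research groups) to target entities (publication venues) with a random walk. The paper's actual model is not given in this listing, so the sketch below substitutes a generic random walk with restart on a weighted bipartite graph. The function name, the edge format (publication counts keyed by a group and venue pair), and the `restart` value are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def walk_with_restart(edges: Dict[Tuple[str, str], float],
                      sources: Iterable[str],
                      restart: float = 0.15,
                      iterations: int = 100) -> Dict[str, float]:
    """Score nodes by a random walk that restarts at the reputation sources.

    `edges` maps (group, venue) pairs to a weight such as a publication
    count. Edges are treated as undirected, so the walker alternates
    between groups and venues. Node names are assumed to be unique
    across groups and venues.
    """
    adjacency: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (group, venue), weight in edges.items():
        adjacency[group][venue] = adjacency[group].get(venue, 0.0) + weight
        adjacency[venue][group] = adjacency[venue].get(group, 0.0) + weight

    source_list = list(sources)
    start = {s: 1.0 / len(source_list) for s in source_list}
    scores = dict(start)

    for _ in range(iterations):
        nxt: Dict[str, float] = defaultdict(float)
        for node, mass in scores.items():
            total = sum(adjacency[node].values())
            if total == 0:
                # Isolated node: send its walking mass back to the sources.
                for s, p in start.items():
                    nxt[s] += (1 - restart) * mass * p
                continue
            for neighbour, weight in adjacency[node].items():
                nxt[neighbour] += (1 - restart) * mass * weight / total
        for s, p in start.items():
            nxt[s] += restart * p  # restart step
        scores = dict(nxt)
    return scores
```

Venues can then be ranked by their score. Changing the `sources` argument is one simple way to probe how sensitive such a ranking is to the choice of reputation sources.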
The Probability Ranking Principle is Not Optimal in Adversarial Retrieval Settings
R. Ben-Basat, Moshe Tennenholtz, Oren Kurland
The probability ranking principle (PRP) - ranking documents in response to a query by their relevance probabilities - is the theoretical foundation of most ad hoc document retrieval methods. A key observation that motivates our work is that the PRP does not account for potential post-ranking effects, specifically, changes to documents that result from a given ranking. Yet, in adversarial retrieval settings such as the Web, authors may consistently try to promote their documents in rankings by changing them. We prove that, indeed, the PRP can be sub-optimal in adversarial retrieval settings. We do so by presenting a novel game theoretic analysis of the adversarial setting. The analysis is performed for different types of documents (single topic and multi topic) and is based on different assumptions about the writing qualities of documents' authors. We show that in some cases, introducing randomization into the document ranking function yields overall user utility that transcends that of applying the PRP.
DOI: 10.1145/2808194.2809456 (https://doi.org/10.1145/2808194.2809456). Published 2015-09-27.
Citations: 17
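The PRP abstract above contrasts ranking strictly by relevance probability with ranking functions that include randomization. The sketch below shows the deterministic PRP ordering next to one possible randomized alternative. The randomized variant samples documents without replacement in proportion to their relevance probability; this is an illustrative choice, not the scheme analysed in the paper. The function names and the input format (a dictionary from document id to probability) are assumptions.

```python
import random
from typing import Dict, List, Optional


def prp_rank(relevance: Dict[str, float]) -> List[str]:
    """Rank documents by descending relevance probability (ties by id)."""
    return sorted(relevance, key=lambda doc: (-relevance[doc], doc))


def randomized_rank(relevance: Dict[str, float],
                    rng: Optional[random.Random] = None) -> List[str]:
    """Sample a ranking by drawing documents without replacement,
    each draw weighted by the document's relevance probability."""
    rng = rng or random.Random()
    remaining = dict(relevance)
    ranking: List[str] = []
    while remaining:
        docs = list(remaining)
        weights = [remaining[d] for d in docs]
        if sum(weights) > 0:
            pick = rng.choices(docs, weights=weights, k=1)[0]
        else:
            pick = rng.choice(docs)  # all weights are zero: pick uniformly
        ranking.append(pick)
        del remaining[pick]
    return ranking
```

With `prp_rank` the top position always goes to the most probable document, so an author who edits a document to raise its estimated relevance is certain to gain from it. With `randomized_rank` the top position is only more likely, not guaranteed, which is one intuition for why randomization can change authors' incentives.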
Learning to Reinforce Search Effectiveness
Jiyun Luo, Xuchu Dong, G. Yang
Session search is an Information Retrieval (IR) task which handles a series of queries issued for a search task. In this paper, we propose a novel reinforcement learning (RL) style information retrieval framework and develop a new feedback learning algorithm to model user feedback, including clicks and query reformulations, as reinforcement signals and to generate rewards in the RL framework. From a new perspective, we view session search as a cooperative game played between two agents, the user and the search engine. We study the communications between the two agents; they always exchange opinions on "whether the current stage of search is relevant" and "whether we should explore now." The algorithm infers user feedback models by an EM algorithm from the query logs. We compare against several state-of-the-art session search algorithms and evaluate our algorithm on the most recent TREC 2012 to 2014 Session Tracks. The experimental results demonstrate that our approach is highly effective for improving session search accuracy.
DOI: 10.1145/2808194.2809468 (https://doi.org/10.1145/2808194.2809468). Published 2015-09-27.
Citations: 12
On Microblog Dimensionality and Informativeness: Exploiting Microblogs' Structure and Dimensions for Ad-Hoc Retrieval
Jesus A. Rodriguez Perez, J. Jose
In recent years, microblog services such as Twitter have gained increasing popularity, leading to active research on how to effectively exploit their content. Microblog documents such as tweets differ in morphology from more traditional documents such as web pages. In particular, tweets are considerably shorter (140 characters) than web documents and contain contextual tags regarding the topic (hashtags) and the intended audience (mentions) of the document, as well as links to external content (URLs). Traditional and state-of-the-art retrieval models perform rather poorly in capturing the relevance of tweets, since they have been designed under very different conditions. In this work, we first define a microblog document as a high-dimensional entity and study the structural differences between documents deemed relevant and those deemed non-relevant. Secondly, we experiment with enhancing the behaviour of the best-performing retrieval model observed by means of a re-ranking approach that accounts for the relative differences in these dimensions amongst tweets. Additionally, we study the interactions between the different dimensions in terms of their order within the documents by modelling relevant and non-relevant tweets as state machines. These state machines are then utilised to produce scores, which in turn are used for re-ranking. Our evaluation results show statistically significant improvements over the baseline in terms of precision at different cut-off points for both approaches. These results confirm that the relative presence of the different dimensions within a document and their ordering are connected with the relevance of microblogs.
DOI: 10.1145/2808194.2809466 (https://doi.org/10.1145/2808194.2809466). Published 2015-09-27.
Citations: 6
An Initial Analytical Exploration of Retrievability
Aldo Lipani, M. Lupu, Akiko Aizawa, A. Hanbury
We approach the problem of retrievability from an analytical perspective, starting with modeling conjunctive and disjunctive queries in a Boolean model. We show that this represents an upper bound on retrievability for all other best-match algorithms. We follow this with an observation of imbalance in the distribution of retrievability, using the Gini coefficient. Simulation-based experiments show the behavior of the Gini coefficient for retrievability under different types and lengths of queries, as well as under different assumptions about the document length distribution in a collection.
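To make the imbalance measure mentioned in the abstract concrete, here is a minimal sketch of a Gini coefficient computed over per-document retrievability scores. The function name `gini` and the score lists are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's query model or simulations. It only shows the standard closed-form Gini computation on a list of non-negative values.

```python
def gini(scores):
    """Gini coefficient of non-negative scores: 0 means perfectly even."""
    values = sorted(scores)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form over the ascending-sorted values:
    # G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i), with i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


# Hypothetical retrievability scores for five documents (illustrative only).
even = [4, 4, 4, 4, 4]      # every document is equally retrievable
skewed = [0, 0, 0, 0, 20]   # one document absorbs all retrievability

print(gini(even))    # 0.0
print(gini(skewed))  # 0.8
```

A value near 0 suggests that documents are about equally easy to retrieve, while larger values suggest that a few documents dominate.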
DOI: https://doi.org/10.1145/2808194.2809495 (published 2015-09-27)
Citations: 6
On Divergence Measures and Static Index Pruning
Ruey-Cheng Chen, Chia-Jung Lee, W. Bruce Croft
We study the problem of static index pruning in a renowned divergence minimization framework, using a range of divergence measures such as f-divergence and Rényi divergence as the objective. We show that many well-known divergence measures are convex in pruning decisions, and therefore can be exactly minimized using an efficient algorithm. Our approach allows postings to be prioritized according to the amount of information they contribute to the index, and by specifying a different divergence measure the contribution is modeled on a different returns curve. In our experiment on GOV2 data, Rényi divergence of order infinity appears the most effective. This divergence measure significantly outperforms many standard methods and achieves the same retrieval effectiveness as the full data using only 50% of the postings. When top-k precision is the only concern, 10% of the data is sufficient to achieve the accuracy that one would usually expect from a full index.
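As a point of reference for the divergence measures named in the abstract, the following is a minimal sketch of the Rényi divergence between two discrete distributions, including the order-infinity case. It illustrates only the textbook definition. The function name `renyi_divergence` and the example distributions are assumptions for illustration, and the sketch does not implement the paper's pruning algorithm or its objective over postings.

```python
import math


def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(P || Q) in nats for discrete distributions.

    Assumes alpha > 0 and q[i] > 0 wherever p[i] > 0 (a simplifying assumption).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    pairs = [(pi, qi) for pi, qi in zip(p, q) if pi > 0]
    if any(qi <= 0 for _, qi in pairs):
        raise ValueError("q must be positive wherever p is positive")
    if alpha == 1:
        # Limit alpha -> 1 is the Kullback-Leibler divergence.
        return sum(pi * math.log(pi / qi) for pi, qi in pairs)
    if math.isinf(alpha):
        # Limit alpha -> infinity is the log of the largest probability ratio.
        return math.log(max(pi / qi for pi, qi in pairs))
    total = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in pairs)
    return math.log(total) / (alpha - 1)


# Hypothetical distributions (illustrative values only).
p = [0.5, 0.5]
q = [0.25, 0.75]
for order in (1, 2, math.inf):
    print(order, renyi_divergence(p, q, order))
```

The divergence is non-decreasing in the order, so the order-infinity value is the largest of the three printed here. It depends only on the single worst-case ratio between the two distributions.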
DOI: https://doi.org/10.1145/2808194.2809472 (published 2015-09-27)
Citations: 6
Journal
Proceedings of the 2015 International Conference on The Theory of Information Retrieval