
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Articles

Untangling Result List Refinement and Ranking Quality: a Framework for Evaluation and Prediction
Jiyin He, M. Bron, A. D. Vries, L. Azzopardi, M. de Rijke
Traditional batch evaluation metrics assume that user interaction with search results is limited to scanning down a ranked list. However, modern search interfaces come with additional elements supporting result list refinement (RLR) through facets and filters, making user search behavior increasingly dynamic. We develop an evaluation framework that takes a step beyond the interaction assumption of traditional evaluation metrics and allows for batch evaluation of systems with and without RLR elements. In our framework we model user interaction as switching between different sublists. This provides a measure of user effort based on the joint effect of user interaction with RLR elements and result quality. We validate our framework by conducting a user study and comparing model predictions with real user performance. Our model predictions show significant positive correlation with real user effort. Further, in contrast to traditional evaluation metrics, the predictions using our framework, of when users stand to benefit from RLR elements, reflect findings from our user study. Finally, we use the framework to investigate under what conditions systems with and without RLR elements are likely to be effective. We simulate varying conditions concerning ranking quality, users, task and interface properties demonstrating a cost-effective way to study whole system performance.
DOI: https://doi.org/10.1145/2766462.2767740
Published: 2015-08-09
Citations: 5
A Head-Weighted Gap-Sensitive Correlation Coefficient
Ning Gao, Douglas W. Oard
Information retrieval systems rank documents, and shared-task evaluations yield results that can be used to rank information retrieval systems. Comparing rankings in ways that can yield useful insights is thus an important capability. When making such comparisons, it is often useful to give greater weight to comparisons near the head of a ranked list than to what happens further down. This is the focus of the widely used τAP measure. When scores are available, gap-sensitive measures give greater weight to larger differences than to smaller ones. This is the focus of the widely used Pearson correlation measure (ρ). This paper introduces a new measure, τGAP, which combines both features. System comparisons from the TREC 5 Ad Hoc track are used to illustrate the differences in emphasis achieved by τAP, ρ, and the proposed τGAP.
DOI: https://doi.org/10.1145/2766462.2767793
Published: 2015-08-09
Citations: 6
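The two existing measures contrasted in the abstract above can be sketched in a few lines of Python. This is only an illustration of head weighting (τAP) and gap sensitivity (Pearson's ρ), assuming the commonly cited definition of AP correlation; it does not implement the proposed τGAP, whose formula is not given in the abstract. The function names `tau_ap` and `pearson` are chosen for this example.

```python
from math import sqrt


def tau_ap(predicted, reference):
    """AP correlation of `predicted` against `reference`.

    Both arguments list the same items, best first. Disagreements near
    the head of `predicted` are penalised more than those near the tail.
    """
    ref_pos = {item: rank for rank, item in enumerate(reference)}
    n = len(predicted)
    total = 0.0
    for i in range(1, n):
        item = predicted[i]
        # Count items placed above position i that the reference also
        # places above this item.
        correct = sum(1 for above in predicted[:i]
                      if ref_pos[above] < ref_pos[item])
        total += correct / i
    return 2.0 * total / (n - 1) - 1.0


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


reference = ["a", "b", "c", "d"]
head_swap = ["b", "a", "c", "d"]   # the top two systems are swapped
tail_swap = ["a", "b", "d", "c"]   # the bottom two systems are swapped
```

A single swap costs the same under Kendall's τ wherever it occurs, whereas `tau_ap(head_swap, reference)` comes out lower than `tau_ap(tail_swap, reference)`, which is the head-weighting behaviour the abstract refers to.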
Document Comprehensiveness and User Preferences in Novelty Search Tasks
Ashraf Bah Rabiou, Praveen Chandar, Ben Carterette
Different users may be attempting to satisfy different information needs while submitting the same query to a search engine. Addressing that issue is the goal of novelty and diversity in information retrieval. The novelty and diversity search task models the setting in which users are interested in seeing more and more documents that are not only relevant, but also cover more aspects (or subtopics) related to the topic of interest. This is in contrast with the traditional IR task, where topical relevance is the only factor in evaluating search results. In this paper, we conduct a user study in which users are asked to give a preference between two documents B and C, given a query and given that they have already seen a document A. We then test a total of ten hypotheses pertaining to the relationship between the "comprehensiveness" of documents (i.e., the number of subtopics a document is relevant to) and real users' preference judgments. Our results show that users are inclined to prefer documents with higher comprehensiveness, even when the prior document A already covers more aspects than the two documents being compared, and even when the less preferred document has a higher relevance grade. In fact, users are inclined to prefer documents with higher overall aspect coverage even in cases where B and C are relevant to the same number of novel subtopics.
DOI: https://doi.org/10.1145/2766462.2767820
Published: 2015-08-09
Citations: 4
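The notion of comprehensiveness in the abstract above (the number of subtopics a document is relevant to) and the related notion of novel subtopics can be made concrete with a small sketch. The subtopic sets for documents A, B and C below are invented for illustration and are not data from the study.

```python
def comprehensiveness(doc_subtopics):
    """Number of distinct subtopics the document is relevant to."""
    return len(set(doc_subtopics))


def novel_coverage(doc_subtopics, seen_subtopics):
    """Number of subtopics the document covers that were not already seen."""
    return len(set(doc_subtopics) - set(seen_subtopics))


# Hypothetical judgments: document A has already been shown to the user.
doc_a = {1, 2, 3}
doc_b = {1, 4}
doc_c = {2, 3, 5}
```

With these sets, B and C each add exactly one novel subtopic beyond A, yet C has the higher overall comprehensiveness. That is the situation described in the last sentence of the abstract, where users still tended to prefer the more comprehensive document.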
Dynamic Query Modeling for Related Content Finding
Daan Odijk, E. Meij, I. Sijaranamual, M. de Rijke
While watching television, people increasingly consume additional content related to what they are watching. We consider the task of finding video content related to a live television broadcast for which we leverage the textual stream of subtitles associated with the broadcast. We model this task as a Markov decision process and propose a method that uses reinforcement learning to directly optimize the retrieval effectiveness of queries generated from the stream of subtitles. Our dynamic query modeling approach significantly outperforms state-of-the-art baselines for stationary query modeling and for text-based retrieval in a television setting. In particular we find that carefully weighting terms and decaying these weights based on recency significantly improves effectiveness. Moreover, our method is highly efficient and can be used in a live television setting, i.e., in near real time.
DOI: https://doi.org/10.1145/2766462.2767715
Published: 2015-08-09
Citations: 17
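The abstract above reports that weighting terms and decaying those weights by recency improves effectiveness. A minimal sketch of that one idea follows, assuming an exponential decay with a fixed half-life. The decay form, the `half_life` parameter and the function name are assumptions made for this example; the paper learns its query model with reinforcement learning, which is not reproduced here.

```python
from collections import defaultdict


def decayed_term_weights(chunks, now, half_life=60.0):
    """Weight query terms from a subtitle stream, favouring recent text.

    `chunks` is a list of (timestamp_seconds, text) pairs. Each term
    occurrence contributes 0.5 ** (age / half_life), so a term seen one
    half-life ago counts half as much as a term seen right now.
    """
    weights = defaultdict(float)
    for timestamp, text in chunks:
        age = now - timestamp
        decay = 0.5 ** (age / half_life)
        for term in text.lower().split():
            weights[term] += decay
    return dict(weights)


# Hypothetical subtitle stream: one chunk a minute ago, one just now.
stream = [(0.0, "election results"), (60.0, "election debate")]
weights = decayed_term_weights(stream, now=60.0, half_life=60.0)
```

In this toy stream, "election" accumulates weight from both chunks, "debate" keeps its full weight, and "results" has decayed to half, so a query built from the top-weighted terms drifts toward what is currently being said.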
In Situ Insights
Yuanhua Lv, A. Fuxman
When consuming content in applications such as e-readers, word processors, and Web browsers, users often see mentions of topics (or concepts) that attract their attention. In a scenario of significant practical interest, topics are explored in situ, without leaving the context of the application: the user selects a mention of a topic (in the form of continuous text), and the system subsequently recommends references (e.g., Wikipedia concepts) that are relevant in the context of the application. In order to realize this experience, it is necessary to tackle challenges that include: users may select any continuous text, even potentially noisy text for which there is no corresponding reference in the knowledge base; references must be relevant to both the user selection and the text around it; and the real estate available in the application may be constrained, thus limiting the number of results that can be shown. In this paper, we study this novel recommendation task, which we call in situ insights: recommending reference concepts in response to a text selection and its context, in situ within a document consumption application. We first propose a selection-centric context language model and a selection-centric context semantic model to capture user interest. Based on these models, we then measure the quality of a reference concept across three aspects: selection clarity, context coherence, and concept relevance. By leveraging all these aspects, we put forward a machine learning approach to simultaneously decide if a selection is noisy, and filter out low-quality candidate references. In order to quantitatively evaluate our proposed techniques, we construct a test collection based on the simulation of the in situ insights scenario using crowdsourcing in the context of a real-world e-reader application. Our experimental evaluation demonstrates the effectiveness of the proposed techniques.
DOI: https://doi.org/10.1145/2766462.2767696
Published: 2015-08-09
Citations: 5
Summarizing Contrastive Themes via Hierarchical Non-Parametric Processes
Z. Ren, M. de Rijke
Given a topic of interest, a contrastive theme is a group of opposing pairs of viewpoints. We address the task of summarizing contrastive themes: given a set of opinionated documents, select meaningful sentences to represent contrastive themes present in those documents. Several factors make this a challenging problem: unknown numbers of topics, unknown relationships among topics, and the extraction of comparative sentences. Our approach has three core ingredients: contrastive theme modeling, diverse theme extraction, and contrastive theme summarization. Specifically, we present a hierarchical non-parametric model to describe hierarchical relations among topics; this model is used to infer threads of topics as themes from the nested Chinese restaurant process. We enhance the diversity of themes by using structured determinantal point processes for selecting a set of diverse themes with high quality. Finally, we pair contrastive themes and employ an iterative optimization algorithm to select sentences, explicitly considering contrast, relevance, and diversity. Experiments on three datasets demonstrate the effectiveness of our method.
DOI: https://doi.org/10.1145/2766462.2767713
Published: 2015-08-09
Citations: 26
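The nested Chinese restaurant process mentioned in the abstract above builds on the ordinary (flat) Chinese restaurant process, which lets the number of clusters grow with the data instead of being fixed in advance. The sketch below samples from the flat process only, as a building block; it is not the authors' hierarchical model, and the function name and the `alpha` and `seed` parameters are choices made for this example.

```python
import random


def chinese_restaurant_process(num_customers, alpha=1.0, seed=0):
    """Sample table assignments from a flat Chinese restaurant process.

    With n customers already seated, the next customer joins an existing
    table with probability size / (n + alpha) and opens a new table with
    probability alpha / (n + alpha).
    """
    rng = random.Random(seed)
    table_sizes = []
    assignments = []
    for n in range(num_customers):
        r = rng.random() * (n + alpha)
        cumulative = 0.0
        chosen = len(table_sizes)  # default: open a new table
        for k, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                chosen = k
                break
        if chosen == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[chosen] += 1
        assignments.append(chosen)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(20, alpha=1.0, seed=0)
```

In the nested variant, a draw of this kind is repeated at each level of a tree, so every document follows a root-to-leaf path of topics; larger `alpha` values make new tables (and hence new topics) more likely.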
Session details: Session 4A: User Models
D. Kelly
DOI: https://doi.org/10.1145/3255924
Published: 2015-08-09
Citations: 0
Assessor Differences and User Preferences in Tweet Timeline Generation
Yulu Wang, G. Sherman, Jimmy J. Lin, Miles Efron
In information retrieval evaluation, when presented with an effectiveness difference between two systems, there are three relevant questions one might ask. First, are the differences statistically significant? Second, is the comparison stable with respect to assessor differences? Finally, is the difference actually meaningful to a user? This paper tackles the last two questions about assessor differences and user preferences in the context of the newly-introduced tweet timeline generation task in the TREC 2014 Microblog track, where the system's goal is to construct an informative summary of non-redundant tweets that addresses the user's information need. Central to the evaluation methodology is human-generated semantic clusters of tweets that contain substantively similar information. We show that the evaluation is stable with respect to assessor differences in clustering and that user preferences generally correlate with effectiveness metrics even though users are not explicitly aware of the semantic clustering being performed by the systems. Although our analyses are limited to this particular task, we believe that lessons learned could generalize to other evaluations based on establishing semantic equivalence between information units, such as nugget-based evaluations in question answering and temporal summarization.
DOI: https://doi.org/10.1145/2766462.2767699
Published: 2015-08-09
Citations: 30
Session details: Session 2A: Diversity and Bias
Gareth J.F. Jones
DOI: https://doi.org/10.1145/3255918
Published: 2015-08-09
Citations: 0
High Quality Graph-Based Similarity Search
Weiren Yu, J. Mccann
SimRank is an influential link-based similarity measure that has been used in many fields of Web search and sociometry. The best-of-breed method by Kusumoto et al., however, does not always deliver high-quality results, since it fails to accurately obtain its diagonal correction matrix D. Besides, SimRank is also limited by an unwanted "connectivity trait": increasing the number of paths between nodes a and b often incurs a decrease in score s(a,b). The best-known solution, SimRank++, cannot resolve this problem, since a revised score will be zero if a and b have no common in-neighbors. In this paper, we consider high-quality similarity search. Our scheme, SR#, is efficient and semantically meaningful: (1) We first formulate the exact D, and devise a "varied-D" method to accurately compute SimRank in linear memory. Moreover, by grouping computation, we also reduce the time from quadratic to linear in the number of iterations. (2) We design a "kernel-based" model to improve the quality of SimRank, and circumvent the "connectivity trait" issue. (3) We give mathematical insights into the semantic difference between SimRank and its variant, and correct an argument: "if D is replaced by a scaled identity matrix, top-K rankings will not be affected much". The experiments confirm that SR# can accurately extract high-quality scores, and is much faster than the state-of-the-art competitors.
simmrank是一种很有影响力的基于链接的相似性度量方法,已被用于网络搜索和社会计量学的许多领域。然而,由Kusumoto等人提出的同类最佳方法并不总是提供高质量的结果,因为它不能准确地获得其对角修正矩阵d。此外,simmrank还受到不必要的“连接特性”的限制:增加节点a和b之间的路径数量通常会导致分数s(a,b)的降低。最著名的解决方案simrank++不能解决这个问题,因为如果a和b没有共同的内邻居,修改后的分数将为零。在本文中,我们考虑高质量的相似度搜索。我们的方案sr#是高效且有语义意义的:(1)我们首先制定了精确的D,并设计了一个“变D”方法来精确计算线性存储器中的simmrank。此外,通过分组计算,我们还减少了迭代次数从二次到线性的时间。(2)设计了“基于核”的simmrank模型,提高了simmrank的质量,规避了“连通性”问题。(3)我们对simmrank及其变体之间的语义差异进行了数学分析,并纠正了一个论点:“如果D被缩放的单位矩阵取代,top-K排名不会受到太大影响”。实验证实,sr#可以准确地提取高质量的分数,并且比最先进的竞争对手快得多。
High Quality Graph-Based Similarity Search. Weiren Yu, J. Mccann. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. Published 2015-08-09. DOI: 10.1145/2766462.2767720 (https://doi.org/10.1145/2766462.2767720)
Citations: 30
Journal
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval