Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Papers

Understanding Temporal Query Intent
Mohammed Hasanuzzaman, S. Saha, G. Dias, S. Ferrari
Understanding the temporal orientation of web search queries is an important issue for the success of information access systems. In this paper, we propose a multi-objective ensemble learning solution that (1) accurately classifies queries by their temporal intent and (2) identifies a set of well-performing solutions, thus offering a wide range of possible applications. Experiments show that a correct representation of the problem can lead to large classification improvements over recent state-of-the-art solutions and baseline ensemble techniques.
DOI: https://doi.org/10.1145/2766462.2767792 (published 2015-08-09)
Citations: 6
Practical Lessons for Gathering Quality Labels at Scale
Omar Alonso
Information retrieval researchers and engineers use human computation as a mechanism to produce labeled datasets for product development, research and experimentation. To gather useful results, a successful labeling task relies on many different elements: clear instructions, user interface guidelines, representative high-quality datasets, appropriate inter-rater agreement metrics, work quality checks, and channels for worker feedback. Furthermore, designing and implementing tasks that produce and use several thousand or several million labels is different from conducting small-scale research investigations. In this paper, we present a perspective for collecting high-quality labels with an emphasis on practical problems and scalability. We focus on three main topics: programming crowds, debugging tasks with low agreement, and algorithms for quality control. We show examples from an industrial setting.
DOI: https://doi.org/10.1145/2766462.2776778 (published 2015-08-09)
Citations: 12
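The abstract above names inter-rater agreement metrics as one ingredient of a labeling task without saying which ones. As a minimal sketch, assuming Cohen's kappa for two raters is an acceptable stand-in (the abstract does not confirm it is the metric used), agreement corrected for chance can be computed as follows. The label values and rater lists are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items.

    Assumes the raters do not both use a single identical label for every
    item (in that case chance agreement is 1 and kappa is undefined).
    """
    n = len(labels_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both pick the same label independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical judgements ("rel" = relevant, "non" = not relevant).
rater_1 = ["rel", "rel", "non", "non"]
rater_2 = ["rel", "non", "non", "non"]
kappa = cohen_kappa(rater_1, rater_2)
```

Here the raters agree on three of four items, but half of that agreement is expected by chance, which is why the chance-corrected value is lower than the raw agreement rate.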
Linse: A Distributional Semantics Entity Search Engine
J. Sales, A. Freitas, S. Handschuh, Brian Davis
Entering 'Football Players from United States' when searching for 'American Footballers' is an example of vocabulary mismatch, which occurs when different words are used to express the same concepts. In order to address this phenomenon for entity search targeting descriptors for complex categories, we propose a compositional-distributional semantics entity search engine, which extracts semantic and commonsense knowledge from large-scale corpora to address the vocabulary gap between query and data.
DOI: https://doi.org/10.1145/2766462.2767871 (published 2015-08-09)
Citations: 2
Sign-Aware Periodicity Metrics of User Engagement for Online Search Quality Evaluation
Alexey Drutsa
Modern Internet companies improve the evaluation criteria of their data-driven decision-making, which is based on online controlled experiments (also known as A/B tests). The amplitude metrics of user engagement are known to be highly sensitive to service changes, but they cannot be used to determine whether the treatment effect is positive or negative. We propose to overcome this sign-agnostic issue by paying attention to the phase of the corresponding DFT sine wave. We refine the amplitude metrics of the first frequency with the corresponding phase metrics and formalize our intuition in several novel overall evaluation criteria. These criteria are then verified over A/B experiments on real users of Yandex. We find that our approach preserves the sensitivity level of the amplitudes and makes their changes sign-aware with respect to the treatment effect.
DOI: https://doi.org/10.1145/2766462.2767814 (published 2015-08-09)
Citations: 10
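The abstract above relies on the amplitude and phase of the first DFT frequency of an engagement series. The sketch below is not the paper's evaluation criterion; it only illustrates why amplitude alone cannot tell two opposite engagement shapes apart while phase can. The function name and the daily engagement values are hypothetical.

```python
import cmath
import math

def first_frequency(series):
    """Return (amplitude, phase) of the DFT component at frequency index 1."""
    n = len(series)
    # DFT coefficient for k = 1: sum over t of x_t * exp(-2*pi*i*t/n).
    coeff = sum(x * cmath.exp(-2j * math.pi * t / n) for t, x in enumerate(series))
    return abs(coeff) / n, cmath.phase(coeff)

# Hypothetical per-day engagement: activity concentrated early vs. late
# in the observation period (the second series is the first one reversed).
early = [5, 4, 3, 2, 1, 0, 0, 0]
late = list(reversed(early))

amp_early, phase_early = first_frequency(early)
amp_late, phase_late = first_frequency(late)
```

For these two series the amplitudes coincide, while the phases have opposite signs, so only the phase separates a user whose engagement decays from one whose engagement grows.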
Automatic Feature Generation on Heterogeneous Graph for Music Recommendation
Chun Guo, Xiaozhong Liu
Online music streaming services (MSS) have experienced exponential growth over the past decade. The giant MSS providers have not only built massive music collections with metadata, they have also accumulated large amounts of heterogeneous data generated by users, e.g., listening histories, comments, bookmarks, and user-generated playlists. While various kinds of user data can potentially be used to enhance music recommendation performance, most existing studies have focused only on audio content features and on collaborative filtering approaches based on simple user listening history or music ratings. In this paper, we propose a novel approach to solving the music recommendation problem by means of heterogeneous graph mining. Meta-path based features are automatically generated from a content-rich heterogeneous graph schema with 6 types of nodes and 16 types of relations. We then use a learning-to-rank approach to integrate the different features for music recommendation. Experimental results show that the automatically generated graph features significantly (p<0.0001) enhance a state-of-the-art collaborative filtering algorithm.
DOI: https://doi.org/10.1145/2766462.2767808 (published 2015-08-09)
Citations: 23
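The abstract above mentions meta-path based features on a heterogeneous graph without listing the schema. As a rough sketch under stated assumptions, one common form of such a feature is the number of path instances that connect a user to a candidate track along a fixed sequence of relation types. The node names, the relation names, and the chosen meta-path below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Typed edges of a toy heterogeneous graph: (source, relation, target).
# All node and relation names are hypothetical.
EDGES = [
    ("u1", "listened", "t1"),
    ("u1", "listened", "t2"),
    ("t1", "performed_by", "a1"),
    ("t2", "performed_by", "a1"),
    ("t3", "performed_by", "a1"),
    ("t4", "performed_by", "a2"),
]

def build_index(edges):
    """Index neighbours by (node, relation); '~rel' is the reverse direction."""
    index = defaultdict(list)
    for src, rel, dst in edges:
        index[(src, rel)].append(dst)
        index[(dst, "~" + rel)].append(src)
    return index

def path_counts(index, start, meta_path):
    """Count path instances from `start` to each node reached along `meta_path`."""
    counts = {start: 1}
    for rel in meta_path:
        reached = defaultdict(int)
        for node, count in counts.items():
            for neighbour in index.get((node, rel), []):
                reached[neighbour] += count
        counts = dict(reached)
    return counts

INDEX = build_index(EDGES)
# Meta-path: user -listened-> track -performed_by-> artist <-performed_by- track
features = path_counts(INDEX, "u1", ["listened", "performed_by", "~performed_by"])
```

Each count could serve as one feature value for a (user, track) pair; a learning-to-rank model, as the abstract describes, would then weigh features from several meta-paths.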
How many results per page?: A Study of SERP Size, Search Behavior and User Experience
D. Kelly, L. Azzopardi
The provision of "ten blue links" has emerged as the standard for the design of search engine result pages (SERPs). While numerous aspects of SERPs have been examined, little attention has been paid to the number of results displayed per page. This paper investigates the relationships among the number of results shown on a SERP, search behavior and user experience. We performed a laboratory experiment with 36 subjects, who were randomly assigned to use one of three search interfaces that varied according to the number of results per SERP (three, six or ten). We found subjects' click distributions differed significantly depending on SERP size. We also found those who interacted with three results per page viewed significantly more SERPs per query; interestingly, the number of SERPs they viewed per query corresponded to about 10 search results. Subjects who interacted with ten results per page viewed and saved significantly more documents. They also reported the greatest difficulty finding relevant documents, rated their skills the lowest and reported greater workload, even though these differences were not significant. This work shows that behavior changes with SERP size, such that more time is spent focused on earlier results when SERP size decreases.
DOI: https://doi.org/10.1145/2766462.2767732 (published 2015-08-09)
Citations: 66
Towards Quantifying the Impact of Non-Uniform Information Access in Collaborative Information Retrieval
N. Htun, Martin Halvey, L. Baillie
The majority of research into Collaborative Information Retrieval (CIR) has assumed a uniformity of information access and visibility between collaborators. However, in a number of real-world scenarios, information access is not uniform among all collaborators in a team (e.g., in security or health settings). This can be referred to as Multi-Level Collaborative Information Retrieval (MLCIR). To the best of our knowledge, there has not yet been any systematic investigation of the effect of MLCIR on search outcomes. To address this shortcoming, we present the results of a simulated evaluation conducted over 4 different non-uniform information access scenarios and 3 different collaborative search strategies. Results indicate that there is some tolerance to removing access to the collection and that there may not always be a negative impact on performance. We also highlight how different access scenarios and search strategies affect search outcomes.
DOI: https://doi.org/10.1145/2766462.2767779 (published 2015-08-09)
Citations: 4
Features of Disagreement Between Retrieval Effectiveness Measures
Timothy Jones, Paul Thomas, Falk Scholer, M. Sanderson
Many IR effectiveness measures are motivated by intuition, theory, or user studies. In general, most effectiveness measures are well correlated with each other. But what about the cases where they do not correlate? Which rankings cause measures to disagree? Are these rankings predictable for particular pairs of measures? In this work, we examine how and where metrics disagree, and identify differences that should be considered when selecting metrics for use in evaluating retrieval systems.
DOI: https://doi.org/10.1145/2766462.2767824 (published 2015-08-09)
Citations: 6
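To make the question in the abstract above concrete, the sketch below shows one small case in which two standard measures prefer different rankings. The choice of precision at 5 and reciprocal rank, and the relevance judgements, are assumptions made for illustration; the abstract does not say which measure pairs the paper studies.

```python
def precision_at_k(rels, k):
    """Fraction of the top-k results that are relevant."""
    return sum(rels[:k]) / k

def reciprocal_rank(rels):
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

# Hypothetical binary relevance judgements for two rankings of one query.
ranking_a = [1, 0, 0, 0, 0]   # a single relevant result, at the very top
ranking_b = [0, 1, 1, 1, 0]   # three relevant results, none at rank 1

prefers_a_by_rr = reciprocal_rank(ranking_a) > reciprocal_rank(ranking_b)
prefers_a_by_p5 = precision_at_k(ranking_a, 5) > precision_at_k(ranking_b, 5)
```

Reciprocal rank rewards ranking A for its top hit, while precision at 5 rewards ranking B for returning more relevant results, so the two measures order the rankings differently.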
A Head-Weighted Gap-Sensitive Correlation Coefficient
Ning Gao, Douglas W. Oard
Information retrieval systems rank documents, and shared-task evaluations yield results that can be used to rank information retrieval systems. Comparing rankings in ways that can yield useful insights is thus an important capability. When making such comparisons, it is often useful to give greater weight to comparisons near the head of a ranked list than to what happens further down. This is the focus of the widely used τAP measure. When scores are available, gap-sensitive measures give greater weight to larger differences than to smaller ones. This is the focus of the widely used Pearson correlation measure (ρ). This paper introduces a new measure, τGAP, which combines both features. System comparisons from the TREC 5 Ad Hoc track are used to illustrate the differences in emphasis achieved by τAP, ρ, and the proposed τGAP.
DOI: https://doi.org/10.1145/2766462.2767793 (published 2015-08-09)
Citations: 6
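The two existing measures named in the abstract above can be sketched briefly. The code below follows the commonly used definition of AP correlation (for each item below the top of the candidate ranking, the fraction of items above it that the reference ranking also places above it, averaged and rescaled to [-1, 1]) and the standard Pearson coefficient. The proposed τGAP is not implemented, since its definition is not given in the abstract; the system names are hypothetical.

```python
import math

def tau_ap(candidate, reference):
    """AP correlation of `candidate` against `reference` (same items, best first)."""
    ref_rank = {item: rank for rank, item in enumerate(reference)}
    n = len(candidate)
    total = 0.0
    for i in range(1, n):
        item = candidate[i]
        # Items above `item` in the candidate that the reference also ranks above it.
        agree = sum(1 for other in candidate[:i] if ref_rank[other] < ref_rank[item])
        total += agree / i
    return 2.0 * total / (n - 1) - 1.0

def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

reference = ["A", "B", "C", "D", "E"]      # hypothetical system ranking
swap_head = ["B", "A", "C", "D", "E"]      # top two systems swapped
swap_tail = ["A", "B", "C", "E", "D"]      # bottom two systems swapped

head_score = tau_ap(swap_head, reference)
tail_score = tau_ap(swap_tail, reference)
```

A single swap at the head of the list lowers the AP correlation more than the same swap at the tail, which is the head-weighting property the abstract refers to. Pearson, by contrast, works on the scores themselves and so responds to the size of the gaps rather than to rank position.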
Untangling Result List Refinement and Ranking Quality: a Framework for Evaluation and Prediction
Jiyin He, M. Bron, A. D. Vries, L. Azzopardi, M. de Rijke
Traditional batch evaluation metrics assume that user interaction with search results is limited to scanning down a ranked list. However, modern search interfaces come with additional elements supporting result list refinement (RLR) through facets and filters, making user search behavior increasingly dynamic. We develop an evaluation framework that takes a step beyond the interaction assumption of traditional evaluation metrics and allows for batch evaluation of systems with and without RLR elements. In our framework, we model user interaction as switching between different sublists. This provides a measure of user effort based on the joint effect of user interaction with RLR elements and result quality. We validate our framework by conducting a user study and comparing model predictions with real user performance. Our model predictions show significant positive correlation with real user effort. Further, in contrast to traditional evaluation metrics, the predictions made with our framework of when users stand to benefit from RLR elements reflect findings from our user study. Finally, we use the framework to investigate under what conditions systems with and without RLR elements are likely to be effective. We simulate varying conditions concerning ranking quality, users, task and interface properties, demonstrating a cost-effective way to study whole-system performance.
DOI: https://doi.org/10.1145/2766462.2767740 (published 2015-08-09)
Citations: 5