Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Papers

Knowledge Tracing with Sequential Key-Value Memory Networks
Ghodai M. Abdelrahman, Qing Wang
Can machines trace human knowledge like humans? Knowledge tracing (KT) is a fundamental task in a wide range of educational applications, such as massive open online courses (MOOCs), intelligent tutoring systems, educational games, and learning management systems. It models the dynamics of a student's knowledge states in relation to different learning concepts through the student's interactions with learning activities. Recently, several attempts have been made to use deep learning models to tackle the KT problem. Although these models have shown promising results, they have limitations: they either lack the ability to trace how a student masters specific concepts in a knowledge state, or fail to capture long-term dependencies in an exercise sequence. In this paper, we address these limitations by proposing a novel deep learning model for knowledge tracing, namely Sequential Key-Value Memory Networks (SKVMN). This model unifies the strengths of the recurrent modelling capacity and the memory capacity of existing deep learning KT models for modelling student learning. We have extensively evaluated our proposed model on five benchmark datasets. The experimental results show that (1) SKVMN outperforms the state-of-the-art KT models on all datasets, (2) SKVMN can better discover the correlation between latent concepts and questions, and (3) SKVMN can trace students' knowledge states dynamically and leverage sequential dependencies in an exercise sequence for improved prediction accuracy.
DOI: 10.1145/3331184.3331195 (https://doi.org/10.1145/3331184.3331195). Published 2019-07-18.
Citations: 96
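The SKVMN abstract refers to the memory capacity of key-value memory networks. As a rough illustration only, the following sketch shows a generic key-value memory read: attention weights come from the similarity between a query and the keys, and the read vector is the weighted sum of the values. It is not the SKVMN architecture; the function names, the dot-product similarity, and the toy vectors are all assumptions made for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, keys, values):
    """Generic key-value memory read (illustrative, not the paper's model).

    query:  embedding of the current question (assumed)
    keys:   one vector per latent concept slot (assumed)
    values: one knowledge-state vector per slot (assumed)
    """
    # Similarity between the query and each key (dot product is an assumption).
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The read vector is the attention-weighted sum of the value slots.
    read = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, read


weights, read = read_memory([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[2.0], [4.0]])
print(weights, read)
```

In this toy call the query is equally similar to both keys, so the read vector is the plain average of the two value slots.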
Session details: Session 2A: Question Answering
C. Shah
DOI: 10.1145/3349678 (https://doi.org/10.1145/3349678). Published 2019-07-18.
Citations: 0
Graph Intention Network for Click-through Rate Prediction in Sponsored Search
Feng Li, Zhenrui Chen, Pengjie Wang, Yi Ren, Di Zhang, Xiaoyu Zhu
Estimating click-through rate (CTR) accurately has an essential impact on improving user experience and revenue in sponsored search. A CTR prediction model needs to identify the user's real-time search intention. Most current work mines intentions from users' real-time behaviors. However, it is difficult to capture the intention when user behaviors are sparse, which causes the behavior sparsity problem. Moreover, it is difficult for users to jump out of their specific historical behaviors to explore possible interests, namely the weak generalization problem. We propose a new approach, Graph Intention Network (GIN), based on a co-occurrence commodity graph, to mine user intention. By adopting multi-layered graph diffusion, GIN enriches user behaviors to solve the behavior sparsity problem. By introducing the co-occurrence relationship of commodities to explore potential preferences, the weak generalization problem is also alleviated. To the best of our knowledge, the GIN method is the first to introduce graph learning for user intention mining in CTR prediction and to propose end-to-end joint training of graph learning and CTR prediction tasks in sponsored search. At present, GIN has achieved excellent offline results on real-world data from an e-commerce platform, outperforming existing deep learning models, and has been running stably in online tests, where it achieved significant CTR improvements.
DOI: 10.1145/3331184.3331283 (https://doi.org/10.1145/3331184.3331283). Published 2019-07-18.
Citations: 42
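The GIN abstract mentions a co-occurrence commodity graph and multi-layered graph diffusion. The sketch below is a heavily simplified, non-learned illustration of those two ideas: it counts how often items co-occur in sessions and then expands a sparse set of clicked items by following the strongest edges for a few hops. The session format, the `top_n` neighbor cut-off, and both function names are assumptions for this example; the paper's model is trained end to end and is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sessions):
    """Count how often two items appear in the same session (assumed input format)."""
    graph = defaultdict(lambda: defaultdict(int))
    for items in sessions:
        for a, b in combinations(sorted(set(items)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def diffuse(graph, seeds, layers=2, top_n=2):
    """Expand seed items hop by hop along the heaviest co-occurrence edges."""
    frontier, seen = set(seeds), set(seeds)
    for _ in range(layers):
        reached = set()
        for item in frontier:
            # Strongest neighbors first; ties are broken by item name.
            neighbors = sorted(graph.get(item, {}).items(),
                               key=lambda kv: (-kv[1], kv[0]))[:top_n]
            reached.update(name for name, _ in neighbors)
        frontier = reached - seen
        seen |= reached
    return seen


sessions = [["a", "b"], ["b", "c"], ["c", "d"]]
graph = build_cooccurrence_graph(sessions)
print(sorted(diffuse(graph, ["a"], layers=2)))
```

Starting from the single clicked item "a", two hops reach "b" and then "c", which shows how diffusion can enrich a sparse behavior list.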
SIGIR 2019 Tutorial on Explainable Recommendation and Search
Yongfeng Zhang, Jiaxin Mao, Qingyao Ai
Explainable recommendation and search attempt to develop models or methods that not only generate high-quality recommendation or search results, but also provide intuitive explanations of the results for users or system designers, which can help to improve system transparency, persuasiveness, trustworthiness, and effectiveness. This is even more important in personalized search and recommendation scenarios, where users would like to know why a particular product, web page, news report, or friend suggestion appears in their own search and recommendation lists. The tutorial focuses on the research and application of explainable recommendation and search algorithms, as well as their application in real-world systems such as search engines, e-commerce, and social networks. The tutorial aims at introducing and communicating explainable recommendation and search methods to the community, as well as gathering researchers and practitioners interested in this research direction for discussion, exchange of ideas, and promotion of research.
DOI: 10.1145/3331184.3331390 (https://doi.org/10.1145/3331184.3331390). Published 2019-07-18.
Citations: 8
Item Recommendation by Combining Relative and Absolute Feedback Data
Saikishore Kalloori, Tianyu Li, F. Ricci
User preferences in the form of absolute feedback, such as ratings, are widely exploited in Recommender Systems (RSs). Recent research has explored the usage of preferences expressed with pairwise comparisons, which signal relative feedback. It has been shown that pairwise comparisons can be effectively combined with ratings, but it is important to fine-tune the technique that leverages both types of feedback. Previous approaches train a single model by converting ratings into pairwise comparisons, and then use only that type of data. However, we claim that these two types of preferences reveal different information about users' interests and should be exploited differently. Hence, in this work, we develop a ranking technique that separately exploits absolute and relative preferences in a hybrid model. In particular, we propose a joint loss function which is computed on both absolute and relative preferences of users. Our proposed ranking model uses pairwise comparison data to predict the user's preference order between pairs of items and uses ratings to push highly rated (relevant) items to the top of the ranking. Experimental results on three different data sets demonstrate that the proposed technique outperforms competitive baseline algorithms on popular ranking-oriented evaluation metrics.
DOI: 10.1145/3331184.3331295 (https://doi.org/10.1145/3331184.3331295). Published 2019-07-18.
Citations: 6
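The abstract by Kalloori et al. describes a joint loss computed on both absolute (rating) and relative (pairwise) preferences, without giving its form. One plausible shape for such a loss, shown purely as an assumption, is a weighted sum of a squared error on ratings and a logistic loss on pairwise score differences. The weighting parameter `alpha`, the function name, and both component losses are hypothetical choices for this sketch, not the authors' definition.

```python
import math


def joint_loss(rating_preds, ratings, pair_scores, alpha=0.5):
    """Hypothetical joint loss over absolute and relative feedback.

    rating_preds, ratings: predicted and observed ratings (absolute feedback)
    pair_scores: (score_of_preferred_item, score_of_other_item) tuples
                 (relative feedback)
    alpha: assumed trade-off weight between the two terms
    """
    # Absolute term: mean squared error between predicted and observed ratings.
    abs_loss = sum((p - r) ** 2 for p, r in zip(rating_preds, ratings))
    abs_loss /= max(len(ratings), 1)
    # Relative term: logistic loss, small when the preferred item scores higher.
    rel_loss = sum(math.log1p(math.exp(-(sp - so))) for sp, so in pair_scores)
    rel_loss /= max(len(pair_scores), 1)
    return alpha * abs_loss + (1 - alpha) * rel_loss


print(joint_loss([3.0], [3.0], [(0.0, 0.0)]))
```

With a perfect rating prediction and a tied pair, only the pairwise term contributes, and it shrinks as the preferred item's score pulls ahead of the other item's score.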
AgentBuddy: an IR System based on Bandit Algorithms to Reduce Cognitive Load for Customer Care Agents
Hrishikesh Ganu, Mithun Ghosh, Freddy Jose, S. Roshan
We describe AgentBuddy, a human-in-the-loop system that is helping Intuit improve the quality of search it offers to its internal Customer Care Agents (CCAs). AgentBuddy aims to reduce the cognitive effort on the part of the CCAs while at the same time boosting the quality of our legacy federated search system. Under the hood, it leverages bandit algorithms to improve federated search, and other ML models such as LDA and Siamese networks to help CCAs zero in on high-quality search results. An intuitive UI designed from the ground up together with the users (CCAs) is another key feature of the system. AgentBuddy has been deployed internally, and initial results from User Acceptance Trials indicate a 4x lift in the quality of highlights compared to the incumbent system.
DOI: 10.1145/3331184.3331408 (https://doi.org/10.1145/3331184.3331408). Published 2019-07-18.
Citations: 0
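The AgentBuddy abstract says the system uses bandit algorithms to improve federated search but does not name a specific algorithm. As a generic illustration, the sketch below implements UCB1, a standard bandit strategy, where each arm could stand for one search source and the reward for a click or positive feedback. The choice of UCB1, the mapping of arms to sources, and the deterministic toy rewards are all assumptions for this example.

```python
import math


class UCB1:
    """Standard UCB1 bandit; arms might represent federated search sources (assumed)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # number of times each arm was played
        self.totals = [0.0] * n_arms  # cumulative reward per arm

    def select(self):
        # Play every arm once before applying the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        t = sum(self.counts)

        def upper_bound(arm):
            mean = self.totals[arm] / self.counts[arm]
            return mean + math.sqrt(2 * math.log(t) / self.counts[arm])

        return max(range(len(self.counts)), key=upper_bound)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


# Toy simulation: arm 1 always pays 1.0 and arm 0 always pays 0.0 (assumed rewards).
bandit = UCB1(n_arms=2)
for _ in range(50):
    arm = bandit.select()
    bandit.update(arm, 1.0 if arm == 1 else 0.0)
print(bandit.counts)
```

The exploration bonus keeps the weaker arm from being abandoned entirely, while most pulls go to the arm with the higher observed reward.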
Towards Context-Aware Evaluation for Image Search
Yunqiu Shao, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma
Compared to general web search, image search engines present results in a significantly different way, which leads to changes in user behavior patterns and thus creates challenges for existing evaluation mechanisms. In this paper, we pay attention to the context factor in the image search scenario. On the basis of a mean-variance analysis, we investigate the effects of context and find that evaluation metrics align better with user satisfaction when the returned image results have high variance. Furthermore, assuming that the image results a user has examined might affect her subsequent judgments, we propose the Context-Aware Gain (CAG), a novel evaluation metric that incorporates contextual effects within the well-known gain-discount framework. Our experimental results show that, with a proper combination of discount functions, the proposed context-aware evaluation metric can significantly improve the performance of offline metrics for image search evaluation, considering user satisfaction as the gold standard.
DOI: 10.1145/3331184.3331343 (https://doi.org/10.1145/3331184.3331343). Published 2019-07-18.
Citations: 1
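The abstract by Shao et al. places its metric within the gain-discount framework, in which a ranked list is scored as the sum of per-result gains weighted by a rank-dependent discount. The sketch below shows that framework with two familiar discounts, the logarithmic discount of DCG and the geometric discount of rank-biased precision. It does not implement CAG, whose context-dependent gain is not specified in the abstract; the function names and the persistence value `p=0.8` are assumptions.

```python
import math


def gain_discount_metric(gains, discount):
    """Score a ranked list as sum(gain at rank r * discount(r)), ranks from 1."""
    return sum(g * discount(rank) for rank, g in enumerate(gains, start=1))


def dcg_discount(rank):
    """Logarithmic discount used by DCG."""
    return 1.0 / math.log2(rank + 1)


def rbp_discount(rank, p=0.8):
    """Geometric discount used by rank-biased precision (p is an assumed value)."""
    return (1 - p) * p ** (rank - 1)


gains = [1, 0, 1]  # toy relevance gains for a three-result list
print(gain_discount_metric(gains, dcg_discount))
print(gain_discount_metric(gains, rbp_discount))
```

Swapping the discount function changes how quickly lower-ranked results lose influence, which is the degree of freedom the abstract refers to when it mentions combining discount functions.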
A New Perspective on Score Standardization
Julián Urbano, Harlley Lima, A. Hanjalic
In test-collection-based evaluation of IR systems, score standardization has been proposed to compare systems across collections and minimize the effect of outlier runs on specific topics. The underlying idea is to account for the difficulty of topics, so that systems are scored relative to it. Webber et al. first proposed standardization through a non-linear transformation with the standard normal distribution, and recently Sakai proposed a simple linear transformation. In this paper, we show that both approaches are actually special cases of a simple standardization which assumes specific distributions for the per-topic scores. From this viewpoint, we argue that a transformation based on the empirical distribution is the most appropriate choice for this kind of standardization. Through a series of experiments on TREC data, we show the benefits of our proposal in terms of score stability and statistical test behavior.
DOI: 10.1145/3331184.3331315 (https://doi.org/10.1145/3331184.3331315). Published 2019-07-18.
Citations: 10
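The abstract by Urbano et al. contrasts three ways of standardizing the scores that several systems obtain on one topic: a non-linear transformation through the standard normal distribution, a linear transformation, and a transformation based on the empirical distribution. The sketch below illustrates one reading of each under stated assumptions: z-scores use the population standard deviation, the linear constants `a` and `b` are illustrative values not taken from the abstract, and the empirical variant is a plain empirical CDF. The scores on the topic are assumed not to be all equal.

```python
from statistics import NormalDist, mean, pstdev


def z_scores(scores):
    """Per-topic z-scores (population standard deviation is an assumption)."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma for s in scores]


def standardize_normal_cdf(scores):
    """Non-linear variant: map z-scores through the standard normal CDF."""
    return [NormalDist().cdf(z) for z in z_scores(scores)]


def standardize_linear(scores, a=0.15, b=0.5):
    """Linear variant: a * z + b, clipped to [0, 1] (a and b are illustrative)."""
    return [min(1.0, max(0.0, a * z + b)) for z in z_scores(scores)]


def standardize_ecdf(scores):
    """Empirical variant: fraction of systems scoring at or below each system."""
    n = len(scores)
    return [sum(1 for t in scores if t <= s) / n for s in scores]


topic_scores = [0.1, 0.2, 0.3]  # toy scores of three systems on one topic
print(standardize_normal_cdf(topic_scores))
print(standardize_linear(topic_scores))
print(standardize_ecdf(topic_scores))
```

All three variants map the middle system to roughly the center of the unit interval; they differ in how strongly they stretch or compress scores far from the topic mean.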
Quantifying Bias and Variance of System Rankings
G. Cormack, Maura R. Grossman
When used to assess the accuracy of system rankings, Kendall's tau and other rank correlation measures conflate bias and variance as sources of error. We derive from tau a distance between rankings in Euclidean space, from which we can determine the magnitude of bias, variance, and error. Using bootstrap estimation, we show that shallow pooling has substantially higher bias and insubstantially lower variance than probability-proportional-to-size sampling, coupled with the recently released dynAP estimator.
DOI: 10.1145/3331184.3331356 (https://doi.org/10.1145/3331184.3331356). Published 2019-07-18.
Citations: 5
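The abstract by Cormack and Grossman starts from Kendall's tau as a measure of agreement between two system rankings. For reference, the sketch below computes the basic tau-a coefficient from two lists of per-system scores by counting concordant and discordant pairs. It covers only tau itself; the Euclidean distance that the authors derive from it is not described in the abstract and is not attempted here. The function name and toy scores are assumptions.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same systems."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        # Positive product: both measurements order systems i and j the same way.
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / n_pairs


reference = [0.1, 0.2, 0.3]  # toy scores of three systems under full judgments
estimated = [0.1, 0.3, 0.2]  # toy scores under a cheaper evaluation
print(kendall_tau(reference, estimated))
```

A single tau value like this cannot say whether the disagreement is systematic or due to chance, which is the conflation of bias and variance that the abstract points out.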
Unbiased Low-Variance Estimators for Precision and Related Information Retrieval Effectiveness Measures
G. Cormack, Maura R. Grossman
This work describes an estimator from which unbiased measurements of precision, rank-biased precision, and cumulative gain may be derived from a uniform or non-uniform sample of relevance assessments. Adversarial testing supports the theory that our estimator yields unbiased low-variance measurements from sparse samples, even when used to measure results that are qualitatively different from those returned by known information retrieval methods. Our results suggest that test collections using sampling to select documents for relevance assessment yield more accurate measurements than test collections using pooling, especially for the results of retrieval methods not contributing to the pool.
DOI: 10.1145/3331184.3331355 (https://doi.org/10.1145/3331184.3331355). Published 2019-07-18.
Citations: 2
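The abstract above concerns unbiased estimates of precision from a uniform or non-uniform sample of relevance assessments. The authors' estimator is not specified in the abstract, so the sketch below uses the classical inverse-probability (Horvitz-Thompson) estimator as a stand-in: each judged document is weighted by the reciprocal of its inclusion probability. Independent per-document sampling, the function names, and the toy probabilities are assumptions. The second function checks unbiasedness by enumerating every possible sample.

```python
from itertools import product


def ht_precision_at_k(sampled, k):
    """Inverse-probability estimate of precision at k.

    sampled: (relevance, inclusion_probability) for each judged document
             among the top k results
    """
    return sum(rel / pi for rel, pi in sampled) / k


def expected_estimate(relevance, probs):
    """Exact expectation of the estimator under independent sampling (assumed)."""
    k = len(relevance)
    total = 0.0
    for included in product([False, True], repeat=k):
        prob_of_sample = 1.0
        sample = []
        for inc, rel, pi in zip(included, relevance, probs):
            prob_of_sample *= pi if inc else (1 - pi)
            if inc:
                sample.append((rel, pi))
        total += prob_of_sample * ht_precision_at_k(sample, k)
    return total


relevance = [1, 0, 1, 1]       # toy ground truth for the top 4 results
probs = [0.9, 0.5, 0.4, 0.2]   # toy non-uniform inclusion probabilities
print(sum(relevance) / len(relevance), expected_estimate(relevance, probs))
```

The expectation matches the true precision even though the inclusion probabilities are non-uniform; the variance of individual estimates depends on how the probabilities are chosen.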