Proceedings of the 13th International Conference on Web Search and Data Mining: Latest Articles

LARA
Pub Date : 2020-01-20 DOI: 10.1093/benz/9780199773787.article.b00104587
Changfeng Sun, Han Liu, Meng Liu, Z. Ren, Tian Gan, Liqiang Nie
Paraphrase the content of the speaker’s words. This step is especially helpful in confirming that you and the speaker are on the same page. If you can put what the speaker says into your own words, it demonstrates you’ve listened attentively and allows the speaker to correct or clarify any misunderstanding. Express a connection between what the speaker said and what you heard. It could be a feeling, an experience, or a common principle shared with the other person.
Citations: 2
Overview of the Health Search and Data Mining (HSDM 2020) Workshop
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371879
Carsten Eickhoff, Yubin Kim, Ryen W. White
We present HSDM, a full-day workshop on Health Search and Data Mining co-located with WSDM 2020's Health Day. This event builds on recent biomedical workshops in the NLP and ML communities but puts a clear emphasis on search and data mining (and their intersection) that is lacking in other venues. The program will include two keynote addresses by key opinion leaders in the clinical, search, and data mining domains. The technical program consists of 6 original research presentations. Finally, we will close with a panel discussion with keynote speakers, PC members, and the audience. This workshop aims to help consolidate the growing interest in biomedical applications of data-driven methods that becomes apparent all over the search and data mining spectrum, in WSDM's spirit of collaboration between industry and academia.
Citations: 10
Capreolus: A Toolkit for End-to-End Neural Ad Hoc Retrieval
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371868
Andrew Yates, Siddhant Arora, Xinyu Crystina Zhang, Wei Yang, Kevin Martin Jose, Jimmy J. Lin
We present Capreolus, a toolkit designed to facilitate end-to-end ad hoc retrieval experiments with neural networks by providing implementations of prominent neural ranking models within a common framework. Our toolkit adopts a standard reranking architecture via tight integration with the Anserini toolkit for candidate document generation using standard bag-of-words approaches. Using Capreolus, we are able to reproduce Yang et al.'s recent SIGIR 2019 finding that, in a reranking scenario on the test collection from the TREC 2004 Robust Track, many neural retrieval models do not significantly outperform a strong query expansion baseline. Furthermore, we find that this holds true for five additional models implemented in Capreolus. We describe the architecture and design of our toolkit, which includes a Web interface to facilitate comparisons between rankings returned by different models.
Citations: 21
personality2vec: Enabling the Analysis of Behavioral Disorders in Social Networks
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371865
A. Beheshti, V. Hashemi, S. Yakhchi, H. M. Nezhad, S. Ghafari, Jian Yang
Enabling the analysis of behavioral disorders over time in social networks can help in suicide prevention, (school) bullying detection, and extremist/criminal activity prediction. In this paper, we present a novel data analytics pipeline to enable the analysis of patterns of behavioral disorders on social networks. We present a Social Behavior Graph (sbGraph) model to enable the analysis of factors that drive behavioral disorders over time. We use the gold standards in personality, behavior, and attitude to build a domain-specific Knowledge Base (KB). We use this domain knowledge to design cognitive services that automatically contextualize the raw social data and prepare them for behavioral analytics. Then we introduce a pattern-based word embedding technique, namely personality2vec, on each feature extracted to build the sbGraph. The goal is to use a mathematical embedding from a space with one dimension per feature to a continuous vector space that can be mapped to classes of behavioral disorders (such as cyber-bullying and radicalization) in the domain-specific KB. We implement an interactive dashboard to enable social network analysts to analyze and understand the patterns of behavioral disorders over time. We focus on a motivating scenario in the Australian government's Office of the eSafety Commissioner, where the goal is to empower all citizens to have safer, more positive experiences online.
Citations: 29
Label Distribution Augmented Maximum Likelihood Estimation for Reading Comprehension
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371835
Lixin Su, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xueqi Cheng
Reading comprehension (RC) aims to locate a text span from a context passage to answer the given question. Despite the effectiveness of modern neural RC models, most existing work relies on maximum likelihood estimation (MLE) and ignores the structure of the output space. That is, during training, one treats all text spans that do not match the ground truth as equally poor, leading to overconfident predictions on ground truth labels and reduced generalization ability at test time. One way to bridge the gap between training and test is to take into account the task reward of alternative outputs using reinforcement learning (RL) algorithms, which are often deficient in optimization as compared with MLE. In this paper, we propose a new learning criterion for the RC task which combines the merits of both MLE and RL-based methods. Specifically, we show that we are able to derive the distribution of the outputs, i.e., the label distribution, using their corresponding task rewards based on the decomposition property of the RC problem. We then optimize the RC model by directly learning towards the auxiliary label distribution, instead of the ground truth label, using the MLE framework. In this way, we can make use of the structure of the output space for better generalization (as in RL) via efficient optimization (as in MLE). We name our approach Label Distribution augmented MLE (LD-MLE), which is a general learning criterion that could be adopted by almost all existing RC models. Experiments on three representative benchmark datasets demonstrate that RC models learned with the LD-MLE criterion can achieve consistently improved results over those based on the traditional MLE and RL-based criteria.
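The idea of training toward a reward-derived label distribution rather than a single ground-truth span can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the reward is a token-overlap F1 between a candidate span and the gold span, the rewards are normalized with a temperature softmax, and the names `span_f1`, `label_distribution`, and `soft_label_loss` are hypothetical. The paper's actual derivation relies on a decomposition property of the RC problem that is not reproduced here.

```python
import math


def span_f1(pred, gold):
    """Token-overlap F1 between two inclusive (start, end) spans (assumed reward)."""
    overlap = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]) + 1)
    if overlap == 0:
        return 0.0
    precision = overlap / (pred[1] - pred[0] + 1)
    recall = overlap / (gold[1] - gold[0] + 1)
    return 2 * precision * recall / (precision + recall)


def label_distribution(candidates, gold, temperature=0.5):
    """Turn per-span rewards into a soft target distribution (assumed softmax form)."""
    rewards = [span_f1(c, gold) for c in candidates]
    weights = [math.exp(r / temperature) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]


def soft_label_loss(target, predicted):
    """Cross-entropy of the model's span probabilities against the soft target."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


# Hypothetical candidates: the gold span, a near miss, and an unrelated span.
candidates = [(2, 4), (2, 3), (7, 8)]
target = label_distribution(candidates, gold=(2, 4))
loss = soft_label_loss(target, predicted=[0.6, 0.3, 0.1])
```

Under this sketch a near-miss span receives more target mass than an unrelated one, so the model is no longer pushed to treat all non-gold spans as equally wrong, while the loss keeps the same cross-entropy form used in ordinary MLE training.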
Citations: 2
TSA: A Truthful Mechanism for Social Advertising
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371809
Tobias Grubenmann, Reynold Cheng, L. Lakshmanan
Social advertising exploits the interconnectivity of users in social networks to spread advertisement and generate user engagements. A lot of research has focused on how to select the best subset of users in a social network to maximize the number of engagements or the generated revenue of the advertisement. However, there is a lack of studies that consider the advertiser's value-per-engagement, i.e., how much an advertiser is maximally willing to pay for each engagement. Prior work on social advertising is based on the classical framework of influence maximization. In this paper, we propose a model where advertisers compete in an auction mechanism for the influential users within a social network. The auction mechanism can dynamically determine payments for advertisers based on their reported values. The main problem is to find auctions which incentivize advertisers to truthfully reveal their values, and also respect each advertiser's budget constraint. To tackle this problem, we propose a new truthful auction mechanism called TSA. Compared with existing approaches on real and synthetic datasets, TSA performs significantly better in terms of generated revenue.
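To make the notion of a truthful auction concrete, the sketch below uses the classical single-item second-price (Vickrey) auction, in which the winner pays the second-highest reported value. This is explicitly not the TSA mechanism: it ignores budgets, the social network, and influence spread, and the function names and bid values are hypothetical. It only illustrates the property the abstract asks for, namely that an advertiser cannot gain by misreporting its value.

```python
def second_price_auction(bids):
    """bids: dict mapping advertiser -> reported value. Returns (winner, payment)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    # The winner pays the second-highest report, not its own report.
    payment = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, payment


def utility(true_value, advertiser, bids):
    """Utility of `advertiser` given everyone's reported bids."""
    winner, payment = second_price_auction(bids)
    return true_value - payment if winner == advertiser else 0.0


# Advertiser "a" truly values the engagement at 5; the others report 3 and 1.
others = {"b": 3.0, "c": 1.0}
truthful = utility(5.0, "a", {"a": 5.0, **others})
# No alternative report gives "a" a higher utility than reporting truthfully.
deviations = [utility(5.0, "a", {"a": r, **others}) for r in (0.0, 2.0, 3.5, 6.0, 10.0)]
```

Because the payment does not depend on the winner's own report, overbidding or underbidding can only change whether the advertiser wins, never the price paid for winning. Budget constraints break this simple argument, which is part of what makes the setting in the abstract harder.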
Citations: 3
TSA
Pub Date : 2020-01-20 DOI: 10.1007/978-3-319-67199-4_103993
Tobias Grubenmann, Reynold Cheng, L. Lakshmanan
Citations: 0
A Stochastic Treatment of Learning to Rank Scoring Functions
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371844
Sebastian Bruch, Shuguang Han, Michael Bendersky, Marc Najork
Learning to Rank, a central problem in information retrieval, is a class of machine learning algorithms that formulate ranking as an optimization task. The objective is to learn a function that produces an ordering of a set of documents in such a way that the utility of the entire ordered list is maximized. Learning-to-rank methods do so by learning a function that computes a score for each document in the set. A ranked list is then compiled by sorting documents according to their scores. While such a deterministic mapping of scores to permutations makes sense during inference, where stability of ranked lists is required, we argue that its greedy nature during training leads to less robust models. This is particularly problematic when the loss function under optimization (in agreement with ranking metrics) largely penalizes incorrect rankings and does not take into account the distribution of raw scores. In this work, we present a stochastic framework where, instead of a deterministic derivation of permutations from raw scores, permutations are sampled from a distribution defined by raw scores. Our proposed sampling method is differentiable and works well with gradient descent optimizers. We analytically study our proposed method and demonstrate when and why it leads to model robustness. We also show empirically, through experiments on publicly available learning-to-rank datasets, that the application of our proposed method to a class of ranking loss functions leads to significant model quality improvements.
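The contrast between deterministic sorting and sampling a permutation from a score-defined distribution can be sketched as follows. The abstract does not say which distribution is used, so this sketch assumes a Plackett-Luce distribution, sampled by adding Gumbel noise to the scores and sorting; the function names and the `temperature` parameter are hypothetical. The sketch covers only the sampling step in plain Python and does not attempt to reproduce the differentiable training procedure the abstract describes.

```python
import math
import random


def deterministic_ranking(scores):
    """Usual inference-time ranking: sort document indices by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def sample_permutation(scores, rng, temperature=1.0):
    """Sample a ranking by perturbing scores with Gumbel noise, then sorting.

    Sorting scores plus i.i.d. Gumbel(0, 1) noise yields a draw from the
    Plackett-Luce distribution with weights exp(score / temperature).
    """
    noisy = []
    for idx, score in enumerate(scores):
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((score / temperature + gumbel, idx))
    noisy.sort(reverse=True)
    return [idx for _, idx in noisy]


rng = random.Random(0)
scores = [0.1, 2.0, 1.0]
fixed = deterministic_ranking(scores)
sampled = [sample_permutation(scores, rng) for _ in range(5)]
```

With these scores the deterministic ranking always places document 1 first, while the sampled rankings place it first only most of the time, so documents with close scores trade places across samples instead of being ordered by an arbitrarily small margin.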
Citations: 54
VISION-KG: Topic-centric Visualization System for Summarizing Knowledge Graph
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371863
Jiaqi Wei, Shuo Han, Lei Zou
Large-scale knowledge graphs (KGs) have recently attracted wide attention in both academia and industry. However, due to the complexity of SPARQL syntax and the massive volume of real KGs, it remains difficult for ordinary users to access them. In this demo, we present VISION-KG, a topic-centric visualization system that helps users navigate a KG easily via entity summarization and entity clustering. Given a query entity v0, VISION-KG summarizes the induced subgraph of v0's neighbor nodes via our proposed fact ranking method, which measures importance, relatedness, and diversity. Moreover, to achieve conciseness, we split the summarized graph into several topic-centric summarized subgraphs according to semantic and structural similarities among entities. We will demonstrate how VISION-KG provides a user-friendly visualization interface for navigating a KG.
Citations: 4
User Recommendation in Content Curation Platforms
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371822
Jianling Wang, Ziwei Zhu, James Caverlee
We propose a personalized user recommendation framework for content curation platforms that models preferences for both users and the items they engage with simultaneously. In this way, user preferences for specific item types (e.g., fantasy novels) can be balanced with user specialties (e.g., reviewing novels with strong female protagonists). In particular, the proposed model has three unique characteristics: (i) it simultaneously learns both user-item and user-user preferences through a multi-aspect autoencoder model; (ii) it fuses the latent representations of user preferences on users and items to construct shared factors through an adversarial framework; and (iii) it incorporates an attention layer to produce weighted aggregations of different latent representations, leading to improved personalized recommendation of users and items. Through experiments against state-of-the-art models, we find the proposed framework leads to an 18.43% (Goodreads) and 6.14% (Spotify) improvement in top-k user recommendation.
Citations: 7