The World Wide Web Conference: Latest Papers

A Family of Fuzzy Orthogonal Projection Models for Monolingual and Cross-lingual Hypernymy Prediction
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313439
Chengyu Wang, Yan Fan, Xiaofeng He, Aoying Zhou
Hypernymy is a semantic relation, expressing the “is-a” relation between a concept and its instances. Such relations are building blocks for large-scale taxonomies, ontologies and knowledge graphs. Recently, much progress has been made for hypernymy prediction in English using textual patterns and/or distributional representations. However, applying such techniques to other languages is challenging due to the high language dependency of these methods and the lack of large training datasets of lower-resourced languages. In this work, we present a family of fuzzy orthogonal projection models for both monolingual and cross-lingual hypernymy prediction. For the monolingual task, we propose a Multi-Wahba Projection (MWP) model to distinguish hypernymy vs. non-hypernymy relations based on word embeddings. This model establishes distributional fuzzy mappings from embeddings of a term to those of its hypernyms and non-hypernyms, which consider the complicated linguistic regularities of these relations. For cross-lingual hypernymy prediction, a Transfer MWP (TMWP) model is proposed to transfer the semantic knowledge from the source language to target languages based on neural word translation. Additionally, an Iterative Transfer MWP (ITMWP) model is built upon TMWP, which augments the training sets of target languages when target languages are lower-resourced with limited training data. Experiments show i) MWP outperforms previous methods over two hypernymy prediction tasks for English; and ii) TMWP and ITMWP are effective to predict hypernymy over seven non-English languages.
Citations: 15
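As a toy illustration of the orthogonal-projection idea in the abstract above, the sketch below solves Wahba's problem in two dimensions: it finds the rotation that best maps term vectors onto hypernym vectors in the weighted least-squares sense. It is not the paper's MWP model. The function names, the restriction to 2-D, the optional weights (a stand-in for fuzzy memberships), and the example vectors are all assumptions made for illustration.

```python
import math


def solve_wahba_2d(xs, ys, weights=None):
    """Return the angle theta whose rotation R(theta) minimizes
    sum_i w_i * ||y_i - R(theta) x_i||^2 over 2-D vector pairs."""
    if weights is None:
        weights = [1.0] * len(xs)
    # The objective reduces to maximizing a*cos(theta) + b*sin(theta),
    # where a sums weighted dot products and b sums weighted 2-D cross products.
    a = sum(w * (x[0] * y[0] + x[1] * y[1]) for w, x, y in zip(weights, xs, ys))
    b = sum(w * (x[0] * y[1] - x[1] * y[0]) for w, x, y in zip(weights, xs, ys))
    return math.atan2(b, a)


def rotate(x, theta):
    """Apply the 2-D rotation R(theta) to the vector x."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])


# Hypothetical 2-D "term" embeddings and their "hypernym" embeddings,
# generated here by rotating the terms by a known angle of 0.5 radians.
terms = [(1.0, 0.0), (0.5, 2.0), (-1.5, 0.3)]
hypernyms = [rotate(x, 0.5) for x in terms]

theta = solve_wahba_2d(terms, hypernyms)
```

In higher dimensions the same least-squares problem is usually solved with a singular value decomposition rather than a closed-form angle; the 2-D case is used here only because it needs nothing beyond the standard library.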
Addressing Trust Bias for Unbiased Learning-to-Rank
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313697
Aman Agarwal, Xuanhui Wang, Cheng Li, Michael Bendersky, Marc Najork
Existing unbiased learning-to-rank models use counterfactual inference, notably Inverse Propensity Scoring (IPS), to learn a ranking function from biased click data. They handle the click incompleteness bias, but usually assume that the clicks are noise-free, i.e., a clicked document is always assumed to be relevant. In this paper, we relax this unrealistic assumption and study click noise explicitly in the unbiased learning-to-rank setting. Specifically, we model the noise as the position-dependent trust bias and propose a noise-aware Position-Based Model, named TrustPBM, to better capture user click behavior. We propose an Expectation-Maximization algorithm to estimate both examination and trust bias from click data in TrustPBM. Furthermore, we show that it is difficult to use a pure IPS method to incorporate click noise and thus propose a novel method that combines a Bayes rule application with IPS for unbiased learning-to-rank. We evaluate our proposed methods on three personal search data sets and demonstrate that our proposed model can significantly outperform the existing unbiased learning-to-rank methods.
Citations: 74
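The Inverse Propensity Scoring idea named in the abstract above can be sketched as follows, assuming a position-based model in which a click requires the result to be examined and the examination propensity of each rank is already known. All names and numbers are hypothetical. This plain estimator treats every click as a sign of relevance, which is the noise-free assumption the paper relaxes; it does not implement TrustPBM or the paper's Bayes-rule correction.

```python
def ips_relevance_estimate(clicks, positions, propensity):
    """Average of click / propensity over all impressions of one document.

    clicks[i]     -- 1 if impression i was clicked, else 0
    positions[i]  -- rank at which the document was shown in impression i
    propensity[k] -- assumed probability that rank k is examined
    """
    weighted = [c / propensity[k] for c, k in zip(clicks, positions)]
    return sum(weighted) / len(weighted)


# Hypothetical examination propensities per rank.
propensity = {1: 1.0, 2: 0.5, 3: 0.25}

# One document shown four times at rank 2 and clicked once.
clicks = [1, 0, 0, 0]
positions = [2, 2, 2, 2]

naive_rate = sum(clicks) / len(clicks)
ips_rate = ips_relevance_estimate(clicks, positions, propensity)
```

Dividing by the propensity compensates for impressions in which the document was never examined, so the corrected rate is higher than the raw click rate for results shown at low ranks.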
MARINE: Multi-relational Network Embeddings with Relational Proximity and Node Attributes
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313715
Ming-Han Feng, Chin-Chi Hsu, Cheng-te Li, Mi-Yen Yeh, Shou-de Lin
Network embedding aims at learning an effective vector transformation for entities in a network. We observe that there are two diverse branches of network embedding: for homogeneous graphs and for multi-relational graphs. This paper then proposes MARINE, a unified embedding framework for both homogeneous and multi-relational networks to preserve both the proximity and relation information. We also extend the framework to incorporate existing features of nodes in a graph, which can further be exploited for the ensemble of embedding. Our solution possesses complexity linear to the number of edges, which is suitable for large-scale network applications. Experiments conducted on several real-world network datasets, along with applications in link prediction and multi-label classification, exhibit the superiority of our proposed MARINE.
Citations: 21
Bridging Screen Readers and Voice Assistants for Enhanced Eyes-Free Web Search
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3314136
Alexandra Vtyurina, Adam Fourney, M. Morris, Leah Findlater, Ryen W. White
People with visual impairments often rely on screen readers when interacting with computer systems. Increasingly, these individuals also make extensive use of voice-based virtual assistants (VAs). We conducted a survey of 53 people who are legally blind to identify the strengths and weaknesses of both technologies, as well as the unmet opportunities at their intersection. We learned that virtual assistants are convenient and accessible, but lack the ability to deeply engage with content (e.g., read beyond the first few sentences of Wikipedia), and the ability to get a quick overview of the landscape (list alternative search results & suggestions). In contrast, screen readers allow for deep engagement with content (when content is accessible), and provide fine-grained navigation & control, but at the cost of increased complexity, and reduced walk-up-and-use convenience. In this demonstration, we showcase VERSE, a system that combines the positive aspects of VAs and screen readers, and allows other devices (e.g., smart watches) to serve as optional input accelerators. Together, these features allow people with visual impairments to deeply engage with web content through voice interaction.
Citations: 40
Learning Clusters through Information Diffusion
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313560
L. Ostroumova, Alexey Tikhonov, N. Litvak
When information or infectious diseases spread over a network, in many practical cases, one can observe when nodes adopt information or become infected, but the underlying network is hidden. In this paper, we analyze the problem of finding communities of highly interconnected nodes, given only the infection times of nodes. We propose, analyze, and empirically compare several algorithms for this task. The most stable performance, that improves the current state-of-the-art, is obtained by our proposed heuristic approaches, that are agnostic to a particular graph structure and epidemic model.
Citations: 13
Self- and Cross-Excitation in Stack Exchange Question & Answer Communities
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313440
Tiago Santos, Simon Walk, Roman Kern, M. Strohmaier, D. Helic
In this paper, we quantify the impact of self- and cross-excitation on the temporal development of user activity in Stack Exchange Question & Answer (Q&A) communities. We study differences in user excitation between growing and declining Stack Exchange communities, and between those dedicated to STEM and humanities topics by leveraging Hawkes processes. We find that growing communities exhibit early stage, high cross-excitation by a small core of power users reacting to the community as a whole, and strong long-term self-excitation in general and cross-excitation by casual users in particular, suggesting community openness towards less active users. Further, we observe that communities in the humanities exhibit long-term power user cross-excitation, whereas in STEM communities activity is more evenly distributed towards casual user self-excitation. We validate our findings via permutation tests and quantify the impact of these excitation effects with a range of prediction experiments. Our work enables researchers to quantitatively assess the evolution and activity potential of Q&A communities.
Citations: 7
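To make the terms self-excitation and cross-excitation in the abstract above concrete, here is a minimal sketch of the conditional intensity of a multivariate Hawkes process. The exponential decay kernel, the parameter values, and the mapping of dimensions to user groups are assumptions for illustration; the abstract does not state which kernel or parameterization the authors use.

```python
import math


def hawkes_intensity(t, events, mu, alpha, beta):
    """Intensity of each dimension at time t.

    events[n]   -- past event times observed in dimension n
    mu[m]       -- baseline rate of dimension m
    alpha[m][n] -- how strongly an event in n excites m
                   (m == n is self-excitation, m != n is cross-excitation)
    beta        -- decay rate of the assumed exponential kernel
    """
    rates = []
    for m in range(len(mu)):
        rate = mu[m]
        for n, times in enumerate(events):
            # Only events strictly before t contribute, each decaying over time.
            rate += sum(
                alpha[m][n] * math.exp(-beta * (t - s)) for s in times if s < t
            )
        rates.append(rate)
    return rates


# Hypothetical setup: dimension 0 = power users, dimension 1 = casual users.
mu = [0.1, 0.2]
alpha = [[0.5, 0.0],   # power users excite themselves
         [0.3, 0.0]]   # power users also excite casual users
events = [[0.0], []]   # one power-user event at time 0

rates = hawkes_intensity(1.0, events, mu, alpha, beta=1.0)
```

Each past event raises the intensity of the dimensions it excites, and the boost fades at rate beta, so a burst of activity in one group makes further activity in linked groups temporarily more likely.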
Pcard: Personalized Restaurants Recommendation from Card Payment Transaction Records
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313494
Min Du, Robert Christensen, Wei Zhang, Feifei Li
Personalized Point of Interest (POI) recommendation that incorporates users' personal preferences is an important subject of research. However, challenges exist such as dealing with sparse rating data and spatial location factors. As one of the biggest card payment organizations in the United States, our company holds abundant card payment transaction records with numerous features. In this paper, using restaurant recommendation as a demonstrating example, we present a personalized POI recommendation system (Pcard) that learns user preferences based on user transaction history and restaurants' locations. With a novel embedding approach that captures user embeddings and restaurant embeddings, we model pairwise restaurant preferences with respect to each user based on their locations and dining histories. Finally, a ranking list of restaurants within a spatial region is presented to the user. The evaluation results show that the proposed approach is able to achieve high accuracy and present effective recommendations.
Citations: 13
Evaluating Neural Text Simplification in the Medical Domain
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313630
Laurens Van den Bercken, Robert-Jan Sips, C. Lofi
Health literacy, i.e. the ability to read and understand medical text, is a relevant component of public health. Unfortunately, many medical texts are hard to grasp by the general population as they are targeted at highly-skilled professionals and use complex language and domain-specific terms. Here, automatic text simplification making text commonly understandable would be very beneficial. However, research and development into medical text simplification is hindered by the lack of openly available training and test corpora which contain complex medical sentences and their aligned simplified versions. In this paper, we introduce such a dataset to aid medical text simplification research. The dataset is created by filtering aligned health sentences using expert knowledge from an existing aligned corpus and a novel simple, language independent monolingual text alignment method. Furthermore, we use the dataset to train a state-of-the-art neural machine translation model, and compare it to a model trained on a general simplification dataset using an automatic evaluation, and an extensive human-expert evaluation.
Citations: 48
SWeG: Lossless and Lossy Summarization of Web-Scale Graphs
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313402
Kijung Shin, A. Ghoting, Myunghwan Kim, Hema Raghavan
Given a terabyte-scale graph distributed across multiple machines, how can we summarize it, with much fewer nodes and edges, so that we can restore the original graph exactly or within error bounds? As large-scale graphs are ubiquitous, ranging from web graphs to online social networks, compactly representing graphs becomes important to efficiently store and process them. Given a graph, graph summarization aims to find its compact representation consisting of (a) a summary graph where the nodes are disjoint sets of nodes in the input graph, and each edge indicates the edges between all pairs of nodes in the two sets; and (b) edge corrections for restoring the input graph from the summary graph exactly or within error bounds. Although graph summarization is a widely-used graph-compression technique readily combinable with other techniques, existing algorithms for graph summarization are not satisfactory in terms of speed or compactness of outputs. More importantly, they assume that the input graph is small enough to fit in main memory. In this work, we propose SWeG, a fast parallel algorithm for summarizing graphs with compact representations. SWeG is designed for not only shared-memory but also MapReduce settings to summarize graphs that are too large to fit in main memory. We demonstrate that SWeG is (a) Fast: SWeG is up to 5400 × faster than its competitors that give similarly compact representations, (b) Scalable: SWeG scales to graphs with tens of billions of edges, and (c) Compact: combined with state-of-the-art compression methods, SWeG achieves up to 3.4 × better compression than them.
Citations: 35
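The representation described in the SWeG abstract (a summary graph plus edge corrections) can be illustrated with a minimal sketch of lossless restoration. This is not the authors' implementation: the function name `restore_graph`, the helper `_edge`, the parameter names, and the split of corrections into edges to add and edges to remove are assumptions made for illustration. The sketch also assumes an undirected simple graph.

```python
from itertools import combinations, product


def _edge(u, v):
    """Return an undirected edge as a canonically ordered pair."""
    return (u, v) if u <= v else (v, u)


def restore_graph(supernodes, superedges, add_edges, remove_edges):
    """Rebuild the edge set of an undirected simple graph (hypothetical API).

    supernodes:   dict mapping a supernode id to its set of original nodes
    superedges:   iterable of (id_a, id_b) pairs; (a, a) is a self-loop
    add_edges:    edges in the input graph that the summary does not imply
    remove_edges: edges the summary implies but the input graph lacks
    """
    edges = set()
    for a, b in superedges:
        if a == b:
            # A self-loop stands for every pair inside one supernode.
            pairs = combinations(sorted(supernodes[a]), 2)
        else:
            # A superedge stands for every pair across the two supernodes.
            pairs = product(supernodes[a], supernodes[b])
        edges.update(_edge(u, v) for u, v in pairs)
    # Apply the edge corrections on top of the expanded summary graph.
    edges |= {_edge(u, v) for u, v in add_edges}
    edges -= {_edge(u, v) for u, v in remove_edges}
    return edges


# Toy input: nodes 1-3 form supernode "A", nodes 4-5 form supernode "B".
supernodes = {"A": {1, 2, 3}, "B": {4, 5}}
superedges = [("A", "B")]
restored = restore_graph(
    supernodes,
    superedges,
    add_edges=[(1, 2)],
    remove_edges=[(3, 5)],
)
print(sorted(restored))
```

In this toy case, one superedge and two corrections stand in for six original edges. The representation is compact when nodes grouped into a supernode have similar neighborhoods, since fewer corrections are then needed.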
ContraVis: Contrastive and Visual Topic Modeling for Comparing Document Collections
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313617
T. Le, L. Akoglu
Given posts on 'abortion' and posts on 'religion' from a political forum, how can we find topics that are discriminative and those in common? In general, (1) how can we compare and contrast two or more different ('labeled') document collections? Moreover, (2) how can we visualize the data (in 2-d or 3-d) to best reflect the similarities and differences between the collections? We introduce (to the best of our knowledge) the first contrastive and visual topic model, called ContraVis, that jointly addresses both problems: (1) contrastive topic modeling, and (2) contrastive visualization. That is, ContraVis learns not only latent topics but also embeddings for the documents, topics and labels for visualization. ContraVis exhibits three key properties by design. It is (i) Contrastive: It enables comparative analysis of different document corpora by extracting latent discriminative and common topics across labeled documents; (ii) Visually-expressive: Different from numerous existing models, it also produces a visualization for all of the documents, labels, and the extracted topics, where proximity in the coordinate space is reflective of proximity in semantic space; (iii) Unified: It extracts topics and visual coordinates simultaneously under a joint model. Through extensive experiments on real-world datasets, we show ContraVis's potential for providing visual contrastive analysis of multiple document collections. We show both qualitatively and quantitatively that ContraVis significantly outperforms both unsupervised and supervised state-of-the-art topic models in contrastive power, semantic coherence and visual effectiveness.
Citations: 12