
The World Wide Web Conference: Latest Publications

Rethinking the Detection of Child Sexual Abuse Imagery on the Internet
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313482
Elie Bursztein, Einat Clarke, Michelle DeLaune, David M. Elifff, Nick Hsu, Lindsey Olson, John Shehan, Madhukar Thakur, Kurt Thomas, Travis Bright
Over the last decade, the illegal distribution of child sexual abuse imagery (CSAI) has transformed alongside the rise of online sharing platforms. In this paper, we present the first longitudinal measurement study of CSAI distribution online and the threat it poses to society's ability to combat child sexual abuse. Our results illustrate that CSAI has grown exponentially, to nearly 1 million detected events per month, exceeding the capabilities of independent clearinghouses and law enforcement to take action. To scale CSAI protections going forward, we discuss techniques for automating detection and response using recent advances in machine learning.
Citations: 60
Addressing Trust Bias for Unbiased Learning-to-Rank
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313697
Aman Agarwal, Xuanhui Wang, Cheng Li, Michael Bendersky, Marc Najork
Existing unbiased learning-to-rank models use counterfactual inference, notably Inverse Propensity Scoring (IPS), to learn a ranking function from biased click data. They handle the click incompleteness bias, but usually assume that the clicks are noise-free, i.e., a clicked document is always assumed to be relevant. In this paper, we relax this unrealistic assumption and study click noise explicitly in the unbiased learning-to-rank setting. Specifically, we model the noise as the position-dependent trust bias and propose a noise-aware Position-Based Model, named TrustPBM, to better capture user click behavior. We propose an Expectation-Maximization algorithm to estimate both examination and trust bias from click data in TrustPBM. Furthermore, we show that it is difficult to use a pure IPS method to incorporate click noise and thus propose a novel method that combines a Bayes rule application with IPS for unbiased learning-to-rank. We evaluate our proposed methods on three personal search data sets and demonstrate that our proposed model can significantly outperform the existing unbiased learning-to-rank methods.
Citations: 74
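A minimal sketch of the kind of noise-aware position-based click model the trust-bias abstract describes, under an assumed parameterisation: a click at rank k requires examination (probability theta_k) and then happens with probability eps_pos_k if the document is relevant or eps_neg_k if it is not. The function and parameter names are hypothetical, and this is not the authors' estimator or EM procedure; it only shows how Bayes' rule turns a noisy click into a probability of relevance, and where the plain IPS weight 1/theta_k comes from.

```python
"""Sketch of a trust-bias click model (assumed parameterisation, hypothetical names)."""


def click_prob(theta_k: float, eps_pos_k: float, eps_neg_k: float, gamma: float) -> float:
    """P(C=1) for a document shown at rank k.

    theta_k   -- examination probability at rank k
    eps_pos_k -- P(click | examined, relevant) at rank k
    eps_neg_k -- P(click | examined, not relevant) at rank k (click noise)
    gamma     -- P(relevant) for this query-document pair
    """
    return theta_k * (eps_pos_k * gamma + eps_neg_k * (1.0 - gamma))


def relevance_given_click(eps_pos_k: float, eps_neg_k: float, gamma: float) -> float:
    """Bayes' rule: P(R=1 | C=1, k).

    theta_k cancels out because, in this model, a click implies examination.
    """
    relevant_and_click = eps_pos_k * gamma
    return relevant_and_click / (relevant_and_click + eps_neg_k * (1.0 - gamma))


def ips_weight(theta_k: float) -> float:
    """Classic IPS weight for a click at rank k under a noise-free position-based model."""
    return 1.0 / theta_k


if __name__ == "__main__":
    # Noise-free case (eps_neg_k = 0): every click is taken as evidence of relevance.
    print(relevance_given_click(0.9, 0.0, 0.3))  # 1.0
    # With click noise, a click is only partial evidence of relevance.
    print(relevance_given_click(0.9, 0.2, 0.5))  # about 0.818
```

Setting eps_neg_k to zero recovers the noise-free assumption the abstract criticises, which is why the posterior collapses to 1.0 in that case.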
A Family of Fuzzy Orthogonal Projection Models for Monolingual and Cross-lingual Hypernymy Prediction
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313439
Chengyu Wang, Yan Fan, Xiaofeng He, Aoying Zhou
Hypernymy is a semantic relation expressing the “is-a” relation between a concept and its instances. Such relations are building blocks for large-scale taxonomies, ontologies and knowledge graphs. Recently, much progress has been made in hypernymy prediction for English using textual patterns and/or distributional representations. However, applying such techniques to other languages is challenging because these methods are highly language-dependent and large training datasets are lacking for lower-resourced languages. In this work, we present a family of fuzzy orthogonal projection models for both monolingual and cross-lingual hypernymy prediction. For the monolingual task, we propose a Multi-Wahba Projection (MWP) model to distinguish hypernymy from non-hypernymy relations based on word embeddings. This model establishes distributional fuzzy mappings from the embeddings of a term to those of its hypernyms and non-hypernyms, capturing the complex linguistic regularities of these relations. For cross-lingual hypernymy prediction, a Transfer MWP (TMWP) model is proposed to transfer semantic knowledge from the source language to target languages based on neural word translation. Additionally, an Iterative Transfer MWP (ITMWP) model is built upon TMWP, which augments the training sets of target languages when they are lower-resourced with limited training data. Experiments show that i) MWP outperforms previous methods on two hypernymy prediction tasks for English; and ii) TMWP and ITMWP are effective at predicting hypernymy in seven non-English languages.
Citations: 15
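The MWP model is named after Wahba's problem: find the rotation R that minimises Σ_i w_i ||b_i − R a_i||², whose classical solution comes from the SVD of M = Σ_i w_i b_i a_iᵀ. As a rough sketch of the single-projection building block only, assuming each projection maps term embeddings a_i to hypernym embeddings b_i with per-pair weights, the NumPy code below solves one weighted instance. The multiple projections, fuzzy memberships and downstream hypernymy classifier of the actual model are not reproduced, and `wahba_rotation` is a hypothetical name.

```python
from typing import Optional

import numpy as np


def wahba_rotation(A: np.ndarray, B: np.ndarray, w: Optional[np.ndarray] = None) -> np.ndarray:
    """Rotation R minimising sum_i w_i * ||b_i - R a_i||^2 (Wahba's problem).

    A, B -- (n, d) arrays; row i of A is paired with row i of B.
    w    -- optional non-negative per-pair weights (uniform if None).
    """
    if w is None:
        w = np.ones(A.shape[0])
    # Weighted cross-covariance M = sum_i w_i b_i a_i^T, shape (d, d).
    M = (B * w[:, None]).T @ A
    U, _, Vt = np.linalg.svd(M)
    # Flip the last singular direction if needed so that det(R) = +1.
    d = np.ones(A.shape[1])
    d[-1] = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    return U @ np.diag(d) @ Vt


# Usage: recover a known rotation from noise-free paired vectors.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1          # make Q a proper rotation
B = A @ Q.T                # row i is b_i = Q a_i
R = wahba_rotation(A, B)
print(np.allclose(R, Q))   # True
```

Constraining R to be orthogonal keeps the mapping norm-preserving, which is one reading of the "orthogonal projection" in the title.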
ContraVis: Contrastive and Visual Topic Modeling for Comparing Document Collections
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313617
T. Le, L. Akoglu
Given posts on 'abortion' and posts on 'religion' from a political forum, how can we find the topics that discriminate between them and the topics they share? In general, (1) how can we compare and contrast two or more different ('labeled') document collections? Moreover, (2) how can we visualize the data (in 2-D or 3-D) to best reflect the similarities and differences between the collections? We introduce (to the best of our knowledge) the first contrastive and visual topic model, called ContraVis, that jointly addresses both problems: (1) contrastive topic modeling, and (2) contrastive visualization. That is, ContraVis learns not only latent topics but also embeddings of the documents, topics and labels for visualization. ContraVis exhibits three key properties by design. It is (i) Contrastive: it enables comparative analysis of different document corpora by extracting latent discriminative and common topics across labeled documents; (ii) Visually expressive: unlike numerous existing models, it also produces a visualization of all the documents, labels, and extracted topics, where proximity in the coordinate space reflects proximity in semantic space; (iii) Unified: it extracts topics and visual coordinates simultaneously under a joint model. Through extensive experiments on real-world datasets, we show ContraVis's potential for providing visual contrastive analysis of multiple document collections. We show both qualitatively and quantitatively that ContraVis significantly outperforms both unsupervised and supervised state-of-the-art topic models in contrastive power, semantic coherence and visual effectiveness.
Citations: 12
Self- and Cross-Excitation in Stack Exchange Question & Answer Communities
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313440
Tiago Santos, Simon Walk, Roman Kern, M. Strohmaier, D. Helic
In this paper, we quantify the impact of self- and cross-excitation on the temporal development of user activity in Stack Exchange Question & Answer (Q&A) communities. Leveraging Hawkes processes, we study differences in user excitation between growing and declining Stack Exchange communities, and between communities dedicated to STEM and to humanities topics. We find that growing communities exhibit early-stage, high cross-excitation by a small core of power users reacting to the community as a whole, as well as strong long-term self-excitation in general and cross-excitation by casual users in particular, suggesting community openness towards less active users. Further, we observe that communities in the humanities exhibit long-term power user cross-excitation, whereas in STEM communities activity is more evenly distributed towards casual user self-excitation. We validate our findings via permutation tests and quantify the impact of these excitation effects with a range of prediction experiments. Our work enables researchers to quantitatively assess the evolution and activity potential of Q&A communities.
Citations: 7
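To make "self- and cross-excitation" concrete, here is a minimal sketch of the conditional intensity of a multivariate Hawkes process, assuming an exponential kernel β·exp(−β·Δt) and treating user groups as the process dimensions; the paper's actual kernels, dimensions and fitting procedure may differ, and all names and numbers below are hypothetical. Diagonal entries alpha[i][i] are self-excitation, off-diagonal entries alpha[i][j] are cross-excitation of group i by group j. Because this kernel integrates to 1, alpha is the branching matrix, and the process is stationary when its spectral radius is below 1.

```python
import math
from typing import List, Sequence, Tuple


def hawkes_intensity(
    t: float,
    mu: Sequence[float],
    alpha: Sequence[Sequence[float]],
    beta: float,
    events: Sequence[Tuple[float, int]],
) -> List[float]:
    """Conditional intensity lambda_i(t) for every dimension i.

    mu[i]       -- baseline rate of dimension i
    alpha[i][j] -- jump in lambda_i caused by one event in dimension j
    beta        -- decay rate of the kernel beta * exp(-beta * dt)
    events      -- (time, dimension) pairs; only events strictly before t count
    """
    lam = list(mu)
    for s, j in events:
        if s < t:
            kernel = beta * math.exp(-beta * (t - s))
            for i in range(len(mu)):
                lam[i] += alpha[i][j] * kernel
    return lam


# Hypothetical two-group example: 0 = power users, 1 = casual users.
mu = [0.2, 0.5]
alpha = [[0.3, 0.1],
         [0.4, 0.2]]
events = [(1.0, 0)]  # one power-user post at t = 1
print(hawkes_intensity(2.0, mu, alpha, beta=1.0, events=events))  # about [0.310, 0.647]
```

Here the single power-user event raises the casual-user intensity more than its own (alpha[1][0] > alpha[0][0]), which is the kind of cross-excitation pattern the abstract reports for growing communities.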
NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313446
J. Qiu, Yuxiao Dong, Hao Ma, Jun Yu Li, Chi Wang, Kuansan Wang, Jie Tang
We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding methods, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2) the explicit factorization of such a matrix generates more powerful embeddings than existing methods. However, directly constructing and factorizing this matrix, which is dense, is prohibitively expensive in both time and space, so it does not scale to large networks. In this work, we present the algorithm of large-scale network embedding as sparse matrix factorization (NetSMF). NetSMF leverages theories from spectral sparsification to efficiently sparsify the aforementioned dense matrix, significantly improving the efficiency of embedding learning. The sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings. We conduct experiments on networks of various scales and types. Results show that among both popular baselines and factorization-based methods, NetSMF is the only method that achieves both high efficiency and effectiveness. We show that NetSMF requires only 24 hours to generate effective embeddings for a large-scale academic collaboration network with tens of millions of nodes, while DeepWalk would take months and the dense matrix factorization solution is computationally infeasible. The source code of NetSMF is publicly available.
Citations: 143
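The "dense matrix" in the NetSMF abstract is, per the earlier NetMF analysis, M = vol(G)/(bT) · (Σ_{r=1..T} Pʳ) D⁻¹ with P = D⁻¹A, window size T and b negative samples, to which an element-wise truncated logarithm log(max(M, 1)) is applied before factorization. The NumPy sketch below builds that dense matrix for a small undirected graph and reads off embeddings U_d·√Σ_d from an SVD. It illustrates the dense baseline whose O(n²) cost NetSMF is designed to avoid, not NetSMF's spectral sparsifier; function names and defaults are illustrative assumptions.

```python
import numpy as np


def deepwalk_matrix(A: np.ndarray, T: int = 10, b: float = 1.0) -> np.ndarray:
    """Truncated-log DeepWalk matrix of an undirected graph with no isolated nodes."""
    deg = A.sum(axis=1)
    vol = deg.sum()                       # vol(G) = sum of degrees
    D_inv = np.diag(1.0 / deg)
    P = D_inv @ A                         # random-walk transition matrix
    S = np.zeros_like(P)
    P_r = np.eye(A.shape[0])
    for _ in range(T):                    # S = P + P^2 + ... + P^T
        P_r = P_r @ P
        S += P_r
    M = (vol / (b * T)) * (S @ D_inv)     # dense n x n matrix
    return np.log(np.maximum(M, 1.0))     # element-wise truncated log


def embed(A: np.ndarray, dim: int = 2, T: int = 10, b: float = 1.0) -> np.ndarray:
    """Rank-`dim` embedding U_d * sqrt(Sigma_d) from an SVD of the dense matrix."""
    U, s, _ = np.linalg.svd(deepwalk_matrix(A, T, b))
    return U[:, :dim] * np.sqrt(s[:dim])


# Usage: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
X = embed(A, dim=2)
print(X.shape)  # (6, 2)
```

Both the matrix powers and the SVD touch every entry of an n × n matrix, which is exactly the cost that becomes infeasible for networks with tens of millions of nodes.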
MARINE: Multi-relational Network Embeddings with Relational Proximity and Node Attributes
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313715
Ming-Han Feng, Chin-Chi Hsu, Cheng-te Li, Mi-Yen Yeh, Shou-de Lin
Network embedding aims at learning an effective vector transformation for entities in a network. We observe that there are two distinct branches of network embedding: one for homogeneous graphs and one for multi-relational graphs. This paper then proposes MARINE, a unified embedding framework for both homogeneous and multi-relational networks that preserves both proximity and relation information. We also extend the framework to incorporate existing node features in a graph, which can further be exploited for embedding ensembles. Our solution has complexity linear in the number of edges, which makes it suitable for large-scale network applications. Experiments on several real-world network datasets, along with applications in link prediction and multi-label classification, demonstrate the superiority of our proposed MARINE.
Citations: 21
Review Response Generation in E-Commerce Platforms with External Product Information
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313581
Lujun Zhao, Kaisong Song, Changlong Sun, Qi Zhang, Xuanjing Huang, Xiaozhong Liu
“User reviews” are becoming an essential component of e-commerce. When buyers write a negative or skeptical review, the sellers ideally need to respond quickly to minimize the potential impact. As the number of reviews grows at an alarming rate, there is an urgent need to build a response-writing assistant for customer service providers. To generate high-quality responses, the algorithm needs to consume and understand information from both the original review and the target product. Classical sequence-to-sequence (Seq2Seq) methods can hardly satisfy this requirement. In this study, we propose a novel deep neural network model based on the Seq2Seq framework for the review response generation task on e-commerce platforms, which incorporates product information through a gated multi-source attention mechanism and a copy mechanism. Moreover, we employ a reinforcement learning technique to mitigate the exposure bias problem. To evaluate the proposed model, we constructed a large-scale dataset containing product information from a popular e-commerce website. Empirical studies using both automatic evaluation metrics and human annotations show that the proposed model can generate informative and diverse responses, significantly outperforming state-of-the-art text generation models.
Citations: 16
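The abstract does not give the equations of its gated multi-source attention, so the following is only one common way such a gate can be built, with hypothetical names throughout: the decoder state attends separately over encoded review tokens and encoded product information, and a scalar sigmoid gate g mixes the two context vectors as g·c_review + (1 − g)·c_product. The copy mechanism and the reinforcement-learning objective are not shown, and the real model almost certainly uses learned projections this sketch omits.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())               # shift by the max for numerical stability
    return e / e.sum()


def attend(query: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Dot-product attention over the rows of `memory` (rows serve as keys and values)."""
    weights = softmax(memory @ query)
    return weights @ memory


def gated_context(state, review_mem, product_mem, w_gate, b_gate=0.0):
    """Mix review and product contexts with a scalar sigmoid gate g in (0, 1)."""
    c_review = attend(state, review_mem)
    c_product = attend(state, product_mem)
    logit = w_gate @ np.concatenate([state, c_review, c_product]) + b_gate
    g = 1.0 / (1.0 + np.exp(-logit))
    return g * c_review + (1.0 - g) * c_product, g


# Usage with random (hypothetical) encoder outputs of width d.
d = 4
rng = np.random.default_rng(1)
state = rng.normal(size=d)
review_mem = rng.normal(size=(5, d))      # e.g. 5 encoded review tokens
product_mem = rng.normal(size=(3, d))     # e.g. 3 encoded product attributes
w_gate = rng.normal(size=3 * d)
context, g = gated_context(state, review_mem, product_mem, w_gate)
print(context.shape, g)
```

A scalar gate is the simplest choice; a vector-valued gate (one value per dimension) is an equally plausible variant.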
Sensitivity Analysis of Centralities on Unweighted Networks
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313422
Shogo Murai, Yuichi Yoshida
Revealing important vertices is a fundamental task in network analysis. Accordingly, many indicators, collectively called centralities, have been proposed for this purpose. However, the abundance of studies on centralities blurs their differences. In this work, we compare centralities based on their sensitivity to modifications of the graph. Specifically, we introduce a quantitative measure called (average-case) edge sensitivity, which measures how much the centrality value of a uniformly chosen vertex (or edge) changes when we remove a uniformly chosen edge. Edge sensitivity is applicable to unweighted graphs, for which, to our knowledge, no theoretical analysis of centralities has been conducted. We conducted a theoretical analysis of the edge sensitivities of six major centralities: closeness centrality, harmonic centrality, betweenness centrality, endpoint betweenness centrality, PageRank, and spanning tree centrality. Our experimental results on synthetic and real graphs confirm the tendency predicted by the theoretical analysis. We also discuss an extension of edge sensitivity to the setting in which we remove a uniformly chosen set of k edges for an integer k ≥ 1.
Citations: 13
Unnecessarily Identifiable: Quantifying the fingerprintability of browser extensions due to bloat
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313458
Oleksii Starov, Pierre Laperdrix, A. Kapravelos, Nick Nikiforakis
In this paper, we investigate to what extent the page modifications that make browser extensions fingerprintable are necessary for their operation. We characterize page modifications that are completely unnecessary for an extension's functionality as extension bloat. By analyzing 58,034 extensions from the Google Chrome store, we discovered that 5.7% of them were unnecessarily identifiable because of extension bloat. To protect users against unnecessary extension fingerprinting due to bloat, we describe the design and implementation of an in-browser mechanism that provides coarse-grained access control for extensions on all websites. The proposed mechanism and its built-in policies not only protect users from fingerprinting but also offer additional protection against malicious extensions exfiltrating user data from sensitive websites.
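The abstract does not spell out how the in-browser mechanism works, so the following is only a hypothetical illustration of what "coarse-grained access control for extensions on all websites" can mean: an extension either runs on a whole site or not at all. It assumes a default-deny rule (an ungranted extension makes no page modifications a site could fingerprint) and a set of sensitive hosts where blanket grants are ignored. Every name here (`ExtensionAccessPolicy`, `may_run`, `ANY_SITE`, the example hosts and extension IDs) is invented for the sketch and is neither the authors' implementation nor a browser API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set
from urllib.parse import urlsplit

ANY_SITE = "*"  # hypothetical wildcard: "this extension may run on every site"


@dataclass
class ExtensionAccessPolicy:
    """Hypothetical coarse-grained policy: an extension may either run on a whole
    site or not at all; there is no per-API or per-element granularity."""

    sensitive_hosts: Set[str] = field(default_factory=set)
    grants: Dict[str, Set[str]] = field(default_factory=dict)  # extension ID -> hosts

    def grant(self, extension_id: str, host: str) -> None:
        self.grants.setdefault(extension_id, set()).add(host)

    def may_run(self, extension_id: str, url: str) -> bool:
        host = urlsplit(url).hostname or ""
        hosts = self.grants.get(extension_id, set())
        if host in self.sensitive_hosts:
            # Sensitive sites ignore blanket grants: only an explicit per-host grant
            # counts, limiting where an extension could read and exfiltrate data.
            return host in hosts
        # Default deny elsewhere: an ungranted extension injects nothing a page could probe.
        return host in hosts or ANY_SITE in hosts


policy = ExtensionAccessPolicy(sensitive_hosts={"bank.example"})
policy.grant("adblocker", ANY_SITE)
print(policy.may_run("adblocker", "https://news.example/story"))  # True
print(policy.may_run("adblocker", "https://bank.example/login"))  # False
print(policy.may_run("notetaker", "https://news.example/story"))  # False
```

Keeping the decision at site granularity makes the policy easy for users to reason about, at the cost of not distinguishing necessary from bloated modifications within a site.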
Citations: 28
Journal
The World Wide Web Conference