The World Wide Web Conference: Latest Papers

Deriving User- and Content-specific Rewards for Contextual Bandits

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313592

Paolo Dragone, Rishabh Mehrotra, M. Lalmas

Bandit algorithms have gained increased attention in recommender systems, as they provide effective and scalable recommendations. These algorithms use reward functions, usually based on a numeric variable such as click-through rate, as the basis for optimization. On a popular music streaming service, a contextual bandit algorithm is used to decide which content to recommend to users, where the reward function is a binarization of a numeric variable that defines success based on a static threshold of user streaming time: 1 if the user streamed for at least 30 seconds and 0 otherwise. We explore alternative methods to provide a more informed reward function, based on the assumption that the streaming time distribution heavily depends on the type of user and the type of content being streamed. To automatically extract user and content groups from streaming data, we employ "co-clustering", an unsupervised learning technique that simultaneously extracts clusters of rows and columns from a co-occurrence matrix. The streaming distributions within the co-clusters are then used to define rewards specific to each co-cluster. Our proposed co-cluster-based reward functions lead to an improvement of over 25% in expected stream rate, compared to the standard binarized rewards.
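
A minimal sketch of the contrast the abstract draws: the static binarized reward (1 if the user streamed for at least 30 seconds, else 0) versus a reward whose threshold depends on the (user group, content group) co-cluster. The abstract does not say how the per-co-cluster rewards are derived from the streaming distributions. Using each co-cluster's median streaming time as its threshold is purely an assumption for illustration. The co-cluster labels are assumed to come from a co-clustering step that is not shown, and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import median

STATIC_THRESHOLD_S = 30.0  # the static 30-second threshold described in the abstract


def binarized_reward(stream_seconds: float) -> int:
    """Baseline reward: 1 if the user streamed for at least 30 seconds, else 0."""
    return 1 if stream_seconds >= STATIC_THRESHOLD_S else 0


def cocluster_thresholds(events):
    """Hypothetical per-co-cluster threshold: the median streaming time observed
    in each (user_cluster, content_cluster) pair.

    `events` is an iterable of (user_cluster, content_cluster, stream_seconds).
    """
    times = defaultdict(list)
    for user_cluster, content_cluster, seconds in events:
        times[(user_cluster, content_cluster)].append(seconds)
    return {key: median(vals) for key, vals in times.items()}


def cocluster_reward(user_cluster, content_cluster, stream_seconds, thresholds) -> int:
    """1 if the stream reaches its co-cluster's threshold. Co-clusters never seen
    before fall back to the static threshold."""
    threshold = thresholds.get((user_cluster, content_cluster), STATIC_THRESHOLD_S)
    return 1 if stream_seconds >= threshold else 0


# Toy data: long-form listening in one co-cluster, short plays in another.
events = [(0, "podcast", 600.0), (0, "podcast", 900.0), (0, "podcast", 1200.0),
          (1, "song", 20.0), (1, "song", 25.0), (1, "song", 40.0)]
thresholds = cocluster_thresholds(events)
print(binarized_reward(45.0), cocluster_reward(0, "podcast", 45.0, thresholds))  # 1 0
```

Under this assumed rule, a 45-second podcast listen counts as a success for the static reward but not for its co-cluster. A 26-second song play flips the other way.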

Citations: 15

Learning Binary Hash Codes for Fast Anchor Link Retrieval across Networks

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313430

Yongqing Wang, Huawei Shen, Jinhua Gao, Xueqi Cheng

Users are usually involved in multiple social networks, without explicit anchor links that reveal the correspondence among different accounts of the same user across networks. Anchor link prediction aims to identify these hidden anchor links, which is a fundamental problem for user profiling, information cascading, and cross-domain recommendation. Although existing methods achieve good accuracy in anchor link prediction, the pairwise search they use to infer anchor links poses a major challenge when deployed in practical systems. To address this challenge, in this paper we propose a novel embedding and matching architecture that directly learns a binary hash code for each node. The hash codes provide an efficient index for filtering candidate node pairs for anchor link prediction. Extensive experiments on synthetic and real-world large-scale datasets demonstrate that our proposed method is highly time-efficient while maintaining competitive accuracy in anchor link prediction.
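
To illustrate how hash codes can act as an index for filtering candidate pairs, the hypothetical sketch below binarizes each node's embedding by sign. It buckets one network's nodes by code and probes only codes within a small Hamming radius instead of scoring every cross-network pair. This is not the paper's method. The paper learns the codes end to end with its embedding-and-matching architecture, while sign-thresholding of given real-valued embeddings stands in for that step here. `radius` and all function names are assumptions. Surviving pairs would still need a finer matching step.

```python
from collections import defaultdict
from itertools import combinations


def to_hash_code(embedding):
    """Binarize a real-valued embedding by sign into an integer bit pattern
    (a stand-in for codes that the paper learns directly)."""
    code = 0
    for bit, value in enumerate(embedding):
        if value > 0:
            code |= 1 << bit
    return code


def build_index(network):
    """Hash table from code to the node ids that share it."""
    index = defaultdict(list)
    for node, embedding in network.items():
        index[to_hash_code(embedding)].append(node)
    return index


def codes_within(code, n_bits, radius):
    """Yield every code at Hamming distance <= radius from `code`."""
    for r in range(radius + 1):
        for bits in combinations(range(n_bits), r):
            probe = code
            for bit in bits:
                probe ^= 1 << bit
            yield probe


def candidate_pairs(net_a, net_b, n_bits, radius=1):
    """Probe net_b's index with each net_a code instead of scoring every pair."""
    index_b = build_index(net_b)
    pairs = []
    for node_a, embedding in net_a.items():
        for probe in codes_within(to_hash_code(embedding), n_bits, radius):
            pairs.extend((node_a, node_b) for node_b in index_b.get(probe, []))
    return sorted(pairs)


# Toy 3-dimensional embeddings for accounts on two hypothetical networks.
net_a = {"alice_tw": [0.9, -0.2, 0.4], "bob_tw": [-0.5, 0.7, -0.1]}
net_b = {"alice_fb": [0.8, -0.1, 0.3], "bob_fb": [-0.4, 0.6, 0.2],
         "carol_fb": [-0.3, -0.9, -0.2]}
print(candidate_pairs(net_a, net_b, n_bits=3, radius=0))  # [('alice_tw', 'alice_fb')]
```

Raising `radius` to 1 also pairs `bob_tw` with `bob_fb` and `carol_fb`. The radius trades recall against the number of candidates passed on.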

Citations: 21

Your Style Your Identity: Leveraging Writing and Photography Styles for Drug Trafficker Identification in Darknet Markets over Attributed Heterogeneous Information Network

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313537

Yiming Zhang, Yujie Fan, Wei Song, Shifu Hou, Yanfang Ye, X. Li, Liang Zhao, C. Shi, Jiabin Wang, Qi Xiong

Owing to the anonymity it offers, the darknet has seen a dramatic growth of underground drug markets (e.g., Dream Market and Valhalla). To combat drug trafficking (a.k.a. illicit drug trading) in cyberspace, there is an urgent need for automatic analysis of participants in darknet markets. However, one of the key challenges is that drug traffickers (i.e., vendors) may maintain multiple accounts across different markets or within the same market. To address this issue, in this paper we propose and develop an intelligent system named uStyle-uID, the first attempt to leverage both writing and photography styles for drug trafficker identification. At the core of uStyle-uID is an attributed heterogeneous information network (AHIN), which integrates writing and photography styles along with the text and photo contents, as well as other supporting attributes (i.e., trafficker and drug information) and various kinds of relations. To efficiently measure the relatedness between nodes (i.e., traffickers) in the constructed AHIN, we propose a new network embedding model, Vendor2Vec, which learns low-dimensional representations of the nodes by using complementary attribute information attached to the nodes to guide meta-path-based random walks for sampling path instances. We then devise a learning model named vIdentifier to classify whether a given pair of traffickers is the same individual. Comprehensive experiments on data collected from four different darknet markets, with comparisons against alternative approaches, validate the effectiveness of uStyle-uID, which integrates our proposed method, for drug trafficker identification.
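
Vendor2Vec builds on meta-path-based random walks, in which each step must land on the node type the meta-path prescribes next. The sketch below shows only that generic mechanism on a toy heterogeneous graph with a hypothetical vendor-drug-vendor meta-path. The attribute-guided weighting that the abstract says Vendor2Vec adds is not modeled. Node names, type labels and the function name are illustrative assumptions.

```python
import random


def metapath_walk(graph, node_type, start, metapath, walk_length, rng=None):
    """Random walk whose steps must follow the node types in `metapath`.

    `metapath` must start and end with the same type (e.g. ["V", "D", "V"])
    so it can repeat: V -> D -> V -> D -> ...
    graph maps node -> neighbor list; node_type maps node -> type label.
    """
    if len(metapath) < 2 or metapath[0] != metapath[-1]:
        raise ValueError("meta-path must have >= 2 types and start/end with the same type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node must match the meta-path's first type")
    rng = rng or random.Random()
    walk, pos = [start], 0
    while len(walk) < walk_length:
        pos = pos + 1 if pos + 1 < len(metapath) else 1  # wrap past the shared end type
        candidates = [n for n in graph[walk[-1]] if node_type[n] == metapath[pos]]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


# Toy graph: vendors (V) linked to a drug listing (D) and a photo (P).
graph = {"v1": ["d1", "p1"], "v2": ["d1"], "d1": ["v1", "v2"], "p1": ["v1"]}
node_type = {"v1": "V", "v2": "V", "d1": "D", "p1": "P"}
walk = metapath_walk(graph, node_type, "v1", ["V", "D", "V"], 5, random.Random(0))
print([node_type[n] for n in walk])  # ['V', 'D', 'V', 'D', 'V']
```

The photo node `p1` is never visited on this meta-path. A different meta-path (for example photo-vendor-photo) would instead sample walks through photos.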

Citations: 40

From Small-scale to Large-scale Text Classification

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313563

Kang-Min Kim, Yeachan Kim, Jungho Lee, Ji-Min Lee, SangKeun Lee

Neural network models have achieved impressive results in the field of text classification. However, existing approaches often suffer from insufficient training data in large-scale text classification involving a large number of categories (e.g., several thousand categories). Several neural network models have used multi-task learning to overcome the limited amount of training data. However, these approaches are also limited to small-scale text classification. In this paper, we propose a novel neural network-based multi-task learning framework for large-scale text classification. To this end, we first treat the different scales of text classification (i.e., large and small numbers of categories) as multiple, related tasks. Then, we train the proposed neural network, which learns small- and large-scale text classification tasks simultaneously. In particular, we further enhance this multi-task learning architecture with a gate mechanism that controls the flow of features between the small- and large-scale text classification tasks. Experimental results clearly show that our proposed model improves the performance of the large-scale text classification task with the help of the small-scale text classification task. The proposed scheme achieves significant improvements of as much as 14% and 5% in micro-averaged and macro-averaged F1 score, respectively, over state-of-the-art techniques.
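
The abstract does not specify the gate that "controls the flow of features" between the two tasks. A common form of such a gate mixes the two feature vectors element-wise: g = sigmoid(W [h_small; h_large] + b), then h = g * h_small + (1 - g) * h_large. The sketch below shows that form as an assumption, not the paper's exact design, using only the standard library. The weights are hand-picked placeholders; a trained model would learn W and b.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(h_small, h_large, W, b):
    """Element-wise gate g = sigmoid(W [h_small; h_large] + b), returning
    g * h_small + (1 - g) * h_large.

    W has one row of length 2*d per output dimension; b has length d.
    """
    concat = list(h_small) + list(h_large)
    fused = []
    for i in range(len(h_small)):
        g = sigmoid(sum(w * x for w, x in zip(W[i], concat)) + b[i])
        # g near 1 passes small-scale features; g near 0 keeps large-scale ones
        fused.append(g * h_small[i] + (1.0 - g) * h_large[i])
    return fused


# Placeholder weights: an all-zero W and b give g = 0.5, i.e. a plain average.
W = [[0.0] * 4, [0.0] * 4]
print(gated_fusion([1.0, 2.0], [3.0, 4.0], W, [0.0, 0.0]))  # [2.0, 3.0]
```

Because the gate is computed per dimension from both tasks' features, the model can let some small-scale features through while blocking others, rather than using one fixed mixing weight.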

Citations: 15

Rethinking the Detection of Child Sexual Abuse Imagery on the Internet

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313482

Elie Bursztein, Einat Clarke, Michelle DeLaune, David M. Elifff, Nick Hsu, Lindsey Olson, John Shehan, Madhukar Thakur, Kurt Thomas, Travis Bright

Over the last decade, the illegal distribution of child sexual abuse imagery (CSAI) has transformed alongside the rise of online sharing platforms. In this paper, we present the first longitudinal measurement study of CSAI distribution online and the threat it poses to society's ability to combat child sexual abuse. Our results illustrate that CSAI has grown exponentially, to nearly 1 million detected events per month, exceeding the capabilities of independent clearinghouses and law enforcement to take action. To scale CSAI protections going forward, we discuss techniques for automating detection and response using recent advances in machine learning.

Citations: 60

A Hierarchical Attention Retrieval Model for Healthcare Question Answering

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313699

Ming Zhu, Aman Ahuja, Wei Wei, C. Reddy

The growth of the Web in recent years has led to the development of various online platforms that provide healthcare information services. These platforms contain an enormous amount of information, which could benefit a large number of people. However, navigating such knowledge bases to answer the specific queries of healthcare consumers is a challenging task. A majority of such queries may be non-factoid in nature, and hence traditional keyword-based retrieval models do not work well for them. Furthermore, in many scenarios it may be desirable to get a short answer that sufficiently answers the query, instead of a long document with only a small amount of useful information. In this paper, we propose a neural network model for ranking documents for question answering in the healthcare domain. The proposed model uses a deep attention mechanism at the word, sentence, and document levels to retrieve efficiently for both factoid and non-factoid queries, on documents of varied lengths. Specifically, word-level cross-attention allows the model to identify the words most likely to be relevant to a query, and hierarchical attention at the sentence and document levels allows it to retrieve effectively from both long and short documents. We also construct a new large-scale healthcare question-answering dataset, which we use to evaluate our model. Experimental results against several state-of-the-art baselines show that our model outperforms existing retrieval techniques.
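
One generic building block of attention applied at several levels is query-conditioned attention pooling. Each unit (a word vector, then a sentence vector) is scored against the query, the scores are softmax-normalized, and the units are summed with those weights. The sketch below applies this twice (words to sentences, sentences to a document) using dot-product scoring and a single query vector. That is a simplifying assumption; the abstract does not give the paper's scoring functions or its word-level cross-attention details.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, vectors):
    """Dot-product attention pooling: weights each vector by its softmax-normalized
    similarity to the query and returns (pooled vector, weights)."""
    weights = softmax([sum(q * v for q, v in zip(query, vec)) for vec in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * vec[j] for w, vec in zip(weights, vectors)) for j in range(dim)]
    return pooled, weights


def encode_document(query, document):
    """Two levels of pooling: word vectors -> sentence vectors -> document vector.
    `document` is a list of sentences, each a list of word vectors."""
    sentence_vecs = [attend(query, words)[0] for words in document]
    return attend(query, sentence_vecs)


query = [1.0, 0.0]
document = [[[1.0, 0.0], [0.5, 0.5]],   # sentence 1: words close to the query
            [[0.0, 1.0], [0.0, 0.8]]]   # sentence 2: unrelated words
doc_vec, sentence_weights = encode_document(query, document)
print([round(w, 3) for w in sentence_weights])
```

The sentence-level weights show which sentences dominate the document representation. This is what would let a model surface a short relevant passage from a long document.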

Citations: 39

Unnecessarily Identifiable: Quantifying the fingerprintability of browser extensions due to bloat

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313458

Oleksii Starov, Pierre Laperdrix, A. Kapravelos, Nick Nikiforakis

In this paper, we investigate to what extent the page modifications that make browser extensions fingerprintable are necessary for their operation. We characterize page modifications that are completely unnecessary for an extension's functionality as extension bloat. By analyzing 58,034 extensions from the Google Chrome store, we discovered that 5.7% of them were unnecessarily identifiable because of extension bloat. To protect users against unnecessary extension fingerprinting due to bloat, we describe the design and implementation of an in-browser mechanism that provides coarse-grained access control for extensions on all websites. The proposed mechanism and its built-in policies not only protect users from fingerprinting but also offer additional protection against malicious extensions exfiltrating user data from sensitive websites.

Citations: 28

The Illusion of Change: Correcting for Biases in Change Inference for Sparse, Societal-Scale Data

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313722

Gabriel Cadamuro, Ramya Korlakai Vinayak, J. Blumenstock, S. Kakade, Jacob N. Shapiro

Societal-scale data is playing an increasingly prominent role in social science research; examples from research on geopolitical events include questions on how emergency events impact the diffusion of information or how new policies change patterns of social interaction. Such research often draws critical inferences from observing how an exogenous event changes meaningful metrics like network degree or network entropy. However, as we show in this work, standard estimation methodologies make systematically incorrect inferences when the event also changes the sparsity of the data. To address this issue, we provide a general framework for inferring changes in social metrics when dealing with non-stationary sparsity. We propose a plug-in correction that can be applied to any estimator, including several recently proposed procedures. Using both simulated and real data, we demonstrate that the correction significantly improves the accuracy of the estimated change under a variety of plausible data generating processes. In particular, using a large dataset of calls from Afghanistan, we show that whereas traditional methods substantially overestimate the impact of a violent event on social diversity, the plug-in correction reveals the true response to be much more modest.

Citations: 0

With a Little Help from My Friends (and Their Friends): Influence Neighborhoods for Social Recommendations

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313745

Avni Gulati, M. Eirinaki

Social recommendations have been a very intriguing domain for researchers in the past decade. The main premise is that the social network of a user can be leveraged to enhance the rating-based recommendation process. This has been achieved in various ways, and under different assumptions about the network characteristics, structure, and availability of other information (such as trust, content, etc.). In this work, we create neighborhoods of influence by leveraging only the social graph structure. These are in turn introduced into the recommendation process both as a pre-processing step and as a social regularization factor of the matrix factorization algorithm. Our experimental evaluation on real-life datasets demonstrates the effectiveness of the proposed technique.
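
As a point of reference only, a commonly used social-regularization objective from earlier social-recommendation work is written below. A generic neighbor set N(i) stands in for user i's influence neighborhood. The abstract does not give the exact objective or weighting used in this paper, and α and λ are assumed hyperparameters. R_{ij} is user i's rating of item j, I_{ij} is 1 when that rating is observed (0 otherwise), and U_i, V_j are the latent user and item factors.

```latex
\min_{U,V}\;
\frac{1}{2}\sum_{i,j} I_{ij}\,\bigl(R_{ij} - U_i^{\top} V_j\bigr)^2
\;+\; \frac{\alpha}{2}\sum_{i}\sum_{f \in \mathcal{N}(i)} \bigl\lVert U_i - U_f \bigr\rVert_2^2
\;+\; \frac{\lambda}{2}\bigl(\lVert U\rVert_F^2 + \lVert V\rVert_F^2\bigr)
```

The middle term pulls each user's factors toward those of the users in their neighborhood, so a larger α places more trust in the social structure relative to the observed ratings.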

Citations: 9

NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313446

J. Qiu, Yuxiao Dong, Hao Ma, Jun Yu Li, Chi Wang, Kuansan Wang, Jie Tang

We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding benchmarks, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2) the explicit factorization of such a matrix generates more powerful embeddings than existing methods. However, directly constructing and factorizing this matrix, which is dense, is prohibitively expensive in terms of both time and space, so the approach does not scale to large networks. In this work, we present NetSMF, an algorithm for large-scale network embedding as sparse matrix factorization. NetSMF leverages theories from spectral sparsification to efficiently sparsify the aforementioned dense matrix, significantly improving the efficiency of embedding learning. The sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings. We conduct experiments on networks of various scales and types. Results show that among both popular benchmarks and factorization-based methods, NetSMF is the only method that achieves both high efficiency and effectiveness. We show that NetSMF requires only 24 hours to generate effective embeddings for a large-scale academic collaboration network with tens of millions of nodes, whereas DeepWalk would take months and the dense matrix factorization solution is computationally infeasible. The source code of NetSMF is publicly available.
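
The "matrix with a closed form" refers to prior work (NetMF). That work shows DeepWalk implicitly factorizes log( vol(G)/(bT) * sum_{r=1..T} (D^-1 A)^r D^-1 ), where A is the adjacency matrix, D the diagonal degree matrix, vol(G) the sum of all degrees, T the context window size, and b the number of negative samples. NetMF applies a truncated logarithm log(max(x, 1)) element-wise. The stdlib-only sketch below builds that matrix densely for a toy graph, to show what NetSMF avoids materializing. It does not implement NetSMF's spectral sparsification, and the helper names are hypothetical.

```python
import math


def matmul(X, Y):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def deepwalk_matrix(A, T=1, b=1):
    """Dense trunc_log( vol(G)/(b*T) * sum_{r=1..T} (D^-1 A)^r D^-1 ) for an
    undirected graph with adjacency matrix A and no isolated nodes.
    T: window size, b: number of negative samples.
    Every entry is materialized, so memory grows as O(n^2)."""
    n = len(A)
    deg = [sum(row) for row in A]
    vol = sum(deg)
    P = [[A[i][j] / deg[i] for j in range(n)] for i in range(n)]  # D^-1 A
    power = P
    total = [row[:] for row in P]  # running sum of P^1 .. P^T
    for _ in range(2, T + 1):
        power = matmul(power, P)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    scale = vol / (b * T)
    return [[math.log(max(scale * total[i][j] / deg[j], 1.0)) for j in range(n)]
            for i in range(n)]


# Path graph 0 - 1 - 2
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
for row in deepwalk_matrix(A, T=1):
    print([round(x, 3) for x in row])
# [0.0, 0.693, 0.0]
# [0.693, 0.0, 0.693]
# [0.0, 0.693, 0.0]
```

For larger T, the powers of D^-1 A fill in quickly even when A is sparse. That fill-in is the density problem the abstract describes.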

Citations: 143
