The World Wide Web Conference: Latest Papers, Page 8

Tortoise or Hare? Quantifying the Effects of Performance on Mobile App Retention

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313428

Agustin Zuniga, Huber Flores, Eemil Lagerspetz, P. Nurmi, S. Tarkoma, P. Hui, J. Manner

Our first contribution is to quantify the effect of network latency and battery consumption on mobile app performance and retention, i.e., users' decisions to continue or stop using apps. We perform our analysis by fusing two large-scale crowdsensed datasets collected by piggybacking on information captured by mobile apps. We find that an app's performance has an impact on its retention rate. Our results demonstrate that high energy consumption and high latency decrease the likelihood of retaining an app. Conversely, we show that reducing latency or energy consumption does not guarantee a higher likelihood of retention as long as they are within reasonable standards of performance. However, we also demonstrate that what is considered reasonable depends on what users have been accustomed to, with device and network characteristics and app category playing a role. As our second contribution, we develop a model for predicting retention based on performance metrics. We demonstrate the benefits of our model through empirical benchmarks, which show that our model not only predicts retention accurately but also generalizes well across application categories, locations, and other factors moderating the effect of performance.

Citations: 19

Review Response Generation in E-Commerce Platforms with External Product Information

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313581

Lujun Zhao, Kaisong Song, Changlong Sun, Qi Zhang, Xuanjing Huang, Xiaozhong Liu

“User reviews” are becoming an essential component of e-commerce. When buyers write a negative or doubting review, sellers ideally need to respond quickly to minimize the potential impact. As the number of reviews grows at a frightening speed, there is an urgent need to build a response-writing assistant for customer service providers. To generate high-quality responses, the algorithm needs to consume and understand information from both the original review and the target product. Classical sequence-to-sequence (Seq2Seq) methods can hardly satisfy this requirement. In this study, we propose a novel deep neural network model based on the Seq2Seq framework for the review response generation task in e-commerce platforms, which incorporates product information through a gated multi-source attention mechanism and a copy mechanism. Moreover, we employ a reinforcement learning technique to reduce the exposure bias problem. To evaluate the proposed model, we constructed a large-scale dataset, which includes product information, from a popular e-commerce website. Empirical studies on both automatic evaluation metrics and human annotations show that the proposed model can generate informative and diverse responses, significantly outperforming state-of-the-art text generation models.

Citations: 16

Sampled in Pairs and Driven by Text: A New Graph Embedding Framework

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313520

Liheng Chen, Yanru Qu, Zhenghui Wang, Lin Qiu, Weinan Zhang, Ken Chen, Shaodian Zhang, Yong Yu

In graphs with rich texts, combining textual information with structural information benefits the construction of expressive graph embeddings. Among the various graph embedding models, random walk (RW)-based models are one of the most popular and successful groups. However, they face two issues when applied to graphs with rich texts: (i) sampling efficiency: starting from the training objective of RW-based models (e.g., DeepWalk and node2vec), we show that these models are likely to generate large amounts of redundant training samples due to three main drawbacks; (ii) text utilization: these models have difficulty dealing with zero-shot scenarios, where graph embedding models have to infer graph structures directly from texts. To solve these problems, we propose a novel framework, namely Text-driven Graph Embedding with Pairs Sampling (TGE-PS). TGE-PS uses Pairs Sampling (PS) to improve the sampling strategy of RW, reducing the number of training samples by ~99% while preserving competitive performance. TGE-PS uses Text-driven Graph Embedding (TGE), an inductive graph embedding approach, to generate node embeddings from texts. Since each node contains rich texts, TGE is able to generate high-quality embeddings and provide reasonable predictions on the existence of links to unseen nodes. We evaluate TGE-PS on several real-world datasets, and experimental results demonstrate that TGE-PS produces state-of-the-art results on both traditional and zero-shot link prediction tasks.

Citations: 1

Sensitivity Analysis of Centralities on Unweighted Networks

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313422

Shogo Murai, Yuichi Yoshida

Revealing important vertices is a fundamental task in network analysis. As such, many indicators have been proposed for doing so, which are collectively called centralities. However, the abundance of studies on centralities blurs their differences. In this work, we compare centralities based on their sensitivity to modifications in the graph. Specifically, we introduce a quantitative measure called (average-case) edge sensitivity, which measures how much the centrality value of a uniformly chosen vertex (or edge) changes when we remove a uniformly chosen edge. Edge sensitivity is applicable to unweighted graphs, for which, to our knowledge, there has been no theoretical analysis of centralities. We conducted a theoretical analysis of the edge sensitivities of six major centralities: closeness centrality, harmonic centrality, betweenness centrality, endpoint betweenness centrality, PageRank, and spanning tree centrality. Our experimental results on synthetic and real graphs confirm the tendency predicted by the theoretical analysis. We also discuss an extension of edge sensitivity to the setting in which we remove a uniformly chosen set of k edges, for an integer k ≥ 1.
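
As a rough illustration of the edge-sensitivity idea in the abstract above, the following Python sketch computes, on a tiny graph, the average change in a vertex's centrality when a single edge is removed. It averages exhaustively over all (edge, vertex) pairs instead of sampling. The function names, the use of harmonic centrality, and the choice of the unnormalized absolute difference as the "change" are assumptions made for this example; the paper's exact definition may differ.

```python
from collections import deque


def harmonic_centrality(adj, v):
    """Sum of 1/d(v, u) over every vertex u reachable from v (BFS distances)."""
    dist = {v: 0}
    queue = deque([v])
    total = 0.0
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                total += 1.0 / dist[y]
                queue.append(y)
    return total


def edge_sensitivity(edges, centrality=harmonic_centrality):
    """Mean of |c_G(v) - c_{G-e}(v)| over all edges e and vertices v.

    `edges` is a list of undirected (a, b) pairs. The absolute, unnormalized
    difference is an assumption of this sketch, not the paper's definition.
    """
    vertices = sorted({u for e in edges for u in e})

    def build(edge_list):
        adj = {v: set() for v in vertices}
        for a, b in edge_list:
            adj[a].add(b)
            adj[b].add(a)
        return adj

    base_adj = build(edges)
    base = {v: centrality(base_adj, v) for v in vertices}

    total = 0.0
    for i in range(len(edges)):
        # Rebuild the graph with the i-th edge removed.
        adj = build(edges[:i] + edges[i + 1:])
        total += sum(abs(base[v] - centrality(adj, v)) for v in vertices)
    return total / (len(edges) * len(vertices))


triangle = [(0, 1), (1, 2), (0, 2)]
print(edge_sensitivity(triangle))
```

On a triangle, removing any edge lowers the harmonic centrality of its two endpoints from 2 to 1.5 and leaves the third vertex unchanged, so the average over the nine (edge, vertex) pairs is 1/3.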

Citations: 13

Large-Scale Talent Flow Forecast with Dynamic Latent Factor Model

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313525

Le Zhang, Hengshu Zhu, Tong Xu, Chen Zhu, Chuan Qin, Hui Xiong, Enhong Chen

Understanding talent flow is critical for sharpening a company's talent strategy so that it stays competitive in the current fast-evolving environment. Existing studies on talent flow analysis generally rely on subjective surveys. However, without large-scale quantitative studies, it is difficult to deliver fine-grained predictive business insights for better talent management. To this end, in this paper, we introduce a big data-driven approach for predictive talent flow analysis. Specifically, we first construct a time-aware job transition tensor by mining large-scale job transition records from digital resumes on online professional networks (OPNs), where each entry is the fine-grained talent flow rate of a specific job position between two companies. Then, we design a dynamic latent factor based Evolving Tensor Factorization (ETF) model for predicting future talent flows. In particular, we introduce a novel evolving feature, which jointly considers the influence of previous talent flows and the global market, for modeling the evolving nature of each company. Furthermore, to improve predictive performance, we also integrate several representative attributes of companies as side information for regulating the model inference. Finally, we conduct extensive experiments on large-scale real-world data to evaluate model performance. The experimental results clearly validate the effectiveness of our approach compared with state-of-the-art baselines in terms of talent flow forecasting. The results also reveal some interesting findings on the regularity of talent flows, e.g., Facebook became more and more attractive to engineers from Google in 2016.
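
To make the tensor construction described above more concrete, here is a minimal Python sketch that turns job transition records into a sparse, time-aware tensor keyed by (origin company, destination company, position, time slice). The record layout, the function name, the company labels, and especially the normalization (dividing each count by the total outflow from the origin company for that position and time slice) are assumptions of this example; the abstract does not specify how the flow rate is computed.

```python
from collections import Counter


def build_transition_tensor(records):
    """Build a sparse job transition tensor from transition records.

    records: iterable of (origin, destination, position, year) tuples
             (hypothetical layout chosen for this sketch).
    Returns {(origin, destination, position, year): flow_rate}, where the
    rate is the share of the origin's outflow for that position and year
    (an assumed normalization).
    """
    records = list(records)
    counts = Counter(records)
    # Total number of people leaving `origin` from `position` in `year`.
    outflow = Counter((o, p, t) for o, _, p, t in records)
    return {
        (o, d, p, t): n / outflow[(o, p, t)]
        for (o, d, p, t), n in counts.items()
    }


# Hypothetical records: companies "A", "B", "C" are placeholders.
records = [
    ("A", "B", "engineer", 2016),
    ("A", "B", "engineer", 2016),
    ("A", "C", "engineer", 2016),
    ("B", "A", "designer", 2015),
]
tensor = build_transition_tensor(records)
print(tensor[("A", "B", "engineer", 2016)])
```

A dictionary keeps the tensor sparse, which matters because most (company, company, position, time) combinations have no observed transitions; a factorization model such as the one in the abstract would then be fitted to these observed entries.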

Citations: 22

What happened? The Spread of Fake News Publisher Content During the 2016 U.S. Presidential Election

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313721

Ceren Budak

The spread of content produced by fake news publishers was one of the most discussed characteristics of the 2016 U.S. Presidential Election. Yet, little is known about the prevalence and focus of such content, how its prevalence changed over time, and how this prevalence related to important election dynamics. In this paper, we address these questions using tweets that mention the two presidential candidates sampled at the daily level, the news content mentioned in such tweets, and open-ended responses from nationally representative telephone interviews. The results of our analysis highlight various important lessons for news consumers and journalists. We find that (i) traditional news producers outperformed fake news producers in aggregate, (ii) the prevalence of content produced by fake news publishers increased over the course of the campaign, particularly among tweets that mentioned Clinton, and (iii) changes in such prevalence closely followed changes in net Clinton favorability. Turning to content, we (iv) identify similarities and differences in agenda setting by fake and traditional news media and show that (v) the information individuals most commonly reported having read, seen, or heard about the candidates was more closely aligned with content produced by fake news outlets than by traditional news outlets, in particular for information Republican voters retained about Clinton. We also model the fake-ness of retained information as a function of demographic characteristics. Implications for platform owners, news consumers, and journalists are discussed.

Citations: 49

Leveraging Peer Communication to Enhance Crowdsourcing

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313554

Wei Tang, Ming Yin, Chien-Ju Ho

Crowdsourcing has become a popular tool for large-scale data collection, where it is often assumed that crowd workers complete the work independently. In this paper, we relax this independence assumption and explore the use of peer communication (a kind of direct interaction between workers) in crowdsourcing. In particular, in the crowdsourcing setting with peer communication, a pair of workers is asked to complete the same task together by first generating their initial answers independently, then freely discussing the task with each other, and finally updating their answers after the discussion. We first experimentally examine the effects of peer communication on individual microtasks. Our experiments on three types of tasks consistently suggest that work quality is significantly higher in tasks with peer communication than in tasks where workers complete the work independently. We next explore how to use peer communication to optimize the requester's total utility while taking into account the higher data correlation and higher cost introduced by peer communication. In particular, we model the requester's online decision problem of whether and when to use peer communication in crowdsourcing as a constrained Markov decision process that maximizes the requester's total utility under budget constraints. Our proposed approach is empirically shown to bring higher total utility than baseline approaches.

Citations: 18

Personalized Online Spell Correction for Personal Search

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313706

Jai Gupta, Zhen Qin, Michael Bendersky, Donald Metzler

Spell correction is a must-have feature for any modern search engine in applications such as web or e-commerce search. Typical spell correction solutions used in production systems consist of large indexed lookup tables based on a global model trained across many users over a large-scale web corpus or a query log. For search over personal corpora, such as email, this global solution is not sufficient, as it ignores the user's personal lexicon. Without personalization, global spelling correction fails to correct tail queries drawn from a user's own, often idiosyncratic, lexicon. Personalization using existing algorithms is difficult due to resource constraints and the unavailability of sufficient data to build per-user models. In this work, we propose a simple and effective personalized spell correction solution that augments existing global solutions for search over private corpora. Our event-driven spell correction candidate generation method is specifically designed with personalization as the key construct. Our novel spell correction and query completion algorithms do not require complex model training and are highly efficient. The proposed solution has shown a click-through rate gain of over 30% on affected queries when evaluated against a range of strong commercial personal search baselines: Google's Gmail, Drive, and Calendar search production systems.
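
The abstract does not describe the algorithm itself, so the following Python sketch only illustrates the general idea of augmenting a global corrector with a per-user lexicon that is updated on document events. The class and method names, the whitespace tokenizer, the edit-distance threshold of 1, and the frequency-based tie-breaking are all assumptions of this example and should not be read as the paper's method.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class PersonalLexicon:
    """Hypothetical per-user lexicon updated whenever a document event occurs."""

    def __init__(self):
        self.counts = Counter()

    def on_document_event(self, text):
        # Event hook: called when the user receives or creates a document.
        self.counts.update(text.lower().split())

    def correct(self, term, global_correct=lambda t: t, max_distance=1):
        term = term.lower()
        if term in self.counts:
            return term  # already a known personal term
        candidates = [(w, c) for w, c in self.counts.items()
                      if edit_distance(term, w) <= max_distance]
        if candidates:
            # Prefer the most frequent personal term among near matches.
            return max(candidates, key=lambda wc: wc[1])[0]
        # No personal candidate: defer to the global spell corrector.
        return global_correct(term)


lexicon = PersonalLexicon()
lexicon.on_document_event("lunch with Kowalczyk")  # placeholder text
print(lexicon.correct("kowalczik"))
```

The linear scan over the lexicon is only reasonable for small per-user vocabularies; a production system would need an index, which is outside the scope of this sketch.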

Citations: 14

Local Matching Networks for Engineering Diagram Search

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313500

Zhuyun Dai, Zhen Fan, Hafeezul Rahman, Jamie Callan

Finding diagrams that contain a specific part or a similar part is important in many engineering tasks. In this search task, the query part is expected to match only a small region in a complex image. This paper investigates several local matching networks that explicitly model local region-to-region similarities. Deep convolutional neural networks extract local features and model local matching patterns. Spatial convolution is employed to cross-match local regions at different scale levels, addressing cases where the target part appears at a different scale, position, and/or angle. A gating network automatically learns region importance, removing noise from sparse areas and visual metadata in engineering diagrams. Experimental results show that local matching approaches are more effective for engineering diagram search than global matching approaches. Suppressing unimportant regions via the gating network enhances accuracy. Matching across different scales via spatial convolution substantially improves robustness to scale and rotation changes. A pipelined architecture efficiently searches a large collection of diagrams by using a simple local matching network to identify a small set of candidate images and a more sophisticated network with convolutional cross-scale matching to re-rank candidates.
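The pipelined retrieve-then-re-rank architecture described above can be sketched generically. This is an illustrative sketch, not the paper's networks: `cheap_score` and `costly_score` are hypothetical stand-ins for the simple local matching network and the cross-scale matching network, and the toy integer data exists only to make the control flow visible.

```python
import heapq


def two_stage_search(query, collection, cheap_score, costly_score, k=100):
    """Select the top-k items with a cheap scorer, then re-rank them with a costly one."""
    # Stage 1: score every item with the inexpensive model, keep only k candidates.
    candidates = heapq.nlargest(k, collection, key=lambda item: cheap_score(query, item))
    # Stage 2: apply the expensive model to the k survivors only.
    return sorted(candidates, key=lambda item: costly_score(query, item), reverse=True)


# Toy usage: items are integers, the cheap scorer prefers values near the query,
# and the costly scorer (an arbitrary stand-in) prefers larger values.
results = two_stage_search(
    query=10,
    collection=[1, 9, 12, 30, 11],
    cheap_score=lambda q, x: -abs(q - x),
    costly_score=lambda q, x: x,
    k=3,
)
print(results)
```

The design point is that the costly scorer runs on at most `k` items, so its cost does not grow with the size of the collection.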

Citations: 1

Learning Task-Specific City Region Partition

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313704

Hongjian Wang, P. Jenkins, Hua Wei, Fei Wu, Z. Li

The proliferation of publicly accessible urban data provides new insights on various urban tasks. A frequently used approach is to treat each region as a data sample and build a model over all the regions to observe the correlations between urban features (e.g., demographics) and the target variable (e.g., crime count). To define regions, most existing studies use fixed grids or pre-defined administrative boundaries (e.g., census tracts or community areas). In reality, however, definitions of regions should differ depending on the task (e.g., regional crime count prediction vs. real estate price estimation). In this paper, we propose a new problem of task-specific city region partitioning, aiming to find the best partition in a city w.r.t. a given task. We prove this is an NP-hard search problem with no trivial solution. To learn the partition, we first study two variants of Markov Chain Monte Carlo (MCMC). We further propose a reinforcement learning scheme for effectively sampling the search space. We conduct experiments on two real datasets in Chicago (i.e., crime count and real estate price) to demonstrate the effectiveness of our proposed method.
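As a rough illustration of sampling partitions with MCMC, the sketch below runs a plain Metropolis sampler over assignments of units to regions. It is not either of the two MCMC variants studied in the paper: the single-unit proposal, the `temperature` parameter, and the user-supplied `score` function are assumptions, and spatial contiguity of regions is deliberately ignored to keep the example short.

```python
import math
import random


def metropolis_partition(n_units, n_regions, score, steps=2000, temperature=0.1, seed=0):
    """Search for a high-scoring assignment of units to regions via Metropolis sampling."""
    rng = random.Random(seed)
    assignment = [rng.randrange(n_regions) for _ in range(n_units)]
    current = score(assignment)
    best, best_score = list(assignment), current
    for _ in range(steps):
        # Propose moving one randomly chosen unit to a randomly chosen region.
        unit = rng.randrange(n_units)
        old_region = assignment[unit]
        assignment[unit] = rng.randrange(n_regions)
        proposed = score(assignment)
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if proposed >= current or rng.random() < math.exp((proposed - current) / temperature):
            current = proposed
            if current > best_score:
                best, best_score = list(assignment), current
        else:
            assignment[unit] = old_region  # Reject: undo the move.
    return best, best_score


# Toy task-specific score (hypothetical): reward placing units in region 0.
best, best_score = metropolis_partition(
    n_units=6, n_regions=2, score=lambda a: sum(1 for r in a if r == 0)
)
print(best, best_score)
```

In a task-specific setting, `score` would be replaced by the quality of a model fitted on the regions induced by the assignment, which is what makes each evaluation expensive and motivates more guided sampling.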

Citations: 4