
The World Wide Web Conference: Latest Publications

Deriving User- and Content-specific Rewards for Contextual Bandits
Pub Date: 2019-05-13 DOI: 10.1145/3308558.3313592
Paolo Dragone, Rishabh Mehrotra, M. Lalmas
Bandit algorithms have gained increased attention in recommender systems, as they provide effective and scalable recommendations. These algorithms use reward functions, usually based on a numeric variable such as click-through rate, as the basis for optimization. On a popular music streaming service, a contextual bandit algorithm is used to decide which content to recommend to users, where the reward function is a binarization of a numeric variable that defines success based on a static threshold of user streaming time: 1 if the user streamed for at least 30 seconds and 0 otherwise. We explore alternative methods to provide a more informed reward function, based on the assumption that the streaming time distribution heavily depends on the type of user and the type of content being streamed. To automatically extract user and content groups from streaming data, we employ "co-clustering", an unsupervised learning technique that simultaneously extracts clusters of rows and columns from a co-occurrence matrix. The streaming distributions within the co-clusters are then used to define rewards specific to each co-cluster. Our proposed co-cluster-based reward functions lead to an improvement of over 25% in expected stream rate compared to the standard binarized rewards.
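The co-cluster-specific binarization described above can be sketched as follows. Everything here is illustrative: the cluster labels, streaming-time logs, and the choice of the co-cluster median as the threshold are assumptions for the sketch, not the paper's actual data or threshold rule.

```python
from statistics import median

# Hypothetical streaming-time logs (seconds), keyed by
# (user-cluster, content-cluster) co-cluster. Values are made up.
logs = {
    ("casual", "podcast"): [5, 12, 40, 90, 200],
    ("focused", "music"): [25, 28, 31, 33, 35],
}

def static_reward(stream_time, threshold=30):
    """Baseline reward: 1 if the user streamed for at least `threshold` seconds."""
    return 1 if stream_time >= threshold else 0

def cocluster_reward(stream_time, user_cluster, content_cluster, logs):
    """Co-cluster-specific reward: binarize against the median streaming
    time observed within the (user, content) co-cluster."""
    cluster_threshold = median(logs[(user_cluster, content_cluster)])
    return 1 if stream_time >= cluster_threshold else 0

# A 35-second podcast stream clears the static 30s bar, but falls short of
# the podcast co-cluster's typical engagement (median 40s in this toy data).
print(static_reward(35), cocluster_reward(35, "casual", "podcast", logs))
```

The point of the sketch is that the same streaming time can yield different rewards depending on the co-cluster's own distribution, which is what makes the reward user- and content-specific.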
Citations: 15
Learning Binary Hash Codes for Fast Anchor Link Retrieval across Networks
Pub Date: 2019-05-13 DOI: 10.1145/3308558.3313430
Yongqing Wang, Huawei Shen, Jinhua Gao, Xueqi Cheng
Users are usually involved in multiple social networks, without explicit anchor links that reveal the correspondence among different accounts of the same user across networks. Anchor link prediction aims to identify these hidden anchor links, which is a fundamental problem for user profiling, information cascading, and cross-domain recommendation. Although existing methods perform well in terms of the accuracy of anchor link prediction, the pairwise search manner of inferring anchor links faces significant challenges when deployed in practical systems. To combat these challenges, in this paper we propose a novel embedding and matching architecture to directly learn a binary hash code for each node. Hash codes offer an efficient index for filtering out candidate node pairs for anchor link prediction. Extensive experiments on synthetic and real-world large-scale datasets demonstrate that our proposed method has high time efficiency without loss of competitive prediction accuracy in anchor link prediction.
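The candidate-filtering idea behind such hash codes can be sketched in a few lines: represent each node as a short binary code and keep only cross-network pairs within a small Hamming radius, avoiding an exhaustive pairwise comparison of full embeddings. The codes, account names, and radius below are illustrative assumptions, not the paper's learned codes.

```python
# Toy integer-encoded 4-bit hash codes for accounts on two networks.
# In the paper these codes are learned; here they are hand-picked.

def hamming(a: int, b: int) -> int:
    """Hamming distance between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")

def candidate_pairs(codes_a, codes_b, radius=1):
    """Return cross-network node pairs whose codes lie within `radius`;
    only these survivors would be passed to a finer matching stage."""
    return [(u, v) for u, ca in codes_a.items()
                   for v, cb in codes_b.items()
                   if hamming(ca, cb) <= radius]

codes_a = {"alice@net1": 0b1011, "bob@net1": 0b0100}
codes_b = {"alice@net2": 0b1001, "carol@net2": 0b0111}

print(candidate_pairs(codes_a, codes_b, radius=1))
```

Because Hamming distance is a cheap XOR plus popcount, this filter scales far better than scoring every pair with a dense similarity function.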
Citations: 21
Your Style Your Identity: Leveraging Writing and Photography Styles for Drug Trafficker Identification in Darknet Markets over Attributed Heterogeneous Information Network
Pub Date: 2019-05-13 DOI: 10.1145/3308558.3313537
Yiming Zhang, Yujie Fan, Wei Song, Shifu Hou, Yanfang Ye, X. Li, Liang Zhao, C. Shi, Jiabin Wang, Qi Xiong
Due to its anonymity, there has been a dramatic growth of underground drug markets hosted in the darknet (e.g., Dream Market and Valhalla). To combat drug trafficking (a.k.a. illicit drug trading) in cyberspace, there is an urgent need for automatic analysis of participants in darknet markets. However, one of the key challenges is that drug traffickers (i.e., vendors) may maintain multiple accounts across different markets or within the same market. To address this issue, in this paper we propose and develop an intelligent system named uStyle-uID that leverages both writing and photography styles for drug trafficker identification, the first attempt of its kind. At the core of uStyle-uID is an attributed heterogeneous information network (AHIN) which elegantly integrates writing and photography styles along with the text and photo contents, other supporting attributes (i.e., trafficker and drug information), and various kinds of relations. Built on the constructed AHIN, to efficiently measure the relatedness between nodes (i.e., traffickers), we propose a new network embedding model, Vendor2Vec, to learn low-dimensional representations for the nodes in the AHIN, which leverages the complementary attribute information attached to the nodes to guide the meta-path-based random walk for path-instance sampling. We then devise a learning model named vIdentifier to classify whether a given pair of traffickers is the same individual. Comprehensive experiments on data collected from four different darknet markets validate the effectiveness of uStyle-uID in drug trafficker identification through comparisons with alternative approaches.
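The meta-path-guided walk that Vendor2Vec builds on can be sketched on a toy typed graph: at each step, only neighbors whose type matches the next position in the meta-path are eligible. The graph, node names, and the V-D (vendor-drug) meta-path below are illustrative assumptions, not the paper's AHIN schema or attribute-guided transition weights.

```python
import random

# Toy heterogeneous graph: nodes typed as vendor ("V") or drug ("D").
# A V-D meta-path connects vendors through drugs they both list.
node_type = {"v1": "V", "v2": "V", "d1": "D", "d2": "D"}
edges = {"v1": ["d1", "d2"], "v2": ["d1"], "d1": ["v1", "v2"], "d2": ["v1"]}

def metapath_walk(start, metapath, length, rng):
    """Random walk restricted to neighbors matching the next node type
    in the (cyclically repeated) meta-path pattern."""
    walk = [start]
    for i in range(1, length):
        wanted = metapath[i % len(metapath)]
        nbrs = [n for n in edges[walk[-1]] if node_type[n] == wanted]
        if not nbrs:
            break
        walk.append(rng.choice(nbrs))
    return walk

walk = metapath_walk("v1", metapath=["V", "D"], length=5, rng=random.Random(0))
print(walk)  # strictly alternates vendor and drug nodes
```

Walks sampled this way serve as "sentences" for a skip-gram-style embedding model, so that vendors sharing many meta-path contexts end up with similar low-dimensional representations.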
Citations: 40
Leveraging Peer Communication to Enhance Crowdsourcing
Pub Date: 2019-05-13 DOI: 10.1145/3308558.3313554
Wei Tang, Ming Yin, Chien-Ju Ho
Crowdsourcing has become a popular tool for large-scale data collection, where it is often assumed that crowd workers complete the work independently. In this paper, we relax this independence assumption and explore the use of peer communication, a form of direct interaction between workers, in crowdsourcing. In particular, in the crowdsourcing setting with peer communication, a pair of workers are asked to complete the same task together by first generating their initial answers to the task independently, then freely discussing the task with each other, and updating their answers after the discussion. We first experimentally examine the effects of peer communication on individual microtasks. Our results on three types of tasks consistently suggest that work quality is significantly improved in tasks with peer communication compared to tasks where workers complete the work independently. We next explore how to utilize peer communication to optimize the requester's total utility while taking into account the higher data correlation and higher cost introduced by peer communication. In particular, we model the requester's online decision problem of whether and when to use peer communication in crowdsourcing as a constrained Markov decision process that maximizes the requester's total utility under budget constraints. Our proposed approach is empirically shown to bring higher total utility compared to baseline approaches.
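The core trade-off, quality gain versus extra cost under a budget, can be illustrated with a deliberately simplified greedy sketch. The paper solves this as a constrained Markov decision process; the greedy rule, costs, and expected-quality numbers below are illustrative assumptions only.

```python
# Simplified sketch: for each task we know an (assumed) expected quality
# under solo work and under peer communication; peer mode costs more.
# Choose peer communication only when it helps and the budget allows it.

def assign_tasks(tasks, budget, solo_cost=1.0, peer_cost=2.0):
    """Greedy allocation: pay for peer communication while the expected
    quality gain is positive and budget remains; otherwise fall back to
    solo work, or skip when the budget is exhausted."""
    plan, spent, quality = [], 0.0, 0.0
    for q_solo, q_peer in tasks:  # expected quality under each mode
        gain = q_peer - q_solo
        if gain > 0 and spent + peer_cost <= budget:
            plan.append("peer"); spent += peer_cost; quality += q_peer
        elif spent + solo_cost <= budget:
            plan.append("solo"); spent += solo_cost; quality += q_solo
        else:
            plan.append("skip")
    return plan, quality

tasks = [(0.6, 0.9), (0.8, 0.8), (0.5, 0.85)]
plan, quality = assign_tasks(tasks, budget=5.0)
print(plan, quality)
```

A CMDP-based policy would additionally account for future tasks and the correlation between paired workers' answers, which this myopic rule ignores.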
Citations: 18
What happened? The Spread of Fake News Publisher Content During the 2016 U.S. Presidential Election
Pub Date: 2019-05-13 DOI: 10.1145/3308558.3313721
Ceren Budak
The spread of content produced by fake news publishers was one of the most discussed characteristics of the 2016 U.S. Presidential Election. Yet, little is known about the prevalence and focus of such content, how its prevalence changed over time, and how this prevalence related to important election dynamics. In this paper, we address these questions using tweets that mention the two presidential candidates sampled at the daily level, the news content mentioned in such tweets, and open-ended responses from nationally representative telephone interviews. The results of our analysis highlight various important lessons for news consumers and journalists. We find that (i) traditional news producers outperformed fake news producers in aggregate, (ii) the prevalence of content produced by fake news publishers increased over the course of the campaign, particularly among tweets that mentioned Clinton, and (iii) changes in this prevalence closely followed changes in net Clinton favorability. Turning to content, we (iv) identify similarities and differences in agenda setting by fake and traditional news media and show that (v) the information individuals most commonly reported having read, seen, or heard about the candidates was more closely aligned with content produced by fake news outlets than with traditional news outlets, in particular for information Republican voters retained about Clinton. We also model the fakeness of retained information as a function of demographic characteristics. Implications for platform owners, news consumers, and journalists are discussed.
Citations: 49
Tortoise or Hare? Quantifying the Effects of Performance on Mobile App Retention
Pub Date: 2019-05-13 DOI: 10.1145/3308558.3313428
Agustin Zuniga, Huber Flores, Eemil Lagerspetz, P. Nurmi, S. Tarkoma, P. Hui, J. Manner
We contribute by quantifying the effect of network latency and battery consumption on mobile app performance and retention, i.e., users' decisions to continue or stop using apps. We perform our analysis by fusing two large-scale crowdsensed datasets collected by piggybacking on information captured by mobile apps. We find that app performance has an impact on its retention rate. Our results demonstrate that high energy consumption and high latency decrease the likelihood of retaining an app. Conversely, we show that reducing latency or energy consumption does not guarantee a higher likelihood of retention as long as they are within reasonable standards of performance. However, we also demonstrate that what is considered reasonable depends on what users have been accustomed to, with device and network characteristics and app category playing a role. As our second contribution, we develop a model for predicting retention based on performance metrics. We demonstrate the benefits of our model through empirical benchmarks, which show that our model not only predicts retention accurately but also generalizes well across application categories, locations, and other factors moderating the effect of performance.
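A minimal sketch of predicting retention from performance metrics might look like the following logistic scoring. The feature set (latency, energy drain) mirrors the abstract, but the functional form, weights, and numbers are made-up assumptions, not the paper's fitted model.

```python
import math

def retention_probability(latency_ms, energy_mw,
                          w_lat=-0.004, w_en=-0.002, bias=3.0):
    """Toy logistic model: higher latency or energy drain lowers the
    predicted probability that the user keeps the app. Weights here are
    illustrative; a real model would be fit on crowdsensed measurements."""
    z = bias + w_lat * latency_ms + w_en * energy_mw
    return 1.0 / (1.0 + math.exp(-z))

fast_app = retention_probability(latency_ms=100, energy_mw=300)
slow_app = retention_probability(latency_ms=900, energy_mw=1200)
print(round(fast_app, 3), round(slow_app, 3))
```

Even this toy form captures the abstract's directional finding: worse performance monotonically reduces predicted retention, while the bias and weights control where "reasonable" performance stops mattering.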
Citations: 19
Large-Scale Talent Flow Forecast with Dynamic Latent Factor Model?
Pub Date: 2019-05-13 DOI: 10.1145/3308558.3313525
Le Zhang, Hengshu Zhu, Tong Xu, Chen Zhu, Chuan Qin, Hui Xiong, Enhong Chen
The understanding of talent flow is critical for sharpening company talent strategy to maintain competitiveness in the current fast-evolving environment. Existing studies on talent flow analysis generally rely on subjective surveys. However, without large-scale quantitative studies, there are limits to delivering fine-grained predictive business insights for better talent management. To this end, in this paper we introduce a big-data-driven approach for predictive talent flow analysis. Specifically, we first construct a time-aware job transition tensor by mining large-scale job transition records from digital resumes on online professional networks (OPNs), where each entry refers to a fine-grained talent flow rate for a specific job position between two companies. Then, we design a dynamic-latent-factor-based Evolving Tensor Factorization (ETF) model for predicting future talent flows. In particular, a novel evolving feature that jointly considers the influence of previous talent flows and the global market is introduced for modeling the evolving nature of each company. Furthermore, to improve predictive performance, we also integrate several representative attributes of companies as side information for regularizing the model inference. Finally, we conduct extensive experiments on large-scale real-world data to evaluate the model's performance. The experimental results clearly validate the effectiveness of our approach compared with state-of-the-art baselines in terms of talent flow forecasting. Meanwhile, the results also reveal some interesting findings on the regularity of talent flows, e.g., Facebook became increasingly attractive to engineers from Google in 2016.
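The first step described above, aggregating resume transitions into a time-aware tensor of flow rates, can be sketched as follows. The records, companies, and period granularity are fabricated for illustration; the paper's tensor is built from large-scale OPN resume data and then fit with the ETF model rather than read off directly.

```python
from collections import Counter

# Fabricated job-transition records: (from_company, to_company, period).
records = [
    ("Google", "Facebook", "2016Q1"),
    ("Google", "Facebook", "2016Q2"),
    ("Facebook", "Google", "2016Q1"),
]

def transition_tensor(records):
    """Count transitions per (source, target, period) cell; a factorization
    model such as ETF would then be fit on these counts."""
    return Counter(records)

def flow_rate(tensor, src, dst, period):
    """Share of `src`'s leavers in `period` who moved to `dst`,
    i.e. one normalized entry of the talent-flow tensor."""
    outflow = sum(c for (s, _, p), c in tensor.items()
                  if s == src and p == period)
    return tensor[(src, dst, period)] / outflow if outflow else 0.0

tensor = transition_tensor(records)
print(flow_rate(tensor, "Google", "Facebook", "2016Q1"))
```

Factorizing such a tensor into company and time latent factors is what lets the model forecast cells (future periods) that have no observed transitions yet.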
Citations: 22
SWeG: Lossless and Lossy Summarization of Web-Scale Graphs
Pub Date: 2019-05-13 DOI: 10.1145/3308558.3313402
Kijung Shin, A. Ghoting, Myunghwan Kim, Hema Raghavan
Given a terabyte-scale graph distributed across multiple machines, how can we summarize it, with far fewer nodes and edges, so that we can restore the original graph exactly or within error bounds? As large-scale graphs are ubiquitous, ranging from web graphs to online social networks, compactly representing graphs becomes important for storing and processing them efficiently. Given a graph, graph summarization aims to find its compact representation, consisting of (a) a summary graph where the nodes are disjoint sets of nodes in the input graph, and each edge indicates the edges between all pairs of nodes in the two sets; and (b) edge corrections for restoring the input graph from the summary graph exactly or within error bounds. Although graph summarization is a widely-used graph-compression technique readily combinable with other techniques, existing algorithms for graph summarization are not satisfactory in terms of speed or compactness of outputs. More importantly, they assume that the input graph is small enough to fit in main memory. In this work, we propose SWeG, a fast parallel algorithm for summarizing graphs with compact representations. SWeG is designed not only for shared-memory but also for MapReduce settings, to summarize graphs that are too large to fit in main memory. We demonstrate that SWeG is (a) Fast: SWeG is up to 5400× faster than its competitors that give similarly compact representations, (b) Scalable: SWeG scales to graphs with tens of billions of edges, and (c) Compact: combined with state-of-the-art compression methods, SWeG achieves up to 3.4× better compression than them.
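The representation described in (a) and (b) can be illustrated with a tiny lossless example: superedges expand to all cross pairs between two supernodes, and edge corrections add or remove individual edges so the original graph is restored exactly. The toy graph below is an illustrative assumption, not SWeG's summarization algorithm itself.

```python
# Toy summary: two supernodes over nodes {1,2} and {3,4}, one superedge,
# plus corrections that patch the expanded edge set.
supernodes = {"A": {1, 2}, "B": {3, 4}}
superedges = [("A", "B")]  # implies every edge between {1,2} and {3,4}
corrections = {"add": [(1, 2)], "remove": [(2, 3)]}

def restore(supernodes, superedges, corrections):
    """Expand each superedge into all cross node pairs, then apply the
    edge corrections to recover the original edge set exactly."""
    edges = set()
    for su, sv in superedges:
        for u in supernodes[su]:
            for v in supernodes[sv]:
                edges.add((min(u, v), max(u, v)))
    edges |= {tuple(sorted(e)) for e in corrections["add"]}
    edges -= {tuple(sorted(e)) for e in corrections["remove"]}
    return edges

print(sorted(restore(supernodes, superedges, corrections)))
```

Dropping the correction lists (or keeping only part of them) is what turns this lossless scheme into the lossy, error-bounded variant the title refers to.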
{"title":"SWeG: Lossless and Lossy Summarization of Web-Scale Graphs","authors":"Kijung Shin, A. Ghoting, Myunghwan Kim, Hema Raghavan","doi":"10.1145/3308558.3313402","DOIUrl":"https://doi.org/10.1145/3308558.3313402","url":null,"abstract":"Given a terabyte-scale graph distributed across multiple machines, how can we summarize it, with much fewer nodes and edges, so that we can restore the original graph exactly or within error bounds? As large-scale graphs are ubiquitous, ranging from web graphs to online social networks, compactly representing graphs becomes important to efficiently store and process them. Given a graph, graph summarization aims to find its compact representation consisting of (a) a summary graph where the nodes are disjoint sets of nodes in the input graph, and each edge indicates the edges between all pairs of nodes in the two sets; and (b) edge corrections for restoring the input graph from the summary graph exactly or within error bounds. Although graph summarization is a widely-used graph-compression technique readily combinable with other techniques, existing algorithms for graph summarization are not satisfactory in terms of speed or compactness of outputs. More importantly, they assume that the input graph is small enough to fit in main memory. In this work, we propose SWeG, a fast parallel algorithm for summarizing graphs with compact representations. SWeG is designed for not only shared-memory but also MapReduce settings to summarize graphs that are too large to fit in main memory. 
We demonstrate that SWeG is (a) Fast: SWeG is up to 5400 × faster than its competitors that give similarly compact representations, (b) Scalable: SWeG scales to graphs with tens of billions of edges, and (c) Compact: combined with state-of-the-art compression methods, SWeG achieves up to 3.4 × better compression than them.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"53 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78126246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
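The summary-graph-plus-corrections representation can be illustrated with a toy example. This is not SWeG's greedy, MapReduce-ready algorithm: the single neighborhood-similarity merge and the majority rule for keeping a super-edge are illustrative assumptions; only the (summary, corrections) encoding and the exact-restoration check reflect the abstract above.

```python
from itertools import combinations

# Input graph (undirected), edges stored as sorted pairs.
edges = {(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)}
nodes = {0, 1, 2, 3, 4}
adj = {u: set() for u in nodes}
for u, v in edges:
    adj[u].add(v); adj[v].add(u)

# Merge the pair of nodes with the most similar neighborhoods (Jaccard).
def similarity(a, b):
    na, nb = adj[a] - {b}, adj[b] - {a}
    return len(na & nb) / max(1, len(na | nb))

a, b = max(combinations(sorted(nodes), 2), key=lambda p: similarity(*p))
members = {(a, b): {a, b}, **{(u,): {u} for u in nodes - {a, b}}}

# A summary edge (S, T) stands for ALL node pairs between the two supernodes;
# corrections record where the summary over- or under-covers the real edges.
summary, add_corr, del_corr = set(), set(), set()
supers = sorted(members)
for i, S in enumerate(supers):
    for T in supers[i:]:
        pairs = {(min(u, v), max(u, v))
                 for u in members[S] for v in members[T] if u != v}
        present = {p for p in pairs if p in edges}
        if len(present) > len(pairs) / 2:   # keeping the super-edge is cheaper
            summary.add((S, T))
            del_corr |= pairs - present
        else:
            add_corr |= present

# Lossless restoration: expand summary edges, then apply both correction sets.
restored = set()
for S, T in summary:
    restored |= {(min(u, v), max(u, v))
                 for u in members[S] for v in members[T] if u != v}
restored = (restored - del_corr) | add_corr
```

Here nodes 1 and 2 share identical neighborhoods, so merging them yields a 4-edge summary with empty correction sets that restores all 6 original edges exactly; the lossy variant would simply drop some corrections and accept bounded error.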
Personalized Online Spell Correction for Personal Search
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313706
Jai Gupta, Zhen Qin, Michael Bendersky, Donald Metzler
Spell correction is a must-have feature for any modern search engine in applications such as web or e-commerce search. Typical spell correction solutions used in production systems consist of large indexed lookup tables based on a global model trained across many users over a large scale web corpus or a query log. For search over personal corpora, such as email, this global solution is not sufficient, as it ignores the user's personal lexicon. Without personalization, global spelling fails to correct tail queries drawn from a user's own, often idiosyncratic, lexicon. Personalization using existing algorithms is difficult due to resource constraints and unavailability of sufficient data to build per-user models. In this work, we propose a simple and effective personalized spell correction solution that augments existing global solutions for search over private corpora. Our event driven spell correction candidate generation method is specifically designed with personalization as the key construct. Our novel spell correction and query completion algorithms do not require complex model training and is highly efficient. The proposed solution has shown over 30% click-through rate gain on affected queries when evaluated against a range of strong commercial personal search baselines - Google's Gmail, Drive, and Calendar search production systems.
{"title":"Personalized Online Spell Correction for Personal Search","authors":"Jai Gupta, Zhen Qin, Michael Bendersky, Donald Metzler","doi":"10.1145/3308558.3313706","DOIUrl":"https://doi.org/10.1145/3308558.3313706","url":null,"abstract":"Spell correction is a must-have feature for any modern search engine in applications such as web or e-commerce search. Typical spell correction solutions used in production systems consist of large indexed lookup tables based on a global model trained across many users over a large scale web corpus or a query log. For search over personal corpora, such as email, this global solution is not sufficient, as it ignores the user's personal lexicon. Without personalization, global spelling fails to correct tail queries drawn from a user's own, often idiosyncratic, lexicon. Personalization using existing algorithms is difficult due to resource constraints and unavailability of sufficient data to build per-user models. In this work, we propose a simple and effective personalized spell correction solution that augments existing global solutions for search over private corpora. Our event driven spell correction candidate generation method is specifically designed with personalization as the key construct. Our novel spell correction and query completion algorithms do not require complex model training and is highly efficient. 
The proposed solution has shown over 30% click-through rate gain on affected queries when evaluated against a range of strong commercial personal search baselines - Google's Gmail, Drive, and Calendar search production systems.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79867354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
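A minimal sketch of the personalization idea: a per-user lexicon augments a global vocabulary, so idiosyncratic tail terms survive correction. The toy vocabularies, the edit-distance-1 candidate generation, and the frequency-plus-bonus scoring rule are assumptions for illustration and do not reflect the production system described in the paper.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

GLOBAL_VOCAB = {"meeting": 1000, "meetings": 400, "melting": 50}
personal_lexicon = {"meutas": 12}   # idiosyncratic name from the user's mail

def correct(query, user_vocab, max_dist=1):
    """Rank candidates by (edit distance, score); personal hits outrank
    global ones via a large additive bonus."""
    candidates = []
    for vocab, bonus in ((user_vocab, 1_000_000), (GLOBAL_VOCAB, 0)):
        for word, freq in vocab.items():
            d = edit_distance(query, word)
            if d <= max_dist:
                candidates.append((d, -(freq + bonus), word))
    return min(candidates)[2] if candidates else query
```

A global-only speller would "fix" the query `meuta` to a dictionary word; with the personal lexicon it correctly resolves to the user's own term `meutas`, while ordinary typos like `meting` still fall through to the global correction `meeting`.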
Local Matching Networks for Engineering Diagram Search
Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313500
Zhuyun Dai, Zhen Fan, Hafeezul Rahman, Jamie Callan
Finding diagrams that contain a specific part or a similar part is important in many engineering tasks. In this search task, the query part is expected to match only a small region in a complex image. This paper investigates several local matching networks that explicitly model local region-to-region similarities. Deep convolutional neural networks extract local features and model local matching patterns. Spatial convolution is employed to cross-match local regions at different scale levels, addressing cases where the target part appears at a different scale, position, and/or angle. A gating network automatically learns region importance, removing noise from sparse areas and visual metadata in engineering diagrams. Experimental results show that local matching approaches are more effective for engineering diagram search than global matching approaches. Suppressing unimportant regions via the gating network enhances accuracy. Matching across different scales via spatial convolution substantially improves robustness to scale and rotation changes. A pipelined architecture efficiently searches a large collection of diagrams by using a simple local matching network to identify a small set of candidate images and a more sophisticated network with convolutional cross-scale matching to re-rank candidates.
{"title":"Local Matching Networks for Engineering Diagram Search","authors":"Zhuyun Dai, Zhen Fan, Hafeezul Rahman, Jamie Callan","doi":"10.1145/3308558.3313500","DOIUrl":"https://doi.org/10.1145/3308558.3313500","url":null,"abstract":"Finding diagrams that contain a specific part or a similar part is important in many engineering tasks. In this search task, the query part is expected to match only a small region in a complex image. This paper investigates several local matching networks that explicitly model local region-to-region similarities. Deep convolutional neural networks extract local features and model local matching patterns. Spatial convolution is employed to cross-match local regions at different scale levels, addressing cases where the target part appears at a different scale, position, and/or angle. A gating network automatically learns region importance, removing noise from sparse areas and visual metadata in engineering diagrams. Experimental results show that local matching approaches are more effective for engineering diagram search than global matching approaches. Suppressing unimportant regions via the gating network enhances accuracy. Matching across different scales via spatial convolution substantially improves robustness to scale and rotation changes. 
A pipelined architecture efficiently searches a large collection of diagrams by using a simple local matching network to identify a small set of candidate images and a more sophisticated network with convolutional cross-scale matching to re-rank candidates.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80160979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
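The core operation of sliding a query region over a target map can be illustrated with plain cross-correlation in NumPy. This toy stand-in uses random feature maps and cosine scoring as assumptions; it omits the paper's learned convolutional features, multi-scale matching, and gating network, showing only how a small query locates its best-matching local region.

```python
import numpy as np

def match_score_map(target, query):
    """Slide an L2-normalized query patch over a 2-D target map and
    return the cosine similarity at every position."""
    qh, qw = query.shape
    q = query / (np.linalg.norm(query) + 1e-8)
    th, tw = target.shape
    out = np.full((th - qh + 1, tw - qw + 1), -np.inf)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = target[y:y + qh, x:x + qw]
            p = patch / (np.linalg.norm(patch) + 1e-8)
            out[y, x] = float((q * p).sum())   # cosine similarity
    return out

# The query "part" is planted at position (2, 3) of the target "diagram",
# buried in low-amplitude noise.
rng = np.random.default_rng(1)
query = rng.normal(size=(3, 3))
target = rng.normal(scale=0.1, size=(8, 8))
target[2:5, 3:6] += query

scores = match_score_map(target, query)
best = np.unravel_index(np.argmax(scores), scores.shape)
```

The peak of the score map recovers the planted location, which is the local-matching intuition: the query needs to match only a small region, not the whole image.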