Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining: Latest Articles, Page 4

AutoNE

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330848

Ke Tu, Jianxin Ma, Peng Cui, J. Pei, Wenwu Zhu

Network embedding (NE) aims to embed the nodes of a network into a vector space, and serves as the bridge between machine learning and network data. Despite their widespread success, NE algorithms typically contain a large number of hyperparameters for preserving the various network properties, which must be carefully tuned in order to achieve satisfactory performance. Though automated machine learning (AutoML) has achieved promising results when applied to many types of data such as images and texts, network data poses great challenges to AutoML and remains largely ignored by the AutoML literature. The biggest obstacle is the massive scale of real-world networks, along with the coupled node relationships that make any straightforward sampling strategy problematic. In this paper, we propose a novel framework, named AutoNE, to automatically optimize the hyperparameters of an NE algorithm on massive networks. In detail, we employ a multi-start random walk strategy to sample several small sub-networks, perform each trial of configuration selection on the sampled sub-network, and design a meta-learner to transfer the knowledge about optimal hyperparameters from the sub-networks to the original massive network. The transferred meta-knowledge greatly reduces the number of trials required when predicting the optimal hyperparameters for the original network. Extensive experiments demonstrate that our framework can significantly outperform the existing methods, in that it needs less time and fewer trials to find the optimal hyperparameters.
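
The sub-network sampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform choice of start nodes, and the fixed walk length are all assumptions made for illustration, and the meta-learner that transfers hyperparameter knowledge is not shown.

```python
import random


def multi_start_random_walk_sample(adj, num_starts, walk_length, seed=0):
    """Collect the nodes visited by several short random walks.

    adj maps each node to a list of its neighbors. Start nodes are drawn
    uniformly at random (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    visited = set()
    for _ in range(num_starts):
        current = rng.choice(nodes)
        visited.add(current)
        for _ in range(walk_length):
            neighbors = adj[current]
            if not neighbors:  # dead end: stop this walk early
                break
            current = rng.choice(neighbors)
            visited.add(current)
    return visited


def induced_subgraph(adj, keep):
    """Keep only the sampled nodes and the edges between them."""
    return {u: [v for v in adj[u] if v in keep] for u in keep}


# Toy network: a ring of 20 nodes.
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
sampled_nodes = multi_start_random_walk_sample(ring, num_starts=2, walk_length=4)
sub_network = induced_subgraph(ring, sampled_nodes)
```

Each walk visits at most `walk_length + 1` nodes, so the size of a sampled sub-network is bounded by `num_starts * (walk_length + 1)` regardless of how large the original network is.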


Citations: 2

Log2Intent


Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330889

Zhiqiang Tao, Sheng Li, Zhaowen Wang, Chen Fang, Longqi Yang, Handong Zhao, Y. Fu

Modeling user behavior from unstructured software log-trace data is critical in providing personalized service (e.g., cross-platform recommendation). Existing user modeling approaches cannot handle the long-term temporal information in log data well, nor can they produce semantically meaningful results for interpreting user logs. To address these challenges, we propose a Log2Intent framework for interpretable user modeling in this paper. Log2Intent adopts a deep sequential modeling framework that contains a temporal encoder, a semantic encoder and a log action decoder, and it fully captures the long-term temporal information in user sessions. Moreover, to bridge the semantic gap between log-trace data and human language, a recurrent semantics memory unit (RSMU) is proposed to encode the annotation sentences from an auxiliary software tutorial dataset, and the output of RSMU is fed into the semantic encoder of Log2Intent. Comprehensive experiments on a real-world Photoshop log-trace dataset with an auxiliary Photoshop tutorial dataset demonstrate the effectiveness of the proposed Log2Intent framework over the state-of-the-art log-trace user modeling method in three different tasks, including log annotation retrieval, user interest detection and user next action prediction.


Citations: 3

Dynamical Origins of Distribution Functions


Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330842

Chengxi Zang, Peng Cui, Wenwu Zhu, Fei Wang

Many real-world problems are time-evolving in nature, such as the progression of diseases, the cascading process when a post is broadcast in a social network, or the changing of climates. The observational data characterizing these complex problems are usually only available at discrete time stamps, which is why existing research on these problems is mostly based on cross-sectional analysis. In this paper, we try to model these time-evolving phenomena by a dynamical system, and treat the data sets observed at different time stamps as probability distribution functions generated by such a dynamical system. We propose a theorem which builds a mathematical relationship between a dynamical system modeled by differential equations and the distribution function (or survival function) of the cross-sectional states of this system. We then develop a survival analysis framework to learn the differential equations of a dynamical system from its cross-sectional states. With such a framework, we are able to capture the continuous-time dynamics of an evolutionary system. We validate our framework on both synthetic and real-world data sets. The experimental results show that our framework is able to discover and capture the generative dynamics of various data distributions accurately. Our study can potentially facilitate scientific discoveries of the unknown dynamics of complex systems in the real world.
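
To make the link between a differential equation and a cross-sectional distribution concrete, here is a small textbook illustration. It is not the theorem from the paper, and the setup (exponential growth observed at exponentially distributed ages, plus all function and parameter names) is an assumption chosen for simplicity. If x(t) solves dx/dt = g·x with x(0) = 1 and each unit is observed at a random age T with P(T > t) = exp(-r·t), then the observed state X = exp(g·T) has survival function P(X > x) = x^(-r/g) for x >= 1, a power law.

```python
import math
import random


def simulate_cross_section(growth_rate, stop_rate, n, seed=0):
    """Sample n cross-sectional states of dx/dt = growth_rate * x, x(0) = 1.

    Each unit is observed at an exponentially distributed age (assumption).
    """
    rng = random.Random(seed)
    return [math.exp(growth_rate * rng.expovariate(stop_rate)) for _ in range(n)]


def empirical_survival(samples, x):
    """Fraction of observed states strictly greater than x."""
    return sum(s > x for s in samples) / len(samples)


def predicted_survival(growth_rate, stop_rate, x):
    """P(X > x) = P(T > ln(x) / growth_rate) = x ** (-stop_rate / growth_rate)."""
    return x ** (-stop_rate / growth_rate)


states = simulate_cross_section(growth_rate=1.0, stop_rate=2.0, n=20000)
gap = abs(empirical_survival(states, 2.0) - predicted_survival(1.0, 2.0, 2.0))
```

In this toy setting the dynamics can be read back from the distribution: the slope of the survival function on a log-log plot estimates `-stop_rate / growth_rate`.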



Citations: 5

Towards ML Engineering with TensorFlow Extended (TFX)


Pub Date : 2019-07-25 DOI: 10.1145/3292500.3340408

Konstantinos Katsiapis, Kevin Haas

The discipline of Software Engineering has evolved over the past 5+ decades to good levels of maturity. This maturity is in fact both a blessing and a necessity, since the modern world largely depends on it. At the same time, the popularity of Machine Learning (ML) has been steadily increasing over the past 2+ decades, and over the last decade ML has been increasingly used for both experimentation and production workloads. It is no longer uncommon for ML to power widely used applications and products that are integral parts of our life. Much like what was the case for Software Engineering, the proliferation of use of ML technology necessitates the evolution of the ML discipline from "Coding" to "Engineering". Gus Katsiapis offers a view from the trenches of using and building end-to-end ML platforms, and shares collective knowledge and experience, gathered over more than a decade of applied ML at Google. We hope this helps pave the way towards a world of ML Engineering. Kevin Haas offers an overview of TensorFlow Extended (TFX), the end-to-end machine learning platform for TensorFlow that powers products across all of Alphabet (and beyond). TFX helps effectively manage the end-to-end training and production workflow including model management, versioning, and serving, thereby helping one realize aspects of ML Engineering.



Citations: 7

A Representation Learning Framework for Property Graphs


Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330948

Yifan Hou, Hongzhi Chen, Changji Li, James Cheng, Ming Yang

Representation learning on graphs, also called graph embedding, has demonstrated its significant impact on a series of machine learning applications such as classification, prediction and recommendation. However, existing work has largely ignored the rich information contained in the properties (or attributes) of both nodes and edges of graphs in modern applications, e.g., those represented by property graphs. To date, most existing graph embedding methods either focus on plain graphs with only the graph topology, or consider properties on nodes only. We propose PGE, a graph representation learning framework that incorporates both node and edge properties into the graph embedding procedure. PGE uses node clustering to assign biases to differentiate neighbors of a node and leverages multiple data-driven matrices to aggregate the property information of neighbors sampled based on a biased strategy. PGE adopts the popular inductive model for neighborhood aggregation. We provide detailed analyses on the efficacy of our method and validate the performance of PGE by showing how PGE achieves better embedding results than the state-of-the-art graph embedding methods on benchmark applications such as node classification and link prediction over real-world datasets.
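
The idea of sampling neighbors with cluster-dependent biases and then aggregating their properties can be sketched as follows. This is a toy illustration and not PGE itself: the two bias values, the sampling with replacement, the function names, and the plain mean (used here in place of the learned, data-driven matrices mentioned in the abstract) are all assumptions.

```python
import random


def biased_neighbor_sample(node, adj, cluster, k, same_bias, diff_bias, seed=0):
    """Sample k neighbors (with replacement), weighting each neighbor by
    whether it falls in the same cluster as `node` (hypothetical scheme)."""
    rng = random.Random(seed)
    neighbors = adj[node]
    weights = [
        same_bias if cluster[v] == cluster[node] else diff_bias
        for v in neighbors
    ]
    return rng.choices(neighbors, weights=weights, k=k)


def mean_aggregate(sampled, props):
    """Average the property vectors of the sampled neighbors.

    A plain mean stands in for a learned aggregation function.
    """
    dim = len(props[sampled[0]])
    return [sum(props[v][d] for v in sampled) / len(sampled) for d in range(dim)]


# Toy property graph: node 0 has one same-cluster and two other-cluster neighbors.
adj = {0: [1, 2, 3]}
cluster = {0: "a", 1: "a", 2: "b", 3: "b"}
props = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.0, 1.0]}

sampled = biased_neighbor_sample(0, adj, cluster, k=5, same_bias=1.0, diff_bias=0.0)
summary = mean_aggregate(sampled, props)
```

With `diff_bias=0.0` only the same-cluster neighbor can be drawn, so the aggregate equals that neighbor's property vector; intermediate bias values blend the two groups.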



Citations: 32

Learning From Networks: Algorithms, Theory, and Applications


Pub Date : 2019-07-25 DOI: 10.1145/3292500.3332293

Xiao Huang, Peng Cui, Yuxiao Dong, Jundong Li, Huan Liu, J. Pei, Le Song, Jie Tang, Fei Wang, Hongxia Yang, Wenwu Zhu

Arguably, every entity in this universe is networked in one way or another. With the prevalence of collected network data, such as social media and biological networks, learning from networks has become an essential task in many applications. It is well recognized that network data is intricate and large-scale, and that analytic tasks on network data are becoming more and more sophisticated. In this tutorial, we systematically review the area of learning from networks, including algorithms, theoretical analysis, and illustrative applications. Starting with a quick recollection of the exciting history of the area, we formulate the core technical problems. Then, we introduce the fundamental approaches, that is, the feature selection based approaches and the network embedding based approaches. Next, we extend our discussion to attributed networks, which are popular in practice. Last, we cover the latest hot topic, graph neural network based approaches. For each group of approaches, we also survey the associated theoretical analysis and real-world application examples. Our tutorial also inspires a series of open problems and challenges that may lead to future breakthroughs. The authors are productive and seasoned researchers active in this area who represent a nice combination of academia and industry.



Citations: 5

Blending Noisy Social Media Signals with Traditional Movement Variables to Predict Forced Migration


Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330774

L. Singh, Laila Wahedi, Yanchen Wang, Yifang Wei, Christo Kirov, Susan F. Martin, K. Donato, Yaguang Liu, Kornraphop Kawintiranon

Worldwide displacement due to war and conflict is at an all-time high. Unfortunately, determining if, when, and where people will move is a complex problem. This paper proposes integrating publicly available organic data from social media and newspapers with more traditional indicators of forced migration to determine when and where people will move. We combine movement and organic variables with spatial and temporal variation within different Bayesian models and show the viability of our method using a case study involving displacement in Iraq. Our analysis shows that incorporating open-source generated conversation and event variables maintains or improves predictive accuracy over traditional variables alone. This work is an important step toward understanding how to leverage organic big data for societal-scale problems.


Citations: 16

Feedback Shaping: A Modeling Approach to Nurture Content Creation


Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330764

Ye Tu, Chun Lo, Yiping Yuan, S. Chatterjee

Social media platforms bring together content creators and content consumers through recommender systems like newsfeed. The focus of such recommender systems has thus far been primarily on modeling the content consumer preferences and optimizing for their experience. However, it is equally critical to nurture content creation by prioritizing the creators' interests, as quality content forms the seed for sustainable engagement and conversations, bringing in new consumers while retaining existing ones. In this work, we propose a modeling approach to predict how feedback from content consumers incentivizes creators. We then leverage this model to optimize the newsfeed experience for content creators by reshaping the feedback distribution, leading to a more active content ecosystem. Practically, we discuss how we balance the user experience for both consumers and creators, and how we carry out online A/B tests with strong network effects. We present a deployed use case on the LinkedIn newsfeed, where we used this approach to improve content creation significantly without compromising the consumers' experience.



Citations: 7

Automatic Dialogue Summary Generation for Customer Service


Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330683

Chunyi Liu, Peng Wang, Jiang Xu, Zang Li, Jieping Ye

Dialogue summarization extracts useful information from a dialogue. It helps people quickly capture the highlights of a dialogue without going through long and sometimes convoluted utterances. For customer service, it saves the human resources currently required to write dialogue summaries. A main challenge of dialogue summarization is to design a mechanism that ensures the logic, integrity, and correctness of the summaries. In this paper, we introduce auxiliary key point sequences to solve this problem. A key point sequence describes the logic of the summary. In our training procedure, a key point sequence acts as an auxiliary label. It helps the model learn the logic of the summary. In the prediction procedure, our model predicts the key point sequence first and then uses it to guide the prediction of the summary. Along with the auxiliary key point sequence, we propose a novel Leader-Writer network. The Leader net predicts the key point sequence, and the Writer net predicts the summary based on the decoded key point sequence. The Leader net ensures the summary is logical and complete. The Writer net focuses on generating fluent sentences. We test our model on customer service scenarios. The results show that our model outperforms other models not only on BLEU and ROUGE-L scores but also on logic and integrity.


Citations: 91

A Hierarchical Career-Path-Aware Neural Network for Job Mobility Prediction

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330969

Qingxin Meng, Hengshu Zhu, Keli Xiao, Le Zhang, Hui Xiong

Understanding job mobility can benefit talent management operations in a number of ways, such as talent recruitment, talent development, and talent retention. While there is extensive literature showing the predictability of organization-level job mobility patterns (e.g., in terms of the employee turnover rate), there are no effective solutions for supporting the understanding of job mobility at the individual level. To this end, in this paper, we propose a hierarchical career-path-aware neural network for learning individual-level job mobility. We aim to answer two questions related to individuals in their career paths: 1) who will be the next employer? 2) how long will the individual work in the new position? Specifically, our model exploits a hierarchical neural network structure with an embedded attention mechanism for characterizing internal and external job mobility. It also takes personal profile information into consideration in the learning process. Extensive results on real-world data show that the proposed model can lead to significant improvements in prediction accuracy for the two aforementioned prediction problems. Moreover, we show that both questions are well addressed by our model with a certain level of interpretability. For the case studies, we provide data-driven evidence showing interesting patterns associated with various factors (e.g., job duration and firm type) in the job mobility prediction process.

Citations: 43