Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining: Latest Papers

TUBE
Daheng Wang, Tianwen Jiang, N. Chawla, Meng Jiang
identification of presymptomatic NF2 mutation carriers by DNA diagnosis permits improved genetic counselling and clinical management in at-risk subjects. The early detection of VS by gadolinium-enhanced
DOI: https://doi.org/10.1145/3292500.3330867 (published 2019-07-25)
Citations: 3
AKUPM
Xiaoli Tang, Tengyun Wang, Haizhi Yang, Hengjie Song
Recently, much attention has been paid to the use of knowledge graphs in recommender systems to alleviate the data sparsity and cold-start problems. However, when incorporating entities from a knowledge graph to represent users, most existing works are unaware of the relationships between these entities and users. As a result, the recommendation results may suffer considerably from unrelated entities. In this paper, we investigate how to explore these relationships, which are essentially determined by the interactions among entities. First, we categorize the interactions among entities into two types: inter-entity-interaction and intra-entity-interaction. Inter-entity-interaction is the interaction among entities that affects their importance in representing users. Intra-entity-interaction is the interaction within an entity that describes the different characteristics of this entity when it is involved in different relations. Then, considering these two types of interactions, we propose a novel model named Attention-enhanced Knowledge-aware User Preference Model (AKUPM) for click-through rate (CTR) prediction. More specifically, a self-attention network is utilized to capture the inter-entity-interaction by learning the appropriate importance of each entity with respect to the user. Moreover, the intra-entity-interaction is modeled by projecting each entity into its connected relation spaces to obtain the suitable characteristics. By doing so, AKUPM is able to figure out the most related part of the incorporated entities (i.e., filter out the unrelated entities). Extensive experiments on two real-world public datasets demonstrate that AKUPM achieves substantial gains in terms of common evaluation metrics (e.g., AUC, ACC and Recall@top-K) over several state-of-the-art baselines.
DOI: https://doi.org/10.1145/3292500.3330705 (published 2019-07-25)
Citations: 4
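The AKUPM abstract above mentions learning the importance of each entity with respect to the user. The following is a minimal sketch of that idea using plain softmax attention with the user vector as the query. It is a simplification and not the paper's self-attention network; the function names, the scaling factor, and the toy vectors are all assumptions made for illustration.

```python
import math


def attention_weights(user, entities):
    """Softmax over scaled dot products between the user vector and each entity vector."""
    scale = math.sqrt(len(user))
    scores = [sum(u * e for u, e in zip(user, ent)) / scale for ent in entities]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def user_representation(user, entities):
    """Weighted sum of entity vectors, so low-weight entities contribute little."""
    weights = attention_weights(user, entities)
    return [
        sum(w * ent[i] for w, ent in zip(weights, entities))
        for i in range(len(user))
    ]


# Hypothetical 2-dimensional vectors: the first entity aligns with the user.
user = [1.0, 0.0]
entities = [[2.0, 0.0], [0.0, 2.0]]
weights = attention_weights(user, entities)
representation = user_representation(user, entities)
```

In this toy setup the entity that points in the same direction as the user receives the larger weight, which is the filtering effect the abstract describes.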
EpiDeep
B. Adhikari, Xinfeng Xu, Naren Ramakrishnan, B. Prakash
Influenza causes loss of life every year and requires careful monitoring and control by health organizations. Annual influenza forecasts help policymakers implement effective countermeasures to control both seasonal and pandemic outbreaks. Existing forecasting techniques suffer from problems such as poor forecasting performance, lack of modeling flexibility, data sparsity, and/or lack of interpretability. We propose EpiDeep, a novel deep neural network approach for epidemic forecasting which tackles all of these issues by learning meaningful representations of incidence curves in a continuous feature space and accurately predicting future incidences, peak intensity, peak time, and onset of the upcoming season. We present extensive experiments on forecasting ILI (influenza-like illnesses) in the United States, leveraging multiple metrics to quantify success. Our results demonstrate that EpiDeep is successful at learning meaningful embeddings and, more importantly, that these embeddings evolve as the season progresses. Furthermore, our approach outperforms non-trivial baselines by up to 40%.
DOI: https://doi.org/10.1145/3292500.3330917 (published 2019-07-25)
Citations: 56
Temporal Probabilistic Profiles for Sepsis Prediction in the ICU
Eitam Sheetrit, N. Nissim, D. Klimov, Yuval Shahar
Sepsis is a condition caused by the body's overwhelming and life-threatening response to infection, which can lead to tissue damage, organ failure, and finally death. Today, sepsis is one of the leading causes of mortality among populations in intensive care units (ICUs). Sepsis is difficult to predict, diagnose, and treat, as it involves analyzing different sets of multivariate time-series, usually with problems of missing data, different sampling frequencies, and random noise. Here, we propose a new dynamic-behavior-based model, which we call a Temporal Probabilistic proFile (TPF), for classification and prediction tasks of multivariate time series. In the TPF method, the raw, time-stamped data are first abstracted into a series of higher-level, meaningful concepts, which hold over intervals characterizing time periods. We then discover frequently repeating temporal patterns within the data. Using the discovered patterns, we create a probabilistic distribution of the temporal patterns of the overall entity population, of each target class in it, and of each entity. We then exploit TPFs as meta-features to classify the time series of new entities, or to predict their outcome, by measuring their TPF distance, either to the aggregated TPF of each class, or to the individual TPFs of each of the entities, using negative cross entropy. Our experimental results on a large benchmark clinical data set show that TPFs improve sepsis prediction capabilities, and perform better than other machine learning approaches.
DOI: https://doi.org/10.1145/3292500.3330747 (published 2019-07-25)
Citations: 36
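The TPF abstract above describes building a probability distribution over temporal patterns for each class and each entity, then classifying a new entity by comparing distributions. A minimal sketch of that comparison step follows. It assumes the temporal patterns have already been discovered; the pattern names, the additive smoothing, and the choice to pick the class with the lowest cross entropy (equivalently, the highest negative cross entropy) are illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import Counter


def profile(pattern_occurrences, vocabulary, smoothing=1e-6):
    """Turn a list of observed pattern labels into a smoothed probability distribution."""
    counts = Counter(pattern_occurrences)
    total = sum(counts[p] for p in vocabulary) + smoothing * len(vocabulary)
    return {p: (counts[p] + smoothing) / total for p in vocabulary}


def cross_entropy(p, q):
    """H(p, q) = -sum_k p(k) * log q(k); lower means q explains p better."""
    return -sum(p[k] * math.log(q[k]) for k in p)


def classify(entity_profile, class_profiles):
    """Return the class whose aggregated profile is closest to the entity profile."""
    return min(
        class_profiles,
        key=lambda label: cross_entropy(entity_profile, class_profiles[label]),
    )


# Hypothetical pattern labels standing in for discovered temporal patterns.
vocabulary = ["A", "B", "C"]
class_profiles = {
    "sepsis": profile(["A", "A", "B"], vocabulary),
    "no_sepsis": profile(["C", "C", "B"], vocabulary),
}
new_entity = profile(["A", "B", "A", "A"], vocabulary)
predicted = classify(new_entity, class_profiles)
```

The smoothing term keeps every probability strictly positive, so the logarithm is always defined even when a class never exhibits a given pattern.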
A Severity Score for Retinopathy of Prematurity
Peng Tian, Yuan Guo, Jayashree Kalpathy-Cramer, S. Ostmo, J. P. Campbell, M. Chiang, Jennifer G. Dy, Deniz Erdoğmuş, Stratis Ioannidis
Retinopathy of Prematurity (ROP) is a leading cause of childhood blindness worldwide. An automated ROP detection system could significantly improve the chance of a child receiving proper diagnosis and treatment. We propose a means of producing a continuous severity score in an automated fashion, regressed from both (a) diagnostic class labels and (b) comparison outcomes. Our generative model combines the two sources, and successfully addresses inherent variability in diagnostic outcomes. In particular, our method exhibits excellent predictive performance on both diagnostic and comparison outcomes over a broad array of metrics, including AUC, precision, and recall.
DOI: https://doi.org/10.1145/3292500.3330713 (published 2019-07-25)
Citations: 10
Chainer: A Deep Learning Framework for Accelerating the Research Cycle
Seiya Tokui, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa, S. Saito, Shuji Suzuki, Kota Uenishi, Brian K. Vogel, Hiroyuki Yamazaki Vincent
Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance means of implementing the full range of deep learning models needed by researchers and practitioners. Chainer provides acceleration using Graphics Processing Units with a familiar NumPy-like API through CuPy, supports general and dynamic models in Python through Define-by-Run, and also provides add-on packages for state-of-the-art computer vision models as well as distributed training.
DOI: https://doi.org/10.1145/3292500.3330756 (published 2019-07-25)
Citations: 111
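The Chainer abstract above refers to Define-by-Run, where the computation graph is recorded while ordinary Python code executes. The toy sketch below illustrates the general idea with a tiny scalar autodiff class. It does not use or imitate Chainer's API; the `Var` class and the `model` function are hypothetical names invented for this example.

```python
class Var:
    """A scalar value that records how it was computed as the program runs."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # pairs of (parent Var, local derivative)

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(
            self.value * other.value,
            ((self, other.value), (other, self.value)),
        )

    def backward(self):
        """Propagate gradients through the recorded graph in reverse topological order."""
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


def model(x, w):
    """The graph shape depends on the input: plain Python control flow decides it."""
    y = x * w
    steps = 2 if x.value > 0 else 1
    for _ in range(steps):
        y = y * w
    return y


x, w = Var(3.0), Var(2.0)
y = model(x, w)  # builds the graph for x * w**3 on the fly
y.backward()
```

Because the graph is rebuilt on every call, a different input can take a different branch and produce a different graph without any separate graph-definition step.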
PinText: A Multitask Text Embedding System in Pinterest
Jinfeng Zhuang, Yu Liu
Text embedding is a fundamental component for extracting text features in production-level data mining and machine learning systems, given that textual information is the most ubiquitous signal. However, practitioners often face a tradeoff between the effectiveness of the underlying embedding algorithms and the cost of training and maintaining various embedding results in large-scale applications. In this paper, we propose a multitask text embedding solution called PinText for three major vertical surfaces in Pinterest (homefeed, related pins, and search), which consolidates existing text embedding algorithms into a single solution and produces state-of-the-art performance. Specifically, we learn word-level semantic vectors by enforcing that the similarity between positive engagement pairs is larger than the similarity between randomly sampled background pairs. Based on the learned semantic vectors, we derive the embedding vector of a user, a pin, or a search query by simply averaging its word-level vectors. In this common compact vector space, we are able to do unified nearest neighbor search with hashing by Hadoop jobs or dockerized images on a Kubernetes cluster. Both offline evaluation and online experiments show the effectiveness of the PinText system, which also significantly reduces the storage cost of multiple open-sourced embeddings.
DOI: https://doi.org/10.1145/3292500.3330671 (published 2019-07-25)
Citations: 13
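The PinText abstract above describes two mechanics: embedding a piece of text by averaging its word vectors, and training so that positive engagement pairs score higher than background pairs. A minimal sketch of both follows. The word vectors are made-up values, and the use of cosine similarity with a hinge margin is an assumption for illustration; the abstract does not specify the similarity function or the loss form.

```python
import math

DIM = 2

# Hypothetical word vectors; in a real system these would be learned.
WORD_VECTORS = {
    "red": [1.0, 0.0],
    "dress": [0.8, 0.2],
    "garden": [0.0, 1.0],
}


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def ranking_loss(query, positive, background, margin=0.1):
    """Hinge loss: zero once the positive pair beats the background pair by the margin."""
    q = embed(query)
    return max(0.0, margin - cosine(q, embed(positive)) + cosine(q, embed(background)))


satisfied = ranking_loss("red dress", "dress", "garden")
violated = ranking_loss("red dress", "garden", "dress")
```

Averaging keeps users, pins, and queries in the same vector space, which is what allows a single nearest neighbor index to serve all three surfaces.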
Sequential Anomaly Detection using Inverse Reinforcement Learning
Min-hwan Oh, G. Iyengar
One of the most interesting application scenarios in anomaly detection is when sequential data are targeted. For example, in a safety-critical environment, it is crucial to have an automatic detection system to screen the streaming data gathered by monitoring sensors and to report abnormal observations if detected in real-time. Oftentimes, stakes are much higher when these potential anomalies are intentional or goal-oriented. We propose an end-to-end framework for sequential anomaly detection using inverse reinforcement learning (IRL), whose objective is to determine the decision-making agent's underlying function which triggers his/her behavior. The proposed method takes the sequence of actions of a target agent (and possibly other meta information) as input. The agent's normal behavior is then understood by the reward function which is inferred via IRL. We use a neural network to represent a reward function. Using a learned reward function, we evaluate whether a new observation from the target agent follows a normal pattern. In order to construct a reliable anomaly detection method and take into consideration the confidence of the predicted anomaly score, we adopt a Bayesian approach for IRL. The empirical study on publicly available real-world data shows that our proposed method is effective in identifying anomalies.
DOI: https://doi.org/10.1145/3292500.3330932 (published 2019-07-25)
Citations: 59
Real-time Event Detection on Social Data Streams
Mateusz Fedoryszak, Brent Frederick, V. Rajaram, Changtao Zhong
Social networks are quickly becoming the primary medium for discussing what is happening around real-world events. The information that is generated on social platforms like Twitter can produce rich data streams for immediate insights into ongoing matters and the conversations around them. To tackle the problem of event detection, we model events as a list of clusters of trending entities over time. We describe a real-time system for discovering events that is modular in design and novel in scale and speed: it applies clustering on a large stream with millions of entities per minute and produces a dynamically updated set of events. In order to assess clustering methodologies, we build an evaluation dataset derived from a snapshot of the full Twitter Firehose and propose novel metrics for measuring clustering quality. Through experiments and system profiling, we highlight key results from the offline and online pipelines. Finally, we visualize a high profile event on Twitter to show the importance of modeling the evolution of events, especially those detected from social data streams.
DOI: https://doi.org/10.1145/3292500.3330689 (published 2019-07-25)
Citations: 72
150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com
Lucas Bernardi, Themistoklis Mavridis, Pablo A. Estevez
Booking.com is the world's largest online travel agent, where millions of guests find their accommodation and millions of accommodation providers list their properties, including hotels, apartments, bed and breakfasts, guest houses, and more. In recent years we have applied Machine Learning to improve the experience of our customers and our business. While most of the Machine Learning literature focuses on the algorithmic or mathematical aspects of the field, not much has been published about how Machine Learning can deliver meaningful impact in an industrial environment where commercial gains are paramount. We conducted an analysis of about 150 successful customer-facing applications of Machine Learning, developed by dozens of teams at Booking.com, exposed to hundreds of millions of users worldwide and validated through rigorous Randomized Controlled Trials. Following the phases of a Machine Learning project, we describe our approach, the many challenges we found, and the lessons we learned while scaling up such a complex technology across our organization. Our main conclusion is that an iterative, hypothesis-driven process, integrated with other disciplines, was fundamental to building 150 successful products enabled by Machine Learning.
DOI: https://doi.org/10.1145/3292500.3330744 (published 2019-07-25)
Citations: 63