
2020 International Conference on Data Mining Workshops (ICDMW): Latest Papers

Kennard-Stone Balance Algorithm for Time-series Big Data Stream Mining
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00122
Tengyue Li, S. Fong, Yaoyang Wu, A. J. Tallón-Ballesteros
Thanks to advances in IoT and sensor applications, time series are now generated more easily and in larger quantities than ever. Training a prediction model effectively on such big data streams poses certain challenges in machine learning. Data sampling has been an important pre-processing technique for handling over-sized data: it converts huge data streams into a manageable and representative subset before they are loaded into a model induction process. In this paper, a novel data conversion method, the Kennard-Stone Balance (KSB) algorithm, is proposed. In past decades, researchers have used Kennard-Stone (KS) to partition a bounded dataset into appropriate portions of training and testing data in cross-validation. In this new proposal, we extend KS to balance the sub-sampled data with respect to the class distribution by round-robin. It is also the first time KS is applied to time series to extract a meaningful representation of big data streams and thereby improve the performance of a machine learning model. Preliminary simulation results show the advantages of KSB. Analysis, discussion and future work are reported in this short paper. It is anticipated that KSB brings a new data sampling alternative with considerable potential to data stream mining.
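The abstract above describes KSB only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the classic Kennard-Stone ordering (start from the two farthest points, then repeatedly add the point farthest from everything already selected) is run separately inside each class, and that samples are then drawn from the per-class orderings in round-robin fashion. The function names `kennard_stone_order` and `balanced_sample` are hypothetical.

```python
import math
from itertools import combinations


def kennard_stone_order(points):
    """Return indices of `points` in Kennard-Stone selection order."""
    n = len(points)
    if n <= 2:
        return list(range(n))
    # Start from the two points that are farthest apart.
    first, second = max(
        combinations(range(n), 2),
        key=lambda pair: math.dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = {
        i: min(math.dist(points[i], points[first]),
               math.dist(points[i], points[second]))
        for i in remaining
    }
    while remaining:
        # Pick the point farthest from the selected set (ties: lowest index).
        nxt = max(sorted(remaining), key=lambda i: min_dist[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], math.dist(points[i], points[nxt]))
    return selected


def balanced_sample(points, labels, k):
    """Pick up to k indices, cycling over classes in round-robin order.

    Assumption: each class contributes its points in Kennard-Stone order.
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    queues = {}
    for label, idxs in by_class.items():
        order = kennard_stone_order([points[i] for i in idxs])
        queues[label] = [idxs[j] for j in order]
    chosen, pos = [], 0
    while len(chosen) < k and any(pos < len(q) for q in queues.values()):
        for label in queues:  # dicts keep insertion order of the classes
            if pos < len(queues[label]) and len(chosen) < k:
                chosen.append(queues[label][pos])
        pos += 1
    return chosen
```

With six points of one class and two of another, `balanced_sample(points, labels, 4)` returns two indices per class, so the subset is balanced even though the input is not. The pairwise search for the starting pair is quadratic in the class size, so a real stream would need windowing or a cheaper seed choice.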
Citations: 1
Data analysis and processing for spatio-temporal forecasting
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00106
Hyoungwoo Lee, J. Choo
Spatio-temporal forecasting is a research area applicable to many industrial fields, such as forecasting real-life power consumption and predicting road traffic conditions. For example, in traffic forecasting, it is important to analyze spatial relations and temporal trends in order to predict how road traffic changes over time. In the spatio-temporal forecasting task, previous studies applied graph modeling to capture spatial relations. However, existing models use only the most recently available data to predict traffic conditions, which degrades model performance. Further research is necessary for predicting the speed in the far future. To tackle this issue, we aim to improve the performance of the model by providing it with additional data obtained through time-series segmentation. To verify whether the additional data are meaningful to the model, we conducted an experiment that compares a model trained with the existing data against a model trained with our data, and we analyze the distribution of the additional data.
Citations: 1
COAL: Convolutional Online Adaptation Learning for Opinion Mining
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00012
I. Chaturvedi, E. Ragusa, P. Gastaldo, E. Cambria
Thanks to recent advances in machine learning, some say AI is the new engine and data is the new coal. Mining this ‘coal’ from the ever-growing Social Web, however, can be a formidable task. In this work, we address this problem in the context of sentiment analysis using convolutional online adaptation learning (COAL). In particular, we consider semi-supervised learning of convolutional features, which we use to train an online model. Such a model, which can be trained in one domain and then used to predict sentiment in other domains, outperforms the baseline by 5-20%.
Citations: 2
Persistent Homology on Streaming Data
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00090
Anindya Moitra, Nicholas O. Malott, P. Wilsey
This paper introduces a framework to compute persistent homology, a principal tool in Topological Data Analysis, on potentially unbounded and evolving data streams. The framework is organized into online and offline components. The online element maintains a summary of the data that preserves the topological structure of the stream. The offline component computes the persistence intervals from the data captured by the summary. The framework is applied to the detection of horizontal or reticulate genomic exchanges during the evolution of species that cannot be identified by phylogenetic inference or traditional data mining. The method effectively detects reticulate evolution that occurs through reassortment and recombination in large streams of genomic sequences of Influenza and HIV viruses.
Citations: 4
Predictive Nonlinear Modeling by Koopman Mode Decomposition
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00118
Akira Kusaba, Kilho Shin, D. Shepard, T. Kuboyama
Machine learning has countless applications in time series analysis: controlling smart grids, detecting mechanical failures, and analyzing stock prices. Fourier mode decomposition (FMD) is the most common method of analysis because it decomposes time series into finite waveform components, or modes. Its principal shortcoming is that it assumes every mode has a constant amplitude, an assumption that rarely holds in real-world data. In contrast, Koopman mode decomposition (KMD) can detect modes with exponentially increasing or decreasing amplitudes, although it has mostly been applied to diagnosing data errors, not to prediction. What has kept KMD from being applied to prediction is partly a shortcoming in its mathematical formulation. This paper seeks to remedy that shortcoming: it provides a mathematically precise formulation of KMD as a practical tool. This formulation, in turn, allows us to develop a novel practical method for prediction of future data. We further demonstrate our method's effectiveness using both synthetic data and real plasma flow data.
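To make the contrast with constant-amplitude Fourier modes concrete, here is a small sketch of how a mode with a decaying amplitude can be recovered from data and then used for prediction. It is not the paper's formulation: it assumes a scalar series that obeys a two-term linear recurrence, fits that recurrence by least squares (a Prony-style fit on delayed samples, which is closely related to KMD on delay coordinates), and reads the modes off as roots of the characteristic polynomial. The function names are hypothetical.

```python
import cmath
import math


def fit_two_modes(series):
    """Least-squares fit of x[t+2] = c1 * x[t+1] + c0 * x[t]; returns (c0, c1)."""
    s11 = s10 = s00 = b1 = b0 = 0.0
    for x0, x1, x2 in zip(series, series[1:], series[2:]):
        s11 += x1 * x1
        s10 += x1 * x0
        s00 += x0 * x0
        b1 += x1 * x2
        b0 += x0 * x2
    # Solve the 2x2 normal equations [[s11, s10], [s10, s00]] @ [c1, c0] = [b1, b0].
    det = s11 * s00 - s10 * s10
    c1 = (b1 * s00 - b0 * s10) / det
    c0 = (b0 * s11 - b1 * s10) / det
    return c0, c1


def modes(c0, c1):
    """Roots of lambda^2 - c1 * lambda - c0 = 0.

    |lambda| < 1 means a decaying mode, |lambda| > 1 a growing one.
    """
    disc = cmath.sqrt(c1 * c1 + 4 * c0)
    return (c1 + disc) / 2, (c1 - disc) / 2


def forecast(series, c0, c1, steps):
    """Roll the fitted recurrence forward from the last two observations."""
    x0, x1 = series[-2], series[-1]
    out = []
    for _ in range(steps):
        x0, x1 = x1, c1 * x1 + c0 * x0
        out.append(x1)
    return out


# Synthetic damped oscillation: the amplitude shrinks by a factor 0.9 per step.
series = [0.9 ** t * math.cos(0.5 * t) for t in range(40)]
c0, c1 = fit_two_modes(series)
lam1, lam2 = modes(c0, c1)
growth_per_step = abs(lam1)            # modulus of the mode (growth or decay rate)
next_values = forecast(series, c0, c1, 3)
```

On this noise-free series the fitted modes are a complex-conjugate pair whose modulus matches the decay factor, which is exactly the information a constant-amplitude decomposition cannot represent. Real data with noise or more than two modes would need a larger delay embedding and a more robust solver.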
Citations: 0
Interactive Knowledge Graph Attention Network for Recommender Systems
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00038
Li Yang, E. Shijia, Shiyao Xu, Yang Xiang
Recent progress in personalized recommendation has shown great potential in exploiting structure information provided by a knowledge graph (KG). As a heterogeneous information network, a KG contains rich semantic relatedness among entities, which helps address notorious issues such as data sparsity and cold start. State-of-the-art KG-based recommendation approaches try to propagate information along KG links to encode long-range connectivities into hidden representations. However, most of them model the user or item representation independently and lack a focus on user-item interaction. To this end, we propose the Interactive Knowledge Graph Attention Network (IKGAT), which directly models user-item interaction and high-order structure information within the KG. For the user representation, following an interactive attention mechanism, we use the item to attend over the user's neighbors and then propagate their information to update the representation. This process is extended to multiple hops to obtain richer neighborhood information. Similarly, the item representation is updated under the supervision of the user. With that design, IKGAT can capture collaborative signals and user preferences effectively. Experimental results on three public datasets show that IKGAT consistently outperforms the state-of-the-art approaches, especially when the dataset is sparse.
Citations: 1
Batch Mode Active Learning for Individual Treatment Effect Estimation
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00123
Zoltán Puha, M. Kaptein, A. Lemmens
Field experimentation has become a well-established practice to estimate individual treatment effects. In recent years, the Active Learning (AL) literature has developed methods to optimize the design of field experiments and reduce their cost. In this paper, we propose a novel AL algorithm for individual treatment effect estimation that works in batch mode for cases where the outcomes of an intervention are not immediate. It uniquely combines Expected Model Change Maximization and Bayesian Additive Regression Trees. Our approach (B-EMCMITE) uses the predictive uncertainty around the individual treatment effects to actively sample new units for experimentation and decide which treatment they will receive. We perform extensive simulations and test our approach on semi-synthetic, real-life data. B-EMCMITE outperforms alternative approaches and substantially reduces the number of observations needed to estimate individual treatment effects compared to A/B tests.
Citations: 3
Explainable Anomaly Detection for District Heating Based on Shapley Additive Explanations
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00111
Sungwoo Park, Jihoon Moon, Eenjun Hwang
One key component in the heat-using facility of district heating systems is the differential pressure control valve. This valve ensures a stable flow of water to the heat exchanger and the temperature control valve. It also maintains a stable pressure difference between the supply and return lines. Hence, its malfunctioning could cause significant heat losses and, consequently, economic losses. To avoid this, it is necessary to monitor the valve for abnormal operation in real time. Although various machine learning-based anomaly detection models exist, their decisions are of limited practical use unless the rationale for each decision is appropriately explained. In this paper, we propose a Shapley additive explanation-based explainable anomaly detection scheme that can present the degree of contribution of input variables to the derived result. We report some of the experimental results.
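The "degree of contribution of input variables" mentioned above can be illustrated with exact Shapley values on a toy model. This is a sketch under stated assumptions, not the scheme from the paper: the scoring function `toy_score`, its three inputs, and the all-zero baseline are invented for illustration, and the enumeration over all feature subsets is only feasible for a handful of features (practical SHAP implementations approximate it).

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature of x, relative to a baseline input.

    A coalition S is evaluated by taking the features in S from x and all
    other features from the baseline.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(z):
    """Hypothetical anomaly score: one additive term and one interaction."""
    return 2 * z[0] + z[1] * z[2]


contributions = shapley_values(toy_score, [1, 2, 3], [0, 0, 0])
```

The contributions sum to the difference between the score of the input and the score of the baseline, and the interaction term is split evenly between the two features involved. That additivity is what allows a flagged reading to be attributed to individual input variables.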
Citations: 13
Nonlinear Tensor Completion Using Domain Knowledge: An Application in Analysts' Earnings Forecast
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00059
Ajim Uddin, Xinyuan Tao, Chia-Ching Chou, Dantong Yu
Financial analysts' earnings forecasts are among the most critical inputs for security valuation and investment decisions. However, it is challenging to utilize such information for two main reasons: missing values and heterogeneity among analysts. In this paper, we show that a recent breakthrough in nonlinear tensor completion, the CoSTCo algorithm [1], overcomes this difficulty by imputing missing values and significantly improves the accuracy of earnings forecasts. Compared with conventional imputation approaches, CoSTCo effectively captures latent information and reduces the tensor completion errors by 50%, even with 98% missing values. Furthermore, we show that by using firm characteristics as auxiliary information, we can improve firms' earnings prediction accuracy by 6%. Results are consistent across different performance metrics and various industry sectors. Notably, the performance improvement is more salient for the sectors with high heterogeneity. Our findings imply the successful application of advanced ML techniques in a real financial problem.
Citations: 5
One Belt, One Road, One Sentiment? A Hybrid Approach to Gauging Public Opinions on the New Silk Road Initiative
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00011
Jonathan Kevin Chandra, E. Cambria
With the rapid adoption of the Internet, fast-moving social media platforms have been able to extract and encapsulate real-time public sentiments on different entities. Real-time sentiment analysis of current dynamic events such as elections, global affairs and sports is essential to understanding the public's reaction to the states and trajectories of these events. In this paper, we aim to extract sentiments about the Belt and Road Initiative from Twitter. Using aspect-based sentiment analysis, we were able to obtain each tweet's sentiment polarity for the related aspect category to better understand the topics that were discussed. We have developed an end-to-end sentiment analysis system that collects relevant data from Twitter, processes it and visualizes it on an intuitive display. We employed a hybrid approach of symbolic and sub-symbolic techniques using gated convolutional networks, aspect embeddings and the SenticNet framework to solve the subtasks of aspect category detection and aspect category polarity. A confidence score threshold was used to decide on the results provided by the models from the differing approaches.
Citations: 8