In multi-label learning, each instance is associated with multiple labels simultaneously. Most existing approaches treat each label in a crisp manner, i.e., a class label is either relevant or irrelevant to the instance. However, the latent relative importance of each relevant label is regrettably ignored. In this paper, we propose a novel multi-label learning approach that simultaneously estimates the latent relative labeling importances and trains the inductive model. Specifically, we present a biconvex formulation with both instance and label graph regularization, and solve the problem in an alternating fashion. On the one hand, the inductive model is trained by minimizing the least squares loss of fitting the latent relative labeling importances. On the other hand, the latent relative labeling importances are estimated from the model outputs via a specially constrained label propagation procedure. Through the mutual adaptation of inductive model training and constrained label propagation, an effective multi-label learning model is built that optimally estimates the latent relative labeling importances. Extensive experimental results clearly show the effectiveness of the proposed approach.
Shuo He, Lei Feng, Li Li. "Estimating Latent Relative Labeling Importances for Multi-label Learning." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00127
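The constrained label propagation step described above can be illustrated with a generic sketch: importances are diffused over an instance-similarity graph, then constrained to live only on each instance's relevant labels and to sum to one. This is an assumption-laden stand-in, not the authors' exact biconvex formulation.

```python
import numpy as np

def propagate_labeling_importances(W, Y, alpha=0.5, n_iter=50):
    """Estimate relative labeling importances by label propagation.

    W: (n, n) nonnegative, symmetric instance-similarity matrix.
    Y: (n, q) binary label matrix (1 = relevant label).
    Returns an (n, q) matrix where each row is a distribution over
    that instance's relevant labels. A generic propagation sketch,
    not the paper's exact constrained procedure.
    """
    # Symmetrically normalize the similarity graph: S = D^-1/2 W D^-1/2.
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    S = W / np.sqrt(np.outer(d, d))
    F = Y.astype(float)
    for _ in range(n_iter):
        # Diffuse importances over the graph, anchored to the labels.
        F = alpha * (S @ F) + (1 - alpha) * Y
        # Constraint: mass only on relevant labels, rows sum to one.
        F = F * Y
        F /= np.maximum(F.sum(axis=1, keepdims=True), 1e-12)
    return F
```

Each output row can then serve as a soft target for the least-squares inductive model in the alternating scheme.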
Although many researchers of recommender systems have noted that encoding user-item interactions with DNNs improves collaborative filtering, they overlook that embedding latent features collected from external sources, e.g., knowledge graphs (KGs), can produce more precise recommendation results. Furthermore, CF-based models remain vulnerable when observed user-item interactions are sparse. In this paper, targeting movie recommendation, we propose a novel knowledge-enhanced deep recommendation framework that incorporates GAN-based models to achieve robust performance. Specifically, our framework first imports various feature embeddings distilled not only from user-movie interactions but also from KGs and tags to constitute initial user/movie representations. Then, user/movie representations are fed into a generator and a discriminator simultaneously to learn final optimal representations through adversarial training, which are conducive to generating better recommendation results. Extensive experiments on a real Douban dataset demonstrate our framework's superiority over state-of-the-art recommendation models, especially when observed user-movie interactions are sparse.
Deqing Yang, Zikai Guo, Ziyi Wang, Juyang Jiang, Yanghua Xiao, Wei Wang. "A Knowledge-Enhanced Deep Recommendation Framework Incorporating GAN-Based Models." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00187
We introduce a novel ensemble learning framework for supervised classification. Our proposed framework, mixed bagging, is a form of bootstrap aggregating (bagging) in which the sampling process takes into account the classification hardness of the training instances. The classification hardness, or simply hardness, of an instance is defined as the probability that the instance will be misclassified by a classification model built from the remaining instances in the training set. We incorporate instance hardness into the bagging process by varying the sampling probability of each instance based on its estimated hardness. Bootstraps of differing hardness can be created in this way by over-representing, under-representing and equally representing harder instances. This results in a diverse committee of classifiers induced from the bootstraps, whose individual outputs can be aggregated to achieve a final class prediction. We propose two versions of mixed bagging: one where the bootstraps are grouped as easy, regular or hard, with all bootstraps in one group having the same hardness; and another where the hardness of the bootstraps changes gradually from one iteration to the next. We have tested our system on 47 publicly available binary classification problems using C4.5 decision trees of varying depth as base learners. We find that the proposed mixed bagging methods perform better than traditional bagging and weighted bagging (wagging) regardless of the base learner. The proposed method also outperforms AdaBoost when the base learner consists of deeper decision trees. We examine the results of mixed bagging in terms of bias-variance decomposition and find that mixed bagging is better than AdaBoost at reducing variance and better than traditional bagging at reducing inductive bias.
A. Kabir, Carolina Ruiz, S. A. Alvarez. "Mixed Bagging: A Novel Ensemble Learning Framework for Supervised Classification Based on Instance Hardness." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00137
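The core mechanism above, tilting bootstrap sampling probabilities by estimated instance hardness, can be sketched as follows. The hardness estimator here is a simple leave-one-out nearest-neighbor proxy, and the exponential tilt is an illustrative choice; the paper's hardness definition and bootstrap schedules differ.

```python
import numpy as np

def loo_hardness(X, y):
    """Hardness proxy: 1 if an instance's leave-one-out nearest
    neighbor carries a different class label, else 0. A cheap
    stand-in for the cross-model misclassification probability
    defined in the abstract."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)       # exclude each point from its own query
    nn = D.argmin(axis=1)
    return (y[nn] != y).astype(float)

def hardness_biased_bootstrap(X, y, hardness, bias=1.0, seed=0):
    """Draw one bootstrap whose sampling probabilities are tilted
    toward hard instances (bias > 0), away from them (bias < 0),
    or uniform (bias = 0)."""
    rng = np.random.default_rng(seed)
    w = np.exp(bias * hardness)       # soft over-/under-representation
    p = w / w.sum()
    idx = rng.choice(len(y), size=len(y), replace=True, p=p)
    return X[idx], y[idx]
```

Varying `bias` across iterations (negative to positive) mimics the gradual easy-to-hard schedule; grouping fixed `bias` values mimics the easy/regular/hard grouping.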
It has recently been shown that deep neural networks (DNNs) are susceptible to a particular type of attack that exploits a fundamental flaw in their design. This attack consists of generating synthetic examples referred to as adversarial samples. These samples are constructed by slightly manipulating real data points in ways that "fool" the original DNN model, forcing it to misclassify previously correctly classified samples with high confidence. Many believe addressing this flaw is essential for DNNs to be used in critical applications such as cyber security. Previous work has shown that learning algorithms that enhance the robustness of DNN models all use the tactic of "security through obscurity". This means that security can be guaranteed only if one can obscure the learning algorithms from adversaries. Once the learning technique is disclosed, DNNs protected by these defense mechanisms are still susceptible to adversarial samples. In this work, we examine how previous research has dealt with this issue and propose a generic approach to enhance a DNN's resistance to adversarial samples. More specifically, our approach integrates a data transformation module with a DNN, making it robust even if the underlying learning algorithm is revealed. To demonstrate the generality of our proposed approach and its potential for handling cyber security applications, we evaluate our method and several other existing solutions on publicly available datasets, including a large-scale malware dataset as well as the MNIST and IMDB datasets. Our results indicate that our approach typically provides superior classification performance and robustness to attacks compared with state-of-the-art solutions.
Wenbo Guo, Qinglong Wang, Kaixuan Zhang, Alexander Ororbia, Sui Huang, Xue Liu, C. Lee Giles, Lin Lin, Xinyu Xing. "Defending Against Adversarial Samples Without Security through Obscurity." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00029
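The defense above hinges on placing a data transformation module in front of the DNN. As a heavily simplified, hypothetical sketch of where such a component sits in the pipeline (the paper's module is learned; the fixed random projection here is only an illustrative stand-in):

```python
import numpy as np

def make_transform(d_in, d_out, seed=0):
    """Return a fixed input-transformation function (here a random
    projection). Hypothetical stand-in for the paper's learned
    transformation module; the essential property is that the very
    same transform is applied at training and at inference time."""
    R = np.random.default_rng(seed).normal(size=(d_in, d_out)) / np.sqrt(d_out)
    return lambda X: X @ R

transform = make_transform(784, 64)            # e.g. flattened 28x28 inputs
X_train = np.random.default_rng(1).normal(size=(32, 784))
Z_train = transform(X_train)                   # the classifier trains on Z, not X
Z_query = transform(X_train[:4])               # identical transform at inference
```

The downstream classifier never sees raw inputs, so an adversary crafting perturbations in the raw input space must also contend with the transformation.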
Transfer covariance functions, which can model domain similarities and adaptively control the knowledge transfer across domains, are widely used in Gaussian process (GP) based transfer learning. We focus on regression problems in a black-box learning scenario, and study a family of rather general transfer covariance functions, T_*, that can model the similarity heterogeneity of domains through multiple kernel learning. A necessary and sufficient condition that (i) validates GPs using T_* for any data and (ii) provides semantic interpretations is given. Moreover, building on this condition, we propose a computationally inexpensive model learning rule that can explicitly capture different sub-similarities of domains. Extensive experiments on one synthetic dataset and four real-world datasets demonstrate the effectiveness of the learned GP on the sub-similarity capture and the transfer performance.
Pengfei Wei, Ramón Sagarna, Yiping Ke, Y. Ong. "Uncluttered Domain Sub-Similarity Modeling for Transfer Regression." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00178
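A transfer covariance function of the general kind discussed above can be sketched with a single base kernel: the joint covariance over source and target inputs keeps the within-domain blocks intact and scales only the cross-domain block by a similarity coefficient. This single-kernel construction is a simplification of the multiple-kernel family T_* studied in the paper.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Squared-exponential base kernel between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def transfer_kernel(Xs, Xt, lam, gamma=1.0):
    """Joint covariance over source inputs Xs and target inputs Xt.
    lam in [0, 1] controls knowledge transfer: lam = 0 decouples the
    domains, lam = 1 treats them as one domain. For lam in [0, 1]
    the result is a convex combination of two PSD matrices, hence PSD.
    A single-kernel sketch of the paper's multiple-kernel family."""
    Kss = rbf(Xs, Xs, gamma)
    Ktt = rbf(Xt, Xt, gamma)
    Kst = lam * rbf(Xs, Xt, gamma)
    return np.block([[Kss, Kst], [Kst.T, Ktt]])
```

A GP regressor can use this joint matrix directly, with `lam` (or per-kernel coefficients, in the multiple-kernel case) learned from data.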
Predicting revisit intention is an important problem in the retail industry: converting first-time visitors into repeat customers is essential for high profitability. However, revisit analyses for offline retail businesses have previously been conducted only on a small scale, mainly because their methodologies have relied on manually collected data. With the help of noninvasive monitoring, analyzing a customer's behavior inside stores has become possible, and revisit statistics are available for the large share of customers who turn on their Wi-Fi or Bluetooth devices. Using Wi-Fi fingerprinting data from ZOYI, we propose a systematic framework to predict the revisit intention of customers using only signals received from their mobile devices. With data collected from seven flagship stores in downtown Seoul, we achieved 67-80% prediction accuracy for all customers and 64-72% prediction accuracy for first-time visitors. The performance improvement from considering customer mobility was 4.7-24.3%. Our framework demonstrates the feasibility of predicting revisits from customer mobility captured in Wi-Fi signals, which has not been considered in previous marketing studies. In addition, we examine the effect of the data collection period on prediction performance and show the robustness of our model to missing customers. Finally, we discuss the difficulties of securing prediction accuracy with features that look promising but turn out to be unsatisfactory.
Sundong Kim, Jae-Gil Lee. "Utilizing In-store Sensors for Revisit Prediction." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00037
Active learning exploits inherent structures in the unlabeled data to minimize the number of labels required to train an accurate model. It enables effective machine learning in applications with high labeling cost, such as document classification and drug response prediction. We investigate active learning on heterogeneous information networks, with the objective of obtaining accurate node classifications while minimizing the number of labeled nodes. Our proposed algorithm harnesses a multi-armed bandit (MAB) algorithm to determine network structures that identify the most important nodes to the classification task, accounting for node types and without assuming label assortativity. Evaluations on real-world network classification tasks demonstrate that our algorithm outperforms existing methods independent of the underlying classification model.
Doris Xin, Ahmed El-Kishky, De Liao, Brandon Norick, Jiawei Han. "Active Learning on Heterogeneous Information Networks: A Multi-armed Bandit Approach." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00184
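The bandit machinery above can be illustrated with a minimal epsilon-greedy sketch: each arm is a candidate query strategy (in the paper, a network structure), and the reward for pulling an arm is the improvement the resulting labeled node brings to the classifier. The epsilon-greedy rule is an illustrative choice, not necessarily the MAB algorithm used in the paper.

```python
import numpy as np

class EpsGreedyBandit:
    """Epsilon-greedy multi-armed bandit over candidate query
    strategies. Tracks an incremental mean reward per arm; with
    probability eps it explores a random arm, otherwise it exploits
    the best-looking one."""

    def __init__(self, n_arms, eps=0.1, seed=0):
        self.counts = np.zeros(n_arms)
        self.values = np.zeros(n_arms)
        self.eps = eps
        self.rng = np.random.default_rng(seed)

    def select(self):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.counts)))
        return int(self.values.argmax())

    def update(self, arm, reward):
        # Incremental running mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In an active-learning loop, `select()` picks the strategy that chooses the next node to label, and `update()` feeds back the observed accuracy gain.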
Tensor decomposition techniques such as CANDECOMP/PARAFAC (CP) decomposition have achieved great success across a range of scientific fields. They have traditionally been applied to dense, static data. However, today's datasets are often highly sparse and change dynamically over time. Traditional decomposition methods such as Alternating Least Squares (ALS) cannot easily be applied to sparse tensors due to poor efficiency. Furthermore, existing online tensor decomposition methods mostly target dense tensors and thus also encounter significant scalability issues for sparse data. To address this gap, we propose a new incremental algorithm for tracking the CP decompositions of online sparse tensors on the fly. Experiments on nine real-world datasets show that our algorithm produces decompositions whose quality is comparable to that of the most accurate algorithm, ALS, while running up to 250 times faster and using up to 100 times less memory.
Shuo Zhou, S. Erfani, J. Bailey. "Online CP Decomposition for Sparse Tensors." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00202
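For reference, the ALS baseline that the paper compares against can be sketched for a dense 3-way tensor: each factor matrix is updated in turn by solving a linear least-squares problem against the other two. This is the batch, dense baseline only; the paper's contribution is an incremental algorithm for sparse, streaming tensors.

```python
import numpy as np

def cp_als(X, rank, n_iter=100, seed=0):
    """Batch CP decomposition of a dense 3-way tensor X (I x J x K)
    via Alternating Least Squares. Returns factor matrices A, B, C
    such that X is approximately sum_r a_r (outer) b_r (outer) c_r."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.normal(size=(I, rank))
    B = rng.normal(size=(J, rank))
    C = rng.normal(size=(K, rank))
    for _ in range(n_iter):
        # Each update: (unfolding x Khatri-Rao product) times the
        # pseudoinverse of the Gram-matrix Hadamard product.
        A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

The `einsum` calls compute the matricized-tensor-times-Khatri-Rao products without forming unfoldings explicitly; for sparse tensors, exactly these products become the efficiency bottleneck the paper targets.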
Nowadays, all aspects of a production process are continuously monitored and visualized in a dashboard. Equipment is monitored using a variety of sensors, natural resource usage is tracked, and interventions are recorded. In this context, a common task is to identify anomalous behavior in the time series data generated by sensors. As manually analyzing such data is laborious and expensive, automated approaches have the potential to be much more efficient as well as cost effective. While anomaly detection could be posed as a supervised learning problem, this is typically not possible, as few or no labeled examples of anomalous behavior are available and it is often infeasible or undesirable to collect them. Therefore, unsupervised approaches are commonly employed, which typically identify anomalies as deviations from normal (i.e., common or frequent) behavior. However, in many real-world settings, several types of normal behavior exist that occur less frequently than some anomalous behaviors. In this paper, we propose a novel constrained-clustering-based approach for anomaly detection that works in both an unsupervised and a semi-supervised setting. Starting from an unlabeled data set, the approach gradually incorporates expert-provided feedback to improve its performance. We evaluated our approach on real-world water monitoring time series data from supermarkets in collaboration with Colruyt Group, one of Belgium's largest retail companies. Empirically, we found that our approach outperforms the current detection system as well as several other baselines. Our system is currently deployed and used by the company to analyze water usage for 20 stores on a daily basis.
Vincent Vercruyssen, Wannes Meert, Gust Verbruggen, Koen Maes, Ruben Baumer, Jesse Davis. "Semi-Supervised Anomaly Detection with an Application to Water Analytics." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00068
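The clustering-based detection idea above can be sketched in its plain unsupervised form: cluster the data, then score each point by its distance to the nearest cluster centroid. The k-means variant and farthest-point initialization here are illustrative choices; the paper's method additionally incorporates expert feedback as clustering constraints, which this sketch omits.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    # Farthest-point init: start from X[0], repeatedly add the point
    # farthest from the current centroid set.
    C = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in C], axis=0)
        C.append(X[d.argmax()])
    C = np.array(C)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - C[None], axis=-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                C[j] = X[labels == j].mean(axis=0)
    return C

def anomaly_scores(X, centroids):
    """Score each point by its distance to the nearest centroid;
    points far from every cluster of normal behavior score highest."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X[:, None] - centroids[None], axis=-1).min(axis=1)
```

Expert feedback of the kind the paper uses would enter as must-link / cannot-link constraints on the cluster assignments, reshaping which behaviors count as normal.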