Pub Date: 2022-11-01, DOI: 10.1109/ICDMW58026.2022.00050
Georgios Mavroudeas, M. Magdon-Ismail, Xiao Shou, Kristin P. Bennett
We give a method for time series state prediction with a lazy teacher who only partially labels states, in particular only those states of an extreme nature. Hence, the labeling is not only lazy, but biased. Our method has two stages: (i) Impute new state labels for unlabeled states using a relabeling Hidden Markov Model, and in so doing treat the labeling bias. (ii) Use a supervised framework with the relabeled data. Our method is general, agnostic to the application and the supervised framework being used. We show compelling results on synthetic data and in two real applications: epilepsy and complex care management. Our HMM-relabeling approach allows us to tackle time series with extremely sparse labels.
Title: HMM-Boost: Improved Time Series State Prediction Via Supervised Hidden Markov Models: Case Studies in Epileptic Seizure and Complex Care Management. Published in: 2022 IEEE International Conference on Data Mining Workshops (ICDMW).
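The relabeling stage described in this abstract can be illustrated with a small sketch. It is not the authors' relabeling HMM and does not treat labeling bias; it only shows one plausible ingredient, Viterbi decoding in which the few teacher-provided labels are clamped and the remaining states are imputed. The two-state model, its probabilities, and the function name `constrained_viterbi` are all hypothetical.

```python
import math


def constrained_viterbi(obs, labels, start, trans, emit):
    """Most likely state path of a discrete HMM with known labels clamped.

    obs    : observation indices, one per time step
    labels : state index where the teacher labeled the step, else None
    start  : start[s]    initial probability of state s
    trans  : trans[r][s] probability of moving from state r to state s
    emit   : emit[s][o]  probability that state s emits observation o
    """
    n_states = len(start)
    neg_inf = float("-inf")

    def log(p):
        return math.log(p) if p > 0 else neg_inf

    def allowed(t, s):
        # A labeled step admits only the labeled state.
        return labels[t] is None or labels[t] == s

    score = [
        log(start[s]) + log(emit[s][obs[0]]) if allowed(0, s) else neg_inf
        for s in range(n_states)
    ]
    back = []  # back[t-1][s] = best predecessor of state s at step t
    for t in range(1, len(obs)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log(trans[r][s]))
            value = (score[best_prev] + log(trans[best_prev][s])
                     + log(emit[s][obs[t]]))
            new_score.append(value if allowed(t, s) else neg_inf)
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy model: state 0 = ordinary, state 1 = extreme;
# observation 0 = low reading, observation 1 = high reading.
start = [0.8, 0.2]
trans = [[0.9, 0.1], [0.3, 0.7]]
emit = [[0.8, 0.2], [0.1, 0.9]]
obs = [0, 1, 1, 1, 0]
sparse_labels = [None, None, 1, None, None]  # only one step is labeled

imputed = constrained_viterbi(obs, sparse_labels, start, trans, emit)
all_ordinary = constrained_viterbi(obs, [0] * 5, start, trans, emit)
```

The imputed path can then be handed to any supervised learner as a fully labeled sequence, which is the role stage (ii) plays in the abstract.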
Pub Date: 2022-11-01, DOI: 10.1109/ICDMW58026.2022.00012
Guojing Cong, Seung-Hwan Lim, Steven Young
Graph convolution incorporates topological information of a graph into learning. Message passing corresponds to traversal of a local neighborhood in classical graph algorithms. We show that incorporating additional global structures, such as shortest paths, through distance-preserving embedding can improve performance. Our approach, Gavotte, significantly improves the performance of a range of popular graph neural networks such as GCN, GAT, GraphSAGE, and GCNII for transductive learning. Gavotte also improves the performance of graph neural networks for fully supervised tasks, albeit to a smaller degree. As Gavotte generates high-quality embeddings as a by-product, we apply clustering algorithms to these embeddings to augment the training set and introduce Gavotte+. Our results with Gavotte+ on datasets with very few labels demonstrate the advantage of augmenting graph convolution with distance-preserving embedding.
Title: Augmenting Graph Convolution with Distance Preserving Embedding for Improved Learning. Published in: 2022 IEEE International Conference on Data Mining Workshops (ICDMW).
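To make the idea of exposing global shortest-path structure to a node-level model more concrete, here is a minimal sketch. It does not reproduce Gavotte's embedding, which the abstract does not specify; it swaps in a simple landmark scheme, where each node is described by its hop distance to a few chosen anchor nodes. The function names and the choice of landmarks are assumptions for illustration only.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def landmark_features(adj, landmarks):
    """Map each node to its list of hop distances to the landmark nodes.

    Unreachable landmarks are reported as infinity.
    """
    per_landmark = [bfs_distances(adj, lm) for lm in landmarks]
    return {
        node: [d.get(node, float("inf")) for d in per_landmark]
        for node in adj
    }


# Toy path graph 0 - 1 - 2 - 3, with the two end nodes as landmarks.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = landmark_features(adj, landmarks=[0, 3])
```

By the triangle inequality, the largest coordinate-wise difference between two nodes' feature vectors never exceeds their true hop distance, so these coordinates carry global distance information that purely local message passing does not see directly.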
Pub Date: 2022-11-01, DOI: 10.1109/ICDMW58026.2022.00037
Pavel Shumkovskii, A. Kovantsev, Elizaveta Stavinova, P. Chunaev
Motivated by the problem of finding the optimal Performance vs. Complexity trade-off in time series forecasting, we propose a model-agnostic method, MetaSieve, that performs data dichotomy (that is, it sieves the data instances in a meta-learning manner) according to a chosen quality level while iterating over the model's complexity. The method is inspired by classical iterative numerical optimization methods but is applied to sets of time series. As a result, the method is significantly less time consuming than a traditional brute-force meta-learning algorithm. The experiments further show that the quality of MetaSieve's results is comparable to that of the brute-force algorithm, so one obtains a noticeable reduction in time consumption in exchange for a slight decrease in forecasting quality. Additionally, we experimentally show good performance of a MetaSieve-based classifier that provides the Performance vs. Complexity classes a priori, i.e., before the actual forecasting, on synthetic and real-world time series data.
Title: MetaSieve: Performance vs. Complexity Sieve for Time Series Forecasting. Published in: 2022 IEEE International Conference on Data Mining Workshops (ICDMW).
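The sieving idea in this abstract (split the series by a quality level while stepping through model complexity) can be sketched as follows. This is an illustration under stated assumptions, not the MetaSieve algorithm: the linear sweep from simplest to most complex model, the function name `sieve`, and the toy error function are all hypothetical, and each call to `error_fn` stands in for fitting and scoring a real forecasting model.

```python
def sieve(series_ids, complexities, error_fn, max_error):
    """Assign each series the first complexity level that meets the quality bar.

    complexities : assumed ordered from simplest to most complex
    error_fn     : error_fn(series_id, complexity) -> forecasting error
    max_error    : chosen quality level; errors at or below it pass the sieve
    Returns (assigned, remaining): a dict series -> complexity, and the list
    of series that no complexity level could forecast well enough.
    """
    assigned = {}
    remaining = list(series_ids)
    for c in complexities:
        # Dichotomy step: series that pass at this level leave the sieve.
        passed = [s for s in remaining if error_fn(s, c) <= max_error]
        for s in passed:
            assigned[s] = c
        remaining = [s for s in remaining if s not in assigned]
        if not remaining:
            break
    return assigned, remaining


# Toy stand-in for model fitting: error shrinks as complexity grows.
difficulty = {"a": 1.0, "b": 2.5, "c": 9.0}


def toy_error(series_id, complexity):
    return difficulty[series_id] / complexity


assigned, remaining = sieve(["a", "b", "c"], [1, 2, 4], toy_error, max_error=1.5)
```

Because series that pass at a low complexity are never refit at higher ones, the number of model fits drops compared with evaluating every series at every complexity, which is the time saving the abstract refers to.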
Pub Date: 2022-11-01, DOI: 10.1109/ICDMW58026.2022.00084
Agrim Sachdeva, Ben Lazarine, Ruchik Dama, S. Samtani, Hongyi Zhu
The rapid pace of development of artificial intelligence (AI) solutions is enabled by foundational tools and frameworks that allow AI developers to focus on application logic and rapid prototyping. However, security vulnerabilities present in foundational repositories might cause irreparable damage, because the AI solutions built on these libraries are deployed in production environments. Our research leverages source code hosted on the prevailing social coding platform GitHub to identify vulnerabilities in foundational repositories commonly used for modern AI development (Linux, BERT, PyTorch, and Transformers), as well as in the AI repositories that use foundational repositories as dependencies. Using an unsupervised graph embedding approach, we generate graph embeddings that capture vulnerability information and the relationships between repositories. Based on these embeddings, we perform clustering as our downstream task to group similarly vulnerable repositories. Our research identifies patterns and similarities between repositories and will help develop effective mitigation of vulnerabilities present in groups of repositories based on foundational AI repositories. We also discuss the implications of identifying such clusters of vulnerable repositories.
Title: Identifying Patterns of Vulnerability Incidence in Foundational Machine Learning Repositories on GitHub: An Unsupervised Graph Embedding Approach. Published in: 2022 IEEE International Conference on Data Mining Workshops (ICDMW).
Pub Date: 2022-11-01, DOI: 10.1109/ICDMW58026.2022.00039
Derek Xu, William Shiao, Jia Chen, E. Papalexakis
The singular value decomposition (SVD) factors a matrix into three separate matrices: two (semi-)unitary matrices whose columns are left/right singular vectors and one diagonal matrix whose diagonal entries are singular values. Typically, performing SVD on big matrices is taxing because its computational complexity is cubic in the matrix dimensions. With the advances and rapid growth of deep learning techniques in a broad spectrum of applications, a fundamental question arises: can deep neural networks learn the singular values of a matrix? To answer this question, we propose a novel algorithm, SV-Learn, that predicts the singular values of a given input matrix by leveraging the advances of neural networks. Numerical results demonstrate that our proposed method outperforms the competing alternatives by achieving lower normalized mean square error on singular value prediction when using real-world datasets. Further, the predicted singular values, combined with the singular vectors of the input data, allow us to reconstruct the input matrices with promising performance.
Title: SV-Learn: Learning Matrix Singular Values with Neural Networks. Published in: 2022 IEEE International Conference on Data Mining Workshops (ICDMW).
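The quantities named in this abstract can be made concrete with a short NumPy sketch: the three SVD factors, reconstruction from singular values and vectors, and a normalized mean square error. No neural network is involved here; the "predicted" singular values are simply the true ones scaled by 5%, a hypothetical stand-in for a model's output. The NMSE normalization used below (squared error divided by the squared norm of the target) is an assumption, since the abstract does not define it.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Thin SVD: U is 6x4, s holds the 4 singular values, Vt is 4x4.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Exact reconstruction from the three factors.
A_rec = U @ np.diag(s) @ Vt


def nmse(pred, true):
    """Squared error normalized by the squared norm of the target (assumed definition)."""
    return float(np.sum((pred - true) ** 2) / np.sum(true ** 2))


# Hypothetical predicted singular values: each one 5% too large.
s_pred = 1.05 * s

# Reconstruct with predicted values but the true singular vectors;
# (U * s_pred) scales column j of U by s_pred[j].
A_pred = (U * s_pred) @ Vt

value_error = nmse(s_pred, s)
matrix_error = nmse(A_pred, A)
```

With the true singular vectors held fixed, the reconstruction error equals the singular value error under this normalization, because the squared Frobenius norm of a matrix is the sum of its squared singular values. Both errors here are 0.05 squared, that is 0.0025.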
Pub Date: 2022-11-01, DOI: 10.1109/ICDMW58026.2022.00015
Radwa El Shawi, S. Sakr
Machine learning algorithms have been widely employed in various applications and fields. Novel technologies in automated machine learning (AutoML) ease the complexity of algorithm selection and hyperparameter optimization. AutoML frameworks have achieved notable success in hyperparameter tuning and surpassed the performance of human experts. However, relying on such frameworks as black boxes can leave machine learning practitioners without insight into the inner workings of the AutoML process and hence influence their trust in the models produced. In addition, excluding humans from the loop creates several limitations. For example, most current AutoML frameworks ignore user preferences for defining or controlling the search space, which can impact the performance of the models produced and the acceptance of these models by end users. Research on the transparency and controllability of AutoML has attracted much interest lately, both in academia and industry. However, existing tools are usually restricted to supervised learning tasks such as classification and regression, while unsupervised learning, particularly clustering, remains a largely unexplored problem. Motivated by these shortcomings, we design and implement cSmartML-GlassBox, an interactive visualization tool that enables users to refine the search space of AutoML and analyze the results. cSmartML-GlassBox is equipped with a recommendation engine that recommends a time budget likely to be adequate for obtaining a well-performing pipeline on a new dataset. In addition, the tool supports multi-granularity visualization to enable machine learning practitioners to monitor the AutoML process, analyze the explored configurations, and refine or control the search space. Furthermore, cSmartML-GlassBox is equipped with a logging mechanism so that repeated runs on the same dataset can be more effective by avoiding re-evaluation of previously considered configurations.
We demonstrate the effectiveness and usability of cSmartML-GlassBox through a user evaluation study with 23 participants and an expert-based usability study with four experts. We find that the proposed tool increases users' understanding of and trust in AutoML frameworks.
Title: cSmartML-Glassbox: Increasing Transparency and Controllability in Automated Clustering. Published in: 2022 IEEE International Conference on Data Mining Workshops (ICDMW).
Pub Date: 2022-11-01, DOI: 10.1109/ICDMW58026.2022.00135
Venkata M. V. Gunturi, Rakesh Rajeev, Vipul Bondre, Aaditya Barnwal, Samir Jain, Ashank Anshuman, Manish Gupta
Given a spatio-temporal event framework E and a collection of time-stamped events A (over E), the goal of the periodic spatio-temporal hotspot detection (PST-Hotspot) problem is to determine spatial regions that show a high “intensity” of events at certain periodic intervals. The output of the PST-Hotspot detection problem consists of the following: (a) a collection of spatial regions (which show a high intensity of events) and (b) their respective time intervals of high activity and periodicity values (e.g., daily, weekday-only, etc.). PST-Hotspot detection poses a significant challenge in designing a suitable interest measure. The aim is to design a mathematical representation of a PST-Hotspot that can differentiate interesting periodic patterns from trivial persistent patterns in the dataset. The current state of the art in spatial and spatio-temporal hotspot detection focuses on non-periodic patterns. In contrast, our proposed approach is able to determine periodic hotspots. We experimentally evaluated our proposed algorithm using a real Azure traffic dataset from the Indian region.
Title: A Case Study on Periodic Spatio-Temporal Hotspot Detection in Azure Traffic Data. Published in: 2022 IEEE International Conference on Data Mining Workshops (ICDMW).
Pub Date: 2022-11-01, DOI: 10.1109/ICDMW58026.2022.00017
Guixiang Wang, Hongwei Yin, Wenjun Hu, Y. Liu, Ruiqin Wang
In recent years, a number of multi-view clustering methods have been proposed that follow a global fusion paradigm. These methods take the entire sample space as the fusion object, where the global complementarity between views is explored and exploited to improve clustering performance. However, local structures with strong or weak clustering capacity can coexist in each view. The traditional global fusion paradigm ignores the differences in clustering capacity of local structures, which makes it impossible to explore and exploit local complementarity between views. In this paper, a novel deep multi-view subspace clustering method based on local fusion is proposed to solve this problem. First, a low-rank self-expression layer is inserted into the deep autoencoder to eliminate the influence of noise when obtaining the local cluster structure. Then, the fusion object is refined from the entire sample space to the local cluster structure, where a self-weighted strategy is designed to assign contribution weights according to the clustering capacity of the local cluster structure. Meanwhile, we add an orthogonal constraint to make the local cluster structure more discriminative and thus more suitable for the downstream clustering task. Experiments on several real-world datasets show that the proposed method achieves better clustering performance than most traditional multi-view clustering methods based on global fusion.
Title: Joint Low-rank and Orthogonal Deep Multi-view Subspace Clustering based on Local Fusion. Published in: 2022 IEEE International Conference on Data Mining Workshops (ICDMW).
Pub Date: 2022-11-01, DOI: 10.1109/ICDMW58026.2022.00115
Mustafa Shuqair, J. Jimenez-shahed, B. Ghoraani
System monitoring has become an area of interest with the increasing growth in wearable sensors and continuous monitoring tools. However, the generalizability of classification models to unseen incoming data remains challenging. This paper proposes a novel architecture based on reinforcement learning (RL) to incrementally learn patterns of time-series data and detect changes in the system state. Our rationale is that RL's ability to learn from past experiences can help increase the performance and generalizability of classification models in time-series monitoring applications. Our novel definition of the environment consists of a set of one-class anomaly detectors, which define environment states based on the dynamics of the incoming data, and a reward function, which rewards the RL agent according to its actions. A deep RL agent incrementally learns to perform continuous, binary classification predictions according to the environment states and the received reward. We applied the proposed model to detecting response to medication (ON or OFF) in patients with Parkinson's disease (PD). The PD dataset consisted of 170 minutes of time-series movement signals collected from 12 patients using two wearable sensors. Our proposed model, with a testing accuracy of 77.95%, outperformed Adaptive Boosting, Multi-layer Perceptron, and Support Vector Machines, which reached 53.10%, 44.92%, and 52.70% testing accuracy, respectively. The F-score of the proposed model declined from 88.15% in validation to 78.42% in testing, a considerably smaller decline than for the other three models. These results demonstrate the potential of the proposed RL-based classifier in time-series monitoring applications as a highly generalizable model for unseen incoming data.
{"title":"Incremental Learning in Time-series Data using Reinforcement Learning","authors":"Mustafa Shuqair, J. Jimenez-shahed, B. Ghoraani","doi":"10.1109/ICDMW58026.2022.00115","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00115","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"95 1","pages":"0"},"publicationDate":"2022-11-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"127319013"}
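The environment in the abstract above (one-class detectors that define states, plus a reward for each binary prediction) can be illustrated with a toy sketch. This is not the authors' architecture: a tabular, one-step value update stands in for the deep RL agent, and `ZScoreDetector`, the synthetic ON/OFF values, the ±1 reward, and the `epsilon`/`alpha` settings are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev


class ZScoreDetector:
    """Hypothetical one-class detector: inlier if close to the training mean."""

    def __init__(self, samples, threshold=2.0):
        self.mu = mean(samples)
        self.sigma = stdev(samples)
        self.threshold = threshold

    def is_inlier(self, x):
        return abs(x - self.mu) / self.sigma <= self.threshold


def make_state(x, detectors):
    # Environment state: one inlier/outlier flag per one-class detector.
    return tuple(d.is_inlier(x) for d in detectors)


def reward(action, label):
    # Assumed reward shape: +1 for a correct prediction, -1 otherwise.
    return 1.0 if action == label else -1.0


def train_agent(stream, detectors, epsilon=0.1, alpha=0.2, seed=0):
    """Learn action values incrementally, one sample at a time."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> running estimate of the reward
    for x, label in stream:
        s = make_state(x, detectors)
        if rng.random() < epsilon:
            a = rng.choice((0, 1))  # explore
        else:
            a = max((0, 1), key=lambda act: q.get((s, act), 0.0))  # exploit
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (reward(a, label) - old)
    return q


def predict(q, x, detectors):
    s = make_state(x, detectors)
    return max((0, 1), key=lambda act: q.get((s, act), 0.0))


# Synthetic, well-separated signal values (assumption): 0 = OFF, 1 = ON.
off_values = [-1.0 + 0.1 * i for i in range(21)]
on_values = [4.0 + 0.1 * i for i in range(21)]
detectors = [ZScoreDetector(off_values), ZScoreDetector(on_values)]

stream = [(x, 0) for x in off_values] + [(x, 1) for x in on_values]
random.Random(1).shuffle(stream)

q = train_agent(stream, detectors)
print(predict(q, 0.0, detectors), predict(q, 5.0, detectors))
```

Because each sample is consumed once and the value table is updated in place, the agent keeps learning as new data arrives, which is the incremental aspect the abstract emphasizes. A real system would replace the z-score rule with proper one-class models and the table with a neural network.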
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00148
Bharat Sharma, J. Kumar, A. Ganguly, F. Hoffman
Rising atmospheric carbon dioxide due to human activities, through fossil fuel emissions and land use changes, has increased climate extremes such as heat waves and droughts, which have led to and are expected to increase the occurrence of carbon cycle extremes. Carbon cycle extremes represent large anomalies in the carbon cycle that are associated with gains or losses in carbon uptake. Carbon cycle extremes could be continuous in space and time and cross political boundaries. Here, we present a methodology to identify large spatiotemporal extremes (STEs) in the terrestrial carbon cycle using image processing tools for feature detection. We characterized the STE events based on neighborhood structures, which are three-dimensional adjacency matrices for the detection of spatiotemporal manifolds of carbon cycle extremes. We found that the area affected and carbon loss during negative carbon cycle extremes were consistent with continuous neighborhood structures. In the gross primary production data we used, 100 carbon cycle STEs accounted for more than 75% of all the negative carbon cycle extremes. This paper presents a comparative analysis of the magnitude of carbon cycle STEs and the attribution of those STEs to climate drivers as a function of neighborhood structures for two observational datasets and an Earth system model simulation.
{"title":"Using Image Processing Techniques to Identify and Quantify Spatiotemporal Carbon Cycle Extremes","authors":"Bharat Sharma, J. Kumar, A. Ganguly, F. Hoffman","doi":"10.1109/ICDMW58026.2022.00148","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00148","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"7 1","pages":"0"},"publicationDate":"2022-11-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"126411172"}
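The role of a three-dimensional neighborhood structure in grouping extreme grid cells into spatiotemporal events can be sketched as connected-component labeling over a (time, latitude, longitude) mask. The sketch below is only an illustration under stated assumptions: the face/edge/corner connectivities (6, 18, or 26 neighbors), the fixed threshold of -1.0, and the tiny anomaly grid are hypothetical and not taken from the paper.

```python
from collections import deque
from itertools import product


def neighborhood(connectivity):
    """Offsets of a 3x3x3 neighborhood structure.

    connectivity=1 keeps the 6 face neighbors, 2 adds edge neighbors (18),
    and 3 adds corner neighbors (26).
    """
    return [
        off for off in product((-1, 0, 1), repeat=3)
        if 0 < sum(abs(c) for c in off) <= connectivity
    ]


def label_events(mask, connectivity):
    """Group True cells of mask[t][y][x] into connected events (lists of cells)."""
    nt, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    offsets = neighborhood(connectivity)
    seen = set()
    events = []
    for start in product(range(nt), range(ny), range(nx)):
        t, y, x = start
        if not mask[t][y][x] or start in seen:
            continue
        # Breadth-first search collects every extreme cell reachable from start.
        seen.add(start)
        queue = deque([start])
        cells = []
        while queue:
            ct, cy, cx = queue.popleft()
            cells.append((ct, cy, cx))
            for dt, dy, dx in offsets:
                n = (ct + dt, cy + dy, cx + dx)
                inside = 0 <= n[0] < nt and 0 <= n[1] < ny and 0 <= n[2] < nx
                if inside and mask[n[0]][n[1]][n[2]] and n not in seen:
                    seen.add(n)
                    queue.append(n)
        events.append(cells)
    return events


# Hypothetical anomalies indexed as anomaly[t][y][x]; negative means lost uptake.
anomaly = [
    [[-2.0, -1.5, 0.1], [0.2, 0.0, 0.3], [0.1, -0.2, 0.0]],
    [[0.0, 0.1, 0.2], [0.3, 0.1, -3.0], [0.0, 0.2, 0.1]],
]
threshold = -1.0  # assumed cutoff for a negative extreme
mask = [[[v < threshold for v in row] for row in grid] for grid in anomaly]

for connectivity in (1, 2, 3):
    events = label_events(mask, connectivity)
    losses = [sum(anomaly[t][y][x] for t, y, x in cells) for cells in events]
    print(connectivity, len(events), losses)
```

In this toy grid the two adjacent cells at the first time step and the single cell at the second touch only at a corner, so they merge into one event only under the 26-neighbor structure. This shows how the choice of neighborhood structure changes both the number of events and the carbon loss attributed to each.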