2022 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Papers

HMM-Boost: Improved Time Series State Prediction Via Supervised Hidden Markov Models: Case Studies in Epileptic Seizure and Complex Care Management

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00050

Georgios Mavroudeas, M. Magdon-Ismail, Xiao Shou, Kristin P. Bennett

We give a method for time series state prediction with a lazy teacher who labels states only partially, and in particular only those states of an extreme nature. Hence, the labeling is not only lazy but also biased. Our method has two stages: (i) impute new state labels for unlabeled states using a relabeling Hidden Markov Model, addressing the labeling bias in the process; (ii) apply a supervised framework to the relabeled data. Our method is general: it is agnostic to the application and to the supervised framework used. We show compelling results on synthetic data and in two real applications: epilepsy and complex care management. Our HMM-relabeling approach allows us to tackle time series with extremely sparse labels.
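
As a rough illustration of stage (i), the sketch below imputes states for unlabeled time steps using Viterbi decoding. The steps the teacher did label are clamped to their given state. It assumes a discrete-observation HMM whose parameters are already known. The clamping rule, the two-state example, and the function name are hypothetical. This is not the HMM-Boost relabeling model, which also treats the labeling bias.

```python
import math

def constrained_viterbi(obs, labels, start_p, trans_p, emit_p):
    """Most likely hidden-state path, with teacher-labeled steps forced to their label.

    Illustrative sketch (hypothetical helper), not the HMM-Boost relabeling model.
    obs:     list of observation symbols (ints)
    labels:  same length as obs; a state index where the teacher labeled, else None
    start_p: start_p[s], initial state probabilities
    trans_p: trans_p[r][s], probability of moving from state r to state s
    emit_p:  emit_p[s][o], probability of state s emitting symbol o
    """
    n_states = len(start_p)
    neg_inf = float("-inf")

    def log(p):
        return math.log(p) if p > 0 else neg_inf

    def allowed(t, s):
        # Assumption: a labeled step may only occupy the labeled state.
        return labels[t] is None or labels[t] == s

    # score[s]: best log-probability of any path ending in state s at the current step
    score = [log(start_p[s]) + log(emit_p[s][obs[0]]) if allowed(0, s) else neg_inf
             for s in range(n_states)]
    backptrs = []
    for t in range(1, len(obs)):
        new_score, ptr = [], []
        for s in range(n_states):
            if not allowed(t, s):
                new_score.append(neg_inf)
                ptr.append(0)
                continue
            best_prev = max(range(n_states), key=lambda r: score[r] + log(trans_p[r][s]))
            new_score.append(score[best_prev] + log(trans_p[best_prev][s]) + log(emit_p[s][obs[t]]))
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    # Trace the best path backwards from the most likely final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Take a two-state model (0 = normal, 1 = extreme). Passing `labels=[None, 1, None]` forces step 1 into the extreme state, and the decoder fills in the other two steps. The decoded path could then serve as training labels for any supervised model, as in stage (ii).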

Citations: 0

Augmenting Graph Convolution with Distance Preserving Embedding for Improved Learning

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00012

Guojing Cong, Seung-Hwan Lim, Steven Young

Graph convolution incorporates the topological information of a graph into learning. Message passing corresponds to the traversal of a local neighborhood in classical graph algorithms. We show that incorporating additional global structures, such as shortest paths, through distance preserving embedding can improve performance. Our approach, Gavotte, significantly improves the performance of a range of popular graph neural networks, such as GCN, GAT, GraphSAGE, and GCNII, for transductive learning. Gavotte also improves the performance of graph neural networks on fully supervised tasks, albeit to a smaller degree. Because Gavotte generates high-quality embeddings as a by-product, we apply clustering algorithms to these embeddings to augment the training set, and introduce Gavotte+. Our results for Gavotte+ on datasets with very few labels demonstrate the advantage of augmenting graph convolution with distance preserving embedding.
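
The sketch below illustrates what a "distance preserving" embedding of shortest-path structure means. It computes hop distances by breadth-first search, then fits coordinates by plain gradient descent on the metric-MDS stress, the squared gap between embedded and graph distances. The stress objective, the step size, and all function names are assumptions for illustration. The abstract does not say which embedding method Gavotte uses.

```python
import math
import random
from collections import deque

def hop_distances(adj, source):
    """Breadth-first shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def stress(coords, dist):
    """Sum over node pairs of (embedded distance - graph distance)^2."""
    n = len(coords)
    return sum((math.dist(coords[i], coords[j]) - dist[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def distance_preserving_embedding(adj, dim=2, steps=500, lr=0.01, seed=0):
    """Gradient descent on stress (an assumed objective, not necessarily Gavotte's).

    Assumes nodes are labeled 0..n-1 and the graph is connected.
    """
    n = len(adj)
    dist = [[hop_distances(adj, i)[j] for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                diff = [a - b for a, b in zip(coords[i], coords[j])]
                d = math.hypot(*diff) or 1e-9  # avoid division by zero
                coef = 2 * (d - dist[i][j]) / d  # derivative of the (i, j) stress term
                for k in range(dim):
                    grad[i][k] += coef * diff[k]
        for i in range(n):
            for k in range(dim):
                coords[i][k] -= lr * grad[i][k]
    return coords, dist
```

On a 6-node cycle, the hop distances from node 0 are 0, 1, 2, 3, 2, 1, and the fitted coordinates have lower stress than the random starting layout. Such coordinates could be appended to the node features fed to a GNN, but the abstract does not state whether Gavotte combines them that way.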

Citations: 0

MetaSieve: Performance vs. Complexity Sieve for Time Series Forecasting

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00037

Pavel Shumkovskii, A. Kovantsev, Elizaveta Stavinova, P. Chunaev

Motivated by the problem of finding the optimal Performance vs. Complexity trade-off when forecasting time series data, we propose MetaSieve, a model-agnostic method that dichotomizes the data (i.e., sieves the data instances in a meta-learning manner) according to a chosen quality level while iterating over the model's complexity. The method is inspired by classical iterative numerical optimization methods but is applied to sets of time series. As a result, it is significantly less time-consuming than a traditional brute-force meta-learning algorithm. The experiments further show that MetaSieve's quality results are quite comparable to those of the brute-force approach, so a noticeable reduction in time consumption is obtained in exchange for a slight decrease in forecasting quality. Additionally, we experimentally show that a MetaSieve-based classifier performs well at assigning Performance vs. Complexity classes a priori, i.e., before the actual forecasting, on synthetic and real-world time series data.
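
The sieve described above can be sketched as a loop over increasing model complexity. At each level, series whose forecast error already meets the chosen quality threshold are assigned that level and removed. Only the remainder moves on to the next, more expensive level. This is a hedged sketch: `error_fn`, the threshold semantics, and the `None` fallback for series that never qualify are assumptions, not the authors' implementation.

```python
def meta_sieve(series, complexity_levels, error_fn, threshold):
    """Assign each series the lowest complexity level whose error meets the threshold.

    Hypothetical sketch of a complexity sieve; not the MetaSieve reference code.
    series:            dict mapping a series id to its data
    complexity_levels: complexity settings ordered from cheapest to most expensive
    error_fn:          error_fn(data, level) -> forecast error (lower is better)
    threshold:         maximum acceptable error (the chosen quality level)
    Returns (assignment, evaluations); assignment maps id -> level, or None if no level qualified.
    """
    remaining = dict(series)
    assignment = {}
    evaluations = 0
    for level in complexity_levels:
        passed = []
        for sid, data in remaining.items():
            evaluations += 1
            if error_fn(data, level) <= threshold:
                passed.append(sid)
        # Sieve out the series that are already good enough at this complexity.
        for sid in passed:
            assignment[sid] = level
            del remaining[sid]
        if not remaining:
            break
    for sid in remaining:
        assignment[sid] = None  # no level met the quality threshold
    return assignment, evaluations
```

Take three series of increasing difficulty, levels `[1, 2, 4]`, and the toy error `difficulty / level`. The sieve then needs 7 error evaluations instead of the 9 that a brute-force sweep over every (series, level) pair would use. That is the kind of saving the abstract describes.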

Citations: 0

Identifying Patterns of Vulnerability Incidence in Foundational Machine Learning Repositories on GitHub: An Unsupervised Graph Embedding Approach

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00084

Agrim Sachdeva, Ben Lazarine, Ruchik Dama, S. Samtani, Hongyi Zhu

The rapid pace of artificial intelligence (AI) development is enabled by foundational tools and frameworks that allow AI developers to focus on application logic and rapid prototyping. However, security vulnerabilities in foundational repositories might cause irreparable damage, because AI solutions built using these libraries are deployed in production environments. Our research leverages source code hosted on the prevailing social coding platform GitHub to identify vulnerabilities in foundational repositories commonly used for modern AI development (Linux, BERT, PyTorch, and Transformers), as well as in the AI repositories that use these foundational repositories as dependencies. Using an unsupervised graph embedding approach, we generate graph embeddings that capture vulnerability information and the relationships between repositories. Based on these embeddings, we perform clustering as the downstream task to group similarly vulnerable repositories. Our research identifies patterns and similarities between repositories and will help develop effective mitigations for vulnerabilities present in groups of repositories built on foundational AI repositories. We also discuss the implications of identifying such clusters of vulnerable repositories.
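
The downstream step clusters repository embeddings so that similarly vulnerable repositories land in the same group. It can be sketched with plain k-means (Lloyd's algorithm). Several choices here are assumptions: the abstract does not name the clustering algorithm, the deterministic farthest-first seeding is ours, and the embedding vectors are placeholders. The graph-embedding stage itself is omitted.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption): start at the first point, then add the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, iters=100):
    """Lloyd's k-means over embedding vectors; returns a cluster index per point."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels
```

Each resulting cluster groups repositories whose embedding vectors are close together. This is the grouping the abstract uses to look for shared vulnerability patterns.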

Citations: 0

SV-Learn: Learning Matrix Singular Values with Neural Networks

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00039

Derek Xu, William Shiao, Jia Chen, E. Papalexakis

The singular value decomposition (SVD) factors a matrix into three separate matrices: two (semi-)unitary matrices whose columns are the left/right singular vectors, and one diagonal matrix whose diagonal entries are the singular values. Performing SVD on big matrices is typically taxing because its computational complexity is cubic in the matrix dimensions. With the rapid advances of deep learning techniques across a broad spectrum of applications, a fundamental question arises: can deep neural networks learn the singular values of a matrix? To answer this question, we propose a novel algorithm, SV-Learn, that predicts the singular values of a given input matrix by leveraging the advances of neural networks. Numerical results on real-world datasets demonstrate that our proposed method outperforms competing alternatives, achieving a lower normalized mean square error in singular value prediction. Further, combining the predicted singular values with the singular vectors of the input data allows us to reconstruct the input matrices with promising performance.
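
To make the prediction target and the evaluation metric concrete, the sketch below computes exact singular values of a 2×2 matrix in closed form. These are the square roots of the eigenvalues of A^T A. It then scores a candidate prediction with NMSE = ||ŝ - s||² / ||s||². The 2×2 restriction, this particular NMSE normalization, and the function names are assumptions for illustration. SV-Learn's neural predictor is not shown.

```python
import math

def singular_values_2x2(a):
    """Exact singular values of a 2x2 matrix, largest first (illustrative helper)."""
    (a11, a12), (a21, a22) = a
    # Entries of the symmetric matrix B = A^T A.
    p = a11 * a11 + a21 * a21
    q = a11 * a12 + a21 * a22
    r = a12 * a12 + a22 * a22
    mid = (p + r) / 2
    half_gap = math.sqrt(((p - r) / 2) ** 2 + q * q)
    # Eigenvalues of B are mid +/- half_gap; clamp tiny negatives caused by rounding.
    return [math.sqrt(max(mid + half_gap, 0.0)), math.sqrt(max(mid - half_gap, 0.0))]

def nmse(predicted, true):
    """Squared prediction error normalized by the energy of the true singular values (assumed definition)."""
    err = sum((x - y) ** 2 for x, y in zip(predicted, true))
    return err / sum(y * y for y in true)
```

For `[[3, 0], [0, 4]]` the function returns `[4.0, 3.0]`, and a prediction of `[4.0, 0.0]` scores an NMSE of 9/25 = 0.36.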

Citations: 0

cSmartML-Glassbox: Increasing Transparency and Controllability in Automated Clustering

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00015

Radwa El Shawi, S. Sakr

Machine learning algorithms have been widely employed across various applications and fields. Novel technologies in automated machine learning (AutoML) reduce the complexity of algorithm selection and hyperparameter optimization. AutoML frameworks have achieved notable success in hyperparameter tuning and have surpassed the performance of human experts. However, relying on such frameworks as black boxes can leave machine learning practitioners without insight into the inner workings of the AutoML process, and hence affect their trust in the models produced. In addition, excluding humans from the loop creates several limitations. For example, most current AutoML frameworks ignore user preferences for defining or controlling the search space, which can in turn affect the performance of the models produced and their acceptance by end users. Research on the transparency and controllability of AutoML has lately attracted much interest in both academia and industry. However, existing tools are usually restricted to supervised learning tasks such as classification and regression, while unsupervised learning, particularly clustering, remains largely unexplored. Motivated by these shortcomings, we design and implement cSmartML-GlassBox, an interactive visualization tool that enables users to refine the AutoML search space and analyze the results. cSmartML-GlassBox is equipped with a recommendation engine that recommends a time budget likely to be adequate for obtaining a well-performing pipeline on a new dataset. In addition, the tool supports multi-granularity visualization that enables machine learning practitioners to monitor the AutoML process, analyze the explored configurations, and refine/control the search space. Furthermore, cSmartML-GlassBox is equipped with a logging mechanism that makes repeated runs on the same dataset more effective by avoiding re-evaluation of previously considered configurations. We demonstrate the effectiveness and usability of cSmartML-GlassBox through a user evaluation study with 23 participants and an expert-based usability study with four experts. We find that the proposed tool increases users' understanding of, and trust in, AutoML frameworks.
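
The logging mechanism skips configurations already evaluated on the same dataset. It amounts to memoizing evaluations under a key built from the dataset and the configuration. A minimal in-memory sketch follows. The key format, the `evaluate` callback, and the class name are hypothetical, and configurations are assumed to be JSON-serializable. A real tool would also persist the log to disk between runs.

```python
import json

class EvaluationLog:
    """Hypothetical evaluation cache: repeated runs skip configurations already scored."""

    def __init__(self):
        self._scores = {}
        self.evaluations = 0  # number of real (non-cached) evaluations

    @staticmethod
    def _key(dataset_id, config):
        # sort_keys makes the key independent of dict insertion order.
        return dataset_id, json.dumps(config, sort_keys=True)

    def score(self, dataset_id, config, evaluate):
        """Return the cached score, or call evaluate(config) once and remember the result."""
        key = self._key(dataset_id, config)
        if key not in self._scores:
            self._scores[key] = evaluate(config)
            self.evaluations += 1
        return self._scores[key]
```

Because the configuration is serialized with `sort_keys=True`, `{"algo": "kmeans", "k": 3}` and `{"k": 3, "algo": "kmeans"}` hit the same cache entry. The same configuration on a different dataset is evaluated afresh.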

Citations: 0

A Case Study on Periodic Spatio-Temporal Hotspot Detection in Azure Traffic Data

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00135

Venkata M. V. Gunturi, Rakesh Rajeev, Vipul Bondre, Aaditya Barnwal, Samir Jain, Ashank Anshuman, Manish Gupta

Given a spatio-temporal event framework E and a collection of time-stamped events A (over E), the goal of the periodic spatio-temporal hotspot detection (PST-Hotspot) problem is to determine spatial regions that show a high “intensity” of events at certain periodic intervals. The output of the PST-Hotspot detection problem consists of (a) a collection of spatial regions that show a high intensity of events, and (b) their respective time intervals of high activity and periodicity values (e.g., daily, weekday-only, etc.). PST-Hotspot detection poses a significant challenge in designing a suitable interest measure. The aim here is to design a mathematical representation of a PST-Hotspot that can differentiate interesting periodic patterns from trivial persistent patterns in the dataset. The current state of the art in spatial and spatio-temporal hotspot detection focuses on non-periodic patterns. In contrast, our proposed approach is able to determine periodic hotspots. We experimentally evaluated our proposed algorithm using a real Azure traffic dataset from the Indian region.
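
One way to separate a periodic hotspot from a trivially persistent one is to compare a region's mean event count in each hour of the day with its overall hourly mean. A persistent region is busy in every hour, so no single hour stands out, while a periodic region spikes in a few hours. The sketch below flags (region, hour) pairs whose mean count exceeds `ratio` times the region's overall mean. The hour-of-day binning, the ratio test, and all names are assumptions. They are not the interest measure proposed in the paper.

```python
from collections import defaultdict

def periodic_hotspots(events, num_days, ratio=2.0):
    """Flag (region, hour) pairs that are unusually active at the same hour each day.

    Hypothetical daily-periodicity test, not the paper's PST-Hotspot interest measure.
    events:   iterable of (region, day, hour) tuples, with hour in 0..23
    num_days: number of days observed
    ratio:    how many times the region's overall hourly mean an hour must exceed
    """
    counts = defaultdict(lambda: [0] * 24)  # region -> event count per hour of day
    for region, _day, hour in events:
        counts[region][hour] += 1
    hotspots = []
    for region, per_hour in counts.items():
        overall_mean = sum(per_hour) / (24 * num_days)
        for hour, total in enumerate(per_hour):
            hourly_mean = total / num_days
            if hourly_mean > ratio * overall_mean:
                hotspots.append((region, hour))
    return sorted(hotspots)
```

Suppose region A receives 10 events at 09:00 on each of 7 days and region B receives one event every hour. Only `("A", 9)` is flagged. B is busy but flat, which is the trivial persistent pattern the abstract wants to exclude.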

Citations: 0

Joint Low-rank and Orthogonal Deep Multi-view Subspace Clustering based on Local Fusion

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00017

Guixiang Wang, Hongwei Yin, Wenjun Hu, Y. Liu, Ruiqin Wang

In recent years, a number of multi-view clustering methods have been proposed under a global fusion paradigm. These methods take the entire sample space as the fusion object, exploring and exploiting the global complementarity between views to improve clustering performance. However, local structures with strong or weak clustering capacity can coexist in each view. The traditional global fusion paradigm ignores these differences in the clustering capacity of local structures, which makes it impossible to explore and exploit the local complementarity between views. In this paper, a novel deep multi-view subspace clustering method based on local fusion is proposed to solve this problem. First, a low-rank self-expression layer is inserted into the deep autoencoder to eliminate the influence of noise when obtaining the local cluster structure. Then, the fusion object is refined from the entire sample space to the local cluster structure, where a self-weighted strategy is designed to assign contribution weights according to the clustering capacity of each local cluster structure. Meanwhile, we jointly impose an orthogonal constraint to enhance the discriminability of the local cluster structure, making it more suitable for the downstream clustering task. Experiments on several real-world datasets show that the proposed method achieves better clustering performance than most traditional multi-view clustering methods based on global fusion.
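
The sketch below gives a rough feel for self-weighted fusion. Each view's affinity matrix is weighted in inverse proportion to the square root of that view's loss (for example, a self-expression residual), the weights are normalized to sum to 1, and the matrices are summed. This is a simplification. It weights whole views with a generic inverse-square-root rule, whereas the paper weights local cluster structures by their clustering capacity. The rule, the loss, and the function name are all assumptions.

```python
import math

def self_weighted_fusion(affinities, losses):
    """Fuse per-view affinity matrices, down-weighting views with larger loss.

    Hypothetical view-level weighting; not the paper's local-structure weighting.
    affinities: list of n x n affinity matrices (lists of lists), one per view
    losses:     list of non-negative per-view losses, e.g. self-expression residuals
    Returns (weights, fused), where the weights sum to 1.
    """
    # Assumed rule: weight proportional to 1 / (2 * sqrt(loss)); epsilon guards loss == 0.
    raw = [1.0 / (2.0 * math.sqrt(loss + 1e-12)) for loss in losses]
    total = sum(raw)
    weights = [w / total for w in raw]
    n = len(affinities[0])
    fused = [[sum(w * a[i][j] for w, a in zip(weights, affinities)) for j in range(n)]
             for i in range(n)]
    return weights, fused
```

With losses 1.0 and 4.0, the first view receives weight 2/3 and the second 1/3, so the better-reconstructed view dominates the fused affinity. The fused matrix could then be passed to a clustering step such as spectral clustering.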

Citations: 0

Incremental Learning in Time-series Data using Reinforcement Learning

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00115

Mustafa Shuqair, J. Jimenez-shahed, B. Ghoraani

System monitoring has become an area of interest with the growth of wearable sensors and continuous monitoring tools. However, the generalizability of classification models to unseen incoming data remains challenging. This paper proposes a novel architecture based on reinforcement learning (RL) that incrementally learns patterns of time-series data and detects changes in the system state. Our rationale is that RL's ability to learn from past experience can help increase the performance and generalizability of classification models in time-series monitoring applications. Our novel definition of the environment consists of a set of one-class anomaly detectors that define environment states based on the dynamics of the incoming data, and a reward function that rewards the RL agent according to its actions. A deep RL agent incrementally learns to perform continuous, binary classification predictions according to the environment states and the received reward. We applied the proposed model to detecting the response to medication (ON or OFF) in patients with Parkinson's disease (PD). The PD dataset consisted of 170 minutes of time-series movement signals collected from 12 patients using two wearable sensors. Our proposed model, with a testing accuracy of 77.95%, outperformed Adaptive Boosting, Multi-layer Perceptron, and Support Vector Machines, which achieved 53.10%, 44.92%, and 52.70% testing accuracy, respectively. The proposed model's F-score declined slightly, from 88.15% in validation to 78.42% in testing, a notably smaller decline than those of the other three models. These results demonstrate the potential of the proposed RL-based classifier as a highly generalizable model for unseen incoming data in time-series monitoring applications.
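
The environment described above can be sketched in three parts. One-class detectors map each incoming sample to a discrete state. The reward is +1 for a correct ON/OFF prediction and -1 otherwise. The agent updates its value estimates after every sample. For brevity, this sketch replaces the deep RL agent with a tabular, epsilon-greedy learner whose actions do not affect the next state, which makes it effectively a contextual bandit. The detectors, thresholds, and names are hypothetical.

```python
import random
from collections import defaultdict

def detector_state(sample, detectors):
    """Environment state: which (hypothetical) one-class detectors flag the sample."""
    return tuple(bool(detect(sample)) for detect in detectors)

def reward(action, label):
    """+1 for a correct ON/OFF prediction, -1 otherwise."""
    return 1.0 if action == label else -1.0

class IncrementalAgent:
    """Tabular epsilon-greedy value learner, a simplified stand-in for a deep RL agent."""

    def __init__(self, actions=(0, 1), epsilon=0.1, lr=0.1, seed=0):
        self.values = defaultdict(float)  # (state, action) -> running reward estimate
        self.actions = actions
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def act(self, state, explore=True):
        # Explore with probability epsilon; otherwise pick the highest-valued action.
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[(state, a)])

    def update(self, state, action, r):
        key = (state, action)
        # Incremental update: nudge the estimate toward the observed reward.
        self.values[key] += self.lr * (r - self.values[key])
```

As an example, train it on a stream where the label is ON exactly when the first detector fires. The greedy policy then predicts OFF for state `(False, False)` and ON for the other two states. Accuracy and F-score on held-out data would then be measured as in the abstract.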

Citations: 2

Using Image Processing Techniques to Identify and Quantify Spatiotemporal Carbon Cycle Extremes

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00148

Bharat Sharma, J. Kumar, A. Ganguly, F. Hoffman

Rising atmospheric carbon dioxide due to human activities, through fossil fuel emissions and land use change, has increased climate extremes such as heat waves and droughts, which have led to, and are expected to increase, the occurrence of carbon cycle extremes. Carbon cycle extremes are large anomalies in the carbon cycle associated with gains or losses in carbon uptake. They can be continuous in space and time and can cross political boundaries. Here, we present a methodology to identify large spatiotemporal extremes (STEs) in the terrestrial carbon cycle using image processing tools for feature detection. We characterized the STE events based on neighborhood structures, i.e., three-dimensional adjacency matrices used to detect spatiotemporal manifolds of carbon cycle extremes. We found that the area affected and the carbon loss during negative carbon cycle extremes were consistent with continuous neighborhood structures. In the gross primary production data we used, 100 carbon cycle STEs accounted for more than 75% of all negative carbon cycle extremes. This paper presents a comparative analysis of the magnitude of carbon cycle STEs, and of the attribution of those STEs to climate drivers, as a function of neighborhood structures for two observational datasets and an Earth system model simulation.
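
The neighborhood structures described above are 3-D adjacency over (time, latitude, longitude). Applying one amounts to connected-component labeling of a boolean "extreme" mask. The flood-fill sketch below labels events under 6- or 26-connectivity. The `[t][y][x]` grid layout, the two connectivity options, and the function name are assumptions. On real gridded data, SciPy's `scipy.ndimage.label` with a `structure` argument would likely be the practical choice rather than a hand-rolled loop.

```python
from collections import deque
from itertools import product

def label_extremes(mask, connectivity=26):
    """Label connected extreme events in a 3-D boolean grid indexed [t][y][x].

    Illustrative flood fill (hypothetical helper), not the paper's pipeline.
    connectivity: 6 (face neighbours only) or 26 (faces, edges and corners)
    Returns (labels, count); labels is 0 for non-extreme cells, else an event id >= 1.
    """
    nt, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    if connectivity == 6:
        offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    elif connectivity == 26:
        offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    else:
        raise ValueError("connectivity must be 6 or 26")
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nt)]
    count = 0
    for t, y, x in product(range(nt), range(ny), range(nx)):
        if not mask[t][y][x] or labels[t][y][x]:
            continue
        count += 1  # start a new event and flood-fill everything connected to it
        labels[t][y][x] = count
        queue = deque([(t, y, x)])
        while queue:
            ct, cy, cx = queue.popleft()
            for dt, dy, dx in offsets:
                a, b, c = ct + dt, cy + dy, cx + dx
                if (0 <= a < nt and 0 <= b < ny and 0 <= c < nx
                        and mask[a][b][c] and not labels[a][b][c]):
                    labels[a][b][c] = count
                    queue.append((a, b, c))
    return labels, count
```

Two extreme cells that touch only at a corner, such as `[0][0][0]` and `[1][1][1]`, form one event under 26-connectivity but two under 6-connectivity. This is why the choice of neighborhood structure changes the number and size of the detected STEs.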

Citations: 1
