
2022 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Papers

Discovering Unknown Labels for Multi-Label Image Classification
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00108
Jun Huang, Yu Yan, Xiao Zheng, Xiwen Qu, Xudong Hong
A multi-label learning (MLL) method can simultaneously process instances with multiple labels, and many well-known methods have been proposed to solve various MLL-related problems. Existing MLL methods are mainly applied under the assumption of a fixed label set, i.e., all class labels are observed in the training data. However, in many real-world applications, there may be unknown labels outside this set, especially for large-scale and complex datasets. In this paper, a deep-learning-based multi-label classification model is proposed to discover unknown labels for multi-label image classification. It can simultaneously predict known and unknown labels for unseen images. In addition, an attention mechanism is introduced into the model: the attention maps of unknown labels can be used to locate the corresponding objects in an image and to obtain the semantic information of these unknown labels.
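To make the multi-label setting concrete, the sketch below shows one common way to turn per-label scores into a set of predicted labels: each label gets an independent sigmoid probability and is kept if it clears a threshold. This is an illustration only, not the model from the paper; the function names, the 0.5 threshold, and the extra `unknown_1` output slot standing in for an undiscovered label are all assumptions.

```python
import math


def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, label_names, threshold=0.5):
    """Keep every label whose independent sigmoid probability meets the threshold.

    Unlike single-label (softmax) classification, the labels do not compete:
    an image can receive zero, one, or many labels.
    """
    return [name for name, z in zip(label_names, logits) if sigmoid(z) >= threshold]


# Hypothetical output layer: two known labels plus one slot reserved for an
# unknown label (an assumption made for illustration, not the paper's design).
label_names = ["cat", "dog", "unknown_1"]
logits = [2.0, -1.0, 0.3]
print(predict_labels(logits, label_names))
```

Because the thresholds are applied per label, adding output slots for unknown labels does not change how the known labels are decided.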
Citations: 0
Feature Extraction and Prediction of Combined Text and Survey Data using Two-Staged Modeling
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00064
A. A. Neloy, M. Turgeon
Deep learning (DL) based natural language processing (NLP) has recently become one of the fastest-growing research domains and has achieved remarkable improvements in many applications. Due to the significant amount of data, the adaptation of feature learning and symmetric data efficiency is a critical underlying task in such applications. However, the ability of these methods to extract features is limited by a lack of proper model formation. Moreover, their use on smaller datasets is unexplored and underdeveloped compared to more popular research areas. This work introduces a two-stage modeling approach that combines classical statistical analysis with NLP problems on a real-world dataset. We lay out a combination of a classical statistical model incorporating a stacked ensemble classifier and a DL framework of a convolutional neural network (CNN) and bidirectional recurrent neural networks (Bi-RNN) to structure a more decomposed architecture with lower computational complexity. Additionally, the experimental results, which show 96.69% training and 70.56% testing accuracy, together with hypothesis testing from our DL models and a subsequent ablation study, empirically validate our proposed combined modeling technique.
Citations: 0
A machine learning-based approach for mercury detection in marine waters
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00074
F. Piccialli, F. Giampaolo, Vincenzo Schiano Di Cola, Federico Gatta, Diletta Chiaro, E. Prezioso, Stefano Izzo, S. Cuomo
Thanks to the widespread use of mobile devices, analyses that in the past had to be carried out in specifically designated and equipped laboratories, and that required long processing times, may now take place outdoors and in real time. In marine science, for example, the development of a mobile and compact system for the on-site detection of heavy-metal contamination in seawater would help scientists and society in at least two ways: i) reducing the time and costs associated with these experiments; ii) implementing a strategy for outdoor analysis, eventually embeddable in a lab-on-hardware system. This paper falls within the context of machine learning (ML) for utility pattern mining applied to interdisciplinary domains: starting from well-plate images, we provide a novel proof-of-concept (PoC) ML-based framework to assist scientists in their daily research on seawater samples. The proposed system first automatically recognises the wells in a multiwell plate and then predicts the degree of fluorescence in each of them, thus indicating the possible presence of heavy metals.
Citations: 0
HeteroGuard: Defending Heterogeneous Graph Neural Networks against Adversarial Attacks
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00096
Udesh Kumarasinghe, Mohamed Nabeel, K. de Zoysa, K. Gunawardana, Charitha Elvitigala
Graph neural networks (GNNs) have achieved remarkable success in many application domains, including drug discovery, program analysis, social networks, and cyber security. However, it has been shown that they are not robust against adversarial attacks. In the recent past, many adversarial attacks against homogeneous GNNs, and defenses against them, have been proposed. However, most of these attacks and defenses are ineffective on heterogeneous graphs, because these algorithms optimize under the assumption that all edge and node types are the same and, further, they introduce semantically incorrect edges into perturbed graphs. Here, we first develop HetePR-BCD, a training-time (i.e., poisoning) adversarial attack on heterogeneous graphs that outperforms the state-of-the-art attacks proposed in the literature. Our experimental results on three benchmark heterogeneous graphs show that our attack, with a small perturbation budget of 15%, degrades performance by up to 32% (F1 score) compared to existing ones. Concerningly, existing defenses are not robust against our attack. These defenses primarily modify the GNN's neural message passing operators, assuming that adversarial attacks tend to connect nodes with dissimilar features, but this assumption does not hold in heterogeneous graphs. We construct HeteroGuard, an effective defense against training-time attacks, including HetePR-BCD, on heterogeneous models. HeteroGuard outperforms the existing defenses by 3–8% in F1 score, depending on the benchmark dataset.
Citations: 0
Degree-Related Bias in Link Prediction
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00103
Yu Wang, Tyler Derr
Link prediction is a fundamental problem for network-structured data and has achieved unprecedented success in many real-world applications. Despite the significant progress made towards improving its performance by characterizing underlying topological patterns or leveraging representation learning, few works have focused on the imbalanced performance among nodes of different degrees. In this paper, we propose a novel problem, degree-related bias and evaluation bias, in link prediction, with an emphasis on recommender system applications. We first empirically demonstrate the performance difference among nodes with different degrees and then theoretically prove that Recall is an unbiased evaluation metric compared with F1, NDCG and Precision. Furthermore, we show that under the unbiased evaluation metric Recall, low-degree nodes tend to have higher performance than high-degree nodes in link prediction.
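One intuition for why top-K metrics can depend on node degree is sketched below: with a fixed cutoff K, a node with few held-out links cannot reach a high Precision@K even under a perfect ranking, whereas Recall@K can still reach 1. This is a toy illustration under assumed definitions of Precision@K and Recall@K, not the paper's proof; the helper `best_case_scores` and the choice K = 10 are hypothetical.

```python
def hits_at_k(ranked, relevant, k):
    """Count how many of the top-k ranked candidates are true (held-out) links."""
    return sum(1 for item in ranked[:k] if item in relevant)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k predictions that are true links."""
    return hits_at_k(ranked, relevant, k) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the true links that appear in the top-k predictions."""
    return hits_at_k(ranked, relevant, k) / len(relevant) if relevant else 0.0


def best_case_scores(num_held_out, k=10):
    """Scores for a perfect ranking that places all true links first."""
    relevant = set(range(num_held_out))
    ranked = list(range(max(k, num_held_out)))  # true links, then filler candidates
    return precision_at_k(ranked, relevant, k), recall_at_k(ranked, relevant, k)


for held_out in (1, 5, 10):
    precision, recall = best_case_scores(held_out)
    print(f"held-out links={held_out:2d}  best Precision@10={precision:.1f}  best Recall@10={recall:.1f}")
```

In this toy setting the best achievable Precision@10 grows with the number of held-out links (0.1, 0.5, 1.0) while the best Recall@10 stays at 1.0, so comparing Precision across nodes of different degree mixes ranking quality with degree.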
Citations: 2
DragStream: An Anomaly And Concept Drift Detector In Univariate Data Streams
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00113
Anne Marthe Sophie Ngo Bibinbe, A. J. Mahamadou, Michael Franklin Mbouopda, E. Nguifo
Anomaly detection in data streams comes with different technical challenges due to the nature of the data. The main challenges include storage limitations, the speed of data arrival, and concept drifts. Methods for mining data streams in order to detect anomalies have been proposed in the literature. While some methods focus on tackling a specific issue, others handle diverse problems but may have high time and memory complexity. In the present work, we propose DragStream, a novel subsequence anomaly and concept drift detection algorithm for univariate data streams. DragStream extends Drag, a subsequence anomaly detection method for time series data, to streaming data. Furthermore, the new method is inspired by the well-known Matrix Profile, Drag, and MILOF, which are respectively point and subsequence anomaly detection methods for time series and data streams. We conducted intensive experiments and statistical analysis to evaluate the performance of the proposed approach against existing methods. The results show that our method is competitive in performance while being linear in time and memory complexity. Finally, we provide an open-source implementation of the new method.
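For readers unfamiliar with subsequence anomalies, the sketch below finds the top "discord" of a short series: the window whose nearest non-overlapping neighbour is farthest away. It is a quadratic brute-force baseline written for illustration, not DragStream or Drag themselves (the abstract reports linear time and memory for DragStream); the function name, the plain Euclidean distance without normalization, and the toy series are assumptions.

```python
import math


def top_discord(series, m):
    """Return (start_index, distance) of the length-m window that is farthest
    from its nearest non-overlapping neighbour (brute force, O(n^2) windows)."""
    n = len(series) - m + 1
    best_idx, best_dist = -1, -1.0
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) < m:  # skip overlapping windows (trivial matches)
                continue
            nearest = min(nearest, math.dist(series[i:i + m], series[j:j + m]))
        if nearest != math.inf and nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist


# Toy periodic series with one spike at position 8.
series = [0, 1, 0, 1, 0, 1, 0, 1, 5, 1, 0, 1, 0, 1, 0, 1]
idx, dist = top_discord(series, m=2)
print(idx, round(dist, 3))
```

Every regular window in the toy series has an exact repeat elsewhere (distance 0), so only windows covering the spike have a distant nearest neighbour and are reported as the discord.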
Citations: 1
Emerging properties from Bayesian Non-Parametric for multiple clustering: Application for multi-view image dataset
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00013
Reda Khoufache, M. Dilmi, Hanene Azzag, Etienne Gofinnet, M. Lebbah
Artificial Intelligence (AI) in supermarkets is moving fast with the recent advances in deep learning. One important project in the retail sector is the development of AI solutions for smart stores, mainly to improve product recognition. In this paper, we present a new framework to address multi-view image classification using multiple clustering. The proposed framework combines a pre-trained Vision Transformer with Bayesian Non-Parametric multiple clustering. In this work, we propose an MCMC-based inference approach to learn the column-partition and the row-partitions. This method infers multiple clustering solutions and automatically finds the number of clusters. Our method provides interesting results on a multi-view image dataset and emphasizes, on the one hand, the power of pre-trained Vision Transformers combined with the multiple clustering algorithm and, on the other hand, the usefulness of Bayesian Non-Parametric modeling, which automatically performs model selection.
Citations: 0
Mining Valuable Fuzzy Patterns via the RFM Model
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00075
Yanlin Qi, Fuyin Lai, Guoting Chen, Wensheng Gan
This paper proposes an effective algorithm to discover valuable patterns by applying the fuzzy method to the RFM model. RFM analysis is a common method in customer relationship management, through which valuable customer groups can be identified. By combining RFM analysis with frequent pattern mining, valuable RFM-patterns can be found from the RFM-pattern-tree, as in the RFMP-growth algorithm. Aiming to mine patterns that have quantitative relationships among items, we introduce the fuzzy method into the RFM model and present a fuzzy-Rfu-tree algorithm with a new pruning strategy for candidate patterns. Experiments show the effectiveness of the new algorithm. The new algorithm guarantees a high degree of overlap with the RFM-patterns generated by RFMP-growth, with more valuable information (an additional fuzzy level) in the mined patterns.
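As background for the RFM terminology, the sketch below computes the classical customer-level RFM summary: recency (time since the last purchase), frequency (number of purchases), and monetary value (total spend). It illustrates only the standard RFM quantities, not the fuzzy tree-based mining algorithm of the paper; the function name, the `(customer, day, amount)` transaction layout, and the sample data are assumptions.

```python
from collections import defaultdict


def rfm_table(transactions, today):
    """Summarise (customer, day, amount) transactions into per-customer
    (recency, frequency, monetary) tuples, with recency measured in days."""
    last_day = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        last_day[customer] = max(day, last_day.get(customer, day))
        frequency[customer] += 1
        monetary[customer] += amount
    return {c: (today - last_day[c], frequency[c], monetary[c]) for c in last_day}


# Hypothetical purchase log: customer id, day index, amount spent.
transactions = [("a", 1, 10.0), ("a", 5, 20.0), ("b", 2, 5.0)]
for customer, (recency, freq, money) in sorted(rfm_table(transactions, today=10).items()):
    print(customer, recency, freq, money)
```

A customer with low recency, high frequency, and high monetary value is the kind usually treated as valuable; pattern-level RFM mining applies analogous thresholds to itemsets rather than to customers.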
Citations: 1
Unknown Type Streaming Feature Selection via Maximal Information Coefficient
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00089
Peng Zhou, Yunyun Zhang, Yuan-Ting Yan, Shu Zhao
Feature selection aims to select an optimal minimal feature subset from the original datasets and has become an indispensable preprocessing component before data mining and machine learning, especially in the era of big data. Most feature selection methods implicitly assume that the feature type (categorical, numerical, or mixed) is known before learning, and then design corresponding measurements to calculate the correlation between features. However, in practical applications, features may be generated dynamically and arrive one by one over time; we call these streaming features. Most existing streaming feature selection methods assume either that all dynamically generated features are of the same type or that the type of each newly arriving feature is known on the fly, but this is unreasonable and unrealistic. Therefore, this paper first studies the practical issue of Unknown Type Streaming Feature Selection and proposes a new method to handle it, named UT-SFS. Extensive experimental results indicate the effectiveness of our new method. UT-SFS is nonparametric and does not need to know the feature type before learning, which aligns with practical application needs.
Citations: 0
Extracting Entities and Events from Cyber-Physical Security Incident Reports
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00083
Nitin Ramrakhiyani, Sangameshwar Patil, Manideep Jella, Alok Kumar, G. Palshikar
Cyber-physical systems are an important part of many industries, such as the chemical process industry, manufacturing, automobiles, and even sophisticated weaponry. Given the economic importance and influence of these systems, they increasingly face cybersecurity attacks. In this paper, we provide a dataset of real-life security incident reports on cyber-physical systems, annotated with entities and events that are important for analysing such incidents. We analyze and identify the limitations of the 'Domain Objects' in the Structured Threat Information Expression (STIX) standard, as well as of the entity type classification schemes in the recent cybersecurity research literature. We propose an updated classification scheme for entity types in the cybersecurity domain. The enhanced coverage provided by this scheme is important for automated information extraction and natural language understanding of textual cybersecurity incident reports. We use deep-learning-based sequence labelling techniques and cybersecurity-domain-specific word embeddings to set up a benchmark for entity and event extraction for cyber-physical security incident report analysis. The annotated dataset of real-life industrial security incidents will be made available for research purposes.
Citations: 0