
2022 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Papers

Extracting Entities and Events from Cyber-Physical Security Incident Reports
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00083
Nitin Ramrakhiyani, Sangameshwar Patil, Manideep Jella, Alok Kumar, G. Palshikar
Cyber-physical systems are an important part of many industries such as the chemical process industry, the manufacturing industry, automobiles, and even sophisticated weaponry. Given the economic importance and influence of these systems, they increasingly face cybersecurity attacks. In this paper, we provide a dataset of real-life security incident reports on cyber-physical systems, annotated with entities and events that are important for analysing such security incidents. We analyze and identify the limitations of the 'Domain Objects' in the Structured Threat Information Expression (STIX) standard, as well as of the entity type classification schemes in recent cybersecurity research literature. We propose an updated classification scheme for entity types in the cybersecurity domain. The enhanced coverage provided by the entity scheme is important for automated information extraction and natural language understanding of textual cybersecurity incident reports. We use deep-learning-based sequence labelling techniques and cybersecurity-domain-specific word embeddings to set up a benchmark for entity and event extraction for cyber-physical security incident report analysis. The annotated dataset of real-life industrial security incidents will be made available for research purposes.
Citations: 0
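The entry above describes entity and event extraction as a sequence labelling task. As a minimal sketch of what that output format usually looks like, the snippet below groups per-token BIO tags into entity spans. The tag names (ATTACKER, ASSET, TIME) and the example sentence are hypothetical illustrations, not the classification scheme or data proposed in the paper.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) spans."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # A B- tag (or a stray I- tag of a different type) opens a new span.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the span that is currently open.
            current_tokens.append(token)
        else:
            # An "O" tag closes any open span.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical sentence and labels, for illustration only.
tokens = ["Attackers", "disabled", "the", "safety", "controller", "in", "2017"]
tags = ["B-ATTACKER", "O", "O", "B-ASSET", "I-ASSET", "O", "B-TIME"]
spans = decode_bio(tokens, tags)
```

A sequence labelling model would predict the `tags` list; the decoding step is what turns those per-token predictions into the entity mentions that a benchmark scores.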
DragStream: An Anomaly And Concept Drift Detector In Univariate Data Streams
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00113
Anne Marthe Sophie Ngo Bibinbe, A. J. Mahamadou, Michael Franklin Mbouopda, E. Nguifo
Anomaly detection in data streams comes with several technical challenges due to the nature of the data. The main challenges include storage limitations, the speed of data arrival, and concept drift. Methods for mining data streams in order to detect anomalies have been proposed in the literature. While some methods focus on tackling a specific issue, others handle diverse problems but may have high time and memory complexity. In the present work, we propose DragStream, a novel subsequence anomaly and concept drift detection algorithm for univariate data streams. DragStream extends Drag, a subsequence anomaly detection method for time series data, to streaming data. Furthermore, the new method is inspired by the well-known Matrix Profile, Drag, and MILOF, which are point and subsequence anomaly detection methods for time series and data streams. We conducted intensive experiments and statistical analysis to evaluate the performance of the proposed approach against existing methods. The results show that our method is competitive in performance while being linear in time and memory complexity. Finally, we provide an open-source implementation of the new method.
Citations: 1
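To make the notion of a subsequence anomaly in the DragStream entry concrete, here is a brute-force sketch of time-series discord discovery: the discord is the window whose nearest non-overlapping neighbour is farthest away. This is not DragStream or Drag, and it is quadratic rather than linear; the function names, the z-normalisation choice, and the synthetic series are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def znorm(window: List[float]) -> List[float]:
    """Z-normalise a window so that distances compare shape, not scale."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(v - mean) / std for v in window]


def top_discord(series: List[float], m: int) -> Tuple[int, float]:
    """Return (start index, distance) of the most anomalous length-m window."""
    n = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n)]
    best_idx, best_dist = -1, -1.0
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) >= m:  # skip overlapping (trivial) matches
                nearest = min(nearest, math.dist(subs[i], subs[j]))
        if math.isfinite(nearest) and nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist


# Synthetic example: a sine wave of period 20 with a flat segment injected.
series = [math.sin(2 * math.pi * t / 20) for t in range(200)]
for t in range(100, 110):
    series[t] = 3.0
idx, dist = top_discord(series, m=20)
```

Every clean window here has a near-identical twin one period away, so only windows that touch the injected segment can score a large nearest-neighbour distance. A streaming method has to reach a similar decision without storing or rescanning the whole series, which is where the linear time and memory claim in the abstract matters.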
Extensive Attention Mechanisms in Graph Neural Networks for Materials Discovery
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00090
Guojing Cong, Talia Ben-Naim, Victor Fung, Anshul Gupta, R. Neumann, Mathias Steiner
We present our research in which attention mechanisms are extensively applied to various aspects of graph neural networks for predicting materials properties. As a result, surrogate models can not only replace costly simulations for materials screening but also formulate hypotheses and insights to guide further design exploration. We predict formation energy of the Materials Project and gas adsorption of crystalline adsorbents, and demonstrate the superior performance of our graph neural networks. Moreover, attention reveals substructures that the machine learning models deem important for a material to achieve desired target properties. Our model is based solely on standard structural input files containing atomistic descriptions of the adsorbent material candidates. We construct novel methodological extensions to match the prediction accuracy of state-of-the-art models, some of which were built with hundreds of features at much higher computational cost. We show that sophisticated neural networks can obviate the need for elaborate feature engineering. Our approach can be more broadly applied to optimize gas capture processes at industrial scale.
Citations: 1
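The entry above relies on attention both for prediction and for revealing which substructures matter. The following sketch shows the generic idea with plain dot-product attention over a node's neighbours; it is not the paper's architecture, and the feature vectors, the scoring function, and the name `attend` are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(node_feat: List[float],
           neighbor_feats: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Aggregate neighbour features weighted by dot-product attention."""
    scores = [sum(a * b for a, b in zip(node_feat, nb)) for nb in neighbor_feats]
    weights = softmax(scores)
    dim = len(node_feat)
    pooled = [
        sum(w * nb[k] for w, nb in zip(weights, neighbor_feats))
        for k in range(dim)
    ]
    return pooled, weights


# Hypothetical 2-dimensional atom features for one node and three neighbours.
center = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
pooled, weights = attend(center, neighbors)
```

The `weights` list is the interpretable part: a neighbour with a larger weight contributed more to the node's updated representation, which is the kind of signal that can be inspected to see which parts of a structure the model attends to.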
Domain-Specific Deep Learning Feature Extractor for Diabetic Foot Ulcer Detection
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00041
R. Basiri, M. Popovic, Shehroz S. Khan
Diabetic Foot Ulcer (DFU) is a condition requiring constant monitoring and evaluation for treatment. The DFU patient population is on the rise and will soon outpace the available health resources. Autonomous monitoring and evaluation of DFU wounds is a much-needed area in health care. In this paper, we evaluate and identify the most accurate feature extractor, which is the core basis for developing a deep learning wound detection network. For the evaluation, we used mAP and F1-score on the publicly available DFU2020 dataset. A combination of UNet and the EfficientNetB3 feature extractor resulted in the best evaluation among the 14 networks compared. UNet and EfficientNetB3 can be used as the classifier in the development of a comprehensive DFU domain-specific autonomous wound detection pipeline.
Citations: 0
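The DFU entry above reports F1-score for a detection task. A minimal sketch of how a detection F1 can be computed is given below, assuming axis-aligned boxes in (x1, y1, x2, y2) form, greedy one-to-one matching, and an IoU threshold of 0.5. The threshold, the matching rule, and the example boxes are assumptions, not the paper's evaluation protocol.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_f1(predicted: List[Box], truth: List[Box],
                 threshold: float = 0.5) -> Tuple[float, float, float]:
    """Greedily match predictions to ground truth; return (precision, recall, F1)."""
    unmatched = list(truth)
    true_pos = 0
    for box in predicted:
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)  # each ground-truth box is matched at most once
            true_pos += 1
    false_pos = len(predicted) - true_pos
    false_neg = len(unmatched)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical boxes: one good detection, one false alarm, one missed wound.
truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted = [(1, 1, 10, 10), (50, 50, 60, 60)]
precision, recall, f1 = detection_f1(predicted, truth)
```

mAP extends the same matching idea by sweeping the detector's confidence threshold and averaging precision over recall levels, which this sketch leaves out.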
Emerging properties from Bayesian Non-Parametric for multiple clustering: Application for multi-view image dataset
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00013
Reda Khoufache, M. Dilmi, Hanene Azzag, Etienne Gofinnet, M. Lebbah
Artificial Intelligence (AI) in supermarkets is moving fast with the recent advances in deep learning. One important project in the retail sector is the development of AI solutions for smart stores, mainly to improve product recognition. In this paper, we present a new framework to address multi-view image classification using multiple clustering. The proposed framework combines a pre-trained Vision Transformer with Bayesian Non-Parametric multiple clustering. In this work, we propose an MCMC-based inference approach to learn the column-partition and the row-partitions. This method infers multiple clustering solutions and automatically finds the number of clusters. Our method provides interesting results on a multi-view image dataset and emphasizes, on the one hand, the power of pre-trained Vision Transformers combined with the multiple clustering algorithm and, on the other hand, the usefulness of Bayesian Non-Parametric modeling, which automatically performs model selection.
Citations: 0
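The entry above notes that Bayesian non-parametric modeling finds the number of clusters automatically. One common building block behind that property is the Chinese restaurant process (CRP) prior, sketched below. This only samples from the prior; it is not the authors' MCMC sampler, and whether their model uses exactly this prior is an assumption. The concentration value `alpha` is likewise a hypothetical choice.

```python
import random
from typing import List, Tuple


def sample_crp(n_items: int, alpha: float,
               rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw one partition of n_items from a Chinese restaurant process."""
    assignments: List[int] = []
    counts: List[int] = []  # counts[k] = number of items in cluster k
    for _ in range(n_items):
        # Join cluster k with weight counts[k]; open a new one with weight alpha.
        weights = counts + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(counts):
            counts.append(1)
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts


rng = random.Random(0)  # fixed seed so the draw is repeatable
assignments, counts = sample_crp(100, alpha=1.0, rng=rng)
n_clusters = len(counts)
```

The number of clusters is an outcome of the draw rather than an input, and it grows slowly with the number of items. Inference combines a prior of this kind with a likelihood over the data, so the posterior settles on a cluster count without a separate model selection step.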
Sentence-BERT Distinguishes Good and Bad Essays in Cross-prompt Automated Essay Scoring
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00045
Toru Sasaki, Tomonari Masada
Automated Essay Scoring (AES) refers to a set of processes that automatically assign grades to student-written essays with machine learning models. Existing AES models are mostly trained prompt-specifically with supervised learning, which requires the essay prompt to be accessible to the system vendor at the time of model training. However, essay prompts for high-stakes testing should usually be kept confidential before the test date, which requires the model to be trainable across prompts with pre-scored essay data already in hand. Document embeddings obtained from pretrained language models such as Sentence-BERT (SBERT) are primarily expected to represent the semantic content of the text. We hypothesize that SBERT embeddings also contain assessment-relevant elements that are extractable by decomposing the document embeddings through Principal Component Analysis (PCA) enhanced with Normalized Discounted Cumulative Gain (nDCG) measurement. The evaluative elements identified in the entire embedding space of the source essays are then transferred across prompts to the target essays, written on different prompts, for a binary clustering task that divides them into high-scored and low-scored groups. The result implies that non-finetuned SBERT already contains evaluative elements that distinguish good and bad essays.
Citations: 0
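The essay scoring entry above uses nDCG to judge embedding components. One plausible reading, sketched here, is that nDCG measures how well a single component's values rank essays by their human grades. The linear gain, the log2 discount, and the example numbers are assumptions; the abstract does not specify the exact variant the authors use.

```python
import math
from typing import List


def dcg(relevances: List[float]) -> float:
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(scores: List[float], relevances: List[float]) -> float:
    """nDCG of the ranking induced by `scores`, judged against `relevances`."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical data: one component's value per essay, plus the human grades.
grades = [3.0, 1.0, 2.0]
aligned_component = [0.9, 0.1, 0.5]     # ranks the essays exactly by grade
misaligned_component = [0.1, 0.9, 0.5]  # puts the worst essay first
good = ndcg(aligned_component, grades)
bad = ndcg(misaligned_component, grades)
```

Under this reading, components with nDCG close to 1 order essays much as the grades do, which makes them candidates for the evaluative elements the abstract describes.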
Scene and Texture Based Feature Set for DeepFake Video Detection
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00021
A. Ramkissoon, Vijayanandh Rajamanickam, W. Goodridge
The existence of fake videos is a problem that challenges today's social media-enabled world. There are many classifications of fake videos, one of the most popular being DeepFakes. Detecting such fake videos is a challenging issue. This research attempts to comprehend the characteristics that belong to DeepFake videos. In attempting to understand DeepFake videos, this work investigates the characteristics of the videos that make them unique. As such, this research uses scene and texture detection to develop a unique feature set of 19 data features that is capable of detecting whether or not a video is a DeepFake. This study validates the feature set using a standard dataset of the features relating to the characteristics of the video. These features are analysed using a classification machine learning model. The results of these experiments are examined using four evaluation methodologies. The analysis reveals positive performance with the use of the ML method and the feature set. From these results, it can be ascertained that, using the proposed feature set, a video can be predicted to be a DeepFake or not, which supports the hypothesis that there exists a correlation between the characteristics of a video and its genuineness, i.e., whether or not a video is a DeepFake.
Citations: 0
Unknown Type Streaming Feature Selection via Maximal Information Coefficient
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00089
Peng Zhou, Yunyun Zhang, Yuan-Ting Yan, Shu Zhao
Feature selection aims to select an optimal minimal feature subset from the original datasets and has become an indispensable preprocessing component before data mining and machine learning, especially in the era of big data. Most feature selection methods implicitly assume that the feature type (categorical, numerical, or mixed) is known before learning, and then design corresponding measurements to calculate the correlation between features. However, in practical applications, features may be generated dynamically and arrive one by one over time; we call these streaming features. Most existing streaming feature selection methods assume either that all dynamically generated features are of the same type or that the type of each newly arriving feature is known on the fly, but this is unreasonable and unrealistic. Therefore, this paper first studies the practical issue of Unknown Type Streaming Feature Selection and proposes a new method to handle it, named UT-SFS. Extensive experimental results indicate the effectiveness of our new method. UT-SFS is nonparametric and does not need to know the feature type before learning, which aligns with practical application needs.
Citations: 0
Graph Convolutional Networks with Dependency Parser towards Multiview Representation Learning for Sentiment Analysis
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00070
Minqiang Yang, Xinqi Liu, Chengsheng Mao, Bin Hu
Sentiment analysis has become increasingly important in natural language processing (NLP). Recent efforts have been devoted to the graph convolutional network (GCN) due to its advantages in handling complex information. However, the improvement of GCN in NLP is hindered because pretrained word vectors do not fit well in various contexts and traditional edge building methods are not well suited to long and complex contexts. To address these problems, we propose the LSTM-GCN model to contextualize the pretrained word vectors and extract sentiment representations from complex texts. In particular, LSTM-GCN captures sentiment feature representations from multiple perspectives, including context and syntax. In addition to extracting contextual representations from pretrained word vectors, we use a dependency parser to analyse the dependency relations between words and extract the syntax representation. For each text, we build a graph with each word in the text as a node. Besides the edges between neighboring words, we also connect nodes that have a dependency relation to capture syntax representations. Moreover, we introduce a message passing mechanism (MPM) that allows the nodes to update their representations by extracting information from their neighbors. To improve message passing performance, we also make the edges trainable and initialize the edge weights with the pointwise mutual information (PMI) method. The experimental results show that our LSTM-GCN model outperforms several state-of-the-art models. Extensive experiments also validate the rationality and effectiveness of our model.
Citations: 0
Mining Valuable Fuzzy Patterns via the RFM Model
Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00075
Yanlin Qi, Fuyin Lai, Guoting Chen, Wensheng Gan
This paper proposes an effective algorithm for discovering valuable patterns by applying the fuzzy method to the RFM model. RFM analysis is a common method in customer relationship management, through which we can identify valuable customer groups. By combining RFM analysis with frequent pattern mining, valuable RFM-patterns can be found from the RFM-pattern-tree, as in the RFMP-growth algorithm. To mine patterns that have quantitative relationships among items, we introduce the fuzzy method into the RFM model and present a fuzzy-Rfu-tree algorithm in which a new pruning strategy is proposed to prune candidate patterns. Experiments show the effectiveness of the new algorithm. The new algorithm guarantees a high degree of overlap with the RFM-patterns generated by RFMP-growth, while the mined patterns carry more valuable information (an additional fuzzy level).
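For readers unfamiliar with RFM analysis, the following is a minimal sketch of the generic customer-level computation (recency, frequency, monetary value) that the abstract refers to. It is not the fuzzy tree-based algorithm of the paper, and it does not mine patterns; the function name `rfm_scores` and the `(customer, date, amount)` transaction layout are assumptions made for illustration.

```python
from datetime import date


def rfm_scores(transactions, today):
    """Map each customer to (recency_days, frequency, monetary).

    `transactions` is a hypothetical iterable of (customer, date, amount).
    Recency is the number of days since the customer's latest purchase.
    """
    summary = {}
    for customer, day, amount in transactions:
        last, freq, total = summary.get(customer, (day, 0, 0.0))
        summary[customer] = (max(last, day), freq + 1, total + amount)
    return {
        customer: ((today - last).days, freq, total)
        for customer, (last, freq, total) in summary.items()
    }
```

Customers with low recency, high frequency, and high monetary value are the ones usually treated as valuable; how those three values are thresholded or fuzzified is left open here.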
Citations: 1