
WIREs Data Mining and Knowledge Discovery: Latest Articles

A survey of episode mining
Pub Date : 2023-11-28 DOI: 10.1002/widm.1524
Oualid Ouarem, Farid Nouioua, Philippe Fournier-Viger
Episode mining is a research area in data mining, where the aim is to discover interesting episodes, that is, subsequences of events, in an event sequence. The most popular episode-mining task is frequent episode mining (FEM), which consists of identifying episodes that appear frequently in an event sequence, but this task has also been extended in various ways. It was shown that episode mining can reveal insightful patterns for numerous applications such as web stream analysis, network fault management, and cybersecurity, and that episodes can be useful for prediction. Episode mining is an active research area, and there have been numerous advances in the field over the last 25 years. However, due to the rapid evolution of the pattern mining field, there is no prior study that summarizes and gives a detailed overview of this field. The contribution of this article is to fill this gap by presenting an up-to-date survey that provides an introduction to episode mining and an overview of recent developments and research opportunities. This advanced review first gives an introduction to the field of episode mining and the first algorithms. Then, the main concepts used in these algorithms are explained. After that, several recent studies are reviewed that have addressed some limitations of these algorithms and proposed novel solutions to overcome them. Finally, the paper lists some possible extensions of the existing frameworks to mine more meaningful patterns and presents some possible orientations for future work that may contribute to the evolution of the episode mining field.
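The frequent episode mining task described in the abstract can be illustrated with a minimal sketch. It counts the sliding windows in which a serial episode (an ordered list of event types) occurs as a subsequence. The window-based support definition, the function names `occurs_in_order` and `window_support`, and the toy event sequence are all assumptions made for illustration; they are not taken from the survey, and real algorithms use more careful window boundaries and far more efficient counting.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (integer timestamp, event type); hypothetical encoding


def occurs_in_order(episode: Sequence[str], window_events: Sequence[str]) -> bool:
    """Return True if `episode` is a subsequence of `window_events`."""
    it = iter(window_events)
    # `symbol in it` consumes the iterator, so matches must appear in order.
    return all(symbol in it for symbol in episode)


def window_support(events: List[Event], episode: Sequence[str], width: int) -> int:
    """Count windows [start, start + width) that contain the serial episode.

    Simplification (assumption): windows start at every integer timestamp
    between the first and last event, and timestamps are distinct.
    """
    if not events:
        return 0
    ordered = sorted(events)
    first, last = ordered[0][0], ordered[-1][0]
    count = 0
    for start in range(first, last + 1):
        in_window = [etype for t, etype in ordered if start <= t < start + width]
        if occurs_in_order(episode, in_window):
            count += 1
    return count


# Toy event sequence (made up for this example).
events = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "B")]
print(window_support(events, ("A", "B"), width=3))  # 2
```

Under this sketch, an episode would be called frequent when its window count meets a user-chosen minimum support threshold.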
Citations: 0
Multispectral data mining: A focus on remote sensing satellite images
Pub Date : 2023-11-22 DOI: 10.1002/widm.1522
Sin Liang Lim, Jaya Sreevalsan-Nair, B. S. Daya Sagar
This article gives a brief overview of various aspects of data mining of multispectral image data. We focus specifically on remote sensing satellite images acquired using multispectral imaging (MSI), because the technology is used across multiple knowledge domains, such as chemistry, medical imaging, and remote sensing, with considerable variation between them. In this article, the different data mining processes are reviewed along with state-of-the-art methods and applications. To study data mining, it is important to know how the data are acquired and preprocessed. Hence, those topics are briefly covered in the article. The article concludes with applications demonstrating knowledge discovery from data mining, modern challenges, and promising future directions for MSI data mining research.
Citations: 0
Deepfake detection using deep learning methods: A systematic and comprehensive review
Pub Date : 2023-11-20 DOI: 10.1002/widm.1520
Arash Heidari, Nima Jafari Navimipour, Hasan Dag, Mehmet Unal
Deep Learning (DL) has been applied effectively to complex problems in healthcare, industry, and academia, including thyroid diagnosis, lung nodule recognition, computer vision, large data analytics, and human-level control. Nevertheless, developments in digital technology have also been used to produce software that poses a threat to democracy, national security, and confidentiality. Deepfakes are one such DL-powered application that has recently surfaced. Deepfake systems can create fake images, primarily by replacing scenes or images, as well as videos and sounds that humans cannot tell apart from real ones. Various technologies have put the capacity to change synthetic speech, images, or video at our fingertips. Furthermore, video and image forgeries are now so convincing that it is hard to distinguish between false and authentic content with the naked eye. This can lead to problems ranging from deceiving public opinion to using doctored evidence in court. For these reasons, it is critical to have technologies that can assist us in discerning reality. This study gives a complete assessment of the literature on deepfake detection strategies using DL-based algorithms. We categorize deepfake detection methods in this work based on their applications, which include video detection, image detection, audio detection, and hybrid multimedia detection. The objective of this paper is to give the reader a better knowledge of (1) how deepfakes are generated and identified, (2) the latest developments and breakthroughs in this realm, (3) weaknesses of existing security methods, and (4) areas requiring more investigation and consideration. The results suggest that the Convolutional Neural Network (CNN) is the most often employed DL method in publications. Most of the reviewed articles address video deepfake detection. Most articles focused on improving only one parameter, with accuracy receiving the most attention.
Citations: 0
The use of gene expression datasets in feature selection research: 20 years of inherent bias?
Pub Date : 2023-11-16 DOI: 10.1002/widm.1523
Bruno I. Grisci, Bruno César Feltes, Joice de Faria Poloni, Pedro H. Narloch, Márcio Dorn
Feature selection algorithms are frequently employed in preprocessing machine learning pipelines applied to biological data to identify relevant features. The use of feature selection in gene expression studies began at the end of the 1990s with the analysis of human cancer microarray datasets. Since then, gene expression technology has been perfected, the Human Genome Project has been completed, new microarray platforms have been created and discontinued, and RNA-seq has gradually replaced microarrays. However, most feature selection methods in the last two decades were designed, evaluated, and validated on the same datasets from the microarray technology's infancy. In this review of over 1200 publications regarding feature selection and gene expression, published between 2010 and 2020, we found that 57% of the publications used at least one outdated dataset, 23% used only outdated data, and 32% did not cite data sources. Other issues include referencing databases that are no longer available, the slow adoption of RNA-seq datasets, and bias toward human cancer data, even for methods designed for a broader scope. In the most popular datasets, some being 23 years old, mislabeled samples, experimental biases, distribution shifts, and the absence of classification challenges are common. These problems are more predominant in publications with computer science backgrounds compared to publications from biology and can lead to inaccurate and misleading biological results.
Citations: 0