Recommender systems are essential for information filtering but often suffer from the cold start problem caused by limited interaction data. Recent advances in deep learning (DL) and large language models (LLMs) have shown promise, yet systematic analysis of their effectiveness remains scarce. To address this gap, we introduce a paradigm-driven taxonomy that categorizes solutions by their primary source of information: content, structure, transfer, and generation. Within this framework, DL methods have matured in leveraging content and structural information from interaction logs and multimodal data, while LLMs demonstrate advantages in text-rich and data-sparse environments through transfer-based paradigms that exploit semantic understanding and pre-trained knowledge. Furthermore, emerging generative approaches show potential for synthesizing data or relations to alleviate information scarcity. No universal solution exists; effectiveness depends on the dominant paradigm of a given scenario as well as data availability and computational cost. Combining DL and LLMs offers substantial opportunities, including enhanced feature representation, data augmentation, and hybrid pipelines. However, research gaps persist, particularly the lack of standardized evaluation metrics and limited exploration of integration strategies. Addressing these challenges through a paradigm-aware perspective could significantly improve the robustness and adaptability of cold-start recommendation in diverse contexts.
A Review of Deep Learning and Large Language Models for Cold Start Problem in Recommender Systems. Chenlong Liu, Daguang Jiang, Yi Cai, Hui Li. WIREs Data Mining and Knowledge Discovery, published 2026-02-04. https://doi.org/10.1002/widm.70068
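To make the content paradigm from the cold-start review above concrete, here is a minimal sketch, not taken from the paper. It assumes each item has a small hand-made content vector and that a user profile is simply the mean of the vectors of items the user liked; the names `cosine`, `user_profile`, `item_a`, and `item_b` are hypothetical. New items with no interaction history are ranked by cosine similarity to that profile.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def user_profile(liked_item_vectors):
    """Average the content vectors of the items a user liked (assumed profile model)."""
    n = len(liked_item_vectors)
    return [sum(column) / n for column in zip(*liked_item_vectors)]


# Hypothetical content features for two items the user already liked.
liked = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

# Hypothetical new items with no interactions yet: only content is available.
new_items = {"item_a": [1.0, 0.0, 0.5], "item_b": [0.0, 1.0, 0.0]}

profile = user_profile(liked)
ranked = sorted(new_items, key=lambda name: cosine(profile, new_items[name]), reverse=True)
print(ranked)
```

Because the ranking relies on content alone, it works for items that have never been interacted with; how well it works depends entirely on the quality of the content features, which this sketch takes as given.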
Toshitaka Hayashi, Dalibor Cimr, Hamido Fujita, Richard Cimler
This paper presents a critical review of one-class classification (OCC). Earlier articles defined OCC vaguely, which allowed OCC models to learn from multiple classes. This paper revisits the definition: the training data contain only one class, and samples belonging to other classes are not available. Moreover, the review introduces a new OCC taxonomy consisting of boundary-, distance-, probability-, fake-, and subtask-based approaches. Additionally, the article reveals that many OCC algorithms have in fact learned from multiple classes. Common violations include accessing unlabeled datasets, importing other datasets, and tuning hyperparameters based on test results. In addition, this paper identifies two gray zones in OCC: creating fake datasets and fake OCC problems from scratch, and decomposing samples into smaller units to access multiple classes. These gray zones could contribute to a future theory of learning from a single class. On the other hand, applications of OCC can use multiple classes; generally, learning from multiple classes outperforms learning from a single class. However, such applications are no longer OCC once they learn from multiple classes.
Critical Review for One-Class Classification: Recent Advances and Reality Behind Them. Toshitaka Hayashi, Dalibor Cimr, Hamido Fujita, Richard Cimler. WIREs Data Mining and Knowledge Discovery, published 2026-02-04. https://doi.org/10.1002/widm.70058
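The strict OCC definition in the review above (training data from one class only, with nothing tuned on other classes or on test results) can be illustrated with a distance-based sketch. This is an assumed toy example, not an algorithm from the paper: the centroid, the 0.95 quantile, and the function names `fit_one_class` and `is_inlier` are all hypothetical choices. The point is that the decision threshold is derived from the single training class alone.

```python
import math


def fit_one_class(train, quantile=0.95):
    """Fit a centroid and a distance threshold using single-class samples only."""
    dim = len(train[0])
    centroid = [sum(x[i] for x in train) / len(train) for i in range(dim)]
    distances = sorted(math.dist(x, centroid) for x in train)
    # The threshold comes from training distances alone; no other class,
    # unlabeled set, or test result is consulted.
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return centroid, distances[index]


def is_inlier(x, centroid, threshold):
    """Accept a sample if it lies within the learned distance of the centroid."""
    return math.dist(x, centroid) <= threshold


# Hypothetical samples, all from the one known class.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
centroid, threshold = fit_one_class(train)

print(is_inlier((0.5, 0.5), centroid, threshold))
print(is_inlier((5.0, 5.0), centroid, threshold))
```

Choosing `quantile` by checking accuracy against samples of another class would be exactly the kind of violation the review describes, so in this sketch it is fixed in advance.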
In an era of rapid digital communication, the proliferation of manipulated information has emerged as a critical global challenge that undermines the integrity of information. Misinformation, often spread unintentionally, and disinformation, deliberately crafted to deceive, have far-reaching consequences, including eroding public trust, disrupting democratic processes, and endangering public health. Various forms, such as fake news, manipulated media, fake reviews, spam, and phishing, exploit social media and communication platforms to mislead users. Numerous techniques have been developed to detect false content, as discussed in several review articles devoted to the topic, but those reviews do not mention quantum computing approaches. Conversely, recent quantum computing reviews have not addressed misinformation- or disinformation-related applications, despite growing interest in quantum methods across domains such as medicine, finance, and cybersecurity. This gap, together with the relevant literature that has appeared especially over the last two years, highlights a pressing need for a survey of research at the intersection of quantum computing and misinformation or disinformation detection, which this work aims to provide.
Quantum Frontiers in the Battle for Information Integrity. Vincenzo Loia, Stefania Tomasiello. WIREs Data Mining and Knowledge Discovery, published 2026-02-01. https://doi.org/10.1002/widm.70067
Language Models (LMs) have demonstrated exceptional performance across various Natural Language Processing (NLP) tasks. Despite these advancements, LMs can inherit and amplify societal biases related to sensitive attributes such as gender and race, limiting their adoption in real‐world applications. Therefore, fairness has been extensively explored in LMs, leading to the proposal of various fairness notions. However, the lack of clear agreement on which fairness definition to apply in specific contexts and the complexity of understanding the distinctions between these definitions can create confusion and impede further progress. To this end, this paper proposes a systematic survey that clarifies the definitions of fairness as they apply to LMs. Specifically, we begin with a brief introduction to LMs and fairness in LMs, followed by a comprehensive, up‐to‐date overview of existing fairness notions in LMs and the introduction of a novel taxonomy that categorizes these concepts based on their transformer architecture: encoder‐only, decoder‐only, and encoder‐decoder LMs. We further illustrate each definition through experiments, showcasing their practical implications and outcomes. Finally, we discuss current research challenges and open questions, aiming to foster innovative ideas and advance the field. The repository is publicly available online at https://github.com/vanbanTruong/Fairness-in-Large-Language-Models/tree/main/definitions. This article is categorized under: Commercial, Legal, and Ethical Issues > Fairness in Data Mining; Commercial, Legal, and Ethical Issues > Social Considerations; Technologies > Artificial Intelligence.
Fairness Definitions in Language Models Explained. Zhipeng Yin, Zichong Wang, Avash Palikhe, Wenbin Zhang. WIREs Data Mining and Knowledge Discovery, published 2026-01-14. https://doi.org/10.1002/widm.70063
Road safety is a critical issue due to its significant impact on public health and economic stability. Traffic accidents result in millions of fatalities and injuries globally each year, imposing substantial healthcare costs and loss of productivity. Therefore, systematic data collection is urgently needed to identify key road safety challenges and implement effective solutions. This study examines recent advancements in artificial intelligence (AI) and deep learning techniques for detecting road anomalies, including potholes and speed bumps, utilizing cost‐effective, commercially available cameras. It provides a comprehensive overview of various methodologies for detecting road damage, emphasizing the value of integrating visual, qualitative, and quantitative analyses. Additionally, the study evaluates various algorithms, including R‐CNN (Regions with CNN) for object detection and CrackU‐net for crack detection, to analyze their effectiveness in enhancing road maintenance and safety. Beyond technical methods, the study also examines global trends in road safety, emphasizing the need for comprehensive policy frameworks and knowledge transfer from developed to developing countries to reduce fatalities and enhance road infrastructure. Finally, the study addresses challenges such as limited visibility, adverse weather conditions, and the current limitations of existing models, while discussing the potential for future advancements in automated road safety systems. This article is categorized under: Technologies > Artificial Intelligence
Artificial Intelligence for Road Anomaly Detection: A Review. Rohit Samanta, Amutha Sadasivan, Muthu Subash Kavitha, Surendiran Balasubramanian. WIREs Data Mining and Knowledge Discovery, published 2026-01-06. https://doi.org/10.1002/widm.70054
In the contemporary healthcare landscape, secure and efficient data sharing is paramount, especially when utilizing cloud‐based platforms. The advent of cloud computing has revolutionized healthcare data sharing, offering unparalleled accessibility and scalability. However, the inherent risks associated with data breaches and privacy violations pose significant challenges, necessitating robust security measures. In such scenarios, the integration of threat intelligence with privacy‐preserving techniques becomes imperative to safeguard sensitive healthcare information. This research introduces a novel algorithm, FedGANet, alongside an integrated Privacy‐Preserving Threat Intelligence Model (FedGAN‐PPTIM), developed to strengthen secure healthcare data exchange within cloud and Internet of Medical Things (IoMT) environments. FedGANet enhances traditional security paradigms by jointly leveraging Generative Adversarial Networks (GANs) to synthesize realistic threat scenarios and Federated Learning (FL) to enable decentralized model training without exposing sensitive patient data. The model further aligns with interoperability considerations, supporting seamless integration into diverse clinical ecosystems. The proposed FedGAN‐PPTIM framework is extensively compared with established privacy‐preserving and threat intelligence approaches across multiple evaluation metrics, including privacy leakage, threat detection rate, false positive rate, and communication overhead. The simulation analysis demonstrates that FedGANet outperforms existing methods, significantly reducing privacy leakage and communication overhead while maintaining high threat detection rates and low false positive rates. These results underscore the efficacy of FedGANet in addressing privacy and security challenges in healthcare data sharing. This article is categorized under: Technologies > Cloud Computing; Technologies > Artificial Intelligence; Commercial, Legal, and Ethical Issues > Security and Privacy.
A Privacy‐Preserving Threat Intelligence Model for Secure Healthcare Data Sharing in the Cloud. I. Sakthidevi, G. Fathima. WIREs Data Mining and Knowledge Discovery, published 2026-01-06. https://doi.org/10.1002/widm.70064
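The federated learning idea mentioned in the abstract above, decentralized training in which raw data never leave the client, can be sketched with plain federated averaging. This is a generic illustration under stated assumptions, not FedGANet: there is no GAN component, the model is a one-parameter line y = w * x, and the names `local_update`, `federated_round`, and the client datasets are hypothetical. Only the locally trained weight is sent to the server.

```python
def local_update(w, data, lr=0.1, epochs=20):
    """Train on one client's private (x, y) pairs for the model y = w * x."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global, clients):
    """Server step: average client weights, weighted by local sample counts."""
    total = sum(len(data) for data in clients)
    return sum(local_update(w_global, data) * len(data) for data in clients) / total


# Hypothetical private datasets held by two clients; both follow y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
]

w = 0.0
for _ in range(10):
    w = federated_round(w, clients)

print(round(w, 6))
```

The server in this sketch sees only scalar weights, never the (x, y) pairs. Sharing weights can still leak information in practice, which is presumably why the abstract evaluates privacy leakage as a separate metric.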
With the advancement of generative artificial intelligence, AI‐generated image methods have developed rapidly in interior design rendering. These methods enable the rapid generation of creative interior design renderings, but the generated images carry uncertainties that conflict with the requirements of design renderings. Researchers have explored various approaches to enhance consistency in AI‐generated images. This review summarizes the methods and roles of generative artificial intelligence in interior design compared with traditional techniques, and the relationships between AI‐generated images and controllable parameters such as workflow nodes, prompts, and models. Image consistency is a critical factor in the design generation process; methods for controlling interior design renderings include prompts, image‐to‐image, ControlNet, IP‐Adapter, LoRA, SAM, and so forth. Much of the evidence suggests that ControlNet can control positional relationships, IP‐Adapter can influence style, LoRA excels at customized styles, and SAM can modify local regions. This article is categorized under: Technologies > Artificial Intelligence; Commercial, Legal, and Ethical Issues > Fairness in Data Mining.
A Review on the Consistency of AI‐Generated Images for Interior Design Rendering. Shuangyang Tan, Shasha Chen. WIREs Data Mining and Knowledge Discovery, published 2026-01-05. https://doi.org/10.1002/widm.70056
The sudden increase in adoption of the Internet of Things (IoT) has revolutionized modern living but also brought unprecedented security challenges due to its distributed, heterogeneous, and resource‐constrained nature. This review paper offers a comprehensive examination of machine learning (ML) and deep learning (DL) approaches tailored for intrusion detection and threat mitigation in IoT ecosystems. It explores the landscape of anomaly detection and classification techniques while analyzing their suitability, limitations, and deployment feasibility across IoT layers. The study also investigates the significance of feature engineering, model selection, and system scalability. A novel addition to this review is the integration of emerging trends such as explainable AI (XAI), which enhances transparency and trust in black‐box ML/DL models, and federated learning (FL), a privacy‐preserving paradigm that allows decentralized model training without raw data sharing. The synergy between FL and Edge AI is discussed to highlight real‐time, low‐latency security analytics at the network's edge. Comparative tables, domain‐specific applications (e.g., smart homes, healthcare, and industrial IoT), and architectural illustrations support the discourse, providing readers with an up‐to‐date understanding of current capabilities and ongoing research challenges. This paper concludes with practical implications, research gaps, and future directions for building intelligent, secure, and explainable IoT security frameworks that respect user privacy and enable scalable deployment. This article is categorized under: Fundamental Concepts of Data and Knowledge > Explainable AI; Technologies > Internet of Things; Technologies > Machine Learning.
Security Solutions for the Internet of Things Using Machine Learning and Deep Learning: Current Trends and Future Directions. Himanshu Sharma, Prabhat Kumar, Kavita Sharma. WIREs Data Mining and Knowledge Discovery, published 2026-01-02. https://doi.org/10.1002/widm.70059
Pamela Buñay‐Guisñan, Juan A. Lara, Cristóbal Romero
Counterfactuals are a type of explanation based on hypothetical scenarios used in Explainable Artificial Intelligence (XAI), showing what changes in input variables could have led to different outcomes in predictive problems. In the field of education, counterfactuals enable educators to explore various hypothetical scenarios, facilitating informed decision‐making and the application of educational strategies for improving students' academic performance or reducing dropout rates, among others. Despite the gradual expansion of research on counterfactuals in education, systematic literature reviews on this topic remain scarce. Identifying the most relevant advancements in this field can provide deep insight into the current state of research, highlighting the most effective areas and revealing opportunities for future studies. The objective of this research is to conduct a systematic literature review, using the PRISMA methodology, to analyze three aspects of the use of counterfactuals in education: the problems that counterfactuals help to address in education, the methods and/or algorithms used to generate them, and how the counterfactuals are presented in the educational context. As a result, we have identified a series of key challenges and opportunities for future research over the next few years, which constitute the main contribution of this paper. This article is categorized under: Application Areas > Education and Learning; Algorithmic Development > Causality Discovery; Fundamental Concepts of Data and Knowledge > Explainable AI.
{"title":"Counterfactual Explanations in Education: A Systematic Review","authors":"Pamela Buñay‐Guisñan, Juan A. Lara, Cristóbal Romero","doi":"10.1002/widm.70060","DOIUrl":"https://doi.org/10.1002/widm.70060","url":null,"abstract":"Counterfactuals are a type of explanations based on hypothetical scenarios used in Explainable Artificial Intelligence (XAI), showing what changes in input variables could have led to different outcomes in predictive problems. In the field of education, counterfactuals enable educators to explore various hypothetical scenarios, facilitating informed decision‐making and the application of educational strategies for improving students' academic performance or reducing dropout rates, among others. Despite the gradual expansion of research on counterfactuals in education, systematic literature reviews on this topic remain scarce. The identification of the most relevant advancements in this field can provide a deep insight into the current state of research, highlighting the most effective areas and revealing opportunities for future studies. The objective of this research is to conduct a systematic literature review, using the PRISMA methodology, to analyze three aspects regarding the use of counterfactuals in education: the problems that counterfactuals help to address in education, the methods and/or algorithms used to generate them, and how the counterfactuals are presented in the educational context. As a result, we have identified a series of key challenges and opportunities for future research over the next few years, which constitute the main contribution of this paper. 
This article is categorized under: Application Areas > Education and Learning; Algorithmic Development > Causality Discovery; Fundamental Concepts of Data and Knowledge > Explainable AI","PeriodicalId":501013,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145847529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The study of neuronal activity in the human brain has become highly significant. Neuronal behavior is assessed by analyzing signal data such as extracellular recordings, which can offer scientists valuable information about diseases and neuron activity. One difficulty researchers face when evaluating these signals is the large volume of spike data. Spikes are significant components of signal data that can arise from vital biomarkers or from physical issues such as electrode movement. Hence, distinguishing between types of spikes is essential, and this is the starting point of spike classification. Previously, researchers classified spikes manually; manual classification was not precise enough, as it involved extensive analysis. Consequently, Artificial Intelligence (AI) was introduced into neuroscience to assist clinicians in classifying spikes correctly. The need to distinguish noise from spikes produced by neural activity makes spike classification a demanding task, and the need to classify spikes accurately and quickly underlines the role of AI. This review provides an in‐depth discussion of the importance and use of AI in spike classification. It organizes the literature in the spike classification field for future studies and describes how spikes are recognized. The existing datasets are described first. The topic of spike classification is then divided into three major components: preprocessing, classification, and evaluation. Each of these sections introduces existing methods and assesses their importance. After the methods are summarized and compared, the more efficient algorithms are highlighted. The primary goal of this work is to provide a perspective on spike classification for future research, as well as a thorough grasp of the methodologies and issues involved. In this work, numerous studies were retrieved from various databases. 
Papers were then screened following the PRISMA‐related guidelines, and studies on spike classification using machine learning and deep learning approaches with effective preprocessing were selected. Although there are research papers on spike sorting that use the keyword spike, the primary focus of this study is spike classification. In total, 47 papers were selected for in‐depth review. For each of these studies, information on the datasets is provided first, and the preprocessing approaches, classification methods, and final performance are then examined. The material is then summarized. Furthermore, the fundamental concerns regarding spike classification raised at the opening of this paper are addressed throughout the review. Our review shows that support vector machines and clustering‐based algorithms are the most influential machine learning methods in terms of high accuracy and breadth of use. Moreover, convolutional neural networks, spiking neural networks, and atten
{"title":"Functional Classification of Spiking Signal Data Using Artificial Intelligence Techniques: A Systematic Review","authors":"Danial Sharifrazi, Nouman Javed, Javad Hassannataj Joloudari, Roohallah Alizadehsani, Saadat Behzadi, Prasad N. Paradkar, Ru‐San Tan, U. Rajendra Acharya, Asim Bhatti","doi":"10.1002/widm.70053","DOIUrl":"https://doi.org/10.1002/widm.70053","url":null,"abstract":"Human brain neuron activities are incredibly significant nowadays. Neuronal behavior is assessed by analyzing signal data such as extracellular recording, which can offer scientists valuable information about diseases and neuron activities. One of the difficulties researchers confront while evaluating these signals is the existence of large volumes of spike data. Spikes are significant components of signal data that can happen as a consequence of vital biomarkers or physical issues such as electrode movements. Hence, distinguishing types of spikes is essential. From this spot, the spike classification concept commences. Previously, researchers classified spikes manually. The manual classification was not precise enough, as it involved extensive analysis. Consequently, Artificial Intelligence (AI) was introduced into neuroscience to assist clinicians in classifying spikes correctly. Recognizing noises from spikes produced by neural activity causes the spike classification task to bear a significant demand. Classifying spikes accurately and quickly reveals the role of AI in the scope of spike classification. This review provides an in‐depth discussion of the importance and use of AI in spike classification. This work organizes materials in the spike classification field for future studies and fully describes how spikes are recognized. Therefore, the existing datasets are described first. The topic of spike classification is then separated into three major components: preprocessing, classification, and evaluation. 
Each of these sections introduces existing methods and determines their importance. Having been summarized and compared, more efficient algorithms are highlighted. The primary goal of this work is to provide a perspective on spike classification for future research, as well as a thorough grasp of the methodologies and issues involved. In this work, numerous studies were extracted from various databases. The PRISMA‐related research guidelines were then used to choose papers. Then, research studies based on spike classification using machine learning and deep learning approaches with effective preprocessing were selected. Although there are research papers on spike sorting using the keyword spike, the primary focus of this study is on spike classification. Finally, 47 papers were selected for in‐depth review. First, useful information on the datasets for these papers is supplied. In addition, preprocessing approaches, classification methods, and ultimate performance are investigated in each of these studies. The material is then summarized. Furthermore, the fundamental concerns regarding spike classification raised in the opening of this paper are thoroughly addressed throughout the review. Our reviewing outcomes illustrate that support vector machine and clustering‐based algorithms drastically influence machine learning methods in terms of high accuracy and many uses. 
Moreover, convolutional neural networks, spiky neural networks, and atten","PeriodicalId":501013,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145844735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}