WIREs Data Mining and Knowledge Discovery: Latest Articles, Page 5

Machine Learning and Deep Learning Techniques to Detect Mental Stress Using Various Physiological Signals: A Critical Insight

WIREs Data Mining and Knowledge Discovery

Pub Date : 2025-07-14 DOI: 10.1002/widm.70035

Megha Khandelwal, Arun Sharma

This paper presents a comprehensive review of the techniques and methodologies used to detect stress in individuals. The review encompasses a broad spectrum of methods, including physiological measurements, wearable technology, machine learning and deep learning algorithms, and contactless image‐based techniques. The paper outlines the physiological markers commonly associated with stress, such as electrocardiography (ECG), electroencephalography (EEG), photoplethysmography (PPG), and galvanic skin response. It examines the wearable and contactless techniques used to acquire data. Furthermore, it explores the integration of machine learning and deep learning techniques for the development of predictive stress detection models, highlighting their accuracy. It also addresses the potential of multispectral and hyperspectral imaging in this area. Some of the publicly available datasets are also discussed. This article is categorized under:

Application Areas > Health Care

Technologies > Machine Learning

Citations: 0

A Survey on Efficient Vision‐Language Models

WIREs Data Mining and Knowledge Discovery

Pub Date : 2025-07-14 DOI: 10.1002/widm.70036

Gaurav Shinde, Anuradha Ravi, Emon Dey, Shadman Sakib, Milind Rampure, Nirmalya Roy

Vision‐language models (VLMs) integrate visual and textual information, enabling a wide range of applications such as image captioning and visual question answering, which makes them crucial for modern AI systems. However, their high computational demands pose challenges for real‐time applications. This has led to a growing focus on developing efficient vision‐language models. In this survey, we review key techniques for optimizing VLMs on edge and resource‐constrained devices. We also explore compact VLM architectures and frameworks, and provide detailed insights into the performance–memory trade‐offs of efficient VLMs. Furthermore, we establish a GitHub repository at MPSC‐GitHub to compile all surveyed papers, which we will actively update. Our objective is to foster deeper research in this area. This article is categorized under:

Fundamental Concepts of Data and Knowledge > Big Data Mining

Technologies > Internet of Things

Technologies > Artificial Intelligence

Citations: 0

Application of Explainable Artificial Intelligence (XAI) Techniques in Patients With Intracranial Hemorrhage: A Systematic Review

WIREs Data Mining and Knowledge Discovery

Pub Date : 2025-06-28 DOI: 10.1002/widm.70031

Ali Kohan, Amir Zahedi, Roohallah Alizadehsani, Ru‐San Tan, U. Rajendra Acharya

Intracranial hemorrhage (IH) is a critical condition requiring rapid and accurate diagnosis to ensure effective treatment and reduce mortality rates. Recently, artificial intelligence (AI) models have demonstrated significant potential in automating the detection and analysis of brain injuries in IH patients. However, the “black‐box” nature of many AI systems raises concerns about transparency, reliability, and clinical applicability. Explainable AI (XAI) addresses these challenges by making AI models more interpretable, allowing healthcare professionals to understand and trust the decision‐making processes. This review explores various XAI techniques, such as SHapley Additive exPlanations (SHAP), Local Interpretable Model‐Agnostic Explanations (LIME), Randomized Input Sampling for Explanation (RISE), and Class Activation Mapping (CAM) and its variants, and their specific applications in IH clinical tasks. We systematically examine studies incorporating XAI for treating IH patients, highlighting how these methods enhance model transparency and support clinical decision‐making. The Preferred Reporting Items for Systematic Reviews and Meta‐Analyses (PRISMA) methodology was employed to select the papers. Studies are categorized into those using tabular data and those using image data. The literature indicates a rapidly growing number of XAI publications in this field. SHAP is the most commonly used XAI method for tabular data, while CAM‐based methods, such as Grad‐CAM, dominate in image‐based applications. Furthermore, we discuss current limitations of XAI methods and future research directions. This review aims to provide researchers and clinicians with valuable insights into the role of XAI in improving the reliability and practical integration of AI‐driven tools for IH patient care. This article is categorized under:

Application Areas > Health Care

Fundamental Concepts of Data and Knowledge > Explainable AI

Technologies > Machine Learning
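
The abstract above names SHAP as the most common XAI method for tabular data. SHAP builds on Shapley values, so a minimal sketch of an exact Shapley attribution may help readers see what such a method computes. Everything below is illustrative and not taken from the reviewed studies: `toy_score`, the feature values, and the baseline are hypothetical, and the exhaustive enumeration shown here is only practical for a handful of features (the SHAP library uses approximations instead).

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the rest take the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z):
    # Hypothetical tabular model: two additive terms plus one interaction.
    x0, x1, x2 = z
    return 0.02 * x0 - 0.1 * x1 + 0.03 * x2 + 0.001 * x0 * x2


x = [70.0, 8.0, 30.0]         # hypothetical instance to explain
baseline = [50.0, 15.0, 0.0]  # hypothetical reference input
phi = shapley_values(toy_score, x, baseline)
```

The attributions in `phi` sum to the difference between the score at `x` and the score at `baseline`, and the purely additive feature `x1` receives exactly its own contribution, which is the property that makes such attributions easy to read for tabular models.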

Citations: 0

Artificial Intelligence Techniques Enabled Soil Moisture Estimation Frameworks Using Remote Sensing Satellite Images: Challenges and Future Directions‐ Review

WIREs Data Mining and Knowledge Discovery

Pub Date : 2025-06-27 DOI: 10.1002/widm.70032

Mangayarkarasi Ramaiah, Prabhavathy Settu, Vinayakumar Ravi

Forecasting soil moisture is critical for keeping groundwater levels stable, monitoring droughts, and assisting agricultural productivity. Surface soil moisture has a tremendous impact on both the environment and society. To estimate soil moisture properly, the right tools are required. Gravimetric, physical, and empirical models produce reliable results, but they are generally context‐dependent and inappropriate for large‐scale investigations. Remote sensing has developed into a credible technology for estimating large‐scale soil moisture levels. However, various obstacles exist when obtaining soil moisture data using remote sensing, including the availability and precision of data sources. The spatial and temporal limits of many remote sensing sources, such as microwave and optical sensors, combined with environmental conditions, pose considerable feasibility issues. As a result, a robust model capable of accurately capturing both linear and nonlinear connections between multiple surface soil variables is critical. Recently, AI approaches have been identified as promising options for managing complicated factors in this domain. This review paper investigates the use of several AI algorithms for estimating soil moisture content (SMC). It focuses on AI‐enabled frameworks built with remote sensing satellite imagery. In addition to including in situ observations, the study discusses the advantages of AI approaches and the issues they solve, and provides a detailed description of the integration of microwave, optical, and combination (synergistic) data sources. This paper also addresses the most common AI approaches applied with various types of remote sensing data and the results they produced. By exploring the strengths and technical problems associated with diverse data sources, this work hopes to help researchers make wise choices about data selection and model construction. Finally, the proposed future research directions are likely to assist emerging researchers in broadening the scope of this critical topic in a way that corresponds with future demands. This article is categorized under:

Technologies > Artificial Intelligence

Technologies > Machine Learning

Technologies > Prediction

Citations: 0

A Literature Review of Textual Cyber Abuse Detection Using Cutting‐Edge Natural Language Processing Techniques: Language Models and Large Language Models

WIREs Data Mining and Knowledge Discovery

Pub Date : 2025-06-27 DOI: 10.1002/widm.70029

J. Angel Diaz‐Garcia, Joao Paulo Carvalho

The success of social media platforms has facilitated the emergence of various forms of online abuse within digital communities. This abuse manifests in multiple ways, including hate speech, cyberbullying, emotional abuse, grooming, and shame sexting or sextortion. In this paper, we present a comprehensive analysis of the different forms of abuse prevalent in social media, with a particular focus on how emerging technologies, such as Language Models (LMs) and Large Language Models (LLMs), are reshaping both the detection and generation of abusive content within these networks. We delve into the mechanisms through which social media abuse is perpetuated, exploring the psychological and social impact. To achieve this, we conducted a literature review based on PRISMA methodology, deriving key insights in the field of cyber abuse detection. Additionally, we examine the dual role of advanced language models, highlighting their potential to enhance automated detection systems for abusive behavior while also acknowledging their capacity to generate harmful content. This paper contributes to the ongoing discourse on online safety and ethics by offering both theoretical and practical insights into the evolving landscape of cyber abuse, as well as the technological innovations that simultaneously mitigate and exacerbate it. The findings support platform administrators and policymakers in developing more effective moderation strategies, conducting comprehensive risk assessments, and integrating AI responsibly to create safer digital environments. This article is categorized under:

Algorithmic Development > Web Mining

Technologies > Classification

Citations: 0

An Overview of Heterogeneous Social Network Analysis

WIREs Data Mining and Knowledge Discovery

Pub Date : 2025-06-13 DOI: 10.1002/widm.70028

Deepti Singh, Ankita Verma

Heterogeneous Social Networks (HSNs) represent complex structures where diverse entities, such as users, items, and interactions, coexist and interact within a unified framework. This paper offers a systematic review of HSN analysis, addressing the theoretical and practical challenges associated with investigating the interplay between varied node types and diverse relationships within HSNs. The paper begins by defining HSNs and outlining their characteristics, highlighting the existence of diverse entity types and a range of relationship types. It explores the significance of HSNs in modeling real‐world systems, including online social platforms, biological networks, e‐commerce networks, and recommendation systems, where diverse entities play distinct roles. The analysis of HSNs extends beyond traditional homogeneous networks, incorporating various types of nodes and edges, and introduces novel considerations for effective analysis. The difficulties in modeling, representing, and analyzing HSNs are covered in this work. Several reviews of social network analysis have been published in the past, but they often focus on simple networks, not HSN analysis specifically. This paper aims to fill that gap by comprehensively reviewing different aspects of HSNs and their analysis. We start with the fundamentals of HSNs, explore their major types (multi‐relational networks and multi‐modal networks), and examine their impact on popular data mining tasks. Then, we explore various applications of heterogeneous information network analysis, such as recommender systems, text mining, fraud detection, and e‐commerce. Finally, we look at recent research and suggest promising future directions in the field of HSN analysis.
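
To make the idea of typed nodes and typed edges concrete, here is a minimal sketch of a heterogeneous network held in plain Python dictionaries. It is an illustration under stated assumptions, not a structure described in the paper: the class `HeteroGraph`, the relation names `bought` and `bought_by`, and the sample users and items are all hypothetical. The `follow` method walks a fixed sequence of relations, which is one common way to query such networks.

```python
from collections import defaultdict


class HeteroGraph:
    """Toy heterogeneous network: every node has a type, every edge a relation."""

    def __init__(self):
        self.node_type = {}
        # adj[relation][source] -> set of destination nodes
        self.adj = defaultdict(lambda: defaultdict(set))

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, src, relation, dst):
        self.adj[relation][src].add(dst)

    def neighbors(self, node, relation):
        return self.adj[relation].get(node, set())

    def follow(self, start, relations):
        """Nodes reached from `start` by following the given relations in order."""
        frontier = {start}
        for rel in relations:
            frontier = {d for s in frontier for d in self.neighbors(s, rel)}
        return frontier


g = HeteroGraph()
for user in ("alice", "bob"):
    g.add_node(user, "user")
for item in ("book", "lamp"):
    g.add_node(item, "item")

# Each purchase is stored in both directions under distinct relation names.
for user, item in (("alice", "book"), ("bob", "book"), ("bob", "lamp")):
    g.add_edge(user, "bought", item)
    g.add_edge(item, "bought_by", user)

co_buyers = g.follow("alice", ["bought", "bought_by"])
candidates = g.follow("alice", ["bought", "bought_by", "bought"])
```

Here `co_buyers` holds the users who share at least one purchase with `alice` (including `alice` herself), and `candidates` holds every item bought by those users, which is the kind of typed path a recommender could filter further.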

Citations: 0

Vehicle Damage Detection Using Artificial Intelligence: A Systematic Literature Review

WIREs Data Mining and Knowledge Discovery

Pub Date : 2025-06-07 DOI: 10.1002/widm.70027

Md Jahid Hasan, Cong Kha Nguyen, Yee Ling Boo, Hamed Jahani, Kok-Leong Ong

Automating vehicle damage detection is essential for automotive industry applications like insurance claims, online sales, and repair cost estimates, addressing the labor-intensive, time-consuming, and error-prone nature of current manual inspections. This systematic literature review explores the use of artificial intelligence (AI), particularly deep learning-based algorithms, to improve the accuracy and efficiency of damage detection under dynamic and challenging conditions specific to the requirements of our industry partners. The review is structured around five key research questions and includes extensive empirical evaluations to identify gaps and challenges in existing methods. Findings reveal significant potential for AI to automate and enhance the damage detection process but also highlight areas requiring further research and development. The review discusses these gaps in detail, providing a comprehensive foundation for future work in this field. Furthermore, the review findings are intended to guide both our research and the broader research community in advancing the practical application of AI for vehicle damage assessment. The insights gained from this review are crucial for developing robust AI solutions that can operate effectively in real-world scenarios, ultimately improving operational efficiency and customer experience in the automotive industry.

Citations: 0

Advances in Feature Selection Using Memetic Algorithms: A Comprehensive Review

WIREs Data Mining and Knowledge Discovery

Pub Date : 2025-06-03 DOI: 10.1002/widm.70026

Keerthi Gabbi Reddy, Deepasikha Mishra

This review paper presents a comprehensive analysis of memetic algorithms (MAs) for feature selection (FS), particularly in high‐dimensional datasets. MAs effectively address the challenges of feature selection by combining the global exploration capabilities of evolutionary algorithms with the local optimization of search techniques. Their hybrid nature makes them well suited for tackling the complexity, scalability, and computational demands of FS problems across various domains, including bioinformatics, image processing, and financial forecasting. This review highlights the recent advancements, customized variants, and practical applications of MA‐based FS methods while providing critical insights into their limitations, such as computational overhead and overfitting. Additionally, the paper outlines future research directions to further enhance the efficacy of MAs in feature selection, offering a balanced perspective on their contributions to the field.
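
The combination of evolutionary search with local refinement described above can be sketched in a few lines. This is a minimal illustration, not any specific variant from the review: the function name `memetic_feature_selection`, the hyperparameters, and the toy fitness with its hypothetical `RELEVANT` set are assumptions. In practice the fitness would typically be a validated model score with a penalty on subset size, which is far more expensive to evaluate.

```python
import random


def memetic_feature_selection(fitness, n_features, pop_size=12, generations=20, seed=0):
    """Evolve 0/1 feature masks; every individual is refined by local search."""
    rng = random.Random(seed)

    def local_search(mask):
        # Local step: keep any single-bit flip that strictly improves fitness.
        mask = list(mask)
        best = fitness(mask)
        improved = True
        while improved:
            improved = False
            for i in range(n_features):
                mask[i] ^= 1
                score = fitness(mask)
                if score > best:
                    best, improved = score, True
                else:
                    mask[i] ^= 1  # undo a flip that did not help
        return mask

    population = [
        local_search([rng.randint(0, 1) for _ in range(n_features)])
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Global step: keep the better half, refill by crossover and mutation.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_features)] ^= 1  # single-bit mutation
            children.append(local_search(child))
        population = parents + children
    return max(population, key=fitness)


RELEVANT = {0, 3, 5}  # hypothetical informative features in an 8-feature toy problem


def toy_fitness(mask):
    # Reward each relevant feature selected, penalise each irrelevant one.
    return sum(1.0 if i in RELEVANT else -0.5 for i, bit in enumerate(mask) if bit)


best_mask = memetic_feature_selection(toy_fitness, n_features=8)
```

Because the toy fitness is separable, the local search alone already finds the optimum; the evolutionary loop matters on real problems where features interact and single flips get stuck. The computational overhead the abstract mentions is visible here: each local search call evaluates the fitness many times per individual.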

Citations: 0

The “Curious Case of Contexts” in Retrieval-Augmented Generation With a Combination of Labeled and Unlabeled Data

WIREs Data Mining and Knowledge Discovery

Pub Date : 2025-05-29 DOI: 10.1002/widm.70021

Payel Santra, Madhusudan Ghosh, Debasis Ganguly, Partha Basuchowdhuri, Sudip Kumar Naskar

With the growing reliance on LLMs for a wide range of NLP tasks, optimizing the use of labeled and unlabeled data for effective context generation has become critical. This work explores the interplay between two prominent methodologies in few-shot learning: in-context learning (ICL), which utilizes labeled task-specific data, and retrieval-augmented generation (RAG), which leverages unlabeled external knowledge to augment generative models. Since each has its individual limitations, we propose a novel hybrid approach to obtain “the best of both worlds” by dynamically integrating both labeled and unlabeled data towards improving the downstream performance of LLMs. Our methodology, which we call LU-RAG (labeled and unlabeled RAG), recomputes the scores of top-k labeled instances and top-m unlabeled passages to refine context selection. Our experimental results demonstrate that LU-RAG consistently outperforms both standalone ICL and RAG across multiple benchmarks, showing significant gains in downstream performance. Furthermore, we show that LU-RAG performs better with a semantic neighborhood as compared to a lexical one, highlighting its ability to generalize effectively.
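The abstract does not specify how LU-RAG recomputes scores, so the following is only a sketch of the general shape of such a pipeline under stated assumptions: retrieve the top-k labeled examples and top-m unlabeled passages separately, pool them, re-score every candidate on a common scale, and keep the best few as context. The function names (`cosine`, `select_context`), the `budget` parameter, and the choice of re-scoring function are hypothetical. A bag-of-words cosine is used only to keep the example self-contained; it is a lexical similarity, whereas the abstract reports better results with a semantic neighborhood.

```python
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity (lexical stand-in for an embedding model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def select_context(query, labeled, unlabeled, k=2, m=2, budget=3, rescore=cosine):
    """labeled: list of (text, label) pairs; unlabeled: list of passages.

    Returns up to `budget` candidates as (source, text, label) tuples.
    """
    # Stage 1: independent retrieval from each source.
    top_labeled = sorted(labeled, key=lambda ex: cosine(query, ex[0]), reverse=True)[:k]
    top_unlabeled = sorted(unlabeled, key=lambda p: cosine(query, p), reverse=True)[:m]
    # Stage 2: pool both candidate sets.
    pool = [("labeled", text, label) for text, label in top_labeled]
    pool += [("unlabeled", text, None) for text in top_unlabeled]
    # Stage 3: re-score all candidates with one function and keep the best.
    pool.sort(key=lambda c: rescore(query, c[1]), reverse=True)
    return pool[:budget]

labeled = [
    ("the phone battery life is great", "positive"),
    ("terrible screen", "negative"),
    ("battery died fast", "negative"),
]
unlabeled = [
    "phone battery capacity is measured in mAh",
    "cats sleep a lot",
]
context = select_context("battery life of the phone", labeled, unlabeled)
```

In this toy run the selected context mixes both sources: the closest labeled example comes first, followed by an unlabeled passage and a second labeled example. Swapping `rescore` for an embedding-based similarity would be the natural way to try a semantic neighborhood.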

Citations: 0

A Survey on Causal Inference-Driven Data Bias Optimization in Recommendation Systems: Principles, Opportunities and Challenges

WIREs Data Mining and Knowledge Discovery

Pub Date : 2025-05-24 DOI: 10.1002/widm.70020

Yongkang Li, Xingyu Zhu, Yuheng Wu, Wenxu Zhao, Xiaona Xia

Recommendation systems predict user interests and recommend items for online platforms including e-commerce, social networks, and decision systems. However, data bias has become a significant obstacle, severely impacting the accuracy, fairness, and reliability of recommendation results. This survey examines causal inference for optimizing recommendation systems and mitigating data bias, addressing three questions: (1) Bias types and performance impacts; (2) Causal inference mitigation methods; (3) Approach advantages, limitations, and research opportunities. The motivation for this survey stems from the limitations of traditional debiasing methods, which often fail to account for causal relationships and struggle in dynamic, real-world scenarios. Causal inference provides a robust framework for identifying and addressing the underlying causes of bias, enabling more transparent and accurate recommendation systems. Therefore, we define three critical stages of bias: bias in the data stage, model selection stage, and model evaluation stage. For each stage, causal inference-based optimization methods are introduced and critically analyzed. Unlike traditional debiasing methods, this study analyzes data augmentation and regularization techniques as potential strategies for future research. Overall, the survey aims to highlight the ability of causal inference to uncover and control confounding factors, offering deeper insights into the mechanisms driving biases.
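As one illustration of how a causal view can correct data bias, the sketch below uses inverse propensity scoring (IPS), a standard propensity-based estimator. The abstract does not name IPS, so this is a swapped-in example rather than a method attributed to the survey. It assumes each observed rating comes with a known exposure propensity; the data and the function names `naive_mean` and `ips_mean` are hypothetical.

```python
def naive_mean(observed):
    """Average of observed ratings, ignoring how likely each was to be observed."""
    return sum(rating for rating, _ in observed) / len(observed)

def ips_mean(observed, n_total):
    """IPS estimate of the mean rating over all n_total user-item pairs.

    observed: list of (rating, propensity) pairs, where propensity is the
    assumed probability that the rating was exposed and logged.
    """
    return sum(rating / propensity for rating, propensity in observed) / n_total

# Toy population of 4 user-item pairs with true ratings 5, 5, 1, 1.
# Popular items (rating 5) are always observed; niche items (rating 1) are
# observed half the time, so only one of the two appears in the log.
observed = [(5, 1.0), (5, 1.0), (1, 0.5)]

naive = naive_mean(observed)             # biased toward popular items
debiased = ips_mean(observed, n_total=4)
```

Here the naive average overstates the true mean rating of 3.0 because popular items are over-represented in the log, while the IPS estimate reweights the under-exposed item and recovers it. In real systems propensities are unknown and must be estimated, which is itself a source of error.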

Citations: 0