Artificial Intelligence Review: Latest Articles

Unsupervised clustering optimization-based efficient attention in YOLO for underwater object detection

IF 10.7 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review

Pub Date : 2025-04-23 DOI: 10.1007/s10462-025-11218-6

Xin Shen, Guoliang Yuan, Huibing Wang, Xianping Fu

Underwater object detection is a prerequisite for underwater robots to carry out ocean exploration and autonomous grasping. However, underwater detection faces unavoidable interference, such as poor imaging quality, highly random environments, and well-concealed organisms. These factors produce strong background interference and weak object perception, which greatly increase the difficulty of underwater object detection. To address these problems, we propose an unsupervised clustering optimization-based efficient attention (UCOEA). In contrast to channel-wise, cross-channel, and channel-grouping strategies, we design a channel clustering strategy that screens channel information autonomously and dynamically using the K-Means algorithm. Channels of the same type, which are highly redundant, are learned jointly and share the same operation. Channels of different types, which are highly specific, are learned independently to avoid interference from channel noise. In contrast to single-spatial and multiple-spatial strategies, we design a spatial clustering strategy that separates spatial information autonomously and dynamically using the EM algorithm. This strategy can extract multiple kinds of required spatial information from different spatial locations in one pass. We further assign learnable weights to distinguish dominant information from auxiliary information, which alleviates interference from spatial noise. Our strategies better balance the additional cost against information processing quality, which is crucial for the proposed attention to calibrate underwater information quickly and accurately. To achieve high-precision, real-time underwater object detection, we combine the UCOEA underwater adapter with a one-stage YOLO detector into a system that can efficiently detect small, medium, and large targets at the same time. Extensive experiments demonstrate the effectiveness of our work. More importantly, we publish DLMU2024, an underwater detection dataset with low image continuity and high data diversity, which provides reliable support for the rapid development of underwater detection research. Our dataset is available at https://github.com/shenxin-dlmu/DLMU2024.
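The channel clustering idea in this abstract can be illustrated with a small sketch. The abstract does not specify how channels are described or how the shared operation is defined, so everything below is an assumption made for illustration: each channel is summarized by its global average, the summaries are grouped with a plain 1-D K-Means, and every channel in a cluster is scaled by one shared sigmoid gate. The function names are hypothetical and this is not the authors' UCOEA implementation.

```python
import math

def global_average_pool(feature_map):
    """Reduce each channel (a 2-D list of floats) to its mean value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]

def kmeans_1d(values, k, iters=20):
    """Lloyd's K-Means on scalars with deterministic, evenly spaced initial centroids."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each non-empty cluster moves its centroid to the member mean.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

def clustered_channel_attention(feature_map, k=2):
    """Scale every channel by a gate shared across its cluster (illustrative assumption)."""
    descriptors = global_average_pool(feature_map)
    labels, centroids = kmeans_1d(descriptors, k)
    gates = [1.0 / (1.0 + math.exp(-c)) for c in centroids]  # one sigmoid gate per cluster
    scaled = [[[x * gates[lab] for x in row] for row in ch]
              for ch, lab in zip(feature_map, labels)]
    return scaled, labels
```

Sharing one gate per cluster is what keeps the extra cost low in this sketch: the number of gates depends on the number of clusters rather than the number of channels.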

Citations: 0

SFIAD: Deepfake detection through spatial-frequency feature integration and dynamic margin optimization

IF 10.7 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review

Pub Date : 2025-04-23 DOI: 10.1007/s10462-025-11225-7

Yi Kou, Peng Li, Hongjiang Ma, Jiliu Zhou, Zhan ao Huang, Xiaojie Li

The rapid advancement of generative models has profoundly transformed the field of digital content creation, bringing unprecedented opportunities for media generation. However, the widespread adoption of this technology has also led to the emergence of highly realistic fake facial images and videos, which pose significant threats to public trust and societal security. To address the challenges of deepfake detection, this paper proposes a novel method based on Spatial-Frequency Feature Integration (SFFI), which effectively identifies fake content by combining spatial and frequency features of images. Additionally, to tackle the issue of class imbalance in the datasets, we propose an Authenticity-Aware Margin Loss (AAML). This loss function dynamically adjusts the decision boundary to enhance the model’s ability to recognize minority class samples. The proposed method was trained and evaluated on four challenging datasets: FaceForensics++, Celeb-DF v1, Celeb-DF v2, and the DeepFake Detection Challenge Preview, and compared against ten state-of-the-art methods. Experimental results demonstrate that the proposed method consistently outperforms all existing approaches across all datasets.
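To make the idea of a margin-based loss for imbalanced classes concrete, here is a minimal sketch. The abstract does not give the AAML formula, so this example swaps in a different, simpler technique: a static class-dependent margin in the style of label-distribution-aware margin losses, where rarer classes receive a larger margin proportional to n^(-1/4). The function names and the `max_margin` parameter are hypothetical, and unlike the paper's loss this margin is not adjusted dynamically during training.

```python
import math

def class_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n^(-1/4), rescaled so the largest equals max_margin."""
    raw = [1.0 / (n ** 0.25) for n in class_counts]
    scale = max_margin / max(raw)
    return [r * scale for r in raw]

def margin_cross_entropy(logits, label, margins):
    """Cross-entropy after subtracting the true class's margin from its logit."""
    adjusted = list(logits)
    adjusted[label] -= margins[label]  # the true class must win by at least its margin
    m = max(adjusted)                  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(z - m) for z in adjusted))
    return log_sum - adjusted[label]
```

With counts such as `[9000, 1000]`, the minority class gets the larger margin, so a minority sample is penalized more unless it is classified with extra confidence.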

Citations: 0

A comprehensive survey of deep learning for time series forecasting: architectural diversity and open challenges

IF 10.7 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review

Pub Date : 2025-04-23 DOI: 10.1007/s10462-025-11223-9

Jongseon Kim, Hyungjoon Kim, HyunGi Kim, Dongjun Lee, Sungroh Yoon

Time series forecasting is a critical task that provides key information for decision-making across various fields, such as economic planning, supply chain management, and medical diagnosis. After the use of traditional statistical methodologies and machine learning in the past, various fundamental deep learning architectures such as MLPs, CNNs, RNNs, and GNNs have been developed and applied to solve time series forecasting problems. However, the structural limitations caused by the inductive biases of each deep learning architecture constrained their performance. Transformer models, which excel at handling long-term dependencies, have become significant architectural components for time series forecasting. However, recent research has shown that alternatives such as simple linear layers can outperform Transformers. These findings have opened up new possibilities for using diverse architectures, ranging from fundamental deep learning models to emerging architectures and hybrid approaches. In this context of exploration into various models, the architectural modeling of time series forecasting has now entered a renaissance. This survey not only provides a historical context for time series forecasting but also offers comprehensive and timely analysis of the movement toward architectural diversification. By comparing and re-examining various deep learning models, we uncover new perspectives and present the latest trends in time series forecasting, including the emergence of hybrid models, diffusion models, Mamba models, and foundation models. By focusing on the inherent characteristics of time series data, we also address open challenges that have gained attention in time series forecasting, such as channel dependency, distribution shift, causality, and feature extraction. This survey explores vital elements that can enhance forecasting performance through diverse approaches. 
These contributions help lower entry barriers for newcomers by providing a systematic understanding of the diverse research areas in time series forecasting (TSF), while offering seasoned researchers broader perspectives and new opportunities through in-depth exploration of TSF challenges.
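The remark that simple linear layers can rival Transformers is easier to appreciate with a toy example. The sketch below fits a single linear map from a lookback window to the next value by ordinary least squares. It is an illustration under stated assumptions, not any specific model from the survey: the helper names are hypothetical, there is no bias term, and the normal equations are solved with a small hand-written Gaussian elimination so that the example needs only the standard library.

```python
import math

def make_windows(series, lookback):
    """Split a series into (window, next value) training pairs."""
    X = [series[i:i + lookback] for i in range(len(series) - lookback)]
    y = [series[i + lookback] for i in range(len(series) - lookback)]
    return X, y

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_linear_forecaster(series, lookback):
    """Least-squares weights of one linear layer, via the normal equations X^T X w = X^T y."""
    X, y = make_windows(series, lookback)
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(lookback)] for i in range(lookback)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(lookback)]
    return solve(XtX, Xty)

def forecast(window, weights):
    """One-step-ahead prediction: the dot product of the window and the weights."""
    return sum(w * x for w, x in zip(weights, window))

# Usage: a sinusoid obeys an exact two-term linear recurrence, so a lookback of 2 suffices.
series = [math.sin(0.3 * t) for t in range(60)]
weights = fit_linear_forecaster(series, lookback=2)
next_value = forecast(series[-2:], weights)
```

On a pure sinusoid the fitted weights recover the recurrence x[t+1] = 2cos(ω)·x[t] − x[t−1], which shows how much periodic structure a plain linear layer can capture.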

Citations: 0

A comprehensive survey of specularity detection: state-of-the-art techniques and breakthroughs

IF 10.7 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review

Pub Date : 2025-04-23 DOI: 10.1007/s10462-025-11233-7

Fengze Li, Jieming Ma, Hai-Ning Liang, Zhongbei Tian, Zhijing Wu, Tianxi Wen, Dawei Liu

Specularity poses significant challenges in computer vision (CV), often leading to performance degradation in various tasks. Despite its importance, the CV field lacks a comprehensive review of specularity detection techniques. This survey addresses this gap by synthesizing diverse definitions of specularity and providing a unified framework to enhance consistency. It also presents a systematic review of traditional and deep learning-based methods for detecting specularity. Comparative experiments on a standardized dataset enable in-depth evaluation of each method, highlighting their strengths and limitations. The survey further provides structured insights and guidance for selecting appropriate methods across diverse scenarios. Through this, it identifies key areas for future research, aiming to support the development of more advanced detection models. By integrating diverse methodologies and quantitative analyses, this survey contributes to a deeper understanding of current advancements and potential innovations in specularity detection.

Citations: 0

Natural language processing in the patent domain: a survey

IF 10.7 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review

Pub Date : 2025-04-22 DOI: 10.1007/s10462-025-11168-z

Lekang Jiang, Stephan M. Goetz

Patents, which encapsulate crucial technical and legal information in text form and referenced drawings, present a rich domain for natural language processing (NLP). As NLP technologies evolve, large language models (LLMs) have demonstrated outstanding capabilities in general text processing and generation tasks. However, the application of LLMs in the patent domain remains under-explored and under-developed due to the complexity of patents, particularly their language and legal framework. Understanding the unique characteristics of patent documents and related research in the patent domain becomes essential for researchers to apply these tools effectively. Therefore, this paper aims to equip NLP researchers with the essential knowledge to navigate this complex domain efficiently. We introduce the relevant fundamental aspects of patents to provide solid background information. In addition, we systematically break down the structural and linguistic characteristics unique to patents and map out how NLP can be leveraged for patent analysis and generation. Moreover, we demonstrate the spectrum of text-based and multimodal patent-related tasks, including nine patent analysis and four patent generation tasks.

Citations: 0

A novel fuzzy neural network approach with triangular fuzzy information for the selection of logistics service providers

IF 10.7 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review

Pub Date : 2025-04-22 DOI: 10.1007/s10462-025-11209-7

Lifang Wang, Saleem Abdullah, Ariana Abdul Rahimzai, Ihsan Ullah

In this article, we present a novel fuzzy neural network approach designed to address multi-criteria decision-making (MCDM) problems, specifically the selection of logistics service providers. The proposed decision-making model integrates triangular fuzzy numbers (TFNs) with a triangular fuzzy Einstein weighted averaging (TFEWA) aggregation operator to enhance decision making under uncertainty. We first discuss the concept of triangular fuzzy numbers, which can represent the uncertain and imprecise data typically found in real-world decision-making environments. The operational laws, score function, and Hamming distance measure for TFNs are presented to ensure accurate handling of the fuzzy input data. The TFEWA aggregation operator, which is based on Einstein norms, plays a crucial role in aggregating expert opinions during evaluation. In the decision-making process, we collect expert opinions on logistics service providers, expressed as TFNs, and process them through the fuzzy neural network model. We then apply the proposed model to select the best logistics service providers. The TFEWA operator computes values at the hidden and output layers, and activation functions are applied to produce the final output values. These outputs provide a ranked list of logistics service providers based on their overall performance across multiple criteria. The effectiveness of this approach is validated through a comparative analysis with existing MCDM methods. The results demonstrate that the triangular fuzzy neural network approach outperforms traditional methods in flexibility, accuracy, and the ability to handle uncertain, fuzzy data. Our method provides a robust decision support system capable of managing complex decision-making tasks in logistics and other fields.
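The building blocks named in this abstract can be sketched in a few lines. The abstract does not state the paper's exact definitions, so the following are assumptions: a TFN is a tuple (l, m, u) with components in [0, 1], the Einstein weighted average is applied to each component separately, the score is the common (l + 2m + u) / 4 defuzzification, and the Hamming distance is normalized by 3. The function names are hypothetical.

```python
import math

def einstein_weighted_average(values, weights):
    """Einstein weighted average of numbers in [0, 1]; weights are assumed to sum to 1."""
    plus = math.prod((1 + v) ** w for v, w in zip(values, weights))
    minus = math.prod((1 - v) ** w for v, w in zip(values, weights))
    return (plus - minus) / (plus + minus)

def tfewa(tfns, weights):
    """Aggregate TFNs (l, m, u) component by component (an illustrative assumption)."""
    return tuple(
        einstein_weighted_average([t[i] for t in tfns], weights) for i in range(3)
    )

def score(tfn):
    """Defuzzify a TFN; the (l + 2m + u) / 4 form is one common choice, assumed here."""
    l, m, u = tfn
    return (l + 2 * m + u) / 4

def hamming_distance(a, b):
    """Normalized Hamming distance between two TFNs."""
    return sum(abs(x - y) for x, y in zip(a, b)) / 3

# Usage: two experts rate one provider, then the aggregate is scored for ranking.
opinions = [(0.2, 0.4, 0.6), (0.5, 0.7, 0.9)]
aggregate = tfewa(opinions, [0.5, 0.5])
ranking_value = score(aggregate)
```

Because the Einstein average is monotone in each argument and idempotent, the aggregated tuple keeps the ordering l ≤ m ≤ u and stays between the smallest and largest expert opinion.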

Citations: 0

A review on 3D Gaussian splatting for sparse view reconstruction

IF 10.7 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review

Pub Date : 2025-04-22 DOI: 10.1007/s10462-025-11171-4

Haitian Liu, Binglin Liu, Qianchao Hu, Peilun Du, Jing Li, Yang Bao, Feng Wang

Sparse view 3D reconstruction remains challenging due to inherent data scale limitations. Mainstream sparse view 3D reconstruction algorithms based on the NeRF framework struggle to balance generation quality and real-time performance. Recently, 3D Gaussian Splatting has demonstrated remarkable results and become increasingly prominent in 3D scene representation and reconstruction. Applying 3D Gaussian Splatting to sparse view 3D reconstruction is therefore a promising research avenue. Based on this, our paper provides a comprehensive review of current sparse view 3D reconstruction methods that leverage 3D Gaussian Splatting, with an emphasis on extracting effective reconstruction information from input images and using these data to generate realistic scenes efficiently and reliably. We then discuss in detail how these algorithms address issues such as artifacts and scale ambiguity, which are common challenges in this field. In the subsequent sections, we present both quantitative and qualitative comparisons of various sparse view 3D reconstruction methods, broadly demonstrating the advantages of sparse view 3D Gaussian splatting methods in reconstruction quality and efficiency. Furthermore, we analyze the potential applications of sparse view 3D Gaussian splatting methods. Finally, we identify the challenges faced by sparse view 3D Gaussian splatting reconstruction and suggest potential solutions. We hope that our analysis will provide valuable insights for future research efforts.

Citations: 0

Fast machine learning for building management systems

IF 10.7 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review

Pub Date : 2025-04-17 DOI: 10.1007/s10462-025-11226-6

Mohammed Mshragi, Ioan Petri

Building management systems (BMSs) are increasingly integrating advanced machine learning (ML) and artificial intelligence (AI) capabilities to enhance operational efficiency and responsiveness. The transformation of BMSs involves a wide range of environmental, behavioural, economic and technical factors, as well as optimum performance considerations, in order to achieve energy efficiency and long-term sustainability. Existing BMSs can only provide local adaptability by creating and managing information for a built asset; they lack the capability to learn and adapt based on performance objectives. This research provides a comprehensive review of ML techniques in BMSs, with particular emphasis on, and a demonstration of, fast machine learning (FastML) techniques in a real case-study application. The study reviews optimization methods for ML algorithms, focusing on Long Short-Term Memory (LSTM) networks for energy consumption forecasting and exploring solutions that leverage hardware accelerators for low-latency and high-throughput processing. The High-Level Synthesis for Machine Learning (HLS4ML) framework facilitates deployment of fast machine learning models with BMSs, achieving substantial gains in hardware efficiency and inference speed in resource-constrained environments. Findings reveal that HLS4ML-optimized models maintain accuracy while offering computational efficiency through techniques like pruning and quantization, supporting real-time BMS applications. This research contributes to the development of intelligent BMSs by integrating ML algorithms with advanced hardware solutions, ultimately improving energy management, occupant comfort, and safety in modern buildings.
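
The pruning and quantization mentioned in the abstract can be illustrated with a minimal sketch. It does not call HLS4ML and is not the authors' pipeline; the function names, the magnitude-based pruning rule, and the signed fixed-point format (8 bits total, 4 fractional) are assumptions chosen only to show the general idea in plain Python.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_fixed_point(weights, total_bits, frac_bits):
    """Round weights to a signed fixed-point grid, saturating on overflow."""
    scale = 2 ** frac_bits
    q_min = -(2 ** (total_bits - 1))
    q_max = 2 ** (total_bits - 1) - 1
    quantized = []
    for w in weights:
        q = round(w * scale)            # nearest integer code
        q = max(q_min, min(q_max, q))   # clamp to the representable range
        quantized.append(q / scale)     # back to a real value
    return quantized


# Hypothetical weights, for illustration only.
weights = [0.8, -0.05, 0.3, 0.01, -1.7, 0.12]
pruned = prune_by_magnitude(weights, sparsity=0.5)
quantized = quantize_fixed_point(pruned, total_bits=8, frac_bits=4)
print(pruned)     # [0.8, 0.0, 0.3, 0.0, -1.7, 0.0]
print(quantized)  # [0.8125, 0.0, 0.3125, 0.0, -1.6875, 0.0]
```

Pruning removes the three smallest weights, and quantization snaps the survivors to multiples of 1/16. Both steps reduce the arithmetic and storage a hardware accelerator needs, at the cost of a small approximation error.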

Citations: 0

Scalable multi-modal representation learning networks

IF 10.7 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review

Pub Date : 2025-04-17 DOI: 10.1007/s10462-025-11224-8

Zihan Fang, Ying Zou, Shiyang Lan, Shide Du, Yanchao Tan, Shiping Wang

Multi-modal representation learning is recognized for its comprehensive interpretation across diverse modalities. Although existing approaches have yielded favorable results, they face challenges in high-order information preservation and out-of-sample data generalization. To tackle these issues, we propose a scalable multi-modal representation learning network framework, which aims to learn optimal modality-specific projection matrices that project multi-modal features into a shared representation space. Specifically, weight-guided modality-wise and row-sparsity-driven feature-wise measures are used to achieve adaptive hierarchical feature selection from the original data. Then, within the unified latent representation space, we employ hypergraph embedding to preserve the intricate high-order local geometric structures of the modality-specific high-dimensional spaces. Finally, we propose a proximal operator-inspired network architecture to solve the optimization objectives, streamlining the process of auto-weighted feature selection and representation learning. The experimental results highlight the effectiveness and superiority of the proposed method, while online testing on out-of-sample data further demonstrates robust generalization. The code of the proposed method is publicly available at: https://github.com/ZihanFang11/SMMRL.
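
To make the link between row sparsity and a proximal operator concrete, here is a minimal sketch of row-wise soft thresholding, the proximal operator of the ℓ2,1 norm. The abstract does not state which regularizer the authors use, so the choice of the ℓ2,1 norm, the function name `prox_l21`, and the example matrix are all assumptions for illustration, not the published method.

```python
import math


def prox_l21(matrix, threshold):
    """Proximal operator of threshold * ||W||_{2,1}.

    Each row is shrunk toward zero by `threshold` in L2 norm; rows whose
    norm does not exceed the threshold become exactly zero.
    """
    result = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        if norm > 0:
            factor = max(0.0, 1.0 - threshold / norm)
        else:
            factor = 0.0
        result.append([factor * v for v in row])
    return result


# Hypothetical projection matrix: one row per input feature.
W = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
print(prox_l21(W, threshold=2.5))  # [[1.5, 2.0], [0.0, 0.0], [0.0, 0.0]]
```

Because whole rows are zeroed at once, the features attached to those rows stop contributing to the shared representation, which is how a row-sparsity penalty acts as feature selection.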

Citations: 0

Learning semantic consistency for audio-visual zero-shot learning

IF 10.7 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review

Pub Date : 2025-04-17 DOI: 10.1007/s10462-025-11228-4

Xiaoyong Li, Jing Yang, Yuling Chen, Wei Zhang, Xiaoli Ruan, Chengjiang Li, Zhidong Su

Audio-visual zero-shot learning requires an understanding of the relationship between audio and visual information to recognize unseen classes. Despite significant progress in the field, many existing methods focus on learning strong representations and neglect both the semantic consistency between audio and video and the inherent hierarchical structure of the data. To address these issues, we propose Learning Semantic Consistency for Audio-Visual Zero-shot Learning. Specifically, we employ an attention mechanism to enhance cross-modal information interactions, aiming to capture the semantic consistency between audio and visual data. Meanwhile, we introduce a hyperbolic space to model the hierarchical structure of the data itself. Moreover, the proposed approach includes a novel loss function that considers the relationships between input modalities, reducing the distance between features of different modalities. To evaluate the proposed method, we test it on three benchmark datasets: \(\text{VGGSound-GZSL}^{cls}\), \(\text{UCF-GZSL}^{cls}\), and \(\text{ActivityNet-GZSL}^{cls}\). Extensive experimental results show that the proposed method achieves state-of-the-art performance on all three datasets. For example, on the \(\text{UCF-GZSL}^{cls}\) dataset, the harmonic mean is improved by 5.7%. Code and data are available at https://github.com/ybyangjing/LSC-AVZSL.
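
Two quantities in this abstract can be sketched briefly: a distance in hyperbolic space and the harmonic mean used as the evaluation metric. The abstract does not say which hyperbolic model is used, so the Poincaré ball below is an assumption. The harmonic mean of seen-class and unseen-class accuracy is the form commonly reported in generalized zero-shot learning, and treating it as this paper's exact definition is also an assumption. The function names and sample values are hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def harmonic_mean(seen_acc, unseen_acc):
    """Harmonic mean of seen-class and unseen-class accuracy."""
    if seen_acc + unseen_acc == 0:
        return 0.0
    return 2.0 * seen_acc * unseen_acc / (seen_acc + unseen_acc)


# Distances grow quickly near the boundary of the ball, which is what
# makes hyperbolic space suited to tree-like (hierarchical) data.
print(poincare_distance([0.0, 0.0], [0.5, 0.0]))
print(poincare_distance([0.0, 0.0], [0.9, 0.0]))
print(harmonic_mean(0.5, 0.5))  # 0.5
```

The harmonic mean rewards balanced performance: a model that is strong on seen classes but weak on unseen ones scores much lower than the arithmetic mean would suggest.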

Citations: 0