Expert Systems with Applications: latest articles

Text-prompted generative data augmentation and semi-supervised learning for indoor defect detection using quadruped robots
IF 7.5; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-02-04; DOI: 10.1016/j.eswa.2026.131545
Xiang Ji, Junjie Chen, Yonglin Fu, Isabelle Chan, Zhen Dong
Timely defect detection is key to formulating targeted maintenance plans that extend a facility’s lifespan. While tremendous effort has gone into deploying robots (e.g., drones) for outdoor defect detection, little attention has been paid to defects that occur indoors. Indoor defect detection (IDD) is distinctive in two respects: (a) the complex environment (narrow passages, staircases, etc.) makes inspection data hard to collect, and (b) uneven illumination and viewpoint changes cause drastic variation in image features, which renders methods viable for outdoor detection less useful. This research takes on these challenges and proposes an automated IDD approach. To navigate challenging indoor environments (e.g., staircases), a quadruped robot platform is proposed for inspection image collection. To address the scarcity of indoor data, a novel algorithmic framework for IDD is formulated that integrates large generative models for data augmentation with semi-supervised learning to train on the generated unlabeled data. The proposed approach is found to inspect challenging indoor spaces effectively by leveraging the unique locomotion capability of legged robots. Despite the lack of training data, the framework yielded a performance gain of 5.03% for the model. Future research is suggested to explore autonomous navigation of the robots and three-dimensional modeling of the detected defects.
Expert Systems with Applications, Volume 312, Article 131545
Citations: 0
A deep learning-based multi-modal approach to robust place recognition in challenging orchard environments
IF 7.5; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-02-04; DOI: 10.1016/j.eswa.2026.131543
Dilshan Ranasinghe, Chao Chen
Place recognition addresses the problem of identifying the current location on a revisit, using knowledge acquired from all previously explored locations. Abstract representations of the locations can be generated during exploration from processed single-sensor or multi-sensor inputs. Later, the current abstraction of the environment can be compared with previously collected information to detect revisits. The loop closure component of Simultaneous Localization And Mapping (SLAM) uses these techniques to identify revisits, which allows the SLAM system to mitigate the associated long-term drift. RGB images, LiDAR point clouds, and LiDAR intensities are the three most popular sensory inputs in the place recognition literature. However, each has certain disadvantages when used as a single-modal input. Additionally, in recent years, Deep Neural Network (DNN) based methods have increasingly appeared in the literature on place recognition. Therefore, in this work, we introduce a novel multi-modal method that takes advantage of the rich complementary information provided by the above three modalities, together with a DNN, for place recognition. This work was evaluated on multiple publicly available datasets as well as on a highly repetitive orchard dataset collected by our team. The results demonstrate that the method can be used even in highly challenging environments such as orchards.
Expert Systems with Applications, Volume 312, Article 131543
Citations: 0
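The revisit-detection step described in the place recognition abstract above (compare the current abstraction with stored ones) can be illustrated with a minimal sketch. It assumes each location has already been reduced to a fixed-length descriptor vector; the function names, the cosine-similarity measure, and the 0.9 threshold are hypothetical choices for illustration and are not taken from the paper, which uses a DNN over RGB, LiDAR point cloud, and LiDAR intensity inputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def detect_revisit(current, database, threshold=0.9):
    """Return (index, similarity) of the best stored match, or None.

    The threshold is an illustrative value, not one reported by the authors.
    """
    best_index, best_sim = None, threshold
    for index, stored in enumerate(database):
        sim = cosine_similarity(current, stored)
        if sim >= best_sim:
            best_index, best_sim = index, sim
    return (best_index, best_sim) if best_index is not None else None


# Hypothetical descriptors of three previously visited places.
database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
match = detect_revisit([0.58, 0.81, 0.05], database)
no_match = detect_revisit([0.0, 0.0, 1.0], database)
print(match, no_match)
```

In a SLAM pipeline, a returned match would be passed to the loop closure component as a candidate revisit; a real system would normally verify it geometrically before using it to correct drift.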
High-performance clustering and routing for industrial wireless sensor networks: An innovative heuristic optimization approach
IF 7.5; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-02-04; DOI: 10.1016/j.eswa.2026.131408
Jing Xiao, Mengfei Wang, Tao Luo, Zhigang Li
Industrial wireless sensor networks (IWSNs) play a critical role in enabling reliable data acquisition in industrial environments, where the routing scheme directly affects network lifetime and communication reliability. This paper presents a novel clustering routing model that integrates node residual energy, clustering distance, and link quality to better reflect practical network conditions. Based on this model, a high-performance cluster routing protocol leveraging a quantum chaotic immune clone algorithm (HCR-QCICA) is proposed. The protocol employs a new chaotic initialization strategy to avoid local optima and a novel quantum optimization strategy to enhance global search capability. Additionally, a type-differentiated inter-cluster multi-hop routing method is developed to address diverse data transmission requirements. Experimental results in simulated industrial scenarios show that, compared to the LEACH, TEEN, CHEABC-QCRP, MOGWO, and GAPSO-H protocols, HCR-QCICA reduces delay and packet loss rate by at least 4.08% and 9.16%, respectively, while increasing network lifetime and throughput by over 14.07% and 24.08%, respectively. These findings demonstrate the effectiveness of the proposed approach in improving IWSN performance.
Expert Systems with Applications, Volume 312, Article 131408
Citations: 0
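The clustering model in the IWSN abstract above combines node residual energy, clustering distance, and link quality. One simple way to picture such a combination is a weighted score per candidate cluster head, sketched below. The `Node` fields, the normalisation, and the weights (0.5, 0.3, 0.2) are all assumptions made for illustration; the abstract does not give the actual objective, and the paper optimises it with a quantum chaotic immune clone algorithm rather than the exhaustive pick shown here.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    residual_energy: float   # remaining energy (hypothetical unit: joules)
    dist_to_centroid: float  # distance to the cluster centroid (metres)
    link_quality: float      # packet reception ratio in [0, 1]


def head_score(node, max_energy, max_dist, w_energy=0.5, w_dist=0.3, w_link=0.2):
    """Higher is better. The weights are illustrative, not from the paper."""
    energy_term = node.residual_energy / max_energy
    dist_term = 1.0 - node.dist_to_centroid / max_dist  # closer is better
    return w_energy * energy_term + w_dist * dist_term + w_link * node.link_quality


def pick_cluster_head(nodes):
    # Fall back to 1.0 to avoid dividing by zero in degenerate clusters.
    max_energy = max(n.residual_energy for n in nodes) or 1.0
    max_dist = max(n.dist_to_centroid for n in nodes) or 1.0
    return max(nodes, key=lambda n: head_score(n, max_energy, max_dist))


nodes = [
    Node(1, residual_energy=2.0, dist_to_centroid=10.0, link_quality=0.9),
    Node(2, residual_energy=0.5, dist_to_centroid=5.0, link_quality=0.95),
    Node(3, residual_energy=1.8, dist_to_centroid=40.0, link_quality=0.6),
]
head = pick_cluster_head(nodes)
print(head.node_id)
```

With these made-up values, node 1 wins because it has the most energy and a good link, even though node 2 sits closer to the centroid.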
Quantifying expert speech: A comprehensive analysis of instructional discourses
IF 7.5; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-02-04; DOI: 10.1016/j.eswa.2026.131515
Mariano Albaladejo-González, Manuel J. Gomez, Óscar Cánovas, Félix Gómez Mármol, José A. Ruipérez-Valiente
Oral communication is a crucial skill in modern society. Nevertheless, it requires sustained practice and constructive feedback. Consequently, several studies have explored the development of oral communication trainers powered by Artificial Intelligence (AI). However, what characterizes expert speech remains unclear, especially given the need to adapt speech to contextual factors. In instructional environments, the speaker’s communication proficiency is a key determinant of audience learning outcomes. For this reason, we analyzed 1250 speeches from five types of instructional discourse: in-person college classes (Lectures), online learning lessons (Online Courses), instructional animations (Animated Lessons), supplementary materials for school and high school (Supplementary Lessons), and public presentations (Public Talks). We obtained 250 videos of each discourse type, each at least five minutes long, and extracted 16 speech metrics, plus six additional multiple-participant metrics for Lectures. Our analysis revealed expert values for each speech metric and showed how speech metrics vary across discourse types. We also developed an AI speech classifier that achieved an F1 score of 0.78. The model struggled to identify Online Courses, which is consistent with the Uniform Manifold Approximation and Projection analysis showing that Online Courses are closely intermixed with the speech of other instructional discourses. Furthermore, we identified distinct speech profiles in Lectures, Public Talks, and Online Courses, highlighting variations in speaking styles. This research provides valuable insights into expert speech in instructional discourses by offering reference values that can help speakers refine their delivery and support researchers in developing more effective speech training systems.
Expert Systems with Applications, Volume 312, Article 131515
Citations: 0
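The abstract above reports a single F1 score of 0.78 for a five-class classifier but does not say how the per-class scores were averaged. As a reminder of how such a figure can arise, here is a small sketch of macro-averaged F1 (the unweighted mean of per-class F1). The choice of macro averaging and the toy labels are assumptions for illustration only, not the authors' evaluation protocol or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example: one Online Courses video is mistaken for a Lecture.
y_true = ["Lectures", "Lectures", "Online Courses", "Online Courses",
          "Public Talks", "Public Talks"]
y_pred = ["Lectures", "Lectures", "Lectures", "Online Courses",
          "Public Talks", "Public Talks"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Here the per-class scores are 0.8 (Lectures), about 0.667 (Online Courses), and 1.0 (Public Talks), so a single confused class pulls the average down, which is the kind of effect the abstract describes for Online Courses.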
Integrating diversity-integrated weighted ranking in metaheuristic algorithms for medical applications
IF 7.5; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-02-04; DOI: 10.1016/j.eswa.2026.131301
Qinghong Hou, Qike Shao, Ali Asghar Heidari, Lei Liu, Huiling Chen, Guoxi Liang
Metaheuristic algorithms have become essential for addressing high-dimensional medical optimization problems, yet many still suffer from premature convergence and limited precision. To overcome these issues, we propose a diversity-integrated weighted ranking strategy that combines fitness and population diversity to construct a novel ranking matrix, reshaping convergence dynamics and achieving a better balance between exploration and exploitation. On the IEEE CEC 2017 benchmark suite, this strategy achieves significant improvements on 29 of 30 functions. Building on this foundation, we integrate the mechanism into the Artemisinin Optimization Algorithm and develop WRAO, which is further applied to two key medical tasks: feature selection and medical image segmentation. In feature selection, WRAO effectively identifies compact and discriminative medical features, enhancing disease classification accuracy and interpretability. In image segmentation, it optimizes boundary localization and structural consistency across varying imaging conditions, facilitating more precise pathological analysis. These applications demonstrate that WRAO not only advances metaheuristic optimization but also provides a robust and adaptable computational tool for medical data analysis and decision support, capable of meeting the precision and reliability demands of complex clinical environments.
Expert Systems with Applications, Volume 311, Article 131301
Citations: 0
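The WRAO abstract above describes a ranking that combines fitness with population diversity. The sketch below shows one plausible reading of that idea: rank individuals by fitness and by mean distance to the rest of the population, then blend the two ranks. The blending weight, the diversity measure, and all function names are assumptions for illustration; the paper's actual ranking matrix is not specified in the abstract.

```python
import math


def rank_positions(values, reverse=False):
    """Return the rank (0 = best) of each entry; ties are broken by index."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


def mean_distance_to_others(population):
    """Diversity proxy: average Euclidean distance to every other individual."""
    scores = []
    for i, a in enumerate(population):
        dists = [math.dist(a, b) for j, b in enumerate(population) if j != i]
        scores.append(sum(dists) / len(dists))
    return scores


def weighted_ranking(population, fitness, weight=0.7):
    """Order individuals by a blend of fitness rank and diversity rank."""
    fit_rank = rank_positions(fitness)  # lower fitness is better (minimisation)
    div_rank = rank_positions(mean_distance_to_others(population), reverse=True)
    combined = [weight * f + (1 - weight) * d for f, d in zip(fit_rank, div_rank)]
    return sorted(range(len(population)), key=lambda i: combined[i])


population = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.05, 0.05)]
fitness = [1.0, 1.1, 3.0, 1.05]
fitness_heavy = weighted_ranking(population, fitness, weight=0.7)
diversity_heavy = weighted_ranking(population, fitness, weight=0.3)
print(fitness_heavy, diversity_heavy)
```

With the weight on fitness, the isolated but poor individual at (5, 5) is ranked last; shifting weight towards diversity moves it up to second place, which is how such a scheme can keep exploration alive and delay premature convergence.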
Compressive sensing image restoration with deep prior guided group sparse representation
IF 7.5; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-02-04; DOI: 10.1016/j.eswa.2026.131465
Zhulin Ji, Shenghai Liao, Ruyi Han, Shujun Fu
Compressive sensing (CS) enables accurate reconstruction of images from significantly fewer measurements than required by the Nyquist-Shannon sampling theorem, relying critically on effective image priors to regularize the ill-posed inverse problem. Conventional patch-based sparse representation utilizes fixed dictionaries that are learned off-the-shelf using the K-SVD algorithm. However, patch-based sparse representation ignores the relationship among patches, and the learned dictionaries cannot capture global image statistics, which leads to suboptimal reconstruction performance. In this paper, we exploit group sparse representation (GSR) for image compressive sensing reconstruction. By clustering non-local image patches into groups and treating each group as a unit, group sparse representation finds sparse codes for all patches within a group simultaneously, leading to improved reconstruction fidelity and edge preservation. However, GSR relies solely on the undersampled image itself to construct a dictionary that is not learnable, and it becomes increasingly unreliable at low compressive sensing rates, where substantial local image information is lost. To address this limitation, we propose a Deep Prior guided Group Sparse Representation (DPGSR) model for compressive image restoration, in which a deep denoiser captures and learns both local and global image statistics by training on external data. The proposed DPGSR achieves improved global consistency, effectively reducing block artifacts while preserving sharper local details. Extensive experiments on image compressive sensing reconstruction and fast MRI demonstrate that the proposed method outperforms state-of-the-art approaches, particularly in preserving fine details and reducing over-smoothing artifacts.
Expert Systems with Applications, Volume 312, Article 131465
Citations: 0
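To make the "ill-posed inverse problem regularized by a sparsity prior" in the compressive sensing abstract above concrete, the following sketch recovers a sparse vector from fewer measurements than unknowns using plain ISTA (iterative shrinkage-thresholding) with an l1 penalty. This is a generic textbook baseline chosen for illustration; it is not GSR, and it is not the authors' DPGSR model. The problem size, the random Gaussian sensing matrix, and the value of `lam` are arbitrary assumptions.

```python
import random


def soft_threshold(value, threshold):
    """Proximal operator of the l1 norm: shrink the value towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def matvec(matrix, vector):
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def objective(A, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1"""
    residual = [r - t for r, t in zip(matvec(A, x), y)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)


def ista(A, y, lam=0.05, iterations=300):
    n = len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so 1/L is a safe (conservative) step size.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iterations):
        residual = [r - t for r, t in zip(matvec(A, x), y)]
        gradient = [sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(n)]
        x = [soft_threshold(xj - gj / L, lam / L) for xj, gj in zip(x, gradient)]
    return x


random.seed(0)
m, n = 8, 16  # 8 measurements of a 16-dimensional signal
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
x_true[3], x_true[11] = 1.5, -2.0  # only two non-zero coefficients
y = matvec(A, x_true)
x_hat = ista(A, y)
start_cost = objective(A, y, [0.0] * n, 0.05)
final_cost = objective(A, y, x_hat, 0.05)
print(start_cost, final_cost)
```

Group-based methods such as GSR replace the element-wise shrinkage step with a shrinkage applied jointly to groups of similar patches, and a deep-prior variant would add a learned denoiser to the iteration; neither is shown here.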
HHGDroid: Hybrid heterogeneous graph-based Android malware detection via multi-evidence similarity fusion
IF 7.5; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-02-04; DOI: 10.1016/j.eswa.2026.131528
Junwei Tang, Xiaomei Tian, Tao Peng, Jianfeng Lu, Haozhao Wang, Ruixuan Li
Static analysis alone is currently insufficient against Android malware that employs advanced evasion techniques such as code obfuscation and dynamic loading. Therefore, hybrid analysis that combines static structure and dynamic behavior has become the mainstream trend. However, existing hybrid analysis methods often adopt simple feature concatenation or shallow fusion mechanisms, which cannot effectively integrate heterogeneous static and dynamic features or capture the complex correlations between structure and behavior. To address this, we propose a hybrid heterogeneous graph-based Android malware detection method via multi-evidence similarity fusion, named HHGDroid. The function call graph generated by static analysis and the event graph obtained through dynamic analysis are connected through a comprehensive similarity over multiple types of evidence, such as semantics, permissions, and time frequency, ultimately forming a hybrid heterogeneous graph with multiple heterogeneous nodes and edges. The resulting hybrid heterogeneous graph is the first to possess both static and dynamic features simultaneously. Finally, we improve the Reliability-Calibrated Heterogeneous Graph Transformer (RCHGT) to learn the multiple relationships in the hybrid heterogeneous graph, which can automatically distinguish reliable from unreliable edges during the information propagation stage. We conducted experiments on real Android malware applications and achieved an F1-score of 97.87%, outperforming state-of-the-art methods. Additionally, we verified our method on an unknown malware dataset and obtained an F1-score of 81.52%, which is superior to existing methods. HHGDroid is a novel and effective method for detecting Android malware.
Expert Systems with Applications, Volume 312, Article 131528
Citations: 0
Surv-RWKV: Cross-modal receptance weighted key-value interaction with optimal transport feature alignment for survival analysis
IF 7.5; Zone 1 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-02-04; DOI: 10.1016/j.eswa.2026.131506
Xiyang Kuang, Bin Yang, Bingo Wing-Kuen Ling, Kok Lay Teo, Xiaozhi Zhang
Multimodal learning has played a pivotal role in survival prediction, particularly in integrating pathological images and genomic data for improving predictive performance. Pathological images provide macroscopic histological information about tumor morphology, while genomic data reveal molecular-level genetic characteristics. The integration of these two modalities enables a comprehensive characterization of tumor heterogeneity and disease progression mechanisms. Despite recent advances in multimodal integration that have significantly enhanced prognostic accuracy, challenges remain in effectively analyzing high-dimensional and heterogeneous whole-slide images and omics data. Current Transformer-based sequence modeling approaches suffer from limited computational efficiency when processing long feature sequences and capturing complex cross-modal interactions. To address these challenges, we propose an innovative cross-modal receptance weighted key-value (RWKV)-based framework, termed Surv-RWKV, for survival prediction. This framework integrates RWKV-based sequence modeling with advanced multimodal fusion strategies to enhance both predictive accuracy and model efficiency. Specifically, Surv-RWKV employs parallel RWKV-based encoders to model long-range dependencies in WSI tissue cluster patterns and genomic pathway activation profiles, achieving improved prognostic performance with optimized computational efficiency. Subsequently, a transport-based optimal cross-modal alignment module is introduced to establish semantic correspondences between histopathological and genomic feature spaces. Furthermore, a progressive feature fusion strategy is implemented to enable effective cross-modal interaction. An RWKV-based shallow fusion module is first developed to explore cross-modal dependencies through spatial-channel hybrid operations, thereby enhancing the representational quality of fused features. 
A cross-RWKV deep interaction module is then designed to further strengthen information synthesis via iterative cross-attention mechanisms, while simultaneously reinforcing intra-modal representation learning and cross-modal knowledge transfer. Surv-RWKV is expected to effectively capture such cross-modal correlations, thereby improving the accuracy and interpretability of survival predictions. Extensive validation across five TCGA cancer cohorts demonstrates that Surv-RWKV achieves state-of-the-art predictive performance with superior computational efficiency.
Citations: 0
ESG investment under different supply chain power structures: decisions, impacts, and the triple win
IF 7.5; Zone 1 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-02-03; DOI: 10.1016/j.eswa.2026.131518
Man Yu, Erbao Cao, Yu Zhang
ESG serves as a key global metric for corporate green and sustainable development. This study investigates a manufacturer’s investment decisions on ESG subcategories (E, S, and G) and their impacts on enterprises, consumers, and the environment across different supply chain power structures. The findings show that a dominant manufacturer invests less in pollutant emission reduction and corporate governance, whereas the highest ESG investment occurs under balanced power. Under ESG investment, an enterprise’s profit increases with its power, yet consumer surplus peaks under balanced power. The impact on the environment depends on potential market demand. Moreover, ESG investment consistently increases the retailer’s profit and consumer surplus, and it can increase the manufacturer’s profit when the level of social responsibility commitment is below a threshold. Furthermore, this work identifies the conditions for three different types of Pareto improvement, namely, the win–win scenario for firms and consumers, the win–win scenario for consumers and the environment, and the triple-win scenario for all three parties. This study sheds light on how enterprises make decisions regarding ESG investment to realize economic and environmental benefits.
Citations: 0
RVFormer: Keypoint-based fusion of 4D radar and vision for 3D object detection in autonomous driving
IF 7.5; Zone 1 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-02-03; DOI: 10.1016/j.eswa.2026.131497
Xin Bi, Caien Weng, Panpan Tong, Arno Eichberger, Lu Xiong
Multi-modal fusion is crucial in autonomous driving perception, enhancing reliability, completeness, and accuracy, which extends the performance limits of perception systems. Specifically, large-scale perception through 4D radar and vision fusion has become a key research focus aimed at improving driving safety, enhancing complex scene understanding, and supporting fine-grained local planning and control. However, existing 3D object detection methods typically rely on fixed-voxel representations to maintain detection accuracy. As the perception range increases, these methods incur considerable computational overhead. While transformer-based query methods show strong potential in capturing dependencies over large receptive fields in image-domain tasks, their application in radar-vision fusion is limited due to radar point cloud sparsity and cross-modal alignment challenges. To address these limitations, we propose RVFormer, a dual-branch feature-level fusion network that uses a sparse keypoint-based query strategy to integrate features from both modalities, thereby mitigating the impact of large-scale scenes on inference speed. Additionally, we introduce clustered voxel query initialization (CVQI) to accelerate convergence and enhance object localization. By incorporating the radar voxel painter (RVP), radar-image cross-attention (RICA), and gated adaptive fusion (GAF) modules, our framework enables deep and adaptive fusion of radar and visual features, effectively mitigating issues caused by point cloud sparsity and modality inconsistency. Compared to existing radar-vision fusion models, RVFormer demonstrates competitive performance, with an inference speed of approximately 15.2 frames per second. It delivers accuracy comparable to CNN-based approaches, while outperforming baseline methods by at least 4.72% in 3D mean average precision and 5.82% in bird’s-eye view mean average precision.
Citations: 0