Pub Date: 2026-06-01 | Epub Date: 2025-12-15 | DOI: 10.1016/j.eswa.2025.130826
Weichao Wu, Yongyang Xu, Zhong Xie
Point cloud completion aims to reconstruct complete structures from incomplete point clouds by extracting fine-grained local details and global features. Current state-of-the-art methods rely on Transformer architectures, whose quadratic complexity incurs high computational costs and forces trade-offs between resolution and feature extraction. To address this limitation, we propose a novel point cloud completion network that integrates the Mamba model, a state space framework with linear complexity, for feature extraction in the encoding phase. Our approach replaces the self-attention module with Mamba and introduces a multi-scale encoding network to enhance the extraction and fusion of features from incomplete point clouds. A cross-attention decoding module processes centre points and incomplete features to predict a complete point cloud. Experiments on synthetic and real-world datasets show that our method performs on par with existing state-of-the-art approaches, achieving an average CDL1 score of 6.50 on the PCN dataset. In addition, our method demonstrates superior accuracy when processing large-volume point cloud data, highlighting Mamba’s effectiveness over Transformer-based models in handling such challenges.
Published as “A point cloud completion network integrating Mamba and transformer architectures,” Expert Systems with Applications, vol. 313, Article 130826.
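The CDL1 figure above is an L1-style Chamfer Distance between the completed cloud and the ground truth (benchmarks such as PCN typically report it scaled by 1000). A minimal sketch of one common symmetric formulation — the function name and the 0.5-averaged form are our assumptions, not code from the paper:

```python
import numpy as np

def chamfer_distance_l1(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets pred (N, 3) and gt (M, 3):
    for each point, take the Euclidean distance to its nearest neighbour in the
    other set, then average the two directional means."""
    dist = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (N, M)
    return 0.5 * (dist.min(axis=1).mean() + dist.min(axis=0).mean())

cloud = np.random.default_rng(0).random((128, 3))
chamfer_distance_l1(cloud, cloud)  # identical clouds → 0.0
```

Brute-force pairwise distances cost O(NM) memory and time; practical pipelines use k-d trees or GPU kernels, but the metric itself is unchanged.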
Pub Date: 2026-06-01 | Epub Date: 2026-02-07 | DOI: 10.1016/j.eswa.2026.131559
Cunhan Guo, Heyan Huang, Ruiqi Hu, Danjie Han
Audio-Visual Cooperative (AVC) tasks underpin multimodal scene understanding and compel models to reconcile continuous temporal evolution with abrupt sensory transitions. We propose the Coherence-Aware and Snap-Triggered (CAST) mechanism, a plug-in temporal refinement layer that perturbs no backbone parameters and demands no additional modalities. The exponential-memory-based Coherence-Aware module attenuates the contributions of distant frames through an exponentially decaying weight envelope, thereby preventing the persistent influence of obsolete disruptions. Complementarily, the optical-flow-based Snap-Triggered module registers instantaneous motion discontinuities and reallocates attention toward nascent events. Operating in concert, these modules yield a representation that remains coherent across smooth transitions yet responsive to sudden perturbations. Empirical evaluation across multiple AVC benchmarks demonstrates consistent superiority over established baselines, corroborating that CAST enhances temporal fidelity and, by extension, the reliability of downstream multimodal decisions.
Published as “Coherence-aware and snap-triggered: A novel mechanism for audio-visual cooperative tasks,” Expert Systems with Applications, vol. 313, Article 131559.
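The exponentially decaying weight envelope attributed to the Coherence-Aware module can be sketched as follows; `exponential_memory` and the `decay` value are illustrative assumptions, not the CAST implementation:

```python
import numpy as np

def exponential_memory(frames: np.ndarray, decay: float = 0.8) -> np.ndarray:
    """Aggregate frame features (T, D) into one vector, weighting frame t by
    decay**(T-1-t): recent frames dominate, distant frames fade out."""
    t = frames.shape[0]
    weights = decay ** np.arange(t - 1, -1, -1)   # oldest frame → smallest weight
    weights /= weights.sum()                      # normalised envelope
    return weights @ frames
```

With `decay < 1`, an obsolete disruption several frames back contributes almost nothing to the aggregate, which is the stated goal of the envelope.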
Pub Date: 2026-06-01 | Epub Date: 2026-02-05 | DOI: 10.1016/j.eswa.2026.131469
Jiale Quan, Weijun Hu, Xianlong Ma, Gang Chen
Achieving robust indoor autonomous flight for Unmanned Aerial Vehicles (UAVs) under strict hardware and computational constraints remains a formidable challenge. Conventional solutions relying on high-end sensors or global mapping are often inapplicable to resource-constrained micro-UAVs. In this paper, we propose a mapless integrated navigation framework that achieves stable flight using a low-cost single-line 2D LiDAR. To address the limitations of sparse sensing, we propose a window-neighborhood-based denoising filter and a velocity-estimation-based motion distortion correction module. The system combines a risk-aware local planner with a short-sighted trajectory memory mechanism to navigate through cluttered spaces, and operates in an O(N) loop with sub-millisecond latency. To overcome the local minima inherent in reactive planning, a deadlock escape layer is introduced that formalizes navigation difficulty through trajectory entropy analysis and generates recovery waypoints via a discrete polar-coordinate search. Validation through high-fidelity simulations and real-world experiments shows that the system is capable of collision-free navigation at speeds up to 6 m/s using low-cost sensors. This work provides an efficient solution for deploying intelligent aerial robots in perception-constrained indoor environments.
Published as “Breaking the low-cost barrier: a memory-augmented reactive navigation system for UAVs in cluttered indoor environments,” Expert Systems with Applications, vol. 313, Article 131469.
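A window-neighborhood denoising filter of the kind mentioned can be illustrated as a support test on the raw range array; the window size, tolerance, and keep rule here are our assumptions, not the paper's algorithm:

```python
import numpy as np

def window_denoise(ranges: np.ndarray, win: int = 2, tol: float = 0.3,
                   min_support: int = 1) -> np.ndarray:
    """Boolean keep-mask over a 2D LiDAR scan: a beam survives only if at
    least `min_support` neighbours within +/-`win` indices lie within `tol`
    metres of its own reading, so isolated spikes are rejected."""
    n = len(ranges)
    keep = np.zeros(n, dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - win), min(n, i + win + 1)
        neighbours = np.delete(ranges[lo:hi], i - lo)  # window minus the beam itself
        keep[i] = np.sum(np.abs(neighbours - ranges[i]) < tol) >= min_support
    return keep

scan = np.array([2.0, 2.1, 9.5, 2.2, 2.1])  # one spurious 9.5 m spike
window_denoise(scan)  # → [True, True, False, True, True]
```

The loop touches each beam once with a constant-size window, so the filter fits the O(N) budget the abstract describes.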
Pub Date: 2026-06-01 | Epub Date: 2026-02-06 | DOI: 10.1016/j.eswa.2026.131534
Xiujing Gao, Yongfeng Xie, Fanchao Lin, Chiwang Lin, Hongwu Huang, Ziru Wang
Accurate prediction of significant wave heights is crucial for the safety of marine structures and ships. Traditional models struggle to capture the key frequency and periodic characteristics of wave height data. To address this issue, a novel Ivy Algorithm-Fast Fourier Transform Mogrifier Gated Recurrent Unit (IVYA-FMGRU) model is proposed, which integrates the gated recurrent unit (GRU) with the fast Fourier transform (FFT) and Mogrifier operations. The FFT extracts periodic features, the Mogrifier strengthens the interaction between the GRU and the frequency information, and the Ivy algorithm (IVYA), a bio-inspired optimization method, tunes the model parameters. In addition, random forest (RF) is employed for feature selection. Experimental results show that the IVYA-FMGRU model achieves R² scores of 0.8505, 0.8683, and 0.8910 on datasets 46027, 46083, and 46084, respectively, outperforming other baseline models. Furthermore, error statistical analysis across different wave height intervals confirms the model’s accuracy and stability within each interval, demonstrating its superior performance and generalization capability in wave height prediction.
Published as “IVYA-FMGRU: A frequency-domain context interaction model with bio-inspired optimization for significant wave height prediction,” Expert Systems with Applications, vol. 313, Article 131534.
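The FFT's role — surfacing periodic structure that a recurrent unit struggles to see on its own — can be illustrated by recovering the dominant period of a synthetic wave-height series (a minimal sketch; the model's frequency features are of course richer than a single dominant period):

```python
import numpy as np

def dominant_period(series: np.ndarray, dt: float = 1.0) -> float:
    """Period (in units of dt) of the strongest non-DC component of a 1-D
    series, read off the real FFT magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(series - series.mean()))
    freqs = np.fft.rfftfreq(len(series), d=dt)
    k = spectrum[1:].argmax() + 1      # skip the zero-frequency bin
    return 1.0 / freqs[k]

t = np.arange(240)                               # 240 hourly samples
waves = 1.5 + 0.8 * np.sin(2 * np.pi * t / 12)   # 12-hour swell cycle
dominant_period(waves)                           # ≈ 12 hours
```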
Maritime Autonomous Surface Ships (MASS) require reliable performance evaluation methods to ensure safe and efficient operation, yet existing research and regulations lack an integrated approach for the automatic assessment of maneuvering, path planning, obstacle avoidance, and motion control performance. To fill this gap, this study proposes a comprehensive assessment tool that integrates assessment aspects, test scenarios, key performance indicators (KPIs), and evaluation criteria. Three assessment modules were developed: the Maneuvering Assessment Module (MAM), the Path planning and Obstacle avoidance Assessment Module (POAM), and the Motion Control Assessment Module (MCAM). The applicability of the proposed assessment tool is studied through simulations, including maneuvering tests under three water-depth conditions, a comparative evaluation of ten path-planning algorithms, and an analysis of path-following control performance. The results indicate that maneuvering performance is poorer in shallow water (water depth to draft ratio h/T = 1.4) than in deep (h/T = 10.0) and medium-deep (h/T = 2.0) water. Moreover, the proposed Tuned Fast Marching Square (TFMS) method generates safer and more cost-effective paths than the Fast Marching Method (FMM) and the Fast Marching Square (FMS) method. These findings confirm that the assessment tool can provide quantitative and reproducible evaluations of MASS performance. The developed tool offers a practical platform for both researchers and practitioners, with potential extensions toward environmentally oriented (“green”) performance assessments in future work.
Changyuan Chen, Chuanbo Duan, Yipeng Wang, Guiyang Zhang. “An intelligent approach to maritime autonomous surface ship performance evaluation,” Expert Systems with Applications, vol. 313, Article 131631. Pub Date: 2026-06-01 | DOI: 10.1016/j.eswa.2026.131631.
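As an illustration of the kind of KPI such a module can compute automatically, the sketch below measures path-following deviation as nearest-waypoint distance. This is our simplification, assuming a densely sampled reference path; it is not the MCAM's actual indicator set:

```python
import numpy as np

def path_deviation(track: np.ndarray, path: np.ndarray) -> np.ndarray:
    """Per-sample deviation of a recorded track (N, 2) from a reference
    path given as dense waypoints (M, 2): distance to the nearest waypoint."""
    d = np.linalg.norm(track[:, None, :] - path[None, :, :], axis=-1)
    return d.min(axis=1)

path = np.stack([np.linspace(0.0, 10.0, 101), np.zeros(101)], axis=1)
offset = path + np.array([0.0, 0.5])   # ship sails 0.5 m to port of the path
path_deviation(offset, path).mean()    # → 0.5
```

Summaries such as the mean, maximum, or exceedance rate of this deviation are the sort of quantitative, reproducible figures an automatic assessment tool can report.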
Pub Date: 2026-06-01 | Epub Date: 2026-02-05 | DOI: 10.1016/j.eswa.2026.131396
Yanfeng He, Fangning Hu, Guoxiang Tong
Multimodal medical image segmentation plays a crucial role in disease diagnosis, as different MRI modalities provide complementary structural and lesion information. However, in clinical practice, the absence of certain modalities often leads to a significant decline in segmentation performance, limiting the application of multimodal methods. To address this issue, we propose a multimodal segmentation model called MECS-Net, which combines modality contribution optimization, edge enhancement, and efficient feature fusion. Building on four MRI modalities (Flair, T1ce, T1, T2), we further introduce edge features as auxiliary modalities to enhance the perception of critical structural boundaries. The model incorporates a modality contribution measurement mechanism that quantifies the actual predictive value of each modality at the sample level and resamples low-contribution modalities during training to mitigate the performance degradation caused by missing modalities. The feature fusion module combines multi-head cross-attention with state space modeling (Mamba): the former enhances fine-grained interactions between modalities while the latter models cross-modal global dependencies, synergistically improving semantic alignment and fusion. Extensive experiments on the BraTS 2020 dataset demonstrate that MECS-Net achieves outstanding performance under both complete and incomplete modality conditions. The Dice coefficients for WT (whole tumor) and TC (tumor core) reach 91.8% and 86.4%, respectively, under complete modality conditions, and average 86.7% and 79.1% under incomplete modality conditions.
Published as “Towards robust brain tumor segmentation under modality incompleteness: A contribution-optimized edge-enhanced network,” Expert Systems with Applications, vol. 313, Article 131396.
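The Dice coefficients quoted for WT and TC follow the standard definition for binary masks; the `eps` smoothing term is a common convention we add to avoid division by zero, not something stated in the abstract:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6) -> float:
    """Dice similarity coefficient 2|P∩G| / (|P| + |G|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))

a = np.array([[1, 1], [0, 0]])
dice(a, a)       # perfect overlap → ≈ 1.0
dice(a, 1 - a)   # disjoint masks → ≈ 0.0
```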
Pub Date: 2026-06-01 | Epub Date: 2026-02-05 | DOI: 10.1016/j.eswa.2026.131544
Wenlong Hang, Beijing Wang, Shuang Liang, Qingfeng Zhang, Qiang Wu, Yukun Jin, Qiong Wang, Jing Qin
Semi-supervised learning (SSL) has shown promising performance in medical image segmentation by effectively utilizing extensive unlabeled images. However, inaccurate predictions on unlabeled images can significantly impair the segmentation performance of SSL models. Furthermore, most current SSL methods lack mechanisms to handle cognitive bias, causing the model to overfit easily on inaccurate predictions and making self-correction challenging. In this work, we propose a conflict-aware semi-supervised mutual learning framework (CSSML), which integrates two different subnetworks and selectively utilizes conflicting pseudo-labels for mutual supervision to address these challenges. Specifically, we introduce two subnetworks with different architectures, incorporating a conflict-aware distinct feature learning (CDFL) regularization to avoid the homogenization of the subnetworks while promoting diversified predictions. To handle potentially inaccurate predictions, we introduce a geometry-aware mutual pseudo supervision (GMPS) regularization that assesses the reliability of conflicting pseudo-labels on unlabeled images and selectively uses the more reliable pseudo-labels from one subnetwork to supervise the other. The synergistic learning between the CDFL and GMPS regularizations during training enables each subnetwork to selectively incorporate reliable knowledge from the other, thereby helping the model overcome cognitive bias. Extensive experiments on three public medical image datasets demonstrate that the proposed CSSML achieves an average of 80.65% DSC, 87.83% Precision, and 14.48 mm 95HD using only 20% labeled data, highlighting its superior performance. The code is available at: https://github.com/Mwnic-AI/CSSML.
Published as “Conflict-aware semi-supervised mutual learning for medical image segmentation,” Expert Systems with Applications, vol. 313, Article 131544.
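The idea of arbitrating between two subnetworks' conflicting pseudo-labels can be sketched with a confidence-based rule. This is a deliberate simplification: CSSML's GMPS judges reliability geometrically, whereas the toy rule below falls back on raw softmax confidence:

```python
import numpy as np

def resolve_conflicts(prob_a: np.ndarray, prob_b: np.ndarray) -> np.ndarray:
    """Fuse pseudo-labels from two subnetworks given their (N, C) softmax
    outputs: where the argmax labels conflict, keep the label of whichever
    network is more confident on that sample."""
    lab_a, lab_b = prob_a.argmax(axis=1), prob_b.argmax(axis=1)
    conf_a, conf_b = prob_a.max(axis=1), prob_b.max(axis=1)
    return np.where((lab_a == lab_b) | (conf_a >= conf_b), lab_a, lab_b)

a = np.array([[0.9, 0.1], [0.6, 0.4]])
b = np.array([[0.2, 0.8], [0.55, 0.45]])
resolve_conflicts(a, b)  # → [0, 0]: row 0 conflicts and network A is surer
```

Raw softmax confidence is known to be poorly calibrated, which is precisely why a geometry-aware reliability signal such as GMPS is attractive.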
Pub Date: 2026-06-01 | Epub Date: 2026-02-06 | DOI: 10.1016/j.eswa.2026.131494
Shuaipeng Ding, Jianan Shui, Mingyuan Ge, Mengnan Fan, Xin Li, Yijie Zhu, Mingyong Li
Automated radiology report generation has emerged as a crucial technology for improving clinical workflow efficiency and alleviating the documentation burden on radiologists. Current approaches predominantly employ encoder-decoder architectures; however, they often overemphasize text generation while neglecting two critical issues: inherent biases in the textual data distribution that limit descriptions of abnormal regions, and inadequate cross-modal interaction. To address these challenges, we propose an innovative Image-Tag Adapter (ITAdapter) framework that dynamically balances visual and diagnostic information during decoding, with particular attention to optimizing feature selection for different types of generated words. The framework incorporates two key components: a Retrieval Knowledge Enhancer (RKE), which exploits pre-trained CLIP models’ cross-modal retrieval capability to obtain relevant clinical reports as diagnostic references, and an Image-Tag Adapter (ITA), which intelligently fuses visual information with diagnostic information from disease tags. For model optimization, we combine reinforcement learning with knowledge distillation to enable effective knowledge transfer through iterative training. Extensive experiments on the IU X-ray and MIMIC-CXR benchmark datasets demonstrate our method’s effectiveness in generating more accurate and clinically relevant reports, achieving the highest scores: on IU X-ray, BLEU-1 = 0.536, BLEU-4 = 0.206, and METEOR = 0.220; on MIMIC-CXR, BLEU-1 = 0.411, BLEU-4 = 0.141, and METEOR = 0.152.
Published as “ITAdapter: Image-Tag adapter framework with retrieval knowledge enhancer for radiology report generation,” Expert Systems with Applications, vol. 313, Article 131494.
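The retrieval step behind the RKE — fetching reference reports whose embeddings lie closest to the query image embedding — reduces to top-k cosine-similarity search. A minimal sketch with toy vectors standing in for CLIP embeddings (`retrieve_top_k` is our name, not the paper's API):

```python
import numpy as np

def retrieve_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k corpus embeddings most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return np.argsort(-(c @ q))[:k]          # descending similarity

corpus = np.eye(4)                           # four orthogonal "report" embeddings
query = np.array([0.9, 0.1, 0.0, 0.0])       # "image" embedding near report 0
retrieve_top_k(query, corpus, k=2)           # → [0, 1]
```

At clinical corpus scale, the exhaustive scan would be replaced by an approximate nearest-neighbour index, but the similarity being ranked is the same.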
Tongue imaging serves as a valuable diagnostic modality, particularly in Traditional Chinese Medicine (TCM). The quality of tongue surface segmentation significantly affects the accuracy of tongue image classification and subsequent diagnosis in intelligent tongue diagnosis systems. However, existing research on tongue image segmentation exhibits significant limitations, including sensitivity to lighting and background noise, color similarity with surrounding tissues, and a lack of robust and user-friendly segmentation tools. This paper proposes a tongue image segmentation method (TOM) based on multi-teacher knowledge distillation. By introducing a novel diffusion-based data augmentation method, we markedly improved the generalization ability of the segmentation model while reducing its parameter size. Notably, after a 96.6% reduction in parameter count relative to the largest teacher models, the student model still achieves a segmentation performance of 95.22% mIoU. Furthermore, we packaged and deployed the trained model as an online and offline segmentation tool (available at https://itongue.cn/), allowing TCM practitioners and researchers to use it without any programming experience. We also present a case study on TCM constitution classification using segmented tongue patches. Experimental results demonstrate that training with tongue patches yields higher classification performance and better interpretability than original tongue images. To the best of our knowledge, this is the first open-source and freely available tongue image segmentation tool.
{"title":"TOM: An open-source tongue segmentation method with multi-teacher distillation and task-specific data augmentation","authors":"Jiacheng Xie , Ziyang Zhang , Biplab Poudel , Congyu Guo , Yang Yu , Guanghui An , Xiaoting Tang , Lening Zhao , Chunhui Xu , Dong Xu","doi":"10.1016/j.eswa.2026.131499","DOIUrl":"10.1016/j.eswa.2026.131499","url":null,"abstract":"<div><div>Tongue imaging serves as a valuable diagnostic modality, particularly in Traditional Chinese Medicine (TCM). The quality of tongue surface segmentation significantly affects the accuracy of tongue image classification and subsequent diagnosis in intelligent tongue diagnosis systems. However, existing research on tongue image segmentation exhibits significant limitations, including sensitivity to lighting and background noise, similarity in color with surrounding tissues, and a lack of robust and user-friendly segmentation tools. This paper proposes a <strong>to</strong>ngue image segmentation <strong>m</strong>ethod (TOM) based on multi-teacher knowledge distillation. By introducing a novel diffusion-based data augmentation method, we notably improved the generalization ability of the segmentation model while reducing its parameter size. Notably, after reducing the parameter count by 96.6% compared to the largest teacher models, the student model still achieves an impressive segmentation performance of 95.22% mIoU. Furthermore, we packaged and deployed the trained model as an online and offline segmentation tool (available at <span>https://itongue.cn/</span>), allowing TCM practitioners and researchers to use it without any programming experience. We also present a case study on TCM constitution classification using segmented tongue patches. Experimental results demonstrate that training with tongue patches yields higher classification performance and better interpretability than original tongue images. 
To the best of our knowledge, this is the first open-source and freely available tongue image segmentation tool.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"313 ","pages":"Article 131499"},"PeriodicalIF":7.5,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
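Multi-teacher distillation as described above trains a compact student against both the ground-truth mask and the teachers' soft predictions. A minimal NumPy sketch under the assumption that teacher probability maps are averaged pixel-wise and combined with the supervised term by a fixed weight; the function name and weighting scheme are illustrative, not the paper's implementation:

```python
import numpy as np

def multi_teacher_distill_loss(student_logits, teacher_probs_list, hard_mask, alpha=0.5):
    """Binary-segmentation loss: supervised BCE against the ground-truth mask
    plus a distillation BCE toward the averaged teacher soft masks.
    All arrays have shape (H, W); teachers already output probabilities."""
    eps = 1e-7
    s = 1.0 / (1.0 + np.exp(-student_logits))        # student probabilities
    s = np.clip(s, eps, 1.0 - eps)
    t = np.clip(np.mean(teacher_probs_list, axis=0), eps, 1.0 - eps)  # teacher ensemble
    bce_gt = -(hard_mask * np.log(s) + (1 - hard_mask) * np.log(1 - s)).mean()
    bce_kd = -(t * np.log(s) + (1 - t) * np.log(1 - s)).mean()
    return alpha * bce_gt + (1.0 - alpha) * bce_kd
```

A student that matches both the mask and the teacher consensus drives both terms down; the `alpha` knob trades supervised fidelity against how closely the student mimics the ensemble.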
Pub Date : 2026-06-01Epub Date: 2026-02-08DOI: 10.1016/j.eswa.2026.131547
Wei Wu , Xiaohui Hou , Minggang Gan , Jie Chen
Ensuring the safety and robustness of autonomous vehicles (AVs) in complex and safety-critical driving scenarios remains a fundamental challenge in the advancement of autonomous driving technology. Traditional training methods often exhibit limitations in coping with uncertainty and rare extreme events encountered in real-world driving environments. To address these challenges, this paper proposes an adversarial learning framework guided by the Zone of Proximal Development (ZPD), aiming to enhance the adaptability and robustness of autonomous driving decision-making policies in complex environments. Specifically, the proposed approach embeds ZPD-inspired guidance into adversarial learning to generate safety-critical traffic interactions that are both extreme and learnable. To regulate adversarial behaviors and maintain a balance between challenge and solvability, the framework incorporates structured constraints based on the Ideal Return Ceiling (IRC) and fine-grained collision severity modeling. Furthermore, a Vehicle Potential Threat Level (VPTL) mechanism is employed to adaptively adjust adversarial training difficulty in accordance with the evolving capability of the ego vehicle, thereby facilitating continuous learning and policy adaptation. Experimental results indicate that, compared with representative baseline methods such as SAC and TD3, the proposed approach reduces the Damage Index by approximately 20-40% across a wide range of evaluation settings, while simultaneously lowering collision severity and maintaining task executability. These results suggest that the proposed framework provides a viable approach for improving safety-oriented learning behavior in complex traffic environments.
{"title":"ZPD-guided adversarial learning for safety-critical autonomous driving","authors":"Wei Wu , Xiaohui Hou , Minggang Gan , Jie Chen","doi":"10.1016/j.eswa.2026.131547","DOIUrl":"10.1016/j.eswa.2026.131547","url":null,"abstract":"<div><div>Ensuring the safety and robustness of autonomous vehicles (AVs) in complex and safety–critical driving scenarios remains a fundamental challenge in the advancement of autonomous driving technology. Traditional training methods often exhibit limitations in coping with uncertainty and rare extreme events encountered in real-world driving environments. To address these challenges, this paper proposes an adversarial learning framework guided by the Zone of Proximal Development (ZPD), aiming to enhance the adaptability and robustness of autonomous driving decision-making policies in complex environments. Specifically, the proposed approach embeds ZPD-inspired guidance into adversarial learning to generate safety–critical traffic interactions that are both extreme and learnable. To regulate adversarial behaviors and maintain a balance between challenge and solvability, the framework incorporates structured constraints based on the Ideal Return Ceiling (IRC) and fine-grained collision severity modeling. Furthermore, a Vehicle Potential Threat Level (VPTL) mechanism is employed to adaptively adjust adversarial training difficulty in accordance with the evolving capability of the ego vehicle, thereby facilitating continuous learning and policy adaptation. Experimental results indicate that, compared with representative baseline methods such as SAC and TD3, the proposed approach reduces the Damage Index by approximately 20–40% across a wide range of evaluation settings, while simultaneously lowering collision severity and maintaining task executability. 
These results suggest that the proposed framework provides a viable approach for improving safety-oriented learning behavior in complex traffic environments.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"313 ","pages":"Article 131547"},"PeriodicalIF":7.5,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
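The ZPD/IRC idea above, keeping adversarial difficulty just inside the ego agent's current capability, can be sketched as a simple feedback rule. All names, thresholds, and step sizes below are hypothetical illustrations, not the paper's VPTL mechanism:

```python
def update_adversary_difficulty(difficulty, recent_returns, ideal_return_ceiling,
                                target_ratio=0.7, slack=0.2, step=0.05):
    """Adjust adversarial difficulty (in [0, 1]) from the ego agent's recent
    returns relative to an ideal return ceiling (IRC-style reference):
    raise difficulty when the ego is coping, lower it when overwhelmed."""
    ratio = sum(recent_returns) / (len(recent_returns) * ideal_return_ceiling)
    if ratio > target_ratio:
        difficulty = min(1.0, difficulty + step)   # ego is coping: harden adversary
    elif ratio < target_ratio - slack:
        difficulty = max(0.0, difficulty - step)   # ego is overwhelmed: back off
    return difficulty                              # inside the band: hold steady
```

The band between `target_ratio - slack` and `target_ratio` plays the role of the zone of proximal development: scenarios remain challenging but solvable, rather than trivially easy or unlearnably extreme.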