
Displays: Latest Publications

Hybrid detection model for unauthorized use of doctor’s code in health insurance: Integrating rule-based screening and LLM reasoning
IF 3.4 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-28 | DOI: 10.1016/j.displa.2026.103359
Qiwen Yuan , Jiajie Chen , Zhendong Shi
Unauthorized use of doctor’s code is a high-risk and context-dependent issue in health insurance supervision. Traditional rule-based screening achieves high recall but often produces false positives in cases that appear anomalous yet are clinically legitimate, such as telemedicine encounters, refund-related re-settlements, and rapid outpatient–emergency transitions. These methods lack semantic understanding of medical context and rely heavily on manual auditing. We propose a hybrid detection framework that integrates rule-based temporal filtering with large language model (LLM)–based semantic reasoning. Time-threshold rules are first applied to extract suspected cases from real health-insurance claim data. Expert-derived legitimate scenario patterns are then embedded into structured prompts to guide the LLM in semantic plausibility assessment and false-positive reduction. For evaluation, we construct a 240-pair multi-scenario benchmark dataset from de-identified real claim records, covering both reasonable and suspicious situations. Zero-shot experiments with DeepSeek-R1-7B show that the framework achieves 75% accuracy and 87% precision in distinguishing reasonable from unauthorized cases. These results indicate that the proposed method can effectively reduce false alarms and alleviate manual audit workload, providing a practical and efficient solution for real-world health-insurance supervision.
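As a rough illustration of the two-stage pipeline, the sketch below applies a time-threshold rule to flag claims settled under the same doctor code at implausibly short intervals, then builds a structured prompt that embeds the legitimate-scenario patterns named in the abstract. The schema fields (doctor_code, settle_time, dept), the five-minute threshold, and the prompt wording are hypothetical stand-ins, not the authors' actual rules or prompts:

```python
# Hypothetical two-stage screen: rule-based temporal filter, then an LLM prompt.
from datetime import datetime, timedelta

claims = [
    {"claim_id": 1, "doctor_code": "D01", "settle_time": datetime(2025, 3, 1, 9, 0), "dept": "outpatient"},
    {"claim_id": 2, "doctor_code": "D01", "settle_time": datetime(2025, 3, 1, 9, 2), "dept": "emergency"},
]

def rule_screen(claims, threshold=timedelta(minutes=5)):
    """Flag consecutive settlements by the same doctor code within the threshold."""
    suspects, last_seen = [], {}
    for c in sorted(claims, key=lambda c: (c["doctor_code"], c["settle_time"])):
        prev = last_seen.get(c["doctor_code"])
        if prev and c["settle_time"] - prev["settle_time"] <= threshold:
            suspects.append((prev, c))
        last_seen[c["doctor_code"]] = c
    return suspects

def build_prompt(pair):
    """Embed expert-derived legitimate-scenario patterns into a structured prompt."""
    a, b = pair
    gap_min = int((b["settle_time"] - a["settle_time"]).total_seconds() // 60)
    return (
        "You audit health-insurance claims for unauthorized use of a doctor's code.\n"
        "Known legitimate patterns: telemedicine encounters, refund-related "
        "re-settlements, rapid outpatient-emergency transitions.\n"
        f"Case: doctor {a['doctor_code']} settled claim {a['claim_id']} ({a['dept']}) "
        f"and claim {b['claim_id']} ({b['dept']}) {gap_min} minutes apart.\n"
        "Answer REASONABLE or SUSPICIOUS with a one-sentence justification."
    )

for pair in rule_screen(claims):
    print(build_prompt(pair))  # this prompt would go to an LLM such as DeepSeek-R1-7B
```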
{"title":"Hybrid detection model for unauthorized use of doctor’s code in health insurance: Integrating rule-based screening and LLM reasoning","authors":"Qiwen Yuan ,&nbsp;Jiajie Chen ,&nbsp;Zhendong Shi","doi":"10.1016/j.displa.2026.103359","DOIUrl":"10.1016/j.displa.2026.103359","url":null,"abstract":"<div><div>Unauthorized use of doctor’s code is a high-risk and context-dependent issue in health insurance supervision. Traditional rule-based screening achieves high recall but often produces false positives in cases that appear anomalous yet are clinically legitimate, such as telemedicine encounters, refund-related re-settlements, and rapid outpatient–emergency transitions. These methods lack semantic understanding of medical context and rely heavily on manual auditing. We propose a hybrid detection framework that integrates rule-based temporal filtering with large language model (LLM)–based semantic reasoning. Time-threshold rules are first applied to extract suspected cases from real health-insurance claim data. Expert-derived legitimate scenario patterns are then embedded into structured prompts to guide the LLM in semantic plausibility assessment and false-positive reduction. For evaluation, we construct a 240-pair multi-scenario benchmark dataset from de-identified real claim records, covering both reasonable and suspicious situations. Zero-shot experiments with DeepSeek-R1-7B show that the framework achieves 75% accuracy and 87% precision in distinguishing reasonable from unauthorized cases. These results indicate that the proposed method can effectively reduce false alarms and alleviate manual audit workload, providing a practical and efficient solution for real-world health-insurance supervision.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103359"},"PeriodicalIF":3.4,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146070883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Content-adaptive dual feature selection for infrared aerial video compressive sensing reconstruction
IF 3.4 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-27 | DOI: 10.1016/j.displa.2026.103368
Hao Liu , Maoji Qiu , Rong Huang
For block compressive sensing (BCS) of natural videos, existing reconstruction algorithms typically utilize nonlocal self-similarity (NSS) to generate sparse residuals, thereby achieving favorable recovery performance by exploiting the statistical characteristics of key frames and non-key frames. However, when applied to multi-perspective infrared aerial videos rather than natural videos, these reconstruction algorithms usually result in poor recovery quality because of the inflexibility in selecting similar patches and poor adaptability to dynamic scene changes. Due to the distribution property of infrared aerial imagery, inter-frame and intra-frame similar patches should be selected adaptively so that an accurate dictionary matrix can be learned. Therefore, this paper proposes a content-adaptive dual feature selection mechanism. It first conducts a rough screening of inter-frame and intra-frame similar patches based on the correlation of observed measurement vectors across frames. Then, it is followed by a fine screening stage, where principal component analysis (PCA) is applied to project the similar patch-group matrix into a low-dimensional space. Finally, the split Bregman iteration (SBI) is employed to solve the BCS reconstruction for infrared aerial video. Experimental results on both HIT-UAV and M200-XT2DroneVehicle datasets demonstrate that the proposed algorithm achieves better recovery quality compared to state-of-the-art algorithms.
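A minimal sketch of the dual screening idea, under assumed shapes: rough screening ranks candidate patches by the correlation of their observed BCS measurement vectors, and fine screening projects the surviving patch group through PCA into a low-dimensional space. The measurement matrix, patch pool, and component count are synthetic illustrations, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = rng.standard_normal((32, 256))        # BCS measurement matrix (m << n)
patches = rng.standard_normal((500, 256))   # candidate inter/intra-frame patches
ref = patches[0]                            # reference patch to match

# Rough screening: correlate observed measurement vectors instead of full patches.
y = patches @ phi.T                         # measurements, shape (500, 32)
y_ref = ref @ phi.T
corr = (y @ y_ref) / (np.linalg.norm(y, axis=1) * np.linalg.norm(y_ref) + 1e-8)
rough_idx = np.argsort(corr)[::-1][:64]     # keep the 64 most correlated patches

# Fine screening: PCA projects the similar patch group into a low-dim space.
group = patches[rough_idx]
group_c = group - group.mean(axis=0)
_, _, vt = np.linalg.svd(group_c, full_matrices=False)
low_dim = group_c @ vt[:8].T                # first 8 principal components
print(low_dim.shape)                        # (64, 8): input to dictionary learning / SBI
```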
{"title":"Content-adaptive dual feature selection for infrared aerial video compressive sensing reconstruction","authors":"Hao Liu ,&nbsp;Maoji Qiu ,&nbsp;Rong Huang","doi":"10.1016/j.displa.2026.103368","DOIUrl":"10.1016/j.displa.2026.103368","url":null,"abstract":"<div><div>For block compressive sensing (BCS) of natural videos, existing reconstruction algorithms typically utilize nonlocal self-similarity (NSS) to generate sparse residuals, thereby achieving favorable recovery performance by exploiting the statistical characteristics of key frames and non-key frames. However, when applied to multi-perspective infrared aerial videos rather than natural videos, these reconstruction algorithms usually result in poor recovery quality because of the inflexibility in selecting similar patches and poor adaptability to dynamic scene changes. Due to the distribution property of infrared aerial imagery, inter-frame and intra-frame similar patches should be selected adaptively so that an accurate dictionary matrix can be learned. Therefore, this paper proposes a content-adaptive dual feature selection mechanism. It first conducts a rough screening of inter-frame and intra-frame similar patches based on the correlation of observed measurement vectors across frames. Then, it is followed by a fine screening stage, where principal component analysis (PCA) is applied to project the similar patch-group matrix into a low-dimensional space. Finally, the split Bregman iteration (SBI) is employed to solve the BCS reconstruction for infrared aerial video. Experimental results on both HIT-UAV and M200-XT2DroneVehicle datasets demonstrate that the proposed algorithm achieves better recovery quality compared to state-of-the-art algorithms.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103368"},"PeriodicalIF":3.4,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146070880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Trans-MT: a 3D semi-supervised glioma segmentation model integrating transformer architecture and asymmetric data augmentation
IF 3.4 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-27 | DOI: 10.1016/j.displa.2026.103365
Yuehui Liao , Yun Zheng , Yingjie Jiao , Na Tang , Yuhao Wang , Yu Hu , Yaning Feng , Ruofan Wang , Qun Jin , Xiaobo Lai , Panfei Li
Accurate glioma segmentation in magnetic resonance imaging (MRI) is crucial for effective diagnosis and treatment planning in neuro-oncology; however, this process is often time-consuming and heavily reliant on expert annotations. To address these limitations, we present Trans-MT, a 3D semi-supervised segmentation model that integrates a transformer-based architecture with asymmetric data augmentation, achieving high segmentation accuracy with limited labeled data. Trans-MT employs a teacher-student framework: the teacher model generates reliable pseudo-labels for unlabeled data, while the student model learns through supervised and consistency losses, guided by an uncertainty-aware mechanism to refine its predictions. The architecture of Trans-MT features a hybrid encoder, nnUFormer, which combines the robust capabilities of nn-UNet with transformers, enabling it to capture global contextual information essential for accurate tumor segmentation. This design enhances the model’s ability to detect intricate tumor structures within MRI scans, even with sparse annotations. Additionally, the model’s learning process is strengthened by asymmetric data augmentation, which enriches data diversity and robustness. We evaluated Trans-MT on the BraTS 2019, 2020, and 2021 datasets, where it demonstrated superior performance over several state-of-the-art semi-supervised models, particularly in segmenting challenging tumor sub-regions. The results confirm that Trans-MT significantly improves segmentation precision, making it a valuable advancement in brain tumor segmentation methodology and a practical solution for clinical settings with limited labeled data. Our code is available at https://github.com/smallboy-code/TransMT.
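The teacher-student mechanism can be sketched as follows: an EMA teacher pseudo-labels unlabeled volumes, an entropy-based uncertainty mask down-weights unreliable voxels, and the student minimizes a supervised loss plus a consistency loss. The tiny Conv3d network and the noise-only asymmetric augmentation are placeholders for the paper's nnUFormer encoder and augmentation pipeline, and the thresholds are arbitrary:

```python
import torch
import torch.nn.functional as F

def make_net():
    return torch.nn.Sequential(torch.nn.Conv3d(1, 8, 3, padding=1),
                               torch.nn.ReLU(),
                               torch.nn.Conv3d(8, 2, 1))

student, teacher = make_net(), make_net()
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

labeled = torch.randn(2, 1, 16, 16, 16)
labels = torch.randint(0, 2, (2, 16, 16, 16))
unlabeled = torch.randn(2, 1, 16, 16, 16)

sup = F.cross_entropy(student(labeled), labels)            # supervised loss

with torch.no_grad():
    t_prob = F.softmax(teacher(unlabeled), dim=1)
    entropy = -(t_prob * t_prob.clamp_min(1e-8).log()).sum(1)  # voxel uncertainty
    weight = (entropy < 0.5).float()                       # trust only confident voxels
# asymmetric augmentation: perturb only the student's input
s_prob = F.softmax(student(unlabeled + 0.1 * torch.randn_like(unlabeled)), dim=1)
cons = (weight * ((s_prob - t_prob) ** 2).sum(1)).mean()   # uncertainty-aware consistency

(sup + 0.1 * cons).backward()
opt.step()
with torch.no_grad():                                      # EMA update of the teacher
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.mul_(0.99).add_(0.01 * sp)
```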
{"title":"Trans-MT: a 3D semi-supervised glioma segmentation model integrating transformer architecture and asymmetric data augmentation","authors":"Yuehui Liao ,&nbsp;Yun Zheng ,&nbsp;Yingjie Jiao ,&nbsp;Na Tang ,&nbsp;Yuhao Wang ,&nbsp;Yu Hu ,&nbsp;Yaning Feng ,&nbsp;Ruofan Wang ,&nbsp;Qun Jin ,&nbsp;Xiaobo Lai ,&nbsp;Panfei Li","doi":"10.1016/j.displa.2026.103365","DOIUrl":"10.1016/j.displa.2026.103365","url":null,"abstract":"<div><div>Accurate glioma segmentation in magnetic resonance imaging (MRI) is crucial for effective diagnosis and treatment planning in neuro-oncology; however, this process is often time-consuming and heavily reliant on expert annotations. To address these limitations, we present Trans-MT, a 3D semi-supervised segmentation model that integrates a transformer-based architecture with asymmetric data augmentation, achieving high segmentation accuracy with limited labeled data. Trans-MT employs a teacher-student framework: the teacher model generates reliable pseudo-labels for unlabeled data, while the student model learns through supervised and consistency losses, guided by an uncertainty-aware mechanism to refine its predictions. The architecture of Trans-MT features a hybrid encoder, nnUFormer, which combines the robust capabilities of nn-UNet with transformers, enabling it to capture global contextual information essential for accurate tumor segmentation. This design enhances the model’s ability to detect intricate tumor structures within MRI scans, even with sparse annotations. Additionally, the model’s learning process is strengthened by asymmetric data augmentation, which enriches data diversity and robustness. We evaluated Trans-MT on the BraTS 2019, 2020, and 2021 datasets, where it demonstrated superior performance over several state-of-the-art semi-supervised models, particularly in segmenting challenging tumor sub-regions. The results confirm that Trans-MT significantly improves segmentation precision, making it a valuable advancement in brain tumor segmentation methodology and a practical solution for clinical settings with limited labeled data. Our code is available<!--> <!-->at <span><span>https://github.com/smallboy-code/TransMT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103365"},"PeriodicalIF":3.4,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146090217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning video normality for anomaly detection via multi-scale spatiotemporal feature extraction and a feature memory module
IF 3.4 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-22 | DOI: 10.1016/j.displa.2026.103355
Yongqing Huo, Wenke Jiang
Video anomaly detection (VAD) is critical for automated identification of anomalous behaviors in surveillance systems, with applications in public safety, intelligent transportation, and healthcare. However, with the continuous expansion of application domains, ensuring that VAD algorithms maintain excellent detection performance across diverse scenarios has become the primary focus of current research. To enhance the robustness of detection across various environments, we propose a novel autoencoder-based model in this paper. Compared with other algorithms, our method can more effectively exploit multi-scale feature information within frames for learning feature distribution. In the encoder, we construct the convolutional module with multiple kernel sizes and incorporate the designed Spatial-Channel Transformer Attention (SCTA) module to strengthen the feature representation. In the decoder, we integrate the multi-scale feature reconstruction module with Self-Supervised Predictive Convolutional Attentive Blocks (SSPCAB) for more accurate next-frame prediction. Moreover, we introduce a dedicated memory module to capture and store the distribution of normal data patterns. Meanwhile, the architecture employs the Conv-LSTM and a specially designed Temporal-Spatial Attention (TSA) module in skip connections to capture spatiotemporal dependencies across video frames. Benefiting from the design and integration of those modules, our proposed method achieves superior detection performance on public datasets, including UCSD Ped2, CUHK Avenue and ShanghaiTech. The experimental results demonstrate the effectiveness and versatility of our method in anomaly detection tasks.
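A minimal sketch of a feature memory module in this spirit (close to the classic MemAE pattern): encoder features address a bank of learned normal prototypes by cosine similarity, and the read-out, a convex combination of prototypes, replaces the raw feature before decoding, so anomalous inputs reconstruct poorly. The slot count and feature dimension are arbitrary, and this is not the paper's exact module:

```python
import torch
import torch.nn.functional as F

class FeatureMemory(torch.nn.Module):
    def __init__(self, num_slots=50, dim=128):
        super().__init__()
        self.slots = torch.nn.Parameter(torch.randn(num_slots, dim))  # normal prototypes

    def forward(self, z):  # z: (batch, dim) encoder features
        # cosine-similarity addressing over the memory slots
        attn = F.softmax(F.normalize(z, dim=1) @ F.normalize(self.slots, dim=1).t(), dim=1)
        return attn @ self.slots  # read-out: convex combination of normal prototypes

mem = FeatureMemory()
z = torch.randn(4, 128)
z_hat = mem(z)
print(z_hat.shape)  # (4, 128); a decoder would predict the next frame from z_hat
```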
{"title":"Learning video normality for anomaly detection via multi-scale spatiotemporal feature extraction and a feature memory module","authors":"Yongqing Huo,&nbsp;Wenke Jiang","doi":"10.1016/j.displa.2026.103355","DOIUrl":"10.1016/j.displa.2026.103355","url":null,"abstract":"<div><div>Video anomaly detection (VAD) is critical for automated identification of anomalous behaviors in surveillance system, with applications in public safety, intelligent transportation and healthcare. However, with the continuous expansion of application domains, ensuring that VAD algorithm maintains excellent detection performance across diverse scenarios has become the primary focus of current research direction. To enhance the robustness of detection across various environments, we propose a novel autoencoder-based model in this paper. Compared with other algorithms, our method can more effectively exploit multi-scale feature information within frames for learning feature distribution. In the encoder, we construct the convolutional module with multiple kernel sizes and incorporate the designed Spatial-Channel Transformer Attention (SCTA) module to strengthen the feature representation. In the decoder, we integrate the multi-scale feature reconstruction module with Self-Supervised Predictive Convolutional Attentive Blocks (SSPCAB) for more accurate next-frame prediction. Moreover, we introduce a dedicated memory module to capture and store the distribution of normal data patterns. Meanwhile, the architecture employs the Conv-LSTM and a specially designed Temporal-Spatial Attention (TSA) module in skip connections to capture spatiotemporal dependencies across video frames. Benefiting from the design and integration of those modules, our proposed method achieves superior detection performance on public datasets, including UCSD Ped2, CUHK Avenue and ShanghaiTech. The experimental results demonstrate the effectiveness and versatility of our method in anomaly detection tasks.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103355"},"PeriodicalIF":3.4,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Self-attention-based mixture-of-experts framework for non-invasive prediction of MGMT promoter methylation in glioblastoma using multi-modal MRI
IF 3.4 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-21 | DOI: 10.1016/j.displa.2026.103358
Yuehui Liao , Yun Zheng , Jingyu Zhu , Yu Chen , Feng Gao , Yaning Feng , Weiji Yang , Guang Yang , Xiaobo Lai , Panfei Li
Glioblastoma (GBM) is an aggressive brain tumor associated with poor prognosis and limited treatment options. The methylation status of the O6-methylguanine-DNA methyltransferase (MGMT) promoter is a critical biomarker for predicting the efficacy of temozolomide chemotherapy in GBM patients. However, current methods for determining MGMT promoter methylation, including invasive and costly techniques, hinder their widespread clinical application. In this study, we propose a novel non-invasive deep learning framework based on a Mixture-of-Experts (MoE) architecture for predicting MGMT promoter methylation status using multi-modal magnetic resonance imaging (MRI) data. Our MoE model incorporates modality-specific expert networks built on the ResNet18 architecture, with a self-attention-based gating mechanism that dynamically selects and integrates the most relevant features across MRI modalities (T1-weighted, contrast-enhanced T1, T2-weighted, and fluid-attenuated inversion recovery). We evaluate the proposed framework on the BraTS2021 and TCGA-GBM datasets, showing superior performance compared to conventional deep learning models in terms of accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Furthermore, Grad-CAM visualizations provide enhanced interpretability by highlighting biologically relevant regions in the tumor and peritumoral areas that influence model predictions. The proposed framework represents a promising tool for integrating imaging biomarkers into precision oncology workflows, offering a scalable, cost-effective, and interpretable solution for non-invasive MGMT methylation prediction in GBM.
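The gating mechanism might look like the following sketch: per-modality expert embeddings (linear stubs standing in for the paper's ResNet18 experts) attend to one another via self-attention, and softmax gate weights fuse the modality tokens before a classifier head. All dimensions and the stub experts are illustrative assumptions:

```python
import torch

n_mod, dim = 4, 64                               # T1, T1ce, T2, FLAIR
experts = torch.nn.ModuleList(torch.nn.Linear(256, dim) for _ in range(n_mod))
attn = torch.nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
gate = torch.nn.Linear(dim, 1)
head = torch.nn.Linear(dim, 2)                   # methylated vs. unmethylated

feats = [torch.randn(8, 256) for _ in range(n_mod)]                  # per-modality features
tokens = torch.stack([e(f) for e, f in zip(experts, feats)], dim=1)  # (8, 4, dim)
ctx, _ = attn(tokens, tokens, tokens)            # modality tokens attend to each other
w = torch.softmax(gate(ctx).squeeze(-1), dim=1)  # (8, 4) gating weights
fused = (w.unsqueeze(-1) * ctx).sum(dim=1)       # weighted fusion across modalities
logits = head(fused)
print(logits.shape)                              # (8, 2)
```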
{"title":"Self-attention-based mixture-of-experts framework for non-invasive prediction of MGMT promoter methylation in glioblastoma using multi-modal MRI","authors":"Yuehui Liao ,&nbsp;Yun Zheng ,&nbsp;Jingyu Zhu ,&nbsp;Yu Chen ,&nbsp;Feng Gao ,&nbsp;Yaning Feng ,&nbsp;Weiji Yang ,&nbsp;Guang Yang ,&nbsp;Xiaobo Lai ,&nbsp;Panfei Li","doi":"10.1016/j.displa.2026.103358","DOIUrl":"10.1016/j.displa.2026.103358","url":null,"abstract":"<div><div>Glioblastoma (GBM) is an aggressive brain tumor associated with poor prognosis and limited treatment options. The methylation status of the O6-methylguanine-DNA methyltransferase (MGMT) promoter is a critical biomarker for predicting the efficacy of temozolomide chemotherapy in GBM patients. However, current methods for determining MGMT promoter methylation, including invasive and costly techniques, hinder their widespread clinical application. In this study, we propose a novel non-invasive deep learning framework based on a Mixture-of-Experts (MoE) architecture for predicting MGMT promoter methylation status using multi-modal magnetic resonance imaging (MRI) data. Our MoE model incorporates modality-specific expert networks built on the ResNet18 architecture, with a self-attention-based gating mechanism that dynamically selects and integrates the most relevant features across MRI modalities (T1-weighted, contrast-enhanced T1, T2-weighted, and fluid-attenuated inversion recovery). We evaluate the proposed framework on the BraTS2021 and TCGA-GBM datasets, showing superior performance compared to conventional deep learning models in terms of accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Furthermore, Grad-CAM visualizations provide enhanced interpretability by highlighting biologically relevant regions in the tumor and peritumoral areas that influence model predictions. The proposed framework represents a promising tool for integrating imaging biomarkers into precision oncology workflows, offering a scalable, cost-effective, and interpretable solution for non-invasive MGMT methylation prediction in GBM.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103358"},"PeriodicalIF":3.4,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Harnessing differentiable geometry and orientation attention for semi-supervised vessel segmentation with limited annotations
IF 3.4 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-21 | DOI: 10.1016/j.displa.2026.103347
Yan Liu , Yan Yang , Yongquan Jiang , Xiaole Zhao , Liang Fan
The precise segmentation of vascular structures is vital for diagnosing retinal and coronary artery diseases. However, the complex morphology and large structural variability of blood vessels make manual annotation time-consuming and scarce, which in turn limits the scalability of supervised segmentation methods. We propose a semi-supervised segmentation framework named geometric orientational fusion attention network (GOFA-Net) that integrates differentiable geometric augmentation and orientation-aware attention to effectively leverage knowledge from limited annotations. GOFA-Net comprises three key complementary components: 1) a differentiable geometric augmentation strategy (DGAS) employs quaternion-based representations to diversify training samples while preserving prediction consistency between teacher and student models; 2) a multi-view fusion module (MVFM) orchestrates collaborative feature learning between quaternion and conventional convolutional streams to capture comprehensive spatial dependencies; and 3) a global orientational attention module (GOAM) enhances structural awareness through direction-sensitive geometric embeddings, specifically reinforcing the perception of vascular topology along horizontal and vertical orientations. Extensive validation on multiple retinal vessel datasets (DRIVE, STARE, CHASE_DB1, and HRF) and coronary angiography datasets (DCA1 and CHUAC) show that GOFA-Net consistently outperforms state-of-the-art semi-supervised methods, achieving particularly notable gains in scenarios with limited annotations.
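A generic strip-attention sketch of direction-sensitive weighting: features are pooled along horizontal and vertical strips, the two orientations the abstract highlights for vascular topology, and recombined into a gating map. This is a common pattern assumed for illustration, not the exact GOAM design:

```python
import torch

class OrientationAttention(torch.nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv_h = torch.nn.Conv2d(ch, ch, 1)
        self.conv_w = torch.nn.Conv2d(ch, ch, 1)

    def forward(self, x):                   # x: (B, C, H, W)
        h = x.mean(dim=3, keepdim=True)     # pool along width  -> (B, C, H, 1)
        w = x.mean(dim=2, keepdim=True)     # pool along height -> (B, C, 1, W)
        # the two strip descriptors broadcast back to (B, C, H, W)
        gate = torch.sigmoid(self.conv_h(h) + self.conv_w(w))
        return x * gate

m = OrientationAttention(16)
print(m(torch.randn(2, 16, 64, 64)).shape)  # (2, 16, 64, 64)
```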
{"title":"Harnessing differentiable geometry and orientation attention for semi-supervised vessel segmentation with limited annotations","authors":"Yan Liu ,&nbsp;Yan Yang ,&nbsp;Yongquan Jiang ,&nbsp;Xiaole Zhao ,&nbsp;Liang Fan","doi":"10.1016/j.displa.2026.103347","DOIUrl":"10.1016/j.displa.2026.103347","url":null,"abstract":"<div><div>The precise segmentation of vascular structures is vital for diagnosing retinal and coronary artery diseases. However, the complex morphology and large structural variability of blood vessels make manual annotation time-consuming and finite which in turn limits the scalability of supervised segmentation methods. We propose a semi-supervised segmentation framework named geometric orientational fusion attention network (GOFA-Net) that integrates differentiable geometric augmentation and orientation-aware attention to effectively leverage knowledge from limited annotations. GOFA-Net comprises three key complementary components: 1) a differentiable geometric augmentation strategy (DGAS) employs quaternion-based representations to diversify training samples while preserving prediction consistency between teacher and student models; 2) a multi-view fusion module (MVFM) orchestrates collaborative feature learning between quaternion and conventional convolutional streams to capture comprehensive spatial dependencies; and 3) a global orientational attention module (GOAM) enhances structural awareness through direction-sensitive geometric embeddings, specifically reinforcing the perception of vascular topology along horizontal and vertical orientations. Extensive validation on multiple retinal vessel datasets (DRIVE, STARE, CHASE_DB1, and HRF) and coronary angiography datasets (DCA1 and CHUAC) show that GOFA-Net consistently outperforms state-of-the-art semi-supervised methods, achieving particularly notable gains in scenarios with limited annotations.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103347"},"PeriodicalIF":3.4,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
RAP-SORT: Advanced Multi-Object Tracking for complex scenarios
IF 3.4 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-21 | DOI: 10.1016/j.displa.2026.103361
Shuming Zhang , Yuhang Zhu , Yanhui Sun , Weiyong Liu , Zhangjin Huang
Multi-Object Tracking (MOT) aims to detect and associate objects across frames while maintaining consistent IDs. While some approaches leverage both strong and weak cues alongside camera compensation to improve association, they struggle in scenarios involving high object density or nonlinear motion. To address these challenges, we propose RAP-SORT, a novel MOT framework that introduces four key innovations. First, the Robust Tracklet Confidence Modeling (RTCM) module models trajectory confidence by smoothing updates and applying second-order difference adjustments for low-confidence cases. Second, the Advanced Observation-Centric Recovery (AOCR) module facilitates trajectory recovery via linear interpolation and backtracking. Third, the Pseudo-Depth IoU (PDIoU) metric integrates height and depth cues into IoU calculations for enhanced spatial awareness. Finally, the Window Denoising (WD) module is tailored for the DanceTrack dataset, effectively mitigating the creation of new tracks caused by misdetections. RAP-SORT sets a new state-of-the-art on the DanceTrack and MOT20 benchmarks, achieving HOTA scores of 66.7 and 64.2, surpassing the previous best by 1.0 and 0.3, respectively, while also delivering competitive performance on MOT17. Code and models will be available soon at https://github.com/levi5611/RAP-SORT.
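One plausible reading of folding a depth cue into IoU, sketched below with a common proxy: pseudo-depth is taken from a box's normalized bottom edge (objects lower in the frame tend to be closer to the camera), and a depth-similarity term scales the planar IoU. Both the proxy and the blending are assumptions; the paper's PDIoU, which also uses height cues, may be defined differently:

```python
def iou(a, b):
    """Standard IoU for boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def pdiou(a, b, img_h=1080.0):
    """IoU scaled by agreement of pseudo-depths (normalized bottom-edge y)."""
    depth_a, depth_b = a[3] / img_h, b[3] / img_h
    depth_sim = 1.0 - abs(depth_a - depth_b)
    return iou(a, b) * depth_sim

print(pdiou((100, 200, 180, 400), (110, 205, 185, 410)))  # high: overlap + similar depth
print(pdiou((100, 200, 180, 400), (100, 600, 180, 800)))  # no overlap -> 0.0
```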
{"title":"RAP-SORT: Advanced Multi-Object Tracking for complex scenarios","authors":"Shuming Zhang ,&nbsp;Yuhang Zhu ,&nbsp;Yanhui Sun ,&nbsp;Weiyong Liu ,&nbsp;Zhangjin Huang","doi":"10.1016/j.displa.2026.103361","DOIUrl":"10.1016/j.displa.2026.103361","url":null,"abstract":"<div><div>Multi-Object Tracking (MOT) aims to detect and associate objects across frames while maintaining consistent IDs. While some approaches leverage both strong and weak cues alongside camera compensation to improve association, they struggle in scenarios involving high object density or nonlinear motion. To address these challenges, we propose RAP-SORT, a novel MOT framework that introduces four key innovations. First, the Robust Tracklet Confidence Modeling (RTCM) module models trajectory confidence by smoothing updates and applying second-order difference adjustments for low-confidence cases. Second, the Advanced Observation-Centric Recovery (AOCR) module facilitates trajectory recovery via linear interpolation and backtracking. Third, the Pseudo-Depth IoU (PDIoU) metric integrates height and depth cues into IoU calculations for enhanced spatial awareness. Finally, the Window Denoising (WD) module is tailored for the DanceTrack dataset, effectively mitigating the creation of new tracks caused by misdetections. RAP-SORT sets a new state-of-the-art on the DanceTrack and MOT20 benchmarks, achieving HOTA scores of 66.7 and 64.2, surpassing the previous best by 1.0 and 0.3, respectively, while also delivering competitive performance on MOT17. Code and models will be available soon at <span><span>https://github.com/levi5611/RAP-SORT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103361"},"PeriodicalIF":3.4,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Ctp2Fic: From coarse-grained token pruning to fine-grained token clustering for LVLM inference acceleration
IF 3.4 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-21 | DOI: 10.1016/j.displa.2026.103360
Yulong Lei , Zishuo Wang , Jinglin Xu , Yuxin Peng
Large Vision–Language Models (LVLMs) excel in multimodal tasks, but their high computational cost, driven by the large number of image tokens, severely limits inference efficiency. While existing training-free methods reduce token counts to accelerate inference, they often struggle to preserve model performance. This trade-off between efficiency and accuracy poses the key challenge in accelerating LVLM inference without retraining. In this paper, we analyze the rank of attention matrices across layers and discover that image token redundancy peaks in two specific VLM layers: many tokens convey nearly identical information, yet still participate in subsequent computations. Leveraging this insight, we propose Ctp2Fic, a new two-stage coarse-to-fine token compression framework. Specifically, in the Coarse-grained Text-guided Pruning stage, we dynamically assign a weight to each visual token based on its semantic relevance to the input instruction and prune low-weight tokens that are unrelated to the task. During the Fine-grained Image-based Clustering stage, we apply a lightweight clustering algorithm to merge semantically similar tokens into compact, representative ones, thus further reducing the sequence length. Our framework requires no model fine-tuning and seamlessly integrates into existing LVLM inference pipelines. Extensive experiments demonstrate that Ctp2Fic outperforms state-of-the-art acceleration techniques in both inference speed and accuracy, achieving superior efficiency and performance without retraining.
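The coarse-to-fine idea can be sketched as follows: prune visual tokens by cosine relevance to the mean instruction embedding, then merge the survivors with a few k-means iterations into representative tokens. Token counts and the relevance score are illustrative, not Ctp2Fic's exact criteria:

```python
import torch
import torch.nn.functional as F

vis = torch.randn(576, 64)                 # visual tokens (e.g., ViT patches)
txt = torch.randn(20, 64)                  # instruction (text) tokens

# Stage 1: coarse text-guided pruning by relevance to the mean text embedding.
score = F.normalize(vis, dim=1) @ F.normalize(txt.mean(0, keepdim=True), dim=1).t()
keep = score.squeeze(1).topk(128).indices  # 576 -> 128 tokens
pruned = vis[keep]

# Stage 2: fine clustering of the kept tokens into 32 representatives.
centers = pruned[torch.randperm(128)[:32]].clone()
for _ in range(5):                         # a few Lloyd (k-means) iterations
    assign = torch.cdist(pruned, centers).argmin(dim=1)
    for k in range(32):
        members = pruned[assign == k]
        if len(members) > 0:
            centers[k] = members.mean(dim=0)
print(centers.shape)                       # (32, 64): compressed visual sequence
```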
{"title":"Ctp2Fic: From coarse-grained token pruning to fine-grained token clustering for LVLM inference acceleration","authors":"Yulong Lei ,&nbsp;Zishuo Wang ,&nbsp;Jinglin Xu ,&nbsp;Yuxin Peng","doi":"10.1016/j.displa.2026.103360","DOIUrl":"10.1016/j.displa.2026.103360","url":null,"abstract":"<div><div>Large Vision–Language Models (LVLMs) excel in multimodal tasks, but their high computational cost, driven by the large number of image tokens, severely limits inference efficiency. While existing training-free methods reduce token counts to accelerate inference, they often struggle to preserve model performance. This trade-off between efficiency and accuracy poses the key challenge in accelerating Large Vision–Language Model (LVLM) inference without retraining. In this paper, we analyze the rank of attention matrices across layers and discover that image token redundancy peaks in two specific VLM layers: many tokens convey nearly identical information, yet still participate in subsequent computations. Leveraging this insight, we propose Ctp2Fic, a new two-stage coarse-to-fine token compression framework. Specifically, in the Coarse-grained Text-guided Pruning stage, we dynamically assign a weight to each visual token based on its semantic relevance to the input instruction and prune low-weight tokens that are unrelated to the task. During the Fine-grained Image-based Clustering stage, we apply a lightweight clustering algorithm to merge semantically similar tokens into compact, representative ones, thus further reducing the sequence length. Our framework requires no model fine-tuning and seamlessly integrates into existing LVLM inference pipelines. Extensive experiments demonstrate that Ctp2Fic outperforms state-of-the-art acceleration techniques in both inference speed and accuracy, achieving superior efficiency and performance without retraining.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103360"},"PeriodicalIF":3.4,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MeAP: dual level memory strategy augmented transformer based visual object predictor
IF 3.4 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-20 | DOI: 10.1016/j.displa.2026.103356
Shiliang Yan, Yinling Wang, Dandan Lu, Min Wang
The exploration and resolution of persistent noise incursions within tracking sequences, especially occlusion, illumination variations, and fast motion, have garnered substantial attention for their role in enhancing the accuracy and robustness of visual object trackers. However, existing visual object trackers equipped with template-updating mechanisms or calibration strategies rely heavily on time-consuming historical data to achieve optimal tracking performance, impeding their real-time tracking capabilities. To address these challenges, this paper introduces a long-short term dual level memory augmented transformer structure aided visual object predictor (MeAP). The key contributions of MeAP can be summarized as follows: 1) a noise model for specific invasion events is formulated based on incursion effects, with corresponding template strategies serving as the foundation for more efficient memory utilization; 2) a memory exploration scheme, built on an online mask-based feature extraction strategy and the transformer architecture, is introduced to mitigate the impact of noise invasion during memory vector construction; 3) a memory utilization scheme, based on basic target features and a dual-feature target mask predictor, is provided to incorporate scene-edge features into the mask-based feature extraction method and jointly predict the accurate location of the tracking target. Extensive experiments conducted on OTB100, NFS, VOT2021, and AVisT benchmarks demonstrate that MeAP, with its introduced modules, achieves comparable tracking performance against other state-of-the-art (SOTA) trackers, and operates at an average speed of 31 frames per second (FPS) across the 4 benchmarks.
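A rough sketch of one way a long/short-term dual memory could work: a short-term vector follows the latest target feature by EMA, while a long-term bank stores only sufficiently novel states, giving the predictor both recency and diversity. The novelty threshold, capacity, and momentum are invented for illustration and are not MeAP's actual design:

```python
import torch
import torch.nn.functional as F

class DualMemory:
    def __init__(self, dim=256, capacity=10, novelty=0.8, momentum=0.9):
        self.short = torch.zeros(dim)   # short-term memory: recency via EMA
        self.long = []                  # long-term memory: diverse stored states
        self.capacity, self.novelty, self.momentum = capacity, novelty, momentum

    def update(self, feat):
        self.short = self.momentum * self.short + (1 - self.momentum) * feat
        if not self.long:
            self.long.append(feat)
            return
        sims = torch.stack([F.cosine_similarity(feat, m, dim=0) for m in self.long])
        if sims.max() < self.novelty:   # store only sufficiently novel target states
            self.long.append(feat)
            if len(self.long) > self.capacity:
                self.long.pop(0)        # drop the oldest entry

mem = DualMemory()
for t in range(20):
    mem.update(F.normalize(torch.randn(256), dim=0))  # per-frame target feature
print(len(mem.long), mem.short.shape)
```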
{"title":"MeAP: dual level memory strategy augmented transformer based visual object predictor","authors":"Shiliang Yan,&nbsp;Yinling Wang,&nbsp;Dandan Lu,&nbsp;Min Wang","doi":"10.1016/j.displa.2026.103356","DOIUrl":"10.1016/j.displa.2026.103356","url":null,"abstract":"<div><div>The exploration and resolution of persistent noise incursions within the tracking sequences, especially the occlusion, illumination variations, and fast motion, have garnered substantial attention for their functional properties in enhancing the accuracy and robustness of visual object trackers. However, existing visual object trackers, equipped with template updating mechanisms or calibration strategy, heavily rely on time-consuming historical data to achieve optimal tracking performance, impeding their real-time tracking capabilities. To address these challenges, this paper introduces a long-short term dual level memory augmented transformer structure aided visual object predictor (MeAP). The key contributions of MeAP can be summarized as follows: 1) the formulation of a noise model for specific invasion events based on incursion effects and corresponding template strategies serving as the foundation for more efficient memory utilization; 2) The memory exploration scheme based online tracking mask-based feature extraction strategy and the transformer architecture is introduced to mitigate the impact of noise invasion during memory vector construction; 3) the memory utilization scheme based target basic feature and dual feature target mask predictor is provided to implement the scene-edge feature for mask-based feature extraction method and jointly predict the accurate location of the tracking target.. Extensive experiments conducted on OTB100, NFS, VOT2021, and AVisT benchmarks demonstrate that MeAP, with its introduced modules, achieves comparable tracking performances against other state-of-the-art (SOTA) trackers, and operates at an average speed of 31 frames per second (FPS) across 4 benchmarks.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103356"},"PeriodicalIF":3.4,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Interactive feature pyramid network for small object detection in UAV aerial images
IF 3.4 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-17 | DOI: 10.1016/j.displa.2026.103352
Jinhang Zhang, LiQiang Song, Min Gao, Wenzhao Li, Zhuang Wei
The high prevalence of small objects in aerial images presents a significant challenge for object detection tasks. In this paper, we propose the Interactive Feature Pyramid Network (IFPN) specifically for small object detection in aerial images. The IFPN architecture comprises an Interactive Channel-Wise Attention (ICA) module and an Interactive Spatial-Wise Attention (ISA) module. The ICA and ISA modules facilitate feature interaction across multiple layers, thereby mitigating semantic gaps and information loss inherent in traditional feature pyramids, and effectively capturing the detailed features essential for small objects. By incorporating global contextual information, IFPN enhances the model’s ability to discern the relationship between the target and its surrounding context, particularly in scenarios where small objects exhibit limited features, thereby significantly improving the accuracy of small object detection. Additionally, we propose an Attention Convolution Module (ACM) designed to furnish high-quality feature bases for IFPN during its early stages. Extensive experiments conducted on aerial image datasets attest to the effectiveness and sophistication of IFPN for detecting small objects within aerial images.
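A minimal sketch of cross-level channel interaction in the spirit of the ICA module: channel descriptors from two pyramid levels are concatenated, and a shared MLP emits gates that reweight each level using context from the other, letting the levels exchange information instead of being fused blindly. The MLP and channel counts are assumptions, not the paper's exact module:

```python
import torch

class CrossLevelChannelAttention(torch.nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.mlp = torch.nn.Sequential(torch.nn.Linear(2 * ch, ch),
                                       torch.nn.ReLU(),
                                       torch.nn.Linear(ch, 2 * ch))

    def forward(self, lo, hi):  # two pyramid levels with the same channel count
        # global-average channel descriptors from both levels, concatenated
        d = torch.cat([lo.mean(dim=(2, 3)), hi.mean(dim=(2, 3))], dim=1)  # (B, 2C)
        g_lo, g_hi = torch.sigmoid(self.mlp(d)).chunk(2, dim=1)
        return lo * g_lo[:, :, None, None], hi * g_hi[:, :, None, None]

m = CrossLevelChannelAttention()
lo, hi = torch.randn(2, 64, 80, 80), torch.randn(2, 64, 40, 40)
out_lo, out_hi = m(lo, hi)
print(out_lo.shape, out_hi.shape)  # each level reweighted with cross-level context
```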
{"title":"Interactive feature pyramid network for small object detection in UAV aerial images","authors":"Jinhang Zhang,&nbsp;LiQiang Song,&nbsp;Min Gao,&nbsp;Wenzhao Li,&nbsp;Zhuang Wei","doi":"10.1016/j.displa.2026.103352","DOIUrl":"10.1016/j.displa.2026.103352","url":null,"abstract":"<div><div>The high prevalence of small objects in aerial images presents a significant challenge for object detection tasks. In this paper, we propose the Interactive Feature Pyramid Network (IFPN) specifically for small object detection in aerial images. The IFPN architecture comprises an Interactive Channel-Wise Attention (ICA) module and an Interactive Spatial-Wise Attention (ISA) module. The ICA and ISA modules facilitate feature interaction across multiple layers, thereby mitigating semantic gaps and information loss inherent in traditional feature pyramids, and effectively capturing the detailed features essential for small objects. By incorporating global contextual information, IFPN enhances the model’s ability to discern the relationship between the target and its surrounding context, particularly in scenarios where small objects exhibit limited features, thereby significantly improving the accuracy of small object detection. Additionally, we propose an Attention Convolution Module (ACM) designed to furnish high-quality feature bases for IFPN during its early stages. Extensive experiments conducted on aerial image datasets attest to the effectiveness and sophistication of IFPN for detecting small objects within aerial images.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103352"},"PeriodicalIF":3.4,"publicationDate":"2026-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0