
Latest publications in Displays

Eye-tracking-based perceptual performance evaluation for multi-screen spliced aircraft
IF 3.4 · CAS Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-01-29 · DOI: 10.1016/j.displa.2026.103369
Peitong Han, Yan Zhao, Jian Wei, Shibo Wang, Shigang Wang
The evolution from conventional glass cockpits to enclosed, multi-screen display configurations has significantly increased visual complexity and pilot cognitive workload, posing new challenges for the scientific assessment of visual perception. Current evaluation methods primarily rely on subjective questionnaires, which lack objectivity and timeliness and cannot support real-time cockpit optimization. To overcome these limitations, this study presents an objective visual perception assessment approach for closed cockpit environments. Specifically, three novel eye-tracking indicators (perceptual continuity, visual responsiveness, and focus degree) are proposed and extracted using algorithms developed in this work. These indicators are fused through a regression-based model to achieve non-intrusive, quantitative perception evaluation based on eye-tracking data collected in real time. Experiments conducted in a three-screen splicing scenario demonstrate that the proposed method achieves high prediction accuracy and robustness, providing an effective tool for optimizing cockpit display design and monitoring pilot perceptual states during flight operations.
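The regression-based fusion of the three indicators can be sketched in a few lines. This is a toy illustration under assumed data, not the paper's model: the indicator values below are synthetic, and an ordinary least-squares fit stands in for whichever regression the authors use.

```python
import numpy as np

# Toy sketch of regression-based indicator fusion. All data here are
# synthetic; real indicator values (perceptual continuity, visual
# responsiveness, focus degree) would come from eye-tracking algorithms.
rng = np.random.default_rng(0)

X = rng.uniform(0, 1, size=(50, 3))        # per-trial indicator triplets
true_w = np.array([0.5, 0.3, 0.2])         # assumed "ground-truth" weights
y = X @ true_w + rng.normal(0, 0.01, 50)   # noisy perception ratings

# Fit fusion weights (with intercept) by ordinary least squares.
A = np.hstack([X, np.ones((50, 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def perception_score(continuity, responsiveness, focus):
    """Predict a perception score from the three fused indicators."""
    return float(np.dot(w[:3], [continuity, responsiveness, focus]) + w[3])
```

With clean synthetic ratings, the fitted weights recover the assumed ones closely, which is the property a non-intrusive real-time evaluator would rely on.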
Citations: 0
Structured pruning via cross-layer metric and ℓ2,0-norm sparse reconstruction
IF 3.4 · CAS Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-01-29 · DOI: 10.1016/j.displa.2026.103362
Huoxiang Yang , Shuangyan Yi , Fanyang Meng , Wei Liu , Yongsheng Liang
Existing intra-layer pruning methods face two major challenges. First, they often use empirically determined pruning ratios, which ignore the distinct statistical properties of different layers and limit global structural optimization. Second, they fail to effectively model intra-layer dependencies, leading to inaccurate identification of redundant filters. To address these challenges, we propose a novel structured pruning method that integrates a cross-layer metric and feature reconstruction constrained by the ℓ2,0-norm. Initially, we introduce a cross-layer importance metric based on statistical distribution standardization. By normalizing the statistical properties of feature responses across layers, our approach constructs a unified metric for consistent importance evaluation. Dynamic thresholding is then applied to adaptively determine the pruning ratio for each layer, resulting in enhanced network architectures and improved pruning efficiency. Subsequently, based on the layer-wise pruning ratios, we introduce an ℓ2,0-norm constrained sparse reconstruction model that captures intra-layer dependencies. The model produces a sparse coefficient matrix with entirely zeroed columns corresponding to redundant components, thus enhancing the accuracy of redundant filter identification. Extensive experiments on multiple benchmark datasets and network architectures verify the effectiveness of our method. For instance, on CIFAR-10 with VGG-16, our method achieves a 58.21% reduction in computational complexity and a 90.65% decrease in storage cost, with only a 0.11% accuracy drop.
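The ℓ2,0-norm constraint amounts to column-wise sparsity: at most k columns of the coefficient matrix may be nonzero, and entirely zeroed columns mark redundant filters. A minimal sketch of that projection step (illustrative only; the paper solves a full sparse reconstruction model around it):

```python
import numpy as np

def l20_project(C, k):
    """Project matrix C onto the l2,0 constraint set: keep the k columns
    with the largest l2 norm, zero out the rest. Columns correspond to
    candidate filters; zeroed columns are flagged as redundant."""
    norms = np.linalg.norm(C, axis=0)          # per-column l2 norms
    keep = np.argsort(norms)[-k:]              # indices of k largest columns
    P = np.zeros_like(C)
    P[:, keep] = C[:, keep]
    return P, sorted(keep.tolist())
```

The key property is that sparsity is imposed on whole columns rather than individual entries, so the result maps directly to structured (filter-level) pruning.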
Citations: 0
ProtoConNet: Prototypical augmentation and alignment for open-set few-shot image classification
IF 3.4 · CAS Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-01-28 · DOI: 10.1016/j.displa.2026.103364
Kexuan Shi , Zhuang Qi , Jingjing Zhu , Lei Meng , Yaochen Zhang , Haibei Huang , Xiangxu Meng
Open-set few-shot image classification aims to train models using a small amount of labeled data, enabling them to achieve good generalization when confronted with unknown environments. Existing methods mainly use visual information from a single image to learn class representations to distinguish known from unknown categories. However, these methods often overlook the benefits of integrating rich contextual information. To address this issue, this paper proposes a prototypical augmentation and alignment method, termed ProtoConNet, which incorporates background information from different samples to enhance the diversity of the feature space, breaking the spurious associations between context and image subjects in few-shot scenarios. Specifically, it consists of three main modules: the clustering-based data selection (CDS) module mines diverse data patterns while preserving core features; the contextual-enhanced semantic refinement (CSR) module builds a context dictionary to integrate into image representations, which boosts the model’s robustness in various scenarios; and the prototypical alignment (PA) module reduces the gap between image representations and class prototypes, amplifying feature distances for known and unknown classes. Experimental results from two datasets verified that ProtoConNet enhances the effectiveness of representation learning in few-shot scenarios and identifies open-set samples, making it superior to existing methods.
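The prototype idea underlying ProtoConNet can be illustrated independently of its CDS/CSR/PA modules: class prototypes are feature means, and a query is rejected as open-set when even its nearest prototype is too far away. A hypothetical sketch (the distance-threshold rejection rule is a common open-set heuristic, not necessarily the authors' exact criterion):

```python
import numpy as np

def class_prototypes(features, labels):
    """Prototype of each known class = mean of its support features."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify_open_set(x, protos, tau):
    """Assign the nearest prototype's class, or 'unknown' if the nearest
    prototype is farther than the rejection threshold tau."""
    dists = {c: float(np.linalg.norm(x - p)) for c, p in protos.items()}
    c = min(dists, key=dists.get)
    return c if dists[c] <= tau else "unknown"
```

Amplifying the feature distance between known and unknown classes, as the PA module aims to do, makes this kind of threshold rule more reliable.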
Citations: 0
Hybrid detection model for unauthorized use of doctor’s code in health insurance: Integrating rule-based screening and LLM reasoning
IF 3.4 · CAS Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-01-28 · DOI: 10.1016/j.displa.2026.103359
Qiwen Yuan , Jiajie Chen , Zhendong Shi
Unauthorized use of doctor’s code is a high-risk and context-dependent issue in health insurance supervision. Traditional rule-based screening achieves high recall but often produces false positives in cases that appear anomalous yet are clinically legitimate, such as telemedicine encounters, refund-related re-settlements, and rapid outpatient–emergency transitions. These methods lack semantic understanding of medical context and rely heavily on manual auditing. We propose a hybrid detection framework that integrates rule-based temporal filtering with large language model (LLM)–based semantic reasoning. Time-threshold rules are first applied to extract suspected cases from real health-insurance claim data. Expert-derived legitimate scenario patterns are then embedded into structured prompts to guide the LLM in semantic plausibility assessment and false-positive reduction. For evaluation, we construct a 240-pair multi-scenario benchmark dataset from de-identified real claim records, covering both reasonable and suspicious situations. Zero-shot experiments with DeepSeek-R1-7B show that the framework achieves 75% accuracy and 87% precision in distinguishing reasonable from unauthorized cases. These results indicate that the proposed method can effectively reduce false alarms and alleviate manual audit workload, providing a practical and efficient solution for real-world health-insurance supervision.
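The first stage, rule-based temporal filtering, can be sketched with a simple time-threshold rule. The rule below (the same doctor code billed at two different facilities within a short window) is a plausible example of such a rule, not the paper's actual screening logic; field names and the 30-minute threshold are assumptions.

```python
from datetime import datetime, timedelta

def flag_suspects(claims, threshold=timedelta(minutes=30)):
    """Flag claim pairs where the same doctor code is billed at two
    different facilities within `threshold` of each other.
    Illustrative rule only; the paper's rules are not specified here."""
    ordered = sorted(claims, key=lambda c: (c["doctor"], c["time"]))
    suspects = []
    for a, b in zip(ordered, ordered[1:]):
        if (a["doctor"] == b["doctor"]
                and a["facility"] != b["facility"]
                and b["time"] - a["time"] <= threshold):
            suspects.append((a["claim_id"], b["claim_id"]))
    return suspects
```

Cases flagged this way would then be passed, together with expert-derived legitimate-scenario patterns, to the LLM stage for semantic plausibility assessment.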
Citations: 0
Content-adaptive dual feature selection for infrared aerial video compressive sensing reconstruction
IF 3.4 · CAS Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-01-27 · DOI: 10.1016/j.displa.2026.103368
Hao Liu , Maoji Qiu , Rong Huang
For block compressive sensing (BCS) of natural videos, existing reconstruction algorithms typically utilize nonlocal self-similarity (NSS) to generate sparse residuals, thereby achieving favorable recovery performance by exploiting the statistical characteristics of key frames and non-key frames. However, when applied to multi-perspective infrared aerial videos rather than natural videos, these reconstruction algorithms usually result in poor recovery quality because of the inflexibility in selecting similar patches and poor adaptability to dynamic scene changes. Due to the distribution property of infrared aerial imagery, inter-frame and intra-frame similar patches should be selected adaptively so that an accurate dictionary matrix can be learned. Therefore, this paper proposes a content-adaptive dual feature selection mechanism. It first conducts a rough screening of inter-frame and intra-frame similar patches based on the correlation of observed measurement vectors across frames. Then, it is followed by a fine screening stage, where principal component analysis (PCA) is applied to project the similar patch-group matrix into a low-dimensional space. Finally, the split Bregman iteration (SBI) is employed to solve the BCS reconstruction for infrared aerial video. Experimental results on both HIT-UAV and M200-XT2DroneVehicle datasets demonstrate that the proposed algorithm achieves better recovery quality compared to state-of-the-art algorithms.
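The PCA fine-screening step can be read as: project the rough-screened patch group into a low-dimensional space, then keep only the patches closest to the reference there. A sketch of that reading, with assumed parameters (`dim`, `top_k` are illustrative, not from the paper):

```python
import numpy as np

def pca_screen(patches, ref_idx=0, dim=8, top_k=10):
    """Fine screening of a similar-patch group: project the patches
    (rows) into a dim-dimensional PCA subspace via SVD, then keep the
    top_k patches nearest to the reference patch in that subspace."""
    X = patches - patches.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # principal axes in Vt
    Z = X @ Vt[:dim].T                                # low-dim coordinates
    d = np.linalg.norm(Z - Z[ref_idx], axis=1)        # distance to reference
    return np.argsort(d)[:top_k]
```

Ranking distances in the projected space rather than the raw pixel space is what makes the fine screening less sensitive to noise in individual patch entries.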
Citations: 0
Distinguishing sleepiness from mental fatigue in sustained monitoring tasks to enhance the reliability of fatigue detection based on multimodal fusion
IF 3.4 · CAS Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-01-27 · DOI: 10.1016/j.displa.2026.103366
Xinggang Hou , Bingchen Gou , Dengkai Chen , Jianjie Chu , Xiaosai Duan , Xuerui Li , Lin Ma , Jing Chen , Yao Zhou
In monitoring tasks involving sustained interaction with display systems, fatigue is a primary factor diminishing efficiency. Traditional models confuse sleepiness with mental fatigue, which compromises the reliability of assessments. We propose an explainable multimodal framework that models these two subtypes separately and integrates them into a comprehensive fatigue assessment. To validate our methodology, we invited 20 pilots to participate in a 90-minute continuous monitoring experiment, during which we collected multimodal data including their eye movements, electroencephalogram (EEG), electrocardiogram (ECG), and video. First, we derive explicit representation functions for sleepiness and mental fatigue using symbolic regression on facial and behavioral cues, enabling continuous subtype related labeling beyond intermittent questionnaires. Second, we identify compact physiological marker subsets via a cascaded feature selection method that combines mRMR prescreening with a heuristic search, yielding key feature sets while substantially reducing dimensionality. Finally, dynamic weighted coupling analysis based on information entropy revealed the nonlinear superposition effects between sleepiness and mental fatigue. Using 30 s windows under the current cohort and evaluation setting, the resulting comprehensive classifier achieves 94.8% accuracy. Following external validation and domain-specific adaptations, the methodology developed in this study holds broad application prospects across numerous automation scenarios involving monotonous human–machine interaction tasks.
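The mRMR prescreening stage selects features with maximum relevance to the target and minimum redundancy with already-selected features. A toy greedy variant, using absolute Pearson correlation as a simple stand-in for mutual information (the paper's cascaded method additionally applies a heuristic search on top of mRMR, which is omitted here):

```python
import numpy as np

def mrmr_select(X, y, k):
    """Greedy mRMR-style selection: at each step pick the feature with the
    best (relevance - mean redundancy) score. Correlation magnitude is
    used as an illustrative proxy for mutual information."""
    n_feat = X.shape[1]
    rel = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_feat)])
    selected = [int(np.argmax(rel))]          # most relevant feature first
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                           for s in selected])
            if rel[j] - red > best_score:
                best, best_score = j, rel[j] - red
        selected.append(best)
    return selected
```

The redundancy penalty is what lets the method return a compact marker subset: a near-duplicate of an already-selected physiological feature scores poorly even if it is individually relevant.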
Citations: 0
Trans-MT: a 3D semi-supervised glioma segmentation model integrating transformer architecture and asymmetric data augmentation
IF 3.4 · CAS Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-01-27 · DOI: 10.1016/j.displa.2026.103365
Yuehui Liao , Yun Zheng , Yingjie Jiao , Na Tang , Yuhao Wang , Yu Hu , Yaning Feng , Ruofan Wang , Qun Jin , Xiaobo Lai , Panfei Li
Accurate glioma segmentation in magnetic resonance imaging (MRI) is crucial for effective diagnosis and treatment planning in neuro-oncology; however, this process is often time-consuming and heavily reliant on expert annotations. To address these limitations, we present Trans-MT, a 3D semi-supervised segmentation model that integrates a transformer-based architecture with asymmetric data augmentation, achieving high segmentation accuracy with limited labeled data. Trans-MT employs a teacher-student framework: the teacher model generates reliable pseudo-labels for unlabeled data, while the student model learns through supervised and consistency losses, guided by an uncertainty-aware mechanism to refine its predictions. The architecture of Trans-MT features a hybrid encoder, nnUFormer, which combines the robust capabilities of nn-UNet with transformers, enabling it to capture global contextual information essential for accurate tumor segmentation. This design enhances the model’s ability to detect intricate tumor structures within MRI scans, even with sparse annotations. Additionally, the model’s learning process is strengthened by asymmetric data augmentation, which enriches data diversity and robustness. We evaluated Trans-MT on the BraTS 2019, 2020, and 2021 datasets, where it demonstrated superior performance over several state-of-the-art semi-supervised models, particularly in segmenting challenging tumor sub-regions. The results confirm that Trans-MT significantly improves segmentation precision, making it a valuable advancement in brain tumor segmentation methodology and a practical solution for clinical settings with limited labeled data. Our code is available at https://github.com/smallboy-code/TransMT.
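The teacher-student mechanics described above follow the standard mean-teacher pattern: the teacher's weights are an exponential moving average (EMA) of the student's, and a consistency loss penalizes disagreement between their predictions on unlabeled data. A minimal numeric sketch (hyperparameters are illustrative; the paper's uncertainty-aware weighting is omitted):

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher update: teacher weights track an exponential moving
    average of the student's weights (dicts of parameter arrays)."""
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}

def consistency_loss(student_pred, teacher_pred):
    """Mean squared disagreement between student and teacher predictions
    on the same (possibly unlabeled) input."""
    return float(np.mean((student_pred - teacher_pred) ** 2))
```

Because the teacher averages the student over many steps, its pseudo-labels are smoother than any single student snapshot, which is why they can supervise the unlabeled portion of the data.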
Citations: 0
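Trans-MT, described in the entry above, trains a student model against a teacher's pseudo-labels under an uncertainty-aware consistency loss, with the teacher updated as an exponential moving average of the student. The sketch below illustrates that scheme in NumPy under stated assumptions; it is not the authors' implementation, and the entropy threshold, the masked-MSE form, and all function names are illustrative choices:

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over class logits.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def consistency_loss(student_logits, teacher_logits, entropy_thresh=0.5):
    """Masked MSE between student and teacher class probabilities.

    Voxels where the teacher is uncertain (high predictive entropy)
    are excluded from the consistency target, mimicking an
    uncertainty-aware mean-teacher update. Shapes: (n_voxels, n_classes).
    """
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    # Predictive entropy of the teacher, one value per voxel.
    ent = -(p_t * np.log(p_t + 1e-8)).sum(axis=-1)
    mask = ent < entropy_thresh            # keep only confident voxels
    if not mask.any():
        return 0.0
    return float(((p_s - p_t) ** 2).sum(axis=-1)[mask].mean())

def ema_update(teacher_w, student_w, alpha=0.99):
    # Teacher weights track the student via an exponential moving average.
    return alpha * teacher_w + (1.0 - alpha) * student_w
```

In a real mean-teacher setup the student is trained by backpropagation on the supervised plus consistency losses, while `ema_update` is applied to every teacher parameter after each optimizer step.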
Learning video normality for anomaly detection via multi-scale spatiotemporal feature extraction and a feature memory module
IF 3.4 CAS Tier 2 (Engineering & Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2026-01-22 DOI: 10.1016/j.displa.2026.103355
Yongqing Huo, Wenke Jiang
Video anomaly detection (VAD) is critical for the automated identification of anomalous behaviors in surveillance systems, with applications in public safety, intelligent transportation, and healthcare. However, as application domains continue to expand, ensuring that VAD algorithms maintain strong detection performance across diverse scenarios has become a primary focus of current research. To enhance detection robustness across varied environments, we propose a novel autoencoder-based model in this paper. Compared with other algorithms, our method more effectively exploits multi-scale feature information within frames to learn the feature distribution. In the encoder, we construct a convolutional module with multiple kernel sizes and incorporate the designed Spatial-Channel Transformer Attention (SCTA) module to strengthen the feature representation. In the decoder, we integrate a multi-scale feature reconstruction module with Self-Supervised Predictive Convolutional Attentive Blocks (SSPCAB) for more accurate next-frame prediction. Moreover, we introduce a dedicated memory module to capture and store the distribution of normal data patterns. Meanwhile, the architecture employs Conv-LSTM and a specially designed Temporal-Spatial Attention (TSA) module in the skip connections to capture spatiotemporal dependencies across video frames. Benefiting from the design and integration of these modules, our proposed method achieves superior detection performance on public datasets, including UCSD Ped2, CUHK Avenue, and ShanghaiTech. The experimental results demonstrate the effectiveness and versatility of our method in anomaly detection tasks.
Citations: 0
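The memory module in the abstract above stores prototypes of normal patterns: a frame feature is reconstructed from the memory by similarity-weighted reading, and frames the memory cannot reconstruct well score as anomalous. A minimal NumPy sketch of that idea, with cosine-similarity addressing standing in for the paper's learned addressing; the slot count, scoring, and names are illustrative assumptions:

```python
import numpy as np

def memory_read(query, memory):
    """Read from a bank of stored 'normality' prototypes.

    query:  (d,) encoder feature for one frame
    memory: (n_slots, d) stored normal-pattern prototypes
    Returns the reconstruction of the query as an attention-weighted
    sum of memory slots (cosine-similarity addressing).
    """
    q = query / (np.linalg.norm(query) + 1e-8)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    sim = m @ q                             # cosine similarity to each slot
    w = np.exp(sim) / np.exp(sim).sum()     # softmax addressing weights
    return w @ memory

def anomaly_score(frame_feat, memory):
    # Normal frames are well reconstructed from memory; anomalies are not,
    # so the residual norm serves as a per-frame anomaly score.
    recon = memory_read(frame_feat, memory)
    return float(np.linalg.norm(frame_feat - recon))
```

With an identity memory bank, a feature aligned with one slot scores lower than a feature that matches no slot, which is the behavior the detector relies on.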
Self-attention-based mixture-of-experts framework for non-invasive prediction of MGMT promoter methylation in glioblastoma using multi-modal MRI
IF 3.4 CAS Tier 2 (Engineering & Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2026-01-21 DOI: 10.1016/j.displa.2026.103358
Yuehui Liao , Yun Zheng , Jingyu Zhu , Yu Chen , Feng Gao , Yaning Feng , Weiji Yang , Guang Yang , Xiaobo Lai , Panfei Li
Glioblastoma (GBM) is an aggressive brain tumor associated with poor prognosis and limited treatment options. The methylation status of the O6-methylguanine-DNA methyltransferase (MGMT) promoter is a critical biomarker for predicting the efficacy of temozolomide chemotherapy in GBM patients. However, current methods for determining MGMT promoter methylation are invasive and costly, which hinders their widespread clinical application. In this study, we propose a novel non-invasive deep learning framework based on a Mixture-of-Experts (MoE) architecture for predicting MGMT promoter methylation status using multi-modal magnetic resonance imaging (MRI) data. Our MoE model incorporates modality-specific expert networks built on the ResNet18 architecture, with a self-attention-based gating mechanism that dynamically selects and integrates the most relevant features across MRI modalities (T1-weighted, contrast-enhanced T1, T2-weighted, and fluid-attenuated inversion recovery). We evaluate the proposed framework on the BraTS2021 and TCGA-GBM datasets, showing superior performance compared to conventional deep learning models in terms of accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Furthermore, Grad-CAM visualizations provide enhanced interpretability by highlighting biologically relevant regions in the tumor and peritumoral areas that influence model predictions. The proposed framework represents a promising tool for integrating imaging biomarkers into precision oncology workflows, offering a scalable, cost-effective, and interpretable solution for non-invasive MGMT methylation prediction in GBM.
Citations: 0
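The gating idea in the abstract above reduces to a small sketch: each MRI sequence gets its own expert, and a gate scores the modalities and mixes the expert outputs into one methylation probability. The paper's gate is self-attention-based; the plain softmax gate and linear dot-product experts below are simplified stand-ins, and all names and shapes are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    # Stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_predict(modality_feats, expert_weights, gate_weights):
    """Gated mixture over modality-specific experts.

    modality_feats: dict name -> (d,) pooled feature per MRI sequence
    expert_weights: dict name -> (d,) linear expert (dot-product scorer)
    gate_weights:   (n_modalities, d) gate that scores each modality
    Returns (methylation probability, per-modality gate weights).
    """
    names = sorted(modality_feats)
    feats = np.stack([modality_feats[n] for n in names])   # (m, d)
    gate_logits = (gate_weights * feats).sum(axis=1)       # one score per modality
    gate = softmax(gate_logits)                            # mixture weights, sum to 1
    expert_logits = np.array(
        [expert_weights[n] @ modality_feats[n] for n in names]
    )
    mixed = float(gate @ expert_logits)
    prob = 1.0 / (1.0 + np.exp(-mixed))    # sigmoid: methylated vs. unmethylated
    return prob, dict(zip(names, gate))
```

In the full model the experts are ResNet18 branches and the gate attends over their feature maps; the structure of "score modalities, normalize, mix" is the same.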
Harnessing differentiable geometry and orientation attention for semi-supervised vessel segmentation with limited annotations
IF 3.4 CAS Tier 2 (Engineering & Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2026-01-21 DOI: 10.1016/j.displa.2026.103347
Yan Liu , Yan Yang , Yongquan Jiang , Xiaole Zhao , Liang Fan
The precise segmentation of vascular structures is vital for diagnosing retinal and coronary artery diseases. However, the complex morphology and large structural variability of blood vessels make manual annotation time-consuming, leaving labeled data scarce and limiting the scalability of supervised segmentation methods. We propose a semi-supervised segmentation framework named geometric orientational fusion attention network (GOFA-Net) that integrates differentiable geometric augmentation and orientation-aware attention to effectively leverage knowledge from limited annotations. GOFA-Net comprises three key complementary components: 1) a differentiable geometric augmentation strategy (DGAS) employs quaternion-based representations to diversify training samples while preserving prediction consistency between teacher and student models; 2) a multi-view fusion module (MVFM) orchestrates collaborative feature learning between quaternion and conventional convolutional streams to capture comprehensive spatial dependencies; and 3) a global orientational attention module (GOAM) enhances structural awareness through direction-sensitive geometric embeddings, specifically reinforcing the perception of vascular topology along horizontal and vertical orientations. Extensive validation on multiple retinal vessel datasets (DRIVE, STARE, CHASE_DB1, and HRF) and coronary angiography datasets (DCA1 and CHUAC) shows that GOFA-Net consistently outperforms state-of-the-art semi-supervised methods, achieving particularly notable gains in scenarios with limited annotations.
Citations: 0
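GOFA-Net's DGAS, described above, uses quaternion-based representations to generate rotated training views while keeping teacher and student predictions comparable. A minimal NumPy sketch of quaternion rotation as such a geometric augmentation; the axis-angle construction and the `augment` helper are illustrative assumptions, not the authors' code:

```python
import numpy as np

def quat_from_axis_angle(axis, theta):
    # Unit quaternion (w, x, y, z) for a rotation of theta about axis.
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return np.concatenate([[np.cos(theta / 2)], np.sin(theta / 2) * axis])

def quat_rotate(q, pts):
    """Rotate (n, 3) points by unit quaternion q = (w, x, y, z).

    Expansion of q * p * q^-1 for pure-vector p:
    p' = p (w^2 - v.v) + 2 (v.p) v + 2 w (v x p)
    """
    w, v = q[0], q[1:]
    return (pts * (w * w - v @ v)
            + 2.0 * np.outer(pts @ v, v)
            + 2.0 * w * np.cross(v, pts))

def augment(pts, rng):
    # Random rotation view; in the teacher-student setting the same
    # transform is applied to the teacher's prediction so that both
    # views remain directly comparable under the consistency loss.
    axis = rng.normal(size=3)
    theta = rng.uniform(0, 2 * np.pi)
    return quat_rotate(quat_from_axis_angle(axis, theta), pts)
```

Because the rotation is a smooth function of the quaternion parameters, gradients can flow through the augmentation, which is what makes this style of augmentation "differentiable" in the first place.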
Journal
Displays
Copyright © 2023 Book学术 All rights reserved.
京公网安备 11010802042870号 京ICP备2023020795号-1