Eye-tracking-based perceptual performance evaluation for multi-screen spliced aircraft
Pub Date: 2026-01-29 | DOI: 10.1016/j.displa.2026.103369 | Displays 93, Article 103369
Peitong Han, Yan Zhao, Jian Wei, Shibo Wang, Shigang Wang
The evolution from conventional glass cockpits to enclosed, multi-screen display configurations has significantly increased visual complexity and pilot cognitive workload, posing new challenges for the scientific assessment of visual perception. Current evaluation methods primarily rely on subjective questionnaires, which lack objectivity and timeliness and cannot support real-time cockpit optimization. To overcome these limitations, this study presents an objective visual perception assessment approach for closed cockpit environments. Specifically, three novel eye-tracking indicators – perceptual continuity, visual responsiveness, and focus degree – are proposed and extracted using algorithms developed in this work. These indicators are fused through a regression-based model to achieve non-intrusive, quantitative perception evaluation based on eye-tracking data collected in real time. Experiments conducted in a three-screen splicing scenario demonstrate that the proposed method achieves high prediction accuracy and robustness, providing an effective tool for optimizing cockpit display design and monitoring pilot perceptual states during flight operations.
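As an illustration of the fusion step only (not the authors' code), a minimal sketch: three precomputed indicator values per trial are fused by linear regression against hypothetical perception ratings; the indicator extraction algorithms themselves are not reproduced.

```python
# Minimal sketch of regression-based fusion of three eye-tracking indicators.
# All data here are hypothetical stand-ins for per-trial indicator values.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials = 120

# Columns: perceptual continuity, visual responsiveness, focus degree.
X = rng.uniform(0.0, 1.0, size=(n_trials, 3))
# Hypothetical ground-truth perception scores (e.g., from expert ratings).
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(0, 0.05, n_trials)

model = LinearRegression().fit(X, y)
print("fusion weights:", model.coef_, "R^2:", model.score(X, y))
print("cross-validated R^2:", cross_val_score(model, X, y, cv=5).mean())
```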
{"title":"Eye-tracking-based perceptual performance evaluation for multi-screen spliced aircraft","authors":"Peitong Han, Yan Zhao, Jian Wei, Shibo Wang, Shigang Wang","doi":"10.1016/j.displa.2026.103369","DOIUrl":"10.1016/j.displa.2026.103369","url":null,"abstract":"<div><div>The evolution from conventional glass cockpits to enclosed, multi-screen display configurations has significantly increased visual complexity and pilot cognitive workload, posing new challenges for the scientific assessment of visual perception. Current evaluation methods primarily rely on subjective questionnaires, which lack objectivity, timeliness, and cannot support real-time cockpit optimization. To overcome these limitations, this study presents an objective visual perception assessment approach for closed cockpit environments. Specifically, three novel eye-tracking indicators – perceptual continuity, visual responsiveness, and focus degree – are proposed and extracted using algorithms developed in this work. These indicators are fused through a regression-based model to achieve non-intrusive and quantitative perception evaluation based on eye-tracking data collected in real time. Experiments conducted in a three-screen splicing scenario demonstrate that the proposed method achieves high prediction accuracy and robustness, providing an effective tool for optimizing cockpit display design and monitoring pilot perceptual states during flight operations.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103369"},"PeriodicalIF":3.4,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Structured pruning via cross-layer metric and ℓ2,0-norm sparse reconstruction
Pub Date: 2026-01-29 | DOI: 10.1016/j.displa.2026.103362 | Displays 93, Article 103362
Huoxiang Yang , Shuangyan Yi , Fanyang Meng , Wei Liu , Yongsheng Liang
Existing intra-layer pruning methods face two major challenges. First, they often use empirically determined pruning ratios, which ignore the distinct statistical properties of different layers and limit global structural optimization. Second, they fail to effectively model intra-layer dependencies, leading to inaccurate identification of redundant filters. To address these challenges, we propose a novel structured pruning method that integrates a cross-layer metric and feature reconstruction constrained by the ℓ2,0-norm. Initially, we introduce a cross-layer importance metric based on statistical distribution standardization. By normalizing the statistical properties of feature responses across layers, our approach constructs a unified metric for consistent importance evaluation. Dynamic thresholding is then applied to adaptively determine the pruning ratio for each layer, resulting in enhanced network architectures and improved pruning efficiency. Subsequently, based on the layer-wise pruning ratios, we introduce an ℓ2,0-norm constrained sparse reconstruction model that captures intra-layer dependencies. The model produces a sparse coefficient matrix with entirely zeroed columns corresponding to redundant components, thus enhancing the accuracy of redundant filter identification. Extensive experiments on multiple benchmark datasets and network architectures verify the effectiveness of our method. For instance, on CIFAR-10 with VGG-16, our method achieves a 58.21% reduction in computational complexity and a 90.65% decrease in storage cost, with only a 0.11% accuracy drop.
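A minimal sketch of the cross-layer metric idea, under our own assumptions (per-filter L1 norms as the layer statistic, a global quantile as the dynamic threshold); the paper's exact statistics and the ℓ2,0-norm reconstruction step are not reproduced.

```python
# Sketch: z-score per-filter importance within each layer so scores are
# comparable across layers, then derive layer-wise pruning ratios from one
# global threshold. Statistic choice and quantile rule are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

net = models.vgg16(weights=None)
per_layer = []

for name, m in net.named_modules():
    if isinstance(m, nn.Conv2d):
        # Per-filter L1 norm as a simple importance proxy.
        s = m.weight.detach().abs().sum(dim=(1, 2, 3))
        # Standardize within the layer so all layers share one scale.
        z = (s - s.mean()) / (s.std() + 1e-8)
        per_layer.append((name, z))

all_z = torch.cat([z for _, z in per_layer])
tau = torch.quantile(all_z, 0.5)  # global threshold (assumed quantile rule)

for name, z in per_layer:
    ratio = (z < tau).float().mean().item()
    print(f"{name}: prune {ratio:.1%} of filters")
```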
{"title":"Structured pruning via cross-layer metric and ℓ2,0-norm sparse reconstruction","authors":"Huoxiang Yang , Shuangyan Yi , Fanyang Meng , Wei Liu , Yongsheng Liang","doi":"10.1016/j.displa.2026.103362","DOIUrl":"10.1016/j.displa.2026.103362","url":null,"abstract":"<div><div>Existing intra-layer pruning methods face two major challenges. First, they often use empirically determined pruning ratios, which ignore the distinct statistical properties of different layers and limit global structural optimization. Second, they fail to effectively model intra-layer dependencies, leading to inaccurate identification of redundant filters. To address these challenges, we propose a novel structured pruning method that integrates a cross-layer metric and feature reconstruction constrained by the <span><math><mrow><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>2</mn><mo>,</mo><mn>0</mn></mrow></msub><mtext>-norm</mtext></mrow></math></span>. Initially, we introduce a cross-layer importance metric based on statistical distribution standardization. By normalizing the statistical properties of feature responses across layers, our approach constructs a unified metric for consistent importance evaluation. Dynamic thresholding is then applied to adaptively determine the pruning ratio for each layer, resulting in enhanced network architectures and improved pruning efficiency. Subsequently, based on the layer-wise pruning ratios, we introduce an <span><math><mrow><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>2</mn><mo>,</mo><mn>0</mn></mrow></msub><mtext>-norm</mtext></mrow></math></span> constrained sparse reconstruction model that captures intra-layer dependencies. The model produces a sparse coefficient matrix with entirely zeroed columns corresponding to redundant components, thus enhancing the accuracy of redundant filter identification. Extensive experiments on multiple benchmark datasets and network architectures verify the effectiveness of our method. For instance, on CIFAR-10 with VGG-16, our method achieves a 58.21% reduction in computational complexity and a 90.65% decrease in storage cost, with only a 0.11% accuracy drop.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103362"},"PeriodicalIF":3.4,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ProtoConNet: Prototypical augmentation and alignment for open-set few-shot image classification
Pub Date: 2026-01-28 | DOI: 10.1016/j.displa.2026.103364 | Displays 93, Article 103364
Kexuan Shi , Zhuang Qi , Jingjing Zhu , Lei Meng , Yaochen Zhang , Haibei Huang , Xiangxu Meng
Open-set few-shot image classification aims to train models using a small amount of labeled data, enabling them to achieve good generalization when confronted with unknown environments. Existing methods mainly use visual information from a single image to learn class representations to distinguish known from unknown categories. However, these methods often overlook the benefits of integrating rich contextual information. To address this issue, this paper proposes a prototypical augmentation and alignment method, termed ProtoConNet, which incorporates background information from different samples to enhance the diversity of the feature space, breaking the spurious associations between context and image subjects in few-shot scenarios. Specifically, it consists of three main modules: the clustering-based data selection (CDS) module mines diverse data patterns while preserving core features; the contextual-enhanced semantic refinement (CSR) module builds a context dictionary to integrate into image representations, which boosts the model’s robustness in various scenarios; and the prototypical alignment (PA) module reduces the gap between image representations and class prototypes, amplifying feature distances for known and unknown classes. Experimental results from two datasets verified that ProtoConNet enhances the effectiveness of representation learning in few-shot scenarios and identifies open-set samples, making it superior to existing methods.
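For intuition about prototypical alignment and open-set scoring, a minimal sketch assuming generic embeddings: prototypes are class means of the labeled support set, and a large distance to the nearest prototype flags a sample as open-set. The CDS and CSR modules are not modeled here.

```python
# Sketch of distance-to-prototype classification with an open-set score.
import torch

def class_prototypes(embeddings, labels, n_classes):
    # Mean embedding per class from the labeled support set.
    protos = torch.zeros(n_classes, embeddings.size(1))
    for c in range(n_classes):
        protos[c] = embeddings[labels == c].mean(dim=0)
    return protos

def open_set_score(query, protos):
    # Distance to the nearest prototype; a large distance suggests "unknown".
    dists = torch.cdist(query, protos)        # (n_query, n_classes)
    min_dist, pred = dists.min(dim=1)
    return pred, min_dist

# Hypothetical 5-way support set with 64-d embeddings.
emb = torch.randn(25, 64)
lab = torch.arange(5).repeat_interleave(5)
protos = class_prototypes(emb, lab, 5)
pred, score = open_set_score(torch.randn(8, 64), protos)
print(pred, score > score.median())  # True -> flagged as open-set
```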
{"title":"ProtoConNet: Prototypical augmentation and alignment for open-set few-shot image classification","authors":"Kexuan Shi , Zhuang Qi , Jingjing Zhu , Lei Meng , Yaochen Zhang , Haibei Huang , Xiangxu Meng","doi":"10.1016/j.displa.2026.103364","DOIUrl":"10.1016/j.displa.2026.103364","url":null,"abstract":"<div><div>Open-set few-shot image classification aims to train models using a small amount of labeled data, enabling them to achieve good generalization when confronted with unknown environments. Existing methods mainly use visual information from a single image to learn class representations to distinguish known from unknown categories. However, these methods often overlook the benefits of integrating rich contextual information. To address this issue, this paper proposes a prototypical augmentation and alignment method, termed ProtoConNet, which incorporates background information from different samples to enhance the diversity of the feature space, breaking the spurious associations between context and image subjects in few-shot scenarios. Specifically, it consists of three main modules: the clustering-based data selection (CDS) module mines diverse data patterns while preserving core features; the contextual-enhanced semantic refinement (CSR) module builds a context dictionary to integrate into image representations, which boosts the model’s robustness in various scenarios; and the prototypical alignment (PA) module reduces the gap between image representations and class prototypes, amplifying feature distances for known and unknown classes. Experimental results from two datasets verified that ProtoConNet enhances the effectiveness of representation learning in few-shot scenarios and identifies open-set samples, making it superior to existing methods.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103364"},"PeriodicalIF":3.4,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid detection model for unauthorized use of doctor’s code in health insurance: Integrating rule-based screening and LLM reasoning
Pub Date: 2026-01-28 | DOI: 10.1016/j.displa.2026.103359 | Displays 93, Article 103359
Qiwen Yuan , Jiajie Chen , Zhendong Shi
Unauthorized use of a doctor’s code is a high-risk and context-dependent issue in health-insurance supervision. Traditional rule-based screening achieves high recall but often produces false positives in cases that appear anomalous yet are clinically legitimate, such as telemedicine encounters, refund-related re-settlements, and rapid outpatient–emergency transitions. These methods lack semantic understanding of medical context and rely heavily on manual auditing. We propose a hybrid detection framework that integrates rule-based temporal filtering with large language model (LLM)–based semantic reasoning. Time-threshold rules are first applied to extract suspected cases from real health-insurance claim data. Expert-derived legitimate scenario patterns are then embedded into structured prompts to guide the LLM in semantic plausibility assessment and false-positive reduction. For evaluation, we construct a 240-pair multi-scenario benchmark dataset from de-identified real claim records, covering both reasonable and suspicious situations. Zero-shot experiments with DeepSeek-R1-7B show that the framework achieves 75% accuracy and 87% precision in distinguishing reasonable from unauthorized cases. These results indicate that the proposed method can effectively reduce false alarms and alleviate manual audit workload, providing a practical and efficient solution for real-world health-insurance supervision.
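A minimal sketch of the two-stage pipeline under assumed field names and thresholds; the actual claim schema, prompt wording, and LLM call are not from the paper and are left abstract.

```python
# Sketch: time-threshold rule flags suspected claim pairs; a structured
# prompt embedding expert scenario patterns is then sent to the LLM.
from datetime import datetime

LEGIT_PATTERNS = [  # expert-derived legitimate scenarios (paraphrased)
    "telemedicine encounter billed under the attending doctor's code",
    "refund-related re-settlement duplicating an earlier claim time",
    "rapid outpatient-to-emergency transition within the same visit",
]

def rule_filter(claims, min_gap_s=60):
    # Flag consecutive claims under one doctor code closer in time than a
    # plausible consultation would allow (threshold is an assumption).
    flagged = []
    claims = sorted(claims, key=lambda c: (c["doctor_code"], c["time"]))
    for a, b in zip(claims, claims[1:]):
        if a["doctor_code"] == b["doctor_code"]:
            if (b["time"] - a["time"]).total_seconds() < min_gap_s:
                flagged.append((a, b))
    return flagged

def build_prompt(pair):
    a, b = pair
    patterns = "\n".join(f"- {p}" for p in LEGIT_PATTERNS)
    return (
        "You are auditing health-insurance claims.\n"
        f"Known legitimate scenarios:\n{patterns}\n"
        f"Claim A: {a}\nClaim B: {b}\n"
        "Answer REASONABLE or SUSPICIOUS with a one-sentence justification."
    )

claims = [
    {"doctor_code": "D01", "time": datetime(2025, 3, 1, 9, 0, 0), "dept": "outpatient"},
    {"doctor_code": "D01", "time": datetime(2025, 3, 1, 9, 0, 20), "dept": "emergency"},
]
for pair in rule_filter(claims):
    print(build_prompt(pair))  # the LLM (e.g., DeepSeek-R1-7B) would be called here
```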
{"title":"Hybrid detection model for unauthorized use of doctor’s code in health insurance: Integrating rule-based screening and LLM reasoning","authors":"Qiwen Yuan , Jiajie Chen , Zhendong Shi","doi":"10.1016/j.displa.2026.103359","DOIUrl":"10.1016/j.displa.2026.103359","url":null,"abstract":"<div><div>Unauthorized use of doctor’s code is a high-risk and context-dependent issue in health insurance supervision. Traditional rule-based screening achieves high recall but often produces false positives in cases that appear anomalous yet are clinically legitimate, such as telemedicine encounters, refund-related re-settlements, and rapid outpatient–emergency transitions. These methods lack semantic understanding of medical context and rely heavily on manual auditing. We propose a hybrid detection framework that integrates rule-based temporal filtering with large language model (LLM)–based semantic reasoning. Time-threshold rules are first applied to extract suspected cases from real health-insurance claim data. Expert-derived legitimate scenario patterns are then embedded into structured prompts to guide the LLM in semantic plausibility assessment and false-positive reduction. For evaluation, we construct a 240-pair multi-scenario benchmark dataset from de-identified real claim records, covering both reasonable and suspicious situations. Zero-shot experiments with DeepSeek-R1-7B show that the framework achieves 75% accuracy and 87% precision in distinguishing reasonable from unauthorized cases. These results indicate that the proposed method can effectively reduce false alarms and alleviate manual audit workload, providing a practical and efficient solution for real-world health-insurance supervision.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103359"},"PeriodicalIF":3.4,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146070883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Content-adaptive dual feature selection for infrared aerial video compressive sensing reconstruction
Pub Date: 2026-01-27 | DOI: 10.1016/j.displa.2026.103368 | Displays 93, Article 103368
Hao Liu , Maoji Qiu , Rong Huang
For block compressive sensing (BCS) of natural videos, existing reconstruction algorithms typically utilize nonlocal self-similarity (NSS) to generate sparse residuals, thereby achieving favorable recovery performance by exploiting the statistical characteristics of key frames and non-key frames. However, when applied to multi-perspective infrared aerial videos rather than natural videos, these reconstruction algorithms usually result in poor recovery quality because of the inflexibility in selecting similar patches and poor adaptability to dynamic scene changes. Due to the distribution property of infrared aerial imagery, inter-frame and intra-frame similar patches should be selected adaptively so that an accurate dictionary matrix can be learned. Therefore, this paper proposes a content-adaptive dual feature selection mechanism. It first conducts a rough screening of inter-frame and intra-frame similar patches based on the correlation of observed measurement vectors across frames. This is followed by a fine screening stage, in which principal component analysis (PCA) projects the similar patch-group matrix into a low-dimensional space. Finally, the split Bregman iteration (SBI) is employed to solve the BCS reconstruction for infrared aerial video. Experimental results on both HIT-UAV and M200-XT2DroneVehicle datasets demonstrate that the proposed algorithm achieves better recovery quality compared to state-of-the-art algorithms.
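A rough illustration of the dual selection mechanism, assuming candidate patches come from already-reconstructed frames: correlation of measurement vectors gives the rough screen, and distances in a PCA subspace give the fine screen. The SBI reconstruction itself is omitted, and all dimensions are hypothetical.

```python
# Sketch: rough screening in the measurement domain, fine screening in a
# PCA subspace of the rough-selected patch group.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
B, M, N = 64, 32, 200           # patch dim, measurements per block, pool size
Phi = rng.normal(size=(M, B))   # shared block measurement matrix

patches = rng.normal(size=(N, B))                          # candidate patches
y_target = Phi @ (patches[0] + 0.1 * rng.normal(size=B))   # observed block
Y = patches @ Phi.T                                        # candidates' measurements

# Rough screening: correlation in the measurement domain (no reconstruction).
corr = (Y @ y_target) / (np.linalg.norm(Y, axis=1) * np.linalg.norm(y_target))
rough = np.argsort(-corr)[:40]

# Fine screening: project the rough group into a low-dimensional PCA space
# and keep the patches closest to the top-ranked candidate.
Z = PCA(n_components=8).fit_transform(patches[rough])
fine = rough[np.argsort(np.linalg.norm(Z - Z[0], axis=1))[:16]]
print("selected patch indices:", fine)
```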
{"title":"Content-adaptive dual feature selection for infrared aerial video compressive sensing reconstruction","authors":"Hao Liu , Maoji Qiu , Rong Huang","doi":"10.1016/j.displa.2026.103368","DOIUrl":"10.1016/j.displa.2026.103368","url":null,"abstract":"<div><div>For block compressive sensing (BCS) of natural videos, existing reconstruction algorithms typically utilize nonlocal self-similarity (NSS) to generate sparse residuals, thereby achieving favorable recovery performance by exploiting the statistical characteristics of key frames and non-key frames. However, when applied to multi-perspective infrared aerial videos rather than natural videos, these reconstruction algorithms usually result in poor recovery quality because of the inflexibility in selecting similar patches and poor adaptability to dynamic scene changes. Due to the distribution property of infrared aerial imagery, inter-frame and intra-frame similar patches should be selected adaptively so that an accurate dictionary matrix can be learned. Therefore, this paper proposes a content-adaptive dual feature selection mechanism. It first conducts a rough screening of inter-frame and intra-frame similar patches based on the correlation of observed measurement vectors across frames. Then, it is followed by a fine screening stage, where principal component analysis (PCA) is applied to project the similar patch-group matrix into a low-dimensional space. Finally, the split Bregman iteration (SBI) is employed to solve the BCS reconstruction for infrared aerial video. Experimental results on both HIT-UAV and M200-XT2DroneVehicle datasets demonstrate that the proposed algorithm achieves better recovery quality compared to state-of-the-art algorithms.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103368"},"PeriodicalIF":3.4,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146070880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distinguishing sleepiness from mental fatigue in sustained monitoring tasks to enhance the reliability of fatigue detection based on multimodal fusion
Pub Date: 2026-01-27 | DOI: 10.1016/j.displa.2026.103366 | Displays 93, Article 103366
Xinggang Hou , Bingchen Gou , Dengkai Chen , Jianjie Chu , Xiaosai Duan , Xuerui Li , Lin Ma , Jing Chen , Yao Zhou
In monitoring tasks involving sustained interaction with display systems, fatigue is a primary factor diminishing efficiency. Traditional models confuse sleepiness with mental fatigue, which compromises the reliability of assessments. We propose an explainable multimodal framework that models these two subtypes separately and integrates them into a comprehensive fatigue assessment. To validate our methodology, we invited 20 pilots to participate in a 90-minute continuous monitoring experiment, during which we collected multimodal data including their eye movements, electroencephalogram (EEG), electrocardiogram (ECG), and video. First, we derive explicit representation functions for sleepiness and mental fatigue using symbolic regression on facial and behavioral cues, enabling continuous subtype-related labeling beyond intermittent questionnaires. Second, we identify compact physiological marker subsets via a cascaded feature selection method that combines mRMR prescreening with a heuristic search, yielding key feature sets while substantially reducing dimensionality. Finally, dynamic weighted coupling analysis based on information entropy reveals the nonlinear superposition effects between sleepiness and mental fatigue. Using 30-s windows under the current cohort and evaluation setting, the resulting comprehensive classifier achieves 94.8% accuracy. Following external validation and domain-specific adaptations, the methodology developed in this study holds broad application prospects across numerous automation scenarios involving monotonous human–machine interaction tasks.
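As a sketch of the feature-selection stage only: an mRMR-style greedy search using mutual information on hypothetical physiological features. The heuristic search, symbolic-regression labeling, and entropy-based coupling analysis are not reproduced.

```python
# Sketch: greedy mRMR-style selection (relevance minus mean redundancy).
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, k):
    relevance = mutual_info_classif(X, y, random_state=0)
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in remaining:
            # Redundancy: mean MI between candidate and already-chosen features.
            red = np.mean([mutual_info_regression(X[:, [j]], X[:, s], random_state=0)[0]
                           for s in selected]) if selected else 0.0
            score = relevance[j] - red
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected

X = np.random.default_rng(2).normal(size=(200, 12))  # hypothetical markers
y = (X[:, 0] + X[:, 3] > 0).astype(int)              # hypothetical fatigue label
print("selected features:", mrmr(X, y, 4))
```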
{"title":"Distinguishing sleepiness from mental fatigue in sustained monitoring tasks to enhance the reliability of fatigue detection based on multimodal fusion","authors":"Xinggang Hou , Bingchen Gou , Dengkai Chen , Jianjie Chu , Xiaosai Duan , Xuerui Li , Lin Ma , Jing Chen , Yao Zhou","doi":"10.1016/j.displa.2026.103366","DOIUrl":"10.1016/j.displa.2026.103366","url":null,"abstract":"<div><div>In monitoring tasks involving sustained interaction with display systems, fatigue is a primary factor diminishing efficiency. Traditional models confuse sleepiness with mental fatigue, which compromises the reliability of assessments. We propose an explainable multimodal framework that models these two subtypes separately and integrates them into a comprehensive fatigue assessment. To validate our methodology, we invited 20 pilots to participate in a 90-minute continuous monitoring experiment, during which we collected multimodal data including their eye movements, electroencephalogram (EEG), electrocardiogram (ECG), and video. First, we derive explicit representation functions for sleepiness and mental fatigue using symbolic regression on facial and behavioral cues, enabling continuous subtype related labeling beyond intermittent questionnaires. Second, we identify compact physiological marker subsets via a cascaded feature selection method that combines mRMR prescreening with a heuristic search, yielding key feature sets while substantially reducing dimensionality. Finally, dynamic weighted coupling analysis based on information entropy revealed the nonlinear superposition effects between sleepiness and mental fatigue. Using 30 s windows under the current cohort and evaluation setting, the resulting comprehensive classifier achieves 94.8% accuracy. Following external validation and domain-specific adaptations, the methodology developed in this study holds broad application prospects across numerous automation scenarios involving monotonous human–machine interaction tasks.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103366"},"PeriodicalIF":3.4,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Trans-MT: a 3D semi-supervised glioma segmentation model integrating transformer architecture and asymmetric data augmentation
Pub Date: 2026-01-27 | DOI: 10.1016/j.displa.2026.103365 | Displays 93, Article 103365
Yuehui Liao , Yun Zheng , Yingjie Jiao , Na Tang , Yuhao Wang , Yu Hu , Yaning Feng , Ruofan Wang , Qun Jin , Xiaobo Lai , Panfei Li
Accurate glioma segmentation in magnetic resonance imaging (MRI) is crucial for effective diagnosis and treatment planning in neuro-oncology; however, this process is often time-consuming and heavily reliant on expert annotations. To address these limitations, we present Trans-MT, a 3D semi-supervised segmentation model that integrates a transformer-based architecture with asymmetric data augmentation, achieving high segmentation accuracy with limited labeled data. Trans-MT employs a teacher-student framework: the teacher model generates reliable pseudo-labels for unlabeled data, while the student model learns through supervised and consistency losses, guided by an uncertainty-aware mechanism to refine its predictions. The architecture of Trans-MT features a hybrid encoder, nnUFormer, which combines the robust capabilities of nn-UNet with transformers, enabling it to capture global contextual information essential for accurate tumor segmentation. This design enhances the model’s ability to detect intricate tumor structures within MRI scans, even with sparse annotations. Additionally, the model’s learning process is strengthened by asymmetric data augmentation, which enriches data diversity and robustness. We evaluated Trans-MT on the BraTS 2019, 2020, and 2021 datasets, where it demonstrated superior performance over several state-of-the-art semi-supervised models, particularly in segmenting challenging tumor sub-regions. The results confirm that Trans-MT significantly improves segmentation precision, making it a valuable advancement in brain tumor segmentation methodology and a practical solution for clinical settings with limited labeled data. Our code is available at https://github.com/smallboy-code/TransMT.
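The teacher-student mechanics follow the standard mean-teacher pattern; below is a minimal sketch with a toy 3D network standing in for nnUFormer (an assumption), and the uncertainty-aware weighting omitted.

```python
# Sketch: EMA teacher generates soft pseudo-labels; the student is trained
# with a supervised loss on labeled data plus a consistency loss on
# unlabeled data. Architectures and loss weights are placeholders.
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    # Teacher weights track an exponential moving average of the student.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(alpha).add_(ps, alpha=1 - alpha)

def semi_supervised_step(student, teacher, x_lab, y_lab, x_unlab, lam=0.1):
    sup = F.cross_entropy(student(x_lab), y_lab)
    with torch.no_grad():
        pseudo = teacher(x_unlab).softmax(dim=1)   # soft pseudo-labels
    cons = F.mse_loss(student(x_unlab).softmax(dim=1), pseudo)
    return sup + lam * cons

# Toy 3D segmentation nets standing in for nnUFormer.
student = torch.nn.Conv3d(1, 2, 3, padding=1)
teacher = torch.nn.Conv3d(1, 2, 3, padding=1)
teacher.load_state_dict(student.state_dict())

x_lab = torch.randn(2, 1, 8, 16, 16)
y_lab = torch.randint(0, 2, (2, 8, 16, 16))
loss = semi_supervised_step(student, teacher, x_lab, y_lab, torch.randn(2, 1, 8, 16, 16))
loss.backward()
ema_update(teacher, student)
print(float(loss))
```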
{"title":"Trans-MT: a 3D semi-supervised glioma segmentation model integrating transformer architecture and asymmetric data augmentation","authors":"Yuehui Liao , Yun Zheng , Yingjie Jiao , Na Tang , Yuhao Wang , Yu Hu , Yaning Feng , Ruofan Wang , Qun Jin , Xiaobo Lai , Panfei Li","doi":"10.1016/j.displa.2026.103365","DOIUrl":"10.1016/j.displa.2026.103365","url":null,"abstract":"<div><div>Accurate glioma segmentation in magnetic resonance imaging (MRI) is crucial for effective diagnosis and treatment planning in neuro-oncology; however, this process is often time-consuming and heavily reliant on expert annotations. To address these limitations, we present Trans-MT, a 3D semi-supervised segmentation model that integrates a transformer-based architecture with asymmetric data augmentation, achieving high segmentation accuracy with limited labeled data. Trans-MT employs a teacher-student framework: the teacher model generates reliable pseudo-labels for unlabeled data, while the student model learns through supervised and consistency losses, guided by an uncertainty-aware mechanism to refine its predictions. The architecture of Trans-MT features a hybrid encoder, nnUFormer, which combines the robust capabilities of nn-UNet with transformers, enabling it to capture global contextual information essential for accurate tumor segmentation. This design enhances the model’s ability to detect intricate tumor structures within MRI scans, even with sparse annotations. Additionally, the model’s learning process is strengthened by asymmetric data augmentation, which enriches data diversity and robustness. We evaluated Trans-MT on the BraTS 2019, 2020, and 2021 datasets, where it demonstrated superior performance over several state-of-the-art semi-supervised models, particularly in segmenting challenging tumor sub-regions. The results confirm that Trans-MT significantly improves segmentation precision, making it a valuable advancement in brain tumor segmentation methodology and a practical solution for clinical settings with limited labeled data. Our code is available<!--> <!-->at <span><span>https://github.com/smallboy-code/TransMT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103365"},"PeriodicalIF":3.4,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146090217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning video normality for anomaly detection via multi-scale spatiotemporal feature extraction and a feature memory module
Pub Date: 2026-01-22 | DOI: 10.1016/j.displa.2026.103355 | Displays 92, Article 103355
Yongqing Huo, Wenke Jiang
Video anomaly detection (VAD) is critical for automated identification of anomalous behaviors in surveillance systems, with applications in public safety, intelligent transportation and healthcare. However, with the continuous expansion of application domains, ensuring that VAD algorithms maintain excellent detection performance across diverse scenarios has become a primary focus of current research. To enhance the robustness of detection across various environments, we propose a novel autoencoder-based model in this paper. Compared with other algorithms, our method can more effectively exploit multi-scale feature information within frames for learning feature distribution. In the encoder, we construct the convolutional module with multiple kernel sizes and incorporate the designed Spatial-Channel Transformer Attention (SCTA) module to strengthen the feature representation. In the decoder, we integrate the multi-scale feature reconstruction module with Self-Supervised Predictive Convolutional Attentive Blocks (SSPCAB) for more accurate next-frame prediction. Moreover, we introduce a dedicated memory module to capture and store the distribution of normal data patterns. Meanwhile, the architecture employs the Conv-LSTM and a specially designed Temporal-Spatial Attention (TSA) module in skip connections to capture spatiotemporal dependencies across video frames. Benefiting from the design and integration of those modules, our proposed method achieves superior detection performance on public datasets, including UCSD Ped2, CUHK Avenue and ShanghaiTech. The experimental results demonstrate the effectiveness and versatility of our method in anomaly detection tasks.
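A minimal sketch of a feature memory module of the kind described: encoder features are re-expressed as attention-weighted combinations of learned "normal" memory items, so anomalous features reconstruct poorly. Sizes and the cosine addressing scheme are assumptions.

```python
# Sketch: memory read via cosine-similarity attention over learned items.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryModule(nn.Module):
    def __init__(self, n_items=100, dim=256):
        super().__init__()
        self.items = nn.Parameter(torch.randn(n_items, dim))  # normal patterns

    def forward(self, z):                  # z: (B, dim) encoder features
        attn = F.softmax(F.normalize(z, dim=1) @
                         F.normalize(self.items, dim=1).t(), dim=1)
        z_hat = attn @ self.items          # read: weighted sum of memory items
        return z_hat, attn

mem = MemoryModule()
z = torch.randn(4, 256)
z_hat, attn = mem(z)
# Anomaly evidence: feature poorly expressed by the normal memory items.
print(F.mse_loss(z_hat, z, reduction="none").mean(dim=1))
```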
{"title":"Learning video normality for anomaly detection via multi-scale spatiotemporal feature extraction and a feature memory module","authors":"Yongqing Huo, Wenke Jiang","doi":"10.1016/j.displa.2026.103355","DOIUrl":"10.1016/j.displa.2026.103355","url":null,"abstract":"<div><div>Video anomaly detection (VAD) is critical for automated identification of anomalous behaviors in surveillance system, with applications in public safety, intelligent transportation and healthcare. However, with the continuous expansion of application domains, ensuring that VAD algorithm maintains excellent detection performance across diverse scenarios has become the primary focus of current research direction. To enhance the robustness of detection across various environments, we propose a novel autoencoder-based model in this paper. Compared with other algorithms, our method can more effectively exploit multi-scale feature information within frames for learning feature distribution. In the encoder, we construct the convolutional module with multiple kernel sizes and incorporate the designed Spatial-Channel Transformer Attention (SCTA) module to strengthen the feature representation. In the decoder, we integrate the multi-scale feature reconstruction module with Self-Supervised Predictive Convolutional Attentive Blocks (SSPCAB) for more accurate next-frame prediction. Moreover, we introduce a dedicated memory module to capture and store the distribution of normal data patterns. Meanwhile, the architecture employs the Conv-LSTM and a specially designed Temporal-Spatial Attention (TSA) module in skip connections to capture spatiotemporal dependencies across video frames. Benefiting from the design and integration of those modules, our proposed method achieves superior detection performance on public datasets, including UCSD Ped2, CUHK Avenue and ShanghaiTech. The experimental results demonstrate the effectiveness and versatility of our method in anomaly detection tasks.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103355"},"PeriodicalIF":3.4,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-attention-based mixture-of-experts framework for non-invasive prediction of MGMT promoter methylation in glioblastoma using multi-modal MRI
Pub Date: 2026-01-21 | DOI: 10.1016/j.displa.2026.103358 | Displays 92, Article 103358
Yuehui Liao , Yun Zheng , Jingyu Zhu , Yu Chen , Feng Gao , Yaning Feng , Weiji Yang , Guang Yang , Xiaobo Lai , Panfei Li
Glioblastoma (GBM) is an aggressive brain tumor associated with poor prognosis and limited treatment options. The methylation status of the O6-methylguanine-DNA methyltransferase (MGMT) promoter is a critical biomarker for predicting the efficacy of temozolomide chemotherapy in GBM patients. However, current methods for determining MGMT promoter methylation are invasive and costly, which hinders their widespread clinical application. In this study, we propose a novel non-invasive deep learning framework based on a Mixture-of-Experts (MoE) architecture for predicting MGMT promoter methylation status using multi-modal magnetic resonance imaging (MRI) data. Our MoE model incorporates modality-specific expert networks built on the ResNet18 architecture, with a self-attention-based gating mechanism that dynamically selects and integrates the most relevant features across MRI modalities (T1-weighted, contrast-enhanced T1, T2-weighted, and fluid-attenuated inversion recovery). We evaluate the proposed framework on the BraTS2021 and TCGA-GBM datasets, showing superior performance compared to conventional deep learning models in terms of accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Furthermore, Grad-CAM visualizations provide enhanced interpretability by highlighting biologically relevant regions in the tumor and peritumoral areas that influence model predictions. The proposed framework represents a promising tool for integrating imaging biomarkers into precision oncology workflows, offering a scalable, cost-effective, and interpretable solution for non-invasive MGMT methylation prediction in GBM.
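A minimal sketch of the modality-expert plus attention-gating design, with assumed layer sizes and single-channel MRI slices replicated to three channels for ResNet18; the paper's exact gating mechanism is not reproduced.

```python
# Sketch: one ResNet18 expert per MRI modality; self-attention over the
# per-modality feature vectors acts as the gate before classification.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MoEClassifier(nn.Module):
    def __init__(self, n_modalities=4, dim=512):
        super().__init__()
        self.experts = nn.ModuleList()
        for _ in range(n_modalities):
            e = resnet18(weights=None)
            e.fc = nn.Identity()           # keep the 512-d backbone features
            self.experts.append(e)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, 2)      # methylated vs. unmethylated

    def forward(self, xs):                 # xs: list of (B, 3, H, W) inputs
        feats = torch.stack([e(x) for e, x in zip(self.experts, xs)], dim=1)
        fused, weights = self.attn(feats, feats, feats)   # (B, M, 512)
        return self.head(fused.mean(dim=1)), weights

model = MoEClassifier()
xs = [torch.randn(2, 3, 64, 64) for _ in range(4)]  # T1, T1ce, T2, FLAIR slices
logits, w = model(xs)
print(logits.shape, w.shape)   # (2, 2) class logits, (2, 4, 4) gate weights
```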
{"title":"Self-attention-based mixture-of-experts framework for non-invasive prediction of MGMT promoter methylation in glioblastoma using multi-modal MRI","authors":"Yuehui Liao , Yun Zheng , Jingyu Zhu , Yu Chen , Feng Gao , Yaning Feng , Weiji Yang , Guang Yang , Xiaobo Lai , Panfei Li","doi":"10.1016/j.displa.2026.103358","DOIUrl":"10.1016/j.displa.2026.103358","url":null,"abstract":"<div><div>Glioblastoma (GBM) is an aggressive brain tumor associated with poor prognosis and limited treatment options. The methylation status of the O6-methylguanine-DNA methyltransferase (MGMT) promoter is a critical biomarker for predicting the efficacy of temozolomide chemotherapy in GBM patients. However, current methods for determining MGMT promoter methylation, including invasive and costly techniques, hinder their widespread clinical application. In this study, we propose a novel non-invasive deep learning framework based on a Mixture-of-Experts (MoE) architecture for predicting MGMT promoter methylation status using multi-modal magnetic resonance imaging (MRI) data. Our MoE model incorporates modality-specific expert networks built on the ResNet18 architecture, with a self-attention-based gating mechanism that dynamically selects and integrates the most relevant features across MRI modalities (T1-weighted, contrast-enhanced T1, T2-weighted, and fluid-attenuated inversion recovery). We evaluate the proposed framework on the BraTS2021 and TCGA-GBM datasets, showing superior performance compared to conventional deep learning models in terms of accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Furthermore, Grad-CAM visualizations provide enhanced interpretability by highlighting biologically relevant regions in the tumor and peritumoral areas that influence model predictions. The proposed framework represents a promising tool for integrating imaging biomarkers into precision oncology workflows, offering a scalable, cost-effective, and interpretable solution for non-invasive MGMT methylation prediction in GBM.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103358"},"PeriodicalIF":3.4,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Harnessing differentiable geometry and orientation attention for semi-supervised vessel segmentation with limited annotations
Pub Date: 2026-01-21 | DOI: 10.1016/j.displa.2026.103347 | Displays 92, Article 103347
Yan Liu , Yan Yang , Yongquan Jiang , Xiaole Zhao , Liang Fan
The precise segmentation of vascular structures is vital for diagnosing retinal and coronary artery diseases. However, the complex morphology and large structural variability of blood vessels make manual annotation time-consuming, and annotated data remain scarce, which in turn limits the scalability of supervised segmentation methods. We propose a semi-supervised segmentation framework named geometric orientational fusion attention network (GOFA-Net) that integrates differentiable geometric augmentation and orientation-aware attention to effectively leverage knowledge from limited annotations. GOFA-Net comprises three key complementary components: 1) a differentiable geometric augmentation strategy (DGAS) employs quaternion-based representations to diversify training samples while preserving prediction consistency between teacher and student models; 2) a multi-view fusion module (MVFM) orchestrates collaborative feature learning between quaternion and conventional convolutional streams to capture comprehensive spatial dependencies; and 3) a global orientational attention module (GOAM) enhances structural awareness through direction-sensitive geometric embeddings, specifically reinforcing the perception of vascular topology along horizontal and vertical orientations. Extensive validation on multiple retinal vessel datasets (DRIVE, STARE, CHASE_DB1, and HRF) and coronary angiography datasets (DCA1 and CHUAC) show that GOFA-Net consistently outperforms state-of-the-art semi-supervised methods, achieving particularly notable gains in scenarios with limited annotations.
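One plausible reading of the GOAM's horizontal/vertical sensitivity, sketched as a coordinate-attention-style block; this is our own illustration under stated assumptions, not the paper's module.

```python
# Sketch: pool features separately along the horizontal and vertical axes so
# the gate can emphasize elongated vessel structures in each orientation.
import torch
import torch.nn as nn

class OrientationalAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(channels // reduction, 8)
        self.squeeze = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True))
        self.gate_h = nn.Conv2d(mid, channels, 1)
        self.gate_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        ph = x.mean(dim=3, keepdim=True)        # pool along width  -> (B,C,H,1)
        pw = x.mean(dim=2, keepdim=True)        # pool along height -> (B,C,1,W)
        s = self.squeeze(torch.cat([ph, pw.transpose(2, 3)], dim=2))
        sh, sw = s.split([h, w], dim=2)
        a = torch.sigmoid(self.gate_h(sh)) * \
            torch.sigmoid(self.gate_w(sw.transpose(2, 3)))
        return x * a                            # (B,C,H,1)*(B,C,1,W) broadcast

x = torch.randn(1, 32, 64, 64)
print(OrientationalAttention(32)(x).shape)      # torch.Size([1, 32, 64, 64])
```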
{"title":"Harnessing differentiable geometry and orientation attention for semi-supervised vessel segmentation with limited annotations","authors":"Yan Liu , Yan Yang , Yongquan Jiang , Xiaole Zhao , Liang Fan","doi":"10.1016/j.displa.2026.103347","DOIUrl":"10.1016/j.displa.2026.103347","url":null,"abstract":"<div><div>The precise segmentation of vascular structures is vital for diagnosing retinal and coronary artery diseases. However, the complex morphology and large structural variability of blood vessels make manual annotation time-consuming and finite which in turn limits the scalability of supervised segmentation methods. We propose a semi-supervised segmentation framework named geometric orientational fusion attention network (GOFA-Net) that integrates differentiable geometric augmentation and orientation-aware attention to effectively leverage knowledge from limited annotations. GOFA-Net comprises three key complementary components: 1) a differentiable geometric augmentation strategy (DGAS) employs quaternion-based representations to diversify training samples while preserving prediction consistency between teacher and student models; 2) a multi-view fusion module (MVFM) orchestrates collaborative feature learning between quaternion and conventional convolutional streams to capture comprehensive spatial dependencies; and 3) a global orientational attention module (GOAM) enhances structural awareness through direction-sensitive geometric embeddings, specifically reinforcing the perception of vascular topology along horizontal and vertical orientations. Extensive validation on multiple retinal vessel datasets (DRIVE, STARE, CHASE_DB1, and HRF) and coronary angiography datasets (DCA1 and CHUAC) show that GOFA-Net consistently outperforms state-of-the-art semi-supervised methods, achieving particularly notable gains in scenarios with limited annotations.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103347"},"PeriodicalIF":3.4,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}