Bridging the performance gap of 3D object detection in adverse weather conditions via camera-radar distillation (ChinaMM)
Pub Date: 2026-04-01 | Epub Date: 2025-12-11 | DOI: 10.1016/j.displa.2025.103320
Chongze Wang, Ruiqi Cheng, Haoqing Yu, Xuan Gong, Hai-Miao Hu
Robust 3D object detection in adverse weather remains difficult because environmental noise degrades both sensors and algorithms. In this paper, we propose a novel camera-radar-based 3D object detection framework that leverages cross-modality knowledge distillation to improve detection accuracy in adverse conditions such as rain and snow. Specifically, we introduce a teacher-student training paradigm, where a teacher model trained under clear weather guides a student model trained under weather-degraded environments. We design three novel distillation losses, focusing on spatial alignment, semantic consistency, and prediction refinement between modalities, to facilitate effective knowledge transfer. Moreover, a weather simulation module generates adverse-weather-like input, enabling the student model to better learn robust features under challenging conditions. A gated fusion module is also integrated to adaptively fuse camera and radar features, enhancing robustness to modality-specific degradation. Experimental results on the nuScenes dataset show that our model outperforms multiple state-of-the-art methods across common detection metrics (mAP, NDS) and per-class AP, particularly under challenging weather, with improvements of 3.5–3.9% mAP and 4.3–4.8% NDS in rainy and snowy scenes.
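The gated fusion idea lends itself to a compact illustration. Below is a minimal PyTorch sketch, not the authors' code: the module name, feature shapes, and exact gating form are assumptions. It only shows how a learned gate can down-weight a weather-degraded modality per location and channel.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Minimal sketch of adaptive camera-radar gating (hypothetical, not the
    paper's code). Inputs are assumed to be BEV feature maps of shape (B, C, H, W)."""
    def __init__(self, channels: int):
        super().__init__()
        # The gate predicts a per-location, per-channel mixing weight in [0, 1].
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, cam_feat: torch.Tensor, radar_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([cam_feat, radar_feat], dim=1))
        # g -> 1 trusts the camera; g -> 0 trusts the radar
        # (e.g., when rain or snow corrupts the image features).
        return g * cam_feat + (1.0 - g) * radar_feat

# Example: fuse 64-channel BEV features from the two modalities.
fusion = GatedFusion(64)
fused = fusion(torch.randn(2, 64, 128, 128), torch.randn(2, 64, 128, 128))
```

Because the gate is predicted jointly from both modalities, the network can shift trust toward radar exactly where the camera evidence is degraded.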
{"title":"Bridging the performance gap of 3D object detection in adverse weather conditions via camera-radar distillation (ChinaMM)","authors":"Chongze Wang , Ruiqi Cheng , Haoqing Yu , Xuan Gong , Hai-Miao Hu","doi":"10.1016/j.displa.2025.103320","DOIUrl":"10.1016/j.displa.2025.103320","url":null,"abstract":"<div><div>Robust 3D object detection in challenging weather scenarios remains a significant challenge due to sensor and algorithm degradation caused by various environmental noises. In this paper, we propose a novel camera-radar-based 3D object detection framework that leverages a cross-modality knowledge distillation method to improve detection accuracy in adverse conditions, such as rain and snow. Specifically, we introduce a teacher-student training paradigm, where the teacher model is trained under clear weather and guides the student model trained under weather-degraded environments. We design three novel distillation losses focusing on spatial alignment, semantic consistency, and prediction refinement between different modalities to facilitate effective knowledge transfer. Moreover, a weather simulation module is introduced to generate adverse-weather-like input, enabling the student model to learn robust features under challenging conditions better. A gated fusion module is also integrated to adaptively fuse camera and radar features, enhancing robustness to modality-specific degradation. Experimental results on the nuScenes dataset reveal our model outperforms multiple state-of-the-art methods, achieving superior results across common detection metrics (mAP, NDS) and per-class AP, particularly under challenging weather, showing improvements of 3.5–3.9 % mAP and 4.3–4.8 % NDS in rainy and snowy scenes.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103320"},"PeriodicalIF":3.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145796852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinspired micro-/nano-composite structures for simultaneous enhancement of light extraction efficiency and output uniformity in Micro-LEDs
Pub Date: 2026-04-01 | Epub Date: 2025-11-13 | DOI: 10.1016/j.displa.2025.103286
Jingyu Liu, Jiawei Zhang, Zhenyou Zou, Yibin Lin, Jinyu Ye, Wenfu Huang, Chaoxing Wu, Yongai Zhang, Jie Sun, Qun Yan, Xiongtu Zhou
The strong total internal reflection (TIR) in micro light-emitting diodes (Micro-LEDs) significantly limits light extraction efficiency (LEE) and uniformity of light distribution, thereby hindering their industrial applications. Inspired by the layered surface structures found in firefly lanterns, this study proposes a flexible bioinspired micro-/nano-composite structure that effectively enhances both LEE and the uniformity of light output. Finite-Difference Time-Domain (FDTD) simulations demonstrate that microstructures contribute to directional light extraction, whereas nanostructures facilitate overall optical optimization. A novel fabrication approach integrating grayscale photolithography, mechanical stretching, and plasma treatment was developed, enabling the realization of micro-/nano-composite structures with tunable design parameters. Experimental results indicate a 40.5% increase in external quantum efficiency (EQE) and a 41.6% improvement in power efficiency (PE) for blue Micro-LEDs, accompanied by enhanced angular light distribution, leading to wider viewing angles and near-ideal light uniformity. This advancement effectively resolves the longstanding challenge of balancing efficiency and uniformity in light extraction, thereby facilitating the industrialization of Micro-LED technology.
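As context for why TIR is so limiting, a standard escape-cone estimate (textbook optics, not a result from the paper; the refractive index is an assumed typical GaN value) shows that only a few percent of isotropically emitted light exits an unstructured top surface:

```python
import math

def escape_cone_fraction(n_inside: float, n_outside: float = 1.0) -> float:
    """Fraction of isotropically emitted light inside the top-surface escape cone.
    Standard textbook estimate; ignores Fresnel losses and photon recycling."""
    theta_c = math.asin(n_outside / n_inside)   # critical angle for TIR
    return (1.0 - math.cos(theta_c)) / 2.0      # solid-angle fraction of the cone

# Assumed typical GaN refractive index of ~2.4: the critical angle is ~24.6 deg
# and only ~4.5% of the light escapes the flat top surface, which is why
# micro-/nano-structuring of the surface pays off so strongly.
print(f"critical angle: {math.degrees(math.asin(1 / 2.4)):.1f} deg")
print(f"escape fraction: {escape_cone_fraction(2.4):.3f}")
```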
{"title":"Bioinspired micro-/nano-composite structures for simultaneous enhancement of light extraction efficiency and output uniformity in Micro-LEDs","authors":"Jingyu Liu , Jiawei Zhang , Zhenyou Zou , Yibin Lin , Jinyu Ye , Wenfu Huang , Chaoxing Wu , Yongai Zhang , Jie Sun , Qun Yan , Xiongtu Zhou","doi":"10.1016/j.displa.2025.103286","DOIUrl":"10.1016/j.displa.2025.103286","url":null,"abstract":"<div><div>The strong total internal reflection (TIR) in micro light-emitting diodes (Micro-LEDs) significantly limits light extraction efficiency (LEE) and uniformity of light distribution, thereby hindering their industrial applications. Inspired by the layered surface structures found in firefly lanterns, this study proposes a flexible bioinspired micro-/nano-composite structure that effectively enhances both LEE and the uniformity of light output. Finite-Difference Time-Domain (FDTD) simulations demonstrate that microstructures contribute to directional light extraction, whereas nanostructures facilitate overall optical optimization. A novel fabrication approach integrating grayscale photolithography, mechanical stretching, and plasma treatment was developed, enabling the realization of micro-/nano-composite structures with tunable design parameters. Experimental results indicate a 40.5% increase in external quantum efficiency (EQE) and a 41.6% improvement in power efficiency (PE) for blue Micro-LEDs, accompanied by enhanced angular light distribution, leading to wider viewing angles and near-ideal light uniformity. This advancement effectively resolves the longstanding challenge of balancing efficiency and uniformity in light extraction, thereby facilitating the industrialization of Micro-LED technology.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103286"},"PeriodicalIF":3.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145580484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lightweight deformable attention for event-based monocular depth estimation
Pub Date: 2026-04-01 | DOI: 10.1016/j.displa.2025.103303
Jianye Yang, Shaofan Wang, Jingyi Wang, Yanfeng Sun, Baocai Yin
Event cameras are neuromorphically inspired sensors that output brightness changes as a stream of asynchronous events instead of intensity frames. Event-based monocular depth estimation underpins a wide range of high-dynamic vision applications. Existing monocular depth estimation networks, such as CNNs and transformers, suffer from insufficient exploration of spatio-temporal correlation and from high complexity. In this paper, we propose the Lightweight Deformable Attention Network (LDANet) to circumvent these two issues. The key component of LDANet is the Mixed Attention with Temporal Embedding (MATE) module, which consists of a lightweight deformable attention layer and a temporal embedding layer. The former, an improvement of deformable attention, is equipped with a drifted token representation and a K-nearest multi-head deformable-attention block, capturing locally spatial correlation. The latter is equipped with a cross-attention layer that queries the previous temporal event frame, encouraging the network to memorize the history of depth clues and capturing temporal correlation. Experiments on a real-scenario dataset and a simulated-scenario dataset show that LDANet achieves a satisfactory balance between inference efficiency and depth estimation accuracy. The code is available at https://github.com/wangsfan/LDA.
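The temporal-embedding idea, current-frame tokens querying the previous event frame via cross-attention, can be sketched as follows. This is a hypothetical simplification: the class name and dimensions are invented, and the paper's lightweight K-nearest deformable attention layer is omitted.

```python
import torch
import torch.nn as nn

class TemporalEmbedding(nn.Module):
    """Sketch of the temporal layer: current-frame tokens cross-attend to the
    previous event frame (hypothetical simplification, dimensions assumed)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, curr_tokens: torch.Tensor, prev_tokens: torch.Tensor) -> torch.Tensor:
        # Querying the previous frame lets depth cues observed earlier
        # propagate into the current estimate (temporal correlation).
        out, _ = self.attn(query=curr_tokens, key=prev_tokens, value=prev_tokens)
        return self.norm(curr_tokens + out)

# Example: 196 tokens of width 128 from two consecutive event frames.
te = TemporalEmbedding(128)
fused = te(torch.randn(2, 196, 128), torch.randn(2, 196, 128))
```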
{"title":"Lightweight deformable attention for event-based monocular depth estimation","authors":"Jianye Yang, Shaofan Wang, Jingyi Wang, Yanfeng Sun, Baocai Yin","doi":"10.1016/j.displa.2025.103303","DOIUrl":"10.1016/j.displa.2025.103303","url":null,"abstract":"<div><div>Event cameras are neuromorphically inspired sensors that output brightness changes in the form of a stream of asynchronous events instead of intensity frames. Event-based monocular depth estimation forms a foundation of widespread high dynamic vision applications. Existing monocular depth estimation networks, such as CNNs and transformers, suffer from the insufficient exploration of spatio-temporal correlation, and the high complexity. In this paper, we propose the <u>L</u>ightweight <u>D</u>eformable <u>A</u>ttention <u>Net</u>work (LDANet) for circumventing the two issues. The key component of LDANet is the Mixed Attention with Temporal Embedding (MATE) module, which consists of a lightweight deformable attention layer and a temporal embedding layer. The former, as an improvement of deformable attention, is equipped with a drifted token representation and a <span><math><mi>K</mi></math></span>-nearest multi-head deformable-attention block, capturing the locally-spatial correlation. The latter is equipped with a cross-attention layer by querying the previous temporal event frame, encouraging to memorize the history of depth clues and capturing temporal correlation. Experiments on a real scenario dataset and a simulation scenario dataset show that, LDANet achieves a satisfactory balance between the inference efficiency and depth estimation accuracy. The code is available at <span><span>https://github.com/wangsfan/LDA</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103303"},"PeriodicalIF":3.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145693721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differences in streaming quality impact viewer expectations, attitudes and reactions to video
Pub Date: 2026-04-01 | Epub Date: 2026-01-12 | DOI: 10.1016/j.displa.2026.103350
Christopher A. Sanchez, Nisha Raghunath, Chelsea Ahart
Given the massive amount of visual media consumed across the world every day, an open question is whether deviations from high-quality streaming can negatively impact viewers' opinions of and attitudes towards viewed content. Previous research has shown that reductions in perceptual quality can negatively impact attitudes in other contexts, and such changes in quality often lead to corresponding changes in attitudes. Are users sensitive to changes in video quality, and does this affect reactions to viewed content? For example, do users enjoy lower-quality videos as much as higher-quality versions? Do quality differences also make viewers less receptive to the content of videos? Across two studies, participants watched a video in lower or higher quality and were then queried about their viewing experience, including ratings of attitudes towards video streaming and video content as well as measures of factual recall. Results indicated that viewers significantly prefer videos presented in higher quality, which drives future viewing intentions. Further, while factual memory for information was equivalent across video quality, participants who viewed the higher-quality video were more likely to show an affective reaction to the video and to change their attitudes relative to the presented content. These results have implications for the design and delivery of online video content, and suggest that deviations from higher-quality presentation can bias opinions of the viewed content: lower-quality videos worsened attitudes towards the content and made viewers less receptive to it.
{"title":"Differences in streaming quality impact viewer expectations, attitudes and reactions to video","authors":"Christopher A. Sanchez, Nisha Raghunath, Chelsea Ahart","doi":"10.1016/j.displa.2026.103350","DOIUrl":"10.1016/j.displa.2026.103350","url":null,"abstract":"<div><div>Given the massive amount of visual media consumed across the world everyday, an open question is whether deviations from high-quality streaming can negatively impact viewer’s opinions and attitudes towards viewed content? Previous research has shown that reductions in perceptual quality can negatively impact attitudes in other contexts. These changes in quality often lead to corresponding changes in attitudes. Are users sensitive to changes in video quality, and does this impact reactions to viewed content? For example, do users enjoy lower quality videos as much as higher-quality versions? Do quality differences also make viewers less receptive to the content of videos? Across two studies, participants watched a video in lower- or higher-quality, and were then queried regarding their viewing experience. This included ratings of attitudes towards video streaming and video content, and also included measures of factual recall. Results indicated that viewers significantly prefer videos presented in higher quality, which drives future viewing intentions. Further, while factual memory for information was equivalent across video quality, participants who viewed the higher-quality video were more likely to show an affective reaction to the video, and also change their attitudes relative to the presented content. These results have implications for the design and delivery of online video content, and suggests that any deviations from higher-quality presentations can bias opinions relative to the viewed content. Lower-quality videos decreased attitudes towards content, and also negatively impacted viewers’ receptiveness to presented content.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103350"},"PeriodicalIF":3.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new iterative inverse display model
Pub Date: 2026-04-01 | Epub Date: 2026-01-16 | DOI: 10.1016/j.displa.2026.103342
María José Pérez-Peñalver, S.-W. Lee, Cristina Jordán, Esther Sanabria-Codesal, Samuel Morillas
In this paper, we propose a new inverse model for display characterization based on the direct model developed in Kim and Lee (2015). We use an iterative method to compute the inputs that produce a desired color expressed in device-independent color coordinates. Whereas iterative approaches have been used for this task in the past, the main novelty of our proposal is the use of specific heuristics, based on the underlying display model and color-science principles, to achieve efficient and accurate convergence. On the one hand, to set the initial point of the iterative process, we orthogonally project the desired color chromaticity, xy, onto the display's chromaticity triangle to find the initial ratio that the RGB coordinates need to have. Subsequently, we apply a multiplicative factor, preserving the RGB proportions, to initially approximate the desired color's luminance; this factor is obtained through a nonlinear model of the relation between RGB and luminance. On the other hand, to reduce the number of iterations needed, we use the direct model mentioned above: to set the RGB values of the next iteration, we look at the differences between the color predicted by the direct model for the current RGB values and the desired color coordinates, treating chromaticity and luminance separately, following the same reasoning as for the initial point. As the experimental results show, the method is accurate, efficient, and robust. With respect to the state of the art, it performs especially well on low-quality displays, where the physical assumptions made by other models do not fully hold.
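The overall loop can be sketched end to end. The surrogate below is not the paper's model: it swaps the Kim and Lee (2015) direct model for an assumed matrix-plus-gamma model with mild channel leakage, and collapses the separate chromaticity/luminance handling into a single XYZ correction per iteration; the matrix, gamma, and leakage factor are illustrative only.

```python
import numpy as np

# Stand-in direct model: an assumed primary matrix plus gamma, with a small
# cross-channel leakage so the inversion has no closed form. The real direct
# model is the one from Kim and Lee (2015); all constants here are illustrative.
M = np.array([[0.4124, 0.3576, 0.1805],
              [0.2126, 0.7152, 0.0722],
              [0.0193, 0.1192, 0.9505]])
GAMMA = 2.2

def direct_model(rgb: np.ndarray) -> np.ndarray:
    lin = np.clip(rgb, 0.0, 1.0) ** GAMMA
    lin = lin + 0.02 * lin[::-1]       # mild channel interaction (non-ideal display)
    return M @ lin                     # linear RGB -> XYZ

def inverse_model(xyz_target: np.ndarray, iters: int = 20) -> np.ndarray:
    Minv = np.linalg.inv(M)
    lin = np.clip(Minv @ xyz_target, 1e-6, 1.0)    # initial guess from the primaries
    for _ in range(iters):
        rgb = lin ** (1.0 / GAMMA)
        err = xyz_target - direct_model(rgb)        # prediction vs. desired color
        lin = np.clip(lin + Minv @ err, 1e-6, 1.0)  # quasi-Newton step in linear RGB
    return lin ** (1.0 / GAMMA)

rgb_true = np.array([0.2, 0.6, 0.4])
print(inverse_model(direct_model(rgb_true)))   # converges toward rgb_true
```

The update is a quasi-Newton fixed point: as long as the approximate inverse matrix roughly matches the true Jacobian of the direct model, the residual shrinks geometrically.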
{"title":"A new iterative inverse display model","authors":"María José Pérez-Peñalver , S.-W. Lee , Cristina Jordán , Esther Sanabria-Codesal , Samuel Morillas","doi":"10.1016/j.displa.2026.103342","DOIUrl":"10.1016/j.displa.2026.103342","url":null,"abstract":"<div><div>In this paper, we propose a new inverse model for display characterization based on the direct model developed in Kim and Lee (2015). We use an iterative method to compute what inputs are able to produce a desired color expressed in device independent color coordinates. Whereas iterative approaches have been used in the past for this task, the main novelty in our proposal is the use of specific heuristics based on the former display model and color science principles to achieve an efficient and accurate convergence. On the one hand, to set the initial point of the iterative process, we use orthogonal projections of the desired color chromaticity, <span><math><mrow><mi>x</mi><mi>y</mi></mrow></math></span>, onto the display’s chromaticity triangle to find the initial ratio the RGB coordinates need to have. Subsequently, we use a factor product, preserving RGB proportions, to initially approximate the desired color’s luminance. This factor is obtained through a nonlinear modeling of the relation between RGB and luminance. On the other hand, to reduce the number of iterations needed, we use the direct model mentioned above: to set the RGB values of the next iteration we look at the differences between color prediction provided by the direct model for the current RGB values and desired color coordinates but looking separately at chromaticity and luminance following the same reasoning as for the initial point. As we will see from the experimental results, the method is accurate, efficient and robust. With respect to state of the art, method performance is specially good for low quality displays where physical assumptions made by other models do not hold completely.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103342"},"PeriodicalIF":3.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design and evaluation of Avatar: An ultra-low-latency immersive human–machine interface for teleoperation
Pub Date: 2026-04-01 | Epub Date: 2025-11-19 | DOI: 10.1016/j.displa.2025.103292
Junjie Li, Dewei Han, Jian Xu, Kang Li, Zhaoyuan Ma
Spatially separated teleoperation is crucial for inaccessible or hazardous scenarios but requires intuitive human–machine interfaces (HMIs) to ensure situational awareness, especially visual perception. While 360° panoramic vision offers immersion and a wide field of view, its high latency reduces efficiency and quality and causes motion sickness. This paper presents the Avatar system, an ultra-low-latency panoramic vision platform for teleoperation and telepresence. Measured with a convenient method, Avatar's capture-to-display latency is only 220 ms. Two experiments with 43 participants demonstrated that Avatar achieves perception efficiency close to direct scene viewing in near-field visual search. Its ultra-low latency also ensured high efficiency and quality in teleoperation tasks. Analysis of subjective questionnaires and physiological indicators confirmed that Avatar provides operators with a strong sense of immersion and presence. The system's design and verification guide the development of future universal, efficient HMIs for diverse applications.
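The abstract does not specify the "convenient method" behind the 220 ms figure. For orientation, one common software-side approach is to stamp frames at capture and difference the clock at display time, as in the hypothetical sketch below. It assumes both ends see the same clock (e.g., a single-host loopback test) and misses sensor exposure and display scan-out, which full capture-to-display measurements typically catch with an external camera or photodiode.

```python
import time
from collections import deque

class LatencyProbe:
    """Stamp frames at capture, measure at display (hypothetical sketch).
    Assumes both ends share one clock, e.g. a single-host loopback test."""
    def __init__(self):
        self.samples = deque(maxlen=1000)

    def stamp(self, frame):
        return frame, time.perf_counter()          # attach capture timestamp

    def measure(self, stamped):
        frame, t_capture = stamped
        self.samples.append(time.perf_counter() - t_capture)
        return frame

    def mean_latency_ms(self) -> float:
        return 1000.0 * sum(self.samples) / max(len(self.samples), 1)

probe = LatencyProbe()
stamped = probe.stamp(b"frame-bytes")   # camera side
# ... encode, transmit, decode, render ...
probe.measure(stamped)                  # just before the frame is displayed
print(f"{probe.mean_latency_ms():.1f} ms in-pipeline latency")
```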
{"title":"Design and evaluation of Avatar: An ultra-low-latency immersive human–machine interface for teleoperation","authors":"Junjie Li , Dewei Han , Jian Xu , Kang Li , Zhaoyuan Ma","doi":"10.1016/j.displa.2025.103292","DOIUrl":"10.1016/j.displa.2025.103292","url":null,"abstract":"<div><div>Spatially separated teleoperation is crucial for inaccessible or hazardous scenarios but requires intuitive human–machine interfaces (HMIs) to ensure situational awareness, especially visual perception. While 360°panoramic vision offers immersion and a wide field of view, its high latency reduces efficiency and quality and causes motion sickness. This paper presents the Avatar system, an ultra-low-latency panoramic vision platform for teleoperation and telepresence. Using a convenient method, Avatar’s measured capture-to-display latency is only 220 ms. Two experiments with 43 participants demonstrated that Avatar achieves near-scene perception efficiency in near-field visual search. Its ultra-low latency also ensured high efficiency and quality in teleoperation tasks. Analysis of subjective questionnaires and physiological indicators confirmed that Avatar provides operators with intense immersion and presence. The system’s design and verification guide future universal, efficient HMI development for diverse applications.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103292"},"PeriodicalIF":3.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145580485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ultrasound image quality assessment of robot screening based on dual perspective multi feature collaboration
Pub Date: 2026-04-01 | Epub Date: 2025-12-15 | DOI: 10.1016/j.displa.2025.103319
Weihua He, Li Liang, Fei Ouyang, Guangming Yang, Peng Ding, Tong Zhang, Zhiyong Zhang
With advancements in robotics and artificial intelligence, Robotic Autonomous Ultrasound Screening (RAUSS) has emerged as a critical research area in medical technology. A major challenge in RAUSS is the automatic assessment of ultrasound image quality. In clinical practice, physicians evaluate images based on both pixel-level technical metrics and anatomical content. However, existing methods often emphasize positive anatomical features while neglecting negative factors such as noise and artifacts. To address this, we propose a dual-perspective multi-feature collaborative network (DM-Net) for ultrasound image quality assessment. Built on ResNet, the model extracts both positive anatomical features and negative artifacts, integrating them through a cross-attention mechanism for comprehensive quality evaluation. Experimental results show that the proposed method achieves superior consistency with expert evaluations, with a PLCC of 0.8318, SROCC of 0.8334, and an accuracy of 76.07%. It outperforms conventional methods used in robotic systems and aligns more closely with clinical assessments. Additionally, the system processes each image in just 0.062 s, meeting real-time requirements for robotic screening. This work provides a clinically relevant quality feedback solution for RAUSS and lays the foundation for future research on ultrasound video assessment.
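The dual-perspective design can be sketched at the architecture level. The code below is a hypothetical simplification, not DM-Net's published configuration (the branch backbones, dimensions, and single-token cross-attention are assumptions): one branch encodes positive anatomical evidence, the other negative artifacts, and cross-attention fuses them before regressing a score.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class DualBranchIQA(nn.Module):
    """Hypothetical sketch of a dual-perspective IQA head, not DM-Net's exact
    architecture: one branch for positive anatomy, one for negative artifacts,
    fused by cross-attention before regressing a quality score."""
    def __init__(self, dim: int = 512):
        super().__init__()
        def branch():
            net = models.resnet18(weights=None)
            return nn.Sequential(*list(net.children())[:-1])  # globally pooled features
        self.pos_branch, self.neg_branch = branch(), branch()
        self.cross = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        p = self.pos_branch(img).flatten(1).unsqueeze(1)   # (B, 1, 512)
        n = self.neg_branch(img).flatten(1).unsqueeze(1)
        fused, _ = self.cross(query=p, key=n, value=n)     # anatomy attends to artifacts
        return self.head(fused.squeeze(1))                 # scalar quality per image

score = DualBranchIQA()(torch.randn(2, 3, 224, 224))
```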
{"title":"Ultrasound image quality assessment of robot screening based on dual perspective multi feature collaboration","authors":"Weihua He , Li Liang , Fei Ouyang , Guangming Yang , Peng Ding , Tong Zhang , Zhiyong Zhang","doi":"10.1016/j.displa.2025.103319","DOIUrl":"10.1016/j.displa.2025.103319","url":null,"abstract":"<div><div>With advancements in robotics and artificial intelligence, Robotic Autonomous Ultrasound Screening (RAUSS) has emerged as a critical research area in medical technology. A major challenge in RAUSS is the automatic assessment of ultrasound image quality. In clinical practice, physicians evaluate images based on both pixel-level technical metrics and anatomical content. However, existing methods often emphasize positive anatomical features while neglecting negative factors such as noise and artifacts. To address this, we propose a dual-perspective multi-feature collaborative network (DM-Net) for ultrasound image quality assessment. Built on ResNet, the model extracts both positive anatomical features and negative artifacts, integrating them through a cross-attention mechanism for comprehensive quality evaluation. Experimental results show that the proposed method achieves superior consistency with expert evaluations, with a PLCC of 0.8318, SROCC of 0.8334, and an accuracy of 76.07%. It outperforms conventional methods used in robotic systems and aligns more closely with clinical assessments. Additionally, the system processes each image in just 0.062 s, meeting real-time requirements for robotic screening. This work provides a clinically relevant quality feedback solution for RAUSS and lays the foundation for future research on ultrasound video assessment.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103319"},"PeriodicalIF":3.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145796849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RTSIQA: A database and method for real-world traffic scenes image quality assessment
Pub Date: 2026-04-01 | Epub Date: 2025-11-28 | DOI: 10.1016/j.displa.2025.103299
Fangfang Lu, Haoyang Ni, Yijie Huang, Nan Guo, Kaiwei Zhang, Wei Sun, Xiongkuo Min
Traffic scene image quality assessment (IQA) is critical for intelligent transportation systems and autonomous driving applications. However, existing IQA methods are primarily designed for general real-world scenes and struggle to adapt to the structured elements and statistical characteristics unique to traffic scenes. Moreover, these methods overlook the distinct assessment needs arising from the spatially imbalanced perceptual importance in traffic scenes: some small regions (e.g., vehicles, pedestrians, traffic signals) are vital for driving safety, whereas some large regions (e.g., sky), despite their spatial dominance, are less critical. In addition, different traffic objects exhibit distinct degradation patterns due to their unique physical properties and texture structures, so a single global quality score cannot represent quality differences among these elements. Furthermore, the lack of IQA databases specifically for real-world traffic scenes has constrained further research. To address these challenges, we construct a new real-world traffic scene IQA database providing both whole-image quality scores and per-category quality scores for traffic object categories. We further develop an adaptive multi-branch no-reference IQA network based on a dual-network architecture. This network extracts multi-scale features through a pre-trained Swin Transformer combined with a semantic structure compensation module to enhance local structure modeling, and introduces a multi-branch assessment module that uses object detection to identify traffic object locations and categories, achieving differentiated quality assessment for the various traffic object categories. Experimental results show that the proposed method effectively predicts quality for different objects within the same image on our database and performs strongly on multiple general IQA databases.
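The per-category scoring idea, pooling features inside detected traffic-object boxes and regressing a quality score per object, can be sketched with RoIAlign. Everything below (module name, channel width, head sizes) is an assumption for illustration, not the paper's head.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class RegionQualityHead(nn.Module):
    """Hypothetical sketch of detection-guided per-object quality scoring;
    module name, channel width, and head sizes are assumptions."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * 7 * 7, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, feat: torch.Tensor, boxes: list) -> torch.Tensor:
        # boxes: one (N_i, 4) xyxy tensor per image, in feature-map coordinates.
        rois = roi_align(feat, boxes, output_size=(7, 7))
        return self.regressor(rois)     # one quality score per detected object

head = RegionQualityHead()
feat = torch.randn(1, 256, 64, 64)
boxes = [torch.tensor([[4.0, 4.0, 20.0, 20.0], [30.0, 10.0, 60.0, 40.0]])]
per_object_quality = head(feat, boxes)  # shape (2, 1): e.g. one car, one pedestrian
```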
{"title":"RTSIQA: A database and method for real-world traffic scenes image quality assessment","authors":"Fangfang Lu , Haoyang Ni , Yijie Huang , Nan Guo , Kaiwei Zhang , Wei Sun , Xiongkuo Min","doi":"10.1016/j.displa.2025.103299","DOIUrl":"10.1016/j.displa.2025.103299","url":null,"abstract":"<div><div>Traffic scene image quality assessment (IQA) is critical for intelligent transportation systems and autonomous driving applications. However, existing IQA methods are primarily designed for general real-world scenes and struggle to adapt to the structured elements and statistical characteristics unique to traffic scenes. Moreover, these methods overlook the distinct assessment needs arising from the spatially imbalanced perceptual importance in traffic scenes: some small regions (e.g., vehicles, pedestrians, traffic signals) are vital for driving safety, whereas some large regions (e.g., sky), despite their spatial dominance, are less critical. In addition, different traffic objects exhibit distinct degradation patterns due to their unique physical properties and texture structures, rendering a global quality score insufficient to represent differences in quality among these elements. Furthermore, the lack of IQA databases specifically for real-world traffic scenes has constrained further research development. To address these challenges, we construct a new real-world traffic scene IQA database providing both whole image quality scores and per-category quality scores for traffic object categories. Furthermore, we develop an adaptive multi-branch no-reference IQA network based on a dual-network architecture. This network extracts multi-scale features through pre-trained Swin Transformer combined with a semantic structure compensation module to enhance local structure modeling capability. It introduces a multi-branch assessment module utilizing object detection to identify traffic object location and category, achieving differentiated quality assessment for various traffic object categories. Experimental results show that the proposed method effectively outputs image quality for different objects within the same image on our constructed database and performs excellently on multiple general IQA databases.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103299"},"PeriodicalIF":3.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145693737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards camouflaged object detection via global guidance and cascading refinement
Pub Date: 2026-04-01 | Epub Date: 2025-11-06 | DOI: 10.1016/j.displa.2025.103278
Dan Wu, Mengyin Wang, Fuming Sun
Camouflaged object detection is characterized by targets with fuzzy boundaries and diverse sizes set against backgrounds that closely resemble them. Because of these characteristics, existing methods tend to improve detection performance by building very complex models while ignoring computational efficiency. In addition, targets are hard to localize, and heavy background noise causes detailed features to be lost during extraction. To address these problems, we propose an efficient Global Guidance and Cascading Refinement Network (GCNet) with a streamlined structure. Firstly, considering model size, we employ a lightweight SMT as the backbone. Secondly, we design a Rough Position Module (RPM) that coarsely localizes the target by collecting global semantic information and guiding global features to anchor, with high quality, near the target location. Finally, we introduce a Feature Refinement Module (FRM), which employs a reverse attention mechanism to enhance feature discrimination and highlights camouflaged regions by refining features in an efficient cascading manner. Extensive experimental results show that GCNet outperforms 20 current methods on four benchmark datasets. Importantly, GCNet has few parameters, low computational complexity, and a very competitive inference speed, successfully balancing model size against recognition accuracy. The code is released at https://github.com/wd61419/GCNet.
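Reverse attention itself has a standard formulation (popularized in polyp and camouflage segmentation): invert a sigmoid-activated coarse prediction so refinement focuses on regions not yet detected. The sketch below shows one such step; it is not GCNet's exact FRM, and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttention(nn.Module):
    """One reverse-attention refinement step in its standard form; layer sizes
    are assumptions and this is not GCNet's exact FRM."""
    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, feat: torch.Tensor, coarse_pred: torch.Tensor) -> torch.Tensor:
        pred = F.interpolate(coarse_pred, size=feat.shape[2:],
                             mode="bilinear", align_corners=False)
        rev = 1.0 - torch.sigmoid(pred)     # emphasize regions NOT yet detected
        residual = self.refine(feat * rev)  # mine missed target parts
        return pred + residual              # refined logits at this scale

ra = ReverseAttention(64)
refined = ra(torch.randn(2, 64, 88, 88), torch.randn(2, 1, 44, 44))
```

Cascading several such steps from coarse to fine scales yields the progressive refinement the abstract describes.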
{"title":"Towards camouflaged object detection via global guidance and cascading refinement","authors":"Dan Wu, Mengyin Wang, Fuming Sun","doi":"10.1016/j.displa.2025.103278","DOIUrl":"10.1016/j.displa.2025.103278","url":null,"abstract":"<div><div>Camouflaged object detection is characterized by targets with fuzzy boundaries, diverse sizes, and backgrounds similar to the target objects. Due to these characteristics, existing methods tend to improve detection performance by building very complex models while ignoring computational efficiency. In addition, the difficulty in capturing the target makes it difficult to localize, and the huge background noise of the target leads to the loss of detailed features in the process. To address the above problems, we propose an efficient Global Guidance and Cascading Refinement Network (GCNet) with a streamlined structure. Firstly, considering the model size, we employ a lightweight SMT as the backbone. Secondly, we design a Rough Position Module (RPM) to coarsely localize the target by collecting global semantic information and guiding global features to anchor near the target location with high quality. Finally, we introduce a Feature Refinement Module (FRM), which employs a reverse attention mechanism to enhance feature discrimination and helps to highlight the camouflaged regions by refining features through an efficient cascading manner. Extensive experimental results show that GCNet outperforms 20 current methods on four benchmark datasets. Importantly, GCNet boasts a low number of parameters, a low computational complexity, and a very competitive inference speed, successfully balancing the incompatibility between model size and recognition accuracy. The codes are released at <span><span>https://github.com/wd61419/GCNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103278"},"PeriodicalIF":3.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145486353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing medical image segmentation: A self-supervised approach with global feature enhancement and edge constraint guidance
Pub Date: 2026-04-01 | Epub Date: 2025-11-26 | DOI: 10.1016/j.displa.2025.103300
Miao Wang, Zechen Zheng, Congqian Wang, Chao Fan, Xuelei He
Segmenting medical images has grown in importance as a computer-aided diagnostic tool. However, unlabeled medical data lack clear supervision signals, which can lead to ill-defined optimization goals and the learning of spurious correlations. To deal with these issues, we present a self-supervised medical image segmentation model based on edge attention and global feature enhancement (GFEM). The model extracts local and global image information in separate branches through global feature enhancement, and a Mamba-based feature fusion module (MFF) strengthens the relation between the local and global features. For accurate segmentation, an edge attention module is combined with a compound edge loss function (CEEG-Loss) to guide the edges of the segmented object. The model was evaluated on the Abdomen and CHAOS datasets, achieving average Dice scores of 79.70% and 78.81%, respectively. Extensive evaluations confirm that our model significantly outperforms baselines and remains competitive with other methods.
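The abstract does not define CEEG-Loss, but a compound edge loss generally pairs a region term with an edge-agreement term. The sketch below is a hypothetical stand-in along those lines (Dice plus an L1 penalty between Sobel edge maps of prediction and ground truth); the weighting and choice of edge operator are assumptions.

```python
import torch
import torch.nn.functional as F

def sobel_edges(mask: torch.Tensor) -> torch.Tensor:
    """Soft edge map of a (B, 1, H, W) probability mask via Sobel filters."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=mask.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx, gy = F.conv2d(mask, kx, padding=1), F.conv2d(mask, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def compound_edge_loss(logits: torch.Tensor, target: torch.Tensor,
                       edge_weight: float = 1.0) -> torch.Tensor:
    """Region term (Dice) plus an edge-agreement term; a hypothetical stand-in
    for CEEG-Loss, whose exact form the abstract does not give."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    denom = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = 1.0 - (2.0 * inter + 1.0) / (denom + 1.0)
    edge = F.l1_loss(sobel_edges(prob), sobel_edges(target))
    return dice.mean() + edge_weight * edge

loss = compound_edge_loss(torch.randn(2, 1, 96, 96),
                          torch.randint(0, 2, (2, 1, 96, 96)).float())
```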
{"title":"Enhancing medical image segmentation: A self-supervised approach with global feature enhancement and edge constraint guidance","authors":"Miao Wang , Zechen Zheng , Congqian Wang , Chao Fan , Xuelei He","doi":"10.1016/j.displa.2025.103300","DOIUrl":"10.1016/j.displa.2025.103300","url":null,"abstract":"<div><div>Segmenting medical images has grown in importance as a computer-aided diagnostic tool. However, unlabeled medical data, due to the lack of clear supervision signals, may lead to unclear optimization goals and the learning of pseudo-correlation features. To deal with these issues, a self-supervised medical image segmentation model based on edge attention and global feature enhancement (GFEM) has been set forth. This model conducts branch extraction of the local and global information of the image through global feature enhancement. A feature fusion module (MFF) based on mamba structure was utilized to enhance the relation of local and global feature. To pursue the accurate segmentation, the edge attention module and the compound edge loss function (CEEG-Loss) are combined to guide the edge information of the segmented object. The model was evaluated on Abdomen and CHAOS datasets with average 79.70% and 78.81% Dice. Extensive evaluations confirm our model outperforms baselines significantly and remains competitive against other methods.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103300"},"PeriodicalIF":3.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145624513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}