PGgraf: Pose-Guided generative radiance field for novel-views on X-ray
Displays, vol. 92, Article 103354 | Pub Date: 2026-01-17 | DOI: 10.1016/j.displa.2026.103354
Hangyu Li, Moquan Liu, Nan Wang, Mengcheng Sun, Yu Zhu
In clinical diagnosis, doctors usually make judgments from a small number of X-rays to avoid exposing the patient to excessive ionizing radiation. Recent Neural Radiance Field (NeRF) technology aims to generate novel views from a single X-ray to assist physicians in diagnosis. For this task, we consider two advantages of X-ray imaging over natural images: (1) the medical equipment is fixed and there is a standardized imaging pose; (2) X-rays of the same body part taken at the same pose share an apparent structural prior. Based on these conditions, we propose a Pose-Guided generative radiance field (PGgraf) containing a generator and a discriminator. In the training phase, the discriminator combines image features with two kinds of pose information (the ray direction set and the camera angle) to guide the generator to synthesize X-rays consistent with the realistic view. In the generator, we design a Density Reconstruction Block (DRB). Unlike the original NeRF, which estimates particle density directly from particle positions, the DRB considers all the particle features sampled along a ray and jointly predicts the density of each particle. Qualitative and quantitative experiments on two chest datasets and one knee dataset against state-of-the-art NeRF schemes show that PGgraf has a clear advantage in inferring novel views over different angular ranges. Over the three ranges of 0° to 360°, −15° to 15°, and 75° to 105°, the Peak Signal-to-Noise Ratio (PSNR) improved by an average of 4.18 dB, and the Learned Perceptual Image Patch Similarity (LPIPS) improved by an average of 50.7%.
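To make the DRB idea concrete, here is a minimal sketch of the contrast between NeRF-style pointwise density prediction and predicting the densities of all samples along a ray jointly. The module names, the 1D convolution over the sample axis, and all sizes are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch (not the authors' architecture): jointly predicting per-sample
# densities from all features sampled along a ray, versus NeRF's pointwise MLP.
import torch
import torch.nn as nn

class PointwiseDensity(nn.Module):
    """NeRF-style: density of each sample depends only on its own feature."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, feats):                            # feats: (B, N_samples, F)
        return torch.relu(self.mlp(feats)).squeeze(-1)   # (B, N_samples)

class RayJointDensity(nn.Module):
    """DRB-like idea: densities predicted from the whole set of ray samples,
    here with a 1D convolution over the sample axis (a stand-in choice)."""
    def __init__(self, feat_dim: int, hidden: int = 64, ksize: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, ksize, padding=ksize // 2), nn.ReLU(),
            nn.Conv1d(hidden, 1, 1))

    def forward(self, feats):                      # feats: (B, N_samples, F)
        x = feats.transpose(1, 2)                  # (B, F, N_samples)
        return torch.relu(self.net(x)).squeeze(1)  # (B, N_samples)

feats = torch.randn(2, 64, 32)                     # 2 rays, 64 samples, 32-dim features
print(RayJointDensity(32)(feats).shape)            # torch.Size([2, 64])
```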
{"title":"PGgraf: Pose-Guided generative radiance field for novel-views on X-ray","authors":"Hangyu Li , Moquan Liu , Nan Wang, Mengcheng Sun, Yu Zhu","doi":"10.1016/j.displa.2026.103354","DOIUrl":"10.1016/j.displa.2026.103354","url":null,"abstract":"<div><div>In clinical diagnosis, doctors usually judge the information by a few X-rays to avoid excessive ionizing radiation from harming the patient. The recent Neural Radiance Field (NERF) technology contemplates generating novel-views from a single X-ray to assist physicians in diagnosis. In this task, we consider two advantages of X-ray filming over natural images: (1) The medical equipment is fixed, and there is a standardized filming pose. (2) There is an apparent structural prior to X-rays of the same body part at the same pose. Based on such conditions, we propose a Pose-Guided generative radiance field (PGgraf) containing a generator and discriminator. In the training phase, the discriminator combines the image features with two kinds of pose information (ray direction set and camera angle) to guide the generator to synthesize X-rays consistent with the realistic view. In the generator, we design a Density Reconstruction Block (DRB). Unlike the original NERF, which directly estimates the particle density based on the particle positions, the DRB considers all the particle features sampled in a ray and integrally predicts the density of each particle. Experiments comparing qualitative–quantitative on two chest datasets and one knee dataset with state-of-the-art NERF schemes show that PGgraf has a clear advantage in inferring novel-views at different ranges. In the three ranges of 0°to 360°, −15°to 15°, and 75°to 105°, the Peak Signal-to-Noise Ratio (PSNR) improved by an average of 4.18 decibel, and the Learned Perceptual Image Patch Similarity (LPIPS) improved by an average of 50.7%.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103354"},"PeriodicalIF":3.4,"publicationDate":"2026-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new iterative inverse display model
Displays, vol. 92, Article 103342 | Pub Date: 2026-01-16 | DOI: 10.1016/j.displa.2026.103342
María José Pérez-Peñalver, S.-W. Lee, Cristina Jordán, Esther Sanabria-Codesal, Samuel Morillas
In this paper, we propose a new inverse model for display characterization based on the direct model developed in Kim and Lee (2015). We use an iterative method to compute which inputs produce a desired color expressed in device-independent color coordinates. Whereas iterative approaches have been used for this task in the past, the main novelty of our proposal is the use of specific heuristics, based on the aforementioned display model and color-science principles, to achieve efficient and accurate convergence. On the one hand, to set the initial point of the iterative process, we use orthogonal projections of the desired color chromaticity, xy, onto the display's chromaticity triangle to find the initial ratio that the RGB coordinates need to have. Subsequently, we use a factor product, preserving the RGB proportions, to initially approximate the desired color's luminance; this factor is obtained through a nonlinear model of the relation between RGB and luminance. On the other hand, to reduce the number of iterations needed, we use the direct model mentioned above: to set the RGB values of the next iteration, we look at the differences between the color predicted by the direct model for the current RGB values and the desired color coordinates, treating chromaticity and luminance separately and following the same reasoning as for the initial point. As the experimental results show, the method is accurate, efficient, and robust. With respect to the state of the art, its performance is especially good for low-quality displays, where the physical assumptions made by other models do not hold completely.
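As a rough illustration of such an iterative inversion loop, the sketch below numerically inverts an arbitrary forward display model. It deliberately replaces the paper's chromaticity-projection and luminance-scaling heuristics with a plain damped Newton step and a toy linear-plus-gamma display, so every function and value here is an assumption for illustration only.

```python
# Illustrative iterative inversion of a forward display model f(rgb) -> XYZ.
# The Newton-style update and the toy display are stand-ins, not the paper's method.
import numpy as np

def numerical_jacobian(f, x, eps=1e-4):
    """Finite-difference 3x3 Jacobian of f at x."""
    J = np.zeros((3, 3))
    fx = f(x)
    for i in range(3):
        xp = x.copy()
        xp[i] += eps
        J[:, i] = (f(xp) - fx) / eps
    return J

def invert_display(forward_model, target_xyz, rgb0, iters=20, tol=1e-6):
    """Iterate RGB until the forward model reproduces the target tristimulus values."""
    rgb = np.clip(np.asarray(rgb0, dtype=float), 0.0, 1.0)
    for _ in range(iters):
        err = target_xyz - forward_model(rgb)
        if np.linalg.norm(err) < tol:
            break
        J = numerical_jacobian(forward_model, rgb)
        rgb = np.clip(rgb + np.linalg.solve(J + 1e-6 * np.eye(3), err), 0.0, 1.0)
    return rgb

# Toy display: additive primaries after a gamma nonlinearity (made-up numbers).
M = np.array([[0.41, 0.36, 0.18],
              [0.21, 0.72, 0.07],
              [0.02, 0.12, 0.95]])
toy_model = lambda rgb: M @ (rgb ** 2.2)
target = toy_model(np.array([0.3, 0.6, 0.2]))
print(invert_display(toy_model, target, rgb0=[0.5, 0.5, 0.5]))  # recovers ~[0.3, 0.6, 0.2]
```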
{"title":"A new iterative inverse display model","authors":"María José Pérez-Peñalver , S.-W. Lee , Cristina Jordán , Esther Sanabria-Codesal , Samuel Morillas","doi":"10.1016/j.displa.2026.103342","DOIUrl":"10.1016/j.displa.2026.103342","url":null,"abstract":"<div><div>In this paper, we propose a new inverse model for display characterization based on the direct model developed in Kim and Lee (2015). We use an iterative method to compute what inputs are able to produce a desired color expressed in device independent color coordinates. Whereas iterative approaches have been used in the past for this task, the main novelty in our proposal is the use of specific heuristics based on the former display model and color science principles to achieve an efficient and accurate convergence. On the one hand, to set the initial point of the iterative process, we use orthogonal projections of the desired color chromaticity, <span><math><mrow><mi>x</mi><mi>y</mi></mrow></math></span>, onto the display’s chromaticity triangle to find the initial ratio the RGB coordinates need to have. Subsequently, we use a factor product, preserving RGB proportions, to initially approximate the desired color’s luminance. This factor is obtained through a nonlinear modeling of the relation between RGB and luminance. On the other hand, to reduce the number of iterations needed, we use the direct model mentioned above: to set the RGB values of the next iteration we look at the differences between color prediction provided by the direct model for the current RGB values and desired color coordinates but looking separately at chromaticity and luminance following the same reasoning as for the initial point. As we will see from the experimental results, the method is accurate, efficient and robust. With respect to state of the art, method performance is specially good for low quality displays where physical assumptions made by other models do not hold completely.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103342"},"PeriodicalIF":3.4,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient road marking extraction via cooperative enhancement of foundation models and Mamba
Displays, vol. 92, Article 103351 | Pub Date: 2026-01-14 | DOI: 10.1016/j.displa.2026.103351
Kang Zheng, Fu Ren
Road marking extraction is critical for high-definition mapping and autonomous driving, yet most lightweight models overlook the long-tailed appearance of thin markings during real-time inference. We propose Efficient Road Markings Segmentation Network (ERMSNet), a hybrid network that pairs lightweight design with the expressive power of Mamba and foundation models. ERMSNet comprises three synergistic branches. (1) A wavelet-augmented Baseline embeds a Road-Marking Mamba (RM-Mamba) whose bi-directional vertical scan captures elongated structures with fewer parameters than vanilla Mamba. (2) A Feature Enhancement branch distills dense image embeddings from the frozen Segment Anything Model (SAM) foundation model through a depth-wise squeeze-and-excitation adapter, injecting rich spatial detail at negligible cost. (3) An Attention Focusing branch projects text–image similarities produced by the Contrastive Language-Image Pre-training (CLIP) foundation model as soft masks that steer the decoder toward rare classes. Comprehensive experiments on CamVid, and our newly released Wuhan Road-Marking (WHRM) benchmark verify the design. Experimental results demonstrate that ERMSNet, with a lightweight configuration of only 0.99 million parameters and 6.44 GFLOPs, achieves mIoU scores of 79.85% and 81.18%, respectively. Compared with existing state-of-the-art methods, ERMSNet significantly reduces computational and memory costs while still delivering outstanding segmentation performance. Its superiority is especially evident in extracting thin and infrequently occurring road marking, highlighting its strong ability to balance efficiency and accuracy. Code and the WHRM dataset will be released upon publication.
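The Feature Enhancement branch can be pictured with the following sketch of a depth-wise squeeze-and-excitation adapter applied to frozen foundation-model embeddings. The layer layout, channel counts, and the DWSEAdapter name are assumptions and do not reproduce the released ERMSNet code.

```python
# Rough sketch (assumed structure, not the ERMSNet release) of a depth-wise
# squeeze-and-excitation adapter mapping frozen foundation-model embeddings
# (e.g., SAM image embeddings) into the segmentation feature space.
import torch
import torch.nn as nn

class DWSEAdapter(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, reduction: int = 4):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)  # depth-wise conv
        self.pw = nn.Conv2d(in_ch, out_ch, 1)                          # point-wise projection
        self.se = nn.Sequential(                                       # channel gating
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // reduction, out_ch, 1), nn.Sigmoid())

    def forward(self, emb):                  # emb: (B, in_ch, H, W), frozen embedding
        x = self.pw(torch.relu(self.dw(emb)))
        return x * self.se(x)                # re-weight channels; only the adapter is trained

sam_emb = torch.randn(1, 256, 64, 64)        # SAM ViT image embeddings are 256-channel
print(DWSEAdapter(256, 64)(sam_emb).shape)   # torch.Size([1, 64, 64, 64])
```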
{"title":"Efficient road marking extraction via cooperative enhancement of foundation models and Mamba","authors":"Kang Zheng, Fu Ren","doi":"10.1016/j.displa.2026.103351","DOIUrl":"10.1016/j.displa.2026.103351","url":null,"abstract":"<div><div>Road marking extraction is critical for high-definition mapping and autonomous driving, yet most lightweight models overlook the long-tailed appearance of thin markings during real-time inference. We propose Efficient Road Markings Segmentation Network (ERMSNet), a hybrid network that pairs lightweight design with the expressive power of Mamba and foundation models. ERMSNet comprises three synergistic branches. (1) A wavelet-augmented Baseline embeds a Road-Marking Mamba (RM-Mamba) whose bi-directional vertical scan captures elongated structures with fewer parameters than vanilla Mamba. (2) A Feature Enhancement branch distills dense image embeddings from the frozen Segment Anything Model (SAM) foundation model through a depth-wise squeeze-and-excitation adapter, injecting rich spatial detail at negligible cost. (3) An Attention Focusing branch projects text–image similarities produced by the Contrastive Language-Image Pre-training (CLIP) foundation model as soft masks that steer the decoder toward rare classes. Comprehensive experiments on CamVid, and our newly released Wuhan Road-Marking (WHRM) benchmark verify the design. Experimental results demonstrate that ERMSNet, with a lightweight configuration of only 0.99 million parameters and 6.44 GFLOPs, achieves mIoU scores of 79.85% and 81.18%, respectively. Compared with existing state-of-the-art methods, ERMSNet significantly reduces computational and memory costs while still delivering outstanding segmentation performance. Its superiority is especially evident in extracting thin and infrequently occurring road marking, highlighting its strong ability to balance efficiency and accuracy. Code and the WHRM dataset will be released upon publication.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103351"},"PeriodicalIF":3.4,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TDOA based localization mechanism for the UAV positioning in dark and confined environments
Displays, vol. 92, Article 103346 | Pub Date: 2026-01-14 | DOI: 10.1016/j.displa.2026.103346
Haobin Shi, Quantao Wang, Zihan Wang, Jianning Zhan, Huijian Liang, Beiya Yang
With the growing demand for autonomous inspection with Unmanned Aerial Vehicles (UAVs) in dark and confined environments, accurately determining the UAV position has become crucial. Ultra-Wideband (UWB) localization technology offers a promising solution by overcoming the challenges posed by signal obstruction, low-illumination conditions, and confined spaces. However, conventional UWB-based positioning suffers from performance oscillations due to measurement inconsistencies and degradation under time-varying noise models. Furthermore, the widely used Two-Way Time-of-Flight (TW-TOF) method has limitations, such as high energy consumption and a restriction on the number of tags that can be deployed. To address these issues, a sensor-fusion approach that combines UWB and Inertial Measurement Unit (IMU) measurements with a Time Difference of Arrival (TDOA) localization mechanism is proposed. The method exploits an adaptive Kalman filter, which dynamically adjusts to variations in the noise model and employs individual weighting factors for each anchor node, enhancing stability and robustness in challenging environments. Comprehensive experiments demonstrate that the proposed algorithm achieves a median positioning error of 0.110 m, a 90th-percentile error of 0.232 m, and an average standard deviation of 0.075 m, with significantly reduced energy consumption. Additionally, owing to TDOA communication principles, the method supports multiple tag nodes, making it well suited for multi-UAV collaborative inspections in future applications.
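A schematic of a TDOA measurement update with per-anchor, innovation-driven noise adaptation is given below. The state layout, the adaptation rule, and all numeric values are illustrative assumptions rather than the paper's filter.

```python
# Schematic EKF measurement update for TDOA positioning with a simple
# innovation-based adaptation of each anchor's measurement variance.
# Illustrative sketch only: the rule "inflate R_i when the normalized
# innovation is large" is an assumption, not the paper's design.
import numpy as np

def tdoa_update(x, P, z, anchors, R, beta=0.97):
    """x: state starting with position [px, py, pz]; z[i] = ||p - a_i|| - ||p - a_0||, i >= 1."""
    p = x[:3]
    d = np.linalg.norm(anchors - p, axis=1)               # distances to all anchors
    h = d[1:] - d[0]                                       # predicted TDOA ranges
    H = np.zeros((len(h), len(x)))                         # Jacobian w.r.t. the state
    H[:, :3] = (p - anchors[1:]) / d[1:, None] - (p - anchors[0]) / d[0]
    y = z - h                                              # innovation
    S = H @ P @ H.T + R
    # Per-anchor adaptation: inflate variances whose innovations look inconsistent.
    nis = y**2 / np.diag(S)
    R = beta * R + (1 - beta) * np.diag(np.maximum(nis, 1.0) * np.diag(R))
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P, R

anchors = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 5]], dtype=float)
x, P = np.array([4.0, 4.0, 1.0]), np.eye(3)
true_p = np.array([5.0, 5.0, 1.5])
d = np.linalg.norm(anchors - true_p, axis=1)
z = d[1:] - d[0] + 0.05 * np.random.randn(3)               # noisy TDOA measurements
x, P, R = tdoa_update(x, P, z, anchors, 0.01 * np.eye(3))
print(x)                                                    # moves toward true_p
```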
{"title":"TDOA based localization mechanism for the UAV positioning in dark and confined environments","authors":"Haobin Shi , Quantao Wang , Zihan Wang , Jianning Zhan , Huijian Liang , Beiya Yang","doi":"10.1016/j.displa.2026.103346","DOIUrl":"10.1016/j.displa.2026.103346","url":null,"abstract":"<div><div>With the growing demand for autonomous inspection with Unmanned Aerial Vehicles (UAVs) in dark and confined environments, accurately determining UAV position has become crucial. The Ultra-Wideband (UWB) localization technology offers a promising solution by overcoming challenges posed by signal obstruction, low illumination condition, and confined spaces. However, conventional UWB-based positioning suffers from performance oscillations due to measurement inconsistencies and degradations with time-varying noise models. Furthermore, the widely used Two-Way Time-of-Flight (TW-TOF) method has limitations, such as high energy consumption and a restricted number of tags to be deployed. To address these, a sensor fusion approach combining UWB and Inertial Measurement Unit (IMU) measurements with Time Difference of Arrival (TDOA) localization mechanism is proposed. This method exploits an adaptive Kalman filter, which dynamically adjusts to noise model variations and employs individual weighting factors for each anchor node, enhancing stability and robustness in challenging environments. The comprehensive experiments demonstrate the proposed algorithm achieves a median positioning error of 0.110 m, a 90th percentile error of 0.232 m, and an average standard deviation of 0.075 m with the significantly reduced energy consumption. Additionally, due to TDOA communication principles, this method supports multiple tag nodes, making it ideal for multi-UAV collaborative inspections in future applications.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103346"},"PeriodicalIF":3.4,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rethinking low-light image enhancement: A local–global synergy perspective
Displays, vol. 92, Article 103348 | Pub Date: 2026-01-13 | DOI: 10.1016/j.displa.2026.103348
Qinghua Lin, Yu Long, Xudong Xiong, Wenchao Jiang, Zhihua Wang, Qiuping Jiang
Low-light image enhancement (LLIE) remains a challenging task due to the complex degradations in illumination, contrast, and structural details. Deep neural network-based approaches have shown promising results in addressing LLIE. However, most existing methods either utilize convolutional layers with local receptive fields, which are well-suited for restoring local textures, or Transformer layers with long-range dependencies, which are better at correcting global illumination. Despite their respective strengths, these approaches often struggle to effectively handle both aspects simultaneously. In this paper, we revisit LLIE from a local–global synergy perspective and propose a unified framework, the Local–Global Synergy Network (LGS-Net). LGS-Net explicitly extracts local and global features in parallel using a separable CNN and a Swin Transformer block, respectively, effectively modeling both local structural fidelity and global illumination balance. The extracted features are then fed into a squeeze-and-excitation-based fusion module, which adaptively integrates multi-scale information guided by perceptual relevance. Extensive experiments on multiple real-world benchmarks show that our method consistently outperforms existing state-of-the-art methods across both quantitative metrics (e.g., PSNR, SSIM, Q-Align) and perceptual quality, with notable improvements in color fidelity and detail preservation under extreme low-light and non-uniform illumination.
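One plausible reading of the squeeze-and-excitation-based fusion step is sketched below: channel-wise gates computed from the concatenated local and global features blend the two branches. The SEFusion name and the exact layer choices are assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the LGS-Net release) of fusing a local CNN
# feature and a global Transformer feature with squeeze-and-excitation gates.
import torch
import torch.nn as nn

class SEFusion(nn.Module):
    def __init__(self, ch: int, reduction: int = 8):
        super().__init__()
        self.proj = nn.Conv2d(2 * ch, ch, 1)        # merge the concatenated branches
        self.se = nn.Sequential(                    # squeeze-and-excitation gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

    def forward(self, local_feat, global_feat):      # both (B, C, H, W)
        fused = self.proj(torch.cat([local_feat, global_feat], dim=1))
        w = self.se(fused)                           # per-channel gate in [0, 1]
        return w * local_feat + (1 - w) * global_feat  # adaptive mixture of branches

f_local = torch.randn(1, 64, 32, 32)
f_global = torch.randn(1, 64, 32, 32)
print(SEFusion(64)(f_local, f_global).shape)         # torch.Size([1, 64, 32, 32])
```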
{"title":"Rethinking low-light image enhancement: A local–global synergy perspective","authors":"Qinghua Lin , Yu Long , Xudong Xiong , Wenchao Jiang , Zhihua Wang , Qiuping Jiang","doi":"10.1016/j.displa.2026.103348","DOIUrl":"10.1016/j.displa.2026.103348","url":null,"abstract":"<div><div>Low-light image enhancement (LLIE) remains a challenging task due to the complex degradations in illumination, contrast, and structural details. Deep neural network-based approaches have shown promising results in addressing LLIE. However, most existing methods either utilize convolutional layers with local receptive fields, which are well-suited for restoring local textures, or Transformer layers with long-range dependencies, which are better at correcting global illumination. Despite their respective strengths, these approaches often struggle to effectively handle both aspects simultaneously. In this paper, we revisit LLIE from a local–global synergy perspective and propose a unified framework, the Local–Global Synergy Network (LGS-Net). LGS-Net explicitly extracts local and global features in parallel using a separable CNN and a Swin Transformer block, respectively, effectively modeling both local structural fidelity and global illumination balance. The extracted features are then fed into a squeeze-and-excitation-based fusion module, which adaptively integrates multi-scale information guided by perceptual relevance. Extensive experiments on multiple real-world benchmarks show that our method consistently outperforms existing state-of-the-art methods across both quantitative metrics (e.g., PSNR, SSIM, Q-Align) and perceptual quality, with notable improvements in color fidelity and detail preservation under extreme low-light and non-uniform illumination.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103348"},"PeriodicalIF":3.4,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Endo-E2E-GS: End-to-end 3D reconstruction of endoscopic scenes using Gaussian Splatting
Displays, vol. 92, Article 103353 | Pub Date: 2026-01-13 | DOI: 10.1016/j.displa.2026.103353
Xiongzhi Wang, Boyu Yang, Min Wei, Yu Chen, Jingang Zhang, Yunfeng Nie
Three-dimensional (3D) reconstruction is essential for enhancing spatial perception and geometric understanding in minimally invasive surgery. However, current methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) often rely on offline preprocessing—such as COLMAP-based point clouds or multi-frame fusion—limiting their adaptability and clinical deployment. We propose Endo-E2E-GS, a fully end-to-end framework that reconstructs structured 3D Gaussian fields directly from a single stereo endoscopic image pair. The system integrates (1) a DilatedResNet-based stereo depth estimator for robust geometry inference in low-texture scenes, (2) a Gaussian attribute predictor that infers per-pixel rotation, scale, and opacity, and (3) a differentiable splatting renderer for 2D view supervision. Evaluated on the ENDONERF and SCARED datasets, Endo-E2E-GS achieves highly competitive performance, reaching PSNR values of 38.874/33.052 and SSIM scores of 0.978/0.863, respectively, surpassing recent state-of-the-art approaches. It requires no explicit scene initialization and demonstrates consistent performance across two representative endoscopic datasets. Code is available at: https://github.com/Intelligent-Imaging-Center/Endo-E2E-GS.
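To illustrate the overall pipeline, the sketch below back-projects a per-pixel depth map into Gaussian centers and predicts the remaining per-pixel Gaussian attributes with a small convolutional head. The backproject and GaussianHead names, the intrinsics, and the attribute parameterization are assumptions and may differ from the released code.

```python
# Illustrative sketch (not the Endo-E2E-GS release): depth back-projection to
# Gaussian centers plus a per-pixel head for quaternion, scale, and opacity.
import torch
import torch.nn as nn
import torch.nn.functional as F

def backproject(depth, K):
    """depth: (B, 1, H, W); K: (3, 3) camera intrinsics -> centers (B, H*W, 3)."""
    B, _, H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], -1).float().reshape(-1, 3)  # homogeneous pixels
    rays = pix @ torch.linalg.inv(K).T                                        # camera-frame rays
    return depth.reshape(B, -1, 1) * rays                                     # scale rays by depth

class GaussianHead(nn.Module):
    def __init__(self, in_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 4 + 3 + 1, 1)    # quaternion, log-scale, opacity logits

    def forward(self, feat):                          # feat: (B, C, H, W)
        q, s, o = self.conv(feat).split([4, 3, 1], dim=1)
        return F.normalize(q, dim=1), torch.exp(s), torch.sigmoid(o)

feat = torch.randn(1, 32, 60, 80)
depth = torch.rand(1, 1, 60, 80) * 0.1 + 0.05         # plausible endoscopic depths (metres)
K = torch.tensor([[500.0, 0.0, 40.0], [0.0, 500.0, 30.0], [0.0, 0.0, 1.0]])
centers = backproject(depth, K)
rot, scale, opacity = GaussianHead(32)(feat)
print(centers.shape, rot.shape, scale.shape, opacity.shape)
```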
{"title":"Endo-E2E-GS: End-to-end 3D reconstruction of endoscopic scenes using Gaussian Splatting","authors":"Xiongzhi Wang , Boyu Yang , Min Wei , Yu Chen , Jingang Zhang , Yunfeng Nie","doi":"10.1016/j.displa.2026.103353","DOIUrl":"10.1016/j.displa.2026.103353","url":null,"abstract":"<div><div>Three-dimensional (3D) reconstruction is essential for enhancing spatial perception and geometric understanding in minimally invasive surgery. However, current methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) often rely on offline preprocessing—such as COLMAP-based point clouds or multi-frame fusion—limiting their adaptability and clinical deployment. We propose Endo-E2E-GS, a fully end-to-end framework that reconstructs structured 3D Gaussian fields directly from a single stereo endoscopic image pair. The system integrates (1) a DilatedResNet-based stereo depth estimator for robust geometry inference in low-texture scenes, (2) a Gaussian attribute predictor that infers per-pixel rotation, scale, and opacity, and (3) a differentiable splatting renderer for 2D view supervision. Evaluated on the ENDONERF and SCARED datasets, Endo-E2E-GS achieves highly competitive performance, reaching PSNR values of 38.874/33.052 and SSIM scores of 0.978/0.863, respectively, surpassing recent state-of-the-art approaches. It requires no explicit scene initialization and demonstrates consistent performance across two representative endoscopic datasets. Code is available at: <span><span><strong>https://github.com/Intelligent-Imaging-Center/Endo-E2E-GS</strong></span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103353"},"PeriodicalIF":3.4,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differences in streaming quality impact viewer expectations, attitudes and reactions to video
Displays, vol. 92, Article 103350 | Pub Date: 2026-01-12 | DOI: 10.1016/j.displa.2026.103350
Christopher A. Sanchez, Nisha Raghunath, Chelsea Ahart
Given the massive amount of visual media consumed across the world every day, an open question is whether deviations from high-quality streaming can negatively impact viewers' opinions of, and attitudes towards, viewed content. Previous research has shown that reductions in perceptual quality can negatively impact attitudes in other contexts, with changes in quality often leading to corresponding changes in attitudes. Are users sensitive to changes in video quality, and does this affect their reactions to viewed content? For example, do users enjoy lower-quality videos as much as higher-quality versions? Do quality differences also make viewers less receptive to the content of videos? Across two studies, participants watched a video in lower or higher quality and were then queried about their viewing experience, including ratings of attitudes towards video streaming and the video content as well as measures of factual recall. Results indicated that viewers significantly prefer videos presented in higher quality, which drives future viewing intentions. Further, while factual memory for information was equivalent across video quality, participants who viewed the higher-quality video were more likely to show an affective reaction to the video and to change their attitudes relative to the presented content. These results have implications for the design and delivery of online video content and suggest that any deviation from higher-quality presentation can bias opinions of the viewed content. Lower-quality videos worsened attitudes towards the content and also reduced viewers' receptiveness to the presented content.
{"title":"Differences in streaming quality impact viewer expectations, attitudes and reactions to video","authors":"Christopher A. Sanchez, Nisha Raghunath, Chelsea Ahart","doi":"10.1016/j.displa.2026.103350","DOIUrl":"10.1016/j.displa.2026.103350","url":null,"abstract":"<div><div>Given the massive amount of visual media consumed across the world everyday, an open question is whether deviations from high-quality streaming can negatively impact viewer’s opinions and attitudes towards viewed content? Previous research has shown that reductions in perceptual quality can negatively impact attitudes in other contexts. These changes in quality often lead to corresponding changes in attitudes. Are users sensitive to changes in video quality, and does this impact reactions to viewed content? For example, do users enjoy lower quality videos as much as higher-quality versions? Do quality differences also make viewers less receptive to the content of videos? Across two studies, participants watched a video in lower- or higher-quality, and were then queried regarding their viewing experience. This included ratings of attitudes towards video streaming and video content, and also included measures of factual recall. Results indicated that viewers significantly prefer videos presented in higher quality, which drives future viewing intentions. Further, while factual memory for information was equivalent across video quality, participants who viewed the higher-quality video were more likely to show an affective reaction to the video, and also change their attitudes relative to the presented content. These results have implications for the design and delivery of online video content, and suggests that any deviations from higher-quality presentations can bias opinions relative to the viewed content. Lower-quality videos decreased attitudes towards content, and also negatively impacted viewers’ receptiveness to presented content.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103350"},"PeriodicalIF":3.4,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards LiDAR point cloud geometry compression using rate-distortion optimization and adaptive quantization for human-machine vision
Displays, vol. 92, Article 103344 | Pub Date: 2026-01-11 | DOI: 10.1016/j.displa.2026.103344
Yihan Wang, Yongfang Wang, Shuo Zhu, Zhijun Fang
Due to rapid advances in 3-Dimensional (3D) sensing and rendering technologies, point clouds have become increasingly widespread, bringing significant challenges for transmission and storage. Existing LiDAR Point Cloud Compression (PCC) methods primarily focus on enhancing compression efficiency and maintaining high signal fidelity, with insufficient consideration of joint human and machine perception. This paper proposes Rate Distortion Optimization (RDO) and Adaptive Quantization (AQ) for LiDAR Point Cloud Geometry Compression (PCGC) to balance human and machine vision performance. Specifically, we first propose Hybrid Distortion RDO (HDRDO) using a hybrid distortion measure and a Lagrange multiplier, where the optimal weights are determined by a Differential Evolution (DE) algorithm. Furthermore, by comprehensively analyzing how points classified by a Gaussian-based method affect overall quality, we propose an HDRDO-based AQ method that adaptively quantizes important and non-important points through optimal Quantization Parameter (QP) selection. We implement our method on the Geometry-based Point Cloud Compression (G-PCC) Test Model Categories 1 and 3 (TMC13), which serves as the anchor. Compared with the anchor, the proposed algorithm achieves consistent PSNR for human vision tasks and improves accuracy at low bitrates by 2.66% for detection and 21.18% for segmentation. Notably, the proposed overall method also performs better than the existing method.
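The Lagrangian decision underlying RDO-style QP selection can be illustrated with a toy example: for each point class, pick the QP that minimizes J = w1*D_human + w2*D_machine + lambda*R. The candidate table, weights, and lambda values below are made-up illustration values, not results from the paper.

```python
# Toy Lagrangian rate-distortion selection: choose the QP with the smallest
# hybrid-distortion cost. All numbers are illustrative, not measured values.
candidates = {          # QP: (D_human, D_machine, bits)
    22: (0.8, 0.6, 120.0),
    28: (1.5, 1.1, 70.0),
    34: (3.2, 2.9, 40.0),
}

def best_qp(w1, w2, lam):
    """Return the QP minimizing J = w1*D_human + w2*D_machine + lam*R."""
    cost = lambda d: w1 * d[0] + w2 * d[1] + lam * d[2]
    return min(candidates, key=lambda qp: cost(candidates[qp]))

# Important points get a smaller lambda (rate matters less), so they keep a finer QP.
print(best_qp(w1=0.6, w2=0.4, lam=0.01))   # important class -> QP 22 (finer)
print(best_qp(w1=0.6, w2=0.4, lam=0.08))   # non-important class -> QP 34 (coarser)
```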
{"title":"Towards LiDAR point cloud geometry compression using rate-distortion optimization and adaptive quantization for human-machine vision","authors":"Yihan Wang , Yongfang Wang , Shuo Zhu , Zhijun Fang","doi":"10.1016/j.displa.2026.103344","DOIUrl":"10.1016/j.displa.2026.103344","url":null,"abstract":"<div><div>Due to rapid advances in 3-Dimensional (3D) sensing and rendering technologies, point clouds have become increasingly widespread, bring significant challenges for transmission and storage. Existing LiDAR Point Cloud Compression (PCC) methods primarily focus on enhancing compression efficiency and maintaining high signal fidelity, with insufficient considering human and machine joint perception. This paper proposes Rate Distortion Optimization (RDO) and Adaptive Quantization (AQ) for LiDAR Point Cloud Geometry Compression (PCGC) to balance human–machine vision performance. Specifically, we first propose Hybrid Distortion RDO (HDRDO) using hybrid distortion and Lagrange multiplier, where the optimal weights are determined by Differential Evolution (DE) algorithm. Furthermore, by comprehensively analyzing the impacts of point clouds on a Gaussian-based classification method on overall quality, we propose a HDRDO-based AQ method to adaptively quantify important and non-important points by optimal Quantization Parameter (QP) selection. We implement on Geometry-based Point Cloud Compression (G-PCC) Test Model Category 1 and 3 (TMC13), called the anchor method. Compared with the anchor method, the proposed algorithm achieves consistent PSNR for human vision tasks and improves by 2.66% and 21.18% on accuracy at low bitrates for detection and segmentation, respectively. Notably, the proposed overall method performs better than the existing method.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103344"},"PeriodicalIF":3.4,"publicationDate":"2026-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Direct LiDAR-supervised surface-aligned 3D Gaussian Splatting for high-fidelity digital twin
Displays, vol. 92, Article 103349 | Pub Date: 2026-01-10 | DOI: 10.1016/j.displa.2026.103349
Xingdong Sheng, Qi Zhou, Xu Liu, Zhenyang Qu, Haoyu Xu, Shijie Mao, Xiaokang Yang
3D Gaussian Splatting (3DGS) has recently demonstrated remarkable rendering speed and photorealistic quality for 3D reconstruction. Yet precise surface reconstruction and view-consistent photometric fidelity remain challenging, because the standard pipeline lacks explicit geometry supervision. Several recent approaches incorporate dense LiDAR point clouds as guidance, typically by aligning Gaussian centers or projecting LiDAR points into pseudo-depth maps. However, such methods constrain positions only and overlook the anisotropic shapes of the Gaussians, often resulting in rough surfaces and residual artifacts. To overcome these limitations, we propose a direct LiDAR-supervised surface-aligned regularization loss that simultaneously constrains Gaussian positions and shapes without converting LiDAR scans into depth maps. We further introduce adaptive densification and a multi-view depth-guided pruning strategy to enhance fidelity and suppress floaters. Extensive experiments on diverse indoor and outdoor datasets that represent the demands of industrial digital-twin applications show that our method consistently improves photorealistic rendering, even under significant viewpoint deviations, demonstrating advantages over existing typical LiDAR-assisted 3DGS methods.
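A conceptual sketch of a loss that supervises both Gaussian positions and shapes directly with LiDAR points is given below, using the Mahalanobis distance of each point under its nearest Gaussian plus a term that flattens the thinnest axis. This is an assumed stand-in for the idea, not the paper's regularization term.

```python
# Conceptual sketch (assumption, not the paper's exact loss): raw LiDAR points
# constrain both the centers and the anisotropic shapes of nearby Gaussians.
import torch
import torch.nn.functional as F

def quat_to_rot(q):                        # q: (N, 4) normalized quaternions -> (N, 3, 3)
    w, x, y, z = q.unbind(-1)
    return torch.stack([
        1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y),
        2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x),
        2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)], -1).reshape(-1, 3, 3)

def surface_alignment_loss(points, centers, quats, scales, flat_w=0.1):
    """points: (M, 3) LiDAR; centers/quats/scales: (N, 3)/(N, 4)/(N, 3) Gaussian params."""
    idx = torch.cdist(points, centers).argmin(dim=1)          # nearest Gaussian per point
    R = quat_to_rot(F.normalize(quats[idx], dim=-1))
    local = torch.einsum("nij,nj->ni", R.transpose(1, 2), points - centers[idx])
    maha = ((local / scales[idx].clamp(min=1e-4)) ** 2).sum(-1)  # penalizes position and shape
    flat = scales[idx].min(dim=-1).values ** 2                   # squash the thinnest axis
    return maha.mean() + flat_w * flat.mean()

pts = torch.randn(1000, 3)                                       # fake LiDAR points
loss = surface_alignment_loss(pts, torch.randn(200, 3), torch.randn(200, 4),
                              torch.rand(200, 3) * 0.1 + 0.01)
print(loss)
```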
{"title":"Direct LiDAR-supervised surface-aligned 3D Gaussian Splatting for high-fidelity digital twin","authors":"Xingdong Sheng , Qi Zhou , Xu Liu , Zhenyang Qu , Haoyu Xu , Shijie Mao , Xiaokang Yang","doi":"10.1016/j.displa.2026.103349","DOIUrl":"10.1016/j.displa.2026.103349","url":null,"abstract":"<div><div>3D Gaussian Splatting (3DGS) has recently demonstrated remarkable rendering speed and photorealistic quality for 3D reconstruction. Yet precise surface reconstruction and view-consistent photometric fidelity remain challenging, because the standard pipeline lacks explicit geometry supervision. Several recent approaches incorporate dense LiDAR point clouds as guidance, typically by aligning Gaussian centers or projecting LiDAR points into pseudo-depth maps. However, such methods constrain positions only and overlook the anisotropic shapes of the Gaussians, often resulting in rough surfaces and residual artifacts. To overcome these limitations, we propose a direct LiDAR-supervised surface-aligned regularization loss that simultaneously constrains Gaussian positions and shapes without converting LiDAR scans into depth maps. We further introduce adaptive densification and a multi-view depth-guided pruning strategy to enhance fidelity and suppress floaters. Extensive experiments on diverse indoor and outdoor datasets that represent the demands of industrial digital-twin applications show that our method consistently improves photorealistic rendering, even under significant viewpoint deviations, demonstrating advantages over existing typical LiDAR-assisted 3DGS methods.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103349"},"PeriodicalIF":3.4,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging the power of eye-tracking for virtual prototype evaluation: a comparison between virtual reality and photorealistic images
Displays, vol. 92, Article 103343 | Pub Date: 2026-01-10 | DOI: 10.1016/j.displa.2026.103343
Almudena Palacios-Ibáñez, Manuel F. Contero-López, Santiago Castellet-Lathan, Nathan Hartman, Manuel Contero
Most of the information we gather from our environment is obtained through sight; hence, visual evaluation is vital for assessing products. However, designers have traditionally relied on self-report questionnaires for this purpose, which have proven insufficient in some cases. Consequently, physiological measures are being employed to gain a deeper understanding of the cognitive and perceptual processes involved in product evaluation, and, thanks to their integration into Virtual Reality (VR) headsets, they have become a powerful tool for virtual prototype assessment. Still, using virtual prototypes raises some concerns, as previous studies have found that the display medium can influence product perception. Those results rely solely on self-report techniques, highlighting the need to explore the use of eye-tracking (ET) for product assessment, which is the main objective of this research. We present two case studies in which a group of people used two display mediums to assess (CS-1) a set of furniture composing a general scene, through a ranking-type evaluation (i.e., joint assessment), and (CS-2) two armchairs individually, using the Semantic Differential technique. In addition, the dwell time of the defined Areas of Interest (AOIs) was recorded. Our results showed that, although VR is sensitive to aesthetic differences between designs of the same product typology, the medium may still influence the perception of specific product attributes, e.g., fragility (p_MODERN < 0.001, p_TRADITIONAL = 0.002), and the observation of specific AOIs, e.g., AOI1 (p_MODERN = 0.003, p_TRADITIONAL < 0.001) and AOI9 and AOI10 (p < 0.001). At the same time, no differences were found in the perception of the general scene, whereas dwell time was influenced for AOI1 (p = 0.003), AOI4 (p = 0.006), and AOI5 (p < 0.001). Additionally, the university of origin may also be a factor influencing product evaluation, while confidence in the response was not affected by the medium. Hence, this study contributes to a deeper understanding of how the medium influences product perception by combining ET with self-report methods, offering valuable insights into user behavior.
{"title":"Leveraging the power of eye-tracking for virtual prototype evaluation: a comparison between virtual reality and photorealistic images","authors":"Almudena Palacios-Ibáñez , Manuel F. Contero-López , Santiago Castellet-Lathan , Nathan Hartman , Manuel Contero","doi":"10.1016/j.displa.2026.103343","DOIUrl":"10.1016/j.displa.2026.103343","url":null,"abstract":"<div><div>Most of the information we gather from our environment is obtained from sight, hence, visual evaluation is vital for assessing products. However, designers have traditionally relied on self-report questionnaires for this purpose, which have proven to be insufficient in some cases. Consequently, physiological measures are being employed to gain a deeper understanding of the cognitive and perceptual processes involved in product evaluation, and, thanks to their integration in Virtual Reality (VR) headsets, they have become a powerful tool for virtual prototype assessment. Still, using virtual prototypes raises some concerns, as previous studies have found that the medium can influence product perception. These results rely solely on self-report techniques, highlighting the need to explore the use of ET for product assessment, which is the main objective of this research. We present two case studies where a group of people assessed through two display mediums (CS-1) a set of furniture comprising a general scene using a ranking-type evaluation (i.e., joint assessment) and (CS-2) two armchairs individually using the Semantic Differential technique. Moreover, the dwell time of the Areas of Interest (AOIs) defined was recorded. Primarily, our results showed that, despite VR being sensitive to aesthetic differences between designs of the same product typology, the medium may still influence the perception of specific product attributes —e.g., fragility (p<sub>MODERN</sub> < 0.001, p<sub>TRADITIONAL</sub> = 0.002)—, and observation of specific AOIs —e.g., AOI1 (p<sub>MODERN</sub> = 0.003, p<sub>TRADITIONAL</sub> < 0.001), AOI9 and AOI10 (p < 0.001). At the same time, no differences were found in the perception of the general scene, whereas dwell time was influenced for AOI1 (p = 0.003), AOI4 (p = 0.006), and AOI5 (<.001). Additionally, the university of origin may also be a factor influencing product evaluation, while confidence in the response was not affected by the medium. Hence, this study contributes to a deeper understanding of how the medium influences product perception by employing ET with self-report methods, offering valuable insights into user behavior.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103343"},"PeriodicalIF":3.4,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}