Differences in streaming quality impact viewer expectations, attitudes and reactions to video
Pub Date: 2026-01-12 | DOI: 10.1016/j.displa.2026.103350 | Displays, Vol. 92, Article 103350
Christopher A. Sanchez, Nisha Raghunath, Chelsea Ahart
Given the massive amount of visual media consumed across the world every day, an open question is whether deviations from high-quality streaming can negatively impact viewers' opinions of and attitudes towards the viewed content. Previous research in other contexts has shown that reductions in perceptual quality can negatively impact attitudes. Are users sensitive to changes in video quality, and does this affect their reactions to viewed content? For example, do users enjoy lower-quality videos as much as higher-quality versions? Do quality differences also make viewers less receptive to the content of videos? Across two studies, participants watched a video in lower or higher quality and were then queried about their viewing experience, including ratings of attitudes towards the video stream and the video content, as well as measures of factual recall. Results indicated that viewers significantly prefer videos presented in higher quality, and that this preference drives future viewing intentions. Further, while factual memory for the presented information was equivalent across video quality, participants who viewed the higher-quality video were more likely to show an affective reaction to the video and to change their attitudes in line with the presented content. These results have implications for the design and delivery of online video content, and suggest that any deviation from higher-quality presentation can bias opinions of the viewed content: lower-quality videos worsened attitudes towards the content and reduced viewers' receptiveness to it.
{"title":"Differences in streaming quality impact viewer expectations, attitudes and reactions to video","authors":"Christopher A. Sanchez, Nisha Raghunath, Chelsea Ahart","doi":"10.1016/j.displa.2026.103350","DOIUrl":"10.1016/j.displa.2026.103350","url":null,"abstract":"<div><div>Given the massive amount of visual media consumed across the world everyday, an open question is whether deviations from high-quality streaming can negatively impact viewer’s opinions and attitudes towards viewed content? Previous research has shown that reductions in perceptual quality can negatively impact attitudes in other contexts. These changes in quality often lead to corresponding changes in attitudes. Are users sensitive to changes in video quality, and does this impact reactions to viewed content? For example, do users enjoy lower quality videos as much as higher-quality versions? Do quality differences also make viewers less receptive to the content of videos? Across two studies, participants watched a video in lower- or higher-quality, and were then queried regarding their viewing experience. This included ratings of attitudes towards video streaming and video content, and also included measures of factual recall. Results indicated that viewers significantly prefer videos presented in higher quality, which drives future viewing intentions. Further, while factual memory for information was equivalent across video quality, participants who viewed the higher-quality video were more likely to show an affective reaction to the video, and also change their attitudes relative to the presented content. These results have implications for the design and delivery of online video content, and suggests that any deviations from higher-quality presentations can bias opinions relative to the viewed content. Lower-quality videos decreased attitudes towards content, and also negatively impacted viewers’ receptiveness to presented content.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103350"},"PeriodicalIF":3.4,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards LiDAR point cloud geometry compression using rate-distortion optimization and adaptive quantization for human-machine vision
Pub Date: 2026-01-11 | DOI: 10.1016/j.displa.2026.103344 | Displays, Vol. 92, Article 103344
Yihan Wang, Yongfang Wang, Shuo Zhu, Zhijun Fang
Due to rapid advances in 3-Dimensional (3D) sensing and rendering technologies, point clouds have become increasingly widespread, bringing significant challenges for transmission and storage. Existing LiDAR Point Cloud Compression (PCC) methods primarily focus on enhancing compression efficiency and maintaining high signal fidelity, with insufficient consideration of joint human and machine perception. This paper proposes Rate Distortion Optimization (RDO) and Adaptive Quantization (AQ) for LiDAR Point Cloud Geometry Compression (PCGC) to balance human and machine vision performance. Specifically, we first propose Hybrid Distortion RDO (HDRDO) using a hybrid distortion measure and a Lagrange multiplier, where the optimal weights are determined by a Differential Evolution (DE) algorithm. Furthermore, by comprehensively analyzing how points identified by a Gaussian-based classification method affect overall quality, we propose an HDRDO-based AQ method that adaptively quantizes important and non-important points through optimal Quantization Parameter (QP) selection. We implement our approach on the Geometry-based Point Cloud Compression (G-PCC) Test Model Category 1 and 3 (TMC13), which serves as the anchor method. Compared with the anchor method, the proposed algorithm maintains consistent PSNR for human vision tasks and improves low-bitrate accuracy by 2.66% for detection and 21.18% for segmentation, respectively. Overall, the proposed method performs better than existing methods.
{"title":"Towards LiDAR point cloud geometry compression using rate-distortion optimization and adaptive quantization for human-machine vision","authors":"Yihan Wang , Yongfang Wang , Shuo Zhu , Zhijun Fang","doi":"10.1016/j.displa.2026.103344","DOIUrl":"10.1016/j.displa.2026.103344","url":null,"abstract":"<div><div>Due to rapid advances in 3-Dimensional (3D) sensing and rendering technologies, point clouds have become increasingly widespread, bring significant challenges for transmission and storage. Existing LiDAR Point Cloud Compression (PCC) methods primarily focus on enhancing compression efficiency and maintaining high signal fidelity, with insufficient considering human and machine joint perception. This paper proposes Rate Distortion Optimization (RDO) and Adaptive Quantization (AQ) for LiDAR Point Cloud Geometry Compression (PCGC) to balance human–machine vision performance. Specifically, we first propose Hybrid Distortion RDO (HDRDO) using hybrid distortion and Lagrange multiplier, where the optimal weights are determined by Differential Evolution (DE) algorithm. Furthermore, by comprehensively analyzing the impacts of point clouds on a Gaussian-based classification method on overall quality, we propose a HDRDO-based AQ method to adaptively quantify important and non-important points by optimal Quantization Parameter (QP) selection. We implement on Geometry-based Point Cloud Compression (G-PCC) Test Model Category 1 and 3 (TMC13), called the anchor method. Compared with the anchor method, the proposed algorithm achieves consistent PSNR for human vision tasks and improves by 2.66% and 21.18% on accuracy at low bitrates for detection and segmentation, respectively. Notably, the proposed overall method performs better than the existing method.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103344"},"PeriodicalIF":3.4,"publicationDate":"2026-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Direct LiDAR-supervised surface-aligned 3D Gaussian Splatting for high-fidelity digital twin
Pub Date: 2026-01-10 | DOI: 10.1016/j.displa.2026.103349 | Displays, Vol. 92, Article 103349
Xingdong Sheng, Qi Zhou, Xu Liu, Zhenyang Qu, Haoyu Xu, Shijie Mao, Xiaokang Yang
3D Gaussian Splatting (3DGS) has recently demonstrated remarkable rendering speed and photorealistic quality for 3D reconstruction. Yet precise surface reconstruction and view-consistent photometric fidelity remain challenging, because the standard pipeline lacks explicit geometry supervision. Several recent approaches incorporate dense LiDAR point clouds as guidance, typically by aligning Gaussian centers or projecting LiDAR points into pseudo-depth maps. However, such methods constrain positions only and overlook the anisotropic shapes of the Gaussians, often resulting in rough surfaces and residual artifacts. To overcome these limitations, we propose a direct LiDAR-supervised, surface-aligned regularization loss that simultaneously constrains Gaussian positions and shapes without converting LiDAR scans into depth maps. We further introduce adaptive densification and a multi-view depth-guided pruning strategy to enhance fidelity and suppress floaters. Extensive experiments on diverse indoor and outdoor datasets representative of industrial digital-twin applications show that our method consistently improves photorealistic rendering, even under significant viewpoint deviations, demonstrating advantages over typical existing LiDAR-assisted 3DGS methods.
{"title":"Direct LiDAR-supervised surface-aligned 3D Gaussian Splatting for high-fidelity digital twin","authors":"Xingdong Sheng , Qi Zhou , Xu Liu , Zhenyang Qu , Haoyu Xu , Shijie Mao , Xiaokang Yang","doi":"10.1016/j.displa.2026.103349","DOIUrl":"10.1016/j.displa.2026.103349","url":null,"abstract":"<div><div>3D Gaussian Splatting (3DGS) has recently demonstrated remarkable rendering speed and photorealistic quality for 3D reconstruction. Yet precise surface reconstruction and view-consistent photometric fidelity remain challenging, because the standard pipeline lacks explicit geometry supervision. Several recent approaches incorporate dense LiDAR point clouds as guidance, typically by aligning Gaussian centers or projecting LiDAR points into pseudo-depth maps. However, such methods constrain positions only and overlook the anisotropic shapes of the Gaussians, often resulting in rough surfaces and residual artifacts. To overcome these limitations, we propose a direct LiDAR-supervised surface-aligned regularization loss that simultaneously constrains Gaussian positions and shapes without converting LiDAR scans into depth maps. We further introduce adaptive densification and a multi-view depth-guided pruning strategy to enhance fidelity and suppress floaters. Extensive experiments on diverse indoor and outdoor datasets that represent the demands of industrial digital-twin applications show that our method consistently improves photorealistic rendering, even under significant viewpoint deviations, demonstrating advantages over existing typical LiDAR-assisted 3DGS methods.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103349"},"PeriodicalIF":3.4,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging the power of eye-tracking for virtual prototype evaluation: a comparison between virtual reality and photorealistic images
Pub Date: 2026-01-10 | DOI: 10.1016/j.displa.2026.103343 | Displays, Vol. 92, Article 103343
Almudena Palacios-Ibáñez, Manuel F. Contero-López, Santiago Castellet-Lathan, Nathan Hartman, Manuel Contero
Most of the information we gather from our environment comes from sight; hence, visual evaluation is vital for assessing products. However, designers have traditionally relied on self-report questionnaires for this purpose, which have proven insufficient in some cases. Consequently, physiological measures are being employed to gain a deeper understanding of the cognitive and perceptual processes involved in product evaluation, and, thanks to their integration into Virtual Reality (VR) headsets, they have become a powerful tool for virtual prototype assessment. Still, using virtual prototypes raises some concerns, as previous studies have found that the presentation medium can influence product perception. Those results rely solely on self-report techniques, highlighting the need to explore eye-tracking (ET) for product assessment, which is the main objective of this research. We present two case studies in which a group of people used two display mediums to assess (CS-1) a set of furniture composing a general scene, through a ranking-type evaluation (i.e., joint assessment), and (CS-2) two armchairs individually, using the Semantic Differential technique. In addition, dwell time was recorded for the defined Areas of Interest (AOIs). Our results showed that, although VR is sensitive to aesthetic differences between designs of the same product typology, the medium may still influence the perception of specific product attributes (e.g., fragility: p_MODERN < 0.001, p_TRADITIONAL = 0.002) and the observation of specific AOIs (e.g., AOI1: p_MODERN = 0.003, p_TRADITIONAL < 0.001; AOI9 and AOI10: p < 0.001). At the same time, no differences were found in the perception of the general scene, whereas dwell time was influenced by the medium for AOI1 (p = 0.003), AOI4 (p = 0.006), and AOI5 (p < 0.001). Additionally, the university of origin may also influence product evaluation, while confidence in the responses was not affected by the medium. Hence, this study contributes to a deeper understanding of how the medium influences product perception by combining ET with self-report methods, offering valuable insights into user behavior.
{"title":"Leveraging the power of eye-tracking for virtual prototype evaluation: a comparison between virtual reality and photorealistic images","authors":"Almudena Palacios-Ibáñez , Manuel F. Contero-López , Santiago Castellet-Lathan , Nathan Hartman , Manuel Contero","doi":"10.1016/j.displa.2026.103343","DOIUrl":"10.1016/j.displa.2026.103343","url":null,"abstract":"<div><div>Most of the information we gather from our environment is obtained from sight, hence, visual evaluation is vital for assessing products. However, designers have traditionally relied on self-report questionnaires for this purpose, which have proven to be insufficient in some cases. Consequently, physiological measures are being employed to gain a deeper understanding of the cognitive and perceptual processes involved in product evaluation, and, thanks to their integration in Virtual Reality (VR) headsets, they have become a powerful tool for virtual prototype assessment. Still, using virtual prototypes raises some concerns, as previous studies have found that the medium can influence product perception. These results rely solely on self-report techniques, highlighting the need to explore the use of ET for product assessment, which is the main objective of this research. We present two case studies where a group of people assessed through two display mediums (CS-1) a set of furniture comprising a general scene using a ranking-type evaluation (i.e., joint assessment) and (CS-2) two armchairs individually using the Semantic Differential technique. Moreover, the dwell time of the Areas of Interest (AOIs) defined was recorded. Primarily, our results showed that, despite VR being sensitive to aesthetic differences between designs of the same product typology, the medium may still influence the perception of specific product attributes —e.g., fragility (p<sub>MODERN</sub> < 0.001, p<sub>TRADITIONAL</sub> = 0.002)—, and observation of specific AOIs —e.g., AOI1 (p<sub>MODERN</sub> = 0.003, p<sub>TRADITIONAL</sub> < 0.001), AOI9 and AOI10 (p < 0.001). At the same time, no differences were found in the perception of the general scene, whereas dwell time was influenced for AOI1 (p = 0.003), AOI4 (p = 0.006), and AOI5 (<.001). Additionally, the university of origin may also be a factor influencing product evaluation, while confidence in the response was not affected by the medium. Hence, this study contributes to a deeper understanding of how the medium influences product perception by employing ET with self-report methods, offering valuable insights into user behavior.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103343"},"PeriodicalIF":3.4,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhanced white efficiency using film color filter via internal reflectance control by capping and refractive index matching layers for rigid OLED panels
Pub Date: 2026-01-09 | DOI: 10.1016/j.displa.2026.103345 | Displays, Vol. 92, Article 103345
Horyun Chung, Eunjae Na, Myunghwan Kim, Sungguk An, Yeong Hwan Ko, Jae Su Yu
To enhance external luminous efficiency and reduce power consumption in rigid top-emitting organic light-emitting diode (OLED) panels for mobile applications, a film color filter was introduced as a promising alternative to the conventional polarizer. The film color filter exhibited higher transmittance than the polarizer in the red, green, and blue emission wavelength regions of OLEDs, thereby improving external luminous efficiency. However, it also increases the reflectance of external light, which necessitates optimization strategies to mitigate this drawback. To address this issue, the internal reflection within the OLED panel was reduced by optimizing the capping layer (CPL) thickness from 60 to 40 nm. Additionally, a refractive index matching layer was implemented between the encapsulation glass and the CPL, resulting in a 24.5% reduction in the specular component included (SCI) reflectance and a decrease in the absolute value of the specular component excluded (SCE) reflection color coordinate. White efficiency typically decreases as the CPL thickness is reduced; however, Device B exhibited improvements of 13.7%, 16.8%, and 12.4% in white efficiency compared to the polarizer at CPL thicknesses of 40, 50, and 60 nm, respectively. The enhancement was particularly pronounced in the blue emission region, where luminous efficiency is inherently lower. These findings indicate that optimizing the CPL thickness to 40 nm in conjunction with Device B effectively reduces SCI reflectance, improves the SCE reflection color coordinate, and enhances white efficiency. This study demonstrates that replacing the conventional polarizer with a film color filter is a viable approach to achieving higher luminous efficiency in rigid top-emitting OLED panels for mobile devices.
{"title":"Enhanced white efficiency using film color filter via internal reflectance control by capping and refractive index matching layers for rigid OLED panels","authors":"Horyun Chung , Eunjae Na , Myunghwan Kim , Sungguk An , Yeong Hwan Ko , Jae Su Yu","doi":"10.1016/j.displa.2026.103345","DOIUrl":"10.1016/j.displa.2026.103345","url":null,"abstract":"<div><div>To enhance external luminous efficiency and reduce power consumption in rigid top-emitting organic light-emitting diode (OLED) panels for mobile applications, a film color filter was introduced as a promising alternative for conventional polarizers. The film color filter exhibited higher transmittance in the red, green, and blue emission wavelength regions of OLEDs compared to the polarizer, thereby improving external luminous efficiency. However, its application also increases reflectance due to external light, which necessitates optimization strategies to mitigate this drawback. To address this issue, the internal reflection within the OLED panel was reduced by optimizing the capping layer (CPL) thickness from 60 to 40 nm. Additionally, a refractive index matching layer was implemented between the encapsulation glass and the CPL, resulting in a 24.5% reduction in the specular component included (SCI) reflectance and a decrease in the absolute value of the specular component excluded (SCE) reflection color coordinate. White efficiency typically decreases with the reduction of the CPL thickness; however, the Device B exhibited improvements of 13.7%, 16.8%, and 12.4% in white efficiency compared to the polarizer at the CPL thicknesses of 40, 50, and 60 nm, respectively. This enhancement was particularly pronounced in the blue emission region, where the luminous efficiency is inherently lower. These findings indicate that optimizing the CPL thickness to 40 nm in conjunction with the Device B effectively reduces SCI reflectance, improves SCE reflection color coordinate, and enhances white efficiency. This study demonstrates that replacing the conventional polarizer with a film color filter is a viable approach to achieving higher luminous efficiency in rigid top-emitting OLED panels for mobile devices.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103345"},"PeriodicalIF":3.4,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated prompt-guided multi-modality cell segmentation with shape-aware classification and boundary-aware SAM adaptation
Pub Date: 2026-01-07 | DOI: 10.1016/j.displa.2025.103337 | Displays, Vol. 92, Article 103337
Deboch Eyob Abera, Jiaye He, Jia Liu, Nazar Zaki, Wenjian Qin
Robust and accurate cell segmentation across diverse imaging modalities remains a critical challenge in microscopy image analysis. While foundation models like the Segment Anything Model (SAM) have demonstrated exceptional performance in natural image segmentation, their adaptation to multi-modal cellular analysis is hindered by domain-specific knowledge gaps and morphological complexity. To bridge this gap, we present a novel SAM-driven framework featuring three systematic innovations. First, we propose Shape-Aware Classification to enhance segmentation of cells with diverse morphologies. Second, an Auto Point Prompt Generation (APPGen) module guides the segmentation model with automatically generated point cues to improve segmentation accuracy. Third, we implement Boundary-Aware SAM Adaptation to effectively resolve overlapping cells in microscopy images. Our experiments show that the proposed framework reduces manual effort through automated prompts, adapts well to different imaging modalities, and enhances segmentation accuracy by incorporating boundary-aware techniques. The source code is available at https://github.com/MIXAILAB/Multi_Modality_CellSeg.
{"title":"Automated prompt-guided multi-modality cell segmentation with shape-aware classification and boundary-aware SAM adaptation","authors":"Deboch Eyob Abera , Jiaye He , Jia Liu , Nazar Zaki , Wenjian Qin","doi":"10.1016/j.displa.2025.103337","DOIUrl":"10.1016/j.displa.2025.103337","url":null,"abstract":"<div><div>Robust and accurate cell segmentation across diverse imaging modalities remains a critical challenge in microscopy image analysis. While foundation models like the Segment Anything Model (SAM) have demonstrated exceptional performance in natural image segmentation, their adaptation to multi-modal cellular analysis is hindered by domain-specific knowledge gaps and morphological complexity. To bridge this gap, we present a novel SAM-driven framework featuring three systematic innovations: First, we propose Shape-Aware Classification to enhance segmentation of cells with diverse morphologies. Second, Auto Point Prompt Generation (APPGen) module guides the segmentation model with automatically generated point cues to improve segmentation accuracy. Third, we implement Boundary-Aware SAM Adaptation to effectively resolve overlapping cells in microscopy images. Our experiments show that the proposed framework reduces manual effort through automated prompts, adapts well to different imaging modalities, and enhances segmentation accuracy by incorporating boundary-aware techniques. The source code is available at <span><span>https://github.com/MIXAILAB/Multi_Modality_CellSeg</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103337"},"PeriodicalIF":3.4,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145938457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An optimized convolutional neural network based on multi-strategy grey wolf optimizer to identify crop diseases and pests
Pub Date: 2026-01-06 | DOI: 10.1016/j.displa.2026.103341 | Displays, Vol. 92, Article 103341
Xiaobing Yu, Hongqian Zhang, Yuchen Duan, Xuming Wang
Agriculture plays a crucial role in national food security, with crop diseases and pests being major threats to agricultural sustainability. Traditional detection methods are labor-intensive, subjective, and often inaccurate. Recent advancements in deep learning have significantly improved image-based recognition; however, the performance of convolutional neural networks (CNNs) is highly dependent on hyperparameter tuning, which remains a challenging task. To address this issue, this study proposes a multi-strategy grey wolf optimizer (MGWO) to enhance CNN hyperparameter optimization. MGWO improves the global search efficiency of the conventional grey wolf optimizer (GWO), enabling automatic selection of optimal hyperparameters. The proposed approach is evaluated on corn disease and Pentatomidae stinkbug pest classification, comparing its performance against a baseline CNN model and six other optimization algorithms. Experimental results show that MGWO achieves 95.71% accuracy on the corn disease dataset and 94.46% on the pest dataset, outperforming all competing methods.
These findings demonstrate the potential of MGWO in optimizing deep learning models for agricultural applications, providing a robust and automated solution for crop disease and pest recognition.
{"title":"An optimized convolutional neural network based on multi-strategy grey wolf optimizer to identify crop diseases and pests","authors":"Xiaobing Yu , Hongqian Zhang , Yuchen Duan , Xuming Wang","doi":"10.1016/j.displa.2026.103341","DOIUrl":"10.1016/j.displa.2026.103341","url":null,"abstract":"<div><div>Agriculture plays a crucial role in national food security, with crop diseases and pests being major threats to agricultural sustainability. Traditional detection methods are labor-intensive, subjective, and often inaccurate. Recent advancements in deep learning have significantly improved image-based recognition; however, the performance of convolutional neural networks (CNNs) is highly dependent on hyperparameter tuning, which remains a challenging <span><span>task. To</span><svg><path></path></svg></span> address this issue, this study proposes a multi-strategy grey wolf optimizer (MGWO) to enhance CNN hyperparameter optimization. MGWO improves the global search efficiency of the conventional grey wolf optimizer (GWO), enabling automatic selection of optimal hyperparameters. The proposed approach is evaluated on corn disease and Pentatomidae stinkbug pest classification, comparing its performance against a baseline CNN model and six other optimization algorithms. Experimental results show that MGWO achieves 95.71% accuracy on the corn disease dataset and 94.46% on the pest dataset, outperforming all competing methods.</div><div>These findings demonstrate the potential of MGWO in optimizing deep learning models for agricultural applications, providing a robust and automated solution for crop disease and pest recognition.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103341"},"PeriodicalIF":3.4,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145938459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parameter-efficient fine-tuning for no-reference image quality assessment: Empirical studies on vision transformer
Pub Date: 2026-01-05 | DOI: 10.1016/j.displa.2026.103339 | Displays, Vol. 92, Article 103339
GuangLu Sun, Kaiwei Lei, Tianlin Li, Linsen Yu, Suxia Zhu
Parameter-Efficient Fine-Tuning (PEFT) is a transfer learning technique designed to adapt pre-trained models to downstream tasks while minimizing parameter and computational complexity. In recent years, No-Reference Image Quality Assessment (NR-IQA) methods based on pre-trained visual models have achieved significant progress. However, most of these methods rely on full fine-tuning, which requires substantial computational and memory resources. A natural question arises: can PEFT techniques achieve parameter-efficient NR-IQA with good performance? To explore this, we perform empirical studies using several PEFT methods on a pre-trained Vision Transformer (ViT) model. Specifically, we select three PEFT approaches – adapter tuning, prompt tuning, and partial tuning – that have proven effective in general vision tasks, and investigate whether they can achieve performance comparable to traditional visual NR-IQA models. Among them, which is the most effective? Furthermore, we examine the impact of four key factors on the results: fine-tuning position, parameter configuration, layer selection strategy, and the scale of pre-trained weights. Finally, we evaluate whether the optimal PEFT strategy on ViT can be generalized to other Transformer-based architectures. This work offers valuable insights and practical guidance for future research on PEFT methods in NR-IQA tasks.
{"title":"Parameter-efficient fine-tuning for no-reference image quality assessment: Empirical studies on vision transformer","authors":"GuangLu Sun, Kaiwei Lei, Tianlin Li, Linsen Yu, Suxia Zhu","doi":"10.1016/j.displa.2026.103339","DOIUrl":"10.1016/j.displa.2026.103339","url":null,"abstract":"<div><div>Parameter-Efficient Fine-Tuning (PEFT) is a transfer learning technique designed to adapt pre-trained models to downstream tasks while minimizing parameter and computational complexity. In recent years, No-Reference Image Quality Assessment (NR-IQA) methods based on pre-trained visual models have achieved significant progress. However, most of these methods rely on full fine-tuning, which requires substantial computational and memory resources. A natural question arises: can PEFT techniques achieve parameter-efficient NR-IQA with good performance? To explore this, we perform empirical studies using several PEFT methods on pre-trained Vision Transformer (ViT) model. Specifically, we select three PEFT approaches – adapter tuning, prompt tuning, and partial tuning – that have proven effective in general vision tasks, and investigate whether they can achieve performance comparable to traditional visual NR-IQA models. Among them, which is the most effective? Furthermore, we examine the impact of four key factors on the results: fine-tuning position, parameter configuration, layer selection strategy, and the scale of pre-trained weights. Finally, we evaluate whether the optimal PEFT strategy on ViT can be generalized to other Transformer-based architectures. This work offers valuable insights and practical guidance for future research on PEFT methods in NR-IQA tasks.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103339"},"PeriodicalIF":3.4,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145938456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AFFLIE: Adaptive feature fusion for low-light image enhancement
Pub Date: 2026-01-03 | DOI: 10.1016/j.displa.2026.103340 | Displays, Vol. 92, Article 103340
Yaxin Lin, Xiaopeng Li, Lian Zou, Liqing Zhou, Cien Fan
Under low illumination, RGB cameras often capture images with significant noise and low visibility, while event cameras, with their high dynamic range, emerge as a promising solution for improving image quality by supplementing image details in low-light conditions. In this paper, we propose a novel image enhancement framework called AFFLIE, which integrates event- and frame-based techniques to improve image quality in low-light conditions. The framework introduces a Multi-scale Spatial-Channel Transformer Encoder (MS-SCTE) to address low-light image noise and the temporal characteristics of events. Additionally, an Adaptive Feature Fusion Module (AFFM) is proposed to dynamically aggregate features from the image and event streams, enhancing generalization performance. The framework demonstrates superior performance on the SDE, LIE and RELED datasets by enhancing noise reduction and detail preservation.
{"title":"AFFLIE: Adaptive feature fusion for low-light image enhancement","authors":"Yaxin Lin , Xiaopeng Li , Lian Zou , Liqing Zhou , Cien Fan","doi":"10.1016/j.displa.2026.103340","DOIUrl":"10.1016/j.displa.2026.103340","url":null,"abstract":"<div><div>Under low illumination, RGB cameras often capture images with significant noise and low visibility, while event cameras, with their high dynamic range characteristic, emerge as a promising solution for improving image quality in the low-light environment by supplementing image details in low-light condition. In this paper, we propose a novel image enhancement framework called AFFLIE, which integrates event and frame-based techniques to improve image quality in low-light conditions. The framework introduces a Multi-scale Spatial-Channel Transformer Encoder (MS-SCTE) to address low-light image noise and event temporal characteristics. Additionally, an Adaptive Feature Fusion Module (AFFM) is proposed to dynamically aggregate features from both image and event streams, enhancing generalization performance. The framework demonstrates superior performance on the SDE, LIE and RELED datasets by enhancing noise reduction and detail preservation.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103340"},"PeriodicalIF":3.4,"publicationDate":"2026-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145938455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Class extension logits distillation for few-shot object detection
Pub Date: 2026-01-02 | DOI: 10.1016/j.displa.2026.103338 | Displays, Vol. 92, Article 103338
Taijin Zhao, Heqian Qiu, Lanxiao Wang, Yu Dai, Qingbo Wu, Hongliang Li
Few-Shot Object Detection (FSOD) aims to learn robust detectors under extreme data imbalance between abundant base classes and scarce novel classes. While recent transfer learning paradigms achieve initial success through sequential base class pre-training and novel class fine-tuning, their fundamental assumption that a feature encoder trained on base classes can generalize to novel class instances reveals critical limitations due to information suppression of novel classes. Knowledge distillation from vision-language models like CLIP presents a promising solution, yet conventional distillation approaches exhibit inherent flaws from the perspective of the Information Bottleneck (IB) principle: CLIP's broad semantic understanding results in low information compression, and feature distillation can struggle to reconcile with FSOD's high information compression demand, potentially leading to suboptimal information compression of the detector. Conversely, while logits distillation using only base classes can enhance information compression, it fails to preserve and transfer crucial novel class semantics from CLIP. To address these challenges, we propose a unified framework comprising Class Extension Logits Distillation (CELD) and Virtual Knowledge Parameter Initializer (VKPInit). During base training, CELD uses CLIP's text encoder to create an expanded base-novel classifier. This acts as an IB, providing target distributions from CLIP's visual features for both base and unseen novel classes. The detector aligns to these distributions using its base classifier and a virtual novel classifier, allowing it to learn compressed, novel-aware knowledge from CLIP. Subsequently, during novel tuning, VKPInit leverages the virtual novel classifier learned in CELD to provide semantically informed initializations for the novel class heads, mitigating initialization bias and enhancing resistance to overfitting. Extensive experiments on PASCAL VOC and MS COCO demonstrate the robustness and superiority of our proposed method over multiple baselines.
{"title":"Class extension logits distillation for few-shot object detection","authors":"Taijin Zhao, Heqian Qiu, Lanxiao Wang, Yu Dai, Qingbo Wu, Hongliang Li","doi":"10.1016/j.displa.2026.103338","DOIUrl":"10.1016/j.displa.2026.103338","url":null,"abstract":"<div><div>Few-Shot Object Detection (FSOD) aims at learning robust detectors under extreme data imbalance between abundant base classes and scarce novel classes. While recent transfer learning paradigms achieve initial success through sequential base class pre-training and novel class fine-tuning, their fundamental assumption that base class trained feature encoder can generalize to novel class instances reveals critical limitations due to the information suppression of novel classes. Knowledge distillation from vision-language models like CLIP presents promising solutions, yet conventional distillation approaches exhibit inherent flaws from the perspective of Information Bottleneck (IB) principle: CLIP’s broad semantic understanding results in low information compression, and feature distillation can struggle to reconcile with FSOD’s high information compression demand, potentially leading to suboptimal information compression of the detector. Conversely, while logits distillation using only base classes can enhance information compression, it fails to preserve and transfer crucial novel class semantics from CLIP. To address these challenges, we propose a unified framework comprising Class Extension Logits Distillation (CELD) and Virtual Knowledge Parameter Initializer (VKPInit). During base training, CELD uses CLIP’s text encoder to create an expanded base-novel classifier. This acts as an IB, providing target distributions from CLIP’s visual features for both base and unseen novel classes. The detector aligns to these distributions using its base classifier and a virtual novel classifier, allowing it to learn compressed, novel-aware knowledge from CLIP. Subsequently, during novel tuning, VKPInit leverages the virtual novel classifier learned in CELD to provide semantically-informed initializations for the novel class heads, mitigating initialization bias and enhancing resistance to overfitting. Extensive experiments on PASCAL VOC and MS COCO demonstrate the robustness and superiority of our proposed method over multiple baselines.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103338"},"PeriodicalIF":3.4,"publicationDate":"2026-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}