
Displays: Latest Publications

Differences in streaming quality impact viewer expectations, attitudes and reactions to video
IF 3.4 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-12 | DOI: 10.1016/j.displa.2026.103350
Christopher A. Sanchez, Nisha Raghunath, Chelsea Ahart
Given the massive amount of visual media consumed across the world every day, an open question is whether deviations from high-quality streaming can negatively impact viewers' opinions of and attitudes towards viewed content. Previous research has shown that reductions in perceptual quality can negatively impact attitudes in other contexts, and these changes in quality often lead to corresponding changes in attitudes. Are users sensitive to changes in video quality, and does this impact reactions to viewed content? For example, do users enjoy lower-quality videos as much as higher-quality versions? Do quality differences also make viewers less receptive to the content of videos? Across two studies, participants watched a video in lower or higher quality and were then queried regarding their viewing experience. This included ratings of attitudes towards video streaming and video content, as well as measures of factual recall. Results indicated that viewers significantly prefer videos presented in higher quality, which drives future viewing intentions. Further, while factual memory for information was equivalent across video quality, participants who viewed the higher-quality video were more likely to show an affective reaction to the video and to change their attitudes relative to the presented content. These results have implications for the design and delivery of online video content, and suggest that any deviation from higher-quality presentation can bias opinions of the viewed content. Lower-quality videos led to less favorable attitudes towards content and negatively impacted viewers' receptiveness to the presented content.
{"title":"Differences in streaming quality impact viewer expectations, attitudes and reactions to video","authors":"Christopher A. Sanchez,&nbsp;Nisha Raghunath,&nbsp;Chelsea Ahart","doi":"10.1016/j.displa.2026.103350","DOIUrl":"10.1016/j.displa.2026.103350","url":null,"abstract":"<div><div>Given the massive amount of visual media consumed across the world everyday, an open question is whether deviations from high-quality streaming can negatively impact viewer’s opinions and attitudes towards viewed content? Previous research has shown that reductions in perceptual quality can negatively impact attitudes in other contexts. These changes in quality often lead to corresponding changes in attitudes. Are users sensitive to changes in video quality, and does this impact reactions to viewed content? For example, do users enjoy lower quality videos as much as higher-quality versions? Do quality differences also make viewers less receptive to the content of videos? Across two studies, participants watched a video in lower- or higher-quality, and were then queried regarding their viewing experience. This included ratings of attitudes towards video streaming and video content, and also included measures of factual recall. Results indicated that viewers significantly prefer videos presented in higher quality, which drives future viewing intentions. Further, while factual memory for information was equivalent across video quality, participants who viewed the higher-quality video were more likely to show an affective reaction to the video, and also change their attitudes relative to the presented content. These results have implications for the design and delivery of online video content, and suggests that any deviations from higher-quality presentations can bias opinions relative to the viewed content. Lower-quality videos decreased attitudes towards content, and also negatively impacted viewers’ receptiveness to presented content.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103350"},"PeriodicalIF":3.4,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards LiDAR point cloud geometry compression using rate-distortion optimization and adaptive quantization for human-machine vision
IF 3.4 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-11 | DOI: 10.1016/j.displa.2026.103344
Yihan Wang, Yongfang Wang, Shuo Zhu, Zhijun Fang
Due to rapid advances in 3-Dimensional (3D) sensing and rendering technologies, point clouds have become increasingly widespread, bringing significant challenges for transmission and storage. Existing LiDAR Point Cloud Compression (PCC) methods primarily focus on enhancing compression efficiency and maintaining high signal fidelity, with insufficient consideration of joint human-machine perception. This paper proposes Rate Distortion Optimization (RDO) and Adaptive Quantization (AQ) for LiDAR Point Cloud Geometry Compression (PCGC) to balance human and machine vision performance. Specifically, we first propose Hybrid Distortion RDO (HDRDO) using a hybrid distortion measure and a Lagrange multiplier, where the optimal weights are determined by the Differential Evolution (DE) algorithm. Furthermore, by comprehensively analyzing how points classified by a Gaussian-based method affect overall quality, we propose an HDRDO-based AQ method that adaptively quantizes important and non-important points through optimal Quantization Parameter (QP) selection. We implement our method on the Geometry-based Point Cloud Compression (G-PCC) Test Model Categories 1 and 3 (TMC13), which serves as the anchor method. Compared with the anchor, the proposed algorithm achieves consistent PSNR for human vision tasks and improves accuracy at low bitrates by 2.66% and 21.18% for detection and segmentation, respectively. Notably, the proposed overall method performs better than the existing method.
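The core of HDRDO is a Lagrangian rate-distortion cost over a weighted hybrid distortion, with the weights searched by Differential Evolution. The sketch below illustrates that idea only; the two-term weighted distortion, the toy coding modes, and the use of SciPy's differential_evolution as the DE solver are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of a hybrid-distortion RD cost with DE-searched weights.
# Assumptions: the hybrid distortion is a weighted sum of a human-vision term
# and a machine-vision term, and the candidate "coding modes" below are toy data.
import numpy as np
from scipy.optimize import differential_evolution

def hybrid_rd_cost(rate, d_human, d_machine, w, lam):
    """J = (w * D_human + (1 - w) * D_machine) + lambda * R."""
    return w * d_human + (1.0 - w) * d_machine + lam * rate

# Toy candidate coding modes: (rate, human-vision distortion, machine-vision distortion).
modes = np.array([
    [0.8, 0.10, 0.30],
    [1.2, 0.07, 0.22],
    [2.0, 0.05, 0.09],
])

def best_cost(w, lam=0.5):
    """RD cost of the best mode under weight w (the quantity DE minimizes here)."""
    return min(hybrid_rd_cost(r, dh, dm, w, lam) for r, dh, dm in modes)

# Differential Evolution searches the distortion weight; in the real pipeline the
# objective would be measured quality/accuracy over training data, not this proxy.
result = differential_evolution(lambda x: best_cost(x[0]), bounds=[(0.0, 1.0)])
print("optimal weight:", round(float(result.x[0]), 3))
```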
{"title":"Towards LiDAR point cloud geometry compression using rate-distortion optimization and adaptive quantization for human-machine vision","authors":"Yihan Wang ,&nbsp;Yongfang Wang ,&nbsp;Shuo Zhu ,&nbsp;Zhijun Fang","doi":"10.1016/j.displa.2026.103344","DOIUrl":"10.1016/j.displa.2026.103344","url":null,"abstract":"<div><div>Due to rapid advances in 3-Dimensional (3D) sensing and rendering technologies, point clouds have become increasingly widespread, bring significant challenges for transmission and storage. Existing LiDAR Point Cloud Compression (PCC) methods primarily focus on enhancing compression efficiency and maintaining high signal fidelity, with insufficient considering human and machine joint perception. This paper proposes Rate Distortion Optimization (RDO) and Adaptive Quantization (AQ) for LiDAR Point Cloud Geometry Compression (PCGC) to balance human–machine vision performance. Specifically, we first propose Hybrid Distortion RDO (HDRDO) using hybrid distortion and Lagrange multiplier, where the optimal weights are determined by Differential Evolution (DE) algorithm. Furthermore, by comprehensively analyzing the impacts of point clouds on a Gaussian-based classification method on overall quality, we propose a HDRDO-based AQ method to adaptively quantify important and non-important points by optimal Quantization Parameter (QP) selection. We implement on Geometry-based Point Cloud Compression (G-PCC) Test Model Category 1 and 3 (TMC13), called the anchor method. Compared with the anchor method, the proposed algorithm achieves consistent PSNR for human vision tasks and improves by 2.66% and 21.18% on accuracy at low bitrates for detection and segmentation, respectively. Notably, the proposed overall method performs better than the existing method.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103344"},"PeriodicalIF":3.4,"publicationDate":"2026-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Direct LiDAR-supervised surface-aligned 3D Gaussian Splatting for high-fidelity digital twin
IF 3.4 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-10 | DOI: 10.1016/j.displa.2026.103349
Xingdong Sheng, Qi Zhou, Xu Liu, Zhenyang Qu, Haoyu Xu, Shijie Mao, Xiaokang Yang
3D Gaussian Splatting (3DGS) has recently demonstrated remarkable rendering speed and photorealistic quality for 3D reconstruction. Yet precise surface reconstruction and view-consistent photometric fidelity remain challenging, because the standard pipeline lacks explicit geometry supervision. Several recent approaches incorporate dense LiDAR point clouds as guidance, typically by aligning Gaussian centers or projecting LiDAR points into pseudo-depth maps. However, such methods constrain positions only and overlook the anisotropic shapes of the Gaussians, often resulting in rough surfaces and residual artifacts. To overcome these limitations, we propose a direct LiDAR-supervised surface-aligned regularization loss that simultaneously constrains Gaussian positions and shapes without converting LiDAR scans into depth maps. We further introduce adaptive densification and a multi-view depth-guided pruning strategy to enhance fidelity and suppress floaters. Extensive experiments on diverse indoor and outdoor datasets that represent the demands of industrial digital-twin applications show that our method consistently improves photorealistic rendering, even under significant viewpoint deviations, demonstrating advantages over existing typical LiDAR-assisted 3DGS methods.
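As a rough illustration of what direct LiDAR supervision on both positions and shapes can look like, the following sketch penalizes the distance of Gaussian centers to their nearest LiDAR points and the Gaussian scale projected onto the local surface normal. The loss terms, weights, and tensor conventions are assumptions, not the paper's regularizer.

```python
# Minimal sketch of a direct LiDAR-supervised, surface-aligned regularizer for 3DGS.
# Assumptions: `rotations` are (G, 3, 3) matrices whose columns are the Gaussian axes,
# `scales` are (G, 3) per-axis extents, and LiDAR points come with estimated normals.
import torch

def surface_aligned_loss(centers, scales, rotations, lidar_pts, lidar_normals,
                         w_pos=1.0, w_shape=0.1):
    # Position term: pull each Gaussian center toward its nearest LiDAR point.
    dists = torch.cdist(centers, lidar_pts)            # (G, P) pairwise distances
    nn_dist, nn_idx = dists.min(dim=1)
    pos_loss = nn_dist.mean()

    # Shape term: flatten each Gaussian along the local surface normal by penalizing
    # the per-axis scale weighted by how much that axis points along the normal.
    normals = lidar_normals[nn_idx]                     # (G, 3) normal of nearest point
    align = torch.einsum('gij,gi->gj', rotations, normals).abs()   # |axis_j . normal|
    shape_loss = (align * scales).sum(dim=1).mean()

    return w_pos * pos_loss + w_shape * shape_loss

# Example with random tensors, just to show the expected shapes.
loss = surface_aligned_loss(torch.rand(100, 3), torch.rand(100, 3),
                            torch.eye(3).expand(100, 3, 3),
                            torch.rand(1000, 3),
                            torch.nn.functional.normalize(torch.rand(1000, 3), dim=1))
print(loss.item())
```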
{"title":"Direct LiDAR-supervised surface-aligned 3D Gaussian Splatting for high-fidelity digital twin","authors":"Xingdong Sheng ,&nbsp;Qi Zhou ,&nbsp;Xu Liu ,&nbsp;Zhenyang Qu ,&nbsp;Haoyu Xu ,&nbsp;Shijie Mao ,&nbsp;Xiaokang Yang","doi":"10.1016/j.displa.2026.103349","DOIUrl":"10.1016/j.displa.2026.103349","url":null,"abstract":"<div><div>3D Gaussian Splatting (3DGS) has recently demonstrated remarkable rendering speed and photorealistic quality for 3D reconstruction. Yet precise surface reconstruction and view-consistent photometric fidelity remain challenging, because the standard pipeline lacks explicit geometry supervision. Several recent approaches incorporate dense LiDAR point clouds as guidance, typically by aligning Gaussian centers or projecting LiDAR points into pseudo-depth maps. However, such methods constrain positions only and overlook the anisotropic shapes of the Gaussians, often resulting in rough surfaces and residual artifacts. To overcome these limitations, we propose a direct LiDAR-supervised surface-aligned regularization loss that simultaneously constrains Gaussian positions and shapes without converting LiDAR scans into depth maps. We further introduce adaptive densification and a multi-view depth-guided pruning strategy to enhance fidelity and suppress floaters. Extensive experiments on diverse indoor and outdoor datasets that represent the demands of industrial digital-twin applications show that our method consistently improves photorealistic rendering, even under significant viewpoint deviations, demonstrating advantages over existing typical LiDAR-assisted 3DGS methods.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103349"},"PeriodicalIF":3.4,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Leveraging the power of eye-tracking for virtual prototype evaluation: a comparison between virtual reality and photorealistic images
IF 3.4 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-10 | DOI: 10.1016/j.displa.2026.103343
Almudena Palacios-Ibáñez, Manuel F. Contero-López, Santiago Castellet-Lathan, Nathan Hartman, Manuel Contero
Most of the information we gather from our environment is obtained from sight; hence, visual evaluation is vital for assessing products. However, designers have traditionally relied on self-report questionnaires for this purpose, which have proven to be insufficient in some cases. Consequently, physiological measures are being employed to gain a deeper understanding of the cognitive and perceptual processes involved in product evaluation, and, thanks to their integration in Virtual Reality (VR) headsets, they have become a powerful tool for virtual prototype assessment. Still, using virtual prototypes raises some concerns, as previous studies have found that the medium can influence product perception. These results rely solely on self-report techniques, highlighting the need to explore the use of eye-tracking (ET) for product assessment, which is the main objective of this research. We present two case studies in which participants assessed, through two display mediums, (CS-1) a set of furniture comprising a general scene using a ranking-type evaluation (i.e., joint assessment) and (CS-2) two armchairs individually using the Semantic Differential technique. Moreover, the dwell time on the defined Areas of Interest (AOIs) was recorded. Primarily, our results showed that, despite VR being sensitive to aesthetic differences between designs of the same product typology, the medium may still influence the perception of specific product attributes, e.g., fragility (p_MODERN < 0.001, p_TRADITIONAL = 0.002), and the observation of specific AOIs, e.g., AOI1 (p_MODERN = 0.003, p_TRADITIONAL < 0.001), AOI9 and AOI10 (p < 0.001). At the same time, no differences were found in the perception of the general scene, whereas dwell time was influenced for AOI1 (p = 0.003), AOI4 (p = 0.006), and AOI5 (p < 0.001). Additionally, the university of origin may also be a factor influencing product evaluation, while confidence in the response was not affected by the medium. Hence, this study contributes to a deeper understanding of how the medium influences product perception by employing ET with self-report methods, offering valuable insights into user behavior.
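For readers unfamiliar with the dwell-time measure, the sketch below shows one simple way such AOI dwell times can be aggregated from raw gaze samples; the AOI rectangles, sampling rate, and column names are hypothetical and not taken from the study.

```python
# Minimal sketch of dwell-time aggregation over Areas of Interest (AOIs).
# Assumptions: a gaze log with one row per sample (x, y in pixels), a fixed
# sampling rate, and axis-aligned rectangular AOIs with made-up coordinates.
import pandas as pd

aois = {
    "AOI1": (100, 200, 400, 500),   # (x_min, y_min, x_max, y_max)
    "AOI4": (600, 200, 900, 500),
}

def dwell_time(gaze: pd.DataFrame, sample_period_s: float = 1 / 120) -> pd.Series:
    """Sum the time spent inside each AOI, assuming a constant sampling rate."""
    times = {}
    for name, (x0, y0, x1, y1) in aois.items():
        inside = gaze["x"].between(x0, x1) & gaze["y"].between(y0, y1)
        times[name] = inside.sum() * sample_period_s
    return pd.Series(times, name="dwell_time_s")

# Example: three samples at 120 Hz, two of them landing inside AOI1.
gaze = pd.DataFrame({"x": [150, 160, 700], "y": [250, 260, 250]})
print(dwell_time(gaze))
```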
{"title":"Leveraging the power of eye-tracking for virtual prototype evaluation: a comparison between virtual reality and photorealistic images","authors":"Almudena Palacios-Ibáñez ,&nbsp;Manuel F. Contero-López ,&nbsp;Santiago Castellet-Lathan ,&nbsp;Nathan Hartman ,&nbsp;Manuel Contero","doi":"10.1016/j.displa.2026.103343","DOIUrl":"10.1016/j.displa.2026.103343","url":null,"abstract":"<div><div>Most of the information we gather from our environment is obtained from sight, hence, visual evaluation is vital for assessing products. However, designers have traditionally relied on self-report questionnaires for this purpose, which have proven to be insufficient in some cases. Consequently, physiological measures are being employed to gain a deeper understanding of the cognitive and perceptual processes involved in product evaluation, and, thanks to their integration in Virtual Reality (VR) headsets, they have become a powerful tool for virtual prototype assessment. Still, using virtual prototypes raises some concerns, as previous studies have found that the medium can influence product perception. These results rely solely on self-report techniques, highlighting the need to explore the use of ET for product assessment, which is the main objective of this research. We present two case studies where a group of people assessed through two display mediums (CS-1) a set of furniture comprising a general scene using a ranking-type evaluation (i.e., joint assessment) and (CS-2) two armchairs individually using the Semantic Differential technique. Moreover, the dwell time of the Areas of Interest (AOIs) defined was recorded. Primarily, our results showed that, despite VR being sensitive to aesthetic differences between designs of the same product typology, the medium may still influence the perception of specific product attributes —e.g., fragility (p<sub>MODERN</sub> &lt; 0.001, p<sub>TRADITIONAL</sub> = 0.002)—, and observation of specific AOIs —e.g., AOI1 (p<sub>MODERN</sub> = 0.003, p<sub>TRADITIONAL</sub> &lt; 0.001), AOI9 and AOI10 (p &lt; 0.001). At the same time, no differences were found in the perception of the general scene, whereas dwell time was influenced for AOI1 (p = 0.003), AOI4 (p = 0.006), and AOI5 (&lt;.001). Additionally, the university of origin may also be a factor influencing product evaluation, while confidence in the response was not affected by the medium. Hence, this study contributes to a deeper understanding of how the medium influences product perception by employing ET with self-report methods, offering valuable insights into user behavior.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103343"},"PeriodicalIF":3.4,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhanced white efficiency using film color filter via internal reflectance control by capping and refractive index matching layers for rigid OLED panels
IF 3.4 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-09 | DOI: 10.1016/j.displa.2026.103345
Horyun Chung, Eunjae Na, Myunghwan Kim, Sungguk An, Yeong Hwan Ko, Jae Su Yu
To enhance external luminous efficiency and reduce power consumption in rigid top-emitting organic light-emitting diode (OLED) panels for mobile applications, a film color filter was introduced as a promising alternative to conventional polarizers. The film color filter exhibited higher transmittance in the red, green, and blue emission wavelength regions of OLEDs compared to the polarizer, thereby improving external luminous efficiency. However, its application also increases reflectance due to external light, which necessitates optimization strategies to mitigate this drawback. To address this issue, the internal reflection within the OLED panel was reduced by optimizing the capping layer (CPL) thickness from 60 to 40 nm. Additionally, a refractive index matching layer was implemented between the encapsulation glass and the CPL, resulting in a 24.5% reduction in the specular component included (SCI) reflectance and a decrease in the absolute value of the specular component excluded (SCE) reflection color coordinate. White efficiency typically decreases with the reduction of the CPL thickness; however, Device B exhibited improvements of 13.7%, 16.8%, and 12.4% in white efficiency compared to the polarizer at CPL thicknesses of 40, 50, and 60 nm, respectively. This enhancement was particularly pronounced in the blue emission region, where the luminous efficiency is inherently lower. These findings indicate that optimizing the CPL thickness to 40 nm in conjunction with Device B effectively reduces SCI reflectance, improves the SCE reflection color coordinate, and enhances white efficiency. This study demonstrates that replacing the conventional polarizer with a film color filter is a viable approach to achieving higher luminous efficiency in rigid top-emitting OLED panels for mobile devices.
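A small back-of-the-envelope illustration of why index matching lowers internal reflection (normal-incidence Fresnel reflectance, ignoring thin-film interference); the index values are assumptions, not the measured values of the panel stack.

```python
# Back-of-the-envelope check: normal-incidence Fresnel reflectance at an interface,
# ignoring thin-film interference. The refractive indices below are assumed values,
# not the measured indices of the encapsulation glass, matching layer, or CPL.
def fresnel_reflectance(n1: float, n2: float) -> float:
    """Power reflectance of an n1 -> n2 interface at normal incidence."""
    return ((n1 - n2) / (n1 + n2)) ** 2

glass, cpl = 1.5, 1.9                                  # hypothetical indices
direct = fresnel_reflectance(glass, cpl)               # single glass/CPL step
match = (glass * cpl) ** 0.5                           # index-matched intermediate layer
stepped = fresnel_reflectance(glass, match) + fresnel_reflectance(match, cpl)
print(f"direct: {direct:.4f}, with matching layer: {stepped:.4f}")  # stepped < direct
```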
{"title":"Enhanced white efficiency using film color filter via internal reflectance control by capping and refractive index matching layers for rigid OLED panels","authors":"Horyun Chung ,&nbsp;Eunjae Na ,&nbsp;Myunghwan Kim ,&nbsp;Sungguk An ,&nbsp;Yeong Hwan Ko ,&nbsp;Jae Su Yu","doi":"10.1016/j.displa.2026.103345","DOIUrl":"10.1016/j.displa.2026.103345","url":null,"abstract":"<div><div>To enhance external luminous efficiency and reduce power consumption in rigid top-emitting organic light-emitting diode (OLED) panels for mobile applications, a film color filter was introduced as a promising alternative for conventional polarizers. The film color filter exhibited higher transmittance in the red, green, and blue emission wavelength regions of OLEDs compared to the polarizer, thereby improving external luminous efficiency. However, its application also increases reflectance due to external light, which necessitates optimization strategies to mitigate this drawback. To address this issue, the internal reflection within the OLED panel was reduced by optimizing the capping layer (CPL) thickness from 60 to 40 nm. Additionally, a refractive index matching layer was implemented between the encapsulation glass and the CPL, resulting in a 24.5% reduction in the specular component included (SCI) reflectance and a decrease in the absolute value of the specular component excluded (SCE) reflection color coordinate. White efficiency typically decreases with the reduction of the CPL thickness; however, the Device B exhibited improvements of 13.7%, 16.8%, and 12.4% in white efficiency compared to the polarizer at the CPL thicknesses of 40, 50, and 60 nm, respectively. This enhancement was particularly pronounced in the blue emission region, where the luminous efficiency is inherently lower. These findings indicate that optimizing the CPL thickness to 40 nm in conjunction with the Device B effectively reduces SCI reflectance, improves SCE reflection color coordinate, and enhances white efficiency. This study demonstrates that replacing the conventional polarizer with a film color filter is a viable approach to achieving higher luminous efficiency in rigid top-emitting OLED panels for mobile devices.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103345"},"PeriodicalIF":3.4,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automated prompt-guided multi-modality cell segmentation with shape-aware classification and boundary-aware SAM adaptation
IF 3.4 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-07 | DOI: 10.1016/j.displa.2025.103337
Deboch Eyob Abera, Jiaye He, Jia Liu, Nazar Zaki, Wenjian Qin
Robust and accurate cell segmentation across diverse imaging modalities remains a critical challenge in microscopy image analysis. While foundation models like the Segment Anything Model (SAM) have demonstrated exceptional performance in natural image segmentation, their adaptation to multi-modal cellular analysis is hindered by domain-specific knowledge gaps and morphological complexity. To bridge this gap, we present a novel SAM-driven framework featuring three systematic innovations: First, we propose Shape-Aware Classification to enhance segmentation of cells with diverse morphologies. Second, Auto Point Prompt Generation (APPGen) module guides the segmentation model with automatically generated point cues to improve segmentation accuracy. Third, we implement Boundary-Aware SAM Adaptation to effectively resolve overlapping cells in microscopy images. Our experiments show that the proposed framework reduces manual effort through automated prompts, adapts well to different imaging modalities, and enhances segmentation accuracy by incorporating boundary-aware techniques. The source code is available at https://github.com/MIXAILAB/Multi_Modality_CellSeg.
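A minimal sketch of how automatically generated point cues can drive SAM, assuming some upstream detector produces a cell-center heatmap; the peak-picking rule, checkpoint path, and per-point prediction loop are illustrative and not the paper's APPGen module.

```python
# Minimal sketch of prompt-guided segmentation with SAM using auto-generated point cues.
# Assumptions: a cell-center heatmap is available from an upstream detector, and the
# checkpoint path "sam_vit_b.pth" is hypothetical.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def heatmap_to_points(heatmap: np.ndarray, thresh: float = 0.5, max_points: int = 50):
    """Pick the strongest heatmap responses as (x, y) foreground point prompts."""
    ys, xs = np.where(heatmap > thresh)
    order = np.argsort(-heatmap[ys, xs])[:max_points]
    return np.stack([xs[order], ys[order]], axis=1)     # SAM expects (x, y) order

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

def segment_cells(image_rgb: np.ndarray, heatmap: np.ndarray):
    """One mask per point prompt, so each detected cell gets its own instance mask."""
    predictor.set_image(image_rgb)                      # H x W x 3 uint8 RGB image
    masks = []
    for pt in heatmap_to_points(heatmap):
        m, _, _ = predictor.predict(point_coords=pt[None, :].astype(float),
                                    point_labels=np.array([1]),
                                    multimask_output=False)
        masks.append(m[0])                              # (H, W) boolean mask
    return masks
```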
{"title":"Automated prompt-guided multi-modality cell segmentation with shape-aware classification and boundary-aware SAM adaptation","authors":"Deboch Eyob Abera ,&nbsp;Jiaye He ,&nbsp;Jia Liu ,&nbsp;Nazar Zaki ,&nbsp;Wenjian Qin","doi":"10.1016/j.displa.2025.103337","DOIUrl":"10.1016/j.displa.2025.103337","url":null,"abstract":"<div><div>Robust and accurate cell segmentation across diverse imaging modalities remains a critical challenge in microscopy image analysis. While foundation models like the Segment Anything Model (SAM) have demonstrated exceptional performance in natural image segmentation, their adaptation to multi-modal cellular analysis is hindered by domain-specific knowledge gaps and morphological complexity. To bridge this gap, we present a novel SAM-driven framework featuring three systematic innovations: First, we propose Shape-Aware Classification to enhance segmentation of cells with diverse morphologies. Second, Auto Point Prompt Generation (APPGen) module guides the segmentation model with automatically generated point cues to improve segmentation accuracy. Third, we implement Boundary-Aware SAM Adaptation to effectively resolve overlapping cells in microscopy images. Our experiments show that the proposed framework reduces manual effort through automated prompts, adapts well to different imaging modalities, and enhances segmentation accuracy by incorporating boundary-aware techniques. The source code is available at <span><span>https://github.com/MIXAILAB/Multi_Modality_CellSeg</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103337"},"PeriodicalIF":3.4,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145938457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An optimized convolutional neural network based on multi-strategy grey wolf optimizer to identify crop diseases and pests
IF 3.4 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-06 | DOI: 10.1016/j.displa.2026.103341
Xiaobing Yu, Hongqian Zhang, Yuchen Duan, Xuming Wang
Agriculture plays a crucial role in national food security, with crop diseases and pests being major threats to agricultural sustainability. Traditional detection methods are labor-intensive, subjective, and often inaccurate. Recent advancements in deep learning have significantly improved image-based recognition; however, the performance of convolutional neural networks (CNNs) is highly dependent on hyperparameter tuning, which remains a challenging task. To address this issue, this study proposes a multi-strategy grey wolf optimizer (MGWO) to enhance CNN hyperparameter optimization. MGWO improves the global search efficiency of the conventional grey wolf optimizer (GWO), enabling automatic selection of optimal hyperparameters. The proposed approach is evaluated on corn disease and Pentatomidae stinkbug pest classification, comparing its performance against a baseline CNN model and six other optimization algorithms. Experimental results show that MGWO achieves 95.71% accuracy on the corn disease dataset and 94.46% on the pest dataset, outperforming all competing methods.
These findings demonstrate the potential of MGWO in optimizing deep learning models for agricultural applications, providing a robust and automated solution for crop disease and pest recognition.
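For context, a compact sketch of the plain grey wolf optimizer applied to a black-box hyperparameter objective is shown below; the bounds, the toy objective standing in for "train and validate a CNN", and the absence of MGWO's multi-strategy extensions are all assumptions.

```python
# Minimal sketch of grey wolf optimization (GWO) over CNN hyperparameters.
# Assumptions: the objective is a black box returning validation error; here a toy
# quadratic stands in for actually training a network.
import numpy as np

def gwo(objective, bounds, n_wolves=8, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    wolves = rng.uniform(lo, hi, size=(n_wolves, len(bounds)))
    fitness = np.array([objective(w) for w in wolves])

    for t in range(n_iter):
        a = 2 - 2 * t / n_iter                      # coefficient decays linearly 2 -> 0
        order = np.argsort(fitness)
        alpha, beta, delta = wolves[order[:3]]      # the three best wolves lead the pack
        for i in range(n_wolves):
            new = np.zeros(len(bounds))
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(len(bounds)), rng.random(len(bounds))
                A, C = 2 * a * r1 - a, 2 * r2
                new += leader - A * np.abs(C * leader - wolves[i])
            wolves[i] = np.clip(new / 3, lo, hi)    # average of the three pulls
            fitness[i] = objective(wolves[i])
    best = np.argmin(fitness)
    return wolves[best], fitness[best]

# Toy stand-in for "train a CNN and return validation error", with
# hyperparameters [log10(learning rate), dropout rate].
val_error = lambda h: (h[0] + 3) ** 2 + (h[1] - 0.3) ** 2
print(gwo(val_error, bounds=[(-5, -1), (0.0, 0.9)]))
```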
{"title":"An optimized convolutional neural network based on multi-strategy grey wolf optimizer to identify crop diseases and pests","authors":"Xiaobing Yu ,&nbsp;Hongqian Zhang ,&nbsp;Yuchen Duan ,&nbsp;Xuming Wang","doi":"10.1016/j.displa.2026.103341","DOIUrl":"10.1016/j.displa.2026.103341","url":null,"abstract":"<div><div>Agriculture plays a crucial role in national food security, with crop diseases and pests being major threats to agricultural sustainability. Traditional detection methods are labor-intensive, subjective, and often inaccurate. Recent advancements in deep learning have significantly improved image-based recognition; however, the performance of convolutional neural networks (CNNs) is highly dependent on hyperparameter tuning, which remains a challenging <span><span>task. To</span><svg><path></path></svg></span> address this issue, this study proposes a multi-strategy grey wolf optimizer (MGWO) to enhance CNN hyperparameter optimization. MGWO improves the global search efficiency of the conventional grey wolf optimizer (GWO), enabling automatic selection of optimal hyperparameters. The proposed approach is evaluated on corn disease and Pentatomidae stinkbug pest classification, comparing its performance against a baseline CNN model and six other optimization algorithms. Experimental results show that MGWO achieves 95.71% accuracy on the corn disease dataset and 94.46% on the pest dataset, outperforming all competing methods.</div><div>These findings demonstrate the potential of MGWO in optimizing deep learning models for agricultural applications, providing a robust and automated solution for crop disease and pest recognition.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103341"},"PeriodicalIF":3.4,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145938459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Parameter-efficient fine-tuning for no-reference image quality assessment: Empirical studies on vision transformer
IF 3.4 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-05 | DOI: 10.1016/j.displa.2026.103339
GuangLu Sun, Kaiwei Lei, Tianlin Li, Linsen Yu, Suxia Zhu
Parameter-Efficient Fine-Tuning (PEFT) is a transfer learning technique designed to adapt pre-trained models to downstream tasks while minimizing parameter and computational complexity. In recent years, No-Reference Image Quality Assessment (NR-IQA) methods based on pre-trained visual models have achieved significant progress. However, most of these methods rely on full fine-tuning, which requires substantial computational and memory resources. A natural question arises: can PEFT techniques achieve parameter-efficient NR-IQA with good performance? To explore this, we perform empirical studies using several PEFT methods on pre-trained Vision Transformer (ViT) model. Specifically, we select three PEFT approaches – adapter tuning, prompt tuning, and partial tuning – that have proven effective in general vision tasks, and investigate whether they can achieve performance comparable to traditional visual NR-IQA models. Among them, which is the most effective? Furthermore, we examine the impact of four key factors on the results: fine-tuning position, parameter configuration, layer selection strategy, and the scale of pre-trained weights. Finally, we evaluate whether the optimal PEFT strategy on ViT can be generalized to other Transformer-based architectures. This work offers valuable insights and practical guidance for future research on PEFT methods in NR-IQA tasks.
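As one concrete instance of the adapter-tuning approach examined here, the sketch below freezes a ViT backbone and trains only small bottleneck adapters plus a quality-regression head; the adapter placement, reduction ratio, and timm-style block layout are assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of adapter tuning for a ViT-based NR-IQA head: the backbone is
# frozen and only small residual bottleneck adapters plus a regression head train.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter: dim -> dim/r -> dim, initialized near identity."""
    def __init__(self, dim: int, reduction: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, dim // reduction)
        self.up = nn.Linear(dim // reduction, dim)
        nn.init.zeros_(self.up.weight)   # start as an (almost) identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

def add_adapters_and_freeze(vit: nn.Module, dim: int = 768) -> list:
    """Freeze all pretrained weights; wrap each block's MLP with an adapter."""
    for p in vit.parameters():
        p.requires_grad = False
    trainable = []
    for blk in vit.blocks:               # assumes a timm-style ViT exposing .blocks
        adapter = Adapter(dim)
        blk.mlp = nn.Sequential(blk.mlp, adapter)
        trainable += list(adapter.parameters())
    return trainable

quality_head = nn.Linear(768, 1)         # maps the CLS token to a single MOS prediction

# Example usage (assuming timm is installed):
# import timm
# vit = timm.create_model("vit_base_patch16_224", pretrained=True)
# params = add_adapters_and_freeze(vit) + list(quality_head.parameters())
# optimizer = torch.optim.AdamW(params, lr=1e-4)
```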
{"title":"Parameter-efficient fine-tuning for no-reference image quality assessment: Empirical studies on vision transformer","authors":"GuangLu Sun,&nbsp;Kaiwei Lei,&nbsp;Tianlin Li,&nbsp;Linsen Yu,&nbsp;Suxia Zhu","doi":"10.1016/j.displa.2026.103339","DOIUrl":"10.1016/j.displa.2026.103339","url":null,"abstract":"<div><div>Parameter-Efficient Fine-Tuning (PEFT) is a transfer learning technique designed to adapt pre-trained models to downstream tasks while minimizing parameter and computational complexity. In recent years, No-Reference Image Quality Assessment (NR-IQA) methods based on pre-trained visual models have achieved significant progress. However, most of these methods rely on full fine-tuning, which requires substantial computational and memory resources. A natural question arises: can PEFT techniques achieve parameter-efficient NR-IQA with good performance? To explore this, we perform empirical studies using several PEFT methods on pre-trained Vision Transformer (ViT) model. Specifically, we select three PEFT approaches – adapter tuning, prompt tuning, and partial tuning – that have proven effective in general vision tasks, and investigate whether they can achieve performance comparable to traditional visual NR-IQA models. Among them, which is the most effective? Furthermore, we examine the impact of four key factors on the results: fine-tuning position, parameter configuration, layer selection strategy, and the scale of pre-trained weights. Finally, we evaluate whether the optimal PEFT strategy on ViT can be generalized to other Transformer-based architectures. This work offers valuable insights and practical guidance for future research on PEFT methods in NR-IQA tasks.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103339"},"PeriodicalIF":3.4,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145938456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AFFLIE: Adaptive feature fusion for low-light image enhancement
IF 3.4 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-03 | DOI: 10.1016/j.displa.2026.103340
Yaxin Lin, Xiaopeng Li, Lian Zou, Liqing Zhou, Cien Fan
Under low illumination, RGB cameras often capture images with significant noise and low visibility, while event cameras, with their high dynamic range, emerge as a promising solution for improving image quality by supplementing image details in low-light conditions. In this paper, we propose a novel image enhancement framework called AFFLIE, which integrates event- and frame-based techniques to improve image quality in low-light conditions. The framework introduces a Multi-scale Spatial-Channel Transformer Encoder (MS-SCTE) to address low-light image noise and the temporal characteristics of events. Additionally, an Adaptive Feature Fusion Module (AFFM) is proposed to dynamically aggregate features from both image and event streams, enhancing generalization performance. The framework demonstrates superior performance on the SDE, LIE and RELED datasets by enhancing noise reduction and detail preservation.
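A minimal sketch of what adaptive event-frame fusion can look like is given below, assuming both streams are already encoded into feature maps of matching shape; the channel-gating design is illustrative and not the paper's AFFM.

```python
# Minimal sketch of adaptive fusion of frame and event features via a learned
# per-channel gate. Assumption: both streams are already encoded to feature maps
# of identical shape; this is not the paper's AFFM architecture.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # global context per channel
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, frame_feat, event_feat):
        w = self.gate(torch.cat([frame_feat, event_feat], dim=1))
        return w * frame_feat + (1 - w) * event_feat   # per-channel soft selection

fusion = AdaptiveFusion(64)
f = torch.randn(1, 64, 32, 32)
e = torch.randn(1, 64, 32, 32)
print(fusion(f, e).shape)    # torch.Size([1, 64, 32, 32])
```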
{"title":"AFFLIE: Adaptive feature fusion for low-light image enhancement","authors":"Yaxin Lin ,&nbsp;Xiaopeng Li ,&nbsp;Lian Zou ,&nbsp;Liqing Zhou ,&nbsp;Cien Fan","doi":"10.1016/j.displa.2026.103340","DOIUrl":"10.1016/j.displa.2026.103340","url":null,"abstract":"<div><div>Under low illumination, RGB cameras often capture images with significant noise and low visibility, while event cameras, with their high dynamic range characteristic, emerge as a promising solution for improving image quality in the low-light environment by supplementing image details in low-light condition. In this paper, we propose a novel image enhancement framework called AFFLIE, which integrates event and frame-based techniques to improve image quality in low-light conditions. The framework introduces a Multi-scale Spatial-Channel Transformer Encoder (MS-SCTE) to address low-light image noise and event temporal characteristics. Additionally, an Adaptive Feature Fusion Module (AFFM) is proposed to dynamically aggregate features from both image and event streams, enhancing generalization performance. The framework demonstrates superior performance on the SDE, LIE and RELED datasets by enhancing noise reduction and detail preservation.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103340"},"PeriodicalIF":3.4,"publicationDate":"2026-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145938455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Class extension logits distillation for few-shot object detection
IF 3.4 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-02 | DOI: 10.1016/j.displa.2026.103338
Taijin Zhao, Heqian Qiu, Lanxiao Wang, Yu Dai, Qingbo Wu, Hongliang Li
Few-Shot Object Detection (FSOD) aims at learning robust detectors under extreme data imbalance between abundant base classes and scarce novel classes. While recent transfer learning paradigms achieve initial success through sequential base class pre-training and novel class fine-tuning, their fundamental assumption that base class trained feature encoder can generalize to novel class instances reveals critical limitations due to the information suppression of novel classes. Knowledge distillation from vision-language models like CLIP presents promising solutions, yet conventional distillation approaches exhibit inherent flaws from the perspective of Information Bottleneck (IB) principle: CLIP’s broad semantic understanding results in low information compression, and feature distillation can struggle to reconcile with FSOD’s high information compression demand, potentially leading to suboptimal information compression of the detector. Conversely, while logits distillation using only base classes can enhance information compression, it fails to preserve and transfer crucial novel class semantics from CLIP. To address these challenges, we propose a unified framework comprising Class Extension Logits Distillation (CELD) and Virtual Knowledge Parameter Initializer (VKPInit). During base training, CELD uses CLIP’s text encoder to create an expanded base-novel classifier. This acts as an IB, providing target distributions from CLIP’s visual features for both base and unseen novel classes. The detector aligns to these distributions using its base classifier and a virtual novel classifier, allowing it to learn compressed, novel-aware knowledge from CLIP. Subsequently, during novel tuning, VKPInit leverages the virtual novel classifier learned in CELD to provide semantically-informed initializations for the novel class heads, mitigating initialization bias and enhancing resistance to overfitting. Extensive experiments on PASCAL VOC and MS COCO demonstrate the robustness and superiority of our proposed method over multiple baselines.
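The following sketch illustrates the general recipe of distilling an extended (base plus novel) classifier from CLIP text embeddings into detector logits via a temperature-scaled KL loss; the class names, prompts, temperature, and use of OpenAI's clip package are assumptions, not CELD's exact formulation.

```python
# Minimal sketch of logits distillation over an extended class space built from
# CLIP text embeddings. Assumptions: class names and prompts are hypothetical,
# region features come from CLIP's image encoder, and the detector already exposes
# logits over the same (base + virtual novel) label space.
import torch
import torch.nn.functional as F
import clip

model, _ = clip.load("ViT-B/32", device="cpu")
base_classes = ["person", "car", "dog"]            # hypothetical base classes
novel_classes = ["scooter", "parrot"]              # hypothetical novel classes
prompts = [f"a photo of a {c}" for c in base_classes + novel_classes]

with torch.no_grad():
    text_emb = model.encode_text(clip.tokenize(prompts)).float()
    text_emb = F.normalize(text_emb, dim=-1)       # extended classifier weights

def distill_loss(region_clip_feat, detector_logits, tau=2.0):
    """KL between CLIP's distribution and the detector's over the extended classes."""
    region_clip_feat = F.normalize(region_clip_feat, dim=-1)
    teacher = (region_clip_feat @ text_emb.t()) / tau
    return F.kl_div(F.log_softmax(detector_logits / tau, dim=-1),
                    F.softmax(teacher, dim=-1), reduction="batchmean") * tau * tau
```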
{"title":"Class extension logits distillation for few-shot object detection","authors":"Taijin Zhao,&nbsp;Heqian Qiu,&nbsp;Lanxiao Wang,&nbsp;Yu Dai,&nbsp;Qingbo Wu,&nbsp;Hongliang Li","doi":"10.1016/j.displa.2026.103338","DOIUrl":"10.1016/j.displa.2026.103338","url":null,"abstract":"<div><div>Few-Shot Object Detection (FSOD) aims at learning robust detectors under extreme data imbalance between abundant base classes and scarce novel classes. While recent transfer learning paradigms achieve initial success through sequential base class pre-training and novel class fine-tuning, their fundamental assumption that base class trained feature encoder can generalize to novel class instances reveals critical limitations due to the information suppression of novel classes. Knowledge distillation from vision-language models like CLIP presents promising solutions, yet conventional distillation approaches exhibit inherent flaws from the perspective of Information Bottleneck (IB) principle: CLIP’s broad semantic understanding results in low information compression, and feature distillation can struggle to reconcile with FSOD’s high information compression demand, potentially leading to suboptimal information compression of the detector. Conversely, while logits distillation using only base classes can enhance information compression, it fails to preserve and transfer crucial novel class semantics from CLIP. To address these challenges, we propose a unified framework comprising Class Extension Logits Distillation (CELD) and Virtual Knowledge Parameter Initializer (VKPInit). During base training, CELD uses CLIP’s text encoder to create an expanded base-novel classifier. This acts as an IB, providing target distributions from CLIP’s visual features for both base and unseen novel classes. The detector aligns to these distributions using its base classifier and a virtual novel classifier, allowing it to learn compressed, novel-aware knowledge from CLIP. Subsequently, during novel tuning, VKPInit leverages the virtual novel classifier learned in CELD to provide semantically-informed initializations for the novel class heads, mitigating initialization bias and enhancing resistance to overfitting. Extensive experiments on PASCAL VOC and MS COCO demonstrate the robustness and superiority of our proposed method over multiple baselines.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"92 ","pages":"Article 103338"},"PeriodicalIF":3.4,"publicationDate":"2026-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0