
Journal of Visual Communication and Image Representation: Latest Publications

Multi-scale Spatial Frequency Interaction Variance Perception Model for Deepfake Face Detection
IF 3.1 | CAS Q4 (Computer Science) | Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS) | Pub Date: 2026-03-01 | Epub Date: 2026-01-14 | DOI: 10.1016/j.jvcir.2026.104719
Yihang Wang, Shouxin Liu, Xudong Chen, Seok Tae Kim, Xiaowei Li
The negative effects of deepfake technology have attracted increasing attention and become a prominent social issue. Existing detection approaches typically refine conventional network architectures to uncover subtle manipulation traces, yet most focus exclusively on either spatial- or frequency-domain cues, overlooking their interaction. To address the limitations in existing deepfake detection methods, we present an innovative Multi-Scale Spatial-Frequency Variance-sensing (MSFV) model. This model effectively combines spatial and frequency information by utilizing iterative, variance-guided self-attention mechanisms. By integrating these two domains, the MSFV model enhances detection capabilities and improves the identification of subtle manipulations present in deepfake images. A dedicated high-frequency separation module further enhances the extraction of forgery indicators from the high-frequency components of manipulated images. Extensive experiments demonstrate that MSFV achieves classification accuracies of 98.95 % on the DFDC dataset and 97.92 % on the FaceForensics++ dataset, confirming its strong detection capability, generalization, and robustness compared with existing methods.
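To make the high-frequency separation idea above concrete, here is a minimal PyTorch sketch of an FFT-based block that strips the low-frequency content of a feature map and re-injects the high-frequency residual as extra forgery cues. It is an illustrative assumption, not the authors' MSFV module; the class name, the cutoff, and the embedding convolution are all hypothetical.

```python
# Minimal sketch (assumption): an FFT-based high-frequency separation block of the
# kind the abstract describes, not the authors' exact MSFV module.
import torch
import torch.nn as nn


class HighFreqSeparation(nn.Module):
    """Split a feature map into low/high-frequency parts with an FFT mask,
    then re-embed the high-frequency residual as extra forgery cues."""

    def __init__(self, channels: int, cutoff: float = 0.25):
        super().__init__()
        self.cutoff = cutoff                       # fraction of the spectrum treated as "low"
        self.embed = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        spec = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
        # Circular low-pass mask centred on the zero frequency.
        yy, xx = torch.meshgrid(
            torch.linspace(-1, 1, h, device=x.device),
            torch.linspace(-1, 1, w, device=x.device),
            indexing="ij",
        )
        low_mask = ((yy ** 2 + xx ** 2).sqrt() <= self.cutoff).float()
        high_spec = spec * (1.0 - low_mask)        # keep only high frequencies
        high = torch.fft.ifft2(torch.fft.ifftshift(high_spec, dim=(-2, -1)), norm="ortho").real
        return x + self.embed(high)                # inject high-frequency cues back


if __name__ == "__main__":
    feats = torch.randn(2, 16, 64, 64)
    print(HighFreqSeparation(16)(feats).shape)     # torch.Size([2, 16, 64, 64])
```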
Citations: 0
TSNUNet: Two-Stage Nested U-Network for salient object detection
IF 3.1 | CAS Q4 (Computer Science) | Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS) | Pub Date: 2026-03-01 | Epub Date: 2026-01-30 | DOI: 10.1016/j.jvcir.2026.104737
Luna Sun, Zhenxue Chen, Xinming Zhu, Yu Bi, Chengyun Liu, Q.M. Jonathan Wu
Recently, while significant progress has been made in salient object detection, particularly with the advent of transformers, existing models still face challenges regarding the integrity and accuracy of predictions. To address these limitations, we propose the Two-Stage Nested U-Network (TSNUNet), which incorporates three innovative modules. First, the Pixel Shuffle Channel Convert Module (PSCCM) captures the potential cross-channel information distribution in high-level features, aligns multi-level features, and ensures accurate and complete initial predictions. Second, the Two-Stage Strategy, with its hybrid connections and a Two-Stage Fusion Module (TSFM), facilitates interactive learning across stages and directs precise location cues, further boosting prediction accuracy. Third, we design the Nested U/Trans-U Module for robust cross-level feature decoding. The Nested U/Trans-U Module employs continuous pixel unshuffle down-sampling, hierarchical adaptive top-layer enhancement, and multi-level pixel shuffle feature reconstruction, specifically contributing to improved feature representation and accuracy. Finally, through our Two-Stage combined supervision mechanism, TSNUNet is capable of effectively segmenting both complete and accurate salient objects. Experiments on 7 SOD and 4 cross-domain datasets show that TSNUNet outperforms state-of-the-art methods with strong generalization capability. On an RTX 4090 GPU, our SwinB-based model attains approximately 96 FPS in ideal forward inference and 61.49 FPS in practical end-to-end testing, demonstrating its real-time capability. Code: https://github.com/LnSCV/TSNUNet.
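The "pixel unshuffle down-sampling" and "pixel shuffle feature reconstruction" mentioned above refer to standard lossless rearrangement operators; the sketch below shows how such a down/up pair can be wired in PyTorch. It is a generic illustration, not the PSCCM or Nested U/Trans-U design itself, and the module name is hypothetical.

```python
# Minimal sketch (assumption): pixel-unshuffle down-sampling and pixel-shuffle
# reconstruction of the kind the abstract mentions; module names are hypothetical.
import torch
import torch.nn as nn


class PixelShuffleDownUp(nn.Module):
    """Down-sample losslessly with PixelUnshuffle, mix channels, then restore
    the resolution with PixelShuffle."""

    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        self.down = nn.PixelUnshuffle(scale)                       # (C, H, W) -> (C*s^2, H/s, W/s)
        self.mix = nn.Conv2d(channels * scale ** 2, channels * scale ** 2, 3, padding=1)
        self.up = nn.PixelShuffle(scale)                           # inverse rearrangement

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.mix(self.down(x)))


if __name__ == "__main__":
    x = torch.randn(1, 8, 64, 64)
    print(PixelShuffleDownUp(8)(x).shape)                          # torch.Size([1, 8, 64, 64])
```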
Citations: 0
Lightweight whole-body mesh recovery with joints and depth aware hand detail optimization
IF 3.1 | CAS Q4 (Computer Science) | Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS) | Pub Date: 2026-03-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.jvcir.2026.104729
Zilong Yang, Shujun Zhang, Xiao Wang, Hu Jin, Limin Sun
Expressive whole-body mesh recovery aims to estimate 3D human pose and shape parameters, including the face and hands, from a monocular image. Since hand details play a crucial role in conveying human posture, accurate hand reconstruction is of great importance for applications in 3D human modeling. However, precise recovery of hands is highly challenging due to their relatively small spatial proportion, high flexibility, diverse gestures, and frequent occlusions. In this work, we propose a lightweight whole-body mesh recovery framework that enhances hand detail reconstruction while reducing computational complexity. Specifically, we introduce a Joints and Depth Aware Fusion (JDAF) module that adaptively encodes geometric joints and depth cues from local hand regions. This module provides strong 3D priors and effectively guides the regression of accurate hand parameters. In addition, we propose an Adaptive Dual-branch Pooling Attention (ADPA) module that models global context and local fine-grained interactions in a lightweight manner. Compared with the traditional self-attention mechanism, this module significantly reduces the computational burden. Experiments on the EHF and UBody benchmarks demonstrate that our approach outperforms SOTA methods, reducing body MPVPE by 8.5% and hand PA-MPVPE by 6.2%, while significantly lowering the number of parameters and MACs. More importantly, its efficiency and lightweight design make it particularly suitable for real-time visual communication scenarios such as immersive conferencing, sign language translation, and VR/AR interaction.
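As a rough illustration of what a lightweight dual-branch pooling attention can look like, the following PyTorch sketch gates channels with fused average- and max-pooled descriptors instead of full self-attention. The module name and reduction ratio are assumptions; the paper's ADPA design may differ.

```python
# Minimal sketch (assumption): a lightweight dual-branch pooling attention in the
# spirit of the ADPA module described above; the exact design is the authors'.
import torch
import torch.nn as nn


class DualBranchPoolingAttention(nn.Module):
    """Fuse global-average and global-max pooled descriptors into per-channel
    gates, avoiding the quadratic cost of full self-attention."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = x.mean(dim=(2, 3))                      # global context branch
        mx = x.amax(dim=(2, 3))                       # salient-detail branch
        gate = torch.sigmoid(self.mlp(avg) + self.mlp(mx)).view(b, c, 1, 1)
        return x * gate                               # reweight channels


if __name__ == "__main__":
    x = torch.randn(2, 32, 56, 56)
    print(DualBranchPoolingAttention(32)(x).shape)    # torch.Size([2, 32, 56, 56])
```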
Citations: 0
SFNet: Hierarchical perception and adaptive test-time training for AI-generated military image detection
IF 3.1 | CAS Q4 (Computer Science) | Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS) | Pub Date: 2026-03-01 | Epub Date: 2026-01-27 | DOI: 10.1016/j.jvcir.2026.104733
Minyang Li, Wenpeng Mu, Yifan Yuan, Shengyan Li, Qiang Xu
Existing general-purpose forgery detection techniques fall short in military scenarios because they lack military-specific priors about how real assets are designed, manufactured, and deployed. Authentic military platforms obey strict engineering and design standards, resulting in highly regular structural layouts and characteristic material textures, whereas AI-generated forgeries often exhibit subtle violations of these constraints. To address this critical gap, we introduce SentinelFakeNet (SFNet), a novel framework specifically designed for detecting AI-generated military images. SFNet features the Military Hierarchical Perception (MHP) Module, which extracts military-relevant hierarchical representations via Cross-Level Feature Fusion (CLFF) — a mechanism that intricately combines features from varying depths of the backbone. Furthermore, to ensure robustness and adaptability to diverse generative models, we propose the Military Adaptive Test-Time Training (MATTT) strategy, which incorporates Local Consistency Verification (LCV) and Multi-Scale Signature Analysis (MSSA) as specially designed tasks. To facilitate research in this domain, we also introduce MilForgery, the first large-scale military image forensic dataset comprising 800,000 authentic and synthetically generated military-related images. Extensive experiments demonstrate that our method achieves 95.80% average accuracy, representing state-of-the-art performance. Moreover, it exhibits superior generalization capabilities on public AIGC detection benchmarks, outperforming the leading baselines by +8.47% and +6.49% on GenImage and ForenSynths in average accuracy, respectively. Our code will be available on the author’s homepage.
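Test-time training in general adapts the detector on each incoming sample with a self-supervised objective before making a prediction. The sketch below shows a generic consistency-based adaptation loop for intuition only; it does not implement the paper's MATTT strategy or its LCV/MSSA tasks, and the flip-consistency loss, step count, and learning rate are assumptions.

```python
# Generic sketch (assumption): a consistency-based test-time adaptation loop,
# illustrating the idea of adapting at inference; it is NOT the paper's
# MATTT/LCV/MSSA design, whose task definitions are specific to the work above.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def test_time_adapt(model: nn.Module, image: torch.Tensor, steps: int = 3, lr: float = 1e-4):
    """Adapt a copy of the detector on one test image by enforcing consistent
    predictions between the image and a horizontally flipped view."""
    adapted = copy.deepcopy(model)
    adapted.train()
    optimizer = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        p1 = F.log_softmax(adapted(image), dim=1)
        p2 = F.softmax(adapted(torch.flip(image, dims=[-1])), dim=1)
        loss = F.kl_div(p1, p2, reduction="batchmean")   # consistency surrogate loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    adapted.eval()
    with torch.no_grad():
        return adapted(image).softmax(dim=1)             # adapted real/fake probabilities


if __name__ == "__main__":
    toy = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                        nn.Flatten(), nn.Linear(8, 2))
    print(test_time_adapt(toy, torch.randn(1, 3, 224, 224)))
```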
Citations: 0
Fine-grained aesthetic multi-attribute captioning with aligned vision-language representations
IF 3.1 | CAS Q4 (Computer Science) | Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS) | Pub Date: 2026-03-01 | Epub Date: 2026-02-04 | DOI: 10.1016/j.jvcir.2026.104732
Hongtao Yang, Yehui Liu, Minzheng Jia, Lu Han, Yongqiang Kong, Xin Jin, Ping Shi
Image aesthetic multi-attribute captioning emphasizes fine-grained aesthetic attributes, capturing intricate aesthetic characteristics from diverse perspectives and reflecting a more nuanced and profound understanding of aesthetics, encompassing a wide spectrum of aesthetic semantics. Despite its potential, aesthetic multi-attribute captioning remains underexplored. This paper introduces a novel image aesthetic multi-attribute captioning method grounded in vision-language pre-training, aimed at addressing the inadequacy in aesthetic information expression by generating fine-grained attribute-aware aesthetic descriptions to enrich semantic depth and interpretability. Adopting a “pre-training and fine-tuning” paradigm, the proposed framework leverages CLIP and GPT-2 architectures, aligning CLIP-derived visual features with the GPT-2 embedding space via a cross-modal mapping network. The incorporation of aesthetic attribute control flags enables precise regulation of the generated aesthetic multi-attribute captions. Experimental results demonstrate that our method surpasses mainstream approaches across several metrics on the DPC-MAC and PCCD datasets, including BLEU, METEOR, and SPICE. Furthermore, ablation studies on multi-stage aesthetic pre-training substantiate the effectiveness of the proposed strategy. The model consistently produces aesthetically coherent and attribute-aligned captions, underscoring its potential for advanced aesthetic analysis.
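A cross-modal mapping network of the kind described above typically projects a CLIP image embedding into a short sequence of prefix vectors in the GPT-2 embedding space. The following PyTorch sketch shows one such mapping under assumed dimensions (512-d CLIP features, 768-d GPT-2 base embeddings, 10 prefix tokens); the paper's actual network may differ.

```python
# Minimal sketch (assumption): an MLP mapping network that projects a CLIP image
# embedding into a short sequence of GPT-2-sized prefix embeddings, in the spirit
# of the cross-modal mapping described above (dimensions are illustrative).
import torch
import torch.nn as nn


class ClipToGPT2Prefix(nn.Module):
    """Map one CLIP feature (e.g. 512-d) to `prefix_len` pseudo-tokens in the
    GPT-2 embedding space (768-d for GPT-2 base)."""

    def __init__(self, clip_dim: int = 512, gpt_dim: int = 768, prefix_len: int = 10):
        super().__init__()
        self.prefix_len = prefix_len
        self.gpt_dim = gpt_dim
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, gpt_dim * prefix_len // 2),
            nn.Tanh(),
            nn.Linear(gpt_dim * prefix_len // 2, gpt_dim * prefix_len),
        )

    def forward(self, clip_feat: torch.Tensor) -> torch.Tensor:
        # (B, clip_dim) -> (B, prefix_len, gpt_dim), ready to prepend to token embeddings.
        return self.proj(clip_feat).view(-1, self.prefix_len, self.gpt_dim)


if __name__ == "__main__":
    print(ClipToGPT2Prefix()(torch.randn(4, 512)).shape)   # torch.Size([4, 10, 768])
```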
Citations: 0
Data-centric is a novel perspective for UAV-based tracking: A new benchmark via efficient data utilization strategy
IF 3.1 | CAS Q4 (Computer Science) | Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS) | Pub Date: 2026-03-01 | Epub Date: 2026-02-11 | DOI: 10.1016/j.jvcir.2026.104743
Xiongyou Cai, Shuguang Wu, Shiwen Li, Hongru Zhang
Tracking a moving target with unmanned aerial vehicles (UAVs) poses significant challenges due to the substantial distance between the camera and target and the high relative motion. Trackers must efficiently process both appearance and motion information while adhering to the constraints of UAVs’ limited onboard computing power and real-time operational demands. Although current state-of-the-art (SOTA) UAV trackers rely on compact network structures, optimizing performance without increasing complexity remains a daunting challenge. This paper introduces a data-centric approach to enhance tracking performance in UAV environments. We first critique the limitations of existing datasets and propose a novel data mining strategy that leads to the development of the UAVSOT dataset. This dataset provides a more detailed representation for single-object tracking in UAV scenarios, effectively addressing the shortcomings of current datasets. Our experiments show that methods trained on UAVSOT significantly enhance tracking accuracy without additional computational overhead. Additionally, we compare model-centric and data-centric approaches to underscore the efficacy of our data-driven strategy in optimizing UAV trackers. The code and raw results can be found at https://github.com/caixiongyou/UAV-DC-Track.
Citations: 0
Exploring the transformer-based and diffusion-based models for single image deblurring
IF 3.1 | CAS Q4 (Computer Science) | Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS) | Pub Date: 2026-03-01 | Epub Date: 2026-01-27 | DOI: 10.1016/j.jvcir.2026.104735
Seunghwan Park, Chaehun Shin, Jaihyun Lew, Sungroh Yoon
Image deblurring is a fundamental task in image restoration (IR) aimed at removing blurring artifacts caused by factors such as defocus, motion, and others. Since a blurry image can originate from various sharp images, deblurring is regarded as an ill-posed problem with multiple valid solutions. The evolution of deblurring techniques spans from rule-based algorithms to deep learning-based models. Early research focused on estimating blur kernels using maximum a posteriori (MAP) estimation, but advancements in deep learning have shifted the focus towards directly predicting sharp images using convolutional neural networks (CNNs), generative adversarial networks (GANs), recurrent neural networks (RNNs), and other architectures. Building on these foundations, recent studies have advanced along two directions: transformer-based architectural innovations and diffusion-based algorithmic advances. This survey provides an in-depth investigation of recent deblurring models and traditional approaches. Furthermore, we conduct a fair re-evaluation under a unified evaluation protocol.
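For context, the classical blur-kernel estimation that the survey refers to is usually posed as a maximum a posteriori problem. This is the standard textbook formulation, not a result of the survey itself: given a blurry observation $y$, the sharp image $x$ and kernel $k$ are estimated jointly as

$$(\hat{x}, \hat{k}) = \arg\max_{x,k}\, p(x, k \mid y) \propto \arg\max_{x,k}\, p(y \mid x, k)\, p(x)\, p(k),$$

which, under a Gaussian noise model, is equivalent to minimizing

$$\min_{x,k}\; \|y - k \ast x\|_2^2 + \lambda\,\rho(x) + \gamma\,\phi(k),$$

where $\ast$ denotes convolution and $\rho$, $\phi$ are priors on the sharp image and the kernel. The ill-posedness noted above stems from the many $(x, k)$ pairs that explain the same $y$.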
Citations: 0
ATR-Net: Attention-based temporal-refinement network for efficient facial emotion recognition in human–robot interaction
IF 3.1 | CAS Q4 (Computer Science) | Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS) | Pub Date: 2026-03-01 | Epub Date: 2026-01-17 | DOI: 10.1016/j.jvcir.2026.104720
Sougatamoy Biswas, Harshavardhan Reddy Gajarla, Anup Nandy, Asim Kumar Naskar
Facial Emotion Recognition (FER) enables human–robot interaction by allowing robots to interpret human emotions effectively. Traditional FER models achieve high accuracy but are often computationally intensive, limiting real-time application on resource-constrained devices. These models also face challenges in capturing subtle emotional expressions and addressing variations in facial poses. This study proposes a lightweight FER model based on EfficientNet-B0, balancing accuracy and efficiency for real-time deployment on embedded robotic systems. The proposed architecture integrates an Attention Augmented Convolution (AAC) layer with EfficientNet-B0 to enhance the model’s focus on subtle emotional cues, enabling robust performance in complex environments. Additionally, a Pyramid Channel-Gated Attention with a Temporal Refinement Block is introduced to capture spatial and channel dependencies, ensuring adaptability and efficiency on resource-limited devices. The model achieves accuracies of 74.22% on FER-2013, 99.14% on CK+, and 67.36% on AffectNet-7. These results demonstrate its efficiency and robustness for facial emotion recognition in human–robot interaction.
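For readers unfamiliar with attention-augmented convolution, the sketch below concatenates a standard convolution branch with a spatial self-attention branch, which is the general idea such a layer implements. The channel split, head count, and class name are assumptions, not ATR-Net's exact AAC layer.

```python
# Minimal sketch (assumption): an attention-augmented convolution that concatenates
# a conv branch with a spatial self-attention branch, as a stand-in for the AAC
# layer described above (the paper's exact layer may differ).
import torch
import torch.nn as nn


class AttentionAugmentedConv(nn.Module):
    def __init__(self, in_ch: int, conv_ch: int, attn_ch: int, heads: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, conv_ch, 3, padding=1)
        self.to_attn = nn.Conv2d(in_ch, attn_ch, 1)
        self.attn = nn.MultiheadAttention(attn_ch, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        local = self.conv(x)                                   # local texture branch
        tokens = self.to_attn(x).flatten(2).transpose(1, 2)    # (B, H*W, attn_ch)
        global_, _ = self.attn(tokens, tokens, tokens)         # global context branch
        global_ = global_.transpose(1, 2).reshape(b, -1, h, w)
        return torch.cat([local, global_], dim=1)              # (B, conv_ch + attn_ch, H, W)


if __name__ == "__main__":
    x = torch.randn(2, 16, 28, 28)
    print(AttentionAugmentedConv(16, 24, 8)(x).shape)          # torch.Size([2, 32, 28, 28])
```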
Citations: 0
Human-in-the-loop dual-branch architecture for image super-resolution
IF 3.1 | CAS Q4 (Computer Science) | Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS) | Pub Date: 2026-03-01 | Epub Date: 2026-01-17 | DOI: 10.1016/j.jvcir.2026.104726
Suraj Neelakantan, Martin Längkvist, Amy Loutfi
Single-image super-resolution aims to recover high-frequency detail from a single low-resolution image, but practical applications often require balancing distortion against perceptual quality. Existing methods typically produce a single fixed reconstruction and offer limited test-time control over this trade-off. This paper presents DR-SCAN, a dual-branch deep residual network for single-image super-resolution in which, during test-time inference, weights can be assigned to either of the branches to dynamically steer their respective contributions to the reconstructed output. An interactive interface enables users to re-weight the shallow and deep branches at inference or run a one-click LPIPS search, to navigate the distortion–perception trade-off without retraining the model. Ablation experiments confirm that both the second branch and the channel–spatial attention that is used within the residual blocks are essential for the network for better reconstruction, while the interactive tuning routine demonstrates the practical value of post-hoc branch fusion.
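The test-time control described above amounts to mixing two branch outputs with a user-chosen weight. The toy PyTorch sketch below illustrates that mechanism; the branch architectures and the residual formulation are placeholders, not DR-SCAN's actual blocks.

```python
# Toy sketch (assumption): test-time re-weighting of two reconstruction branches,
# illustrating the distortion-perception control described above.
import torch
import torch.nn as nn


class DualBranchSR(nn.Module):
    """A shallow conservative branch and a deeper detail-oriented branch,
    mixed by a user-chosen weight at inference."""

    def __init__(self, channels: int = 3, feat: int = 32, deep_layers: int = 5):
        super().__init__()
        self.shallow = nn.Sequential(
            nn.Conv2d(channels, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, channels, 3, padding=1),
        )
        deep = []
        for i in range(deep_layers):
            deep += [nn.Conv2d(channels if i == 0 else feat, feat, 3, padding=1),
                     nn.ReLU(inplace=True)]
        deep += [nn.Conv2d(feat, channels, 3, padding=1)]
        self.deep = nn.Sequential(*deep)

    def forward(self, lr_up: torch.Tensor, w_shallow: float = 0.5) -> torch.Tensor:
        # w_shallow is chosen by the user at inference (e.g. via an LPIPS search) to
        # trade distortion (shallow branch) against perceptual detail (deep branch).
        return lr_up + w_shallow * self.shallow(lr_up) + (1.0 - w_shallow) * self.deep(lr_up)


if __name__ == "__main__":
    x = torch.randn(1, 3, 64, 64)                  # upscaled low-resolution input
    model = DualBranchSR().eval()
    with torch.no_grad():
        for w in (0.0, 0.5, 1.0):                  # sweep the trade-off without retraining
            print(w, model(x, w).shape)
```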
Citations: 0
Towards fast and effective low-light image enhancement via adaptive Gamma correction and detail refinement
IF 3.1 | CAS Q4 (Computer Science) | Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS) | Pub Date: 2026-03-01 | Epub Date: 2026-01-15 | DOI: 10.1016/j.jvcir.2026.104724
Shaoping Xu, Qiyu Chen, Liang Peng, Hanyang Hu, Wuyong Tao
Over the past decade, deep neural networks have significantly advanced low-light image enhancement (LLIE), achieving marked improvements in perceptual quality and robustness. However, these gains are increasingly accompanied by architectural complexity and computational inefficiency, widening the gap between enhancement performance and real-time applicability. This trade-off poses a critical challenge for time-sensitive scenarios requiring both high visual quality and efficient execution. To resolve the efficiency–quality trade-off in LLIE, we propose an ultra-lightweight framework comprising two computationally efficient modules: the adaptive Gamma correction module (AGCM) and the nonlinear refinement module (NRM). Specifically, the AGCM employs lightweight convolutions to generate spatially adaptive, pixel-wise Gamma maps that simultaneously mitigate global underexposure and suppress highlight overexposure, thereby preserving scene-specific luminance characteristics and ensuring visually natural global enhancement. Subsequently, the NRM employs two nonlinear transformation layers that logarithmically compress highlights and adaptively stretch shadows, effectively restoring local details without semantic distortion. Moreover, the first nonlinear transformation layer within the NRM incorporates residual connections to facilitate the capture and exploitation of subtle image features. Finally, the AGCM and NRM modules are jointly optimized using a hybrid loss function combining a reference-based fidelity term and no-reference perceptual metrics (i.e., local contrast, colorfulness, and exposure balance). Extensive experiments demonstrate that the proposed LLIE framework delivers performance comparable to state-of-the-art (SOTA) algorithms, while requiring only 8K parameters, achieving an optimal trade-off between enhancement quality and computational efficiency. This performance stems from our two-stage ultra-lightweight design: global illumination correction via pixel-adaptive Gamma adjustment, followed by detail-aware nonlinear refinement, all realized within a minimally parameterized architecture. As a result, the framework is uniquely suited for real-time deployment in resource-constrained environments.
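Pixel-wise adaptive Gamma correction can be illustrated very compactly: a small network predicts a Gamma map and the image is raised to that per-pixel power. The sketch below shows the idea under assumed layer sizes and a Gamma range of [0.3, 3.0]; it is not the paper's AGCM.

```python
# Minimal sketch (assumption): a pixel-wise adaptive Gamma correction of the kind
# the AGCM performs; the layer sizes and the Gamma range are illustrative.
import torch
import torch.nn as nn


class AdaptiveGammaCorrection(nn.Module):
    """Predict a per-pixel Gamma map and apply x ** gamma to an image in [0, 1].
    gamma < 1 brightens shadows, gamma > 1 tames over-exposed highlights."""

    def __init__(self, gamma_min: float = 0.3, gamma_max: float = 3.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )
        self.gamma_min, self.gamma_max = gamma_min, gamma_max

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gamma = self.gamma_min + (self.gamma_max - self.gamma_min) * self.net(x)
        return x.clamp(min=1e-6) ** gamma          # broadcast the 1-channel Gamma map


if __name__ == "__main__":
    img = torch.rand(1, 3, 128, 128)               # a low-light image in [0, 1]
    print(AdaptiveGammaCorrection()(img).shape)    # torch.Size([1, 3, 128, 128])
```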
Citations: 0