
Latest Articles in IEEE Transactions on Broadcasting

IEEE Transactions on Broadcasting Information for Readers and Authors
IF 4.8 | Tier 1 (Computer Science) | Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2026-03-06 | DOI: 10.1109/TBC.2026.3666580
IEEE Transactions on Broadcasting, vol. 72, no. 1, pp. C3-C4. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11422825
Citations: 0
Light Field Referring Segmentation: A Benchmark and an LLM-Based Approach
IF 4.8 | Tier 1 (Computer Science) | Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2026-02-06 | DOI: 10.1109/TBC.2026.3659013
Shishun Tian;Qian Xu;He Gui;Ting Su;Yan Li;Qiong Wang
Referring image segmentation (RIS) is a challenging task that requires models to segment objects based on natural language descriptions. Existing RIS models make limited use of geometric information, resulting in a multimodal mismatch between language and vision. In contrast to conventional 2D images, light field imaging gathers rays emitted from light sources in all directions. This unique characteristic enriches the comprehensive understanding of scenes and provides a new way to optimize RIS. In this paper, we propose the first light field referring segmentation dataset, which contains rich occluded objects and depth-referring descriptions. We then benchmark the performance of existing 2D referring image segmentation methods on the proposed dataset. The results reveal that these methods show limited efficacy on occluded scenes and on depth-based scene descriptions. To address this issue, we propose a novel framework, termed LFLLM, for light field referring segmentation. Specifically, we propose a Center Angular Aggregation Module that warps the views adjacent to the central view to prevent feature occlusion caused by viewpoint misalignment, and a Depth Convergence Module that adds a depth token to the LLM to leverage the depth information in the light field. Extensive experiments demonstrate that our approach outperforms the current state-of-the-art methods. The dataset and code are available at https://github.com/ShishunTian/LFLLM-TBC2026
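The central-view warping idea behind the Center Angular Aggregation Module can be illustrated with a toy geometric sketch. It assumes a 4D light field of shape (U, V, H, W) and a single global integer disparity; the paper's module operates on learned features with per-pixel disparity, so this is only an illustration of the alignment step:

```python
import numpy as np

def warp_to_center(lf, disparity):
    """Shift every angular view toward the central view.

    lf: 4D light field, shape (U, V, H, W) (angular x angular x spatial x spatial).
    disparity: assumed global, integer pixel shift per unit angular offset (toy setting).
    """
    U, V, H, W = lf.shape
    cu, cv = U // 2, V // 2
    out = np.empty_like(lf)
    for u in range(U):
        for v in range(V):
            # A scene point at disparity d moves by d pixels per angular step,
            # so undo that motion to align view (u, v) with the center view.
            dy = int(round((cu - u) * disparity))
            dx = int(round((cv - v) * disparity))
            out[u, v] = np.roll(lf[u, v], shift=(dy, dx), axis=(0, 1))
    return out
```

For a Lambertian scene at a single depth, every warped view then coincides with the central view, which is what makes aggregation across adjacent views meaningful despite viewpoint misalignment.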
IEEE Transactions on Broadcasting, vol. 72, no. 1, pp. 361-372.
Citations: 0
Compression Efficiency and Picture Quality Assessment of Broadcast HDR Videos With and Without Film-Grain
IF 4.8 | Tier 1 (Computer Science) | Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2026-02-06 | DOI: 10.1109/TBC.2026.3651199
Gosala Kulupana;Jayasingam Adhuran;Andrew Cotton
Along with the ever-growing popularity of information-rich video formats such as High Dynamic Range (HDR), Wide Colour Gamut (WCG) and Ultra High Definition (UHD), the efficient distribution of such video content has become critically important. To this end, video compression plays an important role in reducing bitrate requirements by, for example, up to two orders of magnitude. This work first investigates the performance of state-of-the-art video compression standards with the help of a large-scale subjective study. The study provides, for the first time, useful insight into the types of artifacts that video codecs introduce into HDR content with film-grain. The evaluation of the pictures generated by video codecs is also a crucial aspect of the video distribution pipeline. For this purpose, this paper further provides an HDR picture quality metric based on an existing Standard Dynamic Range (SDR) picture quality metric called the Detail Loss Metric (DLM). The proposed metric includes several novel features that make it suitable for assessing compression artifacts in HDR content with and without film-grain: 1) a Just Noticeable Difference (JND) based perceptual weighting function that captures the non-linearity of the HDR signal; 2) an entropy masking module that captures the impact of film-grain characteristics; and 3) a contrast masking operation, based on previous work, that better represents the impact of local visual masking. The performance of the proposed method and several state-of-the-art HDR and SDR quality metrics is evaluated on two large-scale datasets using three commonly used accuracy measures: the Pearson Linear Correlation Coefficient (PLCC), Spearman's Rank Correlation Coefficient (SRCC) and the Root Mean Square Error (RMSE). A further measure of how reliably a given metric represents user perception is also introduced. The experimental results indicate superior performance of the proposed method, demonstrating a PLCC score of more than 94% and similarly good scores on the other accuracy measures.
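The three accuracy measures named in the abstract have standard definitions; a minimal NumPy sketch follows (the sample scores below are hypothetical, not results from the paper):

```python
import numpy as np

def plcc(pred, mos):
    # Pearson Linear Correlation Coefficient: linear agreement with subjective scores
    p = np.asarray(pred, float); m = np.asarray(mos, float)
    p = p - p.mean(); m = m - m.mean()
    return float((p * m).sum() / np.sqrt((p * p).sum() * (m * m).sum()))

def srcc(pred, mos):
    # Spearman's Rank Correlation Coefficient: Pearson correlation of the ranks
    # (this argsort-of-argsort ranking assumes no tied scores)
    rank = lambda x: np.argsort(np.argsort(x)).astype(float)
    return plcc(rank(np.asarray(pred)), rank(np.asarray(mos)))

def rmse(pred, mos):
    # Root Mean Square Error, in the units of the subjective rating scale
    d = np.asarray(pred, float) - np.asarray(mos, float)
    return float(np.sqrt((d * d).mean()))
```

PLCC and SRCC reward accuracy and monotonicity of the prediction respectively (in quality-assessment practice PLCC and RMSE are often reported after a nonlinear mapping of the objective scores), while RMSE measures absolute error.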
IEEE Transactions on Broadcasting, vol. 72, no. 1, pp. 317-328.
Citations: 0
IEEE Transactions on Broadcasting Publication Information
IF 4.8 | Tier 1 (Computer Science) | Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2025-12-17 | DOI: 10.1109/TBC.2025.3640759
IEEE Transactions on Broadcasting, vol. 71, no. 4, pp. C2-C2. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11302006
Citations: 0
2025 Scott Helt Memorial Award for the Best Paper Published in IEEE Transactions on Broadcasting
IF 4.8 | Tier 1 (Computer Science) | Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2025-12-17 | DOI: 10.1109/TBC.2025.3640887
IEEE Transactions on Broadcasting, vol. 71, no. 4, pp. 1108-1110. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11302029
Citations: 0
IEEE Transactions on Broadcasting Information for Readers and Authors
IF 4.8 | Tier 1 (Computer Science) | Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2025-12-17 | DOI: 10.1109/TBC.2025.3640761
IEEE Transactions on Broadcasting, vol. 71, no. 4, pp. C3-C4. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11302004
Citations: 0
Successive Refinement Coding and SWIPT for Energy-Efficient Cooperative NOMA Systems
IF 4.8 | Tier 1 (Computer Science) | Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2025-12-11 | DOI: 10.1109/TBC.2025.3635396
Kewen Wang;Meng Cheng
Successive refinement (SR) coding has been integrated into cooperative non-orthogonal multiple access (CO-NOMA) to improve transmission efficiency. This paper further incorporates simultaneous wireless information and power transfer (SWIPT) into the SR-based CO-NOMA framework, enabling the relay user to operate using harvested energy and thereby reducing system power consumption. Assuming block Rayleigh fading channels, closed-form expressions for the outage probability are derived. The power allocation ratio and power splitting factor are jointly optimized to enhance system performance. Under a fixed total transmit power constraint, the proposed SWIPT-enabled system achieves significantly lower outage probabilities than the conventional CO-NOMA counterpart without SWIPT. The performance gain is further evaluated across different relay locations.
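Closed-form outage expressions under Rayleigh fading are routinely validated against Monte Carlo simulation. The sketch below does this for a generic point-to-point link with a target rate, not for the paper's CO-NOMA/SWIPT system model; the SNR and rate values are hypothetical:

```python
import numpy as np

def outage_exact(snr_db, rate):
    # Closed form for a point-to-point Rayleigh link with unit-mean channel power:
    # P_out = Pr[log2(1 + SNR*|h|^2) < R] = 1 - exp(-(2^R - 1) / SNR)
    snr = 10.0 ** (snr_db / 10.0)
    return 1.0 - np.exp(-(2.0 ** rate - 1.0) / snr)

def outage_mc(snr_db, rate, n=200_000, seed=0):
    # Monte Carlo estimate: for Rayleigh fading, |h|^2 is exponentially distributed
    rng = np.random.default_rng(seed)
    snr = 10.0 ** (snr_db / 10.0)
    gain = rng.exponential(1.0, n)
    return float(np.mean(np.log2(1.0 + snr * gain) < rate))
```

At SNR = 10 dB and R = 1 bit/s/Hz the closed form gives about 0.095, and a 200,000-sample simulation closely agrees; the same cross-check pattern extends to relay links once the cooperative protocol is modeled.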
IEEE Transactions on Broadcasting, vol. 72, no. 1, pp. 382-386.
Citations: 0
DM-VSR: Depth-Aware Diffusion Models With Adaptive Modulation for Video Super-Resolution
IF 4.8 | Tier 1 (Computer Science) | Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2025-12-02 | DOI: 10.1109/TBC.2025.3637713
Linlin Liu;Yifan Wang;Yize Wang;Zhen Xu;Jun Tang;Yong Ding
Video Super-Resolution (VSR) is essential for enhancing the quality of low-resolution (LR) videos in practical applications. Recent studies have explored diffusion models (DMs) for VSR due to their ability to generate realistic details. However, existing methods overlook spatiotemporal object-scale variations and dynamic control demands during denoising, which leads to visual distortion and quality degradation and severely limits their practical application. To address these limitations, we propose DM-VSR, a novel DM-based framework that incorporates depth-aware guidance and adaptive modulation for precise content reconstruction. Specifically, a Depth-aware Multimodal Fusion (DMF) module integrates depth maps, LR inputs, and flow-warped frames to provide unified depth-aware guidance. A Timestep Adaptive Modulation (TAM) module dynamically adjusts the control feature injection according to the demand at each denoising step. Additionally, a Dynamic Consistency Loss (DCL) is introduced to align training objectives with the evolving semantic focus. Extensive experiments on the REDS4 and Vid4 benchmarks demonstrate that DM-VSR achieves competitive performance, surpassing state-of-the-art methods in both perceptual quality and temporal consistency. Moreover, DM-VSR generates more visually realistic results, emphasizing its effectiveness in real-world applications. The code will be released at https://github.com/aigcvsr/DM-VSR
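Timestep-dependent control injection can be sketched in the abstract: scale the control features by a weight that varies over the denoising schedule before fusing them with the backbone features. Everything below (the sigmoid gate, its sharpness `k`, the additive fusion) is a hypothetical stand-in, not the paper's TAM module:

```python
import numpy as np

def timestep_gate(t, T, k=6.0):
    # Hypothetical monotone gate in (0, 1): stronger control injection early in
    # denoising (large t, high noise) and weaker near the end; k sets the sharpness.
    return 1.0 / (1.0 + np.exp(-k * (t / T - 0.5)))

def inject(base_feat, control_feat, t, T):
    # Scale the control features by the timestep-dependent gate before fusion
    return base_feat + timestep_gate(t, T) * control_feat
```

The point is only that the amount of guidance becomes a function of the denoising step rather than a fixed constant; a learned module would replace the fixed gate.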
IEEE Transactions on Broadcasting, vol. 72, no. 1, pp. 329-344.
Citations: 0
Anime4K: A Hybrid CNN–Transformer Network for Anime Super-Resolution
IF 4.8 | Tier 1 (Computer Science) | Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2025-11-13 | DOI: 10.1109/TBC.2025.3622413
Qi Liu;Feifan Cai;Zihao Zhang;Haiqi Zhu;Youdong Ding
Although anime super-resolution (SR) has garnered increasing interest in recent years, many existing approaches still inherit design principles from photorealistic imagery, overlooking the unique visual traits of anime. In this work, we revisit the anime SR task from an anime-specific perspective and propose a unified solution tailored for real-world restoration. First, we present A4K, the first 4K-resolution dataset specifically curated for anime SR, built via a dual-criterion selection pipeline combining perceptual quality and structural complexity, resulting in cleaner and more informative training samples. Second, we introduce AniFusionNet, a hybrid CNN–Transformer architecture that dynamically fuses local convolutional features with global self-attention, effectively balancing fine detail reconstruction and global coherence. Finally, we introduce a targeted ground truth (GT) enhancement strategy that selectively strengthens hand-drawn line structures, enabling more accurate learning of anime-specific textures. Extensive experiments on public benchmarks demonstrate that our approach achieves state-of-the-art performance in edge sharpness, color consistency, and artifact suppression. The project is available at https://github.com/wasai67/Anime4K
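The local/global fusion at the heart of a hybrid CNN–Transformer design can be shown in miniature: a convolutional branch aggregates a local neighborhood while an attention branch mixes all spatial positions, and the two are blended. Everything below (the box filter, the parameter-free attention, the fixed blend weight `alpha`) is a hypothetical stand-in for AniFusionNet's learned modules:

```python
import numpy as np

def local_branch(img):
    # 3x3 box filter: a stand-in for a learned convolutional (local) branch
    H, W = img.shape
    pad = np.pad(img, 1, mode="edge")
    return sum(pad[i:i + H, j:j + W] for i in range(3) for j in range(3)) / 9.0

def global_branch(img):
    # Parameter-free self-attention over flattened pixels (queries = keys = values),
    # a stand-in for a Transformer (global) branch
    x = img.reshape(-1, 1)                        # tokens with feature dimension 1
    scores = (x @ x.T) / np.sqrt(x.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)             # row-wise softmax
    return (w @ x).reshape(img.shape)

def fuse(img, alpha=0.5):
    # Blend local detail and global context; AniFusionNet learns this dynamically
    return alpha * local_branch(img) + (1.0 - alpha) * global_branch(img)
```

The convolutional branch preserves fine lines in a small receptive field; the attention branch lets every position attend to every other, which is what provides global coherence in the hybrid design.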
IEEE Transactions on Broadcasting, vol. 72, no. 1, pp. 345-360.
Citations: 0
Blind Light Field Image Quality Assessment Using Multiplane Texture and Multilevel Wavelet Information
IF 4.8 | Tier 1 (Computer Science) | Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2025-11-06 | DOI: 10.1109/TBC.2025.3627787
Zhengyu Zhang;Shishun Tian;Jianjun Xiang;Wenbin Zou;Luce Morin;Lu Zhang
Light Field Image (LFI) has garnered remarkable interest and fascination due to its burgeoning significance in immersive applications. Although the abundant information in LFIs enables a more immersive experience, it also poses a greater challenge for Light Field Image Quality Assessment (LFIQA), especially when reference information is inaccessible. In this paper, inspired by the holistic visual perception of high-dimensional LFIs and neuroscience studies on the Human Visual System (HVS), we propose a novel Blind Light Field image quality assessment metric by exploring MultiPlane Texture and Multilevel Wavelet Information, abbreviated as MPT-MWI-BLiF. Specifically, considering the texture sensitivity of the secondary visual cortex (V2), we first convert LFIs into multiple individual planes and capture textural variations from these planes. Then, the statistical histogram of textural variations for all planes is calculated as holistic textural variation features. In addition, motivated by the fact that neuronal responses in the visual cortex are frequency-dependent, we simulate this visual perception process by decomposing LFIs into multilevel wavelet subbands with Four-Dimensional Discrete Haar Wavelet Transform (4D-DHWT). After that, the subband geometric features of first-level 4D-DHWT subbands and the coefficient intensity features of second-level 4D-DHWT subbands are computed respectively. Finally, we combine all the extracted quality-aware features and employ the widely-used Support Vector Regression (SVR) to predict the perceptual quality of LFIs. To fully validate the effectiveness of the proposed metric, we perform extensive experiments on five representative LFIQA databases with two cross-validation methods. Experimental results demonstrate the superiority of the proposed metric in quality evaluation, as well as its low time complexity compared to other state-of-the-art metrics. 
The full code will be publicly available at https://github.com/ZhengyuZhang96/MPT-MWI-BLiF
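The 4D-DHWT behind the subband features is separable: one level amounts to a Haar split (pairwise averages and differences) applied along each of the four light-field axes in turn. A minimal orthonormal sketch, assuming every axis has even length:

```python
import numpy as np

def haar_1d(x, axis):
    # One Haar level along `axis`: low-pass (pair averages) and high-pass
    # (pair differences), each scaled by 1/sqrt(2) so the transform is orthonormal.
    x = np.moveaxis(x, axis, 0)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return np.moveaxis(np.concatenate([lo, hi], axis=0), 0, axis)

def dhwt_4d(lf):
    # One level of the 4D discrete Haar wavelet transform over (u, v, s, t),
    # producing 16 subbands packed as low/high halves along every axis.
    for ax in range(4):
        lf = haar_1d(lf, ax)
    return lf
```

Orthonormality means signal energy is preserved across the 16 subbands, and repeating the transform on the all-low corner yields the multilevel decomposition from which first- and second-level subband features can be taken.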
IEEE Transactions on Broadcasting, vol. 71, no. 4, pp. 1092-1107.
Citations: 0