
Latest Publications: IEEE Transactions on Circuits and Systems for Video Technology

GOTrack+: A Deep Learning Framework With Graph Optimal Transport for Particle Tracking Velocimetry
IF 11.1 | CAS Tier 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-29 | DOI: 10.1109/TCSVT.2025.3604034
Zhi Wang;Zixuan Wang;Chao Xu;Shengze Cai
Particle image-based fluid measurement techniques are widely used to study complex flows in nature and industrial processes. Although particle tracking velocimetry (PTV) has shown potential in various experimental applications for quantitatively capturing unsteady flow characteristics, estimating fluid motion with long displacement and high particle density remains challenging. We propose an artificial-intelligence-enhanced PTV framework to track particle trajectories from consecutive images. The proposed framework, called GOTrack+ (a learning framework with graph optimal transport for particle tracking velocimetry), contains three components: a convolutional neural network-based particle detector for particle recognition and sub-pixel coordinate localization; a graph neural network-based initial displacement predictor for fluid motion estimation; and a graph-based optimal transport particle tracker for continuous particle trajectory linking. Each component of GOTrack+ can be extracted and used independently, not only to enhance classical PTV algorithms but also as a simple, fast, accurate, and robust alternative to traditional PTV programs. Comprehensive evaluations, including numerical simulations and real-world experiments, have shown that GOTrack+ achieves state-of-the-art performance compared to recent PTV approaches. All code is available at https://github.com/wuwuwuas/GOTrack.git
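The abstract describes the final linking stage as a graph-based optimal transport tracker. As a rough illustration of that idea only (not the authors' implementation: the function name, the entropic-regularization setting, the greedy readout of the plan, and the toy data are all assumptions), the following numpy sketch links detected particles across two frames with a Sinkhorn transport plan computed on a displacement-compensated cost matrix:

```python
import numpy as np

def sinkhorn_link(pts_a, pts_b, pred_disp, eps=0.05, n_iters=200):
    """Link particles between two frames with entropic optimal transport.

    pts_a: (N, 2) particle coordinates in frame t
    pts_b: (M, 2) particle coordinates in frame t+1
    pred_disp: (N, 2) predicted displacement for each frame-t particle
    Returns a list of (i, j) pairs linking frame-t particles to frame-t+1 particles.
    """
    shifted = pts_a + pred_disp                       # displacement-compensated positions
    cost = np.linalg.norm(shifted[:, None, :] - pts_b[None, :, :], axis=-1) ** 2
    cost = cost / (cost.max() + 1e-12)                # normalize for numerical stability

    K = np.exp(-cost / eps)                           # Gibbs kernel
    a = np.full(len(pts_a), 1.0 / len(pts_a))         # uniform source marginal
    b = np.full(len(pts_b), 1.0 / len(pts_b))         # uniform target marginal
    u = np.ones_like(a)
    for _ in range(n_iters):                          # Sinkhorn iterations
        v = b / (K.T @ u + 1e-12)
        u = a / (K @ v + 1e-12)
    plan = u[:, None] * K * v[None, :]                # transport plan

    # Greedy readout: each frame-t particle links to its highest-mass target.
    return [(i, int(np.argmax(plan[i]))) for i in range(len(pts_a))]

# Toy usage: 5 particles undergoing a uniform rightward shift of 3 pixels.
rng = np.random.default_rng(0)
p0 = rng.uniform(0, 64, size=(5, 2))
p1 = p0 + np.array([3.0, 0.0])
print(sinkhorn_link(p0, p1, pred_disp=np.tile([3.0, 0.0], (5, 1))))  # [(0, 0), (1, 1), ...]
```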
{"title":"GOTrack+: A Deep Learning Framework With Graph Optimal Transport for Particle Tracking Velocimetry","authors":"Zhi Wang;Zixuan Wang;Chao Xu;Shengze Cai","doi":"10.1109/TCSVT.2025.3604034","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3604034","url":null,"abstract":"Particle image-based fluid measurement techniques are widely used to study complex flows in nature and industrial processes. Despite that particle tracking velocimetry (PTV) has shown potential in various experimental applications for quantitatively capturing unsteady flow characteristics, estimating fluid motion with long displacement and high particle density remains challenging. We propose an artificial-intelligence-enhanced PTV framework to track particle trajectories from consecutive images. The proposed framework, called GOTrack+ (a learning framework with graph optimal transport for particle tracking velocimetry), contains three components: a convolutional neural network-based particle detector for particle recognition and sub-pixel coordinate localization; a graph neural network-based initial displacement predictor for fluid motion estimation; and a graph-based optimal transport particle tracker for continuous particle trajectory linking. Each component of GOTrack+ can be extracted and used independently, not only to enhance classical PTV algorithms but also as a simple, fast, accurate, and robust alternative to traditional PTV programs. Comprehensive evaluations, including numerical simulations and real-world experiments, have shown that GOTrack+ achieves state-of-the-art performance compared to recent PTV approaches. All the codes are available at <uri>https://github.com/wuwuwuas/GOTrack.git</uri>","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 2","pages":"2358-2371"},"PeriodicalIF":11.1,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Annotation-Efficient Hybrid Learning for Temporal Sentence Grounding
IF 11.1 | CAS Tier 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-26 | DOI: 10.1109/TCSVT.2025.3603110
Jianxiang Dong;Zhaozheng Yin
Temporal Sentence Grounding (TSG) aims at localizing a temporal interval in an untrimmed video that contains the most relevant semantics to a given query sentence. Most existing methods either focus on addressing the problem in a fully-supervised manner where the temporal boundary annotations are provided, or are dedicated to weakly-supervised TSG without any boundary annotations. However, the former suffers from expensive annotation costs, while the latter yields only inferior grounding performance. In this paper, we propose an Annotation-efficient Hybrid Learning (AHL) framework that aims to achieve good TSG performance with less annotation cost by leveraging weakly semi-supervised learning, contrastive learning and active learning: (1) AHL includes a progressive pseudo-label self-learning module which generates pseudo labels and progressively selects reliable ones to re-train the model; (2) AHL includes a novel self-guided contrastive learning method that performs proposal-level contrastive learning based on weakly-labeled data to align the visual and language features; (3) AHL explores the fully-labeled set construction by gradually expanding it via actively searching on the informative weakly-labeled samples, from the aspects of both difficulty and diversity. We conduct extensive experiments on the ActivityNet and Charades-STA datasets, and the results verify the effectiveness of the proposed AHL in exploiting weakly-labeled data and achieving the same performance as the fully-supervised method, with much less annotation cost. Our code is available at https://github.com/DJX1995/AHL
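As an illustration of the progressive pseudo-label self-learning component, here is a minimal Python sketch in which pseudo temporal spans from weakly-labeled videos are kept only when the model's confidence exceeds a threshold that tightens with each self-training round; the threshold schedule, data shapes, and function name are assumptions for illustration, not the paper's actual selection rule:

```python
import numpy as np

def select_pseudo_labels(scores, spans, round_idx, base_thresh=0.5, step=0.1, max_thresh=0.9):
    """Progressively select reliable pseudo temporal spans for re-training.

    scores:    (N,) model confidence for the best proposal of each weakly-labeled video
    spans:     (N, 2) predicted (start, end) times in seconds for those proposals
    round_idx: current self-training round (0, 1, 2, ...); the threshold tightens each round
    """
    thresh = min(base_thresh + step * round_idx, max_thresh)
    keep = np.where(scores >= thresh)[0]
    return keep, spans[keep]

# Toy usage over three self-training rounds on the same pool of weak predictions.
scores = np.array([0.45, 0.62, 0.71, 0.88, 0.93])
spans = np.array([[1.0, 4.5], [2.0, 6.0], [0.5, 3.0], [7.2, 9.8], [3.3, 5.1]])
for r in range(3):
    keep, kept_spans = select_pseudo_labels(scores, spans, r)
    print(f"round {r}: kept videos {keep.tolist()}")
    # kept_spans would be added to the labeled pool and the grounding model re-trained here
```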
{"title":"Annotation-Efficient Hybrid Learning for Temporal Sentence Grounding","authors":"Jianxiang Dong;Zhaozheng Yin","doi":"10.1109/TCSVT.2025.3603110","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3603110","url":null,"abstract":"Temporal Sentence Grounding (TSG) aims at localizing a temporal interval in an untrimmed video that contains the most relevant semantics to a given query sentence. Most existing methods either focus on addressing the problem in a fully-supervised manner where the temporal boundary annotations are provided, or are dedicated to weakly-supervised TSG without any boundary annotations. However, the former ones suffer from expensive annotation cost and the latter ones only give inferior grounding performance. In this paper, we propose an Annotation-efficient Hybrid Learning (AHL) framework that aims to achieve good TSG performance with less annotation cost by leveraging weakly semi-supervised learning, contrastive learning and active learning: (1) AHL includes a progressive pseudo-label self-learning module which generates pseudo labels and progressively selects reliable ones to re-train the model in a progressive manner; (2) AHL includes a novel self-guided contrastive learning method that performs proposal-level contrastive learning based on weakly-labeled data to align the visual and language feature; (3) AHL explores the fully-labeled set construction by gradually expanding it via actively searching on the informative weakly-labeled samples, from the aspects of both difficulty and diversity. We conduct extensive experiments on ActivityNet and Charades-STA datasets and results verify the effectiveness of our proposed AHL to exploit the weakly-labeled data and to achieve the same performance as fully-supervised method, with much less annotation cost. Our code is available at <uri>https://github.com/DJX1995/AHL</uri>","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 2","pages":"2594-2606"},"PeriodicalIF":11.1,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AdaptGCD: Multi-Expert Adapter Tuning for Generalized Category Discovery
IF 11.1 | CAS Tier 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-26 | DOI: 10.1109/TCSVT.2025.3602981
Yuxun Qu;Yongqiang Tang;Chenyang Zhang;Wensheng Zhang
Different from the traditional semi-supervised learning paradigm that is constrained by the closed-world assumption, Generalized Category Discovery (GCD) presumes that the unlabeled dataset contains new categories not appearing in the labeled set, and aims to not only classify old categories but also discover new categories in the unlabeled data. Existing studies on GCD typically focus on transferring the general knowledge from the self-supervised pretrained model to the target GCD task via some fine-tuning strategies, such as partial tuning and prompt learning. Nevertheless, these fine-tuning methods fail to strike a sound balance between the generalization capacity of the pretrained backbone and the adaptability to the GCD task. To fill this gap, in this paper, we propose a novel adapter-tuning-based method named AdaptGCD, which is the first work to introduce adapter tuning into the GCD task and provides some key insights expected to enlighten future research. Furthermore, considering the discrepancy of supervision information between the old and new classes, a multi-expert adapter structure equipped with a route assignment constraint is elaborately devised, such that the data from old and new classes are separated into different expert groups. Extensive experiments are conducted on 7 widely-used datasets. The remarkable performance improvements highlight the efficacy of our proposal, and it can also be combined with other advanced methods like SPTNet for further enhancement.
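The multi-expert adapter with a route assignment constraint can be pictured with a short PyTorch sketch: parallel bottleneck adapters, a softmax router, and a KL-style term that pushes old-class samples toward one expert group and the remaining samples toward the other. The dimensions, number of experts, group split, and the exact constraint are illustrative assumptions rather than the paper's design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiExpertAdapter(nn.Module):
    """Parallel bottleneck adapters with a learned router (a minimal sketch)."""
    def __init__(self, dim=768, bottleneck=64, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, bottleneck), nn.GELU(), nn.Linear(bottleneck, dim))
            for _ in range(n_experts)
        ])
        self.router = nn.Linear(dim, n_experts)

    def forward(self, x):                                       # x: (batch, dim) backbone features
        gates = F.softmax(self.router(x), dim=-1)               # (batch, n_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)
        out = x + (gates.unsqueeze(-1) * expert_out).sum(dim=1) # residual adapter output
        return out, gates

def route_assignment_loss(gates, is_old_class, old_experts=(0, 1)):
    """Encourage old-class samples to route to one expert group and the rest to the other."""
    n_experts = gates.size(1)
    old_mask = torch.zeros(n_experts)
    old_mask[list(old_experts)] = 1.0
    target = torch.where(is_old_class.bool().unsqueeze(1), old_mask, 1.0 - old_mask)
    target = target / target.sum(dim=1, keepdim=True)           # normalized group prior
    return F.kl_div(gates.clamp_min(1e-8).log(), target, reduction="batchmean")

# Toy usage
adapter = MultiExpertAdapter()
feats = torch.randn(8, 768)
is_old = torch.tensor([1, 1, 0, 0, 1, 0, 0, 1])
out, gates = adapter(feats)
loss = route_assignment_loss(gates, is_old)
print(out.shape, gates.shape, float(loss))
```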
{"title":"AdaptGCD: Multi-Expert Adapter Tuning for Generalized Category Discovery","authors":"Yuxun Qu;Yongqiang Tang;Chenyang Zhang;Wensheng Zhang","doi":"10.1109/TCSVT.2025.3602981","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3602981","url":null,"abstract":"Different from the traditional semi-supervised learning paradigm that is constrained by the close-world assumption, Generalized Category Discovery (GCD) presumes that the unlabeled dataset contains new categories not appearing in the labeled set, and aims to not only classify old categories but also discover new categories in the unlabeled data. Existing studies on GCD typically devote to transferring the general knowledge from the self-supervised pretrained model to the target GCD task via some fine-tuning strategies, such as partial tuning and prompt learning. Nevertheless, these fine-tuning methods fail to make a sound balance between the generalization capacity of pretrained backbone and the adaptability to the GCD task. To fill this gap, in this paper, we propose a novel adapter-tuning-based method named AdaptGCD, which is the first work to introduce the adapter tuning into the GCD task and provides some key insights expected to enlighten future research. Furthermore, considering the discrepancy of supervision information between the old and new classes, a multi-expert adapter structure equipped with a route assignment constraint is elaborately devised, such that the data from old and new classes are separated into different expert groups. Extensive experiments are conducted on 7 widely-used datasets. The remarkable performance improvements highlight the efficacy of our proposal and it can be also combined with other advanced methods like SPTNet for further enhancement.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 2","pages":"2344-2357"},"PeriodicalIF":11.1,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Medical VLP Model Is Vulnerable: Toward Multimodal Adversarial Attack on Large Medical Vision-Language Models
IF 11.1 | CAS Tier 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-26 | DOI: 10.1109/TCSVT.2025.3602970
Zimu Lu;Ning Xu;Hongshuo Tian;Lanjun Wang;An-An Liu
Medical Visual Question Answering (Medical VQA) is an essential task that facilitates the automated interpretation of complex clinical imagery with corresponding textual questions, thereby supporting both clinicians and patients in making informed medical decisions. With the rapid progress of Vision-Language Pretraining (VLP) in general domains, the development of medical VLP models has emerged as a rapidly growing interdisciplinary area at the intersection of artificial intelligence (AI) and healthcare. However, few works have evaluated the adversarial robustness of medical VLP models, a task that faces two primary challenges: 1) the complexity of medical texts, stemming from the presence of terminologies, poses significant challenges for models in comprehending the text for adversarial attack; 2) the diversity of medical images arises from the variety of anatomical regions depicted, which requires models to determine critical anatomical regions for attack. In this paper, we propose a novel multimodal adversarial attack generator for evaluating the robustness of medical VLP models. Specifically, for the complexity of medical texts, we integrate medical knowledge when crafting text adversarial samples, which facilitates terminology understanding and adversarial strength; for the diversity of medical images, we divide the anatomical regions into either global or local regions in medical images, which are determined by learned balance weights for perturbations. Our experimental study not only provides a quantitative understanding of medical VLP models, but also underscores the critical need for thorough safety evaluations before implementing them in real-world medical applications.
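The listing does not include the proposed generator itself, so the sketch below only shows the standard projected-gradient (PGD) image perturbation that robustness evaluations of vision-language models commonly build on; the `model(image, text_tokens)` scoring interface, the loss choice, and all hyper-parameters are hypothetical stand-ins, not the paper's multimodal attack:

```python
import torch
import torch.nn.functional as F

def pgd_image_attack(model, image, text_tokens, loss_fn, eps=8 / 255, alpha=2 / 255, steps=10):
    """Generic L-infinity PGD perturbation of the image input of a vision-language scorer.

    `model(image, text_tokens)` is assumed to return a scalar matching score; `loss_fn`
    maps that score to a loss the attacker maximizes (e.g. the negative matching score).
    """
    adv = image.clone().detach()
    adv = (adv + torch.empty_like(adv).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv, text_tokens))
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()                   # ascend the attacker's loss
            adv = image + (adv - image).clamp(-eps, eps)      # project back into the eps-ball
            adv = adv.clamp(0, 1)
    return adv.detach()

# Toy usage with a stand-in "model": cosine similarity between the first pixels and a text vector.
toy_text = torch.randn(16)
toy_model = lambda img, txt: F.cosine_similarity(img.flatten()[:16], txt, dim=0)
clean = torch.rand(1, 3, 32, 32)
adv = pgd_image_attack(toy_model, clean, toy_text, loss_fn=lambda s: -s)
print(float(toy_model(clean, toy_text)), float(toy_model(adv, toy_text)))  # similarity should drop
```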
{"title":"Medical VLP Model Is Vulnerable: Toward Multimodal Adversarial Attack on Large Medical Vision-Language Models","authors":"Zimu Lu;Ning Xu;Hongshuo Tian;Lanjun Wang;An-An Liu","doi":"10.1109/TCSVT.2025.3602970","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3602970","url":null,"abstract":"Medical Visual Question Answering (Medical VQA) is an essential task that facilitates the automated interpretation of complex clinical imagery with corresponding textual questions, thereby supporting both clinicians and patients in making informed medical decisions. With the rapid progress of Vision-Language Pretraining (VLP) in general domains, the development of medical VLP models has emerged as a rapidly growing interdisciplinary area at the intersection of artificial intelligence (AI) and healthcare. However, few works have been proposed to evaluate the adversarial robustness of medical VLP models, which faces two primary challenges: 1) the complexity of medical texts, stemming from the presence of terminologies, poses significant challenges for models in comprehending the text for adversarial attack; 2) the diversity of medical images arises from the variety of anatomical regions depicted, which requires models to determine critical anatomical regions for attack. In this paper, we propose a novel multimodal adversarial attack generator for evaluating the robustness of medical VLP models. Specifically, for the complexity of medical texts, we integrate medical knowledge when crafting text adversarial samples, which can facilitate the terminologies understanding and adversarial strength; for the diversity of medical images, we divide the anatomical regions into either global or local regions in medical images, which are determined by learned balance weights for perturbations. Our experimental study not only provides a quantitative understanding in medical VLP models, but also underscores the critical need for thorough safety evaluations before implementing them in real-world medical applications.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 2","pages":"2478-2491"},"PeriodicalIF":11.1,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Text and Non-Text Latent Feature Disentanglement for Screen Content Image Compression
IF 11.1 | CAS Tier 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-25 | DOI: 10.1109/TCSVT.2025.3602506
Hao Wang;Junyan Huo;Fei Yang;Shuai Wan;Gaoxing Chen;Kun Yang;Luis Herranz;Fuzheng Yang
With the growing prevalence of screen content images in multimedia communication, efficient compression has become increasingly crucial. Unlike natural scene images, screen content typically contains rich text regions that exhibit unique characteristics and low correlation with surrounding non-text elements. The intricate mixture of text and non-text within images poses significant challenges for existing learned compression networks, as the text and non-text features are severely entangled in the latent domain along the channel dimension, leading to compromised reconstruction quality and suboptimal entropy estimation. In this paper, we propose a novel Disentangled Image Compression Architecture (DICA) that enhances the analysis module and the entropy model of existing compression architectures to address these limitations. First, we introduce a Disentangled Analysis Module (DAM) by augmenting original analysis modules with an additional text approximation branch and a disentangling network. They work in concert to disentangle latent features into text and non-text classes along the channel dimension, resulting in a more structured feature distribution that better aligns with compression requirements. Second, we propose a Disentangled Channel-Conditional Entropy Model (DCEM) that efficiently leverages the feature distribution bias introduced by DAM, thereby further improving compression performance. Experimental results demonstrate that the proposed DICA, along with DAM and DCEM, can be integrated into various channel-conditional compression backbones, significantly improving their performance in screen content compression, particularly in hard-to-compress text regions. When integrated with an advanced WACNN backbone, our method achieves a 13% overall BD-Rate gain and a 16% BD-Rate gain in text regions on the SIQAD dataset.
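To make the interaction between channel-wise disentanglement and entropy modeling concrete, the numpy/scipy sketch below splits latent channels into text and non-text groups and estimates each group's rate under a discretized-Gaussian likelihood, the usual entropy-model form in learned image compression; the channel mask, tensor shapes, and Gaussian parameters are assumptions for illustration and do not reproduce the DCEM:

```python
import numpy as np
from scipy.stats import norm

def gaussian_bits(latent, mu, sigma):
    """Bits to code a rounded latent under a per-element discretized Gaussian model,
    p(y) = CDF(y + 0.5) - CDF(y - 0.5)."""
    y = np.round(latent)
    p = norm.cdf(y + 0.5, loc=mu, scale=sigma) - norm.cdf(y - 0.5, loc=mu, scale=sigma)
    return float(-np.log2(np.clip(p, 1e-9, 1.0)).sum())

def split_text_nontext(latent, text_channel_mask):
    """Split latent channels into text / non-text groups along the channel axis."""
    return latent[text_channel_mask], latent[~text_channel_mask]

# Toy usage: 8 channels of 4x4 latents, the first 3 flagged as "text" channels.
rng = np.random.default_rng(1)
latent = rng.normal(0.0, 2.0, size=(8, 4, 4))
mask = np.array([True] * 3 + [False] * 5)
text_y, nontext_y = split_text_nontext(latent, mask)
print("text bits:", gaussian_bits(text_y, 0.0, 2.0))
print("non-text bits:", gaussian_bits(nontext_y, 0.0, 2.0))
```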
{"title":"Text and Non-Text Latent Feature Disentanglement for Screen Content Image Compression","authors":"Hao Wang;Junyan Huo;Fei Yang;Shuai Wan;Gaoxing Chen;Kun Yang;Luis Herranz;Fuzheng Yang","doi":"10.1109/TCSVT.2025.3602506","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3602506","url":null,"abstract":"With the growing prevalence of screen content images in multimedia communication, efficient compression has become increasingly crucial. Unlike natural scene images, screen content typically contains rich text regions that exhibit unique characteristics and low correlation with surrounding non-text elements. The intricate mixture of text and non-text within images poses significant challenges for existing learned compression networks, as the text and non-text features are severely entangled in the latent domain along the channel dimension, leading to compromised reconstruction quality and suboptimal entropy estimation. In this paper, we propose a novel <bold>Disentangled Image Compression Architecture (DICA)</b> that enhances the analysis module and the entropy model of existing compression architectures to address these limitations. First, we introduce a <bold>Disentangled Analysis Module (DAM)</b> by augmenting original analysis modules with an additional text approximation branch and a disentangling network. They work in concert to disentangle latent features into text and non-text classes along the channel dimension, resulting in a more structured feature distribution that better aligns with compression requirements. Second, we propose a Disentangled Channel-Conditional Entropy Model (DCEM) that efficiently leverages the feature distribution bias introduced by DAM, thereby further improving compression performance. Experimental results demonstrate that the proposed DICA, along with DAM and DCEM can be integrated into various channel-conditional compression backbones, significantly improving their performance in screen content compression–particularly in hard-to-compress text regions. When integrated with an advanced WACNN backbone, our method achieves a 13% overall BD-Rate gain and a 16% BD-Rate gain in text regions on the SIQAD dataset.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 2","pages":"2505-2519"},"PeriodicalIF":11.1,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Phylogeny-Based Traitor Tracing Method for Interleaving Attacks
IF 11.1 | CAS Tier 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-25 | DOI: 10.1109/TCSVT.2025.3602214
Karama Abdelhedi;Faten Chaabane;Walid Wannes;William Puech;Chokri Ben Amar
Today, the popularity of 3D videos is increasing significantly. This trend can be attributed to their immersive appeal and lifelike experience. In an era dominated by the widespread distribution of digital content, data integrity and ownership are of crucial importance. In this context, the practice of traitor tracing, closely related to Digital Rights Management (DRM), facilitates the identification and tracking of unauthorized users who have violated copyright in order to share illegal copyright-protected content. In this paper, we propose a solution to this problem: an innovative traitor tracing approach focused on 3D video, with a particular focus on the DIBR (Depth Image-Based Rendering) format, which can be vulnerable to an Interleaving attack strategy. For this purpose, we develop a new phylogeny tree construction method designed to combat collusion attacks. Our experimental evaluations demonstrate the effectiveness of our proposed approach, particularly when applied to long fingerprinting codes. Compared to Tardos' approach, our method delivers very good results, even for a large number of colluders.
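For context on the Tardos baseline that the abstract compares against, the numpy sketch below simulates an interleaving collusion attack on a Tardos-style binary fingerprinting code and scores users with the classical symmetric accusation score; the code length, bias range, colluder set, and seed are arbitrary choices, and this is the baseline tracing scheme, not the proposed phylogeny-tree method:

```python
import numpy as np

rng = np.random.default_rng(7)
n_users, code_len = 50, 2048

# Tardos-style code: per-position bias p[i], user fingerprints X[u, i] ~ Bernoulli(p[i]).
p = rng.uniform(0.1, 0.9, size=code_len)
X = (rng.uniform(size=(n_users, code_len)) < p).astype(int)

# Interleaving attack: each position of the pirated copy comes from a randomly chosen colluder.
colluders = [3, 17, 29]
picks = rng.integers(0, len(colluders), size=code_len)
pirated = X[np.array(colluders)[picks], np.arange(code_len)]

# Symmetric Tardos accusation score: reward (penalize) agreement (disagreement),
# weighted by how surprising it is under the position bias p.
sigma = np.sqrt(p * (1.0 - p))
scores = np.where(pirated == 1, (X - p) / sigma, (p - X) / sigma).sum(axis=1)

accused = np.argsort(scores)[::-1][:len(colluders)]
print("true colluders:", sorted(colluders))
print("highest accusation scores:", sorted(accused.tolist()))
```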
{"title":"Phylogeny-Based Traitor Tracing Method for Interleaving Attacks","authors":"Karama Abdelhedi;Faten Chaabane;Walid Wannes;William Puech;Chokri Ben Amar","doi":"10.1109/TCSVT.2025.3602214","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3602214","url":null,"abstract":"Today, the popularity of 3D videos is increasing significantly. This trend can be attributed to their immersive appeal and lifelike experience. In an era dominated by the widespread distribution of digital content, data integrity, and ownership, all of these elements are of crucial importance. In this context, the practice of traitor tracing, closely related to Digital Rights Management (DRM), facilitates the identification and tracking of unauthorized users who have violated copyright in order to share illegal copyright-protected content. In this paper, we propose a solution to this problem, we introduce an innovative traitor tracing approach focused on 3D video, with a particular focus on the DIBR (Depth Image-Based Rendering) format, which can be vulnerable to an Interleaving attack strategy. For this purpose, we develop a new phylogeny tree construction method designed to combat collusion attacks. Our experimental evaluations demonstrate the effectiveness of our proposed approach particularly when applied to long fingerprinting codes. Compared to Tardos’ approach, our method delivers very good results, even for a large number of colluders.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 2","pages":"2623-2634"},"PeriodicalIF":11.1,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
URA-Net: Uncertainty-Integrated Anomaly Perception and Restoration Attention Network for Unsupervised Anomaly Detection
IF 11.1 | CAS Tier 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-25 | DOI: 10.1109/TCSVT.2025.3602391
Wei Luo;Peng Xing;Yunkang Cao;Haiming Yao;Weiming Shen;Zechao Li
Unsupervised anomaly detection plays a pivotal role in industrial defect inspection and medical image analysis, with most methods relying on the reconstruction framework. However, these methods may suffer from over-generalization, enabling them to reconstruct anomalies well, which leads to poor detection performance. To address this issue, instead of focusing solely on normality reconstruction, we propose an innovative Uncertainty-Integrated Anomaly Perception and Restoration Attention Network (URA-Net), which explicitly restores abnormal patterns to their corresponding normality. First, unlike traditional image reconstruction methods, we utilize a pre-trained convolutional neural network to extract multi-level semantic features as the reconstruction target. To help URA-Net learn to restore anomalies, we introduce a novel feature-level artificial anomaly synthesis module to generate anomalous samples for training. Subsequently, a novel uncertainty-integrated anomaly perception module based on Bayesian neural networks is introduced to learn the distributions of anomalous and normal features. This facilitates the estimation of anomalous regions and ambiguous boundaries, laying the foundation for subsequent anomaly restoration. Then, we propose a novel restoration attention mechanism that leverages global normal semantic information to restore detected anomalous regions, thereby obtaining defect-free restored features. Finally, we employ residual maps between input features and restored features for anomaly detection and localization. The comprehensive experimental results on two industrial datasets, MVTec AD and BTAD, along with a medical image dataset, OCT-2017, unequivocally demonstrate the effectiveness and superiority of the proposed method.
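The final detection step described in the abstract, residual maps between input and restored features, can be sketched in a few lines; here the spread across several stochastic restorations stands in for the Bayesian uncertainty term, and the additive fusion, shapes, and toy defect are assumptions rather than the network's actual scoring head:

```python
import numpy as np

def anomaly_map(features, restored_samples):
    """Per-pixel anomaly score from feature residuals plus a simple uncertainty term.

    features:         (C, H, W) features of the test image
    restored_samples: (K, C, H, W) K stochastic restorations of those features
    """
    residual = np.linalg.norm(features[None] - restored_samples, axis=1)  # (K, H, W)
    mean_res = residual.mean(axis=0)        # average restoration error
    uncertainty = residual.std(axis=0)      # disagreement between stochastic restorations
    return mean_res + uncertainty           # simple additive fusion (illustrative only)

# Toy usage: normal background is ~0; a 4x4 patch the restorer cannot reproduce is the "defect".
rng = np.random.default_rng(3)
feat = np.zeros((16, 32, 32))
feat[:, 10:14, 10:14] = 5.0                 # anomalous region
restored = np.stack([rng.normal(scale=0.1, size=(16, 32, 32)) for _ in range(4)])
score = anomaly_map(feat, restored)
print("defect region scores higher:", bool(score[10:14, 10:14].mean() > score.mean()))
```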
{"title":"URA-Net: Uncertainty-Integrated Anomaly Perception and Restoration Attention Network for Unsupervised Anomaly Detection","authors":"Wei Luo;Peng Xing;Yunkang Cao;Haiming Yao;Weiming Shen;Zechao Li","doi":"10.1109/TCSVT.2025.3602391","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3602391","url":null,"abstract":"Unsupervised anomaly detection plays a pivotal role in industrial defect inspection and medical image analysis, with most methods relying on the reconstruction framework. However, these methods may suffer from over-generalization, enabling them to reconstruct anomalies well, which leads to poor detection performance. To address this issue, instead of focusing solely on normality reconstruction, we propose an innovative Uncertainty-Integrated Anomaly Perception and Restoration Attention Network (URA-Net), which explicitly restores abnormal patterns to their corresponding normality. First, unlike traditional image reconstruction methods, we utilize a pre-trained convolutional neural network to extract multi-level semantic features as the reconstruction target. To assist the URA-Net learning to restore anomalies, we introduce a novel feature-level artificial anomaly synthesis module to generate anomalous samples for training. Subsequently, a novel uncertainty-integrated anomaly perception module based on Bayesian neural networks is introduced to learn the distributions of anomalous and normal features. This facilitates the estimation of anomalous regions and ambiguous boundaries, laying the foundation for subsequent anomaly restoration. Then, we propose a novel restoration attention mechanism that leverages global normal semantic information to restore detected anomalous regions, thereby obtaining defect-free restored features. Finally, we employ residual maps between input features and restored features for anomaly detection and localization. The comprehensive experimental results on two industrial datasets, MVTec AD and BTAD, along with a medical image dataset, OCT-2017, unequivocally demonstrate the effectiveness and superiority of the proposed method.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 2","pages":"2464-2477"},"PeriodicalIF":11.1,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep Network-Based Adaptive Quantization for Practical Video Coding
IF 11.1 | CAS Tier 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-22 | DOI: 10.1109/TCSVT.2025.3601718
Shuai Huo;Hewei Liu;Jiawen Gu;Dengchao Jin;Meng Lei;Bo Huang;Chao Zhou
The optimization of block-level quantization parameters (QP) is critical to improving the performance of practical block-based video compression encoders, but the extremely large optimization space makes it challenging to solve. Existing solutions, e.g., the HEVC encoder x265, usually impose optimization constraints such as a block-independence assumption and a linear distortion propagation model, which limits the achievable compression efficiency to a certain extent. To address this problem, a deep learning-based encoder-only adaptive quantization method (DAQ) is proposed in this paper, where a deep network is designed to adaptively model the joint temporal propagation relationship of quantization among blocks. Specifically, DAQ consists of two phases. In the training phase, considering the heavy search cost of the traditional codec, we introduce a well-designed end-to-end learned block-based video compression network as an effective training proxy for the deep encoder-side network. In the deployment phase, the trained deep network jointly predicts all block QPs in a frame for the traditional encoder. Besides, our network is deployed only on the encoder side without changing the standard decoder and has very low inference complexity, making it practical to apply. Finally, we deploy DAQ in HEVC and VVC encoders for performance comparison, and the experimental results demonstrate that DAQ significantly outperforms the widely used x265, with average BD-rate reductions of 15.0% and 10.9% under SSIM and PSNR, respectively, and achieves coding gains of 12.5% and 5.0% over VTM. Moreover, this work provides new insight into optimizing encoder parameters over a large search space when deploying deep video coding tools in practice.
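To ground the block-level QP discussion, the short Python sketch below turns network-predicted per-block QP offsets into clipped block QPs and uses the standard HEVC/VVC relation in which the quantization step size roughly doubles every 6 QP; the offset grid and frame QP are made-up values, and the prediction network itself is not shown:

```python
import numpy as np

def qp_to_qstep(qp):
    """HEVC/VVC quantization step size roughly doubles every 6 QP: Qstep ~ 2^((QP - 4) / 6)."""
    return 2.0 ** ((np.asarray(qp, dtype=float) - 4.0) / 6.0)

def apply_block_qp_offsets(frame_qp, predicted_offsets, qp_min=0, qp_max=51):
    """Turn per-block offsets predicted by a network into clipped block-level QPs."""
    return np.clip(frame_qp + np.asarray(predicted_offsets), qp_min, qp_max).astype(int)

# Toy usage: a 4x4 grid of CTUs; the predictor spends bits on "important" blocks (negative offsets).
frame_qp = 32
offsets = np.array([[-3, -2,  0, 1],
                    [-2, -1,  0, 2],
                    [ 0,  0,  1, 2],
                    [ 1,  2,  2, 3]])
print(apply_block_qp_offsets(frame_qp, offsets))
print("step-size ratio, QP 29 vs QP 35:", qp_to_qstep(29) / qp_to_qstep(35))  # = 0.5
```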
{"title":"Deep Network-Based Adaptive Quantization for Practical Video Coding","authors":"Shuai Huo;Hewei Liu;Jiawen Gu;Dengchao Jin;Meng Lei;Bo Huang;Chao Zhou","doi":"10.1109/TCSVT.2025.3601718","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3601718","url":null,"abstract":"The optimization of block-level quantization parameters (QP) is critical to improving the performance of practical block-based video compression encoders, but the extremely large optimization space makes it challenging to solve. Existing solutions, e.g. HEVC encoder x265, usually add some optimization constraints of the block-independent assumption and linear distortion propagation model, which limits compression efficiency improvement to a certain extent. To address this problem, a deep learning-based encoder-only adaptive quantization method (DAQ) is proposed in this paper, where a deep network is designed to adaptively model the joint temporal propagation relationship of quantization among blocks. Specifically, DAQ consists of two phases: in the training phase, considering the heavy searching cost of the traditional codec, we introduce a well-designed end-to-end learned block-based video compression network as an effective training proxy tool for the deep encoder-side network. While in the deployment phase, the trained deep network is applied to jointly predict all block QPs in a frame for the traditional encoder. Besides, our network deploys only on the encoder side without changing the standard decoder and has very low inference complexity, making it able to apply in practice. At last, we deploy DAQ in HEVC and VVC encoder for performance comparison, and the experimental results demonstrate that DAQ significantly outperforms practically used x265 with on average 15.0%, 10.9% BD-rate reduction under the SSIM and PSNR, and also achieves 12.5%, 5.0% coding gain than VTM. Moreover, for deploying deep video codec in practice, this work provides a new insight for optimizing the encoder parameters with a large space.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 2","pages":"2538-2550"},"PeriodicalIF":11.1,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
LEGO: Learning and Graph-Optimized Modular Tracker for Online Multi-Object Tracking With Point Clouds
IF 11.1 | CAS Tier 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-20 | DOI: 10.1109/TCSVT.2025.3600881
Zhenrong Zhang;Jianan Liu;Yuxuan Xia;Tao Huang;Qing-Long Han;Hongbin Liu
Online Multi-Object Tracking (MOT) plays a pivotal role in autonomous systems. The state-of-the-art approaches usually employ a tracking-by-detection method, and data association plays a critical role. This paper proposes a learning and graph-optimized (LEGO) modular tracker to improve data association performance over the existing literature. The proposed LEGO tracker integrates graph optimization, which efficiently formulates the association score map, facilitating the accurate and efficient matching of objects across time frames. To further enhance the state update process, a Kalman filter is added to ensure consistent tracking by incorporating temporal coherence in the object states. Our proposed method, utilising LiDAR alone, has shown exceptional performance compared to other online tracking approaches, including LiDAR-based and LiDAR-camera fusion-based methods. LEGO ranked 3rd among all trackers (both online and offline) and 2nd among all online trackers in the KITTI MOT benchmark for cars (https://www.cvlibs.net/datasets/kitti/eval_tracking.php) at the time the results were submitted to the KITTI object tracking evaluation ranking board. Moreover, our method also achieves competitive performance on the Waymo open dataset benchmark.
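As a minimal sketch of the two classical ingredients the abstract mentions, the snippet below associates tracks with detections by Hungarian matching over a centroid-distance cost matrix (with a distance gate) and predicts track states with a constant-velocity Kalman step; the learned, graph-optimized association score map itself is not reproduced, and the gate, noise level, and state layout are assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_centers, det_centers, gate=2.0):
    """Match tracks to detections by centroid distance with Hungarian assignment.

    track_centers, det_centers: (N, 3) and (M, 3) object centers (x, y, z) in the LiDAR frame;
    pairs farther apart than `gate` metres are left unmatched.
    """
    cost = np.linalg.norm(track_centers[:, None, :] - det_centers[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
    unmatched_tracks = sorted(set(range(len(track_centers))) - {r for r, _ in matches})
    unmatched_dets = sorted(set(range(len(det_centers))) - {c for _, c in matches})
    return matches, unmatched_tracks, unmatched_dets

def kalman_predict(x, P, dt=0.1, q=0.01):
    """Constant-velocity prediction for state x = [px, py, pz, vx, vy, vz] with covariance P."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)
    return F @ x, F @ P @ F.T + q * np.eye(6)

# Toy usage: the first track finds a nearby detection, the second is left unmatched.
tracks = np.array([[0.0, 0.0, 0.0], [10.0, 5.0, 0.0]])
dets = np.array([[0.3, -0.1, 0.0], [30.0, 2.0, 0.0]])
print(associate(tracks, dets))   # ([(0, 0)], [1], [1])
```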
{"title":"LEGO: Learning and Graph-Optimized Modular Tracker for Online Multi-Object Tracking With Point Clouds","authors":"Zhenrong Zhang;Jianan Liu;Yuxuan Xia;Tao Huang;Qing-Long Han;Hongbin Liu","doi":"10.1109/TCSVT.2025.3600881","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3600881","url":null,"abstract":"Online Multi-Object Tracking (MOT) plays a pivotal role in autonomous systems. The state-of-the-art approaches usually employ a tracking-by-detection method, and data association plays a critical role. This paper proposes a learning and graph-optimized (LEGO) modular tracker to improve data association performance in the existing literature. The proposed LEGO tracker integrates graph optimization, which efficiently formulates the association score map, facilitating the accurate and efficient matching of objects across time frames. To further enhance the state update process, the Kalman filter is added to ensure consistent tracking by incorporating temporal coherence in the object states to further enhance the state update process. Our proposed method, utilising LiDAR alone, has shown exceptional performance compared to other online tracking approaches, including LiDAR-based and LiDAR-camera fusion-based methods. LEGO ranked <inline-formula> <tex-math>$3^{rd}$ </tex-math></inline-formula> among all trackers (both online and offline) and <inline-formula> <tex-math>$2^{nd}$ </tex-math></inline-formula> among all online trackers in the KITTI MOT benchmark for cars, (<uri>https://www.cvlibs.net/datasets/kitti/eval_tracking.php</uri>) at the time of submitting results to KITTI object tracking evaluation ranking board. Moreover, our method also achieves competitive performance on the Waymo open dataset benchmark.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 2","pages":"2419-2432"},"PeriodicalIF":11.1,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146154399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Mining Temporal Priors for Template-Generated Video Compression
IF 11.1 | CAS Tier 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-18 | DOI: 10.1109/TCSVT.2025.3599239
Feng Xing;Yingwen Zhang;Meng Wang;Hengyu Man;Yongbing Zhang;Shiqi Wang;Xiaopeng Fan;Wen Gao
The popularity of template-generated videos has recently experienced a significant increase on social media platforms. In general, videos from the same template share similar temporal characteristics, which are unfortunately ignored in current compression schemes. In view of this, we aim to examine how such temporal priors from templates can be effectively utilized during the compression process for template-generated videos. First, a comprehensive statistical analysis is conducted, revealing that the coding decisions, including the merge, non-affine, and motion information, across template-generated videos are strongly correlated. Subsequently, leveraging such correlations as prior knowledge, a simple yet effective prior-driven compression scheme for template-generated videos is proposed. In particular, a mode decision pruning algorithm is devised to dynamically skip unnecessary advanced motion vector prediction (AMVP) or affine AMVP decisions. Moreover, an improved AMVP motion estimation algorithm is applied to further accelerate reference frame selection and the motion estimation process. Experimental results on the versatile video coding (VVC) platform VTM-23.0 demonstrate that the proposed scheme achieves moderate encoding time reductions of 14.31% and 14.99% under the Low-Delay P (LDP) and Low-Delay B (LDB) configurations, respectively, while incurring only negligible Bjøntegaard Delta Rate (BD-Rate) increases of 0.15% and 0.18%, respectively.
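The mode decision pruning idea can be illustrated with a toy Python lookup: coding decisions recorded while encoding the template video serve as the temporal prior for the co-located block, and AMVP / affine-AMVP evaluation is skipped when that prior says merge was sufficient; the block keying, mode names, and skip rule are simplified assumptions and not the algorithm integrated into VTM-23.0:

```python
def prune_modes(block_key, template_decisions, all_modes=("MERGE", "AMVP", "AFFINE_AMVP")):
    """Return the inter modes worth evaluating for a block, given the template-video prior.

    template_decisions maps a block key (poc, ctu_x, ctu_y) to the mode chosen when the
    template video was encoded; it acts as the temporal prior for the current video.
    """
    prior = template_decisions.get(block_key)
    if prior == "MERGE":
        return ("MERGE",)        # prior says motion is well predicted here: skip (affine) AMVP
    return all_modes             # no reliable prior: fall back to the full mode search

# Toy usage: the template encoder chose MERGE at block (5, 0, 0) and AMVP at (5, 1, 0).
template_decisions = {(5, 0, 0): "MERGE", (5, 1, 0): "AMVP"}
print(prune_modes((5, 0, 0), template_decisions))   # ('MERGE',)
print(prune_modes((5, 1, 0), template_decisions))   # full search
print(prune_modes((5, 2, 0), template_decisions))   # full search (no prior recorded)
```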
{"title":"Mining Temporal Priors for Template-Generated Video Compression","authors":"Feng Xing;Yingwen Zhang;Meng Wang;Hengyu Man;Yongbing Zhang;Shiqi Wang;Xiaopeng Fan;Wen Gao","doi":"10.1109/TCSVT.2025.3599239","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3599239","url":null,"abstract":"The popularity of template-generated videos has recently experienced a significant increase on social media platforms. In general, videos from the same template share similar temporal characteristics, which are unfortunately ignored in the current compression schemes. In view of this, we aim to examine how such temporal priors from templates can be effectively utilized during the compression process for template-generated videos. First, a comprehensive statistical analysis is conducted, revealing that the coding decisions, including the merge, non-affine, and motion information, across template-generated videos are strongly correlated. Subsequently, leveraging such correlations as prior knowledge, a simple yet effective prior-driven compression scheme for template-generated videos is proposed. In particular, a mode decision pruning algorithm is devised to dynamically skip unnecessarily advanced motion vector prediction (AMVP) or affine AMVP decisions. Moreover, an improved AMVP motion estimation algorithm is applied to further accelerate reference frame selection and the motion estimation process. Experimental results on the versatile video coding (VVC) platform VTM-23.0 demonstrate that the proposed scheme achieves moderate time reductions of 14.31% and 14.99% under the Low-Delay P (LDP) and Low-Delay B (LDB) configurations, respectively, while maintaining negligible increases in Bjøntegaard Delta Rate (BD-Rate) of 0.15% and 0.18%, respectively.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 1","pages":"1160-1172"},"PeriodicalIF":11.1,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146049298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0