
IEEE Transactions on Circuits and Systems for Video Technology: Latest Publications

3D-Aided Pedestrian Representation Learning for Video-Based Person Re-Identification
IF 11.1 | CAS Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-07 | DOI: 10.1109/TCSVT.2025.3586808
Guquan Jing;Peng Gao;Yujian Lee;Yiyang Hu;Hui Zhang
Video-based person re-identification (Re-ID) aims to match the target pedestrian across video sequences. Recent methods perform frame-level feature extraction followed by temporal aggregation to obtain video representations. However, they pay insufficient attention to the quality of frame-level features, which suffer from issues including multi-frame misalignment, partial occlusion, and appearance confusion. Since people live in a 3D space, 3D pedestrian representations can provide rich geometric information and shape cues that offer promising solutions to these challenges in video-based Re-ID. To mitigate these issues, this paper proposes a 3D-Aided Pedestrian Representation Learning (3DAPRL) network, which introduces the 3D modality to video-based Re-ID. Specifically, two novel modules are designed, i.e., the Cross-Modal Fusion (CMF) module and the Shape-aware Spatial-Temporal Interaction (SSTI) module, to enhance pedestrian representation learning. The CMF module generates discriminative fusion representations by utilizing 3D pedestrian data, while the SSTI module learns spatial-temporal 3D shape representations that are distinguishable for locating the target pedestrian in video scenarios. The features generated by the CMF and SSTI modules both contribute to the final video representation. Extensive experiments on four challenging video-based Re-ID datasets demonstrate that our 3DAPRL network achieves better performance than state-of-the-art methods.
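The abstract gives only a high-level description of the CMF and SSTI modules. As a rough illustration of the general pattern it describes (frame-level features fused with a 3D cue, then temporally aggregated), the following minimal numpy sketch uses single-head cross-attention and mean pooling; all names, shapes, and the fusion design are assumptions for illustration, not the paper's actual modules.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_fusion(frame_feats, shape_feats, W_q, W_k, W_v):
    """Fuse 2D frame features (queries) with 3D shape features (keys/values)
    via single-head cross-attention. Shapes: frame_feats (T, D), shape_feats (M, D)."""
    Q = frame_feats @ W_q              # (T, D)
    K = shape_feats @ W_k              # (M, D)
    V = shape_feats @ W_v              # (M, D)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (T, M)
    return frame_feats + attn @ V      # residual fusion, (T, D)

def temporal_aggregate(fused):
    """Average-pool the fused per-frame features into one video-level vector."""
    return fused.mean(axis=0)

# toy example: 8 frames, 16 hypothetical 3D shape tokens, 64-dim features
rng = np.random.default_rng(0)
T, M, D = 8, 16, 64
frames = rng.normal(size=(T, D))
shape = rng.normal(size=(M, D))
W_q, W_k, W_v = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
video_repr = temporal_aggregate(cross_modal_fusion(frames, shape, W_q, W_k, W_v))
print(video_repr.shape)  # (64,)
```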
Citations: 0
IEEE Transactions on Circuits and Systems for Video Technology Publication Information
IF 8.3 | CAS Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-04 | DOI: 10.1109/TCSVT.2025.3580975
{"title":"IEEE Transactions on Circuits and Systems for Video Technology Publication Information","authors":"","doi":"10.1109/TCSVT.2025.3580975","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3580975","url":null,"abstract":"","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 7","pages":"C2-C2"},"PeriodicalIF":8.3,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11071889","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144557928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Call for Special Issues Proposals
IF 8.3 | CAS Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-04 | DOI: 10.1109/TCSVT.2025.3580998
{"title":"Call for Special Issues Proposals","authors":"","doi":"10.1109/TCSVT.2025.3580998","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3580998","url":null,"abstract":"","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 7","pages":"7322-7322"},"PeriodicalIF":8.3,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11071930","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144557757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Corrections to “DCNet: Large-Scale Point Cloud Semantic Segmentation With Discriminative and Efficient Feature Aggregation”
IF 8.3 | CAS Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-04 | DOI: 10.1109/TCSVT.2025.3575262
Fukun Yin;Zilong Huang;Tao Chen;Guozhong Luo;Gang Yu;Bin Fu
Presents corrections to the paper, “DCNet: Large-Scale Point Cloud Semantic Segmentation With Discriminative and Efficient Feature Aggregation”.
Citations: 0
CompCraft: Foreground-Driven Image Synthesis With Customized Layouts
IF 11.1 | CAS Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-04 | DOI: 10.1109/TCSVT.2025.3585898
Honglin Guo;Ruidong Chen;Weizhi Nie;Lanjun Wang;Anan Liu
Recently, advancements in text-to-image synthesis and image customization have drawn significant attention. Among these technologies, foreground-driven image synthesis models aim to create diverse scenes for specific foregrounds, showing broad application prospects. However, existing foreground-driven diffusion models struggle to accurately generate scenes with layouts that align with user intentions. To address these challenges, we propose CompCraft, a training-free framework that enhances layout control and improves overall generation quality in current models. First, CompCraft identifies that the failure of existing methods to achieve effective control arises from the excessive influence of fully denoised foreground information on the generated scene. To address this, we propose a foreground regularization strategy that modifies the foreground-related attention maps, reducing their impact and ensuring better integration of the foreground with the generated scene. Then, we propose a series of inference-time layout guidance strategies to guide the image generation process with the user’s finely customized layouts. These strategies equip current foreground-driven diffusion models with accurate layout control. Finally, we introduce a comprehensive benchmark to evaluate CompCraft. Both quantitative and qualitative results demonstrate that CompCraft can effectively generate high-quality images with precise customized layouts, showcasing its strong capabilities in practical image synthesis applications.
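The exact form of CompCraft's foreground regularization is not given in the abstract. As a hedged sketch of the general idea it names (damping the influence of foreground-related attention), the following numpy snippet scales down the attention mass assigned to foreground tokens and renormalizes each row; the function name, the damping factor, and the masking scheme are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def regularize_foreground_attention(attn, fg_mask, damping=0.5):
    """Down-weight the attention that foreground tokens receive, then renormalize.
    attn: (Q, K) attention map over K spatial tokens; fg_mask: (K,) booleans
    marking tokens that belong to the pasted foreground."""
    scaled = attn.copy()
    scaled[:, fg_mask] *= damping                  # reduce the foreground's influence
    scaled /= scaled.sum(axis=-1, keepdims=True)   # keep each row a distribution
    return scaled

# toy example: 4 query tokens, 6 key tokens, the last two keys are "foreground"
rng = np.random.default_rng(1)
attn = rng.random((4, 6))
attn /= attn.sum(axis=-1, keepdims=True)
fg_mask = np.array([False, False, False, False, True, True])
print(regularize_foreground_attention(attn, fg_mask).sum(axis=-1))  # each row still sums to ~1.0
```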
Citations: 0
IEEE Circuits and Systems Society Information
IF 8.3 | CAS Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-04 | DOI: 10.1109/TCSVT.2025.3580997
{"title":"IEEE Circuits and Systems Society Information","authors":"","doi":"10.1109/TCSVT.2025.3580997","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3580997","url":null,"abstract":"","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 7","pages":"C3-C3"},"PeriodicalIF":8.3,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11071888","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144557760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MMDStegNet: An Adversarial Steganography Framework With Maximum Mean Discrepancy Regularization
IF 11.1 | CAS Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-03 | DOI: 10.1109/TCSVT.2025.3585589
Ziwen He;Xingjie Dai;Xiang Zhang;Zhangjie Fu
Recent advances in steganography leverage generative adversarial networks (GANs) as a robust framework for securing covert communications through adversarial training between stego-generators and steganalytic discriminators. This paradigm facilitates the synthesis of secure steganographic images by harnessing the competition between network components. However, existing GAN-based approaches suffer from asymmetric capacity between generators and discriminators: suboptimally trained discriminators provide inadequate gradient guidance for generator optimization, causing premature convergence and security degradation. To overcome this critical limitation, we propose an enhanced multi-steganalyzer adversarial architecture incorporating maximum mean discrepancy (MMD) regularization. Our framework introduces two key innovations: 1) an MMD-based regularization mechanism mitigating distributional discrepancies among multiple steganalyzers through kernel embedding optimization, and 2) a reward function that fuses gradients derived from multiple steganalyzers to boost reinforcement learning-based adversarial training. This dual strategy enables the discriminator to learn generalized forensic features while maintaining equilibrium in adversarial training dynamics, ultimately allowing the generator to produce stego images resistant to multiple steganalyzers simultaneously. Comprehensive experiments validate our method’s superiority: when evaluated across five steganalysis networks, including YedNet, CovNet, LWENet, SRNet, and SwT-SN, at 0.1-0.4 bpp payloads, the proposed framework achieves improvements in average detection error rates over state-of-the-art techniques such as SPAR-RL and GMAN. Ablation studies further confirm that MMD regularization contributes significantly to security enhancement.
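The abstract does not specify how the MMD regularizer is computed; the snippet below is only the standard biased estimate of squared MMD with a Gaussian kernel between two feature batches, which is one plausible building block for such a regularizer. The feature shapes, kernel bandwidth, and the idea of comparing per-steganalyzer features are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel matrix between the rows of x (n, d) and y (m, d)."""
    sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy between two samples."""
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy

# toy example: features produced by two hypothetical steganalyzers for one batch
rng = np.random.default_rng(2)
feats_a = rng.normal(0.0, 1.0, size=(32, 16))
feats_b = rng.normal(0.3, 1.0, size=(32, 16))
print(mmd2(feats_a, feats_b))  # grows as the two feature distributions diverge
```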
Citations: 0
Multi-Task Learning Model for V-PCC Geometry Compression Artifact Removal
IF 11.1 | CAS Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-03 | DOI: 10.1109/TCSVT.2025.3585554
Jian Xiong;Junhao Wu;Wang Luo;Jiu-Cheng Xie;Hui Yuan;Hao Gao
In video-based point cloud compression (V-PCC), point clouds are projected into videos using a patch projection method and then compressed using video coding techniques. However, the lossy video compression and the down-sampling of occupancy maps (OMs) can lead to geometry compression artifacts, i.e., depth errors and OM errors, respectively. These errors can significantly affect the reconstruction quality of the point clouds. Existing methods can only eliminate one type of error and therefore offer limited quality improvement. In this paper, to maximize quality improvement, a multi-task learning-based geometry compression artifact removal method is proposed to reduce both types of errors simultaneously. Considering the differences between the two tasks, the proposed method deals with the challenges of shared feature extraction and heterogeneous objective optimization. First, we propose a context-aware multi-task learning (CAML) model, which extracts shared features that are context-aware and satisfy both tasks. Second, an improved optimization scheme is presented to train the proposed model; it corrects the gradient imbalance during model updating. Cross-validation experiments show that the proposed method saves an average of over 45% Bjøntegaard Delta bitrate in terms of the D2 metric.
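The paper's improved optimization scheme is not described in the abstract beyond correcting gradient imbalance. The sketch below shows one generic way such imbalance can be handled, by rescaling each task's gradient on the shared parameters to a common norm before combining them; this is an assumed illustration, not the scheme used in the paper.

```python
import numpy as np

def balance_and_combine(grads):
    """Rescale each task's gradient on the shared parameters to the mean norm
    before summing, so no single task dominates the update."""
    norms = np.array([np.linalg.norm(g) for g in grads])
    target = norms.mean()
    return sum(g * (target / (n + 1e-12)) for g, n in zip(grads, norms))

# toy example: a depth-error task gradient vs. an occupancy-map task gradient
rng = np.random.default_rng(3)
g_depth = rng.normal(size=128) * 5.0   # much larger raw magnitude
g_om = rng.normal(size=128) * 0.1
update = balance_and_combine([g_depth, g_om])
print(np.linalg.norm(g_depth), np.linalg.norm(g_om), np.linalg.norm(update))
```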
Citations: 0
A 0.96 pJ/SOP Heterogeneous Neuromorphic Chip Toward Energy-Efficient Edge Visual Applications
IF 11.1 | CAS Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-02 | DOI: 10.1109/TCSVT.2025.3585355
P. J. Zhou;G. C. Qiao;Q. Yu;M. Chen;Y. C. Wang;Y. C. Chen;J. J. Wang;N. Ning;Y. Liu;S. G. Hu
Edge devices require low power consumption and compact area, which poses challenges for visual signal processing. This work introduces an energy-efficient heterogeneous neuromorphic system-on-chip (SoC) for edge visual computing. The neuromorphic core design incorporates advanced technologies, such as sparse-aware synaptic calculation, partial membrane potential update, non-uniform weight quantization, and partial parallel computing, achieving excellent energy efficiency, computing performance, and area utilization. Twenty neuromorphic cores and twelve multi-mode connected-matrix-based routers form a network-on-chip (NoC) with a fullerene-like topology. The average degree of its communication nodes exceeds that of traditional topologies by 32%, and it maintains a minimum degree variance of 0.93, thereby enabling advanced decentralized on-chip communication. Moreover, the NoC can be scaled up through extended off-chip high-level router nodes. At the top layer of the SoC, a RISC-V CPU and a 20-core neuromorphic processor are tightly coupled to form a heterogeneous architecture. The chip is fabricated within a 3.41 mm² die area in a 55 nm CMOS technology, achieving a low power density of 0.52 mW/mm² and a high neuron density of 30.23 K/mm². Its effectiveness is verified across different visual tasks, with a best energy efficiency of 0.96 pJ/SOP. This work is expected to promote the development of neuromorphic computing in edge visual applications.
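Two of the reported figures can be combined with simple arithmetic: power density times die area gives the implied total power, and the pJ/SOP figure gives energy per workload once a synaptic-operation count is assumed. The workload size below (5 million SOPs per inference) is purely an assumption for illustration.

```python
# Back-of-the-envelope figures from the abstract (the workload size is an assumed example).
die_area_mm2 = 3.41          # reported die area
power_density_mw_mm2 = 0.52  # reported power density
energy_per_sop_pj = 0.96     # reported energy per synaptic operation

total_power_mw = power_density_mw_mm2 * die_area_mm2
print(f"implied total power: {total_power_mw:.2f} mW")              # ~1.77 mW

sops_per_inference = 5e6     # assumed: 5 million synaptic ops for one edge-vision inference
energy_uj = sops_per_inference * energy_per_sop_pj * 1e-12 * 1e6    # pJ -> J -> uJ
print(f"energy per inference at 0.96 pJ/SOP: {energy_uj:.2f} uJ")   # ~4.80 uJ
```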
Citations: 0
LAC-PS: A Light Direction Selection Policy Under the Accuracy Constraint for Photometric Stereo
IF 11.1 | CAS Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-06-13 | DOI: 10.1109/TCSVT.2025.3579572
Wenjia Meng;Huimin Han;Xiankai Lu;Yilong Yin;Gang Pan;Qian Zheng
Photometric stereo (PS) methods recover surface normals from appearance changes under varying light directions, excelling in tasks like 3D surface reconstruction and defect inspection. However, collecting the illumination images is expensive, and current PS methods cannot obtain a light direction set that satisfies a pre-defined accuracy constraint, limiting their adaptability to applications with varying accuracy requirements. To address this issue, we propose LAC-PS, a light direction selection policy under an accuracy constraint for photometric stereo, which optimizes the light direction set to meet a target reconstruction accuracy. In our method, we develop an accuracy assessment network that estimates reconstruction accuracy without ground truth. With this estimated accuracy, we put forward a reinforcement learning-based method whose policy sequentially selects light directions until the desired PS recovery accuracy constraint is satisfied. Experimental results on real and synthetic datasets demonstrate that our method effectively selects light directions that satisfy accuracy constraints.
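The abstract describes a policy that sequentially adds light directions until an estimated accuracy constraint is met. The sketch below shows a greedy stand-in for that loop, with a toy spread-based score in place of the paper's learned accuracy assessment network and reinforcement-learned policy; every function, threshold, and candidate set here is an illustrative assumption.

```python
import numpy as np

def select_light_directions(candidates, estimate_accuracy, target_accuracy, max_lights=20):
    """Greedily add the candidate light direction that most improves the (estimated)
    reconstruction accuracy, stopping once the target accuracy is reached.
    `estimate_accuracy` stands in for a learned assessment network."""
    chosen = []
    while len(chosen) < max_lights:
        best_dir, best_acc = None, -np.inf
        for d in candidates:
            if any(np.allclose(d, c) for c in chosen):
                continue
            acc = estimate_accuracy(chosen + [d])
            if acc > best_acc:
                best_dir, best_acc = d, acc
        chosen.append(best_dir)
        if best_acc >= target_accuracy:
            break
    return chosen

# toy example: the accuracy proxy rewards spreading directions over the upper hemisphere
rng = np.random.default_rng(4)
cands = rng.normal(size=(50, 3))
cands /= np.linalg.norm(cands, axis=1, keepdims=True)
cands[:, 2] = np.abs(cands[:, 2])  # keep directions in the upper hemisphere

def toy_accuracy(dirs):
    if len(dirs) < 3:
        return 0.1 * len(dirs)
    return float(np.linalg.svd(np.stack(dirs), compute_uv=False).min())

picked = select_light_directions(cands, toy_accuracy, target_accuracy=1.5)
print(len(picked))
```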
Citations: 0