3D-Aided Pedestrian Representation Learning for Video-Based Person Re-Identification
Guquan Jing; Peng Gao; Yujian Lee; Yiyang Hu; Hui Zhang
Pub Date: 2025-07-07  DOI: 10.1109/TCSVT.2025.3586808
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 12, pp. 12830-12845
Video-based person re-identification (Re-ID) aims to match a target pedestrian across video sequences. Recent methods perform frame-level feature extraction followed by temporal aggregation to obtain video representations. However, they pay insufficient attention to the quality of the frame-level features, which suffer from multi-frame misalignment, partial occlusion, and appearance confusion. Since people live in 3D space, 3D pedestrian representations can provide rich geometric information and shape cues that offer promising solutions to these challenges in video-based Re-ID. To mitigate these issues, this paper proposes a 3D-Aided Pedestrian Representation Learning (3DAPRL) network, which introduces the 3D modality into video-based Re-ID. Specifically, two novel modules are designed, i.e., the Cross-Modal Fusion (CMF) module and the Shape-aware Spatial-Temporal Interaction (SSTI) module, to enhance pedestrian representation learning. The CMF module generates discriminative fused representations by exploiting 3D pedestrian data, while the SSTI module learns spatial-temporal 3D shape representations that are distinctive for locating the target pedestrian in video scenarios. Features from both the CMF and SSTI modules contribute to the final video representation. Extensive experiments on four challenging video-based Re-ID datasets demonstrate that the 3DAPRL network outperforms state-of-the-art methods.
IEEE Transactions on Circuits and Systems for Video Technology Publication Information
Pub Date: 2025-07-04  DOI: 10.1109/TCSVT.2025.3580975
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 7, p. C2
Call for Special Issues Proposals
Pub Date: 2025-07-04  DOI: 10.1109/TCSVT.2025.3580998
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 7, p. 7322
Corrections to "DCNet: Large-Scale Point Cloud Semantic Segmentation With Discriminative and Efficient Feature Aggregation"
Fukun Yin; Zilong Huang; Tao Chen; Guozhong Luo; Gang Yu; Bin Fu
Pub Date: 2025-07-04  DOI: 10.1109/TCSVT.2025.3575262
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 7, p. 7321
Presents corrections to the paper "DCNet: Large-Scale Point Cloud Semantic Segmentation With Discriminative and Efficient Feature Aggregation".
CompCraft: Foreground-Driven Image Synthesis With Customized Layouts
Honglin Guo; Ruidong Chen; Weizhi Nie; Lanjun Wang; Anan Liu
Pub Date: 2025-07-04  DOI: 10.1109/TCSVT.2025.3585898
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 12, pp. 12747-12759
Recently, advances in text-to-image synthesis and image customization have drawn significant attention. Among these technologies, foreground-driven image synthesis models aim to create diverse scenes for specific foregrounds, showing broad application prospects. However, existing foreground-driven diffusion models struggle to generate scenes whose layouts align with user intentions. To address this challenge, we propose CompCraft, a training-free framework that enhances layout control and improves overall generation quality in current models. First, CompCraft identifies that the failure of existing methods to achieve effective control arises from the excessive influence of fully denoised foreground information on the generated scene. To address this, we propose a foreground regularization strategy that modifies the foreground-related attention maps, reducing their impact and ensuring better integration of the foreground with the generated scene. Then, we propose a series of inference-time layout guidance strategies that steer the image generation process with the user's finely customized layouts. These strategies equip current foreground-driven diffusion models with accurate layout control. Finally, we introduce a comprehensive benchmark to evaluate CompCraft. Both quantitative and qualitative results demonstrate that CompCraft can effectively generate high-quality images with precise customized layouts, showcasing its strong capabilities in practical image synthesis applications.
IEEE Circuits and Systems Society Information
Pub Date: 2025-07-04  DOI: 10.1109/TCSVT.2025.3580997
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 7, p. C3
MMDStegNet: An Adversarial Steganography Framework With Maximum Mean Discrepancy Regularization
Ziwen He; Xingjie Dai; Xiang Zhang; Zhangjie Fu
Pub Date: 2025-07-03  DOI: 10.1109/TCSVT.2025.3585589
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 12, pp. 12918-12924
Recent advances in steganography leverage generative adversarial networks (GANs) as a robust framework for securing covert communication through adversarial training between stego-generators and steganalytic discriminators. This paradigm facilitates the synthesis of secure steganographic images by harnessing the competition between network components. However, existing GAN-based approaches suffer from asymmetric capacity between generators and discriminators: suboptimally trained discriminators provide inadequate gradient guidance for generator optimization, causing premature convergence and security degradation. To overcome this limitation, we propose an enhanced multi-steganalyzer adversarial architecture incorporating maximum mean discrepancy (MMD) regularization. Our framework introduces two key innovations: 1) an MMD-based regularization mechanism that mitigates distributional discrepancies among multiple steganalyzers through kernel embedding optimization, and 2) a reward function that fuses gradients from multiple steganalyzers to strengthen reinforcement-learning-based adversarial training. This dual strategy enables the discriminator to learn generalized forensic features while maintaining equilibrium in the adversarial training dynamics, ultimately allowing the generator to produce stego images that resist multiple steganalyzers simultaneously. Comprehensive experiments validate the method's superiority: evaluated against five steganalysis networks (YedNet, CovNet, LWENet, SRNet, and SwT-SN) at 0.1-0.4 bpp payloads, the proposed framework improves average detection error rates over state-of-the-art techniques such as SPAR-RL and GMAN. Ablation studies further confirm that MMD regularization contributes significantly to the security enhancement.
Multi-Task Learning Model for V-PCC Geometry Compression Artifact Removal
Jian Xiong; Junhao Wu; Wang Luo; Jiu-Cheng Xie; Hui Yuan; Hao Gao
Pub Date: 2025-07-03  DOI: 10.1109/TCSVT.2025.3585554
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 12, pp. 12802-12815
In video-based point cloud compression (V-PCC), point clouds are projected into videos using a patch projection method and then compressed with video coding techniques. However, lossy video compression and the down-sampling of occupancy maps (OMs) introduce geometry compression artifacts, namely depth errors and OM errors, respectively. These errors can significantly degrade the reconstruction quality of the point clouds. Existing methods can only eliminate one type of error and therefore offer limited quality improvement. In this paper, to maximize quality improvement, a multi-task learning-based geometry compression artifact removal method is proposed to reduce both types of errors simultaneously. Considering the differences between the two tasks, the proposed method addresses the challenges of shared feature extraction and heterogeneous objective optimization. First, we propose a context-aware multi-task learning (CAML) model that extracts shared, context-aware features satisfying both tasks. Second, an improved optimization scheme is presented to train the proposed model, correcting the gradient imbalance during model updating. Cross-validation experiments show that the proposed method saves an average of over 45% Bjøntegaard Delta bitrate in terms of the D2 metric.
A 0.96 pJ/SOP Heterogeneous Neuromorphic Chip Toward Energy-Efficient Edge Visual Applications
P. J. Zhou; G. C. Qiao; Q. Yu; M. Chen; Y. C. Wang; Y. C. Chen; J. J. Wang; N. Ning; Y. Liu; S. G. Hu
Pub Date: 2025-07-02  DOI: 10.1109/TCSVT.2025.3585355
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 12, pp. 12890-12903
Edge devices require low power consumption and a compact area, which poses challenges for visual signal processing. This work introduces an energy-efficient heterogeneous neuromorphic system-on-chip (SoC) for edge visual computing. The neuromorphic core design incorporates advanced techniques, such as sparse-aware synaptic calculation, partial membrane potential update, non-uniform weight quantization, and partial parallel computing, achieving excellent energy efficiency, computing performance, and area utilization. Twenty neuromorphic cores and twelve multi-mode connected-matrix-based routers form a network-on-chip (NoC) with a fullerene-like topology. Its average communication-node degree exceeds that of traditional topologies by 32% while maintaining a minimum degree variance of 0.93, enabling advanced decentralized on-chip communication. Moreover, the NoC can be scaled up through extended off-chip high-level router nodes. At the top layer of the SoC, a RISC-V CPU and the 20-core neuromorphic processor are tightly coupled to form a heterogeneous architecture. The chip is fabricated within a 3.41 mm² die area in 55 nm CMOS technology, achieving a low power density of 0.52 mW/mm² and a high neuron density of 30.23 K/mm². Its effectiveness is verified across different visual tasks, with a best energy efficiency of 0.96 pJ/SOP. This work is expected to promote the development of neuromorphic computing in edge visual applications.
LAC-PS: A Light Direction Selection Policy Under the Accuracy Constraint for Photometric Stereo
Wenjia Meng; Huimin Han; Xiankai Lu; Yilong Yin; Gang Pan; Qian Zheng
Pub Date: 2025-06-13  DOI: 10.1109/TCSVT.2025.3579572
IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 12, pp. 12622-12635
Photometric stereo (PS) methods recover surface normals from appearance changes under varying light directions, excelling in tasks such as 3D surface reconstruction and defect inspection. However, collecting the illumination images is expensive, and current PS methods cannot obtain a light-direction set that satisfies a pre-defined accuracy constraint, limiting their adaptability to applications with varying accuracy requirements. To address this issue, we propose LAC-PS, a light direction selection policy under an accuracy constraint for photometric stereo, which optimizes the light-direction set to meet a target reconstruction accuracy. In our method, we develop an accuracy assessment network that estimates reconstruction accuracy without ground truth. With this estimated accuracy, we put forward a reinforcement learning-based method whose policy sequentially selects light directions until the chosen set satisfies the desired PS recovery accuracy constraint. Experimental results on real and synthetic datasets demonstrate that our method effectively selects light directions that satisfy the accuracy constraints.