This letter presents a novel latent factorization model for high-dimensional and incomplete (HDI) tensors, namely neural Tucker factorization (NeuTucF), a generic neural network-based latent-factorization-of-tensors model under the Tucker decomposition framework. It first recasts the traditional Tucker framework as a neural network with embeddings for the different tensor modes. A Tucker interaction layer is then built to represent the complex spatiotemporal feature interactions among the tensor modes. Experiments on real-world datasets demonstrate that the proposed NeuTucF model significantly outperforms several state-of-the-art models in the accuracy of estimating missing data in an HDI tensor, owing to its ability to represent an HDI tensor accurately by modeling the complex interactions among the input modes. Interestingly, the results also indicate that the model has a certain level of implicit regularization.
{"title":"Neural Tucker Factorization","authors":"Peng Tang;Xin Luo","doi":"10.1109/JAS.2024.124977","DOIUrl":"https://doi.org/10.1109/JAS.2024.124977","url":null,"abstract":"This letter presents a novel latent factorization model for high dimensional and incomplete (HDI) tensor, namely the neural Tucker factorization (NeuTucF), which is a generic neural network-based latent-factorization-of-tensors model under the Tucker decomposition framework. It first interprets the traditional Tucker framework into a neural network with embeddings for different tensor modes. Afterwards, a Tucker interaction layer is innovatively built to accurately represent the complex spatiotemporal feature interactions among different tensor modes. Experiments on real-world datasets demonstrate that the proposed NeuTucF model significantly outperforms several state-of-the-art models in terms of estimation accuracy to missing data in an HDI tensor, owing to its ability of accurately representing an HDI tensor via modeling the complex interaction among different input modes. Interestingly, the results also indicate that our model has a certain level of implicit regularization.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"12 2","pages":"475-477"},"PeriodicalIF":15.3,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10846955","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
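For orientation, the classical Tucker model that NeuTucF builds on estimates a tensor entry from a small core tensor and one latent factor vector per mode. The sketch below shows only that classical trilinear form for a third-order tensor; it is not the neural Tucker interaction layer of the letter, and the names `tucker_entry`, `core`, `U`, `V`, and `W` are illustrative assumptions.

```python
def tucker_entry(core, U, V, W, i, j, k):
    """Estimate entry (i, j, k) as sum over p, q, r of
    core[p][q][r] * U[i][p] * V[j][q] * W[k][r] (classical Tucker form)."""
    total = 0.0
    for p, u in enumerate(U[i]):
        for q, v in enumerate(V[j]):
            for r, w in enumerate(W[k]):
                total += core[p][q][r] * u * v * w
    return total


# Hypothetical toy factors: mode ranks (2, 2, 1), tensor shape (2, 1, 2).
core = [[[1], [2]], [[3], [4]]]   # core[p][q][r]
U = [[1, 0], [0, 1]]              # mode-1 factor rows, one per index i
V = [[1, 2]]                      # mode-2 factor rows, one per index j
W = [[1], [3]]                    # mode-3 factor rows, one per index k

estimate_a = tucker_entry(core, U, V, W, 0, 0, 1)  # 1*1*1*3 + 2*1*2*3 = 15.0
estimate_b = tucker_entry(core, U, V, W, 1, 0, 0)  # 3*1*1*1 + 4*1*2*1 = 11.0
```

In a latent-factorization setting, the factor rows and the core would be learned from the observed entries only, and the same function would then be evaluated at the missing indices.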
This paper addresses the distributed nonconvex optimization problem in which both the global cost function and the local inequality constraint functions are nonconvex. To tackle this issue, the p-power transformation and penalty function techniques are introduced to reformulate the nonconvex optimization problem. With appropriately chosen control parameters, this ensures that the Hessian matrix of the augmented Lagrangian function becomes locally positive definite. A multi-timescale primal-dual method is then devised based on the Karush-Kuhn-Tucker (KKT) point of the reformulated nonconvex problem to attain convergence. Lyapunov theory guarantees the model's stability over an undirected and connected communication network. Finally, two nonconvex optimization problems are presented to demonstrate the efficacy of the proposed method.
{"title":"Penalty Function-Based Distributed Primal-Dual Algorithm for Nonconvex Optimization Problem","authors":"Xiasheng Shi;Changyin Sun","doi":"10.1109/JAS.2024.124935","DOIUrl":"https://doi.org/10.1109/JAS.2024.124935","url":null,"abstract":"This paper addresses the distributed nonconvex optimization problem, where both the global cost function and local inequality constraint function are nonconvex. To tackle this issue, the p-power transformation and penalty function techniques are introduced to reframe the nonconvex optimization problem. This ensures that the Hessian matrix of the augmented Lagrangian function becomes local positive definite by choosing appropriate control parameters. A multi-timescale primal-dual method is then devised based on the Karush-Kuhn-Tucker (KKT) point of the reformulated nonconvex problem to attain convergence. The Lyapunov theory guarantees the model's stability in the presence of an undirected and connected communication network. Finally, two nonconvex optimization problems are presented to demonstrate the efficacy of the previously developed method.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"12 2","pages":"394-402"},"PeriodicalIF":15.3,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning-based methods have become mainstream for solving residential energy scheduling problems. To improve the learning efficiency of existing methods and increase the utilization of renewable energy, we propose the Dyna action-dependent heuristic dynamic programming (Dyna-ADHDP) method, which incorporates the ideas of learning and planning from the Dyna framework into action-dependent heuristic dynamic programming. This method defines a continuous action space for precise control of an energy storage system and allows online optimization of algorithm performance during the real-time operation of the residential energy model. Meanwhile, a target network is introduced during training to make the training smoother and more efficient. We conducted experimental comparisons with the benchmark method using simulated and real data to verify its applicability and performance. The results confirm the method's performance and generalization capabilities, as well as its effectiveness in increasing renewable energy utilization and extending equipment life.
{"title":"Residential Energy Scheduling With Solar Energy Based on Dyna Adaptive Dynamic Programming","authors":"Kang Xiong;Qinglai Wei;Hongyang Li","doi":"10.1109/JAS.2024.124809","DOIUrl":"https://doi.org/10.1109/JAS.2024.124809","url":null,"abstract":"Learning-based methods have become mainstream for solving residential energy scheduling problems. In order to improve the learning efficiency of existing methods and increase the utilization of renewable energy, we propose the Dyna action-dependent heuristic dynamic programming (Dyna-ADHDP) method, which incorporates the ideas of learning and planning from the Dyna framework in action-dependent heuristic dynamic programming. This method defines a continuous action space for precise control of an energy storage system and allows online optimization of algorithm performance during the real-time operation of the residential energy model. Meanwhile, the target network is introduced during the training process to make the training smoother and more efficient. We conducted experimental comparisons with the benchmark method using simulated and real data to verify its applicability and performance. The results confirm the method's excellent performance and generalization capabilities, as well as its excellence in increasing renewable energy utilization and extending equipment life.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"12 2","pages":"403-413"},"PeriodicalIF":15.3,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This letter proposes a dynamic switching soft slicing strategy for industrial mixed traffic in 5G networks. Considering two types of traffic, periodic delay-sensitive (PDS) traffic and sporadic delay-tolerant (SDT) traffic, we design a dynamic switching strategy based on a traffic-QoS-aware soft slicing (TQASS) scheme and a resource-efficiency-aware soft slicing (REASS) scheme. The proposed strategy ensures the reliability of PDS traffic under delay constraints while dynamically allocating the remaining resources to SDT traffic. Simulation results show that the proposed soft slicing strategy outperforms existing work in meeting the strict QoS requirements of industrial mixed traffic.
{"title":"Soft Resource Slicing for Industrial Mixed Traffic in 5G Networks","authors":"Jingfang Ding;Meng Zheng;Haibin Yu","doi":"10.1109/JAS.2024.124761","DOIUrl":"https://doi.org/10.1109/JAS.2024.124761","url":null,"abstract":"This letter proposes a dynamic switching soft slicing strategy for industrial mixed traffic in 5G networks. Considering two types of traffic, periodic delay-sensitive (PDS) traffic and sporadic delay-tolerant (SDT) traffic, we design a dynamic switching strategy based on a traffic-QoS-aware soft slicing (TQASS) scheme and a resource-efficiency-aware soft slicing (REASS) scheme. The proposed strategy ensures the reliability of PDS traffic under delay constraints, while dynamically allocating remaining resources to SDT traffic. Simulation results show that the proposed soft slicing strategy out-performs existing works in meeting the strict QoS requirements of industrial mixed traffic.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"12 2","pages":"463-465"},"PeriodicalIF":15.3,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10846933","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This letter addresses the fact that, in object detection tasks, small objects with few pixels disappear from feature maps with large receptive fields as the network deepens, which makes the detection of dense small objects challenging. A DI-YOLOv5 object detection algorithm is proposed. Specifically, a dual-wavelet convolution module (DWCM), which contains DWT_Conv and IWT_Conv, is proposed to reduce the loss of feature map information while obtaining feature maps with a large receptive field. DWT_Conv and IWT_Conv can be used as replacements for downsampling and upsampling operations, respectively. Moreover, in the process of information transmission to the deep layers, a CSPCoA module is proposed to further capture location information and information dependencies in different spatial directions. DWCM and CSPCoA are single, generic, plug-and-play units. We propose DI-YOLOv5 with YOLOv5 [1] as the baseline, and extensively evaluate the performance of these two modules on small object detection. Experiments demonstrate that DI-YOLOv5 can effectively improve the accuracy of object detection.
{"title":"DI-YOLOv5: An Improved Dual-Wavelet-Based YOLOv5 for Dense Small Object Detection","authors":"Zi-Xin Li;Yu-Long Wang;Fei Wang","doi":"10.1109/JAS.2024.124368","DOIUrl":"https://doi.org/10.1109/JAS.2024.124368","url":null,"abstract":"This letter focuses on the fact that small objects with few pixels disappear in feature maps with large receptive fields, as the network deepens, in object detection tasks. Therefore, the detection of dense small objects is challenging. A DI-YOLOv5 object detection algorithm is proposed. Specifically, a dual-wavelet convolution module (DWCM), which contains DWT_Conv and IWT_Conv, is proposed to reduce the loss of feature map information while obtaining feature maps with a large receptive field. The DWT_Conv and IWT_Conv can be used as replacements for downsampling and upsampling operations. Moreover, in the process of information transmission to the deep layer, a CSPCoA module is proposed to further capture the location information and information dependencies in different spatial directions. DWCM and CSPCoA are single, generic, plug-and-play units. We propose DI-YOLOv5 with YOLOv5 [1] as the baseline, and extensively evaluate the performance of these two modules on small object detection. Experiments demonstrate that DI-YOLOv5 can effectively improve the accuracy of object detection.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"12 2","pages":"457-459"},"PeriodicalIF":15.3,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10846924","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
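The claim that wavelet transforms can stand in for downsampling and upsampling without discarding information can be illustrated with a single-level 2D Haar transform. The sketch below is not the DWT_Conv/IWT_Conv implementation of the letter: the abstract does not name the wavelet, so the Haar choice is an assumption, no convolution is included, and `haar_dwt2` and `haar_iwt2` are hypothetical helper names.

```python
def haar_dwt2(img):
    """Split an even-sized 2D list into four half-resolution Haar sub-bands."""
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, len(img[0]), 2):
            p, q = img[r][c], img[r][c + 1]          # top row of the 2x2 block
            s, t = img[r + 1][c], img[r + 1][c + 1]  # bottom row of the block
            ll_row.append((p + q + s + t) / 2)  # local average (low-pass)
            lh_row.append((p - q + s - t) / 2)  # horizontal detail
            hl_row.append((p + q - s - t) / 2)  # vertical detail
            hh_row.append((p - q - s + t) / 2)  # diagonal detail
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def haar_iwt2(ll, lh, hl, hh):
    """Invert haar_dwt2: rebuild the full-resolution 2D list from the sub-bands."""
    img = []
    for r in range(len(ll)):
        top, bottom = [], []
        for c in range(len(ll[0])):
            a, h, v, d = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            top.extend([(a + h + v + d) / 2, (a - h + v - d) / 2])
            bottom.extend([(a + h - v - d) / 2, (a - h - v + d) / 2])
        img.append(top)
        img.append(bottom)
    return img


# Hypothetical 4x4 feature map with a single bright pixel (a "small object").
feature_map = [[0, 0, 0, 0],
               [0, 9, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]]
bands = haar_dwt2(feature_map)      # four 2x2 sub-bands
restored = haar_iwt2(*bands)        # equals feature_map element-wise
```

Unlike strided pooling, which keeps one value per 2x2 block, the four sub-bands together retain everything needed to reconstruct the input, so the single bright pixel survives the resolution change.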
Lei Xu;Danya Xu;Xinlei Yi;Chao Deng;Tianyou Chai;Tao Yang
In this paper, we study the decentralized federated learning problem, which involves the collaborative training of a global model among multiple devices while ensuring data privacy. In classical federated learning, the communication channel between the devices poses a potential risk of compromising private information. To reduce the risk of adversary eavesdropping in the communication channel, we propose the TRADE (transmit difference weight) concept. This concept replaces the weight parameters transmitted by a decentralized federated learning algorithm with differential weight parameters, strengthening the protection of private data against eavesdropping. Subsequently, by integrating the TRADE concept with the primal-dual stochastic gradient descent (SGD) algorithm, we propose a decentralized TRADE primal-dual SGD algorithm. We demonstrate that the proposed algorithm has the same convergence properties as the primal-dual SGD algorithm while providing enhanced privacy protection. We validate the algorithm's performance on a fault diagnosis task using the Case Western Reserve University dataset, and on image classification tasks using the CIFAR-10 and CIFAR-100 datasets, showing model accuracy comparable to that of centralized federated learning. Additionally, the experiments confirm the algorithm's privacy protection capability.
{"title":"Decentralized Federated Learning Algorithm Under Adversary Eavesdropping","authors":"Lei Xu;Danya Xu;Xinlei Yi;Chao Deng;Tianyou Chai;Tao Yang","doi":"10.1109/JAS.2024.125079","DOIUrl":"https://doi.org/10.1109/JAS.2024.125079","url":null,"abstract":"In this paper, we study the decentralized federated learning problem, which involves the collaborative training of a global model among multiple devices while ensuring data privacy. In classical federated learning, the communication channel between the devices poses a potential risk of compromising private information. To reduce the risk of adversary eavesdropping in the communication channel, we propose TRADE (transmit difference weight) concept. This concept replaces the decentralized federated learning algorithm's transmitted weight parameters with differential weight parameters, enhancing the privacy data against eavesdropping. Subsequently, by integrating the TRADE concept with the primal-dual stochastic gradient descent (SGD) algorithm, we propose a decentralized TRADE primal-dual SGD algorithm. We demonstrate that our proposed algorithm's convergence properties are the same as those of the primal-dual SGD algorithm while providing enhanced privacy protection. We validate the algorithm's performance on fault diagnosis task using the Case Western Reserve University dataset, and image classification tasks using the CIFAR-10 and CIFAR-100 datasets, revealing model accuracy comparable to centralized federated learning. Additionally, the experiments confirm the algorithm's privacy protection capability.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"12 2","pages":"448-456"},"PeriodicalIF":15.3,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
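The idea of transmitting weight differences rather than the weights themselves can be sketched in a few lines. The abstract only states that differential weight parameters replace the transmitted weights, so everything else here is an assumption made for illustration: a single sender and receiver, a shared initial weight vector known to both ends, and the hypothetical helpers `make_sender` and `make_receiver`. It is not the decentralized TRADE primal-dual SGD algorithm of the paper.

```python
def make_sender(initial):
    """Return a function that emits only the change since the last transmission."""
    previous = list(initial)

    def send(current):
        nonlocal previous
        delta = [c - p for c, p in zip(current, previous)]
        previous = list(current)
        return delta

    return send


def make_receiver(initial):
    """Return a function that rebuilds the sender's weights by accumulating deltas."""
    estimate = list(initial)

    def receive(delta):
        nonlocal estimate
        estimate = [e + d for e, d in zip(estimate, delta)]
        return list(estimate)

    return receive


# Assumption: both ends agree on the initial weights out of band.
shared_init = [1.0, -2.0]
send = make_sender(shared_init)
receive = make_receiver(shared_init)

# Hypothetical weights after two local training steps.
step_1 = [1.5, -1.75]
step_2 = [1.25, -1.5]

message_1 = send(step_1)            # [0.5, 0.25] goes over the channel
message_2 = send(step_2)            # [-0.25, 0.25] goes over the channel
rebuilt_1 = receive(message_1)      # [1.5, -1.75]
rebuilt_2 = receive(message_2)      # [1.25, -1.5]
```

Under these assumptions, an eavesdropper who sees the messages but lacks the initial weights, or who misses any one message, observes only differences and cannot directly read off the current weight vector, while the intended receiver recovers it exactly.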
The goal of infrared and visible image fusion (IVIF) is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene. However, existing methods struggle to handle modal disparities effectively, resulting in visual degradation of the details and prominent targets in the fused images. To address these challenges, we introduce PromptFusion, a prompt-based approach that harmoniously combines multi-modality images under the guidance of semantic prompts. First, to better characterize the features of the different modalities, a contourlet autoencoder is designed to separate and extract their high- and low-frequency components, thereby improving the extraction of fine details and textures. We also introduce a prompt learning mechanism that uses positive and negative prompts, leveraging vision-language models to improve the fusion model's understanding and identification of targets in multi-modality images, leading to improved performance in downstream tasks. Furthermore, we employ bi-level asymptotic convergence optimization. This approach simplifies the intricate non-singleton non-convex bi-level problem into a series of convergent and differentiable single optimization problems that can be solved effectively through gradient descent. Our approach advances the state of the art, delivering superior fusion quality and boosting the performance of related downstream tasks. Project page: https://github.com/hey-it-s-me/PromptFusion.
{"title":"PromptFusion: Harmonized Semantic Prompt Learning for Infrared and Visible Image Fusion","authors":"Jinyuan Liu;Xingyuan Li;Zirui Wang;Zhiying Jiang;Wei Zhong;Wei Fan;Bin Xu","doi":"10.1109/JAS.2024.124878","DOIUrl":"https://doi.org/10.1109/JAS.2024.124878","url":null,"abstract":"The goal of infrared and visible image fusion (IVIF) is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene. However, existing methods struggle to effectively handle modal disparities, resulting in visual degradation of the details and prominent targets of the fused images. To address these challenges, we introduce PromptFusion, a prompt-based approach that harmoniously combines multi-modality images under the guidance of semantic prompts. Firstly, to better characterize the features of different modalities, a contourlet autoencoder is designed to separate and extract the high-/low-frequency components of different modalities, thereby improving the extraction of fine details and textures. We also introduce a prompt learning mechanism using positive and negative prompts, leveraging Vision-Language Models to improve the fusion model's understanding and identification of targets in multi-modality images, leading to improved performance in downstream tasks. Furthermore, we employ bi-level asymptotic convergence optimization. This approach simplifies the intricate non-singleton non-convex bi-level problem into a series of convergent and differentiable single optimization problems that can be effectively resolved through gradient descent. Our approach advances the state-of-the-art, delivering superior fusion quality and boosting the performance of related downstream tasks. Project page: https://github.com/hey-it-s-me/PromptFusion.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"12 3","pages":"502-515"},"PeriodicalIF":15.3,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143535534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Over the past few decades, numerous adaptive Kalman filters (AKFs) have been proposed. However, achieving online estimation with both high estimation accuracy and fast convergence speed is challenging, especially when both the process noise and measurement noise covariance matrices are relatively inaccurate. Maximum likelihood estimation (MLE) has the potential to achieve this goal, since its theoretical accuracy is guaranteed by asymptotic optimality and its convergence is fast owing to its weak dependence on accurate state estimation. Unfortunately, the maximum likelihood cost function is so intricate that existing MLE methods achieve online estimation only by ignoring all historical measurement information, which fails to realize the potential of MLE. To design online MLE-based AKFs with high estimation accuracy and fast convergence speed, an online exploratory MLE approach is proposed, based on which a mini-batch coordinate descent noise covariance matrix estimation framework is developed. In this framework, the maximum likelihood cost function is simplified for online estimation into fewer and simpler terms, which are selected in a mini-batch and calculated with a backtracking method. The cost function is not minimized directly; instead, the problem is solved by adaptively exploring candidate noise covariance matrix estimates while making adequate use of historical measurement information. Furthermore, four specific algorithms are derived under this framework to meet different practical requirements in terms of convergence speed, estimation accuracy, and computational load. Extensive simulations and experiments are carried out to verify the validity and superiority of the proposed algorithms compared with existing state-of-the-art AKFs.
{"title":"An Online Exploratory Maximum Likelihood Estimation Approach to Adaptive Kalman Filtering","authors":"Jiajun Cheng;Haonan Chen;Zhirui Xue;Yulong Huang;Yonggang Zhang","doi":"10.1109/JAS.2024.125001","DOIUrl":"https://doi.org/10.1109/JAS.2024.125001","url":null,"abstract":"Over the past few decades, numerous adaptive Kalman filters (AKFs) have been proposed. However, achieving online estimation with both high estimation accuracy and fast convergence speed is challenging, especially when both the process noise and measurement noise covariance matrices are relatively inaccurate. Maximum likelihood estimation (MLE) possesses the potential to achieve this goal, since its theoretical accuracy is guaranteed by asymptotic optimality and the convergence speed is fast due to weak dependence on accurate state estimation. Unfortunately, the maximum likelihood cost function is so intricate that the existing MLE methods can only simply ignore all historical measurement information to achieve online estimation, which cannot adequately realize the potential of MLE. In order to design online MLE-based AKFs with high estimation accuracy and fast convergence speed, an online exploratory MLE approach is proposed, based on which a mini-batch coordinate descent noise covariance matrix estimation framework is developed. In this framework, the maximum likelihood cost function is simplified for online estimation with fewer and simpler terms which are selected in a mini-batch and calculated with a backtracking method. This maximum likelihood cost function is sidestepped and solved by exploring possible estimated noise covariance matrices adaptively while the historical measurement information is adequately utilized. Furthermore, four specific algorithms are derived under this framework to meet different practical requirements in terms of convergence speed, estimation accuracy, and calculation load. Abundant simulations and experiments are carried out to verify the validity and superiority of the proposed algorithms as compared with existing state-of-the-art AKFs.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"12 1","pages":"228-254"},"PeriodicalIF":15.3,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dear Editor, This letter is concerned with the problem of time-varying formation tracking for heterogeneous multi-agent systems (MASs) under directed switching networks. To this end, we first present sufficient conditions for the exponential stability of a particular category of switched systems. We then apply these theoretical results to design a distributed observer for the reference leader under directed switching topologies. Based on this observer, a novel event-triggered distributed control protocol is proposed for each follower to achieve the desired formation. Finally, we demonstrate the effectiveness of the proposed results through numerical simulations.
{"title":"Time-Varying Formation Tracking Control of Heterogeneous Multi-Agent Systems With Intermittent Communications and Directed Switching Networks","authors":"Yuhan Wang;Zhuping Wang;Hao Zhang;Huaicheng Yan","doi":"10.1109/JAS.2023.123924","DOIUrl":"https://doi.org/10.1109/JAS.2023.123924","url":null,"abstract":"Dear Editor, This letter is concerned with the problem of time-varying formation tracking for heterogeneous multiagent systems (MASs) under directed switching networks. For this purpose, our first step is to present some sufficient conditions for the exponential stability of a particular category of switched systems. Then, we apply the theoretical results to design a distributed observer for reference leader under directed switching topologies. Based on the above designed observer, a novel event-triggered distributed control protocol is proposed for each follower to achieve the desired formation. Finally, we demonstrate the effectiveness of our proposed results through numerical simulations.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"12 1","pages":"294-296"},"PeriodicalIF":15.3,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10815011","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Capturing high-fidelity normals from single face images plays a core role in numerous computer vision and graphics applications. Although significant progress has been made in recent years, how to explore normal priors effectively and efficiently remains challenging. Most existing approaches depend on intricate network architectures and complex calculations for in-the-wild face images. To overcome this issue, we propose a simple yet effective cascaded neural network, called Cas-FNE, which progressively boosts the quality of predicted normals with minimal model parameters and computational cost. Meanwhile, owing to its progressive refinement mechanism, it can mitigate the imbalance between training data and real-world face images, and thus boost the generalization ability of the model. Specifically, in the training phase, our model relies solely on a small amount of labeled data. Each earlier prediction serves as guidance for the subsequent refinement. In addition, our shared-parameter cascaded block employs a recurrent mechanism, allowing it to be applied multiple times for optimization without increasing the number of network parameters. Quantitative and qualitative evaluations on benchmark datasets show that Cas-FNE faithfully preserves facial details and outperforms state-of-the-art methods. The code is available at https://github.com/AutoHDR/CasFNE.git.
{"title":"Cas-FNE: Cascaded Face Normal Estimation","authors":"Meng Wang;Jiawan Zhang;Jiayi Ma;Xiaojie Guo","doi":"10.1109/JAS.2024.124899","DOIUrl":"https://doi.org/10.1109/JAS.2024.124899","url":null,"abstract":"Capturing high-fidelity normals from single face images plays a core role in numerous computer vision and graphics applications. Though significant progress has been made in recent years, how to effectively and efficiently explore normal priors remains challenging. Most existing approaches depend on the development of intricate network architectures and complex calculations for in-the-wild face images. To overcome the above issue, we propose a simple yet effective cascaded neural network, called Cas-Fne, which progressively boosts the quality of predicted normals with marginal model parameters and computational cost. Meanwhile, it can mitigate the imbalance issue between training data and real-world face images due to the progressive refinement mechanism, and thus boost the generalization ability of the model. Specifically, in the training phase, our model relies solely on a small amount of labeled data. The earlier prediction serves as guidance for following refinement. In addition, our shared-parameter cascaded block employs a recurrent mechanism, allowing it to be applied multiple times for optimization without increasing network parameters. Quantitative and qualitative evaluations on benchmark datasets are conducted to show that our Cas-FNE can faithfully maintain facial details and reveal its superiority over state-of-the-art methods. The code is available at https://github.com/AutoHDR/CasFNE.git.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 12","pages":"2423-2434"},"PeriodicalIF":15.3,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142679310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}