A Survey on Learning Motion Planning and Control for Mobile Robots: Toward Embodied Intelligence
Mengyun Wang, Yifeng Niu, Bo Wang, Wei Zhang, Chang Wang
Pub Date : 2026-02-09 DOI: 10.1109/tnnls.2026.3656889
AquaticCLIP: A Vision-Language Foundation Model and Dataset for Underwater Scene Analysis
Basit Alawode, Iyyakutti Iyappan Ganapathi, Sajid Javed, Mohammed Bennamoun, Arif Mahmood
Pub Date : 2026-02-09 DOI: 10.1109/tnnls.2026.3657138
LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning
Yiming Shi, Yujia Wu, Jiwei Wei, Ran Ran, Chengwei Sun, Shiyuan He, Yang Yang
Pub Date : 2026-02-06 DOI: 10.1109/TNNLS.2026.3655172
The rapid growth of model scale has necessitated substantial computational resources for fine-tuning. Existing approaches such as low-rank adaptation (LoRA) seek to address the large number of updated parameters in full fine-tuning (FT). However, LoRA relies on random initialization and optimization of low-rank matrices to approximate the updated weights, which can result in suboptimal convergence and an accuracy gap compared to FT. To address these issues, we propose low-rank LDU (LoLDU), a parameter-efficient fine-tuning (PEFT) approach that reduces the number of trainable parameters by a factor of 2600 compared to regular PEFT methods while maintaining comparable performance. LoLDU leverages lower-diag-upper (LDU) decomposition to initialize the low-rank matrices for faster convergence and nonsingularity, and optimizes only the diagonal matrix for scaling transformations. To the best of our knowledge, LoLDU has the fewest parameters among all PEFT approaches. We conducted extensive experiments across four instruction-following datasets, six natural language understanding (NLU) datasets, eight image classification datasets, and image generation datasets with multiple model types [LLaMA2, RoBERTa, ViT, and stable diffusion (SD)], providing a comprehensive and detailed analysis. Our open-source code can be accessed at https://anonymous.4open.science/r/LoLDU-B5A6.
Using Class and Domain Information to Address Domain Shift in Federated Learning
Chien-Yu Chiou, Chun-Rong Huang, Lawrence L. Latour, Yang C. Fann, Pau-Choo Chung
Pub Date : 2026-02-06 DOI: 10.1109/TNNLS.2026.3658584
In federated learning (FL), heterogeneous and client-specific data distributions cause a domain-shift problem, which leads to divergent local models and degraded global performance. To address this problem, this study proposes a class- and domain-aware FL framework that decouples and collaboratively learns the domain-invariant and domain-specific representations. During client training, a novel cross-gated feature separation (CGFS) module is employed to separate the domain features from the class features. A heterogeneous prototype contrastive learning (HPCL) module is then used to guide the learning of the class features and domain features with good discriminability within each feature space. Finally, during server aggregation, a gradient-reweighted hierarchical aggregation (GHA) strategy is applied to effectively aggregate information from all the clients and build a global model with good robustness to domain variation. The experimental results obtained on two FL datasets with domain shift show that the proposed method consistently outperforms state-of-the-art approaches.
Enhancing PPO With Trajectory-Aware Hybrid Policies
Qisai Liu, Zhanhong Jiang, Hsin-Jung Yang, Mahsa Khosravi, Joshua R. Waite, Soumik Sarkar
Pub Date : 2026-02-06 DOI: 10.1109/TNNLS.2025.3641531
Proximal policy optimization (PPO) is one of the most popular state-of-the-art on-policy algorithms and has become a standard baseline in modern reinforcement learning, with applications in numerous fields. Although it delivers stable performance with theoretical policy improvement guarantees, high variance and high sample complexity remain critical challenges in on-policy algorithms. To alleviate these issues, we propose hybrid-policy PPO (HP3O), which utilizes a trajectory replay buffer to make efficient use of trajectories generated by recent policies. In particular, the buffer applies a "first in, first out" (FIFO) strategy to keep only the most recent trajectories and thereby attenuate data distribution drift. A batch consisting of the trajectory with the best return plus others randomly sampled from the buffer is used to update the policy networks. This strategy helps the agent improve on top of its most recent best performance and, in turn, empirically reduces variance. We also establish policy improvement guarantees for the proposed algorithm. HP3O is validated and compared against several baseline algorithms on multiple continuous control environments. Our code is available at https://anonymous.4open.science/r/HP30-EB61/HP3O_train.py.
KSIQA: A Knowledge-Sharing Model for No-Reference Image Quality Assessment
Huasheng Wang, Jiang Liu, Hongchen Tan, Jianxun Lou, Xiaochang Liu, Wei Zhou, Ying Chen, Roger Whitaker, Walter Colombo, Hantao Liu
Pub Date : 2026-02-06 DOI: 10.1109/TNNLS.2026.3656757
No-reference image quality assessment (NR-IQA) aims to quantitatively measure human perception of visual quality without comparing a distorted image to a reference. Despite recent advances, existing NR-IQA approaches often lack the ability to capture perceptual cues in the absence of a reference, limiting their generalizability across diverse and complex real-world image degradations. These limitations hinder their ability to match the reliability of full-reference IQA (FR-IQA) counterparts. A key challenge, therefore, is to enable NR-IQA models to emulate the reference-aware reasoning exhibited by humans and FR-IQA methods. To address this challenge, we propose a novel NR-IQA model based on a knowledge-sharing (KS) strategy that simulates this capability and predicts image quality more effectively. Specifically, we designate an FR-IQA model as the teacher and an NR-IQA model as the student. Unlike conventional knowledge distillation (KD), our architecture lets the NR-IQA student and FR-IQA teacher share a decoder rather than operate as independent models. Furthermore, the student model contains a mental imagery generation (MIG) module that learns mental imagery to serve as the reference. To fully exploit local and global information, we adopt a vision transformer (ViT) branch and a convolutional neural network branch for feature extraction (FE). Finally, a quality-aware regressor (QAR) combined with deep ordinal regression infers the quality score. Experiments show that our proposed NR-IQA model, KSIQA, achieves class-leading performance against current no-reference (NR) techniques across widespread benchmark datasets.
MetaGrasp: Generalizable Dexterous Multifingered Functional Grasping With Gradual Skill Curriculum Learning
Yinglan Lv, Qiyu Chen, Xiangbo Lin, Jianwen Li, Wenbin Bai, Yi Sun
Pub Date : 2026-02-06 DOI: 10.1109/TNNLS.2026.3655749
Dexterous grasping and manipulation with multifingered robotic hands present a significant challenge due to the hands' high degrees of freedom and the need for task-specific adaptations. Existing methods usually adopt a single-task learning framework or focus on simple stable wrap grasping, limiting their efficiency and generalization ability when encountering new tasks or precise functional grasping poses. In this article, we introduce MetaGrasp, a novel approach that formulates dexterous functional grasping as a multitask reinforcement learning (RL) problem based on hand grasp pose classification. Our method features a gradual skill curriculum learning (GSCL) framework, which structures the learning process into beginner, intermediate, and advanced stages according to the level of difficulty, as sketched below. MetaGrasp leverages this hierarchical learning structure to develop a versatile, adaptive grasping policy that grasps objects based on hand grasp pose and object point cloud inputs. Taking five hand grasp types as research cases, the policy trained with MetaGrasp can be easily adapted to grasp different object instances from different object categories according to functional grasp intentions specified by a single expert demonstration, without requiring extensive system interaction. We categorize the dexterous functional grasping tasks of a five-fingered robotic hand into multiple tasks based on hand poses for RL and combine meta imitation learning (IL) with curriculum learning. The experimental results show that MetaGrasp has better one-shot generalization ability on new grasp tasks and outperforms state-of-the-art single-task dexterous grasping methods.
Transformer Meets Gated Residual Networks to Enhance PICU's PPG Artifact Detection Informed by Mutual Information Neural Estimation
Thanh-Dung Le, Clara Macabiau, Kevin Albert, Symeon Chatzinotas, Philippe Jouvet, Rita Noumeir
Pub Date : 2026-02-06 DOI: 10.1109/TNNLS.2026.3656756
This study examines the effectiveness of various learning methods in improving Transformer models, focusing on the gated residual network (GRN) Transformer in the context of pediatric intensive care units (PICUs), where data availability is limited. Our findings indicate that Transformers trained via supervised learning are less effective than MLP, CNN, and LSTM networks in such environments. Yet, leveraging unsupervised and self-supervised learning (SSL) on unannotated data, with subsequent fine-tuning on annotated data, notably enhances Transformer performance, although not to the level of the GRN-Transformer. Central to our research is the analysis of different activation functions for the gated linear unit (GLU), a crucial element of the GRN structure. We also employ mutual information neural estimation (MINE) to evaluate the GRN's contribution. Additionally, the study examines the effects of integrating the GRN within the Transformer's attention mechanism versus using it as a separate intermediary layer. Our results highlight that the GLU with sigmoid activation stands out, achieving 0.98 accuracy, 0.91 precision, 0.96 recall, and a 0.94 F1-score. The MINE analysis supports the hypothesis that the GRN enhances the mutual information (MI) between the hidden representations and the output. Moreover, using the GRN as an intermediate filter layer proves more beneficial than incorporating it within the attention mechanism. This study clarifies how the GRN boosts the GRN-Transformer's performance beyond other techniques. These findings offer a promising avenue for adopting sophisticated models like Transformers in data-constrained environments, such as photoplethysmography (PPG) artifact detection in PICU settings.
Next-Gen Digital Predistortion From Hardware Acceleration of Neural Networks: Trends, Challenges, and Future
Mohd Tasleem Khan, Yuan Ding, George Goussetis
Pub Date : 2026-02-06 DOI: 10.1109/TNNLS.2026.3656642
The computational demands of next-generation (Next-Gen) communication systems pose major challenges for real-time signal processing, particularly in digital predistortion (DPD), which is essential for linearizing power amplifier (PA) nonlinearities. While traditional DPD methods, such as polynomial and Volterra series models, remain prevalent, neural network (NN)-based approaches offer superior modeling accuracy and adaptability. However, their deployment is hindered by high computational complexity, limited scalability, and hardware integration challenges. This review presents a comprehensive analysis of NN-based DPD techniques and hardware acceleration strategies for efficient real-time implementation. We assess the strengths of various NN architectures, including deep, convolutional, recurrent, and hybrid networks, and evaluate their tradeoffs across graphics processing unit (GPU), field-programmable gate array (FPGA), and application-specific integrated circuit (ASIC) platforms. We also examine key challenges, including fragmented evaluation standards and limited real-world validation. Finally, we outline future directions emphasizing model-hardware codesign, reconfigurable computing, and on-chip learning to enable scalable, energy-efficient DPD for 5G, 6G, and beyond.
A2Net: Affiliation Alignment Networks for Whole-Body Pose Estimation With Vision-Language Models
Ling Lin, Yaoxing Wang, Congcong Zhu, Jingrun Chen
Pub Date : 2026-02-06 DOI: 10.1109/TNNLS.2026.3656293
The whole-body pose estimation task aims to predict the locations of keypoints of the face, body, hands, and feet in a given image. However, scale variation across different parts of the human body and semantic ambiguity in small-scale parts degrade keypoint localization. The traditional paradigm for handling multiscale issues is to construct multiscale feature representations. Nevertheless, multiscale features extracted from visual images do not eliminate the semantic ambiguity in small-scale parts. In this article, we propose the affiliation alignment network (A2Net), which solves this problem by aligning vision-language hierarchical affiliations. Specifically, the text modality has the advantage of being unaffected by the scaling and small-scale semantic ambiguity problems that arise from image scale variations. We construct a multisemantic hierarchical language latent space with clear semantic and affiliation relations by designing Text Affiliation Injection operations. Subsequently, we adopt the optimal transport (OT) method to align image features of different scales with text features of the corresponding hierarchical levels, building an image scale-independent visual-language latent space that overcomes the image scale and small-scale semantic ambiguity problems. Extensive experimental results on two whole-body pose estimation datasets show that our model achieves convincing performance compared to current state-of-the-art methods. The code is openly available at https://github.com/LingLin-ll/A2Net.