
IEEE Transactions on Neural Networks and Learning Systems: Latest Publications

A Survey on Learning Motion Planning and Control for Mobile Robots: Toward Embodied Intelligence
IF 10.4 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-09 | DOI: 10.1109/tnnls.2026.3656889
Mengyun Wang, Yifeng Niu, Bo Wang, Wei Zhang, Chang Wang
{"title":"A Survey on Learning Motion Planning and Control for Mobile Robots: Toward Embodied Intelligence","authors":"Mengyun Wang, Yifeng Niu, Bo Wang, Wei Zhang, Chang Wang","doi":"10.1109/tnnls.2026.3656889","DOIUrl":"https://doi.org/10.1109/tnnls.2026.3656889","url":null,"abstract":"","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"35 1","pages":""},"PeriodicalIF":10.4,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146146043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AquaticCLIP: A Vision-Language Foundation Model and Dataset for Underwater Scene Analysis
IF 10.4 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-09 | DOI: 10.1109/tnnls.2026.3657138
Basit Alawode, Iyyakutti Iyappan Ganapathi, Sajid Javed, Mohammed Bennamoun, Arif Mahmood
{"title":"AquaticCLIP: A Vision-Language Foundation Model and Dataset for Underwater Scene Analysis","authors":"Basit Alawode, Iyyakutti Iyappan Ganapathi, Sajid Javed, Mohammed Bennamoun, Arif Mahmood","doi":"10.1109/tnnls.2026.3657138","DOIUrl":"https://doi.org/10.1109/tnnls.2026.3657138","url":null,"abstract":"","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"161 1","pages":""},"PeriodicalIF":10.4,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146146044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning.
IF 8.9 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-06 | DOI: 10.1109/TNNLS.2026.3655172
Yiming Shi, Yujia Wu, Jiwei Wei, Ran Ran, Chengwei Sun, Shiyuan He, Yang Yang

The rapid growth of model scale has necessitated substantial computational resources for fine-tuning. Existing approaches such as low-rank adaptation (LoRA) have sought to address the problem of handling the large number of updated parameters in full fine-tuning (FT). However, LoRA uses random initialization and optimization of low-rank matrices to approximate the updated weights, which can result in suboptimal convergence and an accuracy gap compared to FT. To address these issues, we propose low-rank LDU (LoLDU), a parameter-efficient fine-tuning (PEFT) approach that reduces trainable parameters by a factor of 2600 compared to regular PEFT methods while maintaining comparable performance. LoLDU leverages lower-diag-upper (LDU) decomposition to initialize low-rank matrices for faster convergence and nonsingularity. We focus on optimizing the diagonal matrix for scaling transformations. To the best of our knowledge, LoLDU has the fewest parameters among all PEFT approaches. We conducted extensive experiments across four instruction-following datasets, six natural language understanding (NLU) datasets, eight image classification datasets, and image generation datasets with multiple model types [LLaMA2, RoBERTa, ViT, and stable diffusion (SD)], providing a comprehensive and detailed analysis. Our open-source code can be accessed at https://anonymous.4open.science/r/LoLDU-B5A6.
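
For readers who want a concrete picture of the mechanism, the following is a minimal sketch (not the authors' released code, which is linked above): freeze a pretrained linear layer, attach a low-rank update whose lower and unit-upper factors come from an LDU-style factorization and stay fixed, and train only the diagonal scaling term. The matrix being factorized, the rank, and all names here are illustrative assumptions.

```python
import torch
import torch.nn as nn
from scipy.linalg import lu  # P @ L @ U factorization

class LoLDULinearSketch(nn.Module):
    """Frozen base layer plus an LDU-initialized low-rank update; only the
    diagonal is trainable (a sketch of the idea, not the paper's implementation)."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # keep pretrained weights fixed

        out_f, in_f = base.weight.shape
        r = min(rank, out_f, in_f)
        _, L, U = lu(torch.randn(out_f, in_f).double().numpy())
        diag = U.diagonal()[:r].copy()                   # trainable scaling terms
        U_unit = U[:r, :] / U.diagonal()[:r, None]       # unit-diagonal upper factor

        self.register_buffer("L", torch.tensor(L[:, :r], dtype=torch.float32))
        self.register_buffer("U", torch.tensor(U_unit, dtype=torch.float32))
        self.diag = nn.Parameter(torch.tensor(diag, dtype=torch.float32))

    def forward(self, x):
        delta_w = self.L @ torch.diag(self.diag) @ self.U    # (out_f, in_f) update
        return self.base(x) + x @ delta_w.T
```

Under this sketch, training touches only r scalars per adapted layer, which is where the large reduction in trainable parameters reported in the abstract comes from.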

Citations: 0
Using Class and Domain Information to Address Domain Shift in Federated Learning.
IF 8.9 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-06 | DOI: 10.1109/TNNLS.2026.3658584
Chien-Yu Chiou, Chun-Rong Huang, Lawrence L Latour, Yang C Fann, Pau-Choo Chung

In federated learning (FL), heterogeneous and client-specific data distributions cause a domain-shift problem, which leads to divergent local models and degraded global performance. To address this problem, this study proposes a class- and domain-aware FL framework that decouples and collaboratively learns the domain-invariant and domain-specific representations. During client training, a novel cross-gated feature separation (CGFS) module is employed to separate the domain features from the class features. A heterogeneous prototype contrastive learning (HPCL) module is then used to guide the learning of the class features and domain features with good discriminability within each feature space. Finally, during server aggregation, a gradient-reweighted hierarchical aggregation (GHA) strategy is applied to effectively aggregate information from all the clients and build a global model with good robustness to domain variation. The experimental results obtained on two FL datasets with domain shift show that the proposed method consistently outperforms state-of-the-art approaches.
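
As a generic illustration of the kind of objective a prototype contrastive module optimizes (not the paper's exact HPCL formulation), the sketch below pulls each embedding toward its class prototype and pushes it away from the others; the function name and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(features, labels, prototypes, temperature=0.1):
    """Generic prototype-contrastive objective: attract each embedding to its
    class prototype and repel it from the rest via an InfoNCE-style softmax.

    features:   (N, D) embeddings; labels: (N,) class indices;
    prototypes: (C, D) one prototype vector per class.
    """
    features = F.normalize(features, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    logits = features @ prototypes.T / temperature   # (N, C) cosine similarities
    return F.cross_entropy(logits, labels)
```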

Citations: 0
Enhancing PPO With Trajectory-Aware Hybrid Policies.
IF 8.9 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-06 | DOI: 10.1109/TNNLS.2025.3641531
Qisai Liu, Zhanhong Jiang, Hsin-Jung Yang, Mahsa Khosravi, Joshua R Waite, Soumik Sarkar

Proximal policy optimization (PPO) is one of the most popular state-of-the-art on-policy algorithms and has become a standard baseline in modern reinforcement learning, with applications in numerous fields. Though it delivers stable performance with theoretical policy improvement guarantees, high variance and high sample complexity remain critical challenges for on-policy algorithms. To alleviate these issues, we propose a hybrid-policy PPO (HP3O), which utilizes a trajectory replay buffer to make efficient use of trajectories generated by recent policies. In particular, the buffer applies a "first in, first out" (FIFO) strategy so as to keep only the recent trajectories and attenuate data distribution drift. A batch consisting of the trajectory with the best return and other trajectories randomly sampled from the buffer is used to update the policy networks. The strategy helps the agent improve its capability on top of the most recent best performance and, in turn, reduces variance empirically. We theoretically construct policy improvement guarantees for the proposed algorithm. HP3O is validated and compared against several baseline algorithms in multiple continuous control environments. Our code is available at https://anonymous.4open.science/r/HP30-EB61/HP3O_train.py.
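
A minimal sketch of the trajectory buffer described above, assuming the FIFO retention and best-plus-random batch composition stated in the abstract; class and method names are illustrative.

```python
import random
from collections import deque

class TrajectoryReplayBuffer:
    """FIFO buffer of recent trajectories; batches combine the best-return
    trajectory with randomly sampled ones, as described in the abstract."""

    def __init__(self, capacity: int = 10):
        self.buffer = deque(maxlen=capacity)       # oldest trajectories drop out first

    def add(self, trajectory, episode_return: float):
        self.buffer.append((episode_return, trajectory))

    def sample_batch(self, num_random: int = 3):
        best = max(self.buffer, key=lambda item: item[0])[1]
        sampled = random.sample(list(self.buffer), k=min(num_random, len(self.buffer)))
        return [best] + [traj for _, traj in sampled]
```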

Citations: 0
KSIQA: A Knowledge-Sharing Model for No-Reference Image Quality Assessment.
IF 8.9 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-06 | DOI: 10.1109/TNNLS.2026.3656757
Huasheng Wang, Jiang Liu, Hongchen Tan, Jianxun Lou, Xiaochang Liu, Wei Zhou, Ying Chen, Roger Whitaker, Walter Colombo, Hantao Liu

No-reference image quality assessment (NR-IQA) aims to quantitatively measure human perception of visual quality without comparing a distorted image to a reference. Despite recent advances, existing NR-IQA approaches often demonstrate insufficient ability to capture perceptual cues in the absence of a reference, limiting their generalisability across diverse and complex real-world image degradations. These limitations hinder their ability to match the reliability of full-reference IQA (FR-IQA) counterparts. A key challenge, therefore, is to enable NR-IQA models to emulate the reference-aware reasoning exhibited by humans and FR-IQA methods. To address this challenge, we propose a novel NR-IQA model based on a knowledge-sharing (KS) strategy to simulate this capability and predict image quality more effectively. Specifically, we designate an FR-IQA model as the teacher and an NR-IQA model as the student. Unlike conventional knowledge distillation (KD), our proposed architecture enables the NR-IQA student and the FR-IQA teacher to share a decoder rather than being independent models. Furthermore, the student model contains a Mental Imagery Generation (MIG) module to learn mental imagery as the reference. To fully exploit local and global information, we adopt a vision transformer (ViT) branch and a convolutional neural network branch for feature extraction (FE). Finally, a quality-aware regressor (QAR) combined with deep ordinal regression is constructed to infer the quality score. Experiments show that our proposed NR-IQA model, KSIQA, achieves class-leading performance against current no-reference (NR) techniques across widespread benchmark datasets.
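
The following is a structural sketch of the shared-decoder arrangement described above, with the teacher branch seeing a reference image and the student branch not; the encoder and decoder modules are placeholders, not the released KSIQA architecture.

```python
import torch.nn as nn

class SharedDecoderIQA(nn.Module):
    """FR-IQA teacher and NR-IQA student branches that feed one shared decoder
    (structural sketch only; the real encoders, MIG module, and QAR differ)."""

    def __init__(self, teacher_encoder: nn.Module, student_encoder: nn.Module,
                 shared_decoder: nn.Module):
        super().__init__()
        self.teacher_encoder = teacher_encoder   # sees distorted + reference images
        self.student_encoder = student_encoder   # sees the distorted image only
        self.decoder = shared_decoder            # single decoder used by both branches

    def forward(self, distorted, reference=None):
        if reference is not None:                # teacher (full-reference) path
            feats = self.teacher_encoder(distorted, reference)
        else:                                    # student (no-reference) path
            feats = self.student_encoder(distorted)
        return self.decoder(feats)               # predicted quality score
```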

Citations: 0
MetaGrasp: Generalizable Dexterous Multifingered Functional Grasping With Gradual Skill Curriculum Learning.
IF 8.9 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-06 | DOI: 10.1109/TNNLS.2026.3655749
Yinglan Lv, Qiyu Chen, Xiangbo Lin, Jianwen Li, Wenbin Bai, Yi Sun

Dexterous grasping and manipulation with multifingered robotic hands present a significant challenge due to the hands' high degrees of freedom and the need for task-specific adaptations. Existing methods usually adopt a single-task learning framework or focus on simple stable wrap grasping, limiting their efficiency and generalization ability when encountering a new task or a precise functional grasping pose. In this article, we introduce MetaGrasp, a novel approach that defines dexterous functional grasping as a multitask reinforcement learning (RL) problem based on hand grasp pose classification. Our method features a unique gradual skill curriculum learning (GSCL) framework, which structures the learning process into three stages according to the level of difficulty: beginner, intermediate, and advanced curriculum learning. MetaGrasp leverages this hierarchical learning structure to develop a versatile, adaptive grasping policy that can grasp objects based on hand grasp pose and object point cloud inputs. Taking five hand grasp types as research cases, the policy trained with our MetaGrasp can be easily adapted to grasp different object instances from different object categories according to functional grasp intentions specified by one expert demonstration, without requiring extensive system interaction. We categorize the dexterous functional grasping tasks of a five-fingered robotic hand into multiple tasks based on hand poses for RL and combine meta imitation learning (IL) with curriculum learning. The experimental results show that MetaGrasp has better one-shot generalization ability on new grasp tasks and outperforms state-of-the-art single-task dexterous grasping methods.
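
A schematic of a three-stage curriculum loop in the spirit of GSCL; `env.set_difficulty` and `agent.train_episode` are assumed interfaces, and the real stage contents and transition rules come from the paper, not from this sketch.

```python
def gradual_skill_curriculum(agent, env,
                             stages=("beginner", "intermediate", "advanced"),
                             episodes_per_stage=1000):
    """Train through progressively harder curriculum stages (schematic only)."""
    for stage in stages:
        env.set_difficulty(stage)            # assumed hook for the curriculum level
        for _ in range(episodes_per_stage):
            agent.train_episode(env)         # assumed single-episode RL update
```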

Citations: 0
Transformer Meets Gated Residual Networks to Enhance PICU's PPG Artifact Detection Informed by Mutual Information Neural Estimation.
IF 8.9 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-06 | DOI: 10.1109/TNNLS.2026.3656756
Thanh-Dung Le, Clara Macabiau, Kevin Albert, Symeon Chatzinotas, Philippe Jouvet, Rita Noumeir

This study delves into the effectiveness of various learning methods in improving Transformer models, focusing mainly on the Gated Residual Network (GRN) Transformer in the context of pediatric intensive care units (PICUs) with limited data availability. Our findings indicate that Transformers trained via supervised learning are less effective than MLP, CNN, and LSTM networks in such environments. Yet, leveraging unsupervised and self-supervised learning (SSL) on unannotated data, with subsequent fine-tuning on annotated data, notably enhances Transformer performance, although not to the level of the GRN-Transformer. Central to our research is analyzing different activation functions for the gated linear unit (GLU), a crucial element of the GRN structure. We also employ Mutual Information Neural Estimation (MINE) to evaluate the GRN's contribution. Additionally, the study examines the effects of integrating the GRN within the Transformer's attention mechanism versus using it as a separate intermediary layer. Our results highlight that the GLU with sigmoid activation stands out, achieving 0.98 accuracy, 0.91 precision, 0.96 recall, and a 0.94 F1-score. The MINE analysis supports the hypothesis that the GRN enhances the mutual information (MI) between the hidden representations and the output. Moreover, using the GRN as an intermediate filter layer proves more beneficial than incorporating it within the attention mechanism. This study clarifies how the GRN enables the GRN-Transformer to surpass other techniques. These findings offer a promising avenue for adopting sophisticated models like Transformers in data-constrained environments, such as PPG artifact detection in PICU settings.
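
Since the reported best variant is the GLU with sigmoid activation, here is the standard sigmoid-gated GLU in a minimal form; this is the textbook formulation, not the authors' full GRN-Transformer.

```python
import torch
import torch.nn as nn

class SigmoidGLU(nn.Module):
    """Gated linear unit with a sigmoid gate: GLU(x) = (x W + b) * sigmoid(x V + c)."""

    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.value = nn.Linear(d_in, d_out)
        self.gate = nn.Linear(d_in, d_out)

    def forward(self, x):
        return self.value(x) * torch.sigmoid(self.gate(x))
```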

Citations: 0
Next-Gen Digital Predistortion From Hardware Acceleration of Neural Networks: Trends, Challenges, and Future.
IF 8.9 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-06 | DOI: 10.1109/TNNLS.2026.3656642
Mohd Tasleem Khan, Yuan Ding, George Goussetis

The computational demands of next-generation (Next-Gen) communication systems pose major challenges for real-time signal processing, particularly in digital predistortion (DPD), which is essential for linearizing power amplifier (PA) nonlinearities. While traditional DPD methods, such as polynomial and Volterra series models, remain prevalent, neural network (NN)-based approaches offer superior modeling accuracy and adaptability. However, their deployment is hindered by high computational complexity, limited scalability, and hardware integration challenges. This review presents a comprehensive analysis of NN-based DPD techniques and hardware acceleration strategies for efficient real-time implementation. We assess the strengths of various NN architectures (deep, convolutional, recurrent, and hybrid) and evaluate their tradeoffs across graphics processing unit (GPU), field-programmable gate array (FPGA), and application-specific integrated circuit (ASIC) platforms. We also examine key challenges, including fragmented evaluation standards and limited real-world validation. Finally, we outline future directions emphasizing model-hardware codesign, reconfigurable computing, and on-chip learning to enable scalable, energy-efficient DPD for 5G, 6G, and beyond.

Citations: 0
A2Net: Affiliation Alignment Networks for Whole-Body Pose Estimation With Vision-Language Models.
IF 8.9 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-06 | DOI: 10.1109/TNNLS.2026.3656293
Ling Lin, Yaoxing Wang, Congcong Zhu, Jingrun Chen

The whole-body pose estimation task aims to predict the locations of keypoints of the face, body, hands, and feet in a given image. However, scale variation across different parts of the human body and semantic ambiguity in small-scale parts degrade keypoint localization performance. The traditional paradigm for solving multiscale issues is to construct multiscale feature representations. Nevertheless, multiscale features extracted from visual images do not eliminate the semantic ambiguity issue in the small-scale parts. In this article, we propose the affiliation alignment network (A2Net), which solves the aforementioned problem by aligning vision-language hierarchical affiliations. Specifically, the text modality has the advantage of being unaffected by the scaling problem and the small-scale semantic ambiguity that image scale variations cause. We construct a multisemantic hierarchical language latent space with clear semantic and affiliation relations by designing Text Affiliation Injection operations. Subsequently, we adopt the optimal transport (OT) method to align image features of different scales with text features of the corresponding hierarchical levels to build an image-scale-independent visual-language latent space, which overcomes the image scale problem and the small-scale semantic ambiguity problem. Extensive experimental results on two whole-body pose estimation datasets show that our model achieves convincing performance compared to current state-of-the-art methods. The code is openly available at https://github.com/LingLin-ll/A2Net.
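
As a generic illustration of OT-based alignment between M image-scale features and N hierarchy-level text features (not the paper's exact loss), the sketch below computes an entropic-OT transport plan with Sinkhorn iterations; the regularization strength and uniform marginals are assumptions.

```python
import torch

def sinkhorn_plan(cost: torch.Tensor, eps: float = 0.05, n_iters: int = 50):
    """Entropic-OT transport plan via Sinkhorn iterations for an (M, N) cost matrix
    (generic illustration of OT-based feature alignment, not the paper's objective)."""
    M, N = cost.shape
    K = torch.exp(-cost / eps)                       # Gibbs kernel
    a = torch.full((M,), 1.0 / M)                    # uniform source marginal
    b = torch.full((N,), 1.0 / N)                    # uniform target marginal
    u = torch.ones(M)
    v = torch.ones(N)
    for _ in range(n_iters):
        u = a / (K @ v + 1e-9)
        v = b / (K.T @ u + 1e-9)
    return u[:, None] * K * v[None, :]               # approximate transport plan
```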

Citations: 0