Pub Date: 2026-04-21 | DOI: 10.1109/tnnls.2026.3684128
Jianqi Zhong, Junyu Shi, Wenming Cao
Graph convolutional networks (GCNs) have shown considerable promise in 3-D skeleton-based human motion prediction. Based on the intuitive observation that human motion can be described through the physical interconnections among human joints, many previous works have designed multiscale graphs to learn the relationships and constraints between different graph scales, obtaining encouraging results for human motion prediction. However, these fixed multiscale graphs derive new scales by merging adjacent joint information, ignoring the implicit semantic information of dynamic movements. Furthermore, human joint correlations tend to vary randomly as the depth of the multiscale clustering graph increases, which contradicts the design concept of fixed multiscale graphs. To address these limitations, we propose a novel correlation-based multiscale graph clustering network (CMGC) for adaptive multiscale graph representation learning. Given a human joint graph, CMGC first generates new graphs that adaptively represent motion correlations at different scale levels and then selectively restores the derived scales to the original joint graph, enabling the extraction of diverse motion features. Moreover, we introduce the discrete wavelet transform (DWT) to compensate for the signal loss caused by modeling human motion in the discrete cosine transform (DCT) domain. With its adaptive multiscale graph, CMGC achieves strong performance. Extensive experiments show that CMGC outperforms state-of-the-art methods by 11.2%, 10.1%, and 11.2% in average 3-D mean per joint position error (MPJPE) on the Human3.6M, CMU Mocap, and 3DPW datasets, respectively. We also evaluate the mean angle error (MAE) on Human3.6M, which is 6.5% lower than that of previous methods. Our code is released at https://github.com/JunyuShi02/CMGC.
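The DCT/DWT interplay the abstract describes can be illustrated with a toy trajectory: truncating DCT coefficients discards high-frequency motion detail, and a one-level Haar DWT of the residual isolates exactly that lost band. This is a minimal numeric sketch with made-up data, not the CMGC implementation.

```python
import numpy as np
from scipy.fft import dct, idct

# Toy 1-D joint-coordinate trajectory over 16 frames (hypothetical data).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16)
traj = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(16)

# DCT-domain modeling: keep only the first k low-frequency coefficients.
k = 4
coeffs = dct(traj, norm="ortho")
coeffs[k:] = 0.0
recon_dct = idct(coeffs, norm="ortho")
loss = traj - recon_dct  # the signal lost by DCT truncation

# One-level Haar DWT of the residual: approximation + detail bands.
even, odd = loss[0::2], loss[1::2]
approx = (even + odd) / np.sqrt(2)
detail = (even - odd) / np.sqrt(2)

# The detail band carries the high-frequency content the truncated DCT dropped.
print(np.linalg.norm(loss), np.linalg.norm(detail))
```

The Haar pair is orthonormal, so the residual can be recovered exactly from the two bands — the sense in which a wavelet branch can "compensate" for DCT truncation.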
Title: Multiscale Graph Redefining: Correlation-Based Multiscale Graph Clustering Network for Human Motion Prediction
Journal: IEEE Transactions on Neural Networks and Learning Systems
The performance of deep neural networks (DNNs) in accomplishing tasks heavily relies on feature selection and sparse representation of high-dimensional data. Previous work has treated feature selection and sparse representation as separate mechanisms for improving DNN performance, focusing on identifying and leveraging informative features to enhance task-specific outcomes. However, few studies have established a connection between feature selection and sparse representation. To address this gap, this article proposes an optimization framework termed informative sparse transport (IST), which integrates feature selection and sparse coding into a unified multiobjective optimization framework. Using optimal transport as a bridge, the IST framework harmonizes the relationship between feature selection and sparse representation, offering an informational advantage. In the IST framework, feature selection aims to identify an optimal subset of features to maximize mutual information or minimize redundancy, while sparse representation seeks to approximate data with the fewest possible features. Although these objectives differ, they are fundamentally complementary, as both emphasize extracting task-relevant information while eliminating redundancy. By unifying feature selection and sparse representation, the IST framework effectively mitigates challenges posed by high-dimensional data, delivering a robust solution for enhanced feature extraction and representation. We validate the IST framework on generative and classification tasks, demonstrating that the IST framework improves model performance through the complementary synergy of feature selection and sparse representation.
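Optimal transport as a bridge between two representations can be sketched with a plain Sinkhorn iteration; the `sinkhorn` helper, the toy "feature importance" and "atom usage" histograms, and the random cost matrix below are all hypothetical illustrations, not the IST framework's actual formulation.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.5, iters=200):
    """Entropy-regularized OT: alternate scalings until the plan's
    marginals match the histograms a (rows) and b (columns)."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(1)
feat_importance = rng.random(5)
feat_importance /= feat_importance.sum()   # mass over selected features
atom_usage = rng.random(4)
atom_usage /= atom_usage.sum()             # mass over sparse-code atoms
cost = rng.random((5, 4))                  # hypothetical pairing cost
plan = sinkhorn(cost, feat_importance, atom_usage)
print(plan)
```

The resulting plan couples the two distributions: each entry says how much "feature mass" is carried onto each atom, which is the kind of bridge the abstract attributes to OT.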
Title: A Deep Neural Network Optimization Framework Based on Optimal Transport Bridge Feature Selection and Sparse Representation
Authors: Guipeng Lan, Shuai Xiao, Jiabao Wen, Jiachen Yang, Wen Lu, Baihua Li, Qinggang Meng, Xinbo Gao
Pub Date: 2026-04-17 | DOI: 10.1109/tnnls.2026.3678220
Journal: IEEE Transactions on Neural Networks and Learning Systems
Pub Date: 2026-04-16 | DOI: 10.1109/tnnls.2026.3682921
Zhiru Yang, Mengmeng Zhang, Junjie Wang, Yunhao Gao, Wenzhi Liao, Wei Li
Hyperspectral target detection (HTD) involves identifying target pixels from complex backgrounds using known or inferred spectral signatures. With advances in hyperspectral imaging technology, HTD has found widespread applications in both military and civilian domains. However, it still faces challenges such as sample imbalance and spectral variability. To address these challenges, we propose a coherent pipeline that couples data, representation, and modeling. First, we develop AdvGMM, which fits a Gaussian mixture model (GMM) to high-confidence target spectra and applies adversarial reweighting against hard backgrounds to synthesize diverse, structurally constrained pseudotargets, thereby alleviating sample scarcity. Building on this, a frequency-domain adaptive fusion and Mamba-based enhanced encoder network (FAME-Net) is proposed to address spectral variability and improve the discriminability between targets and backgrounds. FAME-Net comprises two key modules: a frequency-domain feature adaptive fusion (FDFAF) module that adaptively amplifies information-rich bands and integrates complementary frequency components while preserving the overall reflectance trend; and an efficient Mamba block that captures long-range spectral dependencies, avoids class confusion caused by similar local features, and converts the frequency-enhanced spectra into scalable, robust features. Extensive experiments on six benchmark datasets demonstrate that the proposed method outperforms state-of-the-art approaches under limited supervision, achieving superior detection robustness. The code will be available at https://github.com/Zhiru-Yang/AdvGMM-FAME-Net.
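The pseudotarget-synthesis idea — draw new spectra from a GMM fitted to high-confidence target spectra — can be sketched as follows. The mixture parameters are invented for illustration and the adversarial reweighting step is omitted, so this is a stand-in for the sampling stage only, not AdvGMM itself.

```python
import numpy as np

# Hypothetical GMM parameters, as if fitted to high-confidence target
# spectra over 8 bands (2 mixture components).
rng = np.random.default_rng(2)
n_bands = 8
weights = np.array([0.6, 0.4])                  # mixture weights
means = rng.random((2, n_bands))                # per-component mean spectra
covs = np.stack([0.01 * np.eye(n_bands)] * 2)   # per-component covariances

def sample_pseudotargets(n, weights, means, covs, rng):
    """Draw n pseudotarget spectra: pick a component per sample, then
    sample from that component's Gaussian."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return np.stack([rng.multivariate_normal(means[c], covs[c]) for c in comp])

pseudo = sample_pseudotargets(100, weights, means, covs, rng)
print(pseudo.shape)  # (100, 8)
```

Because every sample is drawn from a component fitted to real target spectra, the synthesized pseudotargets stay structurally constrained while adding diversity — the property the abstract relies on to ease sample scarcity.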
Title: A Dual-Network Framework With Adversarial GMM Augmentation and Frequency-Mamba Fusion for Hyperspectral Target Detection
Journal: IEEE Transactions on Neural Networks and Learning Systems
Graph neural networks (GNNs) have achieved strong results on homophilic graphs with complete node attributes, yet their performance significantly deteriorates when faced with the combined challenges of heterophily and feature missingness. Heterophily introduces semantic inconsistency in neighborhoods, while feature missingness obscures node identity, which together constitute a complex problem we define as the heterophily-missing coupling (HMC). Under HMC, information exchanged between nodes becomes less reliable, and the usual assumptions that support message propagation no longer hold. To address this, we propose a novel adaptive prototype-guided personalized propagation (APP) framework. Specifically, it first leverages semantic rectification via prototypes (SRPs) to align neighborhood information with prototype semantics, reducing noise from inconsistent neighbors. Subsequently, personalized virtual propagation (PVP) builds upon this by clustering to construct prototype-aligned virtual edges, enabling effective feature imputation through minimizing Dirichlet energy across both real and virtual graphs. Finally, adaptive representation synergy (ARS) consolidates the propagated and imputed features by employing prototype-guided confidence weighting and enhancing representation quality via a contrastive training objective. Extensive experiments on multiple benchmark datasets demonstrate that APP consistently improves node classification performance on heterophilic graphs with missing features, achieving up to 11.22% improvement over state-of-the-art baselines while significantly reducing imputation error. The implementation is publicly available at https://github.com/limengran98/APP.
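Minimizing Dirichlet energy with observed features clamped amounts to repeatedly replacing each missing feature with its neighborhood average; the following toy graph (hypothetical, not the PVP module) shows that fixed point emerging.

```python
import numpy as np

# 4-node toy graph; nodes 0 and 2 have observed features, 1 and 3 are missing.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = np.array([1.0, 0.0, 2.0, 0.0])      # one feature channel
observed = np.array([True, False, True, False])

deg = A.sum(axis=1)
for _ in range(100):                     # Jacobi-style propagation
    X_new = (A @ X) / deg                # neighborhood average
    X = np.where(observed, X, X_new)     # clamp observed entries

print(X)
```

At convergence each missing entry equals the mean of its neighbors, which is exactly the stationarity condition of Dirichlet-energy minimization with boundary (observed) values held fixed; adding prototype-aligned virtual edges, as the abstract describes, changes which neighborhoods get averaged.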
Title: Adaptive Prototype-Guided Personalized Propagation for Heterophilic Graphs With Missing Data
Authors: Mengran Li, Wenbin Xing, Zelin Zang, Bo Li, Chengyang Zhang, Yong Zhang, Junzhou Chen, Ronghui Zhang, Yongfu Li, Chuan Hu, Xiaolei Ma, Zibin Zheng
Pub Date: 2026-04-14 | DOI: 10.1109/tnnls.2026.3676197
Journal: IEEE Transactions on Neural Networks and Learning Systems
Object state changes (OSCs) play a critical role in video understanding, as they focus on localizing the stages of state transitions within temporal sequences. However, existing methods face two key challenges in open-world scenarios. First, there is a significant background-causal scene imbalance due to dataset bias, which leads to reliance on irrelevant features and degrades prediction capability. Second, existing methods generalize poorly to unseen objects. They typically focus on a single state change of a specific object, which limits their ability to understand the state changes of unseen objects in a generalized way, as humans do. To address these challenges, we first introduce a structural causal model (SCM) to formally structure the OSC task, which explicitly defines the confounding effect of dataset bias and the lack of generalization. Guided by this SCM, we propose CCI-Net, a causal counterfactual inference-based video OSC neural network. CCI-Net employs a causal inference network for backdoor adjustment to effectively eliminate confounders. In addition, it integrates counterfactual inference to enhance understanding in open-world scenarios. Specifically, CCI-Net comprises two key components: the backdoor scene classifier (BSC) and the counterfactual module (CM). The BSC controls potential confounders and mitigates spurious correlations. The CM enhances generalization to unseen objects and their state changes by constructing counterfactual scenes during training. Furthermore, we design two loss functions for causal and counterfactual scenes to optimize the learning process. Experimental results on three benchmark datasets demonstrate that, compared with existing methods, CCI-Net significantly improves both precision and generalization in open-world scenarios.
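The backdoor adjustment that the BSC relies on can be made concrete on a binary toy example: conditioning lets the scene variable z shift with X, while do(X) holds z at its prior, giving P(Y|do(X)) = Σ_z P(Y|X, z) P(z). All probabilities below are hypothetical.

```python
import numpy as np

p_z = np.array([0.7, 0.3])                  # P(z): background-scene prior
p_x_given_z = np.array([0.9, 0.2])          # P(X=1 | z)
p_y_given_xz = np.array([[0.1, 0.5],        # P(Y=1 | X=0, z)
                         [0.4, 0.8]])       # P(Y=1 | X=1, z)

# Observational P(Y=1 | X=1): z is inferred from X, confounding the estimate.
p_xz = p_x_given_z * p_z
p_z_given_x1 = p_xz / p_xz.sum()
p_y_obs = (p_y_given_xz[1] * p_z_given_x1).sum()

# Interventional P(Y=1 | do(X=1)): z keeps its marginal prior (backdoor).
p_y_do = (p_y_given_xz[1] * p_z).sum()

print(p_y_obs, p_y_do)
```

The two numbers differ because observing X=1 makes the common scene more likely, dragging the estimate toward scene-correlated outcomes; the adjustment removes exactly that spurious correlation.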
Title: Causal Counterfactual Inference Network for Video Object State Changes in Open-World Scenarios
Authors: Zhichao Wang, Shucheng Huang, Mingxing Li, Yifan Jiao
Pub Date: 2026-04-14 | DOI: 10.1109/tnnls.2026.3678945
Journal: IEEE Transactions on Neural Networks and Learning Systems
Graph neural networks (GNNs) have excelled in handling graph-structured data, attracting significant research interest. However, two primary challenges have emerged: interference between topology and attributes distorting node representations, and the low-pass filtering nature of most GNNs leading to the oversight of valuable high-frequency information in graph signals. These issues are particularly pronounced in heterophilic graphs. To address these challenges, we propose attribute-topology cross-frequency aligned (ATCFA) GNNs. ATCFA combines low- and high-pass filters to capture both smooth and detailed representations from topological and attribute perspectives. It also enforces frequency-specific constraints to reduce noise and redundancy in each frequency band. The model can dynamically adjust the filtering ratios for both homophilic and heterophilic graphs. Crucially, ATCFA establishes dynamic associations between corresponding frequency components of topology and attribute, achieving systematic alignment and interactive fusion that explicitly mitigates interference and promotes complementary information utilization across domains. Extensive experiments on standard datasets show that ATCFA delivers higher classification accuracy than state-of-the-art methods, proving its capability to handle both homophilic and heterophilic graphs in node classification.
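The low-/high-pass split that ATCFA builds on can be sketched with the symmetric normalized Laplacian L: applying I - L averages neighbors (smooth, low-frequency part) while L keeps neighbor differences (detailed, high-frequency part), and the two bands sum back to the input. Toy graph and features, not the ATCFA filters themselves.

```python
import numpy as np

# Path graph 0-1-2 with one feature channel per node.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
L = np.eye(3) - A_norm                 # symmetric normalized Laplacian

X = np.array([[1.0], [0.0], [1.0]])    # alternating (high-frequency) signal
X_low = (np.eye(3) - L) @ X            # low-pass: neighborhood averaging
X_high = L @ X                         # high-pass: neighborhood differences

print(X_low.ravel(), X_high.ravel())
```

On a heterophilic pattern like this alternating signal, most of the energy lands in the high-pass band, which is why a purely low-pass GNN overlooks it.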
Title: Attribute-Topology Cross-Frequency Aligned Graph Neural Networks for Homophilic and Heterophilic Graphs in Node Classification
Authors: Yachao Yang, Yanfeng Sun, Jipeng Guo, Jinlu Wang, Shaofan Wang, Junbin Gao, Fujiao Ju, Baocai Yin
Pub Date: 2026-04-13 | DOI: 10.1109/tnnls.2026.3678135
Journal: IEEE Transactions on Neural Networks and Learning Systems
Image dehazing is crucial for enhancing image visibility and mitigating weather degradations. However, most existing approaches rely on paired hazy and clean images, which are challenging to obtain in real-world scenarios. To this end, we propose an oriented Bayesian-regularized consistent optimal transport (OBCOT) framework, which formulates the unpaired image dehazing task as an optimal transport (OT) problem. Specifically, we introduce a structure-preserving transport cost, incorporating a structural similarity (SSIM) constraint to minimize the duality gap between the primal and dual formulations while preserving the structural details of reconstructed images. Furthermore, we derive a Bayesian frequency-domain regularization (BFR) to balance spectral consistency with clean references against repulsion from hazy patterns. In addition, we employ a pretrained one-step stable diffusion model as the restoration network, fine-tuned with low-rank adaptation (LoRA) adapters and zero convolutional layers, while integrating domain-specific text prompts for both degraded and clean images to guide the generation process. Extensive experiments demonstrate that our method surpasses existing well-performing unpaired learning approaches, achieving notable improvements in both fidelity and photo-realism.
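A structure-preserving transport cost of the kind described — pixel fidelity plus an SSIM penalty — can be sketched with a simplified single-window SSIM (no sliding windows or Gaussian weighting), so this is only a stand-in for the OBCOT cost, with a toy haze model:

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Simplified single-window SSIM on [0, 1] images: luminance,
    contrast, and structure computed over the whole image at once."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(3)
clean = rng.random((16, 16))
hazy = 0.6 * clean + 0.4          # toy haze: contrast loss plus airlight

# Hypothetical structure-preserving cost: MSE fidelity + SSIM penalty.
transport_cost = np.mean((clean - hazy) ** 2) + (1.0 - ssim_global(clean, hazy))
print(transport_cost)
```

The SSIM term vanishes only when the structural statistics match, so minimizing this cost pushes the transport map toward outputs that preserve edges and textures, not just per-pixel values.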
Title: When Optimal Transport Meets Photo-Realistic Image Dehazing With Unpaired Training
Authors: Yuanbo Wen, Tao Gao, Shan Liang, Dena Zhang, Ziqi Li, Jing Qin, Ting Chen
Pub Date: 2026-04-09 | DOI: 10.1109/tnnls.2026.3673760
Journal: IEEE Transactions on Neural Networks and Learning Systems