
Neural Networks: Latest Publications

Constraining an Unconstrained Multi-agent Policy with offline data
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-13 | DOI: 10.1016/j.neunet.2025.107253
Cong Guan, Tao Jiang, Yi-Chen Li, Zongzhang Zhang, Lei Yuan, Yang Yu
Real-world multi-agent decision-making systems often have to satisfy constraints, such as safety and economic ones, spurring the emergence of Constrained Multi-Agent Reinforcement Learning (CMARL). Existing studies of CMARL mainly focus on training a constrained policy in an online manner, that is, maximizing cumulative rewards while not violating constraints. However, in practice, online learning may be infeasible due to safety restrictions or a lack of high-fidelity simulators. Moreover, as the learned policy runs, new constraints that were not taken into account during training may arise. To deal with these two issues, we propose a method called Constraining an UnconsTrained Multi-Agent Policy with offline data, dubbed CUTMAP, following the popular centralized training with decentralized execution paradigm. Specifically, we formulate a scalable optimization objective within the framework of multi-agent maximum entropy reinforcement learning for CMARL. This approach estimates a decomposable Q-function by leveraging an unconstrained "prior policy" in conjunction with cost signals extracted from offline data. When a new constraint arises, CUTMAP can reuse the prior policy without re-training it. To tackle the distribution-shift challenge in offline learning, we also incorporate a conservative loss term when updating the Q-function. Therefore, the unconstrained prior policy can be trained to satisfy cost constraints through CUTMAP without expensive interactions with the real environment, facilitating the practical application of MARL algorithms. Empirical results on several cooperative multi-agent benchmarks, including StarCraft games, particle games, food search games, and robot control, demonstrate the superior performance of our method.
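The conservative loss term mentioned in the abstract can be illustrated with a toy example. This is a minimal sketch in the spirit of conservative Q-learning on a single tabular state, not the paper's actual objective; the function name, the tabular setting, and the `alpha` weight are assumptions made for illustration.

```python
import numpy as np

def conservative_q_loss(q_values, dataset_actions, alpha=1.0):
    """Toy conservative penalty on one state's Q-row (CQL-style sketch).

    q_values: Q(s, a) for every action a in a single state, shape (n_actions,)
    dataset_actions: indices of actions actually present in the offline data
    alpha: penalty weight (hypothetical hyperparameter)
    """
    # Push down a soft maximum over all actions...
    logsumexp = np.log(np.sum(np.exp(q_values)))
    # ...while pushing up the Q-values of in-dataset actions,
    # discouraging overestimation of out-of-distribution actions.
    data_term = np.mean(q_values[dataset_actions])
    return alpha * (logsumexp - data_term)

q = np.array([1.0, 2.0, 0.5])
loss = conservative_q_loss(q, dataset_actions=[1])  # offline data only contains action 1
```

The penalty is positive whenever the soft maximum exceeds the average in-dataset Q-value, which is exactly the situation where an offline learner would otherwise exploit unseen actions.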
Citations: 0
Memristor-based circuit design of interweaving mechanism of emotional memory in a hippocamp-brain emotion learning model
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-13 | DOI: 10.1016/j.neunet.2025.107276
Yunlai Zhu, Yongjie Zhao, Junjie Zhang, Xi Sun, Ying Zhu, Xu Zhou, Xuming Shen, Zuyu Xu, Zuheng Wu, Yuehua Dai
Endowing robots with human-like emotional and cognitive abilities has garnered widespread attention, driving deep investigations into the complexities of these processes. However, few studies have examined the intricate circuits that govern the interplay between emotion and memory. This work presents a memristive circuit design that generates emotional memory, mimicking human emotional responses and memories while enabling interaction between emotions and cognition. Leveraging the hippocampal-brain emotion learning (BEL) architecture, the memristive circuit comprises seven comprehensive modules: the thalamus, sensory cortex, orbitofrontal cortex, amygdala, dentate gyrus (DG), CA3, and CA1. This design incorporates a compact biological framework, facilitating the collaborative encoding of emotional memories by the amygdala and hippocampus and allowing for flexible adjustment of circuit parameters to accommodate diverse personality traits. The proposed memristor-based circuit effectively mimics the complex interplay between emotions and memory, providing a valuable foundation for advancing the development of robots capable of replicating human-like emotional responses and cognitive integration.
Citations: 0
Daydreaming Hopfield Networks and their surprising effectiveness on correlated data
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-13 | DOI: 10.1016/j.neunet.2025.107216
Ludovica Serricchio , Dario Bocchi , Claudio Chilin , Raffaele Marino , Matteo Negri , Chiara Cammarota , Federico Ricci-Tersenghi
To improve the storage capacity of the Hopfield model, we develop a version of the dreaming algorithm that perpetually reinforces the patterns to be stored (as in the Hebb rule), and erases the spurious memories (as in dreaming algorithms). For this reason, we called it Daydreaming. Daydreaming is not destructive and it converges asymptotically to stationary retrieval maps. When trained on random uncorrelated examples, the model shows optimal performance in terms of the size of the basins of attraction of stored examples and the quality of reconstruction. We also train the Daydreaming algorithm on correlated data obtained via the random-features model and argue that it spontaneously exploits the correlations thus increasing even further the storage capacity and the size of the basins of attraction. Moreover, the Daydreaming algorithm is also able to stabilize the features hidden in the data. Finally, we test Daydreaming on the MNIST dataset and show that it still works surprisingly well, producing attractors that are close to unseen examples and class prototypes.
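The interplay of Hebbian reinforcement and unlearning described above can be sketched on a toy binary Hopfield network. This is an illustrative simplification, not the paper's exact update rule; the step size `eps`, the relaxation schedule, and the Hebbian initialization are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def daydreaming_step(J, patterns, eps=0.01, n_relax=20):
    """One simplified daydreaming update on a Hopfield coupling matrix J.

    The Hebbian term perpetually reinforces the stored patterns; the
    'dream' term unlearns whatever attractor (possibly spurious) the
    network relaxes to from a random start.
    """
    N = J.shape[0]
    for xi in patterns:                      # perpetual Hebbian reinforcement
        J = J + (eps / N) * np.outer(xi, xi)
    s = rng.choice([-1, 1], size=N)          # random initial configuration
    for _ in range(n_relax):                 # relax to an attractor
        s = np.where(J @ s >= 0, 1, -1)
    J = J - (eps / N) * np.outer(s, s)       # unlearn the reached attractor
    np.fill_diagonal(J, 0.0)
    return J

N = 100
patterns = rng.choice([-1, 1], size=(2, N))
J = (patterns.T @ patterns) / N              # Hebbian initialization
np.fill_diagonal(J, 0.0)
for _ in range(5):
    J = daydreaming_step(J, patterns)
```

Because the unlearning step is scaled by a small `eps`, the update is non-destructive: stored patterns remain fixed points while spurious attractors are gradually eroded.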
Citations: 0
Span-aware pre-trained network with deep information bottleneck for scientific entity relation extraction
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-11 | DOI: 10.1016/j.neunet.2025.107250
Youwei Wang , Peisong Cao , Haichuan Fang , Yangdong Ye
Scientific entity relation extraction intends to promote the performance of each subtask by exploring contextual representations with rich scientific semantics. However, most existing models encounter the dilemma of scientific semantic dilution, where task-irrelevant information entangles with task-relevant information, making science-friendly representation learning challenging. In addition, existing models isolate task-relevant information among subtasks, undermining the coherence of scientific semantics and consequently impairing the performance of each subtask. To deal with these challenges, a novel and effective Span-aware Pre-trained network with deep Information Bottleneck (SpIB) is proposed, which conducts scientific entity and relation extraction by minimizing task-irrelevant information while maximizing the relatedness of task-relevant information. Specifically, the SpIB model includes a minimum span-based representation learning (SRL) module and a relatedness-oriented task-relevant representation learning (TRL) module to disentangle task-irrelevant information and discover the relatedness hidden in task-relevant information across subtasks. Then, an information minimum–maximum strategy is designed to minimize the mutual information of span-based representations and maximize the multivariate information of task-relevant representations. Finally, we design a unified loss function to simultaneously optimize the learned span-based and task-relevant representations. Experimental results on several scientific datasets (SciERC, ADE, BioRelEx) show the superiority of the proposed SpIB model over various state-of-the-art models. The source code is publicly available at https://github.com/SWT-AITeam/SpIB.
Citations: 0
Hierarchical task network-enhanced multi-agent reinforcement learning: Toward efficient cooperative strategies
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-11 | DOI: 10.1016/j.neunet.2025.107254
Xuechen Mu , Hankz Hankui Zhuo , Chen Chen , Kai Zhang , Chao Yu , Jianye Hao
Navigating multi-agent reinforcement learning (MARL) environments with sparse rewards is notoriously difficult, particularly in suboptimal settings where exploration can be prematurely halted. To tackle these challenges, we introduce Hierarchical Symbolic Multi-Agent Reinforcement Learning (HS-MARL), a novel approach that incorporates hierarchical knowledge into MARL to effectively reduce the exploration space. We design intermediate states to decompose the state space into a hierarchical structure, represented using the Hierarchical Domain Definition Language (HDDL) and the option framework, forming domain knowledge and a symbolic option set. We leverage pyHIPOP+, an enhanced hierarchical task network (HTN) planner, to generate action sequences. A high-level meta-controller then assigns these symbolic options as policy functions, guiding low-level agents in their exploration of the environment. During this process, the meta-controller computes intrinsic rewards from the environmental rewards collected, which are used to train the symbolic option policies and refine pyHIPOP+’s heuristic function, thereby optimizing future action sequences. We evaluate HS-MARL against 15 state-of-the-art algorithms across two types of environments: four with sparse rewards and suboptimal conditions, and a real-world scenario involving a football match. Additionally, we perform an ablation study on HS-MARL’s intrinsic reward mechanism and pyHIPOP+, along with a sensitivity analysis of intrinsic reward hyperparameters.
Citations: 0
EMBANet: A flexible efficient multi-branch attention network
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-11 | DOI: 10.1016/j.neunet.2025.107248
Keke Zu , Hu Zhang , Lei Zhang , Jian Lu , Chen Xu , Hongyang Chen , Yu Zheng
Recent advances in the design of convolutional neural networks have shown that performance can be enhanced by improving the ability to represent multi-scale features. However, most existing methods either focus on designing more sophisticated attention modules, which leads to higher computational costs, or fail to effectively establish long-range channel dependencies, or neglect the extraction and utilization of structural information. This work introduces a novel module, the Multi-Branch Concatenation (MBC), designed to process input tensors and extract multi-scale feature maps. The MBC module introduces new degrees of freedom (DoF) in the design of attention networks by allowing for flexible adjustments to the types of transformation operators and the number of branches. This study considers two key transformation operators: multiplexing and splitting, both of which facilitate a more granular representation of multi-scale features and enhance the receptive field range. By integrating the MBC with an attention module, a Multi-Branch Attention (MBA) module is developed to capture channel-wise interactions within feature maps, thereby establishing long-range channel dependencies. Replacing the 3×3 convolutions in the bottleneck blocks of ResNet with the proposed MBA yields a new block, the Efficient Multi-Branch Attention (EMBA), which can be seamlessly integrated into state-of-the-art backbone CNN models. Furthermore, a new backbone network, named EMBANet, is constructed by stacking EMBA blocks. The proposed EMBANet has been thoroughly evaluated across various computer vision tasks, including classification, detection, and segmentation, consistently demonstrating superior performance compared to popular backbones.
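The split-transform-concatenate idea behind the MBC module can be sketched in a few lines. This is a shape-level illustration only, assuming a channels-first feature map and placeholder identity transforms; in the actual module each branch would apply its own multi-scale operator (e.g. convolutions of different kernel sizes).

```python
import numpy as np

def multi_branch_concat(x, n_branches=4, ops=None):
    """Sketch of a multi-branch split/concat over the channel axis.

    x: feature map of shape (C, H, W), with C divisible by n_branches.
    ops: one transform per branch; identity placeholders stand in for
         the per-branch multi-scale convolutions (assumed).
    """
    branches = np.split(x, n_branches, axis=0)   # the 'splitting' operator
    if ops is None:
        ops = [lambda b: b] * n_branches         # hypothetical per-branch transforms
    outs = [op(b) for op, b in zip(ops, branches)]
    return np.concatenate(outs, axis=0)          # channel-wise concatenation

x = np.arange(8 * 4 * 4, dtype=float).reshape(8, 4, 4)
y = multi_branch_concat(x)
```

Because splitting and concatenation are inverse operations on the channel axis, the output shape matches the input, which is what lets such a block drop into an existing backbone in place of a plain convolution.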
Citations: 0
Neural-network-based accelerated safe Q-learning for optimal control of discrete-time nonlinear systems with state constraints
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-10 | DOI: 10.1016/j.neunet.2025.107249
Mingming Zhao, Ding Wang, Junfei Qiao
For unknown nonlinear systems with state constraints, it is difficult to achieve safe optimal control using Q-learning methods based on traditional quadratic utility functions. To solve this problem, this article proposes an accelerated safe Q-learning (SQL) technique that addresses the concurrent requirements of safety and optimality for discrete-time nonlinear systems within an integrated framework. First, an adjustable control barrier function is designed and integrated into the cost function, aiming to facilitate the transformation of constrained optimal control problems into unconstrained ones. The augmented cost function is closely linked to the next state, enabling quicker deviation of the state from constraint boundaries. Second, leveraging offline data that adheres to safety constraints, we introduce an off-policy value iteration SQL approach for searching for a safe optimal policy, thus mitigating the risk of unsafe interactions that may result from suboptimal iterative policies. Third, the vast amounts of offline data and the complex augmented cost function can hinder the learning speed of the algorithm. To address this issue, we integrate historical iteration information into the current iteration step to accelerate policy evaluation, and introduce the Nesterov momentum technique to expedite policy improvement. Additionally, the theoretical analysis demonstrates the convergence, optimality, and safety of the SQL algorithm. Finally, under the influence of different parameters, simulation outcomes of two nonlinear systems with state constraints reveal the efficacy and advantages of the accelerated SQL approach. The proposed method requires fewer iterations while enabling the system state to converge to the equilibrium point more rapidly.
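The barrier-augmented cost described above can be illustrated for a scalar system. This is a sketch using a simple log-type barrier, not the paper's adjustable control barrier function; the weights `Q`, `R`, `mu` and the bound `x_max` are hypothetical.

```python
import numpy as np

def barrier_cost(x, u, x_max=1.0, Q=1.0, R=0.1, mu=0.05):
    """Quadratic utility augmented with a log-type barrier term.

    The barrier grows without bound as |x| approaches the state
    constraint x_max, steering the learned policy away from the
    boundary; inside the safe set it adds only a small penalty.
    """
    quad = Q * x**2 + R * u**2       # traditional quadratic utility
    h = x_max**2 - x**2              # h > 0 inside the safe set
    if h <= 0:
        return np.inf                # constraint violated
    barrier = -mu * np.log(h / x_max**2)
    return quad + barrier
```

At the origin the barrier contributes nothing, so the augmented cost still attains its minimum at the equilibrium; near the bound it dominates the quadratic term, which is what converts the constrained problem into an unconstrained one.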
Citations: 0
Intervening on few-shot object detection based on the front-door criterion
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-10 | DOI: 10.1016/j.neunet.2025.107251
Yanan Zhang , Jiangmeng Li , Qirui Ji , Kai Li , Lixiang Liu , Changwen Zheng , Wenwen Qiang
Most few-shot object detection methods aim to utilize the generalizable knowledge learned from base categories to identify instances of novel categories. The fundamental assumption of these approaches is that the model can acquire sufficient transferable knowledge through the learning of base categories. However, our motivating experiments reveal that the model overfits to the data of the base categories. To discuss the impact of this phenomenon on detection from a causal perspective, we develop a Structural Causal Model involving two key variables: causal generative factors and spurious generative factors. Both variables are derived from the base categories. Generative factors are latent variables or features that are used to control image generation. Causal generative factors are general generative factors that directly influence the generation process, while spurious generative factors are specific to certain categories, specifically the base categories in the problem we are analyzing. We recognize that the essence of few-shot object detection methods lies in modeling the statistical dependence between novel object instances and their corresponding categories determined by the causal generative factors, while the set of spurious generative factors serves as a confounder in the modeling process. To mitigate the misleading impact of the spurious generative factors, we propose the Front-door Regulator guided by the front-door criterion. The Front-door Regulator consists of two plug-and-play regularization terms, namely Semantic Grouping and Semantic Decoupling. We substantiate the effectiveness of our proposed method through experiments conducted on multiple benchmark datasets.
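The front-door criterion the regulator is built on has a standard discrete form, which can be computed directly. This numeric sketch shows the adjustment formula itself, not the paper's regularizers; the array shapes and the example distributions are assumptions.

```python
import numpy as np

def front_door_adjustment(p_m_given_x, p_y_given_mx, p_x):
    """Discrete front-door formula:

        P(y | do(x)) = sum_m P(m|x) * sum_x' P(y|m,x') P(x')

    Shapes (assumed): p_m_given_x (X, M), p_y_given_mx (M, X, Y), p_x (X,).
    The mediator m screens off x from y, so the confounder between x and y
    can be averaged out without ever being observed.
    """
    inner = np.einsum('mxy,x->my', p_y_given_mx, p_x)  # average out confounded x'
    return np.einsum('xm,my->xy', p_m_given_x, inner)  # mix over the mediator m

# Toy binary example: 2 treatments, 2 mediator values, 2 outcomes.
p_x = np.array([0.6, 0.4])
p_m_given_x = np.array([[0.9, 0.1],
                        [0.2, 0.8]])
p_y_given_mx = np.array([[[0.7, 0.3], [0.5, 0.5]],
                         [[0.4, 0.6], [0.1, 0.9]]])
p_y_do_x = front_door_adjustment(p_m_given_x, p_y_given_mx, p_x)
```

Each row of the result is a proper distribution over y, one per intervened value of x.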
Enhancing spatial perception and contextual understanding for 3D dense captioning
IF 6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-02-10 DOI: 10.1016/j.neunet.2025.107252
Jie Yan , Yuxiang Xie , Shiwei Zou , Yingmei Wei , Xidao Luan
3D dense captioning (3D-DC) transcends traditional 2D image captioning by requiring detailed spatial understanding and object localization, aiming to generate high-quality descriptions for objects within 3D environments. Current approaches struggle to accurately describe the spatial relationships among objects and suffer from discrepancies between object detection and caption generation. To address these limitations, we introduce a novel one-stage 3D-DC model that integrates a Query-Guided Detector and a Task-Specific Context-Aware Captioner to enhance the performance of 3D-DC. The Query-Guided Detector employs an adaptive query mechanism and leverages the Transformer architecture to dynamically adjust attention focus across layers, improving the model’s comprehension of spatial relationships within point clouds. Additionally, the Task-Specific Context-Aware Captioner incorporates task-specific context-aware prompts and a Squeeze-and-Excitation (SE) module to improve contextual understanding and ensure consistency and accuracy between detected objects and their descriptions. A two-stage learning rate update strategy is proposed to optimize the training of the Query-Guided Detector. Extensive experiments on the ScanRefer and Nr3D datasets demonstrate the superiority of our approach, outperforming previous two-stage ‘detect-then-describe’ methods and existing one-stage methods, particularly on the challenging Nr3D dataset.
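The abstract mentions a two-stage learning rate update strategy for the detector without specifying its form. As a minimal sketch of the general idea, assuming a constant first stage followed by a decayed second stage (the stage boundary and decay factor below are illustrative placeholders, not values from the paper):

```python
def two_stage_lr(epoch: int, base_lr: float = 1e-4,
                 stage1_epochs: int = 50, decay: float = 0.1) -> float:
    """Return the learning rate for a given epoch under a simple
    two-stage schedule: a constant warm stage, then a decayed stage.
    All hyperparameter values here are hypothetical."""
    if epoch < stage1_epochs:
        return base_lr          # stage 1: train at the full base rate
    return base_lr * decay      # stage 2: continue at a reduced rate
```

In practice such a schedule would be queried once per epoch (or per step) and the returned value written into the optimizer's parameter groups.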
Augmenting interaction effects in convolutional networks with taylor polynomial gated units
IF 6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-02-08 DOI: 10.1016/j.neunet.2025.107262
Ligeng Zou , Qi Liu , Jianhua Dai
Transformer-based vision models are often assumed to have an advantage over traditional convolutional neural networks (CNNs) due to their ability to model long-range dependencies and interactions between inputs. However, the remarkable success of pure convolutional models such as ConvNeXt, which incorporates architectural elements from Vision Transformers (ViTs), challenges the prevailing assumption about the intrinsic superiority of Transformers. In this work, we aim to explore an alternative path to efficiently express interactions between inputs without an attention module by delving into the interaction effects in ConvNeXt. This exploration leads to the proposal of a new activation function, the Taylor Polynomial Gated Unit (TPGU). The TPGU substitutes the cumulative distribution function in the Gaussian Error Linear Unit (GELU) with a learnable Taylor polynomial, so that it can not only flexibly adjust the strength of each order of interactions but also requires no additional normalization or regularization of the input and output. Comprehensive experiments demonstrate that swapping out GELUs for TPGUs notably boosts model performance under identical training settings. Moreover, empirical evidence highlights the particularly favorable impact of the TPGU on pure convolutional networks: it improves the performance of ConvNeXt-T by 0.7% on ImageNet-1K. Our findings encourage revisiting the potential utility of polynomials within contemporary neural network architectures. The code for our implementation has been made publicly available at https://github.com/LQandlq/tpgu.
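The core substitution is easy to state: GELU computes x · Φ(x), where Φ is the standard normal CDF, and the TPGU replaces Φ with a learnable polynomial P(x) = Σₖ cₖ xᵏ, giving x · P(x). The sketch below illustrates this functional form only; the polynomial order, initialization, and whether coefficients are shared or per-channel are assumptions here, not details taken from the abstract (the authors' code at the linked repository is the authoritative reference):

```python
import math

def tpgu(x: float, coeffs: list) -> float:
    """TPGU sketch: GELU is x * Phi(x); the TPGU replaces Phi with a
    learnable Taylor polynomial P(x) = sum_k coeffs[k] * x**k, so the
    activation is x * P(x). In a real network `coeffs` would be trained
    parameters; here they are plain floats for illustration."""
    p = sum(c * x**k for k, c in enumerate(coeffs))
    return x * p

# One plausible initialization: match the first-order Taylor expansion
# of the normal CDF, Phi(x) ~ 0.5 + x / sqrt(2*pi), so the TPGU starts
# out close to GELU near zero. This choice is an assumption, not the
# paper's documented initialization.
gelu_like_init = [0.5, 1.0 / math.sqrt(2.0 * math.pi)]
```

Because the gate is a polynomial, the coefficient cₖ directly scales the (k+1)-th-order interaction term x^(k+1), which is the sense in which the abstract says each order of interaction can be adjusted independently.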