Lifelong knowledge graph embedding via diffusion model
Pub Date: 2026-01-21 | DOI: 10.1016/j.neunet.2026.108630
Deyu Chen, Caicai Guo, Qiyuan Li, Jinguang Gu, Meiyi Xie, Hong Zhu
Lifelong knowledge graph embedding (KGE) methods aim to learn new knowledge continuously while retaining old knowledge. This line of work has received much attention for its potential to enable knowledge retention and transfer and to reduce training costs as knowledge graphs grow in scale and flexibility. However, embedding space drift under different contexts is a crucial cause of catastrophic forgetting and inefficient learning of new facts, and existing work ignores this perspective. To address these issues, we propose a novel lifelong KGE framework that treats learning new facts and preserving old facts from a unified perspective. We propose a diffusion-based embedding method that captures the contextual variation of entity representations and obtains transferable embeddings. To handle the drift of the embedding space and balance learning efficiency, we adopt a reconstruction and generation strategy based on contrastive learning. To avoid catastrophic forgetting and maintain the stability of the embedding distribution, we propose an effective distribution regularization method. We conduct extensive experiments on seven benchmark datasets with different construction strategies and incremental speeds. Experimental results show that our proposed framework outperforms existing lifelong KGE methods.
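To make the distribution-stability idea concrete, here is a minimal PyTorch sketch of one plausible form of such a regularizer: it penalizes drift between the old and new entity-embedding distributions by matching their first and second moments. The function name and the moment-matching choice are illustrative assumptions; the abstract does not specify the paper's actual regularizer.

```python
# Hypothetical sketch: penalize drift of the entity-embedding distribution
# between snapshots by matching means and (diagonal) variances. This is an
# assumed form, not the paper's actual regularizer.
import torch

def distribution_drift_penalty(old_emb: torch.Tensor, new_emb: torch.Tensor) -> torch.Tensor:
    """Match mean and per-dimension variance of old vs. new embedding matrices."""
    mean_gap = (old_emb.mean(dim=0) - new_emb.mean(dim=0)).pow(2).sum()
    var_gap = (old_emb.var(dim=0) - new_emb.var(dim=0)).pow(2).sum()
    return mean_gap + var_gap

old = torch.randn(1000, 128)               # embeddings from earlier snapshots
new = old + 0.05 * torch.randn_like(old)   # embeddings after training on new facts
print(distribution_drift_penalty(old, new).item())
```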
{"title":"Lifelong knowledge graph embedding via diffusion model","authors":"Deyu Chen , Caicai Guo , Qiyuan Li , Jinguang Gu , Meiyi Xie , Hong Zhu","doi":"10.1016/j.neunet.2026.108630","DOIUrl":"10.1016/j.neunet.2026.108630","url":null,"abstract":"<div><div>Lifelong knowledge graph embedding (KGE) methods aim to learn new knowledge continuously while retaining old knowledge. This line of work has received much attention for its potential to enable knowledge retention and transfer and to reduce training costs under knowledge graphs’ growing scale and flexibility. However, embedding space drift under different contexts is a crucial reason for catastrophic forgetting and inefficient learning of new facts, and existing work ignores this perspective. In order to address the above issues, we proposed a novel lifelong KGE framework that considers learning new facts and preserving old facts in a unified perspective. We propose a diffusion-based embedding method that captures the contextual variation of entity representations and obtains transferable embeddings. In order to handle the drift of the embedding space and balance the learning efficiency, we adopt a reconstruction and generation strategy based on contrastive learning. To avoid catastrophic forgetting and maintain the stability of the embedding distribution, we proposed an effective distribution regularization method. We conduct extensive experiments on seven benchmark datasets with different construction strategies and incremental speed. Experimental results show that our proposed framework outperforms existing lifelong KGE methods.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108630"},"PeriodicalIF":6.3,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FluidFormer: Transformer with continuous convolution for particle-based fluid simulation
Pub Date: 2026-01-21 | DOI: 10.1016/j.neunet.2026.108631
Nianyi Wang, Shuai Zheng, Yu Chen, Hai Zhao, Zhou Fang
Learning-based fluid simulation has emerged as an efficient alternative to traditional Navier-Stokes solvers. However, existing neural methods that build upon Smoothed Particle Hydrodynamics (SPH) predominantly rely on local particle interactions, which induces instability in complex scenarios due to error accumulation. To address this, we introduce FluidFormer, a novel architecture that establishes a hierarchical local-global modeling paradigm. The core of our model is the Fluid Attention Block (FAB), a co-design that couples continuous convolution for local interactions with self-attention that provides global corrections for long-range hydrodynamic phenomena. Embedded in a dual-pipeline network, our approach seamlessly fuses physical inductive biases with structured global reasoning. Extensive experiments show that FluidFormer achieves state-of-the-art performance, with significantly improved stability and generalization in challenging fluid scenes, demonstrating its potential as a robust simulator for complex physical systems.
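As an illustration of the local-global co-design described for the Fluid Attention Block, the sketch below pairs an MLP-parameterized continuous convolution over radius neighbors with multi-head self-attention over all particles. The module and parameter names (LocalGlobalParticleBlock, radius, fuse) are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical local+global particle block in the spirit of the described FAB:
# a position-conditioned ("continuous") convolution over radius neighbors,
# fused with global self-attention over all particles. Illustrative only.
import torch
import torch.nn as nn

class LocalGlobalParticleBlock(nn.Module):
    def __init__(self, dim: int, radius: float = 0.1, heads: int = 4):
        super().__init__()
        self.radius = radius
        self.kernel = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, pos: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        # pos: (N, 3) particle positions; feat: (N, dim) particle features
        rel = pos[None, :, :] - pos[:, None, :]               # (N, N, 3) offsets
        mask = rel.norm(dim=-1) < self.radius                 # neighbor mask
        w = self.kernel(rel) * mask[..., None]                # position-dependent weights
        local = (w * feat[None, :, :]).sum(dim=1) / mask.sum(dim=1, keepdim=True).clamp(min=1)
        glob, _ = self.attn(feat[None], feat[None], feat[None])  # global self-attention
        return self.fuse(torch.cat([local, glob[0]], dim=-1))

block = LocalGlobalParticleBlock(dim=64)
print(block(torch.rand(256, 3), torch.randn(256, 64)).shape)  # torch.Size([256, 64])
```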
{"title":"FluidFormer : Transformer with continuous convolution for particle-based fluid simulation","authors":"Nianyi Wang, Shuai Zheng, Yu Chen, Hai Zhao, Zhou Fang","doi":"10.1016/j.neunet.2026.108631","DOIUrl":"10.1016/j.neunet.2026.108631","url":null,"abstract":"<div><div>Learning-based fluid simulation has emerged as an efficient alternative to traditional Navier-Stokes solvers. However, existing neural methods that build upon Smoothed Particle Hydrodynamics (SPH) predominantly rely on local particle interactions, which induces instability in complex scenarios due to error accumulation. To address this, we introduce FluidFormer, a novel architecture that establishes a hierarchical local-global modeling paradigm. The core of our model is the Fluid Attention Block (FAB), a co-design that orchestrates continuous convolution for locality with self-attention for global corrective long-range hydrodynamic phenomena. Embedded in a dual-pipeline network, our approach seamlessly fuses inductive physical biases with structured global reasoning. Extensive experiments show that FluidFormer achieves state-of-the-art performance, with significantly improved stability and generalization in challenging fluid scenes, demonstrating its potential as a robust simulator for complex physical systems.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108631"},"PeriodicalIF":6.3,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Relation-aware pre-trained network with hierarchical aggregation mechanism for cold-start drug recommendation
Pub Date: 2026-01-21 | DOI: 10.1016/j.neunet.2026.108618
Xiaobo Li, Xiaodi Hou, Shilong Wang, Hongfei Lin, Yijia Zhang
Drug recommendation systems have garnered considerable interest in healthcare, striving to offer precise and customized drug prescriptions that align with patients' specific health needs. However, existing methods primarily focus on modeling temporal dependencies between visits for patients with multiple encounters, often neglecting the challenge of data sparsity in single-visit patients. To address this limitation, we propose a novel Relation-aware Pre-trained Network with a hierarchical aggregation mechanism for drug recommendation (RPNet), which employs a pre-training and fine-tuning framework to enhance drug recommendation in cold-start scenarios. Specifically, we introduce: 1) a code matching discrimination task during pre-training, designed to model the complex relationships between diagnosis and procedure entities; this task employs a mask-replace contrastive learning strategy, which pulls similar samples closer while pushing dissimilar ones apart, thereby capturing robust feature representations; 2) a hierarchical aggregation mechanism that enhances drug information integration by first selecting relevant visits based on rarity discrimination and then retrieving similar patients' drug insights via similarity matching during fine-tuning. Extensive experiments on two real-world datasets demonstrate the superiority of the proposed RPNet, notably improving the F1 metric by 1.32% and 1.19%. The code of our model is available at https://github.com/Lxb0102/RPNet.
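The mask-replace contrastive strategy pulls matched pairs together and pushes mismatched ones apart; an InfoNCE-style loss is one standard way to realize this. Below is a minimal sketch under the assumption that row i of the diagnosis batch matches row i of the procedure batch; info_nce and tau are illustrative names, not the paper's API.

```python
# Hypothetical InfoNCE-style objective for the code matching discrimination
# task: matched diagnosis/procedure pairs are positives (diagonal), all other
# pairings in the batch act as negatives. Illustrative, not the authors' code.
import torch
import torch.nn.functional as F

def info_nce(diag_repr: torch.Tensor, proc_repr: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Row i of diag_repr matches row i of proc_repr; other rows are negatives."""
    diag = F.normalize(diag_repr, dim=-1)
    proc = F.normalize(proc_repr, dim=-1)
    logits = diag @ proc.t() / tau           # (B, B) similarity matrix
    targets = torch.arange(diag.size(0))     # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
print(loss.item())
```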
{"title":"Relation-aware pre-trained network with hierarchical aggregation mechanism for cold-start drug recommendation","authors":"Xiaobo Li , Xiaodi Hou , Shilong Wang , Hongfei Lin , Yijia Zhang","doi":"10.1016/j.neunet.2026.108618","DOIUrl":"10.1016/j.neunet.2026.108618","url":null,"abstract":"<div><div>Drug recommendation systems have garnered considerable interest in the healthcare, striving to offer precise and customized drug prescriptions that align with patients’ specific health needs. However, existing methods primarily focus on modeling temporal dependencies between visits for patients with multiple encounters, often neglecting the challenge of data sparsity in single-visit patients. To address above limitation, we propose a novel Relation-aware Pre-trained Network with hierarchical aggregation mechanism for drug recommendation (RPNet), which employs a pre-training and fine-tuning framework to enhance drug recommendation in cold-start scenario. Specifically, we introduce: 1) A code matching discrimination task during pre-training, designed to model the complex relationships between diagnosis and procedure entities. This task employs a mask-replace contrastive learning strategy, which pulls similar samples closer while pushing dissimilar ones apart, thereby capturing robust feature representations; 2) A hierarchical aggregation mechanism that enhances drug information integration by first selecting relevant visits based on rarity discrimination and then retrieving similar patients’ drug insights via similarity matching during fine-tuning. Extensive experiments on two real-world datasets demonstrate the superiority of the proposed RPNet, notably improving the F1 metric by 1.32% and 1.19%. The code of our model is available at <span><span>https://github.com/Lxb0102/RPNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108618"},"PeriodicalIF":6.3,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146079254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MMFormer: Multi-Modality semi-Supervised vision transformer in remote sensing imagery classification
Pub Date: 2026-01-21 | DOI: 10.1016/j.neunet.2026.108628
Daixun Li, Weiying Xie, Leyuan Fang, Yunke Wang, Zirui Li, Mingxiang Cao, Jitao Ma, Yunsong Li, Chang Xu
Significant progress has been made in the application of transformer architectures to multimodal tasks. However, current methods, such as those built on the self-attention mechanism, rarely consider the benefits that feature complementarity and consistency between different modalities bring to fusion, leading to obstacles such as redundant fusion or incomplete representation. Inspired by topological homology groups, we introduce MMFormer, a novel semi-supervised algorithm for high-dimensional multimodal fusion. This method is engineered to capture comprehensive representations by enhancing the interactivity between modal mappings. Specifically, we enforce representational consistency between these heterogeneous representations through a complete dictionary lookup and a homology space in the encoder, and establish an exclusivity-aware mapping of the two modalities to emphasize their complementary information, serving as a powerful supplement for multimodal feature interpretation. Moreover, the model alleviates the challenge of sparse annotations in high-dimensional multimodal data by introducing a consistency joint regularization term. We formulate these components into a unified end-to-end optimization framework and are the first to explore the application of semi-supervised vision transformers to high-dimensional multimodal data fusion. Extensive experiments across three benchmarks demonstrate the superiority of MMFormer. Specifically, the model improves overall accuracy by 3.12% on Houston2013, 1.86% on Augsburg, and 1.66% on MUUFL compared with the strongest existing methods, confirming its robustness and effectiveness under sparse annotation conditions. The code is available at https://github.com/LDXDU/MMFormer.
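One common way to implement a consistency regularization term of the kind mentioned above is a symmetric KL penalty that encourages the two modality branches to agree on unlabeled inputs. The sketch below is an assumed illustration of that general pattern, not the paper's exact term.

```python
# Hypothetical consistency regularizer for unlabeled multimodal pairs: the
# class distributions predicted by the two modality branches are pulled
# together with a symmetric KL penalty. Illustrative assumption only.
import torch
import torch.nn.functional as F

def consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    pa = F.log_softmax(logits_a, dim=-1)
    pb = F.log_softmax(logits_b, dim=-1)
    kl_ab = F.kl_div(pa, pb.exp(), reduction="batchmean")  # KL(b || a)
    kl_ba = F.kl_div(pb, pa.exp(), reduction="batchmean")  # KL(a || b)
    return 0.5 * (kl_ab + kl_ba)

loss = consistency_loss(torch.randn(16, 15), torch.randn(16, 15))
print(loss.item())
```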
{"title":"MMFormer: Multi-Modality semi-Supervised vision transformer in remote sensing imagery classification","authors":"Daixun Li , Weiying Xie , Leyuan Fang , Yunke Wang , Zirui Li , Mingxiang Cao , Jitao Ma , Yunsong Li , Chang Xu","doi":"10.1016/j.neunet.2026.108628","DOIUrl":"10.1016/j.neunet.2026.108628","url":null,"abstract":"<div><div>Significant progress has been made in the application of transformer architectures for multimodal tasks. However, current methods such as the self-attention mechanism rarely consider the benefits that feature complementarity and consistency between different modalities bring to fusion, leading to obstacles such as redundant fusion or incomplete representation. Inspired by topological homology groups, we introduce MMFormer, a novel semi-supervised algorithm for high-dimensional multimodal fusion. This method is engineered to capture comprehensive representations by enhancing the interactivity between modal mappings. Specifically, we advocate for the representational consistency between these heterogeneous representations through a complete dictionary lookup and homology space in the encoder, and establish an exclusivity-aware mapping of the two modalities to emphasize their complementary information, serving as a powerful supplement for multimodal feature interpretation. Moreover, the model attempts to alleviate the challenge of sparse annotations in high-dimensional multimodal data by introducing a consistency joint regularization term. We have formulated these focuses into a unified end-to-end optimization framework and are the first to explore and derive the application of semi-supervised visual transformers in high-dimensional multimodal data fusion. Extensive experiments across three benchmarks demonstrate the superiority of MMFormer. Specifically, the model improves overall accuracy by 3.12% on Houston2013, 1.86% on Augsburg, and 1.66% on MUUFL compared with the strongest existing methods, confirming its robustness and effectiveness under sparse annotation conditions. The code is available at <span><span>https://github.com/LDXDU/MMFormer</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108628"},"PeriodicalIF":6.3,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146079261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PHoM: Effective pan-sharpening via higher-order state-space model
Pub Date: 2026-01-21 | DOI: 10.1016/j.neunet.2026.108616
Penglian Gao, Hongwei Ge, Shuzhi Su
Pan-sharpening aims to generate high-resolution multi-spectral images from pairs of low-resolution multi-spectral and high-resolution panchromatic images. Recently, Mamba-based pan-sharpening models have achieved state-of-the-art performance due to their efficient long-range relational modeling. However, Mamba inherently realizes a first-order state-space high-dimensional nonlinear mapping, which fails to efficiently encode higher-order expressive interactions among spectral features. In this study, we propose a novel higher-order state-space model for pan-sharpening (PHoM). Our PHoM follows the concept of splitting, interaction, and aggregation for higher-order spatially adaptive interaction and discriminative learning without introducing excessive computational overhead. To model the fusion process between multi-spectral and panchromatic images, we further extend PHoM into a cross-modal PHoM, which improves the representation capability by exploiting higher-order cross-modal correlations. We conduct extensive experiments on different datasets. Experimental results show that our method achieves significant performance improvements, outperforming previous state-of-the-art methods on public datasets.
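The splitting-interaction-aggregation concept can be illustrated with a gated, recursive channel-group design in which each group multiplicatively modulates the next, raising the interaction order with little extra computation. The sketch below (names HigherOrderInteraction and order are hypothetical) shows this general pattern, not PHoM's actual layer.

```python
# Hypothetical split-interact-aggregate block: channels are split into groups
# and combined by recursive elementwise products, so the output contains
# order-k feature interactions. Illustrative assumption, not PHoM itself.
import torch
import torch.nn as nn

class HigherOrderInteraction(nn.Module):
    def __init__(self, dim: int, order: int = 3):
        super().__init__()
        self.order = order
        self.split = nn.Conv2d(dim, dim * order, kernel_size=1)             # split
        self.proj = nn.ModuleList(nn.Conv2d(dim, dim, 1) for _ in range(order - 1))
        self.out = nn.Conv2d(dim, dim, kernel_size=1)                       # aggregate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        parts = self.split(x).chunk(self.order, dim=1)
        h = parts[0]
        for k in range(1, self.order):                                      # interact
            h = parts[k] * self.proj[k - 1](h)                              # raises order by 1
        return self.out(h)

m = HigherOrderInteraction(dim=32)
print(m(torch.randn(2, 32, 16, 16)).shape)  # torch.Size([2, 32, 16, 16])
```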
{"title":"PHoM: Effective pan-sharpening via higher-order state-space model","authors":"Penglian Gao , Hongwei Ge , Shuzhi Su","doi":"10.1016/j.neunet.2026.108616","DOIUrl":"10.1016/j.neunet.2026.108616","url":null,"abstract":"<div><div>Pan-sharpening is intended to generate high-resolution multi-spectral images, utilizing pairs of low-resolution multi-spectral and high-resolution panchromatic images. Recently, the Mamba-based pan-sharpening models achieve state-of-the-art performance due to their efficient long-range relational modeling. However, Mamba inherently obeys a first-order state-space high-dimensional nonlinear mapping, which fails to efficiently encode higher-order expressive interactions of spectral features. In this study, we propose a novel higher-order state-space model for pan-sharpening (PHoM). Our PHoM follows the concept of splitting, interaction, and aggregation for higher-order spatial adaptive interaction and discriminative learning without introducing excessive computational overhead. To model the fusion process between multi-spectral and panchromatic images, we further extend the PHoM into a cross-modal PHoM, which further improves the representation capability by exploiting higher-order cross-modal correlations. We conduct extensive experiments on different datasets. Experimental results show that our method achieves significant performance improvements, outperforming previous state-of-the-art methods on public datasets.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108616"},"PeriodicalIF":6.3,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146079253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-category spatiotemporal consensus and discriminative networks for weakly-supervised temporal action localization
Pub Date: 2026-01-21 | DOI: 10.1016/j.neunet.2026.108627
Kunlun Wu, Donghai Zhai
Weakly-supervised temporal action localization is a practical task that localizes different action instances in untrimmed videos without frame-level annotations. Current approaches either enhance the discriminative features of action snippets to reduce confusion with the background, or focus on less informative snippets to guide the model to explore non-salient regions. However, they seldom explicitly consider similar sub-processes across different actions, known as cross-category consensus relationships, which can provide complementary information for obtaining more comprehensive localization results. Moreover, previous methods have mostly overlooked class-level higher-order dynamics, which can provide finer-grained motion relationships to help the model capture subtle discriminative features. To alleviate these problems, we investigate a simple yet effective method termed the STCD network, which leverages superclass-level semantics and higher-order dynamics for spatiotemporal consensus and discriminative learning. Specifically, we leverage a high-order encoding module based on Koopman theory to explicitly explore discriminative class-wise dynamics. Meanwhile, we adopt superclass-level semantics to capture the consensus relationships among various actions, since the similar sub-actions shared by diverse categories are essential for mining more comprehensive action snippets. Finally, we argue that snippets with high entropy in their category distribution typically exhibit significant uncertainty and possess ambiguous representations in the feature space. From the perspective of information theory, we propose an effective loss function to further enhance the discriminative features of each action snippet, i.e., selecting the Top-k categories with the highest predicted probability for each snippet and reducing the uncertainty by minimizing their information entropy. Experimental results on three datasets, i.e., THUMOS14, ActivityNet v1.2 and ActivityNet v1.3, demonstrate that our method is superior to the state-of-the-art.
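The Top-k entropy loss described at the end of the abstract is concrete enough to sketch directly: keep the k largest class probabilities per snippet, renormalize them, and minimize their Shannon entropy. Function and argument names below are illustrative, not the authors' implementation.

```python
# Sketch of a Top-k entropy minimization loss matching the description above:
# sharpening each snippet's renormalized Top-k class distribution reduces its
# uncertainty. The exact weighting/selection in STCD may differ.
import torch

def topk_entropy_loss(probs: torch.Tensor, k: int = 5, eps: float = 1e-8) -> torch.Tensor:
    """probs: (num_snippets, num_classes) softmax outputs."""
    topk, _ = probs.topk(k, dim=-1)
    topk = topk / topk.sum(dim=-1, keepdim=True)        # renormalize over Top-k
    entropy = -(topk * (topk + eps).log()).sum(dim=-1)  # per-snippet entropy
    return entropy.mean()

p = torch.softmax(torch.randn(100, 20), dim=-1)
print(topk_entropy_loss(p).item())
```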
{"title":"Cross-category spatiotemporal consensus and discriminative networks for weakly-supervised temporal action localization","authors":"Kunlun Wu, Donghai Zhai","doi":"10.1016/j.neunet.2026.108627","DOIUrl":"10.1016/j.neunet.2026.108627","url":null,"abstract":"<div><div>Weakly-supervised temporal action localization is a practical task that localizes different action instances from untrimmed videos without frame-level annotations. Current approaches either enhance the discriminative features of action snippets to reduce confusion with the background, or focus on less informative snippets to guide the model to explore non-salient regions. However, they seldom explicitly consider similar sub-processes across different actions, known as cross-category consensus relationships, which can provide complementary information for exploring the more comprehensive localization results. Moreover, previous methods mostly overlooked class-level higher-order dynamics, which can provide finer-grained motion relationships to help the model capture subtle discriminative features. To alleviate the above problems, we investigate a simple yet effective method termed the STCD network, which leverages superclass-level semantics and high-order dynamics for spatiotemporal consensus and discriminative learning. Specifically, we leverage the high-order encoding module based on <em>Koopman</em> theory to explicitly explore the discriminative class-wise dynamics. Meanwhile, we adopt superclass-level semantics to capture the consensus relationships among various actions due to the similar sub-actions of diverse categories are essential to mine more comprehensive action snippets. Finally, we argue that snippets with high entropy in their category distribution typically demonstrate significant uncertainty and possess ambiguous representations in their feature space. From the perspective of information theory, we further propose an effective loss function to further enhance the discriminative features of each action snippet, i.e., selecting the Top-k categories with the highest predicted probability for each segment and reducing the uncertainty by minimizing their information entropy. Experimental results on three datasets, i.e., THUMOS14, ActivityNet v1.2 and ActivityNet v1.3, demonstrate that our method is superior to the state-of-the-art.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108627"},"PeriodicalIF":6.3,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CBAM-ST-GCN: An enhanced DRL-based end-to-end visual navigation framework for mobile robot
Pub Date: 2026-01-20 | DOI: 10.1016/j.neunet.2026.108622
Mingyang Xie, Wei Yu, Huanyu Jin, Wei Li, Xin Chen
Vision-based navigation for mobile robots poses significant challenges due to limited visual perception and the presence of unforeseen dynamic obstacles. Deep reinforcement learning (DRL) provides an end-to-end solution by directly mapping raw sensor data to control commands, offering high adaptability and reduced reliance on handcrafted rules. However, high-dimensional visual inputs and the non-stationarity introduced by dynamic obstacles can make DRL policy learning unstable and slow to converge. In this paper, an enhanced end-to-end visual navigation framework, denoted CBAM-ST-GCN, is proposed for mobile robots operating in dynamic environments. A convolutional block attention module (CBAM) is introduced into the framework to enhance visual perception by assigning attention weights across spatial and temporal dimensions. Furthermore, a spatio-temporal graph convolutional network (ST-GCN) is designed to capture the behavioral features of moving obstacles. In addition, a penalty term based on the velocity obstacle (VO) method is incorporated into the reward function to enhance collision avoidance. Extensive simulation results demonstrate that the proposed method achieves superior success rates and significantly faster convergence. Real-world experiments further validate the effectiveness and adaptability of the proposed approach in practical scenarios.
Adversarially robust neural network decision boundaries via tropical geometry
Pub Date: 2026-01-20 | DOI: 10.1016/j.neunet.2026.108624
Kurt Pasque, Christopher Teska, Ruriko Yoshida, Keiji Miura, Jefferson Huang
We introduce a simple, easy-to-implement, and computationally efficient tropical convolutional neural network architecture that is robust against adversarial attacks. We exploit the tropical nature of piecewise linear neural networks by embedding the data in the tropical projective torus. This can be accomplished with a single additional hidden layer called a tropical embedding layer, and can in principle be added to any neural network architecture. We study the geometry of the resulting decision boundary, and find that, like adversarial training and various regularization techniques that have been proposed, adding the tropical embedding layer tends to increase the number of linear regions associated with the decision boundaries. Our numerical experiments show that our approach achieves state-of-the-art levels of adversarial robustness, while requiring much less computational time than adversarial training.
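A tropical embedding layer can be sketched as a max-plus analogue of a linear layer: each output coordinate is max_i(x_i + w_ij), and subtracting the per-sample maximum identifies points that differ by a common additive constant, i.e., places them on the tropical projective torus. The class name and the normalization choice below are illustrative assumptions, not the paper's exact construction.

```python
# Hypothetical tropical (max-plus) embedding layer: a tropical "matrix
# product" z_j = max_i(x_i + w_ij), normalized so coordinates are defined up
# to a common additive constant (the tropical projective torus). Illustrative.
import torch
import torch.nn as nn

class TropicalEmbedding(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_dim, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, in, 1) + (in, out) broadcast -> max over the input dimension
        z = (x.unsqueeze(-1) + self.weight).amax(dim=1)   # (B, out)
        return z - z.amax(dim=-1, keepdim=True)           # normalize on the torus

layer = TropicalEmbedding(784, 64)
print(layer(torch.randn(8, 784)).shape)  # torch.Size([8, 64])
```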
{"title":"Adversarially robust neural network decision boundaries via tropical geometry","authors":"Kurt Pasque , Christopher Teska , Ruriko Yoshida , Keiji Miura , Jefferson Huang","doi":"10.1016/j.neunet.2026.108624","DOIUrl":"10.1016/j.neunet.2026.108624","url":null,"abstract":"<div><div>We introduce a simple, easy to implement, and computationally efficient tropical convolutional neural network architecture that is robust against adversarial attacks. We exploit the tropical nature of piece-wise linear neural networks by embedding the data in the <em>tropical projective torus</em>. This can be accomplished with a single additional hidden layer called a <em>tropical embedding layer</em>, and can in principle be added to any neural network architecture. We study the geometry of the resulting decision boundary, and find that like adversarial training and various regularization techniques that have been proposed, adding the tropical embedding layer tends to increase the number of linear regions associated with the decision boundaries. Our numerical experiments show that our approach achieves state-of-the-art levels of adversarial robustness, while requiring much less computational time than adversarial training.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"199 ","pages":"Article 108624"},"PeriodicalIF":6.3,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-modal feature alignment networks for multi-label image classification
Pub Date: 2026-01-20 | DOI: 10.1016/j.neunet.2026.108629
Wenlan Kuang, Zhixin Li
Multi-label image classification assigns labels to the multiple objects present in an input image. Recent research mainly focuses on enforcing semantic consistency between visual features and label features. However, since images contain complex scene content, the features captured by visual feature extraction networks based on grid or sequence representations may introduce redundant information or lack continuity when identifying irregular objects. To fully mine the visual information of complex objects in images and enhance the inter-modal interaction between images and labels, we introduce a flexible graph structure to explore the internal information of objects and design a multi-modal feature alignment (MMFA) network for multi-label image classification. To enhance the context awareness and semantic association of different patch regions, we propose a semantic-augmented interaction module that combines two kinds of visual semantic information with label embeddings for interactive learning. Finally, we refine the dependence between local intrinsic information and overall semantics by redefining semantic queries through semantically enhanced visual spatial features and graph aggregation features. Experiments on three large-scale public datasets (Microsoft COCO, Pascal VOC 2007, and NUS-WIDE) demonstrate the effectiveness of our proposed MMFA, which achieves state-of-the-art performance.
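A typical way to let label embeddings interact with visual features, in the spirit of the semantic-augmented interaction described above, is cross-attention with labels as queries over patch features. The sketch below (LabelVisualInteraction is a hypothetical name) illustrates that pattern rather than MMFA's exact module.

```python
# Hypothetical label-visual interaction via cross-attention: each label
# embedding queries the patch features to gather the visual evidence relevant
# to it, yielding per-label logits. Illustrative, not the paper's module.
import torch
import torch.nn as nn

class LabelVisualInteraction(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, 1)

    def forward(self, label_emb: torch.Tensor, patches: torch.Tensor) -> torch.Tensor:
        # label_emb: (B, L, dim), one query per label; patches: (B, P, dim)
        ctx, _ = self.attn(label_emb, patches, patches)
        return self.classifier(ctx).squeeze(-1)   # (B, L) per-label logits

m = LabelVisualInteraction(dim=256)
logits = m(torch.randn(2, 80, 256), torch.randn(2, 196, 256))
print(logits.shape)  # torch.Size([2, 80])
```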
{"title":"Multi-modal feature alignment networks for multi-label image classification","authors":"Wenlan Kuang , Zhixin Li","doi":"10.1016/j.neunet.2026.108629","DOIUrl":"10.1016/j.neunet.2026.108629","url":null,"abstract":"<div><div>Multi-label image classification is a classification task that assigns labels to multiple objects in an input image. Recent research ideas mainly focus on solving the semantic consistency of visual features and label features. However, since images contain complex scene content, the features captured by visual feature extraction networks based on grid or sequence representation may introduce redundant information or lack continuity when identifying irregular objects. In order to fully mine the visual information of complex objects in images and enhance the inter-modal interaction of images and labels, we introduce a flexible graph structure to explore the internal information of objects and design a multi-modal feature alignment (MMFA) network for multi-label image classification. To enhance the context awareness and semantic association of different patch regions, we propose a semantic-augmented interaction module that combines two kinds of visual semantic information with label embeddings for interactive learning. Finally, we refine the dependence between local intrinsic information and overall semantics by redefining semantic queries through semantically enhanced visual spatial features and graph aggregation features. Experiments on three large-scale public datasets: Microsoft COCO, Pascal VOC 2007 and NUS-WIDE demonstrate the effectiveness of our proposed MMFA and achieve state-of-the-art performance.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108629"},"PeriodicalIF":6.3,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rotation equivariant quantum graph neural networks with trainable compression encoder and entanglement-enhanced aggregation
Pub Date: 2026-01-20 | DOI: 10.1016/j.neunet.2026.108625
Wenjie Liu, Bohan Du, Weiwei Liu, Yifan Zhu
The integration of symmetry, such as permutation equivariance, into Quantum Graph Neural Networks (QGNNs), referred to as Equivariant Quantum Graph Neural Networks (EQGNNs), markedly improves the model's generalization performance on graph-structured data. Despite this advancement, current research has not yet extended rotational equivariance to QGNN frameworks. Furthermore, processing large-scale graph data increases computational complexity due to numerous inter-node connections, significantly raising the required number of qubits. To address these challenges, a novel Rotationally Equivariant Quantum Graph Neural Network (REQGNN) with a trainable compression encoder and an entanglement-enhanced aggregation mechanism is proposed. By adopting quantum fidelity as the evaluation metric, we design a quantum autoencoder to effectively compress feature dimensionality, substantially lowering the qubit requirements of the model while preserving essential global structural details. To achieve rotational equivariance, we propose an entanglement-enhanced layer that incorporates distance and angle information between nodes. This layer performs entanglement by extracting diverse edge information, thereby further refining edge feature extraction. Additionally, an auxiliary entanglement layer is introduced to mitigate the over-smoothing issue. Experimental results demonstrate that REQGNN significantly outperforms GIN, Gra+QSVM, and Gra+QCNN on all metrics across four graph classification datasets, and achieves higher accuracy than egoGQNN on the PTC dataset. For graph regression, it also outperforms classical models, including EGNN and EquiformerV2, and reduces the MAE of the Cv task by 20% on average compared with the previous quantum model QGCNN. Our approach offers an effective solution for achieving rotational equivariance while providing a novel perspective for exploring symmetry in graph neural networks (GNNs).
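The distance-and-angle inputs to the entanglement-enhanced layer are rotation-invariant by construction, which is what makes the overall pipeline amenable to rotational equivariance; the classical pre-processing sketch below verifies this invariance numerically. Names and the (i, j, k) triple interface are illustrative assumptions, not the paper's encoding.

```python
# Hypothetical rotation-invariant edge features of the kind the layer is
# described as consuming: pairwise distances d_ij, d_ik and the angle at node
# i between edges (i, j) and (i, k) are unchanged by any rotation of the
# coordinates. Classical pre-processing sketch, illustrative only.
import torch

def invariant_edge_features(pos: torch.Tensor, i: int, j: int, k: int) -> torch.Tensor:
    """pos: (N, 3) node coordinates; returns [d_ij, d_ik, angle_jik]."""
    v1, v2 = pos[j] - pos[i], pos[k] - pos[i]
    d1, d2 = v1.norm(), v2.norm()
    cos = (v1 @ v2) / (d1 * d2).clamp(min=1e-9)
    return torch.stack([d1, d2, torch.arccos(cos.clamp(-1.0, 1.0))])

pos = torch.randn(5, 3)
feats = invariant_edge_features(pos, 0, 1, 2)
# rotate all coordinates with a random orthogonal matrix: features unchanged
R = torch.linalg.qr(torch.randn(3, 3)).Q
print(torch.allclose(feats, invariant_edge_features(pos @ R.T, 0, 1, 2), atol=1e-5))
```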
{"title":"Rotation equivariant quantum graph neural networks with trainable compression encoder and entanglement-enhanced aggregation","authors":"Wenjie Liu , Bohan Du , Weiwei Liu , Yifan Zhu","doi":"10.1016/j.neunet.2026.108625","DOIUrl":"10.1016/j.neunet.2026.108625","url":null,"abstract":"<div><div>The integration of symmetry, such as permutation equivariance, into Quantum Graph Neural Networks (QGNNs), referred to as Equivariant Quantum Graph Neural Networks (EQGNNs), markedly improves the model’s generalization performance on graph-structured data. Despite this advancement, current research has not yet extended rotational equivariance to QGNN frameworks. Furthermore, processing large-scale graph data increases computational complexity due to numerous inter-node connections, significantly raising the required number of qubits. To address these challenges, a novel Rotationally Equivariant Quantum Graph Neural Network (REQGNN) with trainable compression encoder and entanglement-enhanced aggregation mechanism is proposed. By adopting quantum fidelity as the evaluation metric, we design a quantum autoencoder to effectively compress feature dimensionality, substantially lowering the qubit requirements of the model while preserving essential global structural details. To achieve rotational equivariance in the model, we propose an entanglement-enhanced layer that incorporates distance and angle information between nodes. This layer performs entanglement by extracting diverse edge information, thereby further refining edge feature extraction. Additionally, an auxiliary entanglement layer is introduced to mitigate the over-smoothing issue. Experimental results demonstrate REQGNN is significantly better for graph classification tasks than GIN, Gra+QSVM, and Gra+QCNN on four datasets in all metrics and achieves better results than egoGQNN in accuracy on PTC dataset, and it also has advantage for graph regression tasks over the classical models, including EGNN and EquiformerV2, and reduces the MAE of <em>C<sub>v</sub></em> task unit by 20% on average compared with a previous quantum model QGCNN. Our approach offers an effective solution for achieving rotational equivariance while providing a novel perspective for exploring symmetry in graph neural networks (GNNs).</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"198 ","pages":"Article 108625"},"PeriodicalIF":6.3,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}