Pub Date: 2026-02-04 | DOI: 10.1016/j.neucom.2026.132962
Dmytro Kuzmenko, Nadiya Shvai
Mixture-of-Experts (MoE) approaches have gained traction in robotics for their ability to dynamically allocate resources and specialize sub-networks. However, such systems typically rely on monolithic architectures with rigid, learned internal routing, which prevents selective expert customization and necessitates expensive joint training. We propose MoIRA, an architecture-agnostic modular framework that coordinates decoupled experts via an external, zero-shot text router. MoIRA employs two routing strategies: embedding-based similarity and prompt-driven language model inference. Leveraging Gr00t-N1 and π₀ Vision-Language-Action models with low-rank adapters, we evaluate MoIRA on GR1 Humanoid tasks and LIBERO benchmarks. Our approach consistently outperforms generalist models and competes with fully trained MoE pipelines. Furthermore, we demonstrate system robustness against instruction perturbations. By relying on textual descriptions for zero-shot orchestration, MoIRA proves the viability of modular deployment and offers a scalable, flexible foundation for multi-expert robotic systems.
"MoIRA: Modular instruction routing architecture for multi-task robotics," Neurocomputing, vol. 674, Article 132962.
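The embedding-based routing strategy can be sketched as a cosine-similarity argmax over expert text descriptions. This is our illustrative reading, not the paper's implementation: the toy 4-d embeddings and the three expert roles are made up.

```python
import numpy as np

def route(instruction_emb, expert_embs):
    """Pick the expert whose description embedding is most
    cosine-similar to the instruction embedding."""
    a = instruction_emb / np.linalg.norm(instruction_emb)
    B = expert_embs / np.linalg.norm(expert_embs, axis=1, keepdims=True)
    sims = B @ a                      # cosine similarities, one per expert
    return int(np.argmax(sims)), sims

# Hypothetical 4-d embeddings for three experts (e.g. "pick", "pour", "wipe").
experts = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
idx, sims = route(np.array([0.1, 0.9, 0.0, 0.1]), experts)  # routes to expert 1
```

In a real deployment the embeddings would come from a frozen text encoder; the router itself needs no training, which is what makes the orchestration zero-shot.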
Pub Date: 2026-02-04 | DOI: 10.1016/j.neucom.2026.132966
Ying Xu, Dexin Zhang, Dasen Cai
Triplet loss is widely used in classification tasks, especially in fine-grained image classification. However, a large proportion of the triplets encountered in fine-grained image classification are easy samples, so the original triplet loss cannot fully exploit the data to update network parameters. This study proposes a metric learning method called Triplet Loss with Gaussian Sampling Uncertainty (TL-GSU), which aims to capture fine-grained features with uncertainty in the data. Specifically, TL-GSU reformulates the triplet loss framework by modeling each anchor example using the data distribution of another example of the same class, represented by a multidimensional Gaussian distribution. The proposed loss function is defined as the expected value of the classical triplet loss, where anchor samples are drawn from a multivariate Gaussian distribution derived from the training set. In addition, an improved feature reduction structure is proposed to reduce computational costs in the fine-grained visual classification pipeline. The proposed TL-GSU is comprehensively validated on three datasets: Stanford Cars, Stanford Dogs, and CUB-200-2011. The results demonstrate the effectiveness of the proposed approach.
"Fine-grained image classification driven by Gaussian sampling and metric learning," Neurocomputing, vol. 674, Article 132966.
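The expected triplet loss over a Gaussian anchor distribution can be approximated by Monte Carlo sampling. A minimal sketch under our own assumptions (2-d features, isotropic covariance, margin 0.2, Euclidean distances); the paper's actual estimator and distribution parameters may differ.

```python
import numpy as np

def tl_gsu(mu, cov, positive, negative, margin=0.2, n_samples=1000, seed=0):
    """Monte-Carlo estimate of E[ max(d(a,p) - d(a,n) + margin, 0) ]
    with anchors a ~ N(mu, cov)."""
    rng = np.random.default_rng(seed)
    anchors = rng.multivariate_normal(mu, cov, size=n_samples)
    d_pos = np.linalg.norm(anchors - positive, axis=1)
    d_neg = np.linalg.norm(anchors - negative, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

mu, cov = np.zeros(2), 0.01 * np.eye(2)
# An easy triplet (negative far away) contributes nothing ...
loss_easy = tl_gsu(mu, cov, positive=np.array([0.0, 0.1]),
                   negative=np.array([5.0, 5.0]))
# ... while a hard triplet (negative nearby) yields a positive loss.
loss_hard = tl_gsu(mu, cov, positive=np.array([0.0, 0.1]),
                   negative=np.array([0.0, 0.2]))
```

This illustrates the motivation stated above: easy triplets give zero gradient under the plain hinge, whereas sampling anchors from a class-conditional Gaussian keeps informative variation in play.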
Pub Date: 2026-02-04 | DOI: 10.1016/j.neucom.2026.132924
Zhichao Feng, Bohan Zhang, Junchang Jing, Dong Liu
Attributed networks contain both structural connections and rich node attributes, which are crucial for the formation and identification of community structures. Although integrating attribute data enhances the accuracy of community detection algorithms, it also raises the risk of privacy leakage. To address this issue, community hiding has emerged as a promising solution. However, most existing research has centered on topological networks, leaving attributed networks largely unexplored. In response, we propose Attribute Permanence (APERM), a novel community hiding method specifically designed for attributed networks, which quantifies permanence loss to identify structurally influential edges for perturbation. The objective of our perturbation strategy is to disrupt the global community structure, which typically involves considering all existing and potential edges in the network, and this introduces considerable computational complexity. To tackle this problem, we introduce a strategy that identifies Closely Homogeneous Nodes (CHN) by integrating both structural similarity and attribute information, thereby significantly reducing the edge perturbation search space. The experimental results from eight community detection algorithms (four for attributed networks and four for non-attributed networks) across six real-world datasets demonstrate that our proposed APERM algorithm not only achieves effective community hiding but also retains robust performance.
"Edge-centric community hiding based on permanence in attributed networks," Neurocomputing, vol. 675, Article 132924.
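APERM perturbs edges by the permanence loss they induce. As background, a sketch of the node-permanence score itself (following the common definition from the community-detection literature); the toy graph, the communities, and the convention that E_max defaults to 1 for nodes with no external edges are our illustrative assumptions.

```python
from itertools import combinations

def permanence(v, adj, community):
    """Permanence of node v: (internal / (E_max * degree)) + c_in - 1,
    where c_in is the clustering among v's internal neighbours."""
    neigh = adj[v]
    internal = [u for u in neigh if community[u] == community[v]]
    external = [u for u in neigh if community[u] != community[v]]
    # E_max: most connections into any single external community (min 1).
    counts = {}
    for u in external:
        counts[community[u]] = counts.get(community[u], 0) + 1
    e_max = max(counts.values()) if counts else 1
    # c_in: fraction of internal-neighbour pairs that are themselves linked.
    if len(internal) < 2:
        c_in = 0.0
    else:
        links = sum(1 for a, b in combinations(internal, 2) if b in adj[a])
        c_in = 2.0 * links / (len(internal) * (len(internal) - 1))
    return len(internal) / (e_max * len(neigh)) + c_in - 1.0

# Toy graph: triangle community A = {0,1,2}, community B = {3,4}, bridge 0-3.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
p0 = permanence(0, adj, community)  # 2/(1*3) + 1.0 - 1.0
```

An edge-centric hiding method in this spirit would score each candidate edge by the total drop in permanence its removal or addition causes, and perturb the highest-scoring ones.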
Pub Date: 2026-02-04 | DOI: 10.1016/j.neucom.2026.132971
Jingyi He, Yongjun Li, Yifei Liang, Mengyan Lu, Haorui Liu, Jixing Zhou, Yi Wei, Hongyan Liu
To overcome the limitations of static feature extraction and inefficient context modeling in existing learned image compression, this paper proposes an image compression algorithm that integrates a Depth-aware Adaptive Transformation (DAT) framework and a Multi-reference Dynamic Entropy Model (MDEM). A proposed Multi-scale Capacity-aware Feature Enhancer (MCFE) model is adaptively embedded into the network to enhance feature extraction capability. The DAT architecture integrates a variational autoencoder framework with MCFE to increase the density of latent representations. Furthermore, an improved soft-threshold sparse attention mechanism is combined with a multi-context model, incorporating adaptive weights to eliminate spatial redundancy in the latent representations across local, non-local, and global dimensions, while channel context is introduced to capture channel dependencies. Building upon this, the MDEM integrates the side information provided by DAT along with spatial and channel context information and employs a channel-wise autoregressive model to achieve accurate pixel estimation for precise entropy probability estimation, which improves compression performance. Evaluated on the Kodak, Tecnick, and CLIC (Challenge on Learned Image Compression) Professional Validation datasets, the proposed method achieves BD-rate (Bjøntegaard Delta rate) gains of 7.75%, 9.33%, and 5.73%, respectively, compared to the VTM (Versatile Video Coding Test Model) 17.0 benchmark. Therefore, the proposed algorithm overcomes the limitations of fixed-context and static feature extraction strategies, enabling precise probability estimation and superior compression performance through dynamic resource allocation and multi-dimensional contextual modeling.
"Depth aware image compression with multi-reference dynamic entropy model," Neurocomputing, vol. 675, Article 132971.
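The reported gains use Bjøntegaard's delta-rate metric. A hedged sketch of the standard calculation, with synthetic rate/PSNR points and the usual cubic fit in log-rate; reference implementations differ in minor details such as the interpolation scheme.

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average % bitrate change of `test` vs `ref` at equal quality:
    fit log10(rate) as a cubic in PSNR, integrate over the shared
    PSNR range, and convert the mean log-rate gap back to percent."""
    p_ref = np.polyfit(psnr_ref, np.log10(rates_ref), 3)
    p_test = np.polyfit(psnr_test, np.log10(rates_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10 ** avg_diff - 1) * 100  # negative = bitrate savings

psnr = np.array([32.0, 34.0, 36.0, 38.0])
rates_ref = np.array([100.0, 200.0, 400.0, 800.0])   # kbps, synthetic
rates_test = 0.9 * rates_ref   # a codec needing 10% less rate everywhere
delta = bd_rate(rates_ref, psnr, rates_test, psnr)   # about -10
```

A uniform 10% rate saving at every quality point yields a BD-rate of about -10%, which is the sanity check the synthetic curves encode.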
Pub Date: 2026-02-04 | DOI: 10.1016/j.neucom.2026.132927
Yongjie Niu, Pengbo Zhou, Rui Zhou, Mingquan Zhou
Personalized image generation, a key application of diffusion models, holds significant importance for the advancement of computer vision, artistic creation, and content generation technologies. However, existing diffusion models fine-tuned with Low-Rank Adaptation (LoRA) face multiple challenges when learning novel concepts: language drift undermines the generation quality of new concepts in novel contexts; the entanglement of object features with other elements in reference images leads to misalignment between the learning target and its unique identifier; and traditional LoRA approaches are limited to learning only one concept at a time. To address these issues, this study proposes a novel hierarchical learning strategy and an enhanced LoRA module. Specifically, we incorporate the GeLU activation function into the LoRA architecture as a nonlinear transformation to effectively mitigate language drift. Furthermore, a gated hierarchical learning mechanism is designed to achieve inter-concept disentanglement, enabling a single LoRA module to learn multiple concepts concurrently. Experimental results across multiple random seeds demonstrate that our approach achieves a 4%–6% improvement in memory retention metrics and outperforms state-of-the-art methods in object fidelity and style similarity by approximately 12.5% and 10%, respectively. In addition to superior generation quality, our method demonstrates high computational efficiency, requiring significantly fewer trainable parameters (~45M) compared to existing baselines. While preserving critical features of target objects and maintaining the model’s original capabilities, our method enables the generation of images across diverse scenes in new styles. In scenarios requiring the simultaneous learning of multiple concepts, this study not only presents a novel solution to the multi-concept learning problem in personalized diffusion model training but also lays a technical foundation for high-quality customized AI image generation and diverse visual content creation. The source code is publicly available at https://github.com/ydniuyongjie/HierLoRA/tree/main.
"HierLoRA: A hierarchical multi-concept learning approach with enhanced LoRA for personalized image diffusion models," Neurocomputing, vol. 675, Article 132927.
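A LoRA branch with a GeLU nonlinearity between the low-rank factors can be sketched as y = Wx + B·gelu(Ax). The placement of the activation, the shapes, the rank, and the zero-initialised B are our assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def gelu(x):
    # Common tanh-based GeLU approximation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                    * (x + 0.044715 * x ** 3)))

def lora_gelu_forward(x, W, A, B):
    """Frozen base weight W plus a nonlinear low-rank adapter."""
    return W @ x + B @ gelu(A @ x)

rng = np.random.default_rng(0)
d, r = 8, 2                         # feature dim and LoRA rank (illustrative)
W = rng.standard_normal((d, d))     # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.1
B = np.zeros((d, r))                # zero-init B: adapter starts as a no-op
x = rng.standard_normal(d)
y = lora_gelu_forward(x, W, A, B)   # equals W @ x at initialization
```

Zero-initialising B mirrors standard LoRA practice: the adapted model starts identical to the base model, and only gradient updates to A and B introduce the new concept.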
Pub Date: 2026-02-04 | DOI: 10.1016/j.neucom.2026.132973
Arthur Aubret, Céline Teulière, Jochen Triesch
Humans learn to recognize categories of objects, even when exposed to minimal language supervision. Behavioral studies and the successes of self-supervised learning (SSL) models suggest that this learning may hinge on modeling spatial regularities of visual features. However, SSL models rely on geometric image augmentations such as masking portions of an image or aggressively cropping it, which are not known to be performed by the brain. Here, we propose CO-SSL, an alternative to geometric image augmentations to model spatial co-occurrences. CO-SSL aligns local representations (before pooling) with a global image representation. Combined with a neural network endowed with small receptive fields, we show that it outperforms previous methods by up to 43.4% on ImageNet-1k when not using cropping augmentations. In addition, CO-SSL can be combined with cropping image augmentations to accelerate category learning and increase the robustness to internal corruptions and small adversarial attacks. Overall, our work paves the way towards a new approach for modeling biological learning and developing self-supervised representations in artificial systems.
"Seeing the whole in the parts with self-supervised representation learning," Neurocomputing, vol. 675, Article 132973.
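One way to read "aligning local (pre-pooling) representations with a global image representation" is a cosine-alignment loss between each spatial feature and the pooled feature. This toy sketch is our interpretation, with random stand-in feature maps, not CO-SSL's exact objective.

```python
import numpy as np

def local_global_alignment_loss(fmap):
    """fmap: (H, W, C) local features.
    Loss = 1 - mean cosine similarity between each local vector
    and the global average-pooled vector."""
    g = fmap.mean(axis=(0, 1))                      # global representation
    g = g / np.linalg.norm(g)
    local = fmap.reshape(-1, fmap.shape[-1])
    local = local / np.linalg.norm(local, axis=1, keepdims=True)
    return float(1.0 - np.mean(local @ g))

# Perfectly aligned map: every location equals the global direction.
aligned = np.ones((4, 4, 8))
loss_aligned = local_global_alignment_loss(aligned)

# Random map: local vectors point in scattered directions.
rng = np.random.default_rng(0)
loss_random = local_global_alignment_loss(rng.standard_normal((4, 4, 8)))
```

Minimizing such a loss pushes every receptive field toward the whole-image representation, which is the "seeing the whole in the parts" intuition without any cropping or masking.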
Pub Date: 2026-02-04 | DOI: 10.1016/j.neucom.2026.132961
Erdong Guo, David Draper
Bayesian statistical learning is a powerful paradigm for inference and prediction, which integrates internal information (the sampling distribution of the training data) and external information (prior knowledge or background information) within a logically consistent probabilistic framework. In addition, the posterior distribution and the posterior predictive (marginal) distribution derived from Bayes' rule summarize the entire information required for inference and prediction, respectively. In this work, we investigate a Bayesian framework for the Tensor Network (BTN) from two perspectives. First, for the inference step, we propose an effective initialization scheme for the BTN parameters, which significantly improves the robustness and efficiency of the training procedure and leads to improved test performance. Second, in the prediction stage, we place a Gaussian prior on the weights of the BTN and predict the labels of new observations using the posterior predictive (marginal) distribution. We derive an approximation of the posterior predictive distribution using the Laplace approximation, in which the outer-product approximation of the Hessian matrix of the posterior distribution is applied. In the numerical experiments, we evaluate the performance of our initialization strategy and demonstrate its advantages by comparing it with other popular initialization methods, including He, Xavier, and Haliassos initialization, on the California House Price (CHP), Breast Cancer (BC), Phishing Website (PW), MNIST, Fashion-MNIST (FMNIST), SVHN, and CIFAR-10 datasets. We further examine the characteristics of the BTN by showing its parameters and decision boundaries trained on a two-dimensional synthetic dataset. The performance of the BTN is thoroughly analyzed from two perspectives: generalization and calibration.
Through experiments on the aforementioned datasets, we demonstrate the superior performance of the BTN in both generalization and calibration compared to regular TN-based learning models. This demonstrates the potential of the Bayesian formalism for developing more powerful TN-based learning models.
"A Bayesian approach to tensor networks," Neurocomputing, vol. 675, Article 132961.
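The outer-product (Gauss-Newton) Hessian approximation behind a Laplace posterior predictive can be illustrated on plain logistic regression, which here stands in for the tensor-network model; we also skip the MAP optimisation and plug in the generating weights. All parameters below are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_predictive(w_map, X, y, x_new, prior_var=1.0):
    """Approximate posterior predictive p(y=1 | x_new) via Laplace with
    an outer-product Hessian: H ~= sum_i g_i g_i^T + prior precision."""
    p = sigmoid(X @ w_map)
    G = X * (p - y)[:, None]                  # per-example log-loss gradients
    H = G.T @ G + np.eye(len(w_map)) / prior_var
    cov = np.linalg.inv(H)                    # Gaussian posterior covariance
    mu = x_new @ w_map
    var = x_new @ cov @ x_new
    # Probit-style approximation of the Gaussian-integrated sigmoid.
    return sigmoid(mu / np.sqrt(1.0 + np.pi * var / 8.0))

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
w_true = np.array([1.0, -1.0, 0.5])
y = (sigmoid(X @ w_true) > 0.5).astype(float)
# Far from the data, posterior variance moderates the prediction toward 0.5.
p_far = laplace_predictive(w_true, X, y, np.array([10.0, 0.0, 0.0]))
```

This moderation of confident predictions away from 0 and 1 is exactly what the calibration analysis above measures: a point estimate would output sigmoid(10), while the Laplace predictive is strictly less extreme.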
Pub Date: 2026-02-04 | DOI: 10.1016/j.neucom.2026.132953
Junho Lee, Jisu Yoon, Jisong Kim, Jun Won Choi
Semantic segmentation models trained on source domains often fail to generalize to unseen domains due to domain shifts caused by varying environmental conditions. While existing approaches rely solely on text prompts for domain randomization, their generated styles often deviate from real-world distributions. To address this limitation, we propose a novel two-stage framework for Domain Generalization in Semantic Segmentation (DGSS). First, we introduce Image-Prompt-driven Instance Normalization (I-PIN), which leverages both style images and text prompts to optimize style parameters, achieving more accurate style representations than text-only approaches. Second, we present Dual-Path Style-Invariant Feature Learning (DSFL), which employs inter-style and intra-style consistency losses, ensuring consistent predictions across different styles while promoting feature alignment within semantic classes. Extensive experiments demonstrate that our approach consistently outperforms existing state-of-the-art methods across multiple challenging domains, effectively addressing the domain shift problem in semantic segmentation.
{"title":"Image-text driven style randomization for domain generalized semantic segmentation","authors":"Junho Lee , Jisu Yoon , Jisong Kim , Jun Won Choi","doi":"10.1016/j.neucom.2026.132953","DOIUrl":"10.1016/j.neucom.2026.132953","url":null,"abstract":"<div><div>Semantic segmentation models trained on source domains often fail to generalize to unseen domains due to domain shifts caused by varying environmental conditions. While existing approaches rely solely on text prompts for domain randomization, their generated styles often deviate from real-world distributions. To address this limitation, we propose a novel two-stage framework for <em>Domain Generalization in Semantic Segmentation</em> (DGSS). First, we introduce <em>Image-Prompt-driven Instance Normalization</em> (I-PIN), which leverages both style images and text prompts to optimize style parameters, achieving more accurate style representations compared to text-only approaches. Second, we present <em>Dual-Path Style-Invariant Feature Learning</em> (DSFL) which employs inter-style and intra-style consistency losses, ensuring consistent predictions across different styles while promoting feature alignment within semantic classes. 
Extensive experiments demonstrate that our approach consistently outperforms existing state-of-the-art methods across multiple challenging domains, effectively addressing the domain shift problem in semantic segmentation.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"676 ","pages":"Article 132953"},"PeriodicalIF":6.5,"publicationDate":"2026-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146172681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
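The style-injection step that I-PIN builds on follows the standard adaptive instance normalization recipe: whiten each channel of a feature map to zero mean and unit variance, then re-inject target style statistics. A minimal numpy sketch is below; the function name and the assumption that style parameters are per-channel mean/std vectors are illustrative, not the paper's implementation (which optimizes these parameters jointly from style images and text prompts).

```python
import numpy as np

def instance_norm_stylize(feat, style_mu, style_sigma, eps=1e-5):
    """AdaIN-style restyling of a (C, H, W) feature map: remove the
    source's per-channel statistics, then apply target style mean/std."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    normalized = (feat - mu) / (sigma + eps)  # zero mean, unit std per channel
    return normalized * style_sigma[:, None, None] + style_mu[:, None, None]
```

In a DGSS pipeline, the same image would be pushed through several such randomized styles, with inter-style consistency enforced by penalizing disagreement between the resulting segmentation predictions.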
Pub Date: 2026-02-04 · DOI: 10.1016/j.neucom.2026.132968
Hojae Choi , Jaewook Kim , Jongkil Park , Seongsik Park , Hyun Jae Jang , Seung Hwan Lee , Byeong-Kwon Ju , YeonJoo Jeong
Backpropagation Through Time (BPTT) trains Recurrent Spiking Neural Networks (R-SNNs) effectively but incurs high computational and memory costs, limiting real-time applications. To mitigate resource demands, we adopt truncated BPTT (K=1), reducing memory cost by three orders of magnitude. However, this truncation weakens sequence learning by limiting gradient propagation. To compensate, we introduce the Spatio-temporal Adaptive Recurrent Spiking Neural Network (STAR-SNN), which incorporates adaptive parameters to enhance high-dimensional representations and effectively retain sequence information despite truncation. Additionally, R-SNNs suffer from unstable training due to the entanglement of spike generation and suppression in weight updates. To resolve this, we develop Separated Propagation Surrogate Gradient (SPSG), which decouples these processes by selectively propagating error signals, stabilizing learning and improving convergence. Our approach achieves a 393-fold reduction in MSE loss for chaotic system forecasting and delivers high performance in event-driven DVS-Gesture recognition, establishing a scalable, hardware-efficient framework for real-time neuromorphic computing.
{"title":"STAR-SNN: A spatio-temporal adaptive recurrent spiking neural network with separated propagation surrogate gradient for hardware efficient real-time learning","authors":"Hojae Choi , Jaewook Kim , Jongkil Park , Seongsik Park , Hyun Jae Jang , Seung Hwan Lee , Byeong-Kwon Ju , YeonJoo Jeong","doi":"10.1016/j.neucom.2026.132968","DOIUrl":"10.1016/j.neucom.2026.132968","url":null,"abstract":"<div><div>Backpropagation Through Time (BPTT) trains Recurrent Spiking Neural Networks (R-SNNs) effectively but incurs high computational and memory costs, limiting real-time applications. To mitigate resource demands, we adopt truncated BPTT (K=1), reducing memory cost by three orders of magnitude. However, this truncation weakens sequence learning by limiting gradient propagation. To compensate, we introduce the Spatio-temporal Adaptive Recurrent Spiking Neural Network (STAR-SNN), which incorporates adaptive parameters to enhance high-dimensional representations and effectively retain sequence information despite truncation. Additionally, R-SNNs suffer from unstable training due to the entanglement of spike generation and suppression in weight updates. To resolve this, we develop Separated Propagation Surrogate Gradient (SPSG), which decouples these processes by selectively propagating error signals, stabilizing learning and improving convergence. 
Our approach achieves a 393-fold reduction in MSE loss for chaotic system forecasting and delivers high performance in event-driven DVS-Gesture recognition, establishing a scalable, hardware-efficient framework for real-time neuromorphic computing.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"674 ","pages":"Article 132968"},"PeriodicalIF":6.5,"publicationDate":"2026-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
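The building blocks the abstract refers to can be illustrated with a generic leaky integrate-and-fire (LIF) neuron and a surrogate gradient. The sketch below is a minimal numpy version of these standard components only; it does not reproduce STAR-SNN's adaptive parameters or the SPSG decoupling, and the fast-sigmoid kernel shown is one common surrogate choice, assumed here for illustration.

```python
import numpy as np

def lif_step(v, i_in, thresh=1.0, beta=0.9):
    """One forward step of a leaky integrate-and-fire neuron:
    leak the membrane potential, integrate input, spike at threshold,
    then hard-reset spiking units."""
    v = beta * v + i_in
    spike = (v >= thresh).astype(v.dtype)
    v = v * (1.0 - spike)
    return v, spike

def fast_sigmoid_surrogate(v, thresh=1.0, alpha=2.0):
    """Surrogate for d(spike)/dv: the Heaviside spike nonlinearity has
    zero gradient almost everywhere, so training substitutes a smooth
    kernel peaked at the firing threshold."""
    return 1.0 / (alpha * np.abs(v - thresh) + 1.0) ** 2
```

Under truncated BPTT with K=1, gradients from this surrogate are propagated only through the current time step, which is what removes the memory cost of storing the full spike history.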
Pub Date: 2026-02-04 · DOI: 10.1016/j.neucom.2026.132869
Meng Liu , Xiao He
This paper introduces a novel extension to the Extended Kalman-based Smooth Variable Structure Filter (EK-SVSF), a hybrid state estimation framework that integrates the Extended Kalman Filter (EKF) with the Smooth Variable Structure Filter (SVSF). Tailored for nonlinear systems subject to model uncertainties and external disturbances, EK-SVSF enhances estimation accuracy by leveraging the complementary strengths of its constituent filters. Nonetheless, the efficacy of EK-SVSF hinges critically on the selection of an appropriate width for the smoothing boundary layer (SBL); suboptimal values—either excessively large or small—can substantially impair filtering performance. Compounding this issue, inherent model uncertainties render the determination of an optimal SBL a formidable and enduring challenge. To mitigate this, we propose a data-driven methodology that autonomously extracts salient features from the smoothing boundary function, thereby resolving the parameter tuning dilemma under model uncertainty. Furthermore, to refine the associated multi-loss weighted aggregation, we incorporate an adaptive weighting scheme based on the coefficient of variation, enabling dynamic optimization. Empirical evaluations demonstrate that the proposed approach yields robust and resilient state estimation outcomes, even in the presence of significant model discrepancies.
{"title":"Data-driven robust state estimation based on EK-SVSF","authors":"Meng Liu , Xiao He","doi":"10.1016/j.neucom.2026.132869","DOIUrl":"10.1016/j.neucom.2026.132869","url":null,"abstract":"<div><div>This paper introduces a novel extension to the Extended Kalman-based Smooth Variable Structure Filter (EK-SVSF), a hybrid state estimation framework that integrates the Extended Kalman Filter (EKF) with the Smooth Variable Structure Filter (SVSF). Tailored for nonlinear systems subject to model uncertainties and external disturbances, EK-SVSF enhances estimation accuracy by leveraging the complementary strengths of its constituent filters. Nonetheless, the efficacy of EK-SVSF hinges critically on the selection of an appropriate width for the smoothing boundary layer (SBL); suboptimal values—either excessively large or small—can substantially impair filtering performance. Compounding this issue, inherent model uncertainties render the determination of an optimal SBL a formidable and enduring challenge. To mitigate this, we propose a data-driven methodology that autonomously extracts salient features from the smoothing boundary function, thereby resolving the parameter tuning dilemma under model uncertainty. Furthermore, to refine the associated multi-loss weighted aggregation, we incorporate an adaptive weighting scheme based on the coefficient of variation, enabling dynamic optimization. 
Empirical evaluations demonstrate that the proposed approach yields robust and resilient state estimation outcomes, even in the presence of significant model discrepancies.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"675 ","pages":"Article 132869"},"PeriodicalIF":6.5,"publicationDate":"2026-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146147232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
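Two of the mechanisms named above have simple closed forms worth making concrete: the SVSF switching term smoothed by a boundary layer, and coefficient-of-variation loss weighting. The numpy sketch below shows textbook versions of both; the function names are illustrative, and the paper's actual contribution is learning the boundary-layer width from data rather than fixing psi by hand.

```python
import numpy as np

def sbl_correction(innovation, psi):
    """SVSF switching term with a smoothing boundary layer of width psi:
    proportional to the innovation inside |e| < psi, saturated to +/-1
    outside, which suppresses chattering near the estimate."""
    return np.clip(innovation / psi, -1.0, 1.0)

def cv_loss_weights(loss_histories, eps=1e-12):
    """Coefficient-of-variation weighting: each loss term is weighted by
    std/mean of its recent history, then normalized to sum to one, so
    more volatile losses receive more optimization pressure."""
    cvs = np.array([np.std(h) / (np.mean(h) + eps) for h in loss_histories])
    return cvs / cvs.sum()
```

A psi that is too large makes the saturation rarely activate (losing the SVSF's robustness), while one that is too small makes the correction chatter, which is the tuning dilemma the data-driven method targets.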