Sparse assemblies of recurrent neural networks with stability guarantees
Pub Date: 2026-02-03 | DOI: 10.1016/j.neucom.2026.132952
Andrea Ceni, Valerio De Caro, Davide Bacciu, Claudio Gallicchio
We introduce AdaDiag, a framework for constructing sparse assemblies of recurrent neural networks (RNNs) with formal stability guarantees. Our approach builds upon contraction theory by designing RNN modules that are inherently contractive through adaptive diagonal parametrization and learnable characteristic time scales. This formulation enables each module to remain fully trainable while preserving global stability under skew-symmetric coupling. We provide rigorous theoretical analysis of contractivity, along with a complexity discussion showing that stability is achieved without additional computational burden. Experiments on ten heterogeneous time series benchmarks demonstrate that AdaDiag consistently surpasses SCN, LSTM, and Vanilla RNN baselines, and achieves competitive performance with state-of-the-art models, all while requiring substantially fewer trainable parameters. These results highlight the effectiveness of sparse and stable assemblies for efficient, adaptive, and generalizable sequence modeling.
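For readers who want a concrete picture, below is a minimal PyTorch sketch of the two ingredients the abstract names: an adaptive diagonal parametrization with learnable characteristic time scales, and skew-symmetric coupling between modules. The specific choices here (softplus-constrained negative diagonal, explicit Euler step, all class and variable names) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContractiveDiagCell(nn.Module):
    """Sketch of one trainable RNN module kept contractive by construction.

    Assumption: restricting the recurrent matrix to a strictly negative
    diagonal D = -softplus(d) bounds the state Jacobian, and
    tau = softplus(t) gives a learnable per-unit time scale.
    """
    def __init__(self, n_units: int, n_inputs: int):
        super().__init__()
        self.d = nn.Parameter(torch.zeros(n_units))   # diagonal recurrence logits
        self.t = nn.Parameter(torch.zeros(n_units))   # time-scale logits
        self.w_in = nn.Linear(n_inputs, n_units)

    def forward(self, h: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
        diag = -F.softplus(self.d)                    # strictly negative diagonal
        tau = F.softplus(self.t) + 1e-3               # positive time scales
        dh = diag * h + torch.tanh(self.w_in(u))
        return h + dh / tau                           # explicit Euler update

def skew_symmetric_coupling(a: torch.Tensor) -> torch.Tensor:
    """Couple modules with C = A - A^T, skew-symmetric by construction,
    so the coupling adds no expansive (symmetric) component."""
    return a - a.T
```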
{"title":"Sparse assemblies of recurrent neural networks with stability guarantees","authors":"Andrea Ceni, Valerio De Caro, Davide Bacciu, Claudio Gallicchio","doi":"10.1016/j.neucom.2026.132952","DOIUrl":"10.1016/j.neucom.2026.132952","url":null,"abstract":"<div><div>We introduce AdaDiag, a framework for constructing sparse assemblies of recurrent neural networks (RNNs) with formal stability guarantees. Our approach builds upon contraction theory by designing RNN modules that are inherently contractive through adaptive diagonal parametrization and learnable characteristic time scales. This formulation enables each module to remain fully trainable while preserving global stability under skew-symmetric coupling. We provide rigorous theoretical analysis of contractivity, along with a complexity discussion showing that stability is achieved without additional computational burden. Experiments on ten heterogeneous time series benchmarks demonstrate that AdaDiag consistently surpasses SCN, LSTM, and Vanilla RNN baselines, and achieves competitive performance with state-of-the-art models, all while requiring substantially fewer trainable parameters. These results highlight the effectiveness of sparse and stable assemblies for efficient, adaptive, and generalizable sequence modeling.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"675 ","pages":"Article 132952"},"PeriodicalIF":6.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146172947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intrinsic dimensionality as a model-free measure of class imbalance
Pub Date: 2026-02-03 | DOI: 10.1016/j.neucom.2026.132938
Cagri Eser, Zeynep Sonat Baltaci, Emre Akbas, Sinan Kalkan
Imbalance in classification tasks is commonly quantified by the cardinalities of examples across classes. This, however, disregards the presence of redundant examples and inherent differences in the learning difficulties of classes. Alternatively, one can use complex measures such as training loss and uncertainty, which, however, depend on training a machine learning model. Our paper proposes using data Intrinsic Dimensionality (ID) as an easy-to-compute, model-free measure of imbalance that can be seamlessly incorporated into various imbalance mitigation methods. Our results across five different datasets with a diverse range of imbalance ratios show that ID consistently outperforms cardinality-based re-weighting and re-sampling techniques used in the literature. Moreover, we show that combining ID with cardinality can further improve performance. Our code and models are available at https://github.com/cagries/IDIM.
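As a concrete illustration of how such a model-free measure can be computed, here is a hedged Python sketch using the TwoNN intrinsic-dimensionality estimator (Facco et al., 2017). The abstract does not specify which ID estimator the authors use, and the re-weighting rule in `id_class_weights` is a hypothetical example, not the paper's method.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(x: np.ndarray) -> float:
    """TwoNN intrinsic-dimensionality estimate of a point cloud x (n, d).
    mu_i = r2/r1 is the ratio of each point's two nearest-neighbor
    distances; the MLE of the dimension is n / sum(log mu_i)."""
    dists, _ = NearestNeighbors(n_neighbors=3).fit(x).kneighbors(x)
    mu = dists[:, 2] / np.maximum(dists[:, 1], 1e-12)  # col 0 is the self-distance
    return len(x) / np.sum(np.log(mu))

def id_class_weights(features: np.ndarray, labels: np.ndarray) -> dict:
    """Hypothetical re-weighting rule: weight each class by its estimated
    ID (higher-dimensional, harder classes get larger weights),
    normalized to mean 1. The paper's exact rule may differ."""
    ids = {c: twonn_id(features[labels == c]) for c in np.unique(labels)}
    mean_id = np.mean(list(ids.values()))
    return {c: v / mean_id for c, v in ids.items()}
```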
{"title":"Intrinsic dimensionality as a model-free measure of class imbalance","authors":"Cagri Eser , Zeynep Sonat Baltaci , Emre Akbas , Sinan Kalkan","doi":"10.1016/j.neucom.2026.132938","DOIUrl":"10.1016/j.neucom.2026.132938","url":null,"abstract":"<div><div>Imbalance in classification tasks is commonly quantified by the cardinalities of examples across classes. This, however, disregards the presence of redundant examples and inherent differences in the learning difficulties of classes. Alternatively, one can use complex measures such as training loss and uncertainty, which, however, depend on training a machine learning model. Our paper proposes using data Intrinsic Dimensionality (ID) as an easy-to-compute, model-free measure of imbalance that can be seamlessly incorporated into various imbalance mitigation methods. Our results across five different datasets with a diverse range of imbalance ratios show that ID consistently outperforms cardinality-based re-weighting and re-sampling techniques used in the literature. Moreover, we show that combining ID with cardinality can further improve performance. Our code and models are available at <span><span>https://github.com/cagries/IDIM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"674 ","pages":"Article 132938"},"PeriodicalIF":6.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146192215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CRColor: Cycle reference learning for exemplar-based image colorization
Pub Date: 2026-02-03 | DOI: 10.1016/j.neucom.2026.132932
Siqi Chen, Mingdao Wang, Xianlin Zhang, Xueming Li, Yue Zhang
Exemplar-based image colorization colorizes a grayscale image based on a color reference image. Although recent advances have significantly improved color matching and generation techniques, little research addresses color fidelity, i.e., whether the colorized grayscale image accurately preserves the guidance information from the reference image. The absence of a ground-truth colored target image for each reference-target pair makes color fidelity difficult to quantify or for models to learn. Motivated by this, this paper introduces a cyclic strategy into the exemplar-based colorization task. First, we propose the concept of cycle reference peak signal-to-noise ratio (CRPSNR). By careful design, CRPSNR uses the colorization output as guidance to recolor the reference image. With the original color reference image as ground truth, CRPSNR enables the quantification of color fidelity. Furthermore, we propose cycle reference learning for exemplar-based image colorization (CRColor). CRColor uses a main branch to colorize the target image and a training-only cycle branch to draw the result closer to the guidance, enabling the model to learn color fidelity. Experiments demonstrate that our method maintains image quality comparable to recent state-of-the-art methods while outperforming them in color fidelity to the reference image, both quantitatively and qualitatively. Our code will be published for academic research.
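The cycle described for CRPSNR is straightforward to express in code. Below is a sketch under stated assumptions: `colorize(gray, ref_rgb)` and `to_gray(rgb)` are hypothetical interfaces standing in for the trained colorization model and a color-space conversion; the actual metric may involve further design details.

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 255.0) -> float:
    """Standard peak signal-to-noise ratio between two images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def crpsnr(colorize, target_gray, reference_rgb, to_gray) -> float:
    """Cycle Reference PSNR as described in the abstract: colorize the
    target with the reference, then use that output as guidance to recolor
    the grayscale reference, and score the result against the original
    reference image, for which ground truth exists by construction."""
    output_rgb = colorize(target_gray, reference_rgb)             # forward pass
    recolored_ref = colorize(to_gray(reference_rgb), output_rgb)  # cycle pass
    return psnr(recolored_ref, reference_rgb)
```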
{"title":"CRColor: Cycle reference learning for exemplar-based image colorization","authors":"Siqi Chen , Mingdao Wang , Xianlin Zhang , Xueming Li , Yue Zhang","doi":"10.1016/j.neucom.2026.132932","DOIUrl":"10.1016/j.neucom.2026.132932","url":null,"abstract":"<div><div>Exemplar-based image colorization colorizes a grayscale image based on a color reference image. Although recent advances have significantly improved color matching and generation techniques, there is a paucity of research addressing the issue of color fidelity, i.e., whether the colored grayscale image accurately preserves the guidance information from the reference image. The absence of a ground truth colored target image for each reference-target image pair renders the color fidelity difficult to quantify or learn by the models. Motivated by this, this paper introduces cyclic strategy into exemplar-based colorization task. Firstly, we propose the concept of cycle reference peak signal-to-noise ratio (CRPSNR). By careful design, the CRPSNR uses the colorization output as the guidance to recolor the reference image. Using the original color reference image as the ground truth, CRPSNR enables the quantification of color fidelity. Furthermore, the cycle reference learning for exemplar-based image colorization (CRColor) is proposed. The CRColor uses a main branch to colorize the target image and a training-only cycle branch to draw the result closer to the guidance, which enables model to learn color fidelity. Experiments demonstrate that our method maintains comparable image quality to recent state-of-the-art methods while outperforming the methods in color fidelity to the reference image, both quantitatively and qualitatively. Our code will be published for academic research.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"674 ","pages":"Article 132932"},"PeriodicalIF":6.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146192217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sign language translation via cross-modal alignment and graph convolution
Pub Date: 2026-02-03 | DOI: 10.1016/j.neucom.2026.132949
Ming Yu, Pengfei Zhang, Cuihong Xue, Yingchun Guo
Sign language translation (SLT) converts sign language videos into textual sentences. This process is essential for enabling communication between deaf and hearing individuals. However, the inherent modality gap between visual sign sequences and textual linguistics severely limits performance. Existing methods rely on costly gloss annotations for intermediate supervision, restricting scalability; unsupervised alternatives lack fine-grained alignment or semantic learning capabilities. To address this, we introduce CMAG-Net, a framework integrating cross-modal alignment pre-training and dynamic graph convolutions. The architecture comprises two modules: (1) a cross-modal alignment pre-training module, optimized with a multi-objective loss, which learns to align visual features with textual semantics, effectively bridging the modality gap without gloss supervision; and (2) a dynamic dual-graph spatiotemporal module, consisting of a temporal graph that captures local sign dynamics and a similarity graph that aggregates global semantic relationships. This design suppresses noise, enhances discriminative features, and addresses the challenges of redundant frames and complex spatiotemporal dependencies. Experiments show that CMAG-Net outperforms all gloss-free methods on PHOENIX-2014T, CSL-Daily, and How2Sign, approaching gloss-based state-of-the-art performance. Versus GFSLT-VLP (gloss-free) on the PHOENIX-2014T dev/test sets, BLEU-4 improves by +5.19/+5.95; compared to MMTLB (gloss-based), the gap narrows to 0.37/0.22 BLEU-4.
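The abstract does not spell out the terms of the multi-objective alignment loss; a symmetric InfoNCE term over paired video and sentence embeddings is one standard choice for this kind of cross-modal pre-training and is sketched below purely for illustration.

```python
import torch
import torch.nn.functional as F

def alignment_loss(visual: torch.Tensor, text: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired (video, sentence)
    embeddings: matched pairs sit on the diagonal of the similarity
    matrix, and the loss is averaged over both retrieval directions."""
    v = F.normalize(visual, dim=-1)
    t = F.normalize(text, dim=-1)
    logits = v @ t.T / temperature                      # (batch, batch) similarities
    targets = torch.arange(len(v), device=v.device)     # diagonal = positives
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))
```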
{"title":"Sign language translation via cross-modal alignment and graph convolution","authors":"Ming Yu , Pengfei Zhang , Cuihong Xue , Yingchun Guo","doi":"10.1016/j.neucom.2026.132949","DOIUrl":"10.1016/j.neucom.2026.132949","url":null,"abstract":"<div><div>Sign language translation (SLT) converts sign language videos into textual sentences. This process is essential for enabling communication between deaf and hearing individuals. However, the inherent modal gap between visual sign sequences and textual linguistics severely limits performance. Existing methods rely on costly gloss annotations for intermediate supervision, restricting scalability; unsupervised alternatives lack fine-grained alignment or semantic learning capabilities. To address this, we introduce CMAG-Net, a framework integrating cross-modal alignment pre-training and dynamic graph convolutions. The architecture comprises two modules: (1) A cross-modal alignment pre-training module. Optimized with a multi-objective loss, it learns to align visual features with textual semantics, effectively bridging the modality gap without gloss supervision; (2) A dynamic dual-graph spatiotemporal module. It consists of a temporal graph that captures local sign dynamics and a similarity graph that aggregates global semantic relationships. This design suppresses noise, enhances discriminative features, and addresses the challenges of redundant frames and complex spatiotemporal dependencies. Experiments show CMAG-Net outperforms all gloss-free methods on PHOENIX-2014T, CSL-Daily and How2Sign, approaching gloss-based state-of-the-art performance. Versus GFSLT-VLP (gloss-free) on PHOENIX-2014T dev/test sets, BLEU-4 improves by +5.19/+5.95. Compared to MMTLB (gloss-based), the gap narrows to 0.37/0.22 BLEU-4.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"675 ","pages":"Article 132949"},"PeriodicalIF":6.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146147236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CDINet: A cascaded dual-domain interaction network for vapor degraded thermal infrared image restoration
Pub Date: 2026-02-03 | DOI: 10.1016/j.neucom.2026.132930
Kailun Wei, Xiaoyan Liu, Wei Zhao
Infrared thermography allows imaging in dark and smoky environments and is widely used in firefighting and industrial scenarios. However, high-temperature water vapor in such scenarios can significantly degrade the quality of thermal infrared (TIR) images, leading to errors in subsequent visual tasks. The non-uniform distribution of high-temperature water vapor, and the severe information loss it causes in TIR images, poses significant challenges to restoration. To address this issue, we propose a cascaded dual-domain interaction network (CDINet) for TIR image restoration. The Dual-domain Interaction Block (DIB) is designed as the basic unit of CDINet. This module enhances feature representation through spatial-frequency interaction, thereby improving the model's ability to perceive and restore non-uniformly vapor-degraded regions. In addition, we introduce Long Short-Term Memory (LSTM) and design CDINet as a cascade structure that progressively restores and refines the information lost to vapor interference in an iterative manner. Furthermore, we have constructed a benchmark dataset comprising 12,500 vapor-degraded TIR images to evaluate the restoration performance of different models. Extensive experiments comparing CDINet with 12 state-of-the-art methods show that CDINet can effectively eliminate vapor interference from scenes with varying vapor distributions. It outperforms other methods especially in challenging scenarios with large-area dense non-uniform and localized non-uniform vapor degradation. The dataset and code are publicly available at: https://github.com/wkl1996/CDINet-TIR-Restoration.
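To make "spatial-frequency interaction" concrete, here is a minimal PyTorch sketch of a dual-domain block: one branch filters in the spatial domain, the other modulates the real-FFT spectrum with learned 1x1 convolutions, and the outputs are fused residually. The actual DIB is presumably more elaborate; every structural choice below is an assumption for illustration.

```python
import torch
import torch.nn as nn

class DualDomainBlock(nn.Module):
    """Sketch of a spatial-frequency interaction block (not the paper's DIB)."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        self.freq = nn.Conv2d(2 * channels, 2 * channels, 1)   # acts on real/imag parts
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.spatial(x)                                    # spatial-domain branch
        spec = torch.fft.rfft2(x, norm="ortho")                # frequency-domain branch
        z = torch.cat([spec.real, spec.imag], dim=1)
        z = self.freq(z)                                       # learned spectral modulation
        real, imag = z.chunk(2, dim=1)
        f = torch.fft.irfft2(torch.complex(real, imag),
                             s=x.shape[-2:], norm="ortho")
        return x + self.fuse(torch.cat([s, f], dim=1))         # residual fusion
```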
{"title":"CDINet: A cascaded dual-domain interaction network for vapor degraded thermal infrared image restoration","authors":"Kailun Wei, Xiaoyan Liu, Wei Zhao","doi":"10.1016/j.neucom.2026.132930","DOIUrl":"10.1016/j.neucom.2026.132930","url":null,"abstract":"<div><div>Infrared thermography allows imaging in dark and smoky environments and is widely used in firefighting and industrial scenarios. However, high temperature water vapor in the above scenarios can significantly degrade the quality of thermal infrared (TIR) images, leading to errors in subsequent visual tasks. The non-uniform distribution of high-temperature water vapor and the resulting severe information loss in TIR images pose significant challenges to restoration. To address this issue, we propose a cascaded dual-domain interaction network (CDINet) for TIR image restoration. The Dual-domain Interaction Block (DIB) is designed as the basic unit of CDINet. This module enhances feature representation through spatial-frequency interaction, thereby improving the model’s performance in perceiving and restoring non-uniform vapor degraded regions. In addition, we introduce Long Short-Term Memory (LSTM) and design CDINet as a cascade structure to progressively restore and refine the lost information caused by vapor interference in an iterative manner. Furthermore, we have constructed a benchmark dataset comprising 12,500 vapor degraded TIR images to evaluate the restoration performance of different models. Extensive experiments comparing our CDINet with 12 state-of-the-art methods have shown that CDINet can effectively eliminate vapor interference from scenes with varying distributions. It outperforms other methods, especially in challenging scenarios with large non-uniform dense and localized non-uniform vapor degradation. The dataset and code are publicly available at: <span><span>https://github.com/wkl1996/CDINet-TIR-Restoration</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"675 ","pages":"Article 132930"},"PeriodicalIF":6.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146147237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ESGME: Generating metaphor explanations from event-related potential signals using large language models
Pub Date: 2026-02-03 | DOI: 10.1016/j.neucom.2026.132896
Dongyu Zhang, Wanqiu Liao, Haojia Li, Hongfei Lin
Metaphor plays a fundamental role in human cognition, involving the construction of conceptual mappings that unfold through dynamic neural processes. However, current natural language processing (NLP) systems largely overlook the brain signals engaged during metaphor production, limiting their ability to capture cognitively grounded mechanisms. To address this gap, we introduce ESGME (ERP-Signal-Guided Metaphor Explanation), a framework that integrates event-related potential (ERP) recordings with large language models (LLMs) to generate metaphor explanations conditioned on neural activity. ESGME employs a two-stage design: in Stage 1, an ERP encoder is trained to align ERP signals with the semantic embedding space of the target LLM, enabling neural representations to be mapped to conceptual-level meaning; in Stage 2, the aligned ERP embeddings serve as cognitive cue prompts that guide LLMs in producing metaphor explanations. The framework further incorporates text-based guiding factors to stabilize conceptual mapping during explanation generation. Experiments across multiple LLMs demonstrate that aligned ERP signals provide meaningful cognitive information beyond textual cues. These results highlight the feasibility of translating metaphor-related neural activity into coherent explanatory text and establish a new pathway for bridging cognitive neuroscience with generative NLP. Dataset and code: https://github.com/xinyu706/ESGME.
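Stage 1 of ESGME can be pictured as training an encoder to map ERP windows into the LLM's embedding space. The sketch below assumes a simple MLP encoder and a cosine alignment objective against frozen text embeddings; both the architecture and the loss are illustrative assumptions, as the abstract fixes neither.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ERPEncoder(nn.Module):
    """Hypothetical stage-1 encoder: maps an ERP window
    (channels x time points) into the target LLM's embedding dimension."""
    def __init__(self, n_channels: int, n_times: int, llm_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_channels * n_times, 512),
            nn.GELU(),
            nn.Linear(512, llm_dim),
        )

    def forward(self, erp: torch.Tensor) -> torch.Tensor:
        return self.net(erp)

def stage1_alignment_loss(erp_emb: torch.Tensor,
                          text_emb: torch.Tensor) -> torch.Tensor:
    """Pull each ERP embedding toward the frozen LLM embedding of the
    corresponding metaphor text; cosine distance is one plausible choice."""
    return 1.0 - F.cosine_similarity(erp_emb, text_emb, dim=-1).mean()
```

In stage 2, the aligned embeddings would then be prepended to the LLM input as cognitive cue prompts, alongside the text-based guiding factors the abstract mentions.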
{"title":"ESGME: Generating metaphor explanations from event-related potential signals using large language models","authors":"Dongyu Zhang , Wanqiu Liao , Haojia Li , Hongfei Lin","doi":"10.1016/j.neucom.2026.132896","DOIUrl":"10.1016/j.neucom.2026.132896","url":null,"abstract":"<div><div>Metaphor plays a fundamental role in human cognition, involving the construction of conceptual mappings that unfold through dynamic neural processes. However, current natural language processing (NLP) systems largely overlook the brain signals engaged during metaphor production, limiting their ability to capture cognitively grounded mechanisms. To address this gap, we introduce ESGME (ERP-Signal-Guided Metaphor Explanation), a framework that integrates event-related potential (ERP) recordings with large language models (LLMs) to generate metaphor explanations conditioned on neural activity. ESGME employs a two-stage design: in Stage 1, an ERP encoder is trained to align ERP signals with the semantic embedding space of the target LLM, enabling neural representations to be mapped to conceptual-level meaning; in Stage 2, the aligned ERP embeddings serve as cognitive cue prompts that guide LLMs in producing metaphor explanations. The framework further incorporates text-based guiding factors to stabilize conceptual mapping during explanation generation. Experiments across multiple LLMs demonstrate that aligned ERP signals provide meaningful cognitive information beyond textual cues. These results highlight the feasibility of translating metaphor-related neural activity into coherent explanatory text and establish a new pathway for bridging cognitive neuroscience with generative NLP. Dataset and code: <span><span>https://github.com/xinyu706/ESGME</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"675 ","pages":"Article 132896"},"PeriodicalIF":6.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146172945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DEM-WGAN: A new data evaluation method based on Wasserstein Generative Adversarial Network for imbalanced data classification
Pub Date: 2026-02-03 | DOI: 10.1016/j.neucom.2026.132817
Gang Chen, Binjie Hou
Imbalanced data classification is a common challenge in fields such as medical diagnosis and financial risk management. However, the traditional Synthetic Minority Oversampling Technique (SMOTE) algorithm and its variants exhibit certain limitations, particularly a tendency to introduce noise during sample generation and the lack of a robust mechanism for assessing the quality of the synthetic data. To address these issues, we propose a novel data evaluation method based on the Wasserstein Generative Adversarial Network (DEM-WGAN). DEM-WGAN first learns the distribution characteristics of the majority class by feeding majority-class samples into a Wasserstein Generative Adversarial Network (WGAN). The trained discriminator is then used to evaluate the similarity between synthetic data and the majority-class distribution. Finally, high-quality data that better conforms to the minority class is generated through this evaluation process until the number of minority-class samples equals that of the majority class. Experimental results demonstrate that DEM-WGAN significantly improves classification performance compared to several SMOTE-based algorithms. Source code for the applications discussed in this paper is available at https://github.com/ithbjgit1/DEM-WGAN.git.
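One way to read the evaluation mechanism is as a filtering step: a WGAN critic trained on majority-class samples scores SMOTE candidates, and candidates that look too majority-like are rejected. The acceptance rule below (a median threshold on critic scores) is a guess for illustration, not the paper's criterion.

```python
import numpy as np
import torch
import torch.nn as nn

class Critic(nn.Module):
    """WGAN critic; under the Wasserstein objective it was trained with,
    higher output means more majority-like."""
    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.LeakyReLU(0.2),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def filter_synthetic(critic: Critic, candidates: np.ndarray,
                     majority: np.ndarray) -> np.ndarray:
    """Hypothetical evaluation rule: keep only SMOTE candidates whose
    critic score falls below the median score of real majority samples,
    i.e. samples that do not resemble majority-class points."""
    with torch.no_grad():
        cand_scores = critic(torch.as_tensor(candidates, dtype=torch.float32))
        maj_scores = critic(torch.as_tensor(majority, dtype=torch.float32))
    keep = cand_scores < maj_scores.median()
    return candidates[keep.numpy()]
```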
{"title":"DEM-WGAN: A new data evaluation method based on Wasserstein Generative Adversarial Network for imbalanced data classification","authors":"Gang Chen , Binjie Hou","doi":"10.1016/j.neucom.2026.132817","DOIUrl":"10.1016/j.neucom.2026.132817","url":null,"abstract":"<div><div>Imbalanced data classification is a common challenge in various fields of medical diagnosis and financial risk management. However, the traditional Synthetic Minority Oversampling Technique (SMOTE) algorithm and its variants exhibit certain limitations, particularly the tendency to increase noise during the sample generation and the lack of a robust evaluation mechanism for assessing the quality of the synthetic data. To address these issues, we propose a novel data evaluation method based on the Wasserstein Generative Adversarial Network (DEM-WGAN). DEM-WGAN first learns the distribution characteristics of the majority-class by inputting majority-class samples into Wasserstein Generative Adversarial Network (WGAN). Then, the trained discriminator is used to evaluate the similarity between the synthetic data and the majority-class distribution. Finally, high-quality data that better conforms to the minority-class are generated through the evaluation process until the number of minority-class samples equals that of the majority-class. Experimental results demonstrate that DEM-WGAN significantly improves classification performance compared to several SMOTE algorithms. Source code for the applications discussed in this paper is available at <span><span>https://github.com/ithbjgit1/DEM-WGAN.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"674 ","pages":"Article 132817"},"PeriodicalIF":6.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146192044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic view synthesis with topologically-varying neural radiance fields from sparse input views
Pub Date: 2026-02-03 | DOI: 10.1016/j.neucom.2026.132942
Kangkan Wang, Kejie Wei, Shao-Yuan Li
This paper addresses the challenge of dynamic view synthesis from sparse input views with topologically-varying neural radiance fields (NeRFs). Previous methods estimate a NeRF at each time step or learn a hyperspace of templates to represent topology-changing scenes. However, time-conditioned NeRFs are highly ill-posed, as they predict radiance and motion simultaneously with a single model, while hyperspace template approaches suffer degraded performance when the number of input views is insufficient. To address these issues, we propose topologically-varying NeRFs that learn sparse templates in canonical space. The sparse template NeRFs are learned to represent different topology-changing states of dynamic scenes, realized through a variance constraint on the hyper-coordinates of the templates. By composing the deformation fields with inverse deformation fields, we obtain 3D scene flows between different time instances and constrain the per-frame deformation with 2D optical flows, which also implicitly imposes multi-view constraints on the NeRF model from sparse input views. Compared to existing methods for dynamic view synthesis, our method handles sparse-view data with large topology changes more effectively, owing to the constrained space of sparse template NeRFs and the constraints from forward-inverse deformation fields. Extensive experiments on various datasets demonstrate that our method improves the quality of novel-view synthesis compared with previous works.
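The forward-inverse deformation constraint admits a compact expression: composing the per-frame forward field with its inverse should return each point to where it started. The sketch below assumes both fields are MLPs over (point, time) inputs and uses a simple round-trip penalty; the paper's actual formulation may differ.

```python
import torch
import torch.nn as nn

def deformation_cycle_loss(forward_field: nn.Module,
                           inverse_field: nn.Module,
                           x: torch.Tensor,
                           t: torch.Tensor) -> torch.Tensor:
    """Compose the forward deformation (observation space -> canonical
    space) with its inverse (canonical -> observation) and penalize the
    round-trip error; the same composition across two time instances is
    what yields a 3D scene flow between them."""
    canonical = forward_field(torch.cat([x, t], dim=-1))   # warp to canonical space
    x_back = inverse_field(torch.cat([canonical, t], dim=-1))  # warp back
    return (x_back - x).pow(2).sum(dim=-1).mean()
```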
{"title":"Dynamic view synthesis with topologically-varying neural radiance fields from sparse input views","authors":"Kangkan Wang , Kejie Wei , Shao-Yuan Li","doi":"10.1016/j.neucom.2026.132942","DOIUrl":"10.1016/j.neucom.2026.132942","url":null,"abstract":"<div><div>This paper addresses the challenge of dynamic view synthesis from sparse input views with topologically-varying neural radiance fields (NeRFs). Previous methods estimate a NeRFs at each time step or learn a hyperspace of templates to represent the topology-changing scenes. However, the time-conditioned NeRFs is highly ill-posed as it predicts NeRFs and motion simultaneously with a single model, while hyperspace template approaches suffer from degraded performance when the number of input views is insufficient. To address these issues, we propose a topologically-varying NeRFs by learning sparse templates in canonical space. The sparse template NeRFs are learned to represent different topology-changing states of dynamic scenes which are realized through a variance constraint on hyper-coordinates of the templates. By composing the deformation fields with an inverse deformation fields, we obtain 3D scene flows among different time instances and constrain the per-frame deformation with 2D optical flows, which also implicitly form multi-view constraints on the NeRF model from sparse input views. Compared to existing methods for dynamic view synthesis, our method is more effective at handling the sparse view data with large topology changes owing to the constrained space of sparse template NeRFs and constraints from forward-inverse deformation fields. Extensive experiments on various datasets demonstrate that our method improves the quality of novel-view synthesis compared with previous works.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"674 ","pages":"Article 132942"},"PeriodicalIF":6.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146192247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An explainable multi-view representation fusion learning framework with hybrid MetaFormer for EEG-based epileptic seizure detection
Pub Date: 2026-02-03 | DOI: 10.1016/j.neucom.2026.132929
Jingyue Wang, Lu Wei, Zheng Qian, Chengyao Shi, Yuwen Liu, Yinglan Xu
Multi-view learning (MVL), a paradigm of deep learning, has greatly facilitated the detection of epileptic seizures from electroencephalograms (EEGs) owing to its remarkable capability to learn generalizable features. However, existing MVL-based seizure detection methods rely on decision strategies to aggregate the discriminative outputs of separate learners, leading to insufficient extraction of inter-view complementarity and limiting detection performance. To address this issue, this paper focuses on two aspects and proposes a multi-view representation fusion learning framework that enables direct information fusion at the feature encoding level. First, to enhance discriminability, we construct hierarchical multi-view representations based on the Gramian Angular Summation Field and an improved Stockwell transform, incorporating the spatial characteristics of EEG montages and temporal dependency dynamics. Second, to process both local and global features comprehensively, we propose a hybrid MetaFormer network that incorporates inverted depth-wise separable convolutions and sparsity-enhanced shifted-window attention mechanisms. Specifically, a fusion unit with cross-attention mechanisms exploits the Key and Value matrices to achieve effective inter-view information exchange. Experimental results on the public CHB-MIT and Siena datasets demonstrate that the proposed method outperforms competing techniques in both sample-based and event-based evaluations for EEG seizure detection. In addition, an explanation module based on feature importance scoring provides post-hoc explanations of the multi-view fusion learning process and its discriminative results via topographic maps, indicating an explainable computational solution for EEG seizure detection.
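Of the two representations mentioned, the Gramian Angular Summation Field has a standard closed form, sketched below for a single 1-D EEG segment; the improved Stockwell transform branch is paper-specific and not reproduced here.

```python
import numpy as np

def gasf(x: np.ndarray) -> np.ndarray:
    """Gramian Angular Summation Field of a 1-D signal: rescale to
    [-1, 1], take polar angles phi = arccos(x), and build
    G[i, j] = cos(phi_i + phi_j), turning a time series into an image."""
    x = np.asarray(x, dtype=np.float64)
    x = 2.0 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1.0
    x = np.clip(x, -1.0, 1.0)                 # guard against rounding error
    phi = np.arccos(x)
    return np.cos(phi[:, None] + phi[None, :])
```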
{"title":"An explainable multi-view representation fusion learning framework with hybrid MetaFormer for EEG-based epileptic seizure detection","authors":"Jingyue Wang , Lu Wei , Zheng Qian , Chengyao Shi , Yuwen Liu , Yinglan Xu","doi":"10.1016/j.neucom.2026.132929","DOIUrl":"10.1016/j.neucom.2026.132929","url":null,"abstract":"<div><div>Multi-view learning (MVL), a paradigm of deep learning, has greatly facilitated the detection of epileptic seizures from electroencephalograms (EEGs) owing to its remarkable capability to learn generalization features. However, existing MVL-based seizure detection methods rely on decision strategies to aggregate the discriminative outputs of separate learners, leading to insufficient extraction of inter-view complementarity and limiting the detection performance. To address this issue, this paper focuses on two aspects and proposes a multi-view representation fusion learning framework, which enables direct information fusion at the feature encoding level. Firstly, to enhance discriminability, we construct hierarchical multi-view representations based on the Gramian Angular Summation Field and an improved Stockwell transform by introducing the spatial characteristics of EEG montages and temporal dependency dynamics. Secondly, to process both local and global features comprehensively, we propose a hybrid MetaFormer network that incorporates inverted depth-wise separable convolutions and sparsity-enhanced shifted-window attention mechanisms. Specifically, the fusion unit with cross-attention mechanisms exploits the Key and Value matrices to achieve effective inter-view information exchange. The experimental results on the public CHB-MIT and Siena datasets demonstrate that the proposed method outperforms competing techniques in both sample-based and event-based evaluations for EEG seizure detection. In addition, an explanation module is devised based on feature importance scoring. In this way, our method enables post-hoc explanations for the multi-view fusion learning process and discriminative results utilizing topographic maps, indicating an explainable computational solution for EEG seizure detection.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"675 ","pages":"Article 132929"},"PeriodicalIF":6.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146147215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
COMMANDing anomalies: Continual video anomaly detection via dual-memory and temporal Mamba modeling
Pub Date: 2026-02-03 | DOI: 10.1016/j.neucom.2026.132943
Yan Liu, Kaiju Li, Md Sabuj Khan, Jian Lang, Rongpei Hong, Kunpeng Zhang, Fan Zhou
Weakly supervised video anomaly detection (WSVAD) aims to localize frame-level anomalies using only video-level labels, offering scalability for large-scale surveillance systems. However, existing methods often struggle to adapt to previously unseen and continuously evolving anomaly patterns, limiting their practical applicability. This challenge calls for continual learning (CL) frameworks that support incremental adaptation while preserving previously acquired knowledge. To this end, we propose a novel CL-based framework for WSVAD, dubbed COMMAND, which enables robust and adaptive anomaly detection in dynamic environments. COMMAND incorporates TempMamba, a temporal modeling unit based on Mamba blocks that effectively captures both the short-range and long-range temporal dependencies essential for distinguishing normal from abnormal behavior. In addition, MemDualNet introduces a dual-memory mechanism that retains both short-term variations and long-term contextual information, facilitating more expressive temporal representations. The framework also includes Notation++, a continual learning strategy that integrates memory replay with a composite loss function comprising contrastive, focal, and multiple-instance objectives to alleviate catastrophic forgetting. Experimental results on benchmark datasets such as UCF-Crime and ShanghaiTech validate the effectiveness of the proposed approach, demonstrating superior adaptability, generalization, and anomaly localization compared to existing state-of-the-art methods.
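The memory-replay component can be illustrated with a standard reservoir-sampling buffer and a weighted combination of the three loss terms named in the abstract; the buffer design and mixing weights below are illustrative assumptions, not the paper's Notation++ strategy.

```python
import random

class ReplayBuffer:
    """Reservoir-sampling memory: keeps a bounded, uniformly representative
    sample of past video features to mix into each new task's batches,
    mitigating catastrophic forgetting."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = []
        self.seen = 0

    def add(self, item) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = random.randrange(self.seen)   # replace with prob capacity/seen
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k: int) -> list:
        return random.sample(self.items, min(k, len(self.items)))

def composite_loss(l_contrastive, l_focal, l_mil, w=(1.0, 1.0, 1.0)):
    """Composite objective over the three terms the abstract names;
    the mixing weights here are hypothetical placeholders."""
    return w[0] * l_contrastive + w[1] * l_focal + w[2] * l_mil
```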
{"title":"COMMANDing anomalies: Continual video anomaly detection via dual-memory and temporal mamba modeling","authors":"Yan Liu , Kaiju Li , Md Sabuj Khan , Jian Lang , Rongpei Hong , Kunpeng Zhang , Fan Zhou","doi":"10.1016/j.neucom.2026.132943","DOIUrl":"10.1016/j.neucom.2026.132943","url":null,"abstract":"<div><div>Weakly supervised video anomaly detection (WSVAD) aims to localize frame-level anomalies using only video-level labels, offering scalability for large-scale surveillance systems. However, existing methods often struggle to adapt to previously unseen and continuously evolving anomaly patterns, limiting their practical applicability. This challenge necessitates the development of continual learning (CL) frameworks that support incremental adaptation while preserving previously acquired knowledge. To this end, we propose a novel CL-based framework, dubbed <strong>COMMAND</strong>, for WSVAD that enables robust and adaptive anomaly detection in dynamic environments. COMMAND incorporates TempMamba, a temporal modeling unit based on Mamba blocks, which effectively captures both short-range and long-range temporal dependencies essential for distinguishing normal and abnormal behavior. In addition, MemDualNet introduces a dual-memory mechanism that retains both short-term variations and long-term contextual information, facilitating more expressive temporal representations. The framework Notation<span><math><mo>+</mo><mo>+</mo></math></span>, a continual learning strategy that integrates memory replay with a composite loss function comprising contrastive, focal, and multiple-instance objectives to alleviate catastrophic forgetting. Experimental results on benchmark datasets such as UCF-Crime and ShanghaiTech validate the effectiveness of the proposed approach, demonstrating superior performance in adaptability, generalization, and anomaly localization compared to existing state-of-the-art methods.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"674 ","pages":"Article 132943"},"PeriodicalIF":6.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146192242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}