
Latest Publications in Neurocomputing

Depth aware image compression with multi-reference dynamic entropy model
IF 6.5 CAS Region 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-04 DOI: 10.1016/j.neucom.2026.132971
Jingyi He, Yongjun Li, Yifei Liang, Mengyan Lu, Haorui Liu, Jixing Zhou, Yi Wei, Hongyan Liu
To overcome the limitations of static feature extraction and inefficient context modeling in existing learned image compression, this paper proposes an image compression algorithm that integrates a Depth-aware Adaptive Transformation (DAT) framework and a Multi-reference Dynamic Entropy Model (MDEM). A proposed Multi-scale Capacity-aware Feature Enhancer (MCFE) model is adaptively embedded into the network to enhance feature extraction capability. The DAT architecture integrates a variational autoencoder framework with MCFE to increase the density of latent representations. Furthermore, an improved soft-threshold sparse attention mechanism is combined with a multi-context model, incorporating adaptive weights to eliminate spatial redundancy in the latent representations across local, non-local, and global dimensions, while channel context is introduced to capture channel dependencies. Building upon this, the MDEM integrates the side information provided by DAT with spatial and channel context information and employs a channel-wise autoregressive model to achieve accurate pixel estimation for precise entropy probability estimation, which improves compression performance. Evaluated on the Kodak, Tecnick, and CLIC (Challenge on Learned Image Compression) Professional Validation datasets, the proposed method achieves BD-rate (Bjøntegaard Delta rate) gains of 7.75%, 9.33%, and 5.73%, respectively, compared to the VTM-17.0 (Versatile Video Coding Test Model) benchmark. The proposed algorithm thus overcomes the limitations of fixed-context and static feature extraction strategies, enabling precise probability estimation and superior compression performance through dynamic resource allocation and multi-dimensional contextual modeling.
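
For readers unfamiliar with channel-wise autoregressive entropy models, the sketch below illustrates the general idea the abstract relies on: the latent is split into channel groups, and each group's Gaussian parameters are predicted from hyperprior side information plus all previously decoded groups. This is a minimal, hypothetical sketch; the class name `ChannelARM`, the group count, and all dimensions are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ChannelARM(nn.Module):
    def __init__(self, channels=192, side_channels=192, num_groups=4):
        super().__init__()
        self.group_size = channels // num_groups
        self.num_groups = num_groups
        # One parameter head per group: sees side info plus all earlier groups.
        self.heads = nn.ModuleList([
            nn.Conv2d(side_channels + g * self.group_size,
                      2 * self.group_size, kernel_size=1)
            for g in range(num_groups)
        ])

    def forward(self, y, side):
        means, scales, decoded = [], [], []
        for g in range(self.num_groups):
            ctx = torch.cat([side] + decoded, dim=1)
            mu, log_sigma = self.heads[g](ctx).chunk(2, dim=1)
            means.append(mu)
            scales.append(log_sigma.exp())
            # During training the true latent slice stands in for the decoded one.
            decoded.append(y[:, g * self.group_size:(g + 1) * self.group_size])
        return torch.cat(means, dim=1), torch.cat(scales, dim=1)

y = torch.randn(1, 192, 16, 16)
side = torch.randn(1, 192, 16, 16)
mu, sigma = ChannelARM()(y, side)  # Gaussian params for entropy coding
```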
Citations: 0
HierLoRA: A hierarchical multi-concept learning approach with enhanced LoRA for personalized image diffusion models
IF 6.5 CAS Region 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-04 DOI: 10.1016/j.neucom.2026.132927
Yongjie Niu, Pengbo Zhou, Rui Zhou, Mingquan Zhou
Personalized image generation, a key application of diffusion models, holds significant importance for the advancement of computer vision, artistic creation, and content generation technologies. However, existing diffusion models fine-tuned with Low-Rank Adaptation (LoRA) face multiple challenges when learning novel concepts: language drift undermines the generation quality of new concepts in novel contexts; the entanglement of object features with other elements in reference images leads to misalignment between the learning target and its unique identifier; and traditional LoRA approaches are limited to learning only one concept at a time. To address these issues, this study proposes a novel hierarchical learning strategy and an enhanced LoRA module. Specifically, we incorporate the GeLU activation function into the LoRA architecture as a nonlinear transformation to effectively mitigate language drift. Furthermore, a gated hierarchical learning mechanism is designed to achieve inter-concept disentanglement, enabling a single LoRA module to learn multiple concepts concurrently. Experimental results across multiple random seeds demonstrate that our approach achieves a 4%–6% improvement in memory retention metrics and outperforms state-of-the-art methods in object fidelity and style similarity by approximately 12.5% and 10%, respectively. In addition to superior generation quality, our method demonstrates high computational efficiency, requiring significantly fewer trainable parameters (~45M) compared to existing baselines. While preserving critical features of target objects and maintaining the model’s original capabilities, our method enables the generation of images across diverse scenes in new styles. In scenarios requiring the simultaneous learning of multiple concepts, this study not only presents a novel solution to the multi-concept learning problem in personalized diffusion model training but also lays a technical foundation for high-quality customized AI image generation and diverse visual content creation. The source code is publicly available at https://github.com/ydniuyongjie/HierLoRA/tree/main.
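
The abstract's key architectural change, inserting a GeLU between LoRA's low-rank projections, can be sketched in a few lines. The module below is a minimal illustration under assumed shapes and scaling conventions (`GeLULoRALinear`, `rank`, and `alpha` are illustrative names), not the authors' released code, which lives at the repository linked above.

```python
import torch
import torch.nn as nn

class GeLULoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16.0):
        super().__init__()
        self.base = base                        # frozen pretrained projection
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)          # update starts at zero
        self.act = nn.GELU()                    # nonlinearity against language drift
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.act(self.down(x)))

layer = GeLULoRALinear(nn.Linear(768, 768), rank=8)
out = layer(torch.randn(2, 10, 768))            # (2, 10, 768)
```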
Citations: 0
Seeing the whole in the parts with self-supervised representation learning
IF 6.5 CAS Region 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-04 DOI: 10.1016/j.neucom.2026.132973
Arthur Aubret, Cèline Teulière, Jochen Triesch
Humans learn to recognize categories of objects, even when exposed to minimal language supervision. Behavioral studies and the successes of self-supervised learning (SSL) models suggest that this learning may hinge on modeling spatial regularities of visual features. However, SSL models rely on geometric image augmentations such as masking portions of an image or aggressively cropping it, which are not known to be performed by the brain. Here, we propose CO-SSL, an alternative to geometric image augmentations that models spatial co-occurrences. CO-SSL aligns local representations (before pooling) with a global image representation. Combined with a neural network endowed with small receptive fields, CO-SSL outperforms previous methods by up to 43.4% on ImageNet-1k when cropping augmentations are not used. In addition, CO-SSL can be combined with cropping image augmentations to accelerate category learning and increase robustness to internal corruptions and small adversarial attacks. Overall, our work paves the way towards a new approach for modeling biological learning and developing self-supervised representations in artificial systems.
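
A minimal sketch of the local-to-global alignment objective described above: each pre-pooling spatial feature vector is pulled toward the pooled global representation with a cosine loss. The function name and the use of mean pooling are assumptions; the paper's projection heads and exact loss may differ.

```python
import torch
import torch.nn.functional as F

def local_global_alignment_loss(feat_map):
    # feat_map: (B, C, H, W) local features taken before global pooling
    global_vec = feat_map.mean(dim=(2, 3))               # (B, C) whole-image vector
    local_vecs = feat_map.flatten(2).transpose(1, 2)     # (B, H*W, C) part vectors
    sim = F.cosine_similarity(local_vecs, global_vec.unsqueeze(1), dim=-1)
    return (1.0 - sim).mean()    # minimized when the parts agree with the whole

loss = local_global_alignment_loss(torch.randn(4, 128, 7, 7))
```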
Citations: 0
A Bayesian approach to tensor networks 张量网络的贝叶斯方法
IF 6.5 CAS Region 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-04 DOI: 10.1016/j.neucom.2026.132961
Erdong Guo, David Draper
Bayesian statistical learning is a powerful paradigm for inference and prediction, which integrates internal information (the sampling distribution of the training data) and external information (prior knowledge or background information) within a logically consistent probabilistic framework. In addition, the posterior distribution and the posterior predictive (marginal) distribution derived from Bayes’ rule summarize the entire information required for inference and prediction, respectively. In this work, we investigate the Bayesian framework of the Tensor Network (BTN) from two perspectives. First, for the inference step, we propose an effective initialization scheme for the BTN parameters, which significantly improves the robustness and efficiency of the training procedure and leads to improved test performance. Second, in the prediction stage, we consider a Gaussian prior on the weights of the BTN and predict the labels of new observations using the posterior predictive (marginal) distribution. We derive an approximation of the posterior predictive distribution using the Laplace approximation, in which the outer-product approximation of the Hessian matrix of the posterior distribution is applied. In the numerical experiments, we evaluate our initialization strategy and demonstrate its advantages over other popular schemes, including He, Xavier, and Haliassos initialization, on the California House Price (CHP), Breast Cancer (BC), Phishing Website (PW), MNIST, Fashion-MNIST (FMNIST), SVHN, and CIFAR-10 datasets. We further examine the characteristics of the BTN by showing its parameters and decision boundaries when trained on a two-dimensional synthetic dataset. The performance of the BTN is thoroughly analyzed from two perspectives: generalization and calibration. Through experiments on the aforementioned datasets, we demonstrate the superior performance of the BTN in both generalization and calibration compared to regular TN-based learning models. This demonstrates the potential of the Bayesian formalism in the development of more powerful TN-based learning models.
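
As a reference for the prediction stage described above, the Laplace approximation with the outer-product Hessian approximation can be written compactly. This is the standard textbook form, stated under a Gaussian prior with precision λ; ℓ_n denotes the per-example log-likelihood, and the paper's exact parameterization may differ.

```latex
% Laplace approximation of the posterior, with the outer-product
% approximation of the Hessian at the MAP estimate:
p(w \mid \mathcal{D}) \approx \mathcal{N}\big(w;\, w_{\mathrm{MAP}},\, H^{-1}\big),
\qquad
H \approx \sum_{n=1}^{N} \nabla_w \ell_n(w_{\mathrm{MAP}})\,
          \nabla_w \ell_n(w_{\mathrm{MAP}})^{\top} + \lambda I.

% The posterior predictive then follows by marginalizing over the Gaussian:
p(y^\ast \mid x^\ast, \mathcal{D})
\approx \int p(y^\ast \mid x^\ast, w)\,
\mathcal{N}\big(w;\, w_{\mathrm{MAP}},\, H^{-1}\big)\, dw.
```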
Citations: 0
Data-driven robust state estimation based on EK-SVSF 基于EK-SVSF的数据驱动鲁棒状态估计
IF 6.5 CAS Region 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-04 DOI: 10.1016/j.neucom.2026.132869
Meng Liu, Xiao He
This paper introduces a novel extension to the Extended Kalman-based Smooth Variable Structure Filter (EK-SVSF), a hybrid state estimation framework that integrates the Extended Kalman Filter (EKF) with the Smooth Variable Structure Filter (SVSF). Tailored for nonlinear systems subject to model uncertainties and external disturbances, EK-SVSF enhances estimation accuracy by leveraging the complementary strengths of its constituent filters. Nonetheless, the efficacy of EK-SVSF hinges critically on the selection of an appropriate width for the smoothing boundary layer (SBL); suboptimal values—either excessively large or small—can substantially impair filtering performance. Compounding this issue, inherent model uncertainties render the determination of an optimal SBL a formidable and enduring challenge. To mitigate this, we propose a data-driven methodology that autonomously extracts salient features from the smoothing boundary function, thereby resolving the parameter tuning dilemma under model uncertainty. Furthermore, to refine the associated multi-loss weighted aggregation, we incorporate an adaptive weighting scheme based on the coefficient of variation, enabling dynamic optimization. Empirical evaluations demonstrate that the proposed approach yields robust and resilient state estimation outcomes, even in the presence of significant model discrepancies.
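
For orientation, the sketch below shows the role the smoothing boundary layer plays in an SVSF-style correction: residuals inside the layer of width ψ receive a smooth proportional correction, while larger residuals saturate. This is a deliberately simplified, hypothetical fragment (the full EK-SVSF gain also involves the a-priori innovation and the measurement matrix), shown only to make the SBL-width sensitivity discussed above concrete.

```python
import numpy as np

def svsf_correction(innovation, psi):
    # innovation: measurement residual; psi: smoothing-boundary-layer width (> 0)
    sat = np.clip(innovation / psi, -1.0, 1.0)  # proportional inside, switching outside
    return np.abs(innovation) * sat             # element-wise corrective term

# A small residual is corrected gently; a large one saturates at full magnitude.
print(svsf_correction(np.array([0.05, -2.0]), psi=0.5))  # [0.005, -2.0]
```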
Citations: 0
MAD-TCN: Time series anomaly detection via multi-scale adaptive dependency temporal convolutional network
IF 6.5 CAS Region 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-04 DOI: 10.1016/j.neucom.2026.132954
Yongping Dan, Zhaoyuan Wang, MengZhao Zhang, Zhuo Li
With the increasing complexity of industrial Internet of Things systems and other intelligent technologies, anomaly detection in multivariate time series has become pivotal for applications in equipment health monitoring and industrial process control. Existing methodologies often struggle with the challenges of multivariate dependencies, temporal dynamics, and computational efficiency. Therefore, this paper introduces the Multi-scale Adaptive Dependency Temporal Convolutional Network (MAD-TCN), a lightweight and efficient model designed to overcome these limitations. MAD-TCN leverages a dual-branch architecture, extracting both local (short-term) and global (long-term) temporal features through depthwise separable dilated convolutions, which are fused to achieve multiscale integration. The model incorporates a cross-variable convolutional feedforward network and an adaptive gated unit to dynamically adjust dependency relationships between variables, enhancing the model’s ability to handle complex interdependencies across multiple dimensions. Comprehensive experiments on four public benchmark datasets (SMAP, SWaT, SMD, MBA) against 13 state-of-the-art methods (including LSTM-NDT, DAGMM, TimesNet, TranAD, and DTAAD) demonstrate that MAD-TCN outperforms the competition in anomaly detection accuracy, achieving the highest or second-highest AUC and F1-scores while maintaining a parameter count of only approximately 0.026M. In addition, compared to the best alternative, MAD-TCN achieves a 34% improvement in training and inference speed. In summary, these experimental results demonstrate the superior performance of MAD-TCN in the time series anomaly detection task, with both high accuracy and computational efficiency. Source code: https://github.com/qianmo2001/MAD-TCN
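
The dual-branch backbone rests on depthwise separable dilated 1-D convolutions, which the sketch below illustrates: a small dilation serves as a local (short-term) branch and a large dilation as a global (long-term) branch. Channel counts, kernel size, and dilations are illustrative assumptions rather than MAD-TCN's actual configuration.

```python
import torch
import torch.nn as nn

class DSDilatedConv1d(nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) // 2 * dilation      # keep the sequence length
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=pad, dilation=dilation,
                                   groups=channels)   # one filter per channel
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):                             # x: (B, C, T)
        return self.pointwise(self.depthwise(x))

x = torch.randn(8, 32, 100)                           # multivariate time series
local_feat = DSDilatedConv1d(32, dilation=1)(x)       # short-term branch
global_feat = DSDilatedConv1d(32, dilation=8)(x)      # long-term branch
```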
Citations: 0
Tensor-to-tensor models with fast iterated sum features
IF 6.5 CAS Region 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-03 DOI: 10.1016/j.neucom.2026.132884
Joscha Diehl, Rasheed Ibraheem, Leonard Schmitz, Yue Wu
Designing expressive yet computationally efficient layers for high-dimensional tensor data (e.g., images) remains a significant challenge. While sequence modeling has seen a shift toward linear-time architectures, extending these benefits to higher-order tensors is non-trivial.
In this work, we introduce the Fast Iterated Sums (FIS) layer, a novel tensor-to-tensor primitive with linear time and space complexity relative to the input size.
Theoretically, our framework bridges deep learning and algorithmic combinatorics: it leverages “corner tree” structures from permutation pattern counting to efficiently compute 2D iterated sums. This formulation admits dual interpretations as both a higher-order state-space model (SSM) and a multiparameter extension of the Signature Transform.
Practically, the FIS layer serves as a drop-in replacement for standard layers in vision backbones. We evaluate its performance on image classification and anomaly detection. When replacing layers in a smaller ResNet, the FIS-based model achieves the accuracy of a larger ResNet baseline while reducing both trainable parameters and multiply-add operations. When replacing layers in ConvNeXt tiny, the FIS-based model saves around 2% of parameters, reduces time per epoch by around 8%, and improves accuracy by around 0.6% on CIFAR-10 and around 2% on CIFAR-100. Furthermore, on the texture subset of MVTec AD, it attains an average AUROC of 97.3%. The code is available at https://github.com/diehlj/fast-iterated-sums.
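
To make the "corner tree" idea concrete, the sketch below computes a depth-2 two-dimensional iterated sum in linear time: a strict two-axis prefix sum of one feature map is paired pointwise with another. This hypothetical fragment only illustrates the simplest such sum; the FIS layer composes many of them with learned parameters.

```python
import torch
import torch.nn.functional as F

def iterated_sum_2d(x, y):
    # x, y: (B, H, W). Returns the sum over pixel pairs where the x-pixel lies
    # strictly up-and-left of the y-pixel, computed in O(H * W).
    S = torch.cumsum(torch.cumsum(x, dim=1), dim=2)  # inclusive 2-D prefix sums
    S = F.pad(S, (1, 0, 1, 0))[:, :-1, :-1]          # shift for strict dominance
    return (S * y).sum(dim=(1, 2))

x, y = torch.randn(2, 8, 8), torch.randn(2, 8, 8)
print(iterated_sum_2d(x, y).shape)                   # torch.Size([2])
```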
Citations: 0
Sign language translation via cross-modal alignment and graph convolution
IF 6.5 CAS Region 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-03 DOI: 10.1016/j.neucom.2026.132949
Ming Yu, Pengfei Zhang, Cuihong Xue, Yingchun Guo
Sign language translation (SLT) converts sign language videos into textual sentences. This process is essential for enabling communication between deaf and hearing individuals. However, the inherent modal gap between visual sign sequences and textual linguistics severely limits performance. Existing methods rely on costly gloss annotations for intermediate supervision, restricting scalability; unsupervised alternatives lack fine-grained alignment or semantic learning capabilities. To address this, we introduce CMAG-Net, a framework integrating cross-modal alignment pre-training and dynamic graph convolutions. The architecture comprises two modules: (1) A cross-modal alignment pre-training module. Optimized with a multi-objective loss, it learns to align visual features with textual semantics, effectively bridging the modality gap without gloss supervision; (2) A dynamic dual-graph spatiotemporal module. It consists of a temporal graph that captures local sign dynamics and a similarity graph that aggregates global semantic relationships. This design suppresses noise, enhances discriminative features, and addresses the challenges of redundant frames and complex spatiotemporal dependencies. Experiments show CMAG-Net outperforms all gloss-free methods on PHOENIX-2014T, CSL-Daily and How2Sign, approaching gloss-based state-of-the-art performance. Versus GFSLT-VLP (gloss-free) on PHOENIX-2014T dev/test sets, BLEU-4 improves by +5.19/+5.95. Compared to MMTLB (gloss-based), the gap narrows to 0.37/0.22 BLEU-4.
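
The similarity-graph branch can be sketched as a cosine-similarity adjacency over per-frame features followed by one graph-convolution step, as below. The function and shapes are illustrative assumptions; CMAG-Net's gating, temporal graph, and normalization details are omitted.

```python
import torch
import torch.nn.functional as F

def similarity_graph_conv(feats, weight):
    # feats: (T, D) per-frame features; weight: (D, D) learnable projection
    normed = F.normalize(feats, dim=-1)
    adj = normed @ normed.t()                 # cosine-similarity adjacency
    adj = F.softmax(adj, dim=-1)              # row-normalized edge weights
    return F.relu(adj @ feats @ weight)       # one GCN-style update

feats = torch.randn(16, 256)                  # 16 frames of a sign video
out = similarity_graph_conv(feats, 0.01 * torch.randn(256, 256))  # (16, 256)
```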
Citations: 0
CDINet: A cascaded dual-domain interaction network for vapor degraded thermal infrared image restoration
IF 6.5 CAS Region 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-03 DOI: 10.1016/j.neucom.2026.132930
Kailun Wei, Xiaoyan Liu, Wei Zhao
Infrared thermography allows imaging in dark and smoky environments and is widely used in firefighting and industrial scenarios. However, high temperature water vapor in the above scenarios can significantly degrade the quality of thermal infrared (TIR) images, leading to errors in subsequent visual tasks. The non-uniform distribution of high-temperature water vapor and the resulting severe information loss in TIR images pose significant challenges to restoration. To address this issue, we propose a cascaded dual-domain interaction network (CDINet) for TIR image restoration. The Dual-domain Interaction Block (DIB) is designed as the basic unit of CDINet. This module enhances feature representation through spatial-frequency interaction, thereby improving the model’s performance in perceiving and restoring non-uniform vapor degraded regions. In addition, we introduce Long Short-Term Memory (LSTM) and design CDINet as a cascade structure to progressively restore and refine the lost information caused by vapor interference in an iterative manner. Furthermore, we have constructed a benchmark dataset comprising 12,500 vapor degraded TIR images to evaluate the restoration performance of different models. Extensive experiments comparing our CDINet with 12 state-of-the-art methods have shown that CDINet can effectively eliminate vapor interference from scenes with varying distributions. It outperforms other methods, especially in challenging scenarios with large non-uniform dense and localized non-uniform vapor degradation. The dataset and code are publicly available at: https://github.com/wkl1996/CDINet-TIR-Restoration.
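
A minimal sketch of the kind of spatial-frequency interaction the DIB builds on: a spatial convolution runs in parallel with a learnable filter applied in the Fourier domain, and the two branches are fused. The block below is an assumption-laden illustration; its names, filter form, and fusion are not CDINet's actual design.

```python
import torch
import torch.nn as nn

class DualDomainBlock(nn.Module):
    def __init__(self, channels, h, w):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        # Learnable complex filter over the rFFT grid of an h x w image.
        self.freq_filter = nn.Parameter(
            torch.ones(channels, h, w // 2 + 1, dtype=torch.cfloat))
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):                       # x: (B, C, H, W)
        spat = self.spatial(x)
        spec = torch.fft.rfft2(x, norm="ortho")
        freq = torch.fft.irfft2(spec * self.freq_filter,
                                s=x.shape[-2:], norm="ortho")
        return self.fuse(torch.cat([spat, freq], dim=1))

y = DualDomainBlock(16, 64, 64)(torch.randn(1, 16, 64, 64))  # (1, 16, 64, 64)
```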
Citations: 0
An explainable multi-view representation fusion learning framework with hybrid MetaFormer for EEG-based epileptic seizure detection
IF 6.5 CAS Region 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-03 DOI: 10.1016/j.neucom.2026.132929
Jingyue Wang, Lu Wei, Zheng Qian, Chengyao Shi, Yuwen Liu, Yinglan Xu
Multi-view learning (MVL), a paradigm of deep learning, has greatly facilitated the detection of epileptic seizures from electroencephalograms (EEGs) owing to its remarkable capability to learn generalizable features. However, existing MVL-based seizure detection methods rely on decision strategies to aggregate the discriminative outputs of separate learners, leading to insufficient extraction of inter-view complementarity and limiting the detection performance. To address this issue, this paper focuses on two aspects and proposes a multi-view representation fusion learning framework, which enables direct information fusion at the feature encoding level. Firstly, to enhance discriminability, we construct hierarchical multi-view representations based on the Gramian Angular Summation Field and an improved Stockwell transform by introducing the spatial characteristics of EEG montages and temporal dependency dynamics. Secondly, to process both local and global features comprehensively, we propose a hybrid MetaFormer network that incorporates inverted depth-wise separable convolutions and sparsity-enhanced shifted-window attention mechanisms. Specifically, the fusion unit with cross-attention mechanisms exploits the Key and Value matrices to achieve effective inter-view information exchange. The experimental results on the public CHB-MIT and Siena datasets demonstrate that the proposed method outperforms competing techniques in both sample-based and event-based evaluations for EEG seizure detection. In addition, an explanation module is devised based on feature importance scoring. In this way, our method enables post-hoc explanations of the multi-view fusion learning process and its discriminative results using topographic maps, indicating an explainable computational solution for EEG seizure detection.
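
The Gramian Angular Summation Field view mentioned above has a compact standard construction, sketched below: the EEG segment is rescaled to [-1, 1], encoded as angles, and expanded into a pairwise cosine-sum image. This reflects the standard GASF definition only; the paper's montage-aware variant layers spatial and temporal structure on top.

```python
import numpy as np

def gasf(x):
    # x: 1-D EEG segment; returns a (len, len) image for the CNN view.
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))                # angular encoding
    return np.cos(phi[:, None] + phi[None, :])            # GASF[i, j] = cos(phi_i + phi_j)

segment = np.sin(np.linspace(0, 8 * np.pi, 128)) + 0.1 * np.random.randn(128)
print(gasf(segment).shape)                                # (128, 128)
```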
Citations: 0