
Latest publications in Neurocomputing

MAD-TCN: Time series anomaly detection via multi-scale adaptive dependency temporal convolutional network
IF 6.5, CAS Zone 2 (Computer Science), Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE). Pub Date: 2026-02-04. DOI: 10.1016/j.neucom.2026.132954
Yongping Dan, Zhaoyuan Wang, MengZhao Zhang, Zhuo Li
With the increasing complexity of industrial Internet of Things systems and other intelligent technologies, anomaly detection in multivariate time series has become pivotal for applications in equipment health monitoring and industrial process control. Existing methodologies often struggle to address the challenges of multivariate dependencies, temporal dynamics, and computational efficiency. Therefore, this paper introduces the Multi-scale Adaptive Dependency Temporal Convolutional Network (MAD-TCN), a lightweight and efficient model designed to overcome these limitations. MAD-TCN leverages a dual-branch architecture that extracts both local (short-term) and global (long-term) temporal features through depthwise separable dilated convolutions, which are fused to achieve multiscale integration. The model incorporates a cross-variable convolutional feedforward network and an adaptive gated unit to dynamically adjust dependency relationships between variables, enhancing its ability to handle complex interdependencies across multiple dimensions. Comprehensive experiments on four public benchmark datasets (SMAP, SWaT, SMD, MBA) against 13 state-of-the-art methods (including LSTM-NDT, DAGMM, TimesNet, TranAD, and DTAAD) demonstrate that MAD-TCN outperforms competing methods in anomaly detection accuracy, achieving the highest or second-highest AUC and F1-scores while maintaining a parameter count of only approximately 0.026 M. In addition, compared to the best alternative, MAD-TCN achieves a 34% improvement in training and inference speed. In summary, these results demonstrate the superior performance of MAD-TCN on time series anomaly detection, combining high accuracy with computational efficiency. Source code: https://github.com/qianmo2001/MAD-TCN
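To make the dual-branch extraction concrete, below is a minimal sketch (not the authors' code) in PyTorch: two depthwise separable dilated 1-D convolutions, one with a small dilation for local (short-term) context and one with a large dilation for global (long-term) context, fused by addition. Channel counts, kernel sizes, and dilations are illustrative assumptions.

```python
# Illustrative only: dual-branch depthwise separable dilated convolutions.
import torch
import torch.nn as nn

class DepthwiseSeparableDilatedConv(nn.Module):
    def __init__(self, channels: int, kernel_size: int, dilation: int):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2  # keep sequence length fixed
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=pad, dilation=dilation,
                                   groups=channels)
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):  # x: (batch, variables, time)
        return self.pointwise(self.depthwise(x))

class DualBranchBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.local_branch = DepthwiseSeparableDilatedConv(channels, 3, dilation=1)   # short-term
        self.global_branch = DepthwiseSeparableDilatedConv(channels, 3, dilation=8)  # long-term

    def forward(self, x):
        return torch.relu(self.local_branch(x) + self.global_branch(x))  # fuse branches

x = torch.randn(4, 25, 128)          # 4 windows, 25 variables, 128 time steps
print(DualBranchBlock(25)(x).shape)  # torch.Size([4, 25, 128])
```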
Citations: 0
Self-triggered asynchronous control for IT-2 fuzzy time-delay Markov jump systems: A membership derivatives LKF method
IF 6.5, CAS Zone 2 (Computer Science), Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE). Pub Date: 2026-02-04. DOI: 10.1016/j.neucom.2026.132955
Xiao-Yan Wang, Xiao-Heng Chang, Xi-Ming Liu
In this paper, under the framework of state-dependent membership functions, a stochastic stability analysis framework combining a mode-dependent saturation self-triggered strategy with a dynamic compensation mechanism is proposed for time-varying delay interval type-2 Takagi-Sugeno (T-S) fuzzy Markov jump systems (MJSs) with a self-triggered mechanism (STM). By designing a trigger logic driven by real-time state information, time-delay state feedback, and a mode-dependent threshold lower bound, the trigger redundancy caused by time-varying delays is effectively suppressed. Based on the characteristics of asynchronous membership functions, a Lyapunov-Krasovskii functional (LKF) with quadratic asynchronous membership weights is constructed. A quadratic fuzzy time-delay integral term is introduced to enrich the representation of delay information, and a non-fuzzy compensation term is used to reduce conservatism. Furthermore, exploiting the continuous differentiability of the membership functions, a compact constraint on their derivatives is derived, laying the foundation for the stability analysis. Based on Lyapunov theory, the local stochastic stability of the system under the time-varying delay interval quadratic T-S fuzzy Markov framework is rigorously proved. Finally, the effectiveness and feasibility of the proposed method are demonstrated through numerical simulation and a truck-trailer system example.
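As a pointer to the construction, the following is a schematic form (our hedged reading, not the paper's exact functional) of an LKF with quadratic asynchronous membership weights, where h_i and \hat{h}_j denote the plant-side and controller-side membership functions, and P_{ij}, Q, and the delay d(t) are placeholder quantities:

```latex
% Schematic only; the paper's functional adds further fuzzy time-delay
% integral terms and non-fuzzy compensation terms.
V(x_t) = \sum_{i=1}^{r}\sum_{j=1}^{r} h_i\big(x(t)\big)\,\hat{h}_j\big(x(t)\big)\,
         x^{\top}(t)\,P_{ij}\,x(t)
       + \int_{t-d(t)}^{t} x^{\top}(s)\,Q\,x(s)\,\mathrm{d}s,
\qquad P_{ij}=P_{ij}^{\top}\succ 0,\quad Q\succ 0.
```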
Citations: 0
LumiGAN: Memory-guided dual-branch learning for real-world low-light image enhancement
IF 6.5, CAS Zone 2 (Computer Science), Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE). Pub Date: 2026-02-03. DOI: 10.1016/j.neucom.2026.132946
Aoping Hong, Xiangyu Chen, Hongying Tang, Jiuhang Wang, Baoqing Li
Under low-light conditions, real-world images often exhibit significant illumination variations and uneven image quality. However, existing algorithms typically employ uniform enhancement strategies that disregard semantic consistency when processing such images, leading to issues such as overexposure, underexposure, or amplified noise and artifacts in shadow regions. To address this issue, we propose LumiGAN, a memory-based dual-branch network for low-light image enhancement. Specifically, LumiGAN utilizes a Quality Assessment Module (QAM) to segment images into regions requiring different enhancement levels. These regions are then processed by the encoder, which comprises the Spatial Dual-Branch Encoder Module (SDEM) and the Frequency Dual-Branch Encoder Module (FDEM). The SDEM extracts local and non-local features in the network’s shallow layers through convolutions with varying receptive fields, while the FDEM captures global illumination and structural information in the deeper layers. Furthermore, these encoders optimize feature segmentation and extraction through dual-branch feature interaction. Finally, the decoder fuses and reconstructs the dual-branch features. Additionally, a memory bank module is introduced in the network’s intermediate layers. Drawing inspiration from human visual memory principles, this module enhances the semantic information of intermediate-layer features, thereby improving the consistency between the original image and the enhanced image. Comprehensive qualitative and quantitative evaluations on benchmark datasets demonstrate that our algorithm not only improves image brightness uniformity but also effectively suppresses noise and artifacts, while substantially boosting semantic consistency and image aesthetic quality. Codes and models are available at https://github.com/lLIVHT/LumiGAN.
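As a toy illustration of what a quality-assessment step might do, the sketch below (assumptions throughout; the paper's QAM is learned, not hand-crafted) splits a low-light image into regions needing different enhancement levels by thresholding local mean luminance:

```python
# Hypothetical stand-in for a quality-assessment module: threshold local luminance.
import torch
import torch.nn.functional as F

def quality_regions(img: torch.Tensor, dark: float = 0.2, bright: float = 0.6):
    """img: (B, 3, H, W) in [0, 1]. Returns an integer mask: 0=dark, 1=mid, 2=bright."""
    luma = img.mean(dim=1, keepdim=True)                # crude per-pixel luminance
    local = F.avg_pool2d(luma, 9, stride=1, padding=4)  # 9x9 local mean
    mask = torch.ones_like(local, dtype=torch.long)     # default: mid-quality region
    mask[local < dark] = 0                              # needs strong enhancement
    mask[local > bright] = 2                            # needs little enhancement
    return mask

img = torch.rand(1, 3, 64, 64) * 0.5  # synthetic under-exposed image
print(quality_regions(img).unique())
```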
Citations: 0
A practical guide to streaming continual learning
IF 6.5, CAS Zone 2 (Computer Science), Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE). Pub Date: 2026-02-03. DOI: 10.1016/j.neucom.2026.132951
Andrea Cossu, Federico Giannini, Giacomo Ziffer, Alessio Bernardo, Alexander Gepperth, Emanuele Della Valle, Barbara Hammer, Davide Bacciu
Continual Learning (CL) and Streaming Machine Learning (SML) study the ability of agents to learn from a stream of non-stationary data. Despite sharing some similarities, they address different and complementary challenges. While SML focuses on rapid adaptation after changes (concept drifts), CL aims to retain past knowledge when learning new tasks. After a brief introduction to CL and SML, we discuss Streaming Continual Learning (SCL), an emerging paradigm providing a unifying solution to real-world problems, which may require both SML and CL abilities. We claim that SCL can i) connect the CL and SML communities, motivating their work towards the same goal, and ii) foster the design of hybrid approaches that can quickly adapt to new information (as in SML) without forgetting previous knowledge (as in CL). We conclude the paper with a motivating example and a set of experiments, highlighting the need for SCL by showing how CL and SML alone struggle to achieve rapid adaptation and knowledge retention.
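To make the hybrid behaviour concrete, here is a minimal sketch (our illustration in PyTorch, not the paper's experimental setup): an online learner updates on every stream element, SML-style, while a small reservoir-sampled replay buffer interleaves old examples to protect past knowledge, CL-style. Buffer size, replay ratio, and the synthetic drift are assumptions.

```python
# Toy streaming learner with reservoir-sampled replay (illustrative only).
import random
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
buffer, capacity, seen = [], 200, 0

def stream():  # stand-in for a non-stationary stream with one concept drift
    for t in range(1000):
        yield torch.randn(10), torch.tensor(t // 500)  # label regime flips at t=500

for x, y in stream():
    seen += 1
    batch = [(x, y)] + random.sample(buffer, min(4, len(buffer)))  # new + replayed
    xs = torch.stack([b[0] for b in batch])
    ys = torch.stack([b[1] for b in batch])
    opt.zero_grad()
    loss_fn(model(xs), ys).backward()
    opt.step()
    if len(buffer) < capacity:                  # reservoir sampling keeps a
        buffer.append((x, y))                   # uniform sample of the stream
    elif random.random() < capacity / seen:
        buffer[random.randrange(capacity)] = (x, y)
```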
Citations: 0
SynHOI: Multi-granularity GAN synthesizer for generative zero-shot HOI detection
IF 6.5, CAS Zone 2 (Computer Science), Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE). Pub Date: 2026-02-03. DOI: 10.1016/j.neucom.2026.132947
Caixia Yan, Yan Kou, Chuan Liu
Zero-shot Human-Object Interaction (HOI) detection has emerged as a new challenge that aims to precisely identify human-object interactions without relying on specific prior training data. Existing visual-semantic mapping-based approaches tackle this challenge by transferring knowledge from external sources or exploring compositional techniques. However, due to the lack of training samples for unseen classes, these methods often overfit the available seen data and fail to generalize to the novel and diverse HOI categories in the long tail of the distribution. Thus, in this work, we propose to synthesize visual features for unseen HOI categories conditioned on the semantic embedding of the corresponding category, enabling the model to learn both seen and unseen HOI instances in the visual domain. In pursuit of this objective, we develop an innovative unseen-HOI synthesizer by unleashing the power of Generative Adversarial Networks (GANs). Considering the flexibility and complexity of zero-shot HOI task settings, we design a multi-granularity GAN synthesizer that generates both composite HOI features and the basic elements of subjects, verbs, and objects, which are then fused to provide enriched training data for the unseen HOI classifier. To further enhance the quality of HOI feature synthesis, we customize both inter-cluster and intra-cluster contrastive learning and composition-augmented generation strategies to facilitate the learning process of the GANs. Extensive experiments demonstrate that the proposed method can synthesize appropriate visual features for various unobserved HOI categories, and thus performs favorably in multiple zero-shot HOI detection settings.
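A bare-bones sketch of the core generative step described above, under stated assumptions: a conditional generator maps noise plus a semantic class embedding to a visual feature vector for an unseen HOI class. Dimensions and the single-generator layout are illustrative; the paper's synthesizer is multi-granularity and adversarially trained.

```python
# Hypothetical conditional feature generator (no discriminator shown).
import torch
import torch.nn as nn

class ConditionalFeatureGenerator(nn.Module):
    def __init__(self, noise_dim: int = 64, embed_dim: int = 300, feat_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + embed_dim, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, feat_dim),
            nn.ReLU(),  # assume non-negative visual features, as after a ReLU backbone
        )

    def forward(self, z, sem):  # z: (B, noise_dim), sem: (B, embed_dim)
        return self.net(torch.cat([z, sem], dim=1))

g = ConditionalFeatureGenerator()
fake_feats = g(torch.randn(8, 64), torch.randn(8, 300))  # 8 synthetic unseen-class features
print(fake_feats.shape)  # torch.Size([8, 1024])
```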
Citations: 0
SSDMamba: A spectral–spatial dual-branch mamba for hyperspectral image classification
IF 6.5, CAS Zone 2 (Computer Science), Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE). Pub Date: 2026-02-03. DOI: 10.1016/j.neucom.2026.132944
Zhaopeng Deng, Zheng Zhou, Haoran Zhao, Gengshen Wu, Xin Sun
Hyperspectral image (HSI) classification represents a critical research focus in remote sensing. However, effectively and efficiently modeling complex spectral-spatial relationships remains a fundamental challenge. While convolutional neural networks (CNNs) and Transformers have gained widespread adoption for HSI classification, CNNs struggle to capture long-range dependencies, and Transformers suffer from quadratic computational complexity. Recently, a selective state-space model (SSM) named Mamba has demonstrated significant potential. Nevertheless, directly applying Mamba to HSI classification poses substantial challenges due to intricate spectral-spatial interactions. To address this, we propose a novel Mamba-based architecture for HSI classification, the Spectral–Spatial Dual-Branch Mamba (SSDMamba), which models both spectral and spatial data and then efficiently fuses the two types of information. Specifically, we design a DS Spatial Mamba (DSSM) block, utilizing unidirectional scanning, to process spatial long-range information in a lightweight manner. Subsequently, we propose an FFT Spectral Mamba (FSM) block, which efficiently processes spectral data to establish global connections within it. Finally, a Dynamic Interactive Fusion Module (DIFM) dynamically and efficiently fuses spectral and spatial features. Extensive experiments on four benchmark HSI datasets demonstrate that SSDMamba achieves significantly higher accuracy with fewer parameters than other methods.
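As one plausible reading of the fusion step (not the paper's DIFM), the sketch below gates between spectral and spatial feature vectors with a learned, input-dependent weight:

```python
# Illustrative gated fusion of spectral and spatial features.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, spectral, spatial):  # both: (B, dim)
        g = self.gate(torch.cat([spectral, spatial], dim=-1))  # per-feature weight in (0, 1)
        return g * spectral + (1 - g) * spatial

fuse = GatedFusion(128)
out = fuse(torch.randn(2, 128), torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 128])
```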
Citations: 0
STIMI: A masked image modeling framework for spatiotemporal wind speed reconstruction
IF 6.5, CAS Zone 2 (Computer Science), Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE). Pub Date: 2026-02-03. DOI: 10.1016/j.neucom.2026.132945
Kai Qu, Shuangsi Xue, Xiaodong Zheng, Hui Cao
While deep learning has revolutionized time-series imputation, prevailing sequence-based paradigms remain vulnerable when handling contiguous block-missing patterns, where the loss of local context leads to significant performance degradation. To dismantle this barrier, this paper introduces the SpatioTemporal Image Mask Imputer (STIMI), a vision-inspired framework that recasts high-dimensional wind speed reconstruction as a grayscale image reconstruction task. STIMI introduces a Mutual Information (MI)-based re-indexing strategy that reshapes irregular time series into a structured 2D grid, helping the model better recognize and recover missing patterns. It further adopts an asymmetric encoder–decoder architecture with a Multi-Scale Window Self-Attention (MSWSA) mechanism to efficiently capture multi-granularity spatiotemporal dependencies at reduced computational cost. Furthermore, the framework optimizes a dual-objective hybrid loss function, combining Mean Squared Error (MSE) with Kullback-Leibler (KL) divergence to ensure both point-wise fidelity and global distributional consistency. Extensive experiments confirm that STIMI consistently outperforms state-of-the-art baselines, demonstrating superior resilience particularly in extreme block-missing scenarios. Finally, SHAP-based interpretability analysis reveals the model’s ability to prioritize local contextual information through a distance-dependent hierarchy, establishing STIMI as a trustworthy and explainable solution for data-intensive renewable energy applications.
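The dual-objective loss lends itself to a short sketch. Below is one reasonable instantiation (our assumption; the abstract does not specify this exact normalisation): pointwise MSE plus a KL term between softmax-normalised value distributions of prediction and target, weighted by a hyperparameter lambda.

```python
# Illustrative MSE + KL hybrid loss for image-shaped reconstructions.
import torch
import torch.nn.functional as F

def hybrid_loss(pred: torch.Tensor, target: torch.Tensor, lam: float = 0.1):
    mse = F.mse_loss(pred, target)                 # point-wise fidelity
    p = F.log_softmax(pred.flatten(1), dim=1)      # predicted value distribution (log)
    q = F.softmax(target.flatten(1), dim=1)        # target value distribution
    kl = F.kl_div(p, q, reduction="batchmean")     # global distributional consistency
    return mse + lam * kl

pred = torch.randn(4, 1, 32, 32)    # reconstructed wind-speed "images"
target = torch.randn(4, 1, 32, 32)  # ground truth
print(hybrid_loss(pred, target).item())
```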
Citations: 0
Deep learning-based remote sensing image super-resolution: Recent advances and challenges
IF 6.5, CAS Zone 2 (Computer Science), Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE). Pub Date: 2026-02-03. DOI: 10.1016/j.neucom.2026.132939
Jiawei Yang, Hongliang Ren, Zhichao He, Mengjie Zeng
As a crucial data source for Earth science research and spatial information applications, remote sensing images often face limitations in spatial resolution due to factors such as sensor performance, imaging conditions, and costs, making it challenging to meet the growing demand for fine-grained analysis. In recent years, deep learning-based remote sensing image super-resolution (RSISR) technology has demonstrated significant potential by reconstructing high-resolution (HR) details from low-resolution (LR) remote sensing images, quickly becoming a research hotspot. However, systematic reviews of RSISR methodologies, network architecture evolution, domain development characteristics, and future directions remain relatively scarce. To address this, this study comprehensively reviews the major advancements in the field since 2020, tracing the development of deep learning-based RSISR frameworks. First, it defines the RSISR problem, provides a statistical analysis of RSISR algorithms published from 2020 onward, and selects over 100 deep learning-related publications for in-depth study. Subsequently, existing research is systematically categorized according to methodological principles: supervised learning methods are divided into six categories based on convolutional neural networks, attention mechanisms, generative adversarial networks, Transformers, diffusion models, and Mamba, while unsupervised learning methods are grouped into four frameworks: self-supervised learning, contrastive learning, zero-shot learning, and generative methods. Additionally, commonly used datasets, loss functions, and evaluation metrics in RSISR tasks are reviewed, and existing performance assessment methods are discussed in detail. Finally, the study summarizes the current development trends, future directions, and key challenges in the field, aiming to provide theoretical reference and practical guidance for related research.
Citations: 0
Anti-DETR: End-to-end anti-drone visual detection network based on wavelet convolution
IF 6.5, CAS Zone 2 (Computer Science), Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE). Pub Date: 2026-02-03. DOI: 10.1016/j.neucom.2026.132935
Jiarui Zhang, Zhihua Chen, Chun Zheng, Wenjun Yi, Guoxu Yan, Yi Wang
With the advancement of unmanned aerial vehicle technology, visual detection for anti-drone tasks in air-to-air scenarios has become increasingly critical. However, detecting fast-moving small UAVs in complex backgrounds remains challenging, as interference from background noise and blurred target edges degrades detection accuracy. To address these issues, we propose Anti-DETR, an end-to-end detection network leveraging wavelet convolution specifically for small-target UAV detection. Anti-DETR consists of three key components: first, the Global Multi-channel Wavelet Residual Network, which expands the receptive field through wavelet convolution and efficiently localizes targets with a global multi-channel attention mechanism; second, the Multi-scale Refined Feature Pyramid Network, which employs an Adaptive Global Calibration Attention Unit to integrate fine-grained shallow features with deep semantic features, enhancing multi-scale feature representation; and finally, the Histogram Self-Attention mechanism, which classifies pixel-level features to improve feature perception in complex backgrounds. Evaluations on the Det-Fly, DUT-Anti-UAV, and HazyDet datasets demonstrate that Anti-DETR surpasses several state-of-the-art methods and classical detectors, confirming its effectiveness and generalizability for accurate anti-UAV detection tasks under challenging environmental conditions. The code is available at https://github.com/Image-Zhang/anti-detr.
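To illustrate the wavelet-convolution building block, here is a minimal sketch using the standard fixed 2x2 Haar filter bank (our construction, not the paper's exact module): the transform splits the input into four sub-bands at half resolution, and an ordinary convolution then processes them, cheaply enlarging the effective receptive field.

```python
# Illustrative Haar-wavelet downsampling followed by a standard convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HaarWaveletConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])     # low-low (average)
        lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])   # vertical detail
        hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])   # horizontal detail
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])   # diagonal detail
        bank = torch.stack([ll, lh, hl, hh]).unsqueeze(1)           # (4, 1, 2, 2)
        self.register_buffer("bank", bank.repeat(in_ch, 1, 1, 1))   # one bank per channel
        self.in_ch = in_ch
        self.conv = nn.Conv2d(4 * in_ch, out_ch, 3, padding=1)

    def forward(self, x):  # x: (B, C, H, W)
        sub = F.conv2d(x, self.bank, stride=2, groups=self.in_ch)  # (B, 4C, H/2, W/2)
        return self.conv(sub)

m = HaarWaveletConv(3, 16)
print(m(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 16, 32, 32])
```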
Citations: 0
Tensor-to-tensor models with fast iterated sum features
IF 6.5, CAS Zone 2 (Computer Science), Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE). Pub Date: 2026-02-03. DOI: 10.1016/j.neucom.2026.132884
Joscha Diehl, Rasheed Ibraheem, Leonard Schmitz, Yue Wu
Designing expressive yet computationally efficient layers for high-dimensional tensor data (e.g., images) remains a significant challenge. While sequence modeling has seen a shift toward linear-time architectures, extending these benefits to higher-order tensors is non-trivial.
In this work, we introduce the Fast Iterated Sums (FIS) layer, a novel tensor-to-tensor primitive with linear time and space complexity relative to the input size.
Theoretically, our framework bridges deep learning and algorithmic combinatorics: it leverages “corner tree” structures from permutation pattern counting to efficiently compute 2D iterated sums. This formulation admits dual interpretations as both a higher-order state-space model (SSM) and a multiparameter extension of the Signature Transform.
Practically, the FIS layer serves as a drop-in replacement for standard layers in vision backbones. We evaluate its performance on image classification and anomaly detection. When replacing layers in a smaller ResNet, the FIS-based model matches the accuracy of a larger ResNet baseline while reducing both trainable parameters and multiply-add operations. When replacing layers in ConvNeXt tiny, the FIS-based model uses around 2% fewer parameters, shortens per-epoch time by around 8%, and improves accuracy by around 0.6% on CIFAR-10 and around 2% on CIFAR-100. Furthermore, on the texture subset of MVTec AD, it attains an average AUROC of 97.3%. The code is available at https://github.com/diehlj/fast-iterated-sums.
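As a toy instance of a 2-D iterated sum, the snippet below computes the simplest "corner" pattern, summing x[i, j] * y[k, l] over all index pairs with i <= k and j <= l; two cumulative sums make this linear in the number of pixels, which mirrors the complexity claim. The FIS layer generalises such patterns to learned corner trees, so this is only a sketch of the underlying primitive.

```python
# Toy 2-D iterated sum via prefix sums (linear in the number of pixels).
import torch

def corner_iterated_sum(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """x, y: (H, W). Returns the sum over i<=k, j<=l of x[i, j] * y[k, l]."""
    prefix = torch.cumsum(torch.cumsum(x, dim=0), dim=1)  # prefix[k, l] = sum_{i<=k, j<=l} x[i, j]
    return (prefix * y).sum()

x, y = torch.randn(8, 8), torch.randn(8, 8)
print(corner_iterated_sum(x, y).item())
```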
Citations: 0