
Latest Publications in Neurocomputing

HierLoRA: A hierarchical multi-concept learning approach with enhanced LoRA for personalized image diffusion models
IF 6.5 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-04-28 | Epub Date: 2026-02-04 | DOI: 10.1016/j.neucom.2026.132927
Yongjie Niu , Pengbo Zhou , Rui Zhou , Mingquan Zhou
Personalized image generation, a key application of diffusion models, holds significant importance for the advancement of computer vision, artistic creation, and content generation technologies. However, existing diffusion models fine-tuned with Low-Rank Adaptation (LoRA) face multiple challenges when learning novel concepts: language drift undermines the generation quality of new concepts in novel contexts; the entanglement of object features with other elements in reference images leads to misalignment between the learning target and its unique identifier; and traditional LoRA approaches are limited to learning only one concept at a time. To address these issues, this study proposes a novel hierarchical learning strategy and an enhanced LoRA module. Specifically, we incorporate the GeLU activation function into the LoRA architecture as a nonlinear transformation to effectively mitigate language drift. Furthermore, a gated hierarchical learning mechanism is designed to achieve inter-concept disentanglement, enabling a single LoRA module to learn multiple concepts concurrently. Experimental results across multiple random seeds demonstrate that our approach achieves a 4%–6% improvement in memory retention metrics and outperforms state-of-the-art methods in object fidelity and style similarity by approximately 12.5% and 10%, respectively. In addition to superior generation quality, our method demonstrates high computational efficiency, requiring significantly fewer trainable parameters (~45M) compared to existing baselines. While preserving critical features of target objects and maintaining the model’s original capabilities, our method enables the generation of images across diverse scenes in new styles.
In scenarios requiring the simultaneous learning of multiple concepts, this study not only presents a novel solution to the multi-concept learning problem in personalized diffusion model training but also lays a technical foundation for high-quality customized AI image generation and diverse visual content creation. The source code is publicly available at https://github.com/ydniuyongjie/HierLoRA/tree/main.
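The enhanced LoRA update described in the abstract, a GeLU nonlinearity inserted between the low-rank down- and up-projections, can be sketched roughly as follows. This is a minimal illustration assuming standard LoRA conventions; the dimensions, scaling, and function names are ours, not the authors' code.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GeLU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def lora_forward(x, W0, A, B, scale=1.0):
    """Frozen weight W0 plus a low-rank update, with a GeLU nonlinearity
    between the down-projection A and the up-projection B (a sketch of
    the paper's enhanced-LoRA idea, not its implementation)."""
    return x @ W0.T + scale * gelu(x @ A.T) @ B.T

rng = np.random.default_rng(0)
d, r = 8, 2                        # feature dimension, LoRA rank
W0 = rng.normal(size=(d, d))       # frozen pretrained weight
A = 0.1 * rng.normal(size=(r, d))  # trainable down-projection
B = np.zeros((d, r))               # trainable up-projection, zero-init
x = rng.normal(size=(1, d))

# With B zero-initialized, the module reproduces the frozen layer
# exactly, the usual LoRA starting point before fine-tuning.
assert np.allclose(lora_forward(x, W0, A, B), x @ W0.T)
```

Because the nonlinearity sits between A and B, the adapter's update is no longer a rank-r linear map, which is the mechanism the abstract credits with mitigating language drift.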
Neurocomputing, Volume 675, Article 132927.
Citations: 0
Depth aware image compression with multi-reference dynamic entropy model
IF 6.5 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-04-28 | Epub Date: 2026-02-04 | DOI: 10.1016/j.neucom.2026.132971
Jingyi He, Yongjun Li, Yifei Liang, Mengyan Lu, Haorui Liu, Jixing Zhou, Yi Wei, Hongyan Liu
To overcome the limitations of static feature extraction and inefficient context modeling in existing learned image compression, this paper proposes an image compression algorithm that integrates a Depth-aware Adaptive Transformation (DAT) framework with a Multi-reference Dynamic Entropy Model (MDEM). A proposed Multi-scale Capacity-aware Feature Enhancer (MCFE) model is adaptively embedded into the network to enhance feature extraction capability. The DAT architecture integrates a variational autoencoder framework with MCFE to increase the density of latent representations. Furthermore, an improved soft-threshold sparse attention mechanism is combined with a multi-context model, incorporating adaptive weights to eliminate spatial redundancy in the latent representations across local, non-local, and global dimensions, while channel context is introduced to capture channel dependencies. Building upon this, the MDEM integrates the side information provided by DAT along with spatial and channel context information and employs a channel-wise autoregressive model to achieve accurate pixel estimation for precise entropy probability estimation, which improves compression performance. Evaluated on the Kodak, Tecnick, and CLIC (Challenge on Learned Image Compression) Professional Validation datasets, the proposed method achieves BD-rate (Bjøntegaard Delta rate) gains of 7.75%, 9.33%, and 5.73%, respectively, compared to the VTM-17.0 (Versatile Video Coding Test Model) benchmark. Therefore, the proposed algorithm overcomes the limitations of fixed-context and static feature extraction strategies, enabling precise probability estimation and superior compression performance through dynamic resource allocation and multi-dimensional contextual modeling.
Neurocomputing, Volume 675, Article 132971.
Citations: 0
Explainable artificial intelligence with Boolean rule-aware predictions in ridge regression models
IF 6.5 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-04-28 | Epub Date: 2026-02-07 | DOI: 10.1016/j.neucom.2026.132991
Seyed Amir Malekpour , Hamid Pezeshk
Recent artificial intelligence (AI) systems, including deep neural networks (DNNs), have become increasingly complex and less interpretable. We propose a human-interpretable model named Regression-Based Boolean Rule Inference (RBBR). By transforming input features into multiple conjunctions, RBBR fits a ridge regression model to the conjunctions and target variable data and derives the Boolean rule set from conjunctions with a positive weight sign in the model. Moreover, for high-dimensional datasets, a strategy is presented to derive Boolean sub-rules from regression sub-models fitted to specific feature subsets. The Bayesian Information Criterion (BIC) is employed to rank the fitted models and associated Boolean rules, striking a balance between interpretability and accuracy. Additionally, a Bayesian framework is proposed for predicting the target class of new data points based on top-ranked Boolean rules selected by BIC. By considering the combinatorial interactions among input features, RBBR offers a robust feature selection strategy, surpassing decision trees. Experiments conducted on datasets with low sample sizes reveal that RBBR exhibits data efficiency. Our approach for Boolean rule inference from regression models is compatible with the learning structure of black-box models like DNNs, enabling the interpretation of parameter sets or neurons using Boolean rules.
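The core RBBR pipeline described above (expand binary features into conjunctions, fit ridge regression, and keep the positive-weight conjunctions as the rule set) can be sketched in a few lines. The closed-form ridge solver and the toy data are our illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from itertools import combinations

def conjunctions(X, max_order=2):
    """Expand binary features into all conjunctions (ANDs) up to
    max_order, computed as products of the selected columns."""
    cols, idx = [], []
    for k in range(1, max_order + 1):
        for c in combinations(range(X.shape[1]), k):
            cols.append(np.prod(X[:, c], axis=1))
            idx.append(c)
    return np.column_stack(cols), idx

def rbbr_rules(X, y, alpha=1.0, max_order=2):
    """Fit ridge regression on the conjunction matrix (closed form)
    and keep conjunctions with positive weight as the Boolean rules."""
    Z, idx = conjunctions(X, max_order)
    Zc, yc = Z - Z.mean(axis=0), y - y.mean()
    w = np.linalg.solve(Zc.T @ Zc + alpha * np.eye(Z.shape[1]), Zc.T @ yc)
    return [idx[j] for j, wj in enumerate(w) if wj > 0]

# Toy target: y = x0 AND x1, so the conjunction (0, 1) should receive
# a positive weight and appear in the extracted rule set.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [1, 1], [0, 0]], dtype=float)
y = X[:, 0] * X[:, 1]
rules = rbbr_rules(X, y, alpha=0.1)
assert (0, 1) in rules
```

Each returned tuple names the feature indices of one conjunction, so the rule set is directly readable, which is the interpretability property the abstract emphasizes.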
Neurocomputing, Volume 675, Article 132991.
Citations: 0
Sparse assemblies of recurrent neural networks with stability guarantees
IF 6.5 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-04-28 | Epub Date: 2026-02-03 | DOI: 10.1016/j.neucom.2026.132952
Andrea Ceni, Valerio De Caro, Davide Bacciu, Claudio Gallicchio
We introduce AdaDiag, a framework for constructing sparse assemblies of recurrent neural networks (RNNs) with formal stability guarantees. Our approach builds upon contraction theory by designing RNN modules that are inherently contractive through adaptive diagonal parametrization and learnable characteristic time scales. This formulation enables each module to remain fully trainable while preserving global stability under skew-symmetric coupling. We provide rigorous theoretical analysis of contractivity, along with a complexity discussion showing that stability is achieved without additional computational burden. Experiments on ten heterogeneous time series benchmarks demonstrate that AdaDiag consistently surpasses SCN, LSTM, and Vanilla RNN baselines, and achieves competitive performance with state-of-the-art models, all while requiring substantially fewer trainable parameters. These results highlight the effectiveness of sparse and stable assemblies for efficient, adaptive, and generalizable sequence modeling.
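The stability mechanism the abstract relies on, skew-symmetric coupling combined with learnable damping, can be checked numerically in a few lines. This is a sketch of the general contraction argument under our own parametrization assumptions, not the AdaDiag implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Stand-ins for learnable parameters (random here, trained in practice).
S = rng.normal(size=(n, n))            # unconstrained square matrix
d = np.abs(rng.normal(size=n)) + 0.1   # strictly positive diagonal damping

# Recurrent matrix built as skew-symmetric coupling minus a positive
# diagonal: W = (S - S^T) - diag(d). Its symmetric part equals -diag(d),
# which is negative definite for ANY value of S, so the linearized
# dynamics h' = W h remain contractive no matter how S is trained.
W = (S - S.T) - np.diag(d)

sym = 0.5 * (W + W.T)
assert np.allclose(sym, -np.diag(d))        # skew part cancels exactly
assert np.linalg.eigvalsh(sym).max() < 0    # contraction certificate
```

The appeal of such a parametrization is that stability is guaranteed by construction rather than by a penalty term, which matches the abstract's claim of stability without additional computational burden.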
Neurocomputing, Volume 675, Article 132952.
Citations: 0
CGEM: A cognitive-guided network for human-aligned entity matching
IF 6.5 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-04-28 | Epub Date: 2026-02-06 | DOI: 10.1016/j.neucom.2026.132950
Xin Liu, Xiaojun Li, Junping Yao, Yanfei Liu, Qinggang Fan, Haifeng Sun, Chengrong Dong
Deep learning (DL) has advanced entity matching (EM), yet limited interpretability is particularly problematic for real-world deployment in decision-support settings, highlighting the need for models aligned with human reasoning as well as strong performance. Existing approaches improve interpretability but rarely reflect how humans make decisions. We propose Cognitive-Guided Entity Matching (CGEM), a human-aligned framework that reconceptualizes EM as a cognitive process rather than a purely technical task. CGEM is grounded in established theories: it introduces complexity-guided gating inspired by Cognitive Load Theory; builds holistic semantic representation grounded in Frame Semantics; and employs core-attribute reasoning following Cue Validity Theory to ensure diagnostic features govern final decisions. CGEM thus explicitly models complexity, contextuality, and diagnosticity, which remain underexplored in EM research. Experiments on DeepMatcher benchmarks show that CGEM delivers its strongest improvements on the Amazon–Google, Abt–Buy, iTunes–Amazon, and Walmart–Amazon datasets, yielding gains of up to 9.34% over DITTO (2023) and 5.51% over AttendEM (2024), and further exceeds large language model (LLM)–based EM methods on multiple benchmarks. To the best of our knowledge, CGEM is the first EM framework grounded in cognitive decision-making theories, advancing entity matching with human-aligned reasoning, strong predictive performance, and improved interpretability.
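The complexity-guided gating idea described above can be illustrated with a scalar sketch: a complexity estimate for a record pair decides how much an expensive "deliberate" matcher contributes versus a cheap heuristic one. The linear gate, its weights, and the score blending are our assumptions for illustration, not CGEM's actual architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def complexity_gate(easy_score, hard_score, complexity, w=4.0, b=-2.0):
    """Blend a cheap heuristic matcher's score with an expensive
    deliberate matcher's score; the mix is controlled by a scalar
    complexity estimate of the record pair (gate weights are
    illustrative, not CGEM's)."""
    g = sigmoid(w * complexity + b)   # g -> 1 for complex pairs
    return g * hard_score + (1.0 - g) * easy_score

# An easy pair is scored mostly by the cheap matcher; a harder pair
# shifts weight toward the deliberate matcher's (lower) score.
easy = complexity_gate(0.9, 0.4, complexity=0.1)
hard = complexity_gate(0.9, 0.4, complexity=0.9)
assert easy > hard
```

This mirrors the Cognitive Load Theory intuition in the abstract: effort, and hence the more expensive reasoning path, is allocated only where the input warrants it.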
Neurocomputing, Volume 675, Article 132950.
Citations: 0
MPFNet: Mamba-driven progressive fusion network for RAW-RGB collaborative demoiréing
IF 6.5 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-04-28 | Epub Date: 2026-02-07 | DOI: 10.1016/j.neucom.2026.133019
Lieqiang Yang, Li Yu, Wang Zhang, Chengyan Deng, Jianqin Liu
With the development of smartphones and display technologies, screen-captured images have become an indispensable means of recording information. However, moiré patterns, generated due to the aliasing effect between the Color Filter Array (CFA) and screen display pixels, severely degrade image quality. Existing demoiré methods suffer from issues such as significant loss of original information in RGB images, limited receptive field range, and high computational complexity, leading to incomplete removal of moiré patterns. To address these limitations, we propose a Mamba-Driven Progressive Fusion Network (MPFNet) for RAW-RGB Collaborative Demoiréing. The MPFNet fully leverages RAW data (which retains richer original information) and RGB data (which provides guidance during RAW-to-RGB conversion), while harnessing the global receptive field attention enabled by Mamba’s linear computational complexity, thereby achieving low-color-difference moiré removal. The MPFNet adopts a two-stage architecture: In the first stage, a Simple Demoiré Block (SDB) performs shallow demoiréing on RAW data while extracting multi-scale RAW features. In the second stage, the dual-path adaptive feature fusion (DAFF) module is used to progressively fuse multi-scale RAW and RGB features, and then the DemoiréMamba Block (DMB) is used to achieve deep moiré removal and accurate color restoration. Extensive experiments on TMM22, RAWVDemoiré and FHDMI datasets demonstrate that MPFNet achieves state-of-the-art performance in both quantitative metrics and qualitative visual comparisons, while maintaining relatively low FLOPs. For instance, MPFNet achieves a PSNR of 28.86 dB on the TMM22 dataset, which is 0.51 dB higher than previous methods, and it also has lower GFLOPs.
随着智能手机和显示技术的发展,屏幕截图已经成为一种不可缺少的记录信息的手段。然而,由于彩色滤波器阵列(CFA)和屏幕显示像素之间的混叠效应而产生的波纹图案严重降低了图像质量。现有的图像去除方法存在RGB图像原始信息丢失严重、接收野范围有限、计算复杂度高等问题,导致图像去除不完全。为了解决这些限制,我们提出了一个mamba驱动的渐进式融合网络(MPFNet),用于RAW-RGB协同分解。MPFNet充分利用RAW数据(保留更丰富的原始信息)和RGB数据(在RAW到RGB转换期间提供指导),同时利用曼巴线性计算复杂性所启用的全球接受场注意力,从而实现低色差的条纹去除。MPFNet采用两阶段架构:第一阶段,SDB (Simple demoir Block)对原始数据进行浅层分解,同时提取多尺度RAW特征。第二阶段,采用双径自适应特征融合(DAFF)模块逐步融合多尺度RAW和RGB特征,然后采用demoir曼巴块(DMB)实现深度斑点去除和精确色彩恢复。在TMM22、rawvdemoir和FHDMI数据集上进行的大量实验表明,MPFNet在定量指标和定性视觉比较方面都达到了最先进的性能,同时保持了相对较低的FLOPs。例如,MPFNet在TMM22数据集上实现了28.86 dB的PSNR,比以前的方法提高了0.51 dB, GFLOPs也更低。
{"title":"MPFNet: Mamba-driven progressive fusion network for RAW-RGB collaborative demoiréing","authors":"Lieqiang Yang,&nbsp;Li Yu,&nbsp;Wang Zhang,&nbsp;Chengyan Deng,&nbsp;Jianqin Liu","doi":"10.1016/j.neucom.2026.133019","DOIUrl":"10.1016/j.neucom.2026.133019","url":null,"abstract":"<div><div>With the development of smartphones and display technologies, screen-captured images have become an indispensable means of recording information. However, moiré patterns, generated due to the aliasing effect between the Color Filter Array (CFA) and screen display pixels, severely degrade image quality. Existing demoiré methods suffer from issues such as significant loss of original information in RGB images, limited receptive field range, and high computational complexity, leading to incomplete removal of moiré patterns. To address these limitations, we propose a Mamba-Driven Progressive Fusion Network (MPFNet) for RAW-RGB Collaborative Demoiréing. The MPFNet fully leverages RAW data (which retains richer original information) and RGB data (which provides guidance during RAW-to-RGB conversion), while harnessing the global receptive field attention enabled by Mamba’s linear computational complexity, thereby achieving low-color-difference moiré removal. The MPFNet adopts a two-stage architecture: In the first stage, a Simple Demoiré Block (SDB) performs shallow demoiréing on RAW data while extracting multi-scale RAW features. In the second stage, the dual-path adaptive feature fusion (DAFF) module is used to progressively fuse multi-scale RAW and RGB features, and then the DemoiréMamba Block (DMB) is used to achieve deep moiré removal and accurate color restoration. Extensive experiments on TMM22, RAWVDemoiré and FHDMI datasets demonstrate that MPFNet achieves state-of-the-art performance in both quantitative metrics and qualitative visual comparisons, while maintaining relatively low FLOPs. 
For instance, MPFNet achieves a PSNR of 28.86 dB on the TMM22 dataset, which is 0.51 dB higher than previous methods, and it also has lower GFLOPs.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"675 ","pages":"Article 133019"},"PeriodicalIF":6.5,"publicationDate":"2026-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146172894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reconstruction error-based anomaly detection with few outlying examples
IF 6.5 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-04-28 | Epub Date: 2026-02-09 | DOI: 10.1016/j.neucom.2026.133002
Fabrizio Angiulli, Fabio Fassetti, Luca Ferragina
Reconstruction error-based neural architectures constitute a classical deep learning approach to anomaly detection that has shown strong performance. The approach consists of training an Autoencoder to reconstruct a set of examples deemed to represent normality, and then flagging as anomalies those data points that show a sufficiently large reconstruction error. Unfortunately, these architectures often learn to reconstruct the anomalies in the data well, too. This phenomenon is more evident when there are anomalies in the training set. In particular, when these anomalies are labeled (a setting called semi-supervised anomaly detection), the standard way to train Autoencoders is to ignore the anomalies and minimize the reconstruction error on the normal data.
When a sufficiently large and representative set of anomalous examples is available, the problem essentially shifts toward a classification task, where standard supervised strategies can be applied effectively. In this work, instead, we focus on the more challenging scenario in which only a limited number of anomalous examples is available, and these examples are not sufficiently representative of the wide variability that anomalies may exhibit.
We propose AE-SAD, a novel reconstruction error-based architecture that explicitly leverages labeled anomalies to guide the model. Our method introduces a new loss formulation that forces anomalies to be reconstructed according to a transformation function, effectively pushing them outside the description of normal data. This strategy increases the separation between the reconstruction errors of normal and anomalous samples, thereby improving the detection of both seen and unseen anomalies.
Extensive experiments demonstrate that AE-SAD consistently outperforms both standard Autoencoders and the most competitive deep learning techniques for semi-supervised anomaly detection, achieving state-of-the-art results. In particular, our method proves superior across a diverse set of benchmarks, including vectorial data, high-dimensional datasets, and image domains. Moreover, AE-SAD maintains its advantage even in challenging scenarios where the training data are polluted by anomalies that are incorrectly labeled as normal, further highlighting its robustness and practical applicability.
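The key idea above, reconstructing labeled anomalies through a transformation rather than as themselves, can be sketched as follows. The choice F(x) = 1 - x for inputs scaled to [0, 1] and all function names here are our illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def aesad_target(x, y):
    """Reconstruction targets in the spirit of the method above: normal
    points (y=0) must be reconstructed as themselves, while labeled
    anomalies (y=1) are mapped through a transformation F; here we use
    F(x) = 1 - x for inputs scaled to [0, 1] (an illustrative choice)."""
    return np.where(y[:, None] == 1, 1.0 - x, x)

def aesad_loss(x, x_hat, y):
    # Mean squared error against the transformed targets: minimizing it
    # pushes anomalies AWAY from their own reconstruction, widening the
    # gap between normal and anomalous reconstruction errors.
    return np.mean((x_hat - aesad_target(x, y)) ** 2)

x = np.array([[0.2, 0.8], [0.9, 0.1]])
y = np.array([0, 1])          # second sample is a labeled anomaly
x_hat = x.copy()              # a perfect "identity" reconstruction

# Only the anomaly term contributes, so even a perfect self-reconstruction
# is penalized on the labeled anomaly.
loss = aesad_loss(x, x_hat, y)
assert loss > 0
assert aesad_loss(x, x_hat, np.array([0, 0])) == 0.0
```

At test time the detector still scores points by their reconstruction error against the identity target; the transformed training objective simply guarantees that anomalies end up with large errors.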
Reconstruction error-based anomaly detection with few outlying examples
Fabrizio Angiulli, Fabio Fassetti, Luca Ferragina
Neurocomputing, vol. 675, Article 133002. Pub Date: 2026-04-28. DOI: 10.1016/j.neucom.2026.133002
DiCo: Disentangled concept representation for text-to-image person re-identification
IF 6.5 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-04-28 Epub Date: 2026-01-30 DOI: 10.1016/j.neucom.2026.132885
Giyeol Kim , Chanho Eom
Text-to-image person re-identification (TIReID) aims to retrieve person images from a large gallery given free-form textual descriptions. TIReID is challenging due to the substantial modality gap between visual appearances and textual expressions, as well as the need to model fine-grained correspondences that distinguish individuals with similar attributes such as clothing color, texture, or outfit style. To address these issues, we propose DiCo (Disentangled Concept Representation), a novel framework that achieves hierarchical and disentangled cross-modal alignment. DiCo introduces a shared slot-based representation, where each slot acts as a part-level anchor across modalities and is further decomposed into multiple concept blocks. This design enables the disentanglement of complementary attributes (e.g., color, texture, shape) while maintaining consistent part-level correspondence between image and text. Extensive experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReid demonstrate that our framework achieves competitive performance with state-of-the-art methods, while also enhancing interpretability through explicit slot- and block-level representations for more fine-grained retrieval results.
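The shared slot-based representation described above can be sketched as splitting each modality's embedding into part-level slots, each made of several concept blocks, and scoring block-wise alignment. The sketch below is a minimal illustration under our own assumptions (shapes, the cosine scoring, and the averaging rule are not taken from the paper):

```python
import numpy as np

def slot_block_similarity(img_emb, txt_emb, n_slots=4, n_blocks=3):
    """Split each embedding into n_slots part-level slots, each holding
    n_blocks concept blocks, then average block-wise cosine similarity
    across all slot/block pairs as a cross-modal matching score."""
    d = img_emb.shape[-1]
    block_dim = d // (n_slots * n_blocks)

    def to_blocks(e):
        # Keep only the evenly divisible prefix and reshape into blocks.
        return e[: n_slots * n_blocks * block_dim].reshape(n_slots, n_blocks, block_dim)

    a, b = to_blocks(img_emb), to_blocks(txt_emb)
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
    return float(np.mean(np.sum(a * b, axis=-1)))  # mean cosine over blocks
```

Because each block is scored independently, complementary attributes such as color and texture can align separately while the slot index keeps part-level correspondence between image and text.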
Neurocomputing, vol. 675, Article 132885.
Sign language translation via cross-modal alignment and graph convolution
IF 6.5 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-04-28 Epub Date: 2026-02-03 DOI: 10.1016/j.neucom.2026.132949
Ming Yu , Pengfei Zhang , Cuihong Xue , Yingchun Guo
Sign language translation (SLT) converts sign language videos into textual sentences. This process is essential for enabling communication between deaf and hearing individuals. However, the inherent modal gap between visual sign sequences and textual linguistics severely limits performance. Existing methods rely on costly gloss annotations for intermediate supervision, restricting scalability; unsupervised alternatives lack fine-grained alignment or semantic learning capabilities. To address this, we introduce CMAG-Net, a framework integrating cross-modal alignment pre-training and dynamic graph convolutions. The architecture comprises two modules: (1) A cross-modal alignment pre-training module. Optimized with a multi-objective loss, it learns to align visual features with textual semantics, effectively bridging the modality gap without gloss supervision; (2) A dynamic dual-graph spatiotemporal module. It consists of a temporal graph that captures local sign dynamics and a similarity graph that aggregates global semantic relationships. This design suppresses noise, enhances discriminative features, and addresses the challenges of redundant frames and complex spatiotemporal dependencies. Experiments show CMAG-Net outperforms all gloss-free methods on PHOENIX-2014T, CSL-Daily and How2Sign, approaching gloss-based state-of-the-art performance. Versus GFSLT-VLP (gloss-free) on PHOENIX-2014T dev/test sets, BLEU-4 improves by +5.19/+5.95. Compared to MMTLB (gloss-based), the gap narrows to 0.37/0.22 BLEU-4.
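A similarity graph over video frames, as used in the dual-graph module above, can be sketched as follows. This is a generic illustration, not the paper's implementation: the adjacency comes from cosine similarity between frame features, is symmetrically normalized, and one graph-convolution step propagates features.

```python
import numpy as np

def similarity_graph_step(frames, weight):
    """One graph-convolution step over a frame-similarity graph:
    build edges from cosine similarity between frame features, apply
    D^{-1/2} (A + I) D^{-1/2} normalization, then propagate and project."""
    f = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + 1e-8)
    adj = f @ f.T                       # cosine similarity as edge weights
    adj = np.maximum(adj, 0.0)          # keep non-negative affinities only
    adj = adj + np.eye(len(frames))     # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    adj_norm = (adj * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    return adj_norm @ frames @ weight   # aggregate neighbors, then project
```

Aggregating over similarity edges lets semantically related frames reinforce each other, which is the mechanism the abstract credits with suppressing noise from redundant frames.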
Neurocomputing, vol. 675, Article 132949.
CDINet: A cascaded dual-domain interaction network for vapor degraded thermal infrared image restoration
IF 6.5 CAS Tier 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-04-28 Epub Date: 2026-02-03 DOI: 10.1016/j.neucom.2026.132930
Kailun Wei, Xiaoyan Liu, Wei Zhao
Infrared thermography allows imaging in dark and smoky environments and is widely used in firefighting and industrial scenarios. However, high-temperature water vapor in such scenarios can significantly degrade the quality of thermal infrared (TIR) images, leading to errors in subsequent visual tasks. The non-uniform distribution of high-temperature water vapor, and the severe information loss it causes in TIR images, poses significant challenges for restoration. To address this issue, we propose a cascaded dual-domain interaction network (CDINet) for TIR image restoration. The Dual-domain Interaction Block (DIB) is designed as the basic unit of CDINet. This module enhances feature representation through spatial-frequency interaction, thereby improving the model’s ability to perceive and restore non-uniform vapor degraded regions. In addition, we introduce Long Short-Term Memory (LSTM) and design CDINet as a cascade structure that progressively restores and refines, in an iterative manner, the information lost to vapor interference. Furthermore, we have constructed a benchmark dataset comprising 12,500 vapor degraded TIR images to evaluate the restoration performance of different models. Extensive experiments comparing our CDINet with 12 state-of-the-art methods have shown that CDINet can effectively eliminate vapor interference from scenes with varying distributions. It outperforms other methods, especially in challenging scenarios with large non-uniform dense and localized non-uniform vapor degradation. The dataset and code are publicly available at: https://github.com/wkl1996/CDINet-TIR-Restoration.
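A spatial-frequency interaction of the kind attributed to the DIB can be sketched in its simplest form: compute a spatial branch and a frequency-domain branch (here, the log-magnitude of the 2-D FFT), normalize both, and fuse them. The averaging fusion below is our own illustrative choice, not the paper's design.

```python
import numpy as np

def dual_domain_features(img):
    """Toy spatial-frequency interaction: normalize a spatial feature map
    and the log-magnitude of the image's 2-D FFT, then fuse by averaging.
    Both the normalization and the fusion rule are illustrative only."""
    spatial = (img - img.mean()) / (img.std() + 1e-8)   # spatial branch
    mag = np.log1p(np.abs(np.fft.fft2(img)))            # frequency branch
    freq = (mag - mag.mean()) / (mag.std() + 1e-8)
    return 0.5 * (spatial + freq)                       # naive dual-domain fusion
```

The frequency branch summarizes global degradation patterns (such as the low-frequency haze left by dense vapor), while the spatial branch preserves local structure; a learned version would replace the fixed averaging with trainable interaction weights.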
Neurocomputing, vol. 675, Article 132930.