Neural Networks: Latest Articles

FSDM: An efficient video super-resolution method based on Frames-Shift Diffusion Model

IF 6, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks

Pub Date : 2025-04-03 DOI: 10.1016/j.neunet.2025.107435

Shijie Yang , Chao Chen , Jie Liu , Jie Tang , Gangshan Wu

Video super-resolution is a fundamental task aimed at enhancing video quality through intricate modeling techniques. Recent advancements in diffusion models have significantly enhanced image super-resolution processing capabilities. However, their integration into video super-resolution workflows remains constrained due to the computational complexity of temporal fusion modules, demanding more computational resources compared to their image counterparts. To address this challenge, we propose a novel approach: a Frames-Shift Diffusion Model based on the image diffusion models. Compared to directly training diffusion-based video super-resolution models, redesigning the diffusion process of image models without introducing complex temporal modules requires minimal training consumption. We incorporate temporal information into the image super-resolution diffusion model by using optical flow and perform multi-frame fusion. This model adapts the diffusion process to smoothly transition from image super-resolution to video super-resolution diffusion without additional weight parameters. As a result, the Frames-Shift Diffusion Model efficiently processes videos frame by frame while maintaining computational efficiency and achieving superior performance. It enhances perceptual quality and achieves comparable performance to other state-of-the-art diffusion-based VSR methods in PSNR and SSIM. This approach optimizes video super-resolution by simplifying the integration of temporal data, thus addressing key challenges in the field.
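The abstract's idea of bringing temporal information into a per-frame model through optical flow can be pictured with a minimal sketch: warp the previous frame onto the current one, then blend the two. Everything concrete here is an assumption for illustration and not taken from the paper: the integer-valued flow, border clamping, the function names `warp_backward` and `fuse`, and the fixed blending weight `alpha`.

```python
def warp_backward(prev_frame, flow):
    """Align prev_frame to the current frame using a per-pixel flow field.

    prev_frame: H x W list of floats.
    flow: H x W list of (dy, dx) integer offsets; output pixel (y, x) is read
          from prev_frame[y + dy][x + dx], clamped to the image border.
    """
    h, w = len(prev_frame), len(prev_frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = prev_frame[sy][sx]
    return out


def fuse(current, warped_prev, alpha=0.5):
    """Blend the current frame with the aligned previous frame (hypothetical fusion rule)."""
    return [[alpha * c + (1.0 - alpha) * p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, warped_prev)]


prev_frame = [[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]]
# Content moved one pixel to the right between frames, so every output pixel
# looks one pixel to the left in the previous frame.
flow = [[(0, -1)] * 3 for _ in range(2)]
aligned = warp_backward(prev_frame, flow)
fused = fuse(aligned, aligned)
```

In a real pipeline the flow would be estimated by a flow network and sampled with sub-pixel interpolation; the sketch only shows where alignment sits relative to fusion.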

Volume 188, Article 107435. Journal Article.

Citations: 0

Localize-diffusion based dual-branch anomaly detection

IF 6, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks

Pub Date : 2025-04-03 DOI: 10.1016/j.neunet.2025.107439

Jielin Jiang , Xiying Liu , Peiyi Yan , Shun Wei , Yan Cui

Due to the scarcity of real anomaly samples for use in anomaly detection studies, data augmentation methods are typically employed to generate pseudo anomaly samples to supplement the limited real samples. However, existing data augmentation methods often generate image patches with fixed shapes as anomalies in random regions. These anomalies are unrealistic and lack diversity, resulting in generated samples with limited practical value. To address this issue, we propose a dual-branch anomaly detection (DBA) technique based on Localize-Diffusion (LD) augmentation. LD can infer the approximate position and size of the object to be detected based on the samples’ color distribution: this can effectively avoid the problem of patch generation outside the target object’s location. LD subsequently incorporates hard augmentation and continuously propagates irregular patches to the surrounding area, which enriches the diversity of the generated samples. Based on the anomalies’ multi-scale characteristics, DBA adopts two branches for training and anomaly detection based on the generated pseudo anomaly samples: one focuses on identifying anomaly-specific features from learned anomalies, while the other discriminates between normal and anomaly samples based on residual features in the latent space. Finally, an adaptive scoring module is used to calculate a weighted average of the results of the two branches, achieving the goal of anomaly detection. Extensive experimental analyses reveal that DBA achieves excellent anomaly detection performance using only 14.2M parameters, notably achieving 99.6 detection AUC on the MVTec AD dataset.
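The final scoring step and the reported metric can be illustrated with a small sketch: two branches each emit an anomaly score per sample, the scores are combined by a weighted average, and detection AUC is measured on the result. The fixed weight `weight_a`, the function names, and the toy scores are assumptions; the paper's adaptive scoring module determines its weights in a way the abstract does not describe.

```python
def fuse_scores(scores_a, scores_b, weight_a):
    """Weighted average of two branches' anomaly scores; weight_a lies in [0, 1]."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(scores_a, scores_b)]


def auroc(scores, labels):
    """AUROC as the fraction of (anomalous, normal) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]                 # 1 marks an anomalous sample
branch_a = [0.1, 0.6, 0.5, 0.9]       # hypothetical scores from the first branch
branch_b = [0.2, 0.1, 0.7, 0.8]       # hypothetical scores from the second branch
fused = fuse_scores(branch_a, branch_b, weight_a=0.5)
auc_a = auroc(branch_a, labels)
auc_fused = auroc(fused, labels)
```

In this toy case the first branch alone misorders one anomalous/normal pair, while the fused score ranks every pair correctly, which is the kind of complementarity a two-branch design relies on.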

Volume 188, Article 107439. Journal Article.

Citations: 0

Source-free time series domain adaptation with wavelet-based multi-scale temporal imputation

IF 6, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks

Pub Date : 2025-04-02 DOI: 10.1016/j.neunet.2025.107428

Yingyi Zhong, Wen’an Zhou, Liwen Tao

Recent works on source-free domain adaptation (SFDA) for time series reveal the effectiveness of learning domain-invariant temporal dynamics in improving the cross-domain performance of the model. However, existing SFDA methods for time series mainly focus on modeling the original sequence without exploiting the multi-scale properties of time series. This may result in insufficient extraction of domain-invariant temporal patterns. Furthermore, previous multi-scale analysis methods typically ignore important frequency domain information during multi-scale division, which limits their ability to model multi-scale time series. To this end, we propose LEMON, a novel SFDA method for time series with wavelet-based multi-scale temporal imputation. It utilizes the discrete wavelet transform to decompose a time series into multiple scales, each with a distinct time–frequency resolution and specific frequency range, enabling full-spectrum utilization. To effectively transfer multi-scale temporal dynamics from the source domain to the target domain, we introduce a multi-scale temporal imputation module which assigns a deep neural network to perform the temporal imputation task on the sequence at each scale, learning scale-specific domain-invariant information. We further design an energy-based multi-scale weighting strategy, which adaptively integrates information from multiple scales based on the frequency distribution of the input data to improve the transfer performance of the model. Extensive experiments on three real-world time series datasets demonstrate that LEMON significantly outperforms the state-of-the-art methods, achieving an average improvement of 4.45% in accuracy and 6.29% in MF1-score.
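The two signal-processing ingredients named in the abstract, a discrete wavelet decomposition into scales and a weighting driven by per-scale energy, can be sketched as follows. The Haar wavelet, the requirement that the series length be divisible by 2 to the power of `levels`, and the choice of normalized sum-of-squares energy as the weight are all assumptions made for illustration; the abstract does not state which wavelet or weighting formula LEMON uses.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail), each half as long."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Multi-level decomposition, returned as [detail_1, ..., detail_L, approx_L]."""
    bands, current = [], list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        bands.append(detail)
    bands.append(current)
    return bands


def energy_weights(bands):
    """Each band's share of the total energy (sum of squares); the shares sum to 1."""
    energies = [sum(v * v for v in band) for band in bands]
    total = sum(energies)
    return [e / total for e in energies]


series = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
bands = haar_decompose(series, levels=2)
weights = energy_weights(bands)
```

Because the Haar transform used here is orthonormal, the band energies add up to the energy of the original series, so the weights describe how the signal's energy is distributed across scales.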

Volume 188, Article 107428. Journal Article.

Citations: 0

Multi-level feature fusion networks for smoke recognition in remote sensing imagery.

IF 6, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks

Pub Date : 2025-04-01 Epub Date: 2025-01-04 DOI: 10.1016/j.neunet.2024.107112

Yupeng Wang, Yongli Wang, Zaki Ahmad Khan, Anqi Huang, Jianghui Sang

Smoke is a critical indicator of forest fires, often detectable before flames ignite. Accurate smoke identification in remote sensing images is vital for effective forest fire monitoring within Internet of Things (IoT) systems. However, existing detection methods frequently falter in complex real-world scenarios, where variable smoke shapes and sizes, intricate backgrounds, and smoke-like phenomena (e.g., clouds and haze) lead to missed detections and false alarms. To address these challenges, we propose the Multi-level Feature Fusion Network (MFFNet), a novel framework grounded in contrastive learning. MFFNet begins by extracting multi-scale features from remote sensing images using a pre-trained ConvNeXt model, capturing information across different levels of granularity to accommodate variations in smoke appearance. The Attention Feature Enhancement Module further refines these multi-scale features, enhancing fine-grained, discriminative attributes relevant to smoke detection. Subsequently, the Bilinear Feature Fusion Module combines these enriched features, effectively reducing background interference and improving the model's ability to distinguish smoke from visually similar phenomena. Finally, contrastive feature learning is employed to improve robustness against intra-class variations by focusing on unique regions within the smoke patterns. Evaluated on the benchmark dataset USTC_SmokeRS, MFFNet achieves an accuracy of 98.87%. Additionally, our model demonstrates a detection rate of 94.54% on the extended E_SmokeRS dataset, with a low false alarm rate of 3.30%. These results highlight the effectiveness of MFFNet in recognizing smoke in remote sensing images, surpassing existing methodologies. The code is accessible at https://github.com/WangYuPeng1/MFFNet.

Volume 184, Article 107112. Journal Article.

Citations: 0

ICH-PRNet: a cross-modal intracerebral haemorrhage prognostic prediction method using joint-attention interaction mechanism.

IF 6, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks

Pub Date : 2025-04-01 Epub Date: 2025-01-06 DOI: 10.1016/j.neunet.2024.107096

Xinlei Yu, Ahmed Elazab, Ruiquan Ge, Jichao Zhu, Lingyan Zhang, Gangyong Jia, Qing Wu, Xiang Wan, Lihua Li, Changmiao Wang

Accurately predicting intracerebral hemorrhage (ICH) prognosis is a critical and indispensable step in the clinical management of patients post-ICH. Recently, integrating artificial intelligence, particularly deep learning, has significantly enhanced prediction accuracy and relieved neurosurgeons of the burden of manual prognosis assessment. However, uni-modal methods have shown suboptimal performance due to the intricate pathophysiology of ICH. On the other hand, existing cross-modal approaches that incorporate tabular data have often failed to effectively extract complementary information and cross-modal features between modalities, thereby limiting their prognostic capabilities. This study introduces a novel cross-modal network, ICH-PRNet, designed to predict ICH prognosis. Specifically, we propose a joint-attention interaction encoder that effectively integrates computed tomography images and clinical texts within a unified representational space. Additionally, we define a multi-loss function comprising three components to comprehensively optimize cross-modal fusion capabilities. To balance the training process, we employ a self-adaptive dynamic prioritization algorithm that adjusts the weight of each component accordingly. Our model, through these innovative designs, establishes robust semantic connections between modalities and uncovers rich, complementary cross-modal information, thereby achieving superior prediction results. Extensive experimental results and comparisons with state-of-the-art methods on both in-house and publicly available datasets unequivocally demonstrate the superiority and efficacy of the proposed method. Our code is at https://github.com/YU-deep/ICH-PRNet.git.

Volume 184, Article 107096. Journal Article.

Citations: 0

Neighborhood relation-based knowledge distillation for image classification

IF 6, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks

Pub Date : 2025-04-01 DOI: 10.1016/j.neunet.2025.107429

Jianping Gou , Xiaomeng Xin , Baosheng Yu , Heping Song , Weiyong Zhang , Shaohua Wan

As an efficient model compression method, recent knowledge distillation methods primarily transfer the knowledge from a large teacher model to a small student model by minimizing the differences between the predictions from teacher and student. However, the relationship between different samples has not been well-investigated, since recent relational distillation methods mainly construct the knowledge from all randomly selected samples, e.g., the similarity matrix of mini-batch samples. In this paper, we propose Neighborhood Relation-Based Knowledge Distillation (NRKD) to consider the local structure as the novel relational knowledge for better knowledge transfer. Specifically, we first find a subset of samples with their K-nearest neighbors according to the similarity matrix of mini-batch samples and then build the neighborhood relationship knowledge for knowledge distillation, where the characterized relational knowledge can be transferred by both intermediate feature maps and output logits. We perform extensive experiments on several popular image classification datasets for knowledge distillation, including CIFAR10, CIFAR100, Tiny ImageNet, and ImageNet. Experimental results demonstrate that the proposed NRKD yields competitive results compared to the state-of-the-art distillation methods. Our codes are available at: https://github.com/xinxiaoxiaomeng/NRKD.git.
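The neighborhood construction described in the abstract can be sketched in a few lines: build a similarity matrix over the mini-batch, pick each sample's K nearest neighbors from it, and penalize the student only on those neighbor relations. Cosine similarity, the squared-error penalty, and the names `k_nearest` and `neighborhood_loss` are assumptions for illustration; the abstract does not specify the similarity measure or loss, and the actual method applies the idea to both feature maps and logits.

```python
import math


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of a mini-batch."""
    norms = [math.sqrt(sum(v * v for v in f)) for f in features]
    return [[sum(a * b for a, b in zip(fi, fj)) / (ni * nj)
             for fj, nj in zip(features, norms)]
            for fi, ni in zip(features, norms)]


def k_nearest(sim, k):
    """For each sample, the indices of its k most similar other samples."""
    neighbors = []
    for i, row in enumerate(sim):
        order = sorted((j for j in range(len(row)) if j != i),
                       key=lambda j: row[j], reverse=True)
        neighbors.append(order[:k])
    return neighbors


def neighborhood_loss(teacher_sim, student_sim, neighbors):
    """Mean squared gap between teacher and student similarities over teacher neighborhoods."""
    gaps = [(teacher_sim[i][j] - student_sim[i][j]) ** 2
            for i, nbrs in enumerate(neighbors) for j in nbrs]
    return sum(gaps) / len(gaps)


# Hypothetical teacher and student features for a mini-batch of four samples.
teacher = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
student = [[1.0, 0.2], [0.8, 0.1], [0.1, 1.0], [0.0, 0.7]]
t_sim = cosine_similarity_matrix(teacher)
s_sim = cosine_similarity_matrix(student)
nbrs = k_nearest(t_sim, k=1)
loss = neighborhood_loss(t_sim, s_sim, nbrs)
```

Restricting the loss to neighbor pairs is what distinguishes this from matching the full mini-batch similarity matrix, where relations between unrelated samples contribute equally.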

Volume 188, Article 107429. Journal Article.

Citations: 0

Identity Model Transformation for boosting performance and efficiency in object detection network.

IF 6, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks

Pub Date : 2025-04-01 Epub Date: 2024-12-31 DOI: 10.1016/j.neunet.2024.107098

Zhongyuan Lu, Jin Liu, Miaozhong Xu

Modifying the structure of an existing network is a common method to further improve the performance of the network. However, modifying some layers in a network often results in a mismatch with the pre-trained weights, and the fine-tuning process is time-consuming and resource-inefficient. To address this issue, we propose a novel technique called Identity Model Transformation (IMT), which keeps the output before and after transformation equal through rigorous algebraic transformations. This approach ensures the preservation of the original model's performance when modifying layers. Additionally, IMT significantly reduces the total training time required to achieve optimal results while further enhancing network performance. IMT has established a bridge for rapid transformation between model architectures, enabling a model to quickly perform analytic continuation and derive a family of tree-like models with better performance. This model family possesses a greater potential for optimization improvements compared to a single model. Extensive experiments across various object detection tasks validated the effectiveness and efficiency of our proposed IMT solution, which saved 94.76% of the time in fine-tuning the basic model YOLOv4-Rot on the DOTA 1.5 dataset. By using the IMT method, we saw stable performance improvements of 9.89%, 6.94%, 2.36%, and 4.86% on the four datasets AI-TOD, DOTA1.5, coco2017, and MRSAText, respectively.
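The general principle of an output-preserving architecture change can be shown with the simplest possible case: a new linear layer appended after a pre-trained one and initialized to the identity, so the network computes exactly the same function before any further training. This is only a generic illustration of function-preserving initialization; the layer shapes, weights, and helper names are hypothetical, and the algebraic transformations actually used by IMT are not described in the abstract.

```python
def linear(weights, bias, x):
    """y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def identity_layer(size):
    """Weights and bias of a linear layer that returns its input unchanged."""
    weights = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    return weights, [0.0] * size


# Hypothetical "pre-trained" layer and input.
W = [[0.5, -1.0, 2.0],
     [1.5, 0.25, -0.5]]
b = [0.1, -0.2]
x = [1.0, 2.0, 3.0]

original = linear(W, b, x)

# Modified architecture: one extra layer after the original, initialized so
# that the composite function is unchanged and existing weights stay valid.
W_new, b_new = identity_layer(len(b))
transformed = linear(W_new, b_new, original)
```

Starting fine-tuning from such a point means the modified model begins at the original model's accuracy instead of from mismatched weights, which is the benefit the abstract attributes to keeping outputs equal across the transformation.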

Volume 184, Article 107098. Journal Article.

Citations: 0

Synergistic learning with multi-task DeepONet for efficient PDE problem solving.

IF 6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks

Pub Date : 2025-04-01 Epub Date: 2025-01-03 DOI: 10.1016/j.neunet.2024.107113

Varun Kumar, Somdatta Goswami, Katiana Kontolati, Michael D Shields, George Em Karniadakis

Multi-task learning (MTL) is an inductive transfer mechanism designed to leverage useful information from multiple tasks to improve generalization performance compared to single-task learning. It has been extensively explored in traditional machine learning to address issues such as data sparsity and overfitting in neural networks. In this work, we apply MTL to problems in science and engineering governed by partial differential equations (PDEs). However, implementing MTL in this context is complex, as it requires task-specific modifications to accommodate various scenarios representing different physical processes. To this end, we present a multi-task deep operator network (MT-DeepONet) to learn solutions across various functional forms of source terms in a PDE and multiple geometries in a single concurrent training session. We introduce modifications in the branch network of the vanilla DeepONet to account for various functional forms of a parameterized coefficient in a PDE. Additionally, we handle parameterized geometries by introducing a binary mask in the branch network and incorporating it into the loss term to improve convergence and generalization to new geometry tasks. Our approach is demonstrated on three benchmark problems: (1) learning different functional forms of the source term in the Fisher equation; (2) learning multiple geometries in a 2D Darcy flow problem and showcasing better transfer learning capabilities to new geometries; and (3) learning 3D parameterized geometries for a heat transfer problem and demonstrating the ability to predict on new but similar geometries. Our MT-DeepONet framework offers a novel approach to solving PDE problems in engineering and science under a unified umbrella based on synergistic learning that reduces the overall training cost for neural operators.
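The abstract says a binary geometry mask is incorporated into the loss term but does not give the exact form. One plausible reading, sketched here under that assumption, is a mean squared error restricted to grid points inside the geometry. The function name `masked_mse` and the toy grids are hypothetical and are not taken from the paper.

```python
def masked_mse(pred, target, mask):
    """Mean squared error over the grid points where mask is 1.

    Assumption: the mask simply switches off points outside the geometry;
    the paper may weight or combine the mask differently.
    """
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += (p - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no grid points")
    return total / count


# Hypothetical 2x4 grid: the geometry occupies the two left columns.
mask = [[1, 1, 0, 0],
        [1, 1, 0, 0]]
target = [[1.0, 1.0, 1.0, 1.0],
          [1.0, 1.0, 1.0, 1.0]]

# Wrong only outside the geometry, so the masked loss ignores the error.
pred_outside_error = [[1.0, 1.0, 5.0, 5.0],
                      [1.0, 1.0, 5.0, 5.0]]
# Off by 2 at one of the four points inside the geometry.
pred_inside_error = [[3.0, 1.0, 1.0, 1.0],
                     [1.0, 1.0, 1.0, 1.0]]

loss_outside = masked_mse(pred_outside_error, target, mask)
loss_inside = masked_mse(pred_inside_error, target, mask)
```

Normalizing by the number of unmasked points rather than the full grid size keeps the loss comparable across geometries that cover different fractions of the grid, which seems relevant when several geometries are trained together.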


Neural Networks, Volume 184, Article 107113. Publication date: 2025-04-01. Publication type: Journal Article.

Citations: 0

Enhancing Recommender Systems through Imputation and Social-Aware Graph Convolutional Neural Network.

IF 6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks

Pub Date : 2025-04-01 Epub Date: 2024-12-31 DOI: 10.1016/j.neunet.2024.107071

Azadeh Faroughi, Parham Moradi, Mahdi Jalili

Recommendation systems are vital tools for helping users discover content that suits their interests. Collaborative filtering methods are one of the techniques employed for analyzing interactions between users and items, which are typically stored in a sparse matrix. This inherent sparsity poses a challenge because it necessitates accurately and effectively filling in these gaps to provide users with meaningful and personalized recommendations. Our solution addresses sparsity in recommendations by incorporating diverse data sources, including trust statements and an imputation graph. The trust graph captures user relationships and trust levels, working in conjunction with an imputation graph, which is constructed by estimating each user's missing ratings from the user-item matrix using the average ratings of the most similar users. Combined with the user-item rating graph, an attention mechanism fine-tunes the influence of these graphs, resulting in more personalized and effective recommendations. Our method consistently outperforms state-of-the-art recommenders in real-world dataset evaluations, underscoring its potential to strengthen recommendation systems and mitigate sparsity challenges.
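The imputation step described above (filling a user's missing ratings with the average rating of the most similar users) can be illustrated with a small sketch. The abstract does not state the similarity measure or the neighbourhood size, so cosine similarity over co-rated items and the parameter `k` are assumptions here, and the names `cosine_similarity` and `impute_missing` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated (assumed measure)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def impute_missing(ratings, items, k=2):
    """Return {(user, item): estimate} for every unobserved user-item pair."""
    imputed = {}
    for user, own in ratings.items():
        # Rank the other users from most to least similar to this user.
        neighbours = sorted(
            (other for other in ratings if other != user),
            key=lambda other: cosine_similarity(own, ratings[other]),
            reverse=True,
        )
        for item in items:
            if item in own:
                continue
            # Average the ratings of the k most similar users who rated the item.
            votes = [ratings[n][item] for n in neighbours if item in ratings[n]][:k]
            if votes:
                imputed[(user, item)] = sum(votes) / len(votes)
    return imputed


# Toy user-item ratings; "alice" has not rated item "C".
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 4, "C": 2},
    "carol": {"A": 1, "B": 5, "C": 4},
}
imputed = impute_missing(ratings, items=["A", "B", "C"], k=1)
```

In this toy data "bob" rates the shared items exactly as "alice" does, so with `k=1` the estimate for alice's missing rating of "C" is bob's rating. In the paper these estimates form the edges of the imputation graph; how the graph is built from them is not specified in the abstract.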


Neural Networks, Volume 184, Article 107071. Publication date: 2025-04-01. Publication type: Journal Article.

Citations: 0

Deep prior embedding method for Electrical Impedance Tomography

IF 6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks

Pub Date : 2025-03-31 DOI: 10.1016/j.neunet.2025.107419

Junwu Wang , Jiansong Deng , Dong Liu

This paper presents a novel deep learning-based approach for Electrical Impedance Tomography (EIT) reconstruction that effectively integrates image priors to enhance reconstruction quality. Traditional neural network methods often rely on random initialization, which may not fully exploit available prior information. Our method addresses this by using image priors to guide the initialization of the neural network, allowing for a more informed starting point and better utilization of prior knowledge throughout the reconstruction process. We explore three different strategies for embedding prior information: non-prior embedding, implicit prior embedding, and full prior embedding. Through simulations and experimental studies, we demonstrate that the incorporation of accurate image priors significantly improves the fidelity of the reconstructed conductivity distribution. The method is robust across varying levels of noise in the measurement data, and the quality of the reconstruction is notably higher when the prior closely resembles the true distribution. This work highlights the importance of leveraging prior information in EIT and provides a framework that could be extended to other inverse problems where prior knowledge is available.

Neural Networks, Volume 188, Article 107419. Publication date: 2025-03-31. Publication type: Journal Article.

Citations: 0