
Neural Networks: Latest Publications

Free-VTON: Cost-free acceleration and quality enhancement for diffusion-based virtual try-on
IF 6.3 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-03 · DOI: 10.1016/j.neunet.2025.108536
Dan Song , Yuhang Pan , Shuangyan Yue , Yao Jin , An-An Liu
With the breakthrough of diffusion models in image generation, diffusion-based virtual try-on offers significant advantages in try-on performance. However, diffusion-based inference suffers from slow generation speed. Additionally, improving image quality generally relies on ever more network parameters, which raises computational cost. These issues hinder the real-time interactivity and practical deployment of virtual try-on systems. In this work, we propose Cost-Free Acceleration and Quality Enhancement for Diffusion-based Virtual Try-On, called Free-VTON. Specifically, we introduce an Adaptive Caching acceleration strategy that adaptively caches and reuses features according to the similarity between features in adjacent diffusion steps: aggressive caching is used when the similarity is high, and conservative caching when it is low. Different caching trajectories are applied to different samples, and the caching rhythm is adaptively adjusted based on the content. This strategy accelerates the inference process without affecting try-on quality. In addition, we introduce a Symmetric Feature Enhancement technique, which symmetrically amplifies the backbone features on both sides of the U-Net during inference to enhance feature extraction and reconstruction capabilities. This technique likewise improves generation quality with almost no additional computational overhead. Experiments demonstrate the superiority of our method for speed-quality trade-offs. Code will be available at https://github.com/PERSIST10/freevton.
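The caching logic the abstract describes — reuse features when adjacent diffusion steps are nearly identical, recompute when they diverge — can be illustrated with a minimal sketch. The cosine-similarity criterion, the threshold values, the reuse budget, and the `compute_features` stub below are illustrative assumptions, not the paper's actual Adaptive Caching implementation.

```python
import torch
import torch.nn.functional as F

class AdaptiveFeatureCache:
    """Sketch of similarity-driven feature caching across diffusion steps.

    If the current latent is highly similar to the one that produced the
    cached features, reuse the cache; otherwise recompute. All thresholds
    are illustrative assumptions.
    """

    def __init__(self, high_sim=0.995, low_sim=0.95):
        self.high_sim = high_sim   # above this: aggressive reuse
        self.low_sim = low_sim     # below this: always recompute
        self.cached_latent = None
        self.cached_features = None
        self.reuse_budget = 0      # how many more steps we may reuse

    def step(self, latent, compute_features):
        if self.cached_latent is not None:
            sim = F.cosine_similarity(
                latent.flatten(), self.cached_latent.flatten(), dim=0)
            if sim > self.high_sim and self.reuse_budget > 0:
                self.reuse_budget -= 1
                return self.cached_features   # reuse: skip the expensive call
            # adapt the caching rhythm to the observed similarity
            self.reuse_budget = 3 if sim > self.low_sim else 0
        feats = compute_features(latent)      # expensive U-Net block
        self.cached_latent, self.cached_features = latent.detach(), feats
        return feats
```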
Citations: 0
Nonnegative spectral embedding learning with adaptive neighbors for multi-view clustering
IF 6.3 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-02 · DOI: 10.1016/j.neunet.2025.108537
Mingyu Zhao , Feiping Nie , Cong Wang , Xuelong Li , Zehan Tan , Huaqiang Hu
Graph-based multi-view clustering (MVC) methods often rely on fixed or manually constructed similarity graphs and involve multiple sensitive hyperparameters, which limits their robustness and practical applicability. To address these issues, we propose Nonnegative Spectral Embedding with Adaptive Neighbors (NSEAN), a unified one-stage MVC framework that integrates per-view adaptive graph learning with nonnegative spectral embedding. NSEAN jointly learns adaptive similarity graphs and a consensus spectral embedding that directly serves as the clustering indicator matrix, thereby eliminating the need for post-processing. By enforcing nonnegativity and orthogonality, the learned embedding admits a clear and interpretable cluster-assignment structure. To efficiently optimize the coupled constraints, an Augmented Lagrangian Multiplier (ALM) strategy is employed to ensure stable and effective optimization. Extensive experiments on real-world multi-view datasets demonstrate that NSEAN consistently achieves competitive or superior clustering performance while requiring only a single hyperparameter, the number of neighbors k, to which the model is empirically insensitive, thus avoiding cumbersome parameter tuning. The code is available at https://github.com/haha1206/NSEAN.
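As a concrete reference point, the closed-form adaptive-neighbor assignment from the sparse-graph literature (where the (k+1)-th nearest neighbor receives exactly zero weight) gives the flavor of the per-view graph learning step. Whether NSEAN uses this exact form is an assumption, and `adaptive_knn_graph` is a hypothetical helper name.

```python
import numpy as np

def adaptive_knn_graph(X, k=10):
    """Sketch: probabilistic k-nearest-neighbor similarity graph with
    adaptive closed-form weights. X: (n, d) data matrix for one view.
    Returns an (n, n) row-stochastic similarity matrix S whose rows
    each have exactly k nonzero entries.
    """
    n = X.shape[0]
    d2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                            # exclude self-loops
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[:k + 1]                     # k neighbors + one extra
        di = d2[i, idx]                                     # ascending distances
        # nearer neighbors get larger similarity; the (k+1)-th gets zero
        denom = k * di[k] - di[:k].sum() + 1e-12
        S[i, idx[:k]] = (di[k] - di[:k]) / denom            # each row sums to 1
    return S
```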
Citations: 0
Angel or devil: Discriminating hard samples and anomaly contaminations for unsupervised time series anomaly detection
IF 6.3 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-02 · DOI: 10.1016/j.neunet.2025.108532
Ruyi Zhang , Hongzuo Xu , Songlei Jian , Yusong Tan , Haifang Zhou , Rulin Xu
Training for unsupervised time series anomaly detection is constantly plagued by the difficulty of discriminating between harmful anomaly contaminations and beneficial hard normal samples. These two types of samples display similar loss behavior, which conventional loss-based methods struggle to differentiate. To address this issue, we introduce a novel metric that augments traditional loss behavior with parameter behavior, thereby enabling a more granular delineation of anomalous patterns. Parameter behavior is formalized by quantifying the parametric response to minor perturbations in data samples. By exploiting the complementary nature of parameter and loss behaviors, we further introduce PLDA, a dual Parameter-Loss Data Augmentation method. During the training phase of anomaly detection, PLDA dynamically augments the training set via an iterative procedure: it mitigates anomaly contaminations while amplifying informative hard normal samples. PLDA exhibits impressive adaptability, enabling it to serve as an additional component that integrates seamlessly with existing anomaly detectors to enhance their performance. Extensive experiments on ten datasets demonstrate that PLDA significantly enhances the performance of four different detectors by up to 8%, outperforming three data augmentation competitors.
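A minimal sketch of the parameter-behavior idea — measuring how strongly the parameter gradient reacts when a sample is slightly perturbed — might look as follows. The perturbation scale, the reconstruction-style loss, and the cosine-distance form are assumptions rather than PLDA's exact metric.

```python
import torch

def parameter_behavior(model, loss_fn, x, eps=1e-3):
    """Sketch of a parameter-behavior score: the parametric response of a
    model to a minor perturbation of one input sample. Larger scores mean
    the sample moves the parameters in a noticeably different direction
    when perturbed. All concrete choices here are illustrative.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    def grad_vector(inp):
        loss = loss_fn(model(inp), inp)  # e.g. a reconstruction loss
        grads = torch.autograd.grad(loss, params)
        return torch.cat([g.flatten() for g in grads])

    g_clean = grad_vector(x)
    g_pert = grad_vector(x + eps * torch.randn_like(x))
    # 1 - cosine similarity: large values = strong parametric response
    cos = torch.nn.functional.cosine_similarity(g_clean, g_pert, dim=0)
    return (1.0 - cos).item()
```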
Citations: 0
Exploring the white matter disruptions for Schizophrenia based on convolutional ensemble kernel randomized network.
IF 6.3 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-01 · Epub Date: 2025-08-28 · DOI: 10.1016/j.neunet.2025.108044
S A Varaprasad, Tripti Goel, M Tanveer

Schizophrenia (SZ) is characterized by cognitive impairments and widespread structural brain alterations. The potential of convolutional neural networks (CNNs) to identify the complex and extensive brain alterations associated with SZ rests on their automatic feature learning capability. Structural magnetic resonance imaging (sMRI) is a non-invasive technique for investigating disruptions in the white matter (WM), grey matter (GM), and cerebrospinal fluid (CSF) of brain regions. We propose an intrinsic CNN ensemble of kernel ridge regression-based random vector functional link (KRR-RVFL) architecture to explore the WM disruptions in SZ. In this approach, we integrate an eight-layer CNN into five different KRR-RVFL classifiers for feature extraction and classification. The classifiers' outputs are averaged and fed to a final KRR-RVFL classifier for the final classification. The KRR-RVFL classifier enhances stability and robustness by addressing the non-linearity limitations of the standard RVFL network. The proposed CNN ensemble KRR-RVFL outperforms other classifiers with 97.33% accuracy for the WM region, which shows significant disruptions compared to GM and CSF. Furthermore, we calculated the correlation coefficient between tissue volumes and the scale of symptoms for GM and WM. According to the results, WM tissue volume is reduced more than GM volume in SZ. The proposed model assists clinicians in exploring the role of WM disruptions for accurate diagnosis of SZ.
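To make the base learner concrete: a KRR-RVFL classifier augments the input with a random nonlinear layer plus a direct link, then applies closed-form kernel ridge regression. The sketch below uses an RBF kernel and illustrative sizes; it is a generic KRR-RVFL, not the paper's eight-layer-CNN ensemble.

```python
import numpy as np

def krr_rvfl_fit_predict(X_tr, Y_tr, X_te, n_hidden=256, lam=1.0, gamma=1e-3, seed=0):
    """Sketch of a kernel-ridge-regression RVFL classifier. Y_tr is a
    one-hot label matrix (n_train, n_classes). Layer size, kernel choice,
    and regularization values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X_tr.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)

    def enhance(X):  # direct input link + random nonlinear layer
        return np.hstack([X, np.tanh(X @ W + b)])

    def rbf(A, B):   # RBF kernel between enhanced feature matrices
        d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d2)

    H_tr, H_te = enhance(X_tr), enhance(X_te)
    K = rbf(H_tr, H_tr)
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), Y_tr)  # KRR closed form
    scores = rbf(H_te, H_tr) @ alpha
    return scores.argmax(1)  # predicted class indices
```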

Citations: 0
Dormant key: Unlocking universal adversarial control in text-to-image models.
IF 6.3 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-01 · Epub Date: 2025-09-03 · DOI: 10.1016/j.neunet.2025.108065
Jingqi Hu, Li Li, Hanzhou Wu, Huixin Luo, Xinpeng Zhang

Text-to-Image (T2I) diffusion models have gained significant traction due to their remarkable image generation capabilities, raising growing concerns over the security risks associated with their use. Prior studies have shown that malicious users can subtly modify prompts to produce visually misleading or Not-Safe-For-Work (NSFW) content, even bypassing existing safety filters. Existing adversarial attacks are often optimized for specific prompts, limiting their generalizability, and their text-space perturbations are easily detectable by current defenses. To address these limitations, we propose a universal adversarial attack framework called dormant key. It learns a transferable suffix that can be appended as a "plug-in" to any text input to guide the generated image toward a specific target. To ensure robustness across diverse prompts, we introduce a novel hierarchical gradient aggregation strategy that stabilizes optimization over prompt batches. This enables efficient learning of universal perturbations in the text space, improving both attack transferability and imperceptibility. Experimental results show that our method effectively balances attack performance and stealth. In NSFW generation tasks, it bypasses major safety mechanisms, including keyword filtering, semantic analysis, and text classifiers, and achieves an over 18% improvement in success rate over baselines.

Citations: 0
CocoAdapter: Efficient end-to-end temporal action detection via self-constrained multi-cognitive adapters
IF 6.3 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-01 · DOI: 10.1016/j.neunet.2025.108531
Lizao Zhang , Qiuhong Tian , Junxiao Ning , Yihan Yuan , Ziyu Yang , Yang Yu
End-to-end training in temporal action detection (TAD) has shown great potential for performance improvement by jointly optimizing the video encoder and action classification head. However, memory bottlenecks have limited the performance of end-to-end TAD. To alleviate the memory overhead during training, this paper explores the application of adapters in TAD and proposes a specialized TAD-oriented self-constrained multi-cognitive adapter (CocoAdapter). Based on CocoAdapter, we construct a novel baseline, CocoTad. Our proposed CocoAdapter utilizes self-constrained projection layers to adjust multiple cognitive convolutional groups based on network depth, enabling a fine-tuning process tailored to the TAD task. As a result, the network only needs to update the parameters in CocoAdapter to achieve end-to-end training, significantly reducing memory consumption during training. We evaluate our model on four representative datasets. Experimental results demonstrate that our proposed CocoTad surpasses previous state-of-the-art methods in terms of mAP.
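A minimal sketch of the adapter idea — a small bottleneck with parallel convolutional branches of different kernel sizes, mixed by a learned gate and added residually to a frozen backbone's features — is shown below. Branch count, kernel sizes, the softmax gate, and the zero-initialized up-projection are illustrative assumptions, not CocoAdapter's exact design.

```python
import torch
import torch.nn as nn

class MultiCognitiveAdapter(nn.Module):
    """Sketch of a bottleneck adapter with parallel depthwise-conv branches
    of different receptive fields. Only the adapter receives gradients
    during fine-tuning; the backbone stays frozen.
    """

    def __init__(self, dim, bottleneck=64, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.branches = nn.ModuleList([
            nn.Conv1d(bottleneck, bottleneck, k, padding=k // 2, groups=bottleneck)
            for k in kernel_sizes
        ])
        self.gate = nn.Parameter(torch.zeros(len(kernel_sizes)))  # learned branch mixing
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # zero-init: adapter starts as identity
        nn.init.zeros_(self.up.bias)

    def forward(self, x):                       # x: (batch, seq_len, dim)
        h = self.down(x).transpose(1, 2)        # -> (batch, bottleneck, seq_len)
        w = torch.softmax(self.gate, dim=0)
        h = sum(wi * branch(h) for wi, branch in zip(w, self.branches))
        return x + self.up(h.transpose(1, 2))   # residual connection
```

Inserted between frozen backbone blocks, a module like this keeps the trainable parameter count (and hence optimizer/activation memory) small, which is the memory-saving mechanism the abstract describes.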
Citations: 0
Revisiting DIRE: towards universal AI-generated image detection.
IF 6.3 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-01 · Epub Date: 2025-09-06 · DOI: 10.1016/j.neunet.2025.108084
Huanqi Lin, Jinghui Qin, Xiaoqi Wu, Tianshui Chen, Zhijing Yang

The rapid development of generative models has improved image quality and made image synthesis widely accessible, raising concerns about content credibility. To address this issue, we propose a method called Universal Reconstruction Residual Analysis (UR2EA) for detecting synthetic images. Our study reveals that, when GAN- and diffusion-generated images are reconstructed by pre-trained diffusion models, they exhibit significant differences in reconstruction error compared to real images: GAN-generated images show lower reconstruction quality than real images, whereas diffusion-generated images are more accurately reconstructed. We leverage these residual maps as a universal prior for training a model to detect synthetic images. In addition, we introduce a Multi-scale Channel and Window Attention (MCWA) module to extract fine-grained features from residual maps across multiple scales, capturing both local and global details. To facilitate the exploration of diverse detection methods, we constructed a new UniversalForensics dataset, which includes various representations of synthetic images generated by 30 different models. Compared to the best-performing baselines, our method improves average accuracy by 3.3% and precision by 1.6%, achieving state-of-the-art results.
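For orientation, the DIRE-style cue this work revisits reduces to a residual between an image and its diffusion-model reconstruction. The stub below sketches that pipeline; `reconstruct` stands in for the (expensive) DDIM inversion plus resampling round trip of a pretrained diffusion model, which is assumed rather than implemented here.

```python
import numpy as np

def reconstruction_residual(x, reconstruct):
    """Sketch of the reconstruction-residual detection cue: an image is
    round-tripped through a pretrained diffusion model, and the per-pixel
    residual map becomes the detector's input (e.g. for a CNN classifier).
    """
    x_rec = reconstruct(x)  # pretrained diffusion inversion + resampling
    return np.abs(x.astype(np.float32) - x_rec.astype(np.float32))

# Toy illustration with an identity "reconstruction"; a real pipeline
# would plug in the diffusion round trip here.
img = np.random.rand(256, 256, 3).astype(np.float32)
res = reconstruction_residual(img, reconstruct=lambda z: z)
print(res.mean())  # 0.0 for the identity stub
```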

Citations: 0
A DINO-based progressive semantic enhanced infrared and visible image fusion network
IF 6.3 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-12-31 · DOI: 10.1016/j.neunet.2025.108527
Shihan Yao , Zhonghui Pei , Huiqin Zhang , Haiyang Jiang , Huabing Zhou
Infrared and visible image fusion aims to integrate complementary information from two source images into a single fused image with rich detail. However, most existing fusion methods focus on visual appearance and pay little attention to the semantic requirements of downstream applications. Although some semantic-driven approaches enhance the semantic content of fused images, they rely on labelled data that contain only limited semantic target information. To address this limitation, this paper proposes a DINO-based progressive semantic enhanced infrared and visible image fusion network (DPSEF). DINO is a self-supervised model that learns representations from large volumes of unlabelled images and exhibits powerful spatial semantic clustering capabilities. We exploit DINO to extract fine-grained spatial semantic features as prior knowledge, and then introduce a semantic enhanced fusion module (SEFM) that progressively injects these semantic priors into the fusion network. This mechanism guides the model to focus on target-relevant regions and generates high-quality fused images that combine rich semantic and detailed information, thereby meeting the needs of subsequent high-level vision tasks. Extensive experiments demonstrate that DPSEF produces fused images whose visual quality significantly exceeds that of mainstream algorithms. Qualitative and quantitative analyses further confirm the strong potential of DPSEF in high-level vision applications. Moreover, additional experiments on multi-focus image fusion validate the generality and robustness of the proposed network.
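One simple way to "inject semantic priors into the fusion network", as the abstract puts it, is FiLM-style feature modulation driven by frozen DINO patch tokens. The sketch below makes that concrete; the scale/shift gating and the ViT-S/16 feature width of 384 are assumptions, not the paper's SEFM.

```python
import torch
import torch.nn as nn

class SemanticInjection(nn.Module):
    """Sketch: frozen DINO patch features modulate fusion features via a
    learned per-token gate (scale) and shift. Gating form is illustrative.
    """

    def __init__(self, fusion_dim, dino_dim=384):  # 384 = DINO ViT-S/16 width
        super().__init__()
        self.to_scale = nn.Linear(dino_dim, fusion_dim)
        self.to_shift = nn.Linear(dino_dim, fusion_dim)

    def forward(self, fusion_feats, dino_feats):
        # fusion_feats: (B, N, fusion_dim); dino_feats: (B, N, dino_dim)
        scale = torch.sigmoid(self.to_scale(dino_feats))  # per-token gate
        shift = self.to_shift(dino_feats)
        return fusion_feats * scale + shift

inj = SemanticInjection(fusion_dim=128)
out = inj(torch.randn(2, 196, 128), torch.randn(2, 196, 384))
print(out.shape)  # torch.Size([2, 196, 128])
```

Applying such a module at successive decoder stages would give the "progressive" injection the abstract describes.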
Citations: 0
ClinReadNet: A clinical reading-inspired network for low-dose abdominal CT image quality assessment
IF 6.3 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-12-31 · DOI: 10.1016/j.neunet.2025.108535
Xianye Xiao , Yulong Zou , Yujie Luo , Taihui Yu , Cun-Jing Zheng , Yuan-ming Geng , Shuihua Wang , Yudong Zhang , Jin Hong
In abdominal CT imaging, optimizing the balance between radiation dose and image quality is crucial, and the primary prerequisite is accurate image quality assessment. Clinical practice uses radiologists' subjective judgment as the gold standard, but this is time-consuming and costly; therefore, developing a no-reference image quality assessment (NR-IQA) model for low-dose CT that mimics radiologists' reading habits has significant practical value. This paper proposes a novel deep learning-based framework, ClinReadNet, whose design aligns with the clinical reading logic of radiologists. First, it introduces the Sobel ordinal quality network (SOQN) module, which simultaneously focuses on edge details highly relevant to image quality and on the quality distribution pattern of the entire image, matching the clinical reading habit of considering both local details and overall context. Second, the framework integrates the (shifted) window multi-scale temperature multi-head self-attention ((S)W-MTMSA) module, which replicates the radiologists' reading process of shifting from overall scanning to local focusing, accurately locking onto regions of interest through multi-sharpness attention. Third, it designs the hierarchical ranked probability score (HRPS) loss function, which combines the dual logics of coarse and fine classification while attending to the distance between grading labels, effectively improving assessment performance. Experiments on the LDCTIQAG2023 dataset show that the proposed method achieves state-of-the-art (SOTA) performance: Pearson's linear correlation coefficient (PLCC), Spearman's rank-order correlation coefficient (SROCC), and Kendall's rank-order correlation coefficient (KROCC) reach 0.9507, 0.9554, and 0.8629 respectively, and the sum of their absolute values (Score) is 2.7690, outperforming existing methods.
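The distance-aware ingredient of the HRPS loss can be grounded in the classical ranked probability score, which compares cumulative predicted and target distributions so that misgrading by two quality levels costs more than misgrading by one. The sketch below implements plain RPS; the hierarchical coarse/fine weighting described in the paper is omitted, and the grade count of 5 is an assumption.

```python
import torch

def ranked_probability_score(logits, labels, n_classes=5):
    """Ranked probability score over ordinal quality grades: squared
    distance between predicted and target cumulative distributions,
    averaged over the batch. Distant misgradings incur larger penalties.
    """
    probs = torch.softmax(logits, dim=1)
    cdf_pred = torch.cumsum(probs, dim=1)                          # (B, K)
    one_hot = torch.nn.functional.one_hot(labels, n_classes).float()
    cdf_true = torch.cumsum(one_hot, dim=1)
    return ((cdf_pred - cdf_true) ** 2).sum(dim=1).mean()

loss = ranked_probability_score(torch.randn(4, 5), torch.tensor([0, 2, 4, 1]))
print(loss.item())
```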
Citations: 0
Multispectral remote sensing object detection via selective cross-modal interaction and aggregation
IF 6.3 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-12-30 · DOI: 10.1016/j.neunet.2025.108533
Minghao Cui , Jing Nie , Hanqing Sun , Jin Xie , Jiale Cao , Yanwei Pang , Xuelong Li
Multispectral remote sensing object detection plays a vital role in a wide range of geoscience and remote sensing applications, such as environmental and disaster monitoring, by leveraging complementary information from RGB and infrared modalities. The performance of such systems heavily relies on the effective fusion of information across these modalities. A key challenge lies in capturing meaningful cross-modal long-range dependencies that aid object localization and identification, while simultaneously suppressing noise and irrelevant information during feature fusion to enhance the discriminative quality of the fused representations. To address these challenges, we propose a novel framework, termed Selective cross-modal Interaction and Aggregation (SIA), which comprises two key components: the Selective Cross-modal Interaction (SCI) module and the Selective Feature Aggregation (SFA) module. The SCI module addresses the inefficiency of traditional cross-modal attention mechanisms by selectively prioritizing the most informative long-range dependencies, significantly reducing computational cost while maintaining high detection accuracy. The SFA module utilizes a gating mechanism to effectively filter out the noise and redundant information introduced by equal-weight fusion, thereby yielding more discriminative feature representations. Comprehensive experiments are conducted on the challenging multispectral remote sensing object detection benchmark DroneVehicle, as well as two additional multispectral urban object detection datasets, M3FD and LLVIP. The proposed approach consistently achieves superior detection accuracy across all datasets. Notably, on the DroneVehicle test set, our method outperforms the recently introduced C2Former by 2.8% mAP@0.5 while incurring lower computational cost.
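A minimal sketch of "selectively prioritizing the most informative long-range dependencies" is top-k masked cross-attention: each query from one modality attends only to its k highest-scoring keys from the other, and all remaining entries are masked out before the softmax. Whether SCI uses exactly this masking is an assumption; `selective_cross_attention` is a hypothetical helper.

```python
import torch
import torch.nn.functional as F

def selective_cross_attention(q, k, v, top_k=16):
    """Cross-modal attention where each query keeps only its top-k keys.
    q: (B, Nq, D) from one modality; k, v: (B, Nk, D) from the other.
    Masking with -inf before softmax zeroes out non-selected keys, which
    also cuts the cost of the value aggregation to the selected entries.
    """
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, Nq, Nk)
    topv, topi = scores.topk(top_k, dim=-1)
    masked = torch.full_like(scores, float("-inf")).scatter(-1, topi, topv)
    attn = F.softmax(masked, dim=-1)                        # zero off non-selected keys
    return attn @ v

out = selective_cross_attention(
    torch.randn(2, 100, 64), torch.randn(2, 400, 64), torch.randn(2, 400, 64))
print(out.shape)  # torch.Size([2, 100, 64])
```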
Citations: 0