
Latest Publications in Neurocomputing

Contrastive coarse-to-fine medical segmentation with prototype guidance and dual-granularity fusion
IF 6.5 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-06 DOI: 10.1016/j.neucom.2025.132603
Zekai Liu, Muxi Li, Fei Yang
Recent advances in the Segment Anything Model (SAM) have demonstrated remarkable zero-shot segmentation and interactive editing capabilities in general computer vision. However, adapting SAM to medical imaging remains challenging due to substantial gaps in imaging physics, contrast distributions, and structural priors between natural and medical images. Achieving optimal performance typically requires extensive fine-tuning on large-scale medical datasets and high-quality manual prompts. To address these limitations, we propose CCF-SAM, a parameter-efficient adaptation framework for medical segmentation. With the SAM image encoder frozen, CCF-SAM constructs a coarse-to-fine, two-stage pipeline: a standard decoder first generates a coarse prior mask, which is then refined by a Contrastive Prototype Refiner that softly disentangles foreground/background tokens and enhances their discriminability via token-level contrastive learning. An EMA-based prototype memory accumulates stable semantic anchors across images, and a cross-attention re-embedding module injects the enhanced prototypes into spatial features to drive fine-grained decoding. The framework trains only the prompt encoder, the mask decoder, and the newly added lightweight modules, substantially reducing training costs while ensuring reproducibility. Comprehensive evaluations and ablations on multiple representative 2D medical segmentation benchmarks show that CCF-SAM consistently outperforms classical CNN/Transformer baselines and recent SAM-based approaches. Qualitative results further indicate superior recall and boundary consistency on small or low-contrast lesions, validating a prototype-guided, progressive refinement paradigm for adapting SAM to medical imaging. Code is publicly available at https://github.com/KKKKAAAAIIII/CCF-SAM.
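The EMA-based prototype memory and token-level contrastive learning described above lend themselves to a compact illustration. The sketch below is an assumption-laden PyTorch rendering (function names, tensor shapes, and hyper-parameters are invented for illustration and are not taken from the CCF-SAM code): foreground/background prototypes are pooled from tokens weighted by the coarse mask, folded into a running memory with an exponential moving average, and then used to separate tokens contrastively.

```python
import torch


def update_prototype_memory(tokens, coarse_mask, memory, momentum=0.99):
    """Hypothetical EMA prototype update in the spirit of CCF-SAM.

    tokens:      (B, N, C) image tokens from the frozen SAM encoder
    coarse_mask: (B, N) soft foreground probability per token (coarse decoder output)
    memory:      dict with 'fg' and 'bg' prototypes, each of shape (C,)
    """
    fg_w = coarse_mask.unsqueeze(-1)                      # (B, N, 1)
    bg_w = 1.0 - fg_w
    # Mask-weighted average pooling yields batch-level prototypes.
    fg_proto = (tokens * fg_w).sum(dim=(0, 1)) / fg_w.sum().clamp_min(1e-6)
    bg_proto = (tokens * bg_w).sum(dim=(0, 1)) / bg_w.sum().clamp_min(1e-6)
    # EMA accumulation keeps the prototypes stable across images.
    memory['fg'] = momentum * memory['fg'] + (1 - momentum) * fg_proto.detach()
    memory['bg'] = momentum * memory['bg'] + (1 - momentum) * bg_proto.detach()
    return memory


def token_contrastive_loss(tokens, memory, coarse_mask, tau=0.07):
    """Pull tokens toward their own prototype and away from the other one."""
    sim_fg = torch.cosine_similarity(tokens, memory['fg'], dim=-1) / tau   # (B, N)
    sim_bg = torch.cosine_similarity(tokens, memory['bg'], dim=-1) / tau
    logits = torch.stack([sim_fg, sim_bg], dim=-1)                          # (B, N, 2)
    target = (coarse_mask < 0.5).long()                                     # 0 = fg, 1 = bg
    return torch.nn.functional.cross_entropy(logits.reshape(-1, 2), target.reshape(-1))
```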
Citations: 0
Dismantling strategies for cost networks based on multi-view deep learning
IF 6.5 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-06 DOI: 10.1016/j.neucom.2025.132576
Xuetong Li, Xiao-Dong Zhang
Studying network dismantling strategies is crucial both for attacking malicious networks and for ensuring the security of one's own network, since the entity relationships of systems such as social groups and supply chains can be described by a network. Existing research primarily focuses on two types of strategies: Network Dismantling (ND), which emphasizes the rapid destruction of network structure, and Generalized Network Dismantling (GND), which additionally considers attack costs. However, neither strategy takes into account the heterogeneity of attack transfer. To address this limitation, we introduce cost networks, which provide a more comprehensive representation of attack costs by incorporating both node and edge heterogeneity. To achieve rapid network connectivity disruption and cost efficiency, two new models, Epi_CND and Ins_CND, are proposed for cost network dismantling. Reinforcement learning serves as the core solution framework, complemented by a novel multi-view graph neural network designed for information extraction. Experimental validation on both synthetic and real cost networks demonstrates that the proposed algorithms exhibit significant effectiveness and superiority.
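The abstract does not give algorithmic details, but the cost-network dismantling objective can be made concrete with a simple greedy baseline. The sketch below is a hypothetical illustration (not Epi_CND or Ins_CND): at each step it removes the node with the highest degree-to-removal-cost ratio until the giant connected component falls below a target fraction.

```python
import networkx as nx


def greedy_cost_dismantling(G, node_cost, target_gcc_fraction=0.1):
    """Greedy baseline for cost-aware dismantling (illustrative only).

    G:         undirected networkx graph
    node_cost: dict mapping node -> heterogeneous attack cost
    Returns the removal sequence and the total cost spent.
    """
    G = G.copy()
    n0 = G.number_of_nodes()
    removed, total_cost = [], 0.0
    while G.number_of_nodes() > 0:
        gcc = max(nx.connected_components(G), key=len)
        if len(gcc) <= target_gcc_fraction * n0:
            break
        # Pick the node in the giant component with the largest degree per unit cost.
        v = max(gcc, key=lambda u: G.degree(u) / max(node_cost[u], 1e-9))
        total_cost += node_cost[v]
        removed.append(v)
        G.remove_node(v)
    return removed, total_cost
```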
Citations: 0
LECMARL: A cooperative multi-agent reinforcement learning method based on lazy mechanisms and efficient exploration
IF 6.5 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-06 DOI: 10.1016/j.neucom.2025.132578
Yukang Cao, Quan Liu, Hongzhe Liu, Renyang You
In the domain of Multi-Agent Reinforcement Learning (MARL), agents learn optimal policies by interacting with their environment to maximize their cumulative rewards. Nevertheless, a prevalent challenge in many real-world scenarios is reward sparsity. Agents receive informative feedback only upon accomplishing specific goals, with minimal guidance during intermediate steps. This sparsity of rewards significantly complicates the learning process, as agents face considerable difficulty in discovering effective strategies through trial and error. To address this issue, we propose the Lazy-Efficient Cooperative Multi-Agent Reinforcement Learning (LECMARL) approach. It is specifically designed to address the challenges of sparse rewards and inefficient exploration in MARL. LECMARL integrates a lazy mechanism with an efficient exploration policy. This integration notably enhances agents’ cooperative capabilities and learning efficiency in complex environments. Experimental results demonstrate the superiority of LECMARL over baseline methods across diverse MARL tasks, particularly in environments characterized by sparse rewards. These findings highlight LECMARL’s potential to revolutionize learning in challenging multi-agent scenarios.
Citations: 0
Finite-time analysis of simultaneous double Q-learning
IF 6.5 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-06 DOI: 10.1016/j.neucom.2025.132581
Hyunjun Na, Donghwan Lee
Q-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, it is prone to overestimation bias in the Q-learning update. To address this issue, double Q-learning employs two independent Q-estimators which are randomly selected and updated during the learning process. This paper proposes a modified double Q-learning, called simultaneous double Q-learning (SDQ), along with its finite-time analysis. SDQ eliminates the need for random selection between the two Q-estimators, and this modification allows us to analyze double Q-learning through the lens of a novel switching system framework facilitating efficient finite-time analysis. Empirical studies demonstrate that SDQ converges faster than double Q-learning while retaining the ability to mitigate the maximization bias. Finally, we derive a finite-time expected error bound for SDQ.
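The difference between standard double Q-learning and the simultaneous variant is easy to see in a tabular sketch. The update below follows the abstract's description (both estimators updated at every transition, each evaluating its greedy action with the other), not the paper's exact equations, and the step sizes are assumed.

```python
import numpy as np


def sdq_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One simultaneous double Q-learning step on tabular estimators QA, QB.

    Standard double Q-learning would randomly pick ONE of the two updates below;
    the simultaneous variant applies both at every transition.
    """
    # Each estimator evaluates its own greedy action with the OTHER estimator
    # to curb overestimation bias.
    a_star_A = np.argmax(QA[s_next])
    a_star_B = np.argmax(QB[s_next])
    td_A = r + gamma * QB[s_next, a_star_A] - QA[s, a]
    td_B = r + gamma * QA[s_next, a_star_B] - QB[s, a]
    QA[s, a] += alpha * td_A
    QB[s, a] += alpha * td_B
    return QA, QB
```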
Citations: 0
UniBVR: Balancing visual and reasoning abilities in unified 3D scene understanding
IF 6.5 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-06 DOI: 10.1016/j.neucom.2025.132599
Panqi Yang, Haodong Jing, Nanning Zheng, Yongqiang Ma
Recent advances in Large Language Models (LLMs) enable remarkable general-purpose task-solving in computer vision, robotics, and beyond. Although LLMs perform well in 2D tasks, their adaptation to 3D scene understanding faces critical challenges: (1) the inherent complexity of 3D spatial relationships and multimodal alignment, and (2) the performance imbalance between vision-centric tasks and reasoning-centric tasks. Existing approaches either develop specialized models for individual tasks or rely on LLM fine-tuning with limited visual grounding capabilities, failing to achieve unified 3D scene understanding. To bridge this gap, we propose UniBVR, a Unified framework that Balances Visual and Reasoning abilities through two innovative components: (i) task-agnostic Align-Former module that establishes fine-grained 3D vision-language correspondence through cross-modal attention, and (ii) task-specific lightweight decoders that dynamically generate diverse outputs (texts, boxes or masks) via efficient routing. To mitigate task imbalance, we design a multi-task balancing strategy that automatically adjusts loss weights based on task difficulty. Experiments on seven benchmarks (ScanRefer, Nr3D, ScanQA, etc.) achieve state-of-the-art results, with gains of 5.8% (3D-VG), 4.3% (3D-DC), and 6.1% (3D-QA) over prior methods.
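The multi-task balancing strategy is only described as adjusting loss weights by task difficulty; the sketch below shows one generic way such a rule can look (an assumed, illustrative scheme, not UniBVR's actual formula): tasks whose loss decays more slowly relative to their own history keep larger weights.

```python
import torch


class DifficultyBalancer:
    """Generic difficulty-based loss weighting (illustrative; not UniBVR's exact rule)."""

    def __init__(self, task_names, ema=0.9):
        self.ema = ema
        self.baseline = {t: None for t in task_names}

    def __call__(self, losses):
        # losses: dict mapping task name -> scalar loss tensor
        ratios = {}
        for t, l in losses.items():
            if self.baseline[t] is None:
                self.baseline[t] = l.item()
            # Ratio of current loss to its smoothed history measures remaining difficulty.
            ratios[t] = l.item() / max(self.baseline[t], 1e-8)
            self.baseline[t] = self.ema * self.baseline[t] + (1 - self.ema) * l.item()
        z = sum(ratios.values())
        weights = {t: len(losses) * r / z for t, r in ratios.items()}
        return sum(weights[t] * losses[t] for t in losses)
```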
Citations: 0
Unsupervised deep hashing based on multi-scale aggregation and optimal transport matching for image retrieval
IF 6.5 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-06 DOI: 10.1016/j.neucom.2025.132590
Lei Ma, Hao Pei, Lei Wang, Ying Zhu, Yu Shi, Hanyu Hong, Xinyu Dai, Fanman Meng, Qingbo Wu
Unsupervised hashing image retrieval retrieves images semantically similar to query images by encoding images into compact and discriminative hash codes without labels. Recently, reconstruction-based methods and contrastive learning-based methods have improved unsupervised hashing by preserving information about original images or learning distortion-invariant hash codes. Nevertheless, these methods suffer from background interference and changes in object scales, which hinder capturing precise similarity. Amidst complex background interference and without label supervision, existing unsupervised deep hashing methods are prone to overfitting specific samples rather than capturing discriminative object patterns, which leads to weak retrieval performance and limited generalization capability. Furthermore, contrastive learning-based methods ignore the dense semantic correspondences between diverse augmented views, which makes learning semantically-invariant information challenging and hinders the convergence of models. To alleviate these problems, we propose an unsupervised hashing framework based on multi-scale aggregation and optimal transport matching for large-scale image retrieval. Specifically, we use a multi-scale aggregation module to divide feature maps into different scales and perform aggregation to obtain robust feature representations, which accurately describe similarity by incorporating cross-image variation information into feature representations to alleviate the impact of changes in object scale. Subsequently, we build an optimal transport matching module to facilitate capturing semantically-invariant information between positive samples by pixel-level dense semantic correspondence, which accurately captures similarities of foreground objects and mitigates the interferences of the intra-class variation and background by integrating discriminative pixel-level semantics into feature representations. The experimental results in image retrieval indicate that the proposed approach outperforms the state-of-the-art unsupervised hashing approaches.
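The optimal transport matching step, which establishes dense correspondence between token features of two augmented views, can be illustrated with a plain Sinkhorn iteration. The code below is a generic sketch of entropic optimal transport under uniform marginals (names and hyper-parameters are assumptions, not the paper's module).

```python
import torch


def sinkhorn_matching(feat_a, feat_b, eps=0.05, n_iters=50):
    """Entropic optimal transport between two sets of L2-normalised token features.

    feat_a: (N, C) tokens from view A;  feat_b: (M, C) tokens from view B.
    Returns a transport plan (N, M) with approximately uniform row/column marginals.
    """
    feat_a = torch.nn.functional.normalize(feat_a, dim=-1)
    feat_b = torch.nn.functional.normalize(feat_b, dim=-1)
    cost = 1.0 - feat_a @ feat_b.t()                       # cosine cost matrix (N, M)
    K = torch.exp(-cost / eps)                             # Gibbs kernel
    n, m = cost.shape
    r = torch.full((n,), 1.0 / n, device=cost.device)      # uniform marginals
    c = torch.full((m,), 1.0 / m, device=cost.device)
    u = torch.full((n,), 1.0 / n, device=cost.device)
    v = torch.full((m,), 1.0 / m, device=cost.device)
    for _ in range(n_iters):
        u = r / (K @ v).clamp_min(1e-9)
        v = c / (K.t() @ u).clamp_min(1e-9)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)             # diag(u) K diag(v)
    return plan
```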
Citations: 0
EEGDiffuser: Label-guided EEG signals synthesis via diffusion model for BCI applications
IF 6.5 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-05 DOI: 10.1016/j.neucom.2026.132636
Jiquan Wang, Sha Zhao, Zhiling Luo, Yangxuan Zhou, Shijian Li, Gang Pan
The limited availability of labeled task-related electroencephalography (EEG) data continues to hinder progress in brain-computer interface (BCI) research. Acquiring and annotating EEG signals under specific experimental conditions is often labor-intensive and time-consuming, creating significant challenges in scaling data collection and ensuring model robustness. To alleviate this constraint, we propose EEGDiffuser, a diffusion-based generative model designed to synthesize EEG signals conditioned on task labels. EEGDiffuser formulates EEG synthesis as a reverse stochastic process guided by learned score functions, progressively refining Gaussian noise into structured signals. To ensure that the generated samples align with specific experimental conditions, a conditional guidance mechanism is introduced to incorporate label information during generation. By modeling key neurophysiological characteristics of EEG, EEGDiffuser is able to generate realistic and label-consistent EEG signals. Empirical evaluations across diverse tasks and decoder architectures demonstrate that incorporating synthetic signals consistently improves decoding performance, even in low-resource scenarios. On three benchmark datasets (FACED, BCIC-IV-2a, and BCIC2020-3), EEGDiffuser yields a relative improvement of 4%–6% over real-data-only baselines and consistently outperforms existing EEG synthesis methods. Further analysis shows that the generated EEG data preserve neurophysiologically relevant patterns, such as emotion-related topographies and motor imagery activation distributions, comparable to those observed in real data collected under specific conditions. EEGDiffuser highlights the potential of diffusion-based generative modeling to facilitate data-driven EEG research and enable the investigation of neurophysiological patterns in BCI applications.
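A label-conditioned denoising step makes the conditional guidance mechanism concrete. The sketch below is a standard DDPM-style training loss with a learned label embedding passed to the denoiser as conditioning; it is an assumed simplification for illustration, not EEGDiffuser's actual architecture.

```python
import torch
import torch.nn as nn


def conditional_diffusion_loss(denoiser, label_emb, x0, labels, alphas_cumprod):
    """One label-conditioned DDPM training step (illustrative sketch).

    denoiser(x_t, t, cond) -> predicted noise, same shape as x_t
    label_emb:      nn.Embedding mapping class labels to conditioning vectors
    x0:             (B, C, T) clean EEG segments;  labels: (B,) integer task labels
    alphas_cumprod: (T_steps,) precomputed cumulative noise schedule
    """
    B = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (B,), device=x0.device)
    a_bar = alphas_cumprod[t].view(B, 1, 1)
    noise = torch.randn_like(x0)
    # Forward process: interpolate the clean signal and Gaussian noise at step t.
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    cond = label_emb(labels)                 # label information guides generation
    pred = denoiser(x_t, t, cond)
    return nn.functional.mse_loss(pred, noise)
```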
Citations: 0
Relative depth knowledge distillation for generalizable monocular depth estimation
IF 6.5 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-05 DOI: 10.1016/j.neucom.2026.132632
Lulu Zhang, Mankun Li, Meng Yang, Xuguang Lan, Ce Zhu
Monocular depth estimation provides an easily deployable solution for robots to perceive the 3D scene. Existing methods have achieved impressive performance on benchmark datasets. However, these methods tend to overfit to training domains, resulting in limited generalization in the real world. A dominant solution is to train on large-scale datasets featuring high-quality GT depth and precise camera intrinsics, both of which are often unavailable or difficult to obtain. To mitigate this issue, we propose a relative depth knowledge distillation framework to boost the generalization of monocular depth estimation with limited training data. It is based on the insight that recent relative depth foundation models can be trained efficiently on large-scale datasets to capture accurate object structure and general relative depth relationships. More specifically, in the teacher network, we generate relative depth from a pre-trained foundation model and introduce a scale alignment module to ensure its scale consistency with GT depth. In the student network, we infer the depth bin centers and corresponding probabilities to represent the scales and relative depth relationships, respectively, and compute the final depth via their linear combination. Furthermore, we design two novel response-based distillation modules to distill knowledge of relative depth and object structure, respectively, from the teacher to the student. For validation, our model is trained on widely used benchmark datasets in three settings, including indoor NYUDv2, outdoor KITTI, and a mixture of both. Extensive experiments on six unseen indoor and outdoor datasets verify that our model consistently reduces the RMSE of the base model by 3.0%, 5.3%, and 5.4% on average, respectively, and achieves state-of-the-art performance in the three settings. Our model even achieves competitive accuracy when compared to recent models trained on very large-scale datasets.
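Two pieces of the pipeline are standard enough to sketch: least-squares scale-and-shift alignment of the teacher's relative depth to the metric ground truth, and composing the student's depth from bin centers and per-pixel probabilities. The code below is a generic rendering of both steps under these assumptions; the paper's exact modules may differ.

```python
import torch


def align_scale_shift(rel_depth, gt_depth, mask):
    """Fit s, t minimising ||s * rel + t - gt||^2 over valid pixels, then apply them.

    rel_depth, gt_depth, mask: (B, H, W); mask marks pixels with valid ground truth.
    """
    r = rel_depth[mask]
    g = gt_depth[mask]
    A = torch.stack([r, torch.ones_like(r)], dim=-1)           # (P, 2)
    sol = torch.linalg.lstsq(A, g.unsqueeze(-1)).solution      # (2, 1)
    s, t = sol[0, 0], sol[1, 0]
    return s * rel_depth + t


def depth_from_bins(bin_centers, probs):
    """Final depth as the probability-weighted linear combination of bin centers.

    bin_centers: (B, K);  probs: (B, K, H, W) softmax over K bins per pixel.
    """
    return torch.einsum('bk,bkhw->bhw', bin_centers, probs)
```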
Citations: 0
Offset-corrected query generation strategies for cross-modality misalignment in 3D object detection: aligning LiDAR and camera
IF 6.5 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-05 DOI: 10.1016/j.neucom.2025.132582
Jiayao Li, Chak Fong Cheang, Xiaoyuan Yu, Suigu Tang, Zhaolong Du, Qianxiang Cheng
Although cross-modality data fusion can effectively mitigate the limitations of monomodal approaches in 3D object detection through multi-source information complementarity, the issue of data misalignment caused by inherent discrepancies between modalities remains a critical challenge that hinders detection performance improvement. To address the perceptual degradation caused by inter-modal conflicts during fusion, we propose a multimodal fusion network for 3D object detection in autonomous driving using offset correction and query generation strategies (DADNet). The architecture features two innovative components: (1) an Offset Correction Module (OCM) that establishes learnable offset fields for pre-fusion spatial feature alignment, and (2) a Query Generation Module (QGM) designed to recover dissolved high-value objects from fusion heatmaps through monomodal feature mining. Specifically, the OCM aligns LiDAR and camera Bird’s-Eye-View (BEV) features into a unified distribution space via adaptive coordinate transformation, while the QGM reconstructs critical detection queries using attention-based feature reactivation from individual sensor modalities. Through rigorous benchmarking against 14 state-of-the-art detectors on the KITTI dataset, DADNet demonstrates superior performance across all evaluation scenarios. The core code will be released at https://github.com/ljyw17/3DDet.
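The Offset Correction Module is described as a learnable offset field applied before fusion; a minimal rendering with `F.grid_sample` is shown below. Module names, the choice to predict offsets from the concatenated BEV maps, and the offset scaling are assumptions for illustration, not DADNet's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OffsetCorrection(nn.Module):
    """Warp camera BEV features toward the LiDAR BEV grid using a predicted offset field."""

    def __init__(self, channels):
        super().__init__()
        # Predict a 2-channel (dx, dy) offset per BEV cell from both modalities.
        self.offset_head = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, lidar_bev, cam_bev):        # both (B, C, H, W)
        B, _, H, W = lidar_bev.shape
        offset = self.offset_head(torch.cat([lidar_bev, cam_bev], dim=1))  # (B, 2, H, W)
        # Base sampling grid in normalised [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=cam_bev.device),
            torch.linspace(-1, 1, W, device=cam_bev.device),
            indexing='ij')
        base = torch.stack([xs, ys], dim=-1).expand(B, H, W, 2)
        # Keep the learned offsets small relative to the grid extent.
        grid = base + 0.1 * offset.permute(0, 2, 3, 1).tanh()
        cam_aligned = F.grid_sample(cam_bev, grid, align_corners=True)
        return torch.cat([lidar_bev, cam_aligned], dim=1)
```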
Citations: 0
Adaptive momentum enhanced second-order stochastic federated learning
IF 6.5 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-05 DOI: 10.1016/j.neucom.2026.132628
Jiahao Zhang, Xiaokang Pan, Jingling Liu, Zhe Qu
Second-order optimization has garnered significant attention in Federated Learning (FL) due to its potential for faster convergence. However, despite recent advancements, the full potential of second-order FL algorithms remains underutilized. Existing research either fails to fully leverage the advantages of well-established first-order FL methods in non-convex settings or imposes overly stringent conditions. To address these challenges, we propose a novel second-order FL algorithm, Federated Cubic Regularized Adaptive Momentum (FedCRAM). Specifically, FedCRAM incorporates both gradient and Hessian information to adaptively adjust the momentum parameter, enabling faster convergence while also mitigating the adverse effects of data heterogeneity. Our theoretical analysis demonstrates that FedCRAM can find an ϵ-second-order stationary point in O(R^{-1}ϵ^{-3/2}) rounds, outperforming existing state-of-the-art second-order FL algorithms without requiring additional assumptions. Furthermore, to reduce communication costs, we introduce a variant of FedCRAM, FedC²RAM, which uses biased compression. Notably, FedC²RAM achieves the same convergence rate as FedCRAM, while significantly lowering communication overhead. Extensive experimental results validate the superior performance of our proposed algorithms.
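The abstract says the momentum parameter is adapted from both gradient and Hessian information but does not spell out the rule. The sketch below is a heavily simplified, assumed single-worker local update (Hessian-vector product via double backpropagation, curvature-damped momentum coefficient); it is illustrative only and is not FedCRAM's actual algorithm.

```python
import torch


def adaptive_momentum_step(params, loss_fn, momentum_buf, lr=0.01, beta_max=0.9):
    """One local step mixing gradient and curvature information (illustrative only).

    params:       single tensor with requires_grad=True
    loss_fn:      callable mapping params -> scalar loss
    momentum_buf: tensor of the same shape as params (no grad required)
    """
    loss = loss_fn(params)
    grad = torch.autograd.grad(loss, params, create_graph=True)[0]
    # Hessian-vector product along the previous momentum direction (double backprop).
    hvp = torch.autograd.grad(grad, params, grad_outputs=momentum_buf)[0]
    # Heuristic: damp the momentum coefficient when local curvature is large.
    beta = beta_max / (1.0 + hvp.norm() / (grad.norm() + 1e-12))
    momentum_buf = beta * momentum_buf + (1 - beta) * grad.detach()
    with torch.no_grad():
        params -= lr * momentum_buf
    return params, momentum_buf, loss.item()
```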
Citations: 0