
Frontiers in Neurorobotics: Latest Publications

Imitation-relaxation reinforcement learning for sparse badminton strikes via dynamic trajectory generation.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-02 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1649870
Yanyan Yuan, Yucheng Tao, Shaowen Cheng, Yanhong Liang, Yongbin Jin, Hongtao Wang

Robotic racket sports provide exceptional benchmarks for evaluating dynamic motion control capabilities in robots. Due to the highly non-linear dynamics of the shuttlecock, the stringent demands on robots' dynamic responses, and the convergence difficulties caused by sparse rewards in reinforcement learning, badminton strikes remain a formidable challenge for robot systems. To address these issues, this study proposes DTG-IRRL, a novel learning framework for badminton strikes that integrates imitation-relaxation reinforcement learning with dynamic trajectory generation. The framework demonstrates significantly improved training efficiency and performance, achieving faster convergence and twice the landing accuracy. Analysis of the reward function within a specific parameter-space hyperplane intuitively reveals the convergence difficulties arising from the inherent sparsity of rewards in racket sports and demonstrates the framework's effectiveness in mitigating local and slow convergence. Implemented on hardware with zero-shot transfer, the framework achieves a 90% hitting rate and a 70% landing accuracy, enabling sustained human-robot rallies. Cross-platform validation using the UR5 robot demonstrates the framework's generalizability while highlighting the requirement for high dynamic performance of robotic arms in racket sports.
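The abstract does not give the reward formulation, but the imitation-relaxation idea (dense imitation guidance that is gradually relaxed so the sparse strike reward takes over) can be sketched as follows; the linear schedule, weights, and function names are illustrative assumptions, not the paper's actual design:

```python
def shaped_reward(task_reward, imitation_error, step, total_steps, w0=1.0):
    """Hypothetical imitation-relaxation shaping: an imitation penalty
    whose weight decays linearly over training, so the sparse strike
    reward dominates once the policy tracks the reference trajectory."""
    w = w0 * max(0.0, 1.0 - step / total_steps)  # relaxation schedule
    return task_reward - w * imitation_error

# Early in training, imitation guidance dominates; late, it vanishes.
early = shaped_reward(task_reward=0.0, imitation_error=0.5, step=0, total_steps=1000)
late = shaped_reward(task_reward=1.0, imitation_error=0.5, step=1000, total_steps=1000)
```

The shaping term gives the policy a dense gradient before any strike succeeds, which is one common way to address the sparse-reward convergence problem the abstract describes.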

Citations: 0
Variable admittance control with sEMG-based support for wearable wrist exoskeleton.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-01 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1562675
Charles Lambelet, Melvin Mathis, Marc Siegenthaler, Jeremia P O Held, Daniel Woolley, Olivier Lambercy, Roger Gassert, Nicole Wenderoth

Introduction: Wrist function impairment is common after stroke and heavily impacts the execution of daily tasks. Robotic therapy, and more specifically wearable exoskeletons, have the potential to boost training dose in context-relevant scenarios, promote voluntary effort through motor intent detection, and mitigate the effect of gravity. Portable exoskeletons are often non-backdrivable, and it is challenging to make their control safe, reactive, and stable. Admittance control is often used in this case; however, this type of control can become unstable when the supported biological joint stiffens. Variable admittance control adapts its parameters dynamically to allow free motion and stabilize the human-robot interaction.

Methods: In this study, we implemented a variable admittance control scheme on a one degree of freedom wearable wrist exoskeleton. The damping parameter of the admittance scheme is adjusted in real-time to cope with instabilities and varying wrist stiffness. In addition to the admittance control scheme, sEMG- and gravity-based controllers were implemented, characterized and optimized on ten healthy participants and tested on six stroke survivors.

Results: The results show that (1) the variable admittance control scheme could stabilize the interaction but at the cost of a decrease in transparency, and (2) when coupled with the variable admittance controller the sEMG-based control enhanced wrist functionality of stroke survivors in the most extreme angular positions.

Discussion: Our variable admittance control scheme with sEMG- and gravity-based support was most beneficial for patients with higher levels of impairment by improving range of motion and promoting voluntary effort. Future work could combine both controllers to customize and fine tune the stability of the support to a wider range of impairment levels and types.
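As a rough illustration of the control idea (not the authors' implementation), a discretized admittance model M*dv/dt + D*v = F_ext with a heuristic damping adaptation might look like this; the sign-flip oscillation cue, gains, and limits are invented for the sketch:

```python
def admittance_step(v, f_ext, M, D, dt):
    """One explicit-Euler step of the admittance model M*dv/dt + D*v = F_ext:
    the measured interaction force f_ext is rendered as a velocity command."""
    a = (f_ext - D * v) / M
    return v + a * dt

def adapt_damping(D, v_prev, v, D_min=0.5, D_max=10.0, gain=1.2):
    """Raise damping on a velocity sign flip (a crude oscillation cue,
    standing in for the paper's real-time stability handling); otherwise
    relax damping slowly toward D_min to preserve transparency."""
    if v_prev * v < 0:               # reversal suggests oscillation
        return min(D * gain, D_max)
    return max(D * 0.99, D_min)
```

This captures the trade-off the Results describe: higher damping stabilizes the interaction when the wrist stiffens, at the cost of transparency during free motion.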

Citations: 0
4D trajectory lightweight prediction algorithm based on knowledge distillation technique.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-22 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1643919
Weizhen Tang, Jie Dai, Zhousheng Huang, Boyang Hao, Weizheng Xie

Introduction: To address the challenges of current 4D trajectory prediction-specifically, limited multi-factor feature extraction and excessive computational cost-this study develops a lightweight prediction framework tailored for real-time air-traffic management.

Methods: We propose a hybrid RCBAM-TCN-LSTM architecture enhanced with a teacher-student knowledge distillation mechanism. The Residual Convolutional Block Attention Module (RCBAM) serves as the teacher network to extract high-dimensional spatial features via residual structures and channel-spatial attention. The student network adopts a Temporal Convolutional Network-LSTM (TCN-LSTM) design, integrating dilated causal convolutions and two LSTM layers for efficient temporal modeling. Historical ADS-B trajectory data from Zhuhai Jinwan Airport are preprocessed using cubic spline interpolation and a uniform-step sliding window to ensure data alignment and temporal consistency. In the distillation process, soft labels from the teacher and hard labels from actual observations jointly guide student training.

Results: In multi-step prediction experiments, the distilled RCBAM-TCN-LSTM model achieved average reductions of 40%-60% in MAE, RMSE, and MAPE compared with the original RCBAM and TCN-LSTM models, while improving R² by 4%-6%. The approach maintained high accuracy across different prediction horizons while reducing computational complexity.

Discussion: The proposed method effectively balances high-precision modeling of spatiotemporal dependencies with lightweight deployment requirements, enabling real-time air-traffic monitoring and early warning on standard CPUs and embedded devices. This framework offers a scalable solution for enhancing the operational safety and efficiency of modern air-traffic control systems.
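The soft-label/hard-label mix described in the Methods can be illustrated with a minimal regression-style distillation objective; the MSE form, the alpha weighting, and the function name are assumptions for the sketch, not the paper's exact loss:

```python
import numpy as np

def distilled_loss(student_pred, teacher_pred, target, alpha=0.5):
    """Hypothetical distillation objective for trajectory regression:
    a soft term pulling the student toward the teacher's output plus a
    hard term against the ground-truth positions, mixed by alpha."""
    student_pred = np.asarray(student_pred, dtype=float)
    soft = np.mean((student_pred - np.asarray(teacher_pred, dtype=float)) ** 2)
    hard = np.mean((student_pred - np.asarray(target, dtype=float)) ** 2)
    return alpha * soft + (1.0 - alpha) * hard
```

The soft term transfers the teacher's learned structure to the lightweight student, which is how the distilled model can stay accurate while being cheap enough for embedded deployment.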

Citations: 0
Tri-manual interaction in hybrid BCI-VR systems: integrating gaze, EEG control for enhanced 3D object manipulation.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-14 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1628968
Jian Teng, Sukyoung Cho, Shaw-Mung Lee

Brain-computer interface (BCI) integration with virtual reality (VR) has progressed from single-limb control to multi-limb coordination, yet achieving intuitive tri-manual operation remains challenging. This study presents a consumer-grade hybrid BCI-VR framework enabling simultaneous control of two biological hands and a virtual third limb through integration of Tobii eye-tracking, NeuroSky single-channel EEG, and non-haptic controllers. The system employs e-Sense attention thresholds (>80% for 300 ms) to trigger virtual hand activation combined with gaze-driven targeting within 45° visual cones. A soft maximum weighted arbitration algorithm resolves spatiotemporal conflicts between manual and virtual inputs with 92.4% success rate. Experimental validation with eight participants across 160 trials demonstrated 87.5% virtual hand success rate and 41% spatial error reduction (σ = 0.23 mm vs. 0.39 mm) compared to traditional dual-hand control. The framework achieved 320 ms activation latency and 22% NASA-TLX workload reduction through adaptive cognitive load management. Time-frequency analysis revealed characteristic beta-band (15-20 Hz) energy modulations during successful virtual limb control, providing neurophysiological evidence for attention-mediated supernumerary limb embodiment. These findings demonstrate that sophisticated algorithmic approaches can compensate for consumer-grade hardware limitations, enabling laboratory-grade precision in accessible tri-manual VR applications for rehabilitation, training, and assistive technologies.
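A minimal sketch of the trigger rule described above, attention above threshold for a sustained window combined with a gaze-cone test, might look like this (the sample rate, unit-vector gaze inputs, the 45° half-angle reading of the "visual cone", and the class name are all assumptions):

```python
import math
from collections import deque

class VirtualHandTrigger:
    """Hypothetical re-creation of the activation rule: fire when the
    e-Sense attention value exceeds the threshold for every sample in a
    300 ms window AND the gaze ray lies within the target's visual cone."""
    def __init__(self, threshold=80, window_ms=300, sample_ms=100, half_angle_deg=45.0):
        self.threshold = threshold
        self.n = window_ms // sample_ms            # samples needed in a row
        self.cos_limit = math.cos(math.radians(half_angle_deg))
        self.buf = deque(maxlen=self.n)

    def update(self, attention, gaze_dir, target_dir):
        """attention: e-Sense value; gaze_dir/target_dir: unit vectors."""
        self.buf.append(attention)
        attn_ok = len(self.buf) == self.n and all(a > self.threshold for a in self.buf)
        dot = sum(g * t for g, t in zip(gaze_dir, target_dir))
        return attn_ok and dot >= self.cos_limit
```

Requiring the whole window to stay above threshold, rather than a single sample, is what debounces momentary attention spikes before the virtual hand activates.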

Citations: 0
Fine-grained image classification using the MogaNet network and a multi-level gating mechanism.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-06 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1630281
Dahai Li, Su Chen

Fine-grained image classification tasks face challenges such as difficulty in labeling, scarcity of samples, and small inter-class differences. To address this problem, this study proposes a novel fine-grained image classification method based on the MogaNet network and a multi-level gating mechanism. A feature extraction network based on MogaNet is constructed, and multi-scale feature fusion is combined to fully mine image information. The contextual information extractor is designed to align and filter more discriminative local features using the semantic context of the network, thereby strengthening the network's ability to capture detailed features. Meanwhile, a multi-level gating mechanism is introduced to obtain the saliency features of images. A feature elimination strategy is proposed to suppress the interference of fuzzy class features and background noise. A loss function is designed to constrain the elimination of fuzzy class features and classification prediction. Experimental results demonstrate that the new method can be applied to 5-shot tasks across four public datasets: Mini-ImageNet, CUB-200-2011, Stanford Dogs, and Stanford Cars. The accuracy rates reach 79.33%, 87.58%, 79.34%, and 83.82%, respectively, which shows better performance than other state-of-the-art image classification methods.
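The gating and feature-elimination ideas can be shown in a toy form; the sigmoid gates and magnitude-based channel elimination below are illustrative stand-ins, not the paper's actual modules:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multi_level_gate(features):
    """Toy multi-level gating: each level's feature vector is scaled by
    a scalar gate derived from its own mean activation, then summed."""
    return sum(sigmoid(f.mean()) * f for f in features)

def eliminate_features(fused, keep_ratio=0.8):
    """Toy feature elimination: zero the lowest-magnitude channels,
    mimicking suppression of fuzzy-class and background responses."""
    k = int(len(fused) * keep_ratio)             # channels to keep
    drop = np.argsort(np.abs(fused))[:len(fused) - k]
    out = fused.copy()
    out[drop] = 0.0
    return out
```

Gating weights each network level by its own saliency before fusion, and elimination then discards the weakest channels so ambiguous responses cannot dominate the few-shot classifier.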

Citations: 0
Integrated neural network framework for multi-object detection and recognition using UAV imagery.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-07-30 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1643011
Mohammed Alshehri, Tingting Xue, Ghulam Mujtaba, Yahya AlQahtani, Nouf Abdullah Almujally, Ahmad Jalal, Hui Liu

Introduction: Accurate vehicle analysis from aerial imagery has become increasingly vital for emerging technologies and public service applications such as intelligent traffic management, urban planning, autonomous navigation, and military surveillance. However, analyzing UAV-captured video poses several inherent challenges, such as the small size of target vehicles, occlusions, cluttered urban backgrounds, motion blur, and fluctuating lighting conditions, which hinder the accuracy and consistency of conventional perception systems. To address these complexities, our research proposes a fully end-to-end deep learning-driven perception pipeline specifically optimized for UAV-based traffic monitoring. The proposed framework integrates multiple advanced modules: RetinexNet for preprocessing, segmentation using HRNet to preserve high-resolution semantic information, and vehicle detection using the YOLOv11 framework. Deep SORT is employed for efficient vehicle tracking, while CSRNet facilitates high-density vehicle counting. LSTM networks are integrated to predict vehicle trajectories based on temporal patterns, and a combination of DenseNet and SuperPoint is utilized for robust feature extraction. Finally, classification is performed using Vision Transformers (ViTs), leveraging attention mechanisms to ensure accurate recognition across diverse categories. The modular yet unified architecture is designed to handle spatiotemporal dynamics, making it suitable for real-time deployment in diverse UAV platforms.

Method: The framework suggests using today's best neural networks that are made to solve different problems in aerial vehicle analysis. RetinexNet is used in preprocessing to make the lighting of each input frame consistent. Using HRNet for semantic segmentation allows for accurate splitting between vehicles and their surroundings. YOLOv11 provides high precision and quick vehicle detection and Deep SORT allows reliable tracking without losing track of individual cars. CSRNet are used for vehicle counting that is unaffected by obstacles or traffic jams. LSTM models capture how a car moves in time to forecast future positions. Combining DenseNet and SuperPoint embeddings that were improved with an AutoEncoder is done during feature extraction. In the end, using an attention function, Vision Transformer-based models classify vehicles seen from above. Every part of the system is developed and included to give the improved performance when the UAV is being used in real life.
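The modular chain described above (RetinexNet, then HRNet, YOLOv11, Deep SORT, and so on) amounts to sequential stage composition; a minimal pipeline skeleton with placeholder stages could look like this (the stage callables are toy stand-ins, not the real models):

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Pipeline:
    """Sketch of a modular perception chain: each stage takes the frame
    record produced by the previous stage and enriches it."""
    stages: List[Callable] = field(default_factory=list)

    def add(self, stage):
        self.stages.append(stage)
        return self                     # allow chained .add() calls

    def run(self, frame):
        for stage in self.stages:
            frame = stage(frame)
        return frame

# Placeholder stages standing in for RetinexNet, HRNet, and YOLOv11.
pipeline = (Pipeline()
            .add(lambda f: {**f, "lit": True})     # illumination normalization
            .add(lambda f: {**f, "masks": []})     # semantic segmentation
            .add(lambda f: {**f, "boxes": []}))    # vehicle detection
result = pipeline.run({"frame_id": 1})
```

Keeping each module behind a uniform stage interface is what makes the architecture "modular yet unified": any single model can be swapped without touching the rest of the chain.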

Results: Our proposed framework significantly improves the accuracy, reliability, and efficiency of vehicle analysis from UAV imagery. Our pipeline was rigorously evaluated on two well-known datasets, AU-AIR and Roundabout. On the AU-AIR dataset, the system achieved a detection accuracy of 97.8%, a tracking accuracy of 96.5%, and a classification accuracy of 98.4%. Similarly, on the Roundabout dataset, it reached 96.9% detection accuracy, 94.4% tracking accuracy, and 97.7% classification accuracy. These results surpass previous benchmarks and demonstrate the system's robust performance across diverse aerial traffic scenarios.

Discussion: The results show that the selected deep learning components are strong enough to handle the challenges of aerial vehicle analysis and deliver reliable, precise results across all of the tasks above. Combining several advanced models ensures the system runs smoothly even when dealing with issues such as occluded and variably sized targets.

引用次数: 0
NeuroVI-based wave compensation system control for offshore wind turbines. 基于神经网络的海上风力发电机波浪补偿系统控制。
IF 2.8 4区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-07-30 eCollection Date: 2025-01-01 DOI: 10.3389/fnbot.2025.1648713
Fengshuang Ma, Xiangyong Liu, Zhiqiang Xu, Tianhong Ding

In deep-sea areas, the hoisting operation of offshore wind turbines is seriously affected by waves, and the secondary impact is prone to occur between the turbine and the pile foundation. To address this issue, this study proposes an integrated wave compensation system for offshore wind turbines based on a neuromorphic vision (NeuroVI) camera. The system employs a NeuroVI camera to achieve non-contact, high-precision, and low-latency displacement detection of hydraulic cylinders, overcoming the limitations of traditional magnetostrictive displacement sensors, which exhibit slow response and susceptibility to interference in harsh marine conditions. A dynamic simulation model was developed using AMESim-Simulink co-simulation to analyze the compensation performance of the NeuroVI-based system under step and sinusoidal wave disturbances. Comparative results demonstrate that the NeuroVI feedback system achieves faster response times and superior stability over conventional sensors. Laboratory-scale model tests and real-world application in the installation of a 5.2 MW offshore wind turbine validated the system's feasibility and robustness, enabling real-time collaborative control of turbine and cylinder displacement to effectively mitigate multi-impact risks. This research provides an innovative approach for deploying neural perception technology in complex marine scenarios and advances the development of neuro-robotic systems in ocean engineering.

在深海地区,海上风电机组吊装作业受海浪影响严重,风机与桩基之间容易发生二次冲击。为了解决这一问题,本研究提出了一种基于神经形态视觉(NeuroVI)相机的海上风力涡轮机综合波浪补偿系统。该系统采用NeuroVI摄像头,实现了液压缸的非接触式、高精度、低延迟位移检测,克服了传统磁致伸缩位移传感器在恶劣海洋条件下响应缓慢、易受干扰的局限性。利用AMESim-Simulink联合仿真建立了动态仿真模型,分析了基于neurovi的系统在阶跃波和正弦波干扰下的补偿性能。对比结果表明,与传统传感器相比,NeuroVI反馈系统具有更快的响应时间和更好的稳定性。实验室规模的模型测试和5.2 MW海上风力涡轮机的实际应用验证了该系统的可行性和鲁棒性,实现了涡轮机和气缸位移的实时协同控制,有效降低了多重影响风险。该研究为在复杂的海洋环境中应用神经感知技术提供了一种创新的方法,并推动了海洋工程中神经机器人系统的发展。
{"title":"NeuroVI-based wave compensation system control for offshore wind turbines.","authors":"Fengshuang Ma, Xiangyong Liu, Zhiqiang Xu, Tianhong Ding","doi":"10.3389/fnbot.2025.1648713","DOIUrl":"10.3389/fnbot.2025.1648713","url":null,"abstract":"<p><p>In deep-sea areas, the hoisting operation of offshore wind turbines is seriously affected by waves, and the secondary impact is prone to occur between the turbine and the pile foundation. To address this issue, this study proposes an integrated wave compensation system for offshore wind turbines based on a neuromorphic vision (NeuroVI) camera. The system employs a NeuroVI camera to achieve non-contact, high-precision, and low-latency displacement detection of hydraulic cylinders, overcoming the limitations of traditional magnetostrictive displacement sensors, which exhibit slow response and susceptibility to interference in harsh marine conditions. A dynamic simulation model was developed using AMESim-Simulink co-simulation to analyze the compensation performance of the NeuroVI-based system under step and sinusoidal wave disturbances. Comparative results demonstrate that the NeuroVI feedback system achieves faster response times and superior stability over conventional sensors. Laboratory-scale model tests and real-world application in the installation of a 5.2 MW offshore wind turbine validated the system's feasibility and robustness, enabling real-time collaborative control of turbine and cylinder displacement to effectively mitigate multi-impact risks. 
This research provides an innovative approach for deploying neural perception technology in complex marine scenarios and advances the development of neuro-robotic systems in ocean engineering.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1648713"},"PeriodicalIF":2.8,"publicationDate":"2025-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12343490/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144845569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Pre-training, personalization, and self-calibration: all a neural network-based myoelectric decoder needs. 预训练,个性化和自校准:所有基于神经网络的肌电解码器需要。
IF 2.8 4区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-07-28 eCollection Date: 2025-01-01 DOI: 10.3389/fnbot.2025.1604453
Chenfei Ma, Xinyu Jiang, Kianoush Nazarpour

Myoelectric control systems translate electromyographic signals (EMG) from muscles into movement intentions, allowing control over various interfaces, such as prosthetics, wearable devices, and robotics. However, a major challenge lies in enhancing the system's ability to generalize, personalize, and adapt to the high variability of EMG signals. Artificial intelligence, particularly neural networks, has shown promising decoding performance when applied to large datasets. However, highly parameterized deep neural networks usually require extensive user-specific data with ground truth labels to learn individual unique EMG patterns. However, the characteristics of the EMG signal can change significantly over time, even for the same user, leading to performance degradation during extended use. In this work, we propose an innovative three-stage neural network training scheme designed to progressively develop an adaptive workflow, improving and maintaining the network performance on 28 subjects over 2 days. Experiments demonstrate the importance and necessity of each stage in the proposed framework.

肌电控制系统将来自肌肉的肌电图信号(EMG)转化为运动意图,允许控制各种接口,如假肢,可穿戴设备和机器人。然而,一个主要的挑战在于增强系统的泛化、个性化和适应肌电信号的高度可变性的能力。人工智能,特别是神经网络,在应用于大型数据集时显示出有希望的解码性能。然而,高度参数化的深度神经网络通常需要大量的用户特定数据和基础真值标签来学习单个独特的肌电模式。然而,随着时间的推移,肌电图信号的特征会发生显著变化,即使是同一用户,也会在长时间使用期间导致性能下降。在这项工作中,我们提出了一种创新的三阶段神经网络训练方案,旨在逐步开发自适应工作流程,在2天内改善和保持28个受试者的网络性能。实验证明了该框架中每个阶段的重要性和必要性。
{"title":"Pre-training, personalization, and self-calibration: all a neural network-based myoelectric decoder needs.","authors":"Chenfei Ma, Xinyu Jiang, Kianoush Nazarpour","doi":"10.3389/fnbot.2025.1604453","DOIUrl":"10.3389/fnbot.2025.1604453","url":null,"abstract":"<p><p>Myoelectric control systems translate electromyographic signals (EMG) from muscles into movement intentions, allowing control over various interfaces, such as prosthetics, wearable devices, and robotics. However, a major challenge lies in enhancing the system's ability to generalize, personalize, and adapt to the high variability of EMG signals. Artificial intelligence, particularly neural networks, has shown promising decoding performance when applied to large datasets. However, highly parameterized deep neural networks usually require extensive user-specific data with ground truth labels to learn individual unique EMG patterns. However, the characteristics of the EMG signal can change significantly over time, even for the same user, leading to performance degradation during extended use. In this work, we propose an innovative three-stage neural network training scheme designed to progressively develop an adaptive workflow, improving and maintaining the network performance on 28 subjects over 2 days. 
Experiments demonstrate the importance and necessity of each stage in the proposed framework.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1604453"},"PeriodicalIF":2.8,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12336220/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144821257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Design and analysis of combined discrete-time zeroing neural network for solving time-varying nonlinear equation with robot application. 用于求解时变非线性方程的组合离散时间归零神经网络的设计与分析。
IF 2.8 4区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-07-11 eCollection Date: 2025-01-01 DOI: 10.3389/fnbot.2025.1576473
Zhisheng Ma, Shaobin Huang

Zeroing neural network (ZNN) is viewed as an effective solution to time-varying nonlinear equation (TVNE). In this paper, a further study is shown by proposing a novel combined discrete-time ZNN (CDTZNN) model for solving TVNE. Specifically, a new difference formula, which is called the Taylor difference formula, is constructed for first-order derivative approximation by following Taylor series expansion. The Taylor difference formula is then used to discretize the continuous-time ZNN model in the previous study. The corresponding DTZNN model is obtained, where the direct Jacobian matrix inversion is required (being time consuming). Another DTZNN model for computing the inverse of Jacobian matrix is established to solve the aforementioned limitation. The novel CDTZNN model for solving the TVNE is thus developed by combining the two models. Theoretical analysis and numerical results demonstrate the efficacy of the proposed CDTZNN model. The CDTZNN applicability is further indicated by applying the proposed model to the motion planning of robot manipulators.

归零神经网络(ZNN)是求解时变非线性方程的有效方法。本文提出了一种新的组合离散时间ZNN (CDTZNN)模型来求解TVNE。具体来说,通过泰勒级数展开,构造了一阶导数近似的差分公式,称为泰勒差分公式。然后利用泰勒差分公式对连续时间ZNN模型进行离散化。得到相应的DTZNN模型,其中需要对雅可比矩阵进行直接反演(耗时较长)。为了解决上述问题,建立了另一种计算雅可比矩阵逆的DTZNN模型。将这两种模型结合起来,建立了求解TVNE的新型CDTZNN模型。理论分析和数值结果验证了所提出的CDTZNN模型的有效性。将该模型应用于机器人机械手的运动规划,进一步证明了CDTZNN的适用性。
{"title":"Design and analysis of combined discrete-time zeroing neural network for solving time-varying nonlinear equation with robot application.","authors":"Zhisheng Ma, Shaobin Huang","doi":"10.3389/fnbot.2025.1576473","DOIUrl":"10.3389/fnbot.2025.1576473","url":null,"abstract":"<p><p>Zeroing neural network (ZNN) is viewed as an effective solution to time-varying nonlinear equation (TVNE). In this paper, a further study is shown by proposing a novel combined discrete-time ZNN (CDTZNN) model for solving TVNE. Specifically, a new difference formula, which is called the Taylor difference formula, is constructed for first-order derivative approximation by following Taylor series expansion. The Taylor difference formula is then used to discretize the continuous-time ZNN model in the previous study. The corresponding DTZNN model is obtained, where the direct Jacobian matrix inversion is required (being time consuming). Another DTZNN model for computing the inverse of Jacobian matrix is established to solve the aforementioned limitation. The novel CDTZNN model for solving the TVNE is thus developed by combining the two models. Theoretical analysis and numerical results demonstrate the efficacy of the proposed CDTZNN model. 
The CDTZNN applicability is further indicated by applying the proposed model to the motion planning of robot manipulators.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1576473"},"PeriodicalIF":2.8,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12289663/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144729707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A robust and effective framework for 3D scene reconstruction and high-quality rendering in nasal endoscopy surgery. 鼻内窥镜手术中三维场景重建和高质量渲染的鲁棒有效框架。
IF 2.6 4区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-06-27 eCollection Date: 2025-01-01 DOI: 10.3389/fnbot.2025.1630728
Xueqin Ji, Shuting Zhao, Di Liu, Feng Wang, Xinrong Chen

In nasal endoscopic surgery, the narrow nasal cavity restricts the surgical field of view and the manipulation of surgical instruments. Therefore, precise real-time intraoperative navigation, which can provide precise 3D information, plays a crucial role in avoiding critical areas with dense blood vessels and nerves. Although significant progress has been made in endoscopic 3D reconstruction methods, their application in nasal scenarios still faces numerous challenges. On the one hand, there is a lack of high-quality, annotated nasal endoscopy datasets. On the other hand, issues such as motion blur and soft tissue deformations complicate the nasal endoscopy reconstruction process. To tackle these challenges, a series of nasal endoscopy examination videos are collected, and the pose information for each frame is recorded. Additionally, a novel model named Mip-EndoGS is proposed, which integrates 3D Gaussian Splatting for reconstruction and rendering and a diffusion module to reduce image blurring in endoscopic data. Meanwhile, by incorporating an adaptive low-pass filter into the rendering pipeline, the aliasing artifacts (jagged edges) are mitigated, which occur during the rendering process. Extensive quantitative and visual experiments show that the proposed model is capable of reconstructing 3D scenes within the nasal cavity in real-time, thereby offering surgeons more detailed and precise information about the surgical scene. Moreover, the proposed approach holds great potential for integration with AR-based surgical navigation systems to enhance intraoperative guidance.

在鼻内镜手术中,狭窄的鼻腔限制了手术视野和手术器械的操作。因此,精确的术中实时导航,能够提供精确的三维信息,对于避开血管和神经密集的关键区域起着至关重要的作用。尽管内窥镜三维重建方法取得了重大进展,但其在鼻腔场景中的应用仍面临许多挑战。一方面,缺乏高质量的、带注释的鼻内窥镜数据集。另一方面,运动模糊和软组织变形等问题使鼻内窥镜重建过程复杂化。为了解决这些问题,我们收集了一系列鼻内窥镜检查视频,并记录了每帧的姿势信息。此外,提出了一种新的模型Mip-EndoGS,该模型集成了用于重建和渲染的三维高斯飞溅和用于减少内镜数据图像模糊的扩散模块。同时,通过在渲染管道中加入自适应低通滤波器,可以减轻渲染过程中出现的混叠现象(锯齿状边缘)。大量的定量和视觉实验表明,该模型能够实时重建鼻腔内的三维场景,从而为外科医生提供更详细和精确的手术场景信息。此外,该方法具有与基于ar的手术导航系统集成以增强术中引导的巨大潜力。
{"title":"A robust and effective framework for 3D scene reconstruction and high-quality rendering in nasal endoscopy surgery.","authors":"Xueqin Ji, Shuting Zhao, Di Liu, Feng Wang, Xinrong Chen","doi":"10.3389/fnbot.2025.1630728","DOIUrl":"10.3389/fnbot.2025.1630728","url":null,"abstract":"<p><p>In nasal endoscopic surgery, the narrow nasal cavity restricts the surgical field of view and the manipulation of surgical instruments. Therefore, precise real-time intraoperative navigation, which can provide precise 3D information, plays a crucial role in avoiding critical areas with dense blood vessels and nerves. Although significant progress has been made in endoscopic 3D reconstruction methods, their application in nasal scenarios still faces numerous challenges. On the one hand, there is a lack of high-quality, annotated nasal endoscopy datasets. On the other hand, issues such as motion blur and soft tissue deformations complicate the nasal endoscopy reconstruction process. To tackle these challenges, a series of nasal endoscopy examination videos are collected, and the pose information for each frame is recorded. Additionally, a novel model named Mip-EndoGS is proposed, which integrates 3D Gaussian Splatting for reconstruction and rendering and a diffusion module to reduce image blurring in endoscopic data. Meanwhile, by incorporating an adaptive low-pass filter into the rendering pipeline, the aliasing artifacts (jagged edges) are mitigated, which occur during the rendering process. Extensive quantitative and visual experiments show that the proposed model is capable of reconstructing 3D scenes within the nasal cavity in real-time, thereby offering surgeons more detailed and precise information about the surgical scene. 
Moreover, the proposed approach holds great potential for integration with AR-based surgical navigation systems to enhance intraoperative guidance.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1630728"},"PeriodicalIF":2.6,"publicationDate":"2025-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12245865/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144626010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
期刊
Frontiers in Neurorobotics
全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1