
Frontiers in Neurorobotics: Latest Publications

4D trajectory prediction for inbound flights.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-17 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1625074
Weizhen Tang, Jie Dai

Introduction: To address the challenges of cumulative errors, insufficient modeling of complex spatiotemporal features, and limitations in computational efficiency and generalization ability in 4D trajectory prediction, this paper proposes a high-precision, robust prediction method.

Methods: A hybrid SVMD-DBO-RCBAM model is constructed, integrating sequential variational modal decomposition (SVMD), the dung beetle optimization algorithm (DBO), and the ResNet-CBAM network. Innovations include frequency-domain feature decoupling, dynamic parameter optimization, and enhanced spatiotemporal feature focusing.
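For readers who want a concrete picture of the attention component, below is a minimal PyTorch sketch of a ResNet-style block with CBAM-like channel and spatial attention. The reduction ratio, kernel sizes, and layer layout are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class CBAMBlock(nn.Module):
    """Residual block with channel and spatial attention (CBAM-style sketch)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # Channel attention: squeeze spatial dims, excite channels.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over pooled channel maps.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv(x)
        b, c, _, _ = y.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.channel_mlp(y.mean(dim=(2, 3)))
        mx = self.channel_mlp(y.amax(dim=(2, 3)))
        y = y * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise mean and max maps.
        s = torch.cat([y.mean(dim=1, keepdim=True), y.amax(dim=1, keepdim=True)], dim=1)
        y = y * torch.sigmoid(self.spatial_conv(s))
        return torch.relu(x + y)  # residual connection

x = torch.randn(2, 32, 16, 16)
print(CBAMBlock(32)(x).shape)  # torch.Size([2, 32, 16, 16])
```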

Results: Experiments show that the model achieves a low longitude MAE of 0.0377 in single-step prediction, a 38.5% reduction compared to the baseline model; in multi-step prediction, the longitude R² reaches 0.9844, with a 72.9% reduction in cumulative error rate and an IQR of prediction errors below 10% of that of traditional models, demonstrating high accuracy and stability.

Discussion: These results indicate that the proposed SVMD-DBO-RCBAM model substantially mitigates cumulative error and delivers accurate, stable 4D trajectory predictions for inbound flights, addressing the modeling and generalization limitations identified in the Introduction.
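The reported statistics (MAE, R², and the IQR of prediction errors) can be computed from predicted and true longitude series as in the short sketch below; the synthetic arrays are placeholders, not the paper's data.

```python
import numpy as np

def trajectory_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MAE, coefficient of determination, and interquartile range of errors."""
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    q1, q3 = np.percentile(err, [25, 75])
    return {"MAE": mae, "R2": r2, "IQR": q3 - q1}

rng = np.random.default_rng(0)
lon_true = np.cumsum(rng.normal(0, 0.01, 500)) + 113.0  # synthetic longitude track
lon_pred = lon_true + rng.normal(0, 0.03, 500)          # placeholder predictions
print(trajectory_metrics(lon_true, lon_pred))
```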

{"title":"4D trajectory prediction for inbound flights.","authors":"Weizhen Tang, Jie Dai","doi":"10.3389/fnbot.2025.1625074","DOIUrl":"https://doi.org/10.3389/fnbot.2025.1625074","url":null,"abstract":"<p><strong>Introduction: </strong>To address the challenges of cumulative errors, insufficient modeling of complex spatiotemporal features, and limitations in computational efficiency and generalization ability in 4D trajectory prediction, this paper proposes a high-precision, robust prediction method.</p><p><strong>Methods: </strong>A hybrid model SVMD-DBO-RCBAM is constructed, integrating sequential variational modal decomposition (SVMD), the dung beetle optimization algorithm (DBO), and the ResNet-CBAM network. Innovations include frequency-domain feature decoupling, dynamic parameter optimization, and enhanced spatio-temporal feature focusing.</p><p><strong>Results: </strong>Experiments show that the model achieves a low longitude MAE of 0.0377 in single-step prediction, a 38.5% reduction compared to the baseline model; in multi-step prediction, the longitude R2 reaches 0.9844, with a 72.9% reduction in cumulative error rate and an IQR of prediction errors less than 10% of traditional models, demonstrating high accuracy and stability.</p><p><strong>Discussion: </strong>Experiments show that the model achieves a low longitude MAE of 0.0377 in single-step prediction, a 38.5% reduction compared to the baseline model; in multi-step prediction, the longitude R2 reaches 0.9844, with a 72.9% reduction in cumulative error rate and an IQR of prediction errors less than 10% of traditional models, demonstrating high accuracy and stability.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1625074"},"PeriodicalIF":2.8,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12497917/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145244444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
RSA-TransUNet: a robust structure-adaptive TransUNet for enhanced road crack segmentation.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-16 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1633697
Liling Hou, Fei Yu, Yaowen Hu, Yang Hu, Ruoli Yang

With the advancement of deep learning, road crack segmentation has become increasingly crucial for intelligent transportation safety. Despite notable progress, existing methods still face challenges in capturing fine-grained textures in small crack regions, handling blurred edges and significant width variations, and performing multi-class segmentation. Moreover, the high computational cost of training such models hinders their practical deployment. To tackle these limitations, we propose RSA-TransUNet, a novel model for road crack segmentation. At its core is the Axial-shift MLP Attention (ASMA) mechanism, which integrates axial perception with sparse contextual modeling. Through multi-path axial perturbations and an attention-guided structure, ASMA effectively captures long-range dependencies within row-column patterns, enabling detailed modeling of multi-scale crack features. To improve the model's adaptability to structural irregularities, we introduce the Adaptive Spline Linear Unit (ASLU), which enhances the model's capacity to represent nonlinear transformations. ASLU improves responsiveness to microstructural variations, morphological distortions, and local discontinuities, thereby boosting robustness across different domains. We further develop a Structure-aware Multi-stage Evolutionary Optimization (SMEO) strategy, which guides the training process through three phases: structural perception exploration, feature stability enhancement, and global perturbation. This strategy combines breadth sampling, convergence compression, and local escape mechanisms to improve convergence speed, global search efficiency, and generalization performance. Extensive evaluations on the Crack500, CFD, and DeepCrack datasets, including ablation studies and comparative experiments, demonstrate that RSA-TransUNet achieves superior segmentation accuracy and robustness in complex road environments, highlighting its potential for real-world applications.
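As a rough illustration of the axial-perception idea behind ASMA, the sketch below shifts channel groups of a feature map along the height and width axes before mixing them with a pointwise MLP. The group count and shift offsets are assumptions, and the attention-guided sparse context modeling of the actual module is omitted.

```python
import torch
import torch.nn as nn

def axial_shift(x: torch.Tensor, dim: int, shifts=(-1, 0, 1)) -> torch.Tensor:
    """Shift equal channel groups along one spatial axis (height or width)."""
    chunks = torch.chunk(x, len(shifts), dim=1)
    return torch.cat([torch.roll(c, s, dims=dim) for c, s in zip(chunks, shifts)], dim=1)

class AxialShiftMix(nn.Module):
    """Pointwise MLP over row/column-shifted features (ASMA-inspired sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = axial_shift(x, dim=2)  # perturb along rows
        w = axial_shift(x, dim=3)  # perturb along columns
        return x + self.mix(torch.cat([h, w], dim=1))

x = torch.randn(1, 24, 32, 32)
print(AxialShiftMix(24)(x).shape)  # torch.Size([1, 24, 32, 32])
```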

{"title":"RSA-TransUNet: a robust structure-adaptive TransUNet for enhanced road crack segmentation.","authors":"Liling Hou, Fei Yu, Yaowen Hu, Yang Hu, Ruoli Yang","doi":"10.3389/fnbot.2025.1633697","DOIUrl":"10.3389/fnbot.2025.1633697","url":null,"abstract":"<p><p>With the advancement of deep learning, road crack segmentation has become increasingly crucial for intelligent transportation safety. Despite notable progress, existing methods still face challenges in capturing fine-grained textures in small crack regions, handling blurred edges and significant width variations, and performing multi-class segmentation. Moreover, the high computational cost of training such models hinders their practical deployment. To tackle these limitations, we propose RSA-TransUNet, a novel model for road crack segmentation. At its core is the Axial-shift MLP Attention (ASMA) mechanism, which integrates axial perception with sparse contextual modeling. Through multi-path axial perturbations and an attention-guided structure, ASMA effectively captures long-range dependencies within row-column patterns, enabling detailed modeling of multi-scale crack features. To improve the model's adaptability to structural irregularities, we introduce the Adaptive Spline Linear Unit (ASLU), which enhances the model's capacity to represent nonlinear transformations. ASLU improves responsiveness to microstructural variations, morphological distortions, and local discontinuities, thereby boosting robustness across different domains. We further develop a Structure-aware Multi-stage Evolutionary Optimization (SMEO) strategy, which guides the training process through three phases: structural perception exploration, feature stability enhancement, and global perturbation. This strategy combines breadth sampling, convergence compression, and local escape mechanisms to improve convergence speed, global search efficiency, and generalization performance. Extensive evaluations on the Crack500, CFD, and DeepCrack datasets-including ablation studies and comparative experiments-demonstrate that RSA-TransUNet achieves superior segmentation accuracy and robustness in complex road environments, highlighting its potential for real-world applications.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1633697"},"PeriodicalIF":2.8,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12479514/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145206146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Toward accurate single image sand dust removal by utilizing uncertainty-aware neural network.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-10 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1575995
Bingcai Wei, Hui Liu, Chuang Qian, Haoliang Shen, Yibiao Chen, Yixin Wang

Although deep learning methods have made significant strides in single image sand dust removal, the heterogeneous uncertainty induced by dusty environments poses a considerable challenge. In response, our research presents a novel framework known as the Hierarchical Interactive Uncertainty-aware Network (HIUNet). HIUNet leverages Bayesian neural networks for the extraction of robust shallow features, bolstered by pre-trained encoders for feature extraction and the agility of lightweight decoders for preliminary image reconstruction. Subsequently, a feature frequency selection mechanism is activated to enhance overall performance by strategically identifying and retaining valuable features while effectively suppressing redundant and irrelevant ones. Following this, a feature enhancement module is applied to the preliminary restoration. This intricate fusion culminates in the production of a restored image of superior quality. Our extensive experiments, using our proposed Sand11K dataset that exhibits various levels of degradation from dust and sand, confirm the effectiveness and soundness of our proposed method. By modeling uncertainty via Bayesian neural networks to extract robust shallow features and selecting valuable features through frequency selection, HIUNet can reconstruct high-quality clean images. For future work, we plan to extend our uncertainty-aware framework to handle extreme sand scenarios.
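One plausible reading of the feature frequency selection mechanism is a magnitude-based mask in the frequency domain, sketched below under that assumption; the rFFT formulation and keep-ratio are illustrative, not the paper's exact design.

```python
import torch

def frequency_select(feat: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep the strongest frequency components of each feature map, zero the rest."""
    spec = torch.fft.rfft2(feat, norm="ortho")      # (B, C, H, W//2+1), complex
    mag = spec.abs().flatten(-2)                    # magnitudes per feature map
    k = max(1, int(keep_ratio * mag.shape[-1]))     # number of components to keep
    # Threshold = k-th largest magnitude in each map.
    thresh = mag.kthvalue(mag.shape[-1] - k + 1, dim=-1, keepdim=True).values
    mask = (mag >= thresh).view_as(spec.real)
    return torch.fft.irfft2(spec * mask.to(spec.dtype),
                            s=feat.shape[-2:], norm="ortho")

x = torch.randn(2, 8, 32, 32)
print(frequency_select(x).shape)  # torch.Size([2, 8, 32, 32])
```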

{"title":"Toward accurate single image sand dust removal by utilizing uncertainty-aware neural network.","authors":"Bingcai Wei, Hui Liu, Chuang Qian, Haoliang Shen, Yibiao Chen, Yixin Wang","doi":"10.3389/fnbot.2025.1575995","DOIUrl":"10.3389/fnbot.2025.1575995","url":null,"abstract":"<p><p>Although deep learning methods have made significant strides in single image sand dust removal, the heterogeneous uncertainty induced by dusty environments poses a considerable challenge. In response, our research presents a novel framework known as the Hierarchical Interactive Uncertainty-aware Network (HIUNet). HIUNet leverages Bayesian neural networks for the extraction of robust shallow features, bolstered by pre-trained encoders for feature extraction and the agility of lightweight decoders for preliminary image reconstitution. Subsequently, a feature frequency selection mechanism is activated to enhance overall performance by strategically identifying and retaining valuable features while effectively suppressing redundant and irrelevant ones. Following this, a feature enhancement module is applied to the preliminary restoration. This intricate fusion culminates in the production of a restored image of superior quality. Our extensive experiments, using our proposed Sand11K dataset that exhibits various levels of degradation from dust and sand, confirm the effectiveness and soundness of our proposed method. By modeling uncertainty via Bayesian neural networks to extract robust shallow features and selecting valuable features through frequency selection, HIUNet can reconstruct high-quality clean images. For future work, we plan to extend our uncertainty-aware framework to handle extreme sand scenarios.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1575995"},"PeriodicalIF":2.8,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12457322/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145148867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dynamic graph neural networks for UAV-based group activity recognition in structured team sports.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-08 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1631998
Ishrat Zahra, Yanfeng Wu, Haifa F Alhasson, Shuaa S Alharbi, Hanan Aljuaid, Ahmad Jalal, Hui Liu

Introduction: Understanding group actions in real-world settings is essential for the advancement of applications in surveillance, robotics, and autonomous systems. Group activity recognition, particularly in sports scenarios, presents unique challenges due to dynamic interactions, occlusions, and varying viewpoints. To address these challenges, we develop a deep learning system that recognizes multi-person behaviors by integrating appearance-based features (HOG, LBP, SIFT), skeletal data (MediaPipe), and motion context features (MOCON). Our approach employs a Dynamic Graph Neural Network (DGNN) and Bi-LSTM architecture, enabling robust recognition of group activities in diverse and dynamic environments. To further validate our framework's adaptability, we include evaluations on the Volleyball and UAV-recorded SoccerTrack datasets, which offer unique perspectives and challenges.

Method: Our framework integrates YOLOv11 for object detection and SORT for tracking to extract multi-modal features, including HOG, LBP, SIFT, skeletal data (MediaPipe), and motion context (MOCON). These features are optimized using genetic algorithms and fused within a Dynamic Graph Neural Network (DGNN), which models players as nodes in a spatio-temporal graph, effectively capturing both spatial formations and temporal dynamics.
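To make the graph construction concrete, here is a minimal sketch that converts per-frame player positions into distance-thresholded adjacency matrices, the kind of spatio-temporal graph a DGNN consumes; the radius and coordinate layout are illustrative assumptions.

```python
import numpy as np

def build_player_graph(positions: np.ndarray, radius: float = 5.0) -> np.ndarray:
    """positions: (T, N, 2) court coordinates -> (T, N, N) adjacency per frame."""
    diff = positions[:, :, None, :] - positions[:, None, :, :]   # (T, N, N, 2)
    dist = np.linalg.norm(diff, axis=-1)                         # pairwise distances
    adj = (dist < radius).astype(np.float32)                     # connect nearby players
    idx = np.arange(positions.shape[1])
    adj[:, idx, idx] = 0.0                                       # remove self-loops
    return adj

T, N = 16, 12                        # 16 frames, 12 tracked players
pos = np.random.rand(T, N, 2) * 20   # synthetic court positions in meters
print(build_player_graph(pos).shape)  # (16, 12, 12)
```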

Results: We evaluated our framework on three datasets: a volleyball dataset, the UAV-based SoccerTrack soccer dataset, and an NBA basketball dataset. Our system achieved 94.5% accuracy on the volleyball dataset (mAP: 94.2%, MPCA: 93.8%) with an inference time of 0.18 s per frame. On the SoccerTrack UAV dataset, accuracy was 91.8% (mAP: 91.5%, MPCA: 90.5%) with 0.20 s inference, and on the NBA basketball dataset, it was 91.1% (mAP: 90.8%, MPCA: 89.8%) at the same 0.20 s per frame. These results highlight our framework's high accuracy and computational efficiency across various sports and viewpoints.

Discussion: Our approach demonstrates robust performance in recognizing multi-person actions across diverse conditions, highlighting its adaptability to both conventional and UAV-based video sources.

{"title":"Dynamic graph neural networks for UAV-based group activity recognition in structured team sports.","authors":"Ishrat Zahra, Yanfeng Wu, Haifa F Alhasson, Shuaa S Alharbi, Hanan Aljuaid, Ahmad Jalal, Hui Liu","doi":"10.3389/fnbot.2025.1631998","DOIUrl":"10.3389/fnbot.2025.1631998","url":null,"abstract":"<p><strong>Introduction: </strong>Understanding group actions in real-world settings is essential for the advancement of applications in surveillance, robotics, and autonomous systems. Group activity recognition, particularly in sports scenarios, presents unique challenges due to dynamic interactions, occlusions, and varying viewpoints. To address these challenges, we develop a deep learning system that recognizes multi-person behaviors by integrating appearance-based features (HOG, LBP, SIFT), skeletal data (MediaPipe, MOCON), and motion features. Our approach employs a Dynamic Graph Neural Network (DGNN) and Bi-LSTM architecture, enabling robust recognition of group activities in diverse and dynamic environments. To further validate our framework's adaptability, we include evaluations on Volleyball and SoccerTrack UAV-recorded datasets, which offer unique perspectives and challenges.</p><p><strong>Method: </strong>Our framework integrates YOLOv11 for object detection and SORT for tracking to extract multi-modal features-including HOG, LBP, SIFT, skeletal data (MediaPipe), and motion context (MOCON). These features are optimized using genetic algorithms and fused within a Dynamic Graph Neural Network (DGNN), which models players as nodes in a spatio-temporal graph, effectively capturing both spatial formations and temporal dynamics.</p><p><strong>Results: </strong>We evaluated our framework on three datasets: a volleyball dataset, SoccerTrack UAV-based soccer dataset, and NBA basketball dataset. Our system achieved 94.5% accuracy on the volleyball dataset (mAP: 94.2%, MPCA: 93.8%) with an inference time of 0.18 s per frame. On the SoccerTrack UAV dataset, accuracy was 91.8% (mAP: 91.5%, MPCA: 90.5%) with 0.20 s inference, and on the NBA basketball dataset, it was 91.1% (mAP: 90.8%, MPCA: 89.8%) with the same 0.20 s per frame. These results highlight our framework's high performance and efficient computational efficiency across various sports and perspectives.</p><p><strong>Discussion: </strong>Our approach demonstrates robust performance in recognizing multi-person actions across diverse conditions, highlighting its adaptability to both conventional and UAV-based video sources.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1631998"},"PeriodicalIF":2.8,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12452097/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145130117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Imitation-relaxation reinforcement learning for sparse badminton strikes via dynamic trajectory generation.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-02 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1649870
Yanyan Yuan, Yucheng Tao, Shaowen Cheng, Yanhong Liang, Yongbin Jin, Hongtao Wang

Robotic racket sports provide exceptional benchmarks for evaluating dynamic motion control capabilities in robots. Due to the highly non-linear dynamics of the shuttlecock, the stringent demands on robots' dynamic responses, and the convergence difficulties caused by sparse rewards in reinforcement learning, badminton strikes remain a formidable challenge for robot systems. To address these issues, this study proposes DTG-IRRL, a novel learning framework for badminton strikes that integrates imitation-relaxation reinforcement learning with dynamic trajectory generation. The framework demonstrates significantly improved training efficiency and performance, achieving faster convergence and twice the landing accuracy. Analysis of the reward function within a specific parameter space hyperplane intuitively reveals the convergence difficulties arising from the inherent sparsity of rewards in racket sports and demonstrates the framework's effectiveness in mitigating local and slow convergence. Implemented on hardware with zero-shot transfer, the framework achieves a 90% hitting rate and a 70% landing accuracy, enabling sustained human-robot rallies. Cross-platform validation using the UR5 robot demonstrates the framework's generalizability while highlighting the requirement for high dynamic performance of robotic arms in racket sports.
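A common reading of "imitation-relaxation" is an imitation bonus whose weight is annealed toward zero so that the sparse strike reward gradually dominates. The sketch below implements that schedule under stated assumptions; the weights, schedule, and reward terms are not the authors' exact formulation.

```python
import numpy as np

def shaped_reward(step: int, total_steps: int,
                  pose_error: float, hit: bool, landed_in_court: bool) -> float:
    """Imitation term (pose tracking) annealed away; sparse strike reward remains."""
    w_im = max(0.0, 1.0 - step / (0.6 * total_steps))  # relaxation schedule (assumed)
    r_imitation = np.exp(-2.0 * pose_error)            # reward for tracking a reference swing
    r_task = 1.0 * hit + 2.0 * landed_in_court         # sparse strike/landing reward
    return w_im * r_imitation + (1.0 - w_im) * r_task

# Early training is dominated by imitation; late training by the sparse task reward.
for step in (0, 30_000, 90_000):
    print(step, round(shaped_reward(step, 100_000, pose_error=0.3,
                                    hit=True, landed_in_court=False), 3))
```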

{"title":"Imitation-relaxation reinforcement learning for sparse badminton strikes via dynamic trajectory generation.","authors":"Yanyan Yuan, Yucheng Tao, Shaowen Cheng, Yanhong Liang, Yongbin Jin, Hongtao Wang","doi":"10.3389/fnbot.2025.1649870","DOIUrl":"10.3389/fnbot.2025.1649870","url":null,"abstract":"<p><p>Robotic racket sports provide exceptional benchmarks for evaluating dynamic motion control capabilities in robots. Due to the highly non-linear dynamics of the shuttlecock, the stringent demands on robots' dynamic responses, and the convergence difficulties caused by sparse rewards in reinforcement learning, badminton strikes remain a formidable challenge for robot systems. To address these issues, this study proposes DTG-IRRL, a novel learning framework for badminton strikes that integrates imitation-relaxation reinforcement learning with dynamic trajectory generation. The framework demonstrates significantly improved training efficiency and performance, achieving faster convergence and twice the landing accuracy. Analysis of the reward function within a specific parameter space hyperplane intuitively reveals the convergence difficulties arising from the inherent sparsity of rewards in racket sports and demonstrates the framework's effectiveness in mitigating local and slow convergence. Implemented on hardware with zero-shot transfer, the framework achieves a 90% hitting rate and a 70% landing accuracy, enabling sustained humanrobot rallies. Cross-platform validation using the UR5 robot demonstrates the framework's generalizability while highlighting the requirement for high dynamic performance of robotic arms in racket sports.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1649870"},"PeriodicalIF":2.8,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12436432/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145079389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Variable admittance control with sEMG-based support for wearable wrist exoskeleton.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-01 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1562675
Charles Lambelet, Melvin Mathis, Marc Siegenthaler, Jeremia P O Held, Daniel Woolley, Olivier Lambercy, Roger Gassert, Nicole Wenderoth

Introduction: Wrist function impairment is common after stroke and heavily impacts the execution of daily tasks. Robotic therapy, and more specifically wearable exoskeletons, has the potential to boost training dose in context-relevant scenarios, promote voluntary effort through motor intent detection, and mitigate the effect of gravity. Portable exoskeletons are often non-backdrivable, and it is challenging to make their control safe, reactive, and stable. Admittance control is often used in this case; however, this type of control can become unstable when the supported biological joint stiffens. Variable admittance control adapts its parameters dynamically to allow free motion and stabilize the human-robot interaction.

Methods: In this study, we implemented a variable admittance control scheme on a one degree of freedom wearable wrist exoskeleton. The damping parameter of the admittance scheme is adjusted in real-time to cope with instabilities and varying wrist stiffness. In addition to the admittance control scheme, sEMG- and gravity-based controllers were implemented, characterized, and optimized on ten healthy participants and tested on six stroke survivors.
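The underlying control law is compact: an admittance filter maps interaction torque to motion, and the damping term is raised when oscillation is detected. Below is a one-degree-of-freedom, discrete-time sketch with assumed gains and an assumed instability proxy, not the implemented controller.

```python
import numpy as np

def variable_admittance(tau_ext: np.ndarray, dt: float = 0.002,
                        inertia: float = 0.01, d_min: float = 0.05,
                        d_max: float = 1.0, k_adapt: float = 5.0) -> np.ndarray:
    """One-DOF admittance filter I*qdd + D(t)*qd = tau_ext with adaptive damping."""
    q, qd = 0.0, 0.0
    damping = d_min
    traj = np.empty_like(tau_ext)
    for i, tau in enumerate(tau_ext):
        qdd = (tau - damping * qd) / inertia          # admittance dynamics
        qd_new = qd + qdd * dt
        # Crude instability proxy: velocity sign reversals weighted by magnitude.
        oscillation = abs(qd_new - qd) * (np.sign(qd_new) != np.sign(qd))
        # Raise damping on oscillation, let it slowly relax toward d_min.
        damping = np.clip(damping + k_adapt * oscillation - 0.01 * dt, d_min, d_max)
        qd = qd_new
        q += qd * dt
        traj[i] = q
    return traj

t = np.arange(0, 2, 0.002)
torque = 0.2 * np.sin(2 * np.pi * 1.0 * t)   # synthetic interaction torque
print(variable_admittance(torque)[-1])
```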

Results: The results show that (1) the variable admittance control scheme could stabilize the interaction, but at the cost of a decrease in transparency, and (2) when coupled with the variable admittance controller, the sEMG-based control enhanced wrist functionality of stroke survivors in the most extreme angular positions.

Discussion: Our variable admittance control scheme with sEMG- and gravity-based support was most beneficial for patients with higher levels of impairment by improving range of motion and promoting voluntary effort. Future work could combine both controllers to customize and fine-tune the stability of the support for a wider range of impairment levels and types.

{"title":"Variable admittance control with sEMG-based support for wearable wrist exoskeleton.","authors":"Charles Lambelet, Melvin Mathis, Marc Siegenthaler, Jeremia P O Held, Daniel Woolley, Olivier Lambercy, Roger Gassert, Nicole Wenderoth","doi":"10.3389/fnbot.2025.1562675","DOIUrl":"10.3389/fnbot.2025.1562675","url":null,"abstract":"<p><strong>Introduction: </strong>Wrist function impairment is common after stroke and heavily impacts the execution of daily tasks. Robotic therapy, and more specifically wearable exoskeletons, have the potential to boost training dose in context-relevant scenarios, promote voluntary effort through motor intent detection, and mitigate the effect of gravity. Portable exoskeletons are often non-backdrivable and it is challenging to make their control safe, reactive and stable. Admittance control is often used in this case, however, this type of control can become unstable when the supported biological joint stiffens. Variable admittance control adapts its parameters dynamically to allow free motion and stabilize the human-robot interaction.</p><p><strong>Methods: </strong>In this study, we implemented a variable admittance control scheme on a one degree of freedom wearable wrist exoskeleton. The damping parameter of the admittance scheme is adjusted in real-time to cope with instabilities and varying wrist stiffness. In addition to the admittance control scheme, sEMG- and gravity-based controllers were implemented, characterized and optimized on ten healthy participants and tested on six stroke survivors.</p><p><strong>Results: </strong>The results show that (1) the variable admittance control scheme could stabilize the interaction but at the cost of a decrease in transparency, and (2) when coupled with the variable admittance controller the sEMG-based control enhanced wrist functionality of stroke survivors in the most extreme angular positions.</p><p><strong>Discussion: </strong>Our variable admittance control scheme with sEMG- and gravity-based support was most beneficial for patients with higher levels of impairment by improving range of motion and promoting voluntary effort. Future work could combine both controllers to customize and fine tune the stability of the support to a wider range of impairment levels and types.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1562675"},"PeriodicalIF":2.8,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12434121/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145075000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
4D trajectory lightweight prediction algorithm based on knowledge distillation technique.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-22 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1643919
Weizhen Tang, Jie Dai, Zhousheng Huang, Boyang Hao, Weizheng Xie

Introduction: To address the challenges of current 4D trajectory prediction (specifically, limited multi-factor feature extraction and excessive computational cost), this study develops a lightweight prediction framework tailored for real-time air-traffic management.

Methods: We propose a hybrid RCBAM-TCN-LSTM architecture enhanced with a teacher-student knowledge distillation mechanism. The Residual Convolutional Block Attention Module (RCBAM) serves as the teacher network to extract high-dimensional spatial features via residual structures and channel-spatial attention. The student network adopts a Temporal Convolutional Network-LSTM (TCN-LSTM) design, integrating dilated causal convolutions and two LSTM layers for efficient temporal modeling. Historical ADS-B trajectory data from Zhuhai Jinwan Airport are preprocessed using cubic spline interpolation and a uniform-step sliding window to ensure data alignment and temporal consistency. In the distillation process, soft labels from the teacher and hard labels from actual observations jointly guide student training.
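The soft/hard label weighting can be written as a single loss. Since trajectory prediction is a regression task, the sketch below mixes an MSE term against the teacher's soft labels with an MSE term against ground-truth observations; the blending weight and MSE choice are assumptions rather than the paper's exact formula.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out: torch.Tensor,
                      teacher_out: torch.Tensor,
                      target: torch.Tensor,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend soft-label (teacher) and hard-label (ground truth) regression losses."""
    soft = F.mse_loss(student_out, teacher_out.detach())  # imitate the teacher
    hard = F.mse_loss(student_out, target)                # fit real observations
    return alpha * soft + (1.0 - alpha) * hard

student = torch.randn(8, 3, requires_grad=True)  # predicted (lon, lat, alt) deltas
teacher = torch.randn(8, 3)                      # frozen teacher predictions
truth = torch.randn(8, 3)                        # observed trajectory targets
loss = distillation_loss(student, teacher, truth)
loss.backward()
print(float(loss))
```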

Results: In multi-step prediction experiments, the distilled RCBAM-TCN-LSTM model achieved average reductions of 40%-60% in MAE, RMSE, and MAPE compared with the original RCBAM and TCN-LSTM models, while improving R² by 4%-6%. The approach maintained high accuracy across different prediction horizons while reducing computational complexity.

Discussion: The proposed method effectively balances high-precision modeling of spatiotemporal dependencies with lightweight deployment requirements, enabling real-time air-traffic monitoring and early warning on standard CPUs and embedded devices. This framework offers a scalable solution for enhancing the operational safety and efficiency of modern air-traffic control systems.

{"title":"4D trajectory lightweight prediction algorithm based on knowledge distillation technique.","authors":"Weizhen Tang, Jie Dai, Zhousheng Huang, Boyang Hao, Weizheng Xie","doi":"10.3389/fnbot.2025.1643919","DOIUrl":"10.3389/fnbot.2025.1643919","url":null,"abstract":"<p><strong>Introduction: </strong>To address the challenges of current 4D trajectory prediction-specifically, limited multi-factor feature extraction and excessive computational cost-this study develops a lightweight prediction framework tailored for real-time air-traffic management.</p><p><strong>Methods: </strong>We propose a hybrid RCBAM-TCN-LSTM architecture enhanced with a teacher-student knowledge distillation mechanism. The Residual Convolutional Block Attention Module (RCBAM) serves as the teacher network to extract high-dimensional spatial features via residual structures and channel-spatial attention. The student network adopts a Temporal Convolutional Network-LSTM (TCN-LSTM) design, integrating dilated causal convolutions and two LSTM layers for efficient temporal modeling. Historical ADS-B trajectory data from Zhuhai Jinwan Airport are preprocessed using cubic spline interpolation and a uniform-step sliding window to ensure data alignment and temporal consistency. In the distillation process, soft labels from the teacher and hard labels from actual observations jointly guide student training.</p><p><strong>Results: </strong>In multi-step prediction experiments, the distilled RCBAM-TCN-LSTM model achieved average reductions of 40%-60% in MAE, RMSE, and MAPE compared with the original RCBAM and TCN-LSTM models, while improving <i>R</i> <sup>²</sup> by 4%-6%. The approach maintained high accuracy across different prediction horizons while reducing computational complexity.</p><p><strong>Discussion: </strong>The proposed method effectively balances high-precision modeling of spatiotemporal dependencies with lightweight deployment requirements, enabling real-time air-traffic monitoring and early warning on standard CPUs and embedded devices. This framework offers a scalable solution for enhancing the operational safety and efficiency of modern air-traffic control systems.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1643919"},"PeriodicalIF":2.8,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12411499/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145014961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Tri-manual interaction in hybrid BCI-VR systems: integrating gaze, EEG control for enhanced 3D object manipulation.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-14 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1628968
Jian Teng, Sukyoung Cho, Shaw-Mung Lee

Brain-computer interface (BCI) integration with virtual reality (VR) has progressed from single-limb control to multi-limb coordination, yet achieving intuitive tri-manual operation remains challenging. This study presents a consumer-grade hybrid BCI-VR framework enabling simultaneous control of two biological hands and a virtual third limb through integration of Tobii eye-tracking, NeuroSky single-channel EEG, and non-haptic controllers. The system employs e-Sense attention thresholds (>80% for 300 ms) to trigger virtual hand activation combined with gaze-driven targeting within 45° visual cones. A soft maximum weighted arbitration algorithm resolves spatiotemporal conflicts between manual and virtual inputs with a 92.4% success rate. Experimental validation with eight participants across 160 trials demonstrated an 87.5% virtual-hand success rate and 41% spatial error reduction (σ = 0.23 mm vs. 0.39 mm) compared to traditional dual-hand control. The framework achieved 320 ms activation latency and 22% NASA-TLX workload reduction through adaptive cognitive load management. Time-frequency analysis revealed characteristic beta-band (15-20 Hz) energy modulations during successful virtual limb control, providing neurophysiological evidence for attention-mediated supernumerary limb embodiment. These findings demonstrate that sophisticated algorithmic approaches can compensate for consumer-grade hardware limitations, enabling laboratory-grade precision in accessible tri-manual VR applications for rehabilitation, training, and assistive technologies.
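The activation rule is straightforward to state in code: the virtual hand fires only when the attention value stays above the threshold for 300 ms while the gaze ray lies inside the visual cone. The sketch below is a hedged reconstruction with an assumed sampling rate and a half-angle interpretation of the 45° cone.

```python
import numpy as np

def virtual_hand_trigger(attention: np.ndarray, gaze_angles: np.ndarray,
                         fs_hz: float = 100.0, thresh: float = 80.0,
                         hold_ms: float = 300.0, cone_deg: float = 45.0) -> bool:
    """True when attention > thresh for hold_ms straight AND gaze stays in the cone."""
    need = int(hold_ms / 1000.0 * fs_hz)       # consecutive samples required
    above = attention > thresh                 # eSense-style 0-100 attention scale
    run = 0
    for a, g in zip(above, gaze_angles):
        run = run + 1 if (a and g <= cone_deg / 2.0) else 0
        if run >= need:
            return True
    return False

att = np.r_[np.full(20, 60.0), np.full(40, 90.0)]   # attention climbing past threshold
gaze = np.full(60, 10.0)                            # gaze 10 degrees off target axis
print(virtual_hand_trigger(att, gaze))  # True: 40 samples at 100 Hz = 400 ms > 300 ms
```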

{"title":"Tri-manual interaction in hybrid BCI-VR systems: integrating gaze, EEG control for enhanced 3D object manipulation.","authors":"Jian Teng, Sukyoung Cho, Shaw-Mung Lee","doi":"10.3389/fnbot.2025.1628968","DOIUrl":"10.3389/fnbot.2025.1628968","url":null,"abstract":"<p><p>Brain-computer interface (BCI) integration with virtual reality (VR) has progressed from single-limb control to multi-limb coordination, yet achieving intuitive tri-manual operation remains challenging. This study presents a consumer-grade hybrid BCI-VR framework enabling simultaneous control of two biological hands and a virtual third limb through integration of Tobii eye-tracking, NeuroSky single-channel EEG, and non-haptic controllers. The system employs e-Sense attention thresholds (>80% for 300 ms) to trigger virtual hand activation combined with gaze-driven targeting within 45° visual cones. A soft maximum weighted arbitration algorithm resolves spatiotemporal conflicts between manual and virtual inputs with 92.4% success rate. Experimental validation with eight participants across 160 trials demonstrated 87.5% virtual hand success rate and 41% spatial error reduction (<i>σ</i> = 0.23 mm vs. 0.39 mm) compared to traditional dual-hand control. The framework achieved 320 ms activation latency and 22% NASA-TLX workload reduction through adaptive cognitive load management. Time-frequency analysis revealed characteristic beta-band (15-20 Hz) energy modulations during successful virtual limb control, providing neurophysiological evidence for attention-mediated supernumerary limb embodiment. These findings demonstrate that sophisticated algorithmic approaches can compensate for consumer-grade hardware limitations, enabling laboratory-grade precision in accessible tri-manual VR applications for rehabilitation, training, and assistive technologies.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1628968"},"PeriodicalIF":2.8,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12390853/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144950948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Fine-grained image classification using the MogaNet network and a multi-level gating mechanism.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-06 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1630281
Dahai Li, Su Chen

Fine-grained image classification tasks face challenges such as labeling difficulty, sample scarcity, and small inter-category differences. To address this problem, this study proposes a novel fine-grained image classification method based on the MogaNet network and a multi-level gating mechanism. A feature extraction network based on MogaNet is constructed, and multi-scale feature fusion is combined to fully mine image information. The contextual information extractor is designed to align and filter more discriminative local features using the semantic context of the network, thereby strengthening the network's ability to capture detailed features. Meanwhile, a multi-level gating mechanism is introduced to obtain the saliency features of images. A feature elimination strategy is proposed to suppress the interference of fuzzy class features and background noise. A loss function is designed to constrain both the elimination of fuzzy class features and the classification prediction. Experimental results demonstrate that the new method can be applied to 5-shot tasks across four public datasets: Mini-ImageNet, CUB-200-2011, Stanford Dogs, and Stanford Cars. The accuracy reaches 79.33%, 87.58%, 79.34%, and 83.82%, respectively, outperforming other state-of-the-art image classification methods.
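A multi-level gating mechanism of the kind described can be pictured as learned sigmoid gates that decide, per feature level, how much of each map flows into the fused saliency representation; the channel sizes and gate placement below are assumptions.

```python
import torch
import torch.nn as nn

class MultiLevelGate(nn.Module):
    """Fuse multi-level feature maps through learned per-level sigmoid gates."""

    def __init__(self, channels: int, levels: int = 3):
        super().__init__()
        self.gates = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1) for _ in range(levels)
        )
        self.fuse = nn.Conv2d(levels * channels, channels, kernel_size=1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # Each level keeps only what its gate lets through, then levels are fused.
        gated = [f * torch.sigmoid(g(f)) for f, g in zip(feats, self.gates)]
        return self.fuse(torch.cat(gated, dim=1))

feats = [torch.randn(2, 16, 14, 14) for _ in range(3)]  # same-size level features
print(MultiLevelGate(16)(feats).shape)  # torch.Size([2, 16, 14, 14])
```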

{"title":"Fine-grained image classification using the MogaNet network and a multi-level gating mechanism.","authors":"Dahai Li, Su Chen","doi":"10.3389/fnbot.2025.1630281","DOIUrl":"10.3389/fnbot.2025.1630281","url":null,"abstract":"<p><p>Fine-grained image classification tasks face challenges such as difficulty in labeling, scarcity of samples, and small category differences. To address this problem, this study proposes a novel fine-grained image classification method based on the MogaNet network and a multi-level gating mechanism. A feature extraction network based on MogaNet is constructed, and multi-scale feature fusion is combined to fully mine image information. The contextual information extractor is designed to align and filter more discriminative local features using the semantic context of the network, thereby strengthening the network's ability to capture detailed features. Meanwhile, a multi-level gating mechanism is introduced to obtain the saliency features of images. A feature elimination strategy is proposed to suppress the interference of fuzzy class features and background noise. A loss function is designed to constrain the elimination of fuzzy class features and classification prediction. Experimental results demonstrate that the new method can be applied to 5-shot tasks across four public datasets: Mini-ImageNet, CUB-200-2011, Stanford Dogs, and Stanford Cars. The accuracy rates reach 79.33, 87.58, 79.34, and 83.82%, respectively, which shows better performance than other state-of-the-art image classification methods.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1630281"},"PeriodicalIF":2.8,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12364808/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144950965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Integrated neural network framework for multi-object detection and recognition using UAV imagery.
IF 2.8 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-07-30 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1643011
Mohammed Alshehri, Tingting Xue, Ghulam Mujtaba, Yahya AlQahtani, Nouf Abdullah Almujally, Ahmad Jalal, Hui Liu

Introduction: Accurate vehicle analysis from aerial imagery has become increasingly vital for emerging technologies and public service applications such as intelligent traffic management, urban planning, autonomous navigation, and military surveillance. However, analyzing UAV-captured video poses several inherent challenges, such as the small size of target vehicles, occlusions, cluttered urban backgrounds, motion blur, and fluctuating lighting conditions, which hinder the accuracy and consistency of conventional perception systems. To address these complexities, our research proposes a fully end-to-end deep learning-driven perception pipeline specifically optimized for UAV-based traffic monitoring. The proposed framework integrates multiple advanced modules: RetinexNet for preprocessing, segmentation using HRNet to preserve high-resolution semantic information, and vehicle detection using the YOLOv11 framework. Deep SORT is employed for efficient vehicle tracking, while CSRNet facilitates high-density vehicle counting. LSTM networks are integrated to predict vehicle trajectories based on temporal patterns, and a combination of DenseNet and SuperPoint is utilized for robust feature extraction. Finally, classification is performed using Vision Transformers (ViTs), leveraging attention mechanisms to ensure accurate recognition across diverse categories. The modular yet unified architecture is designed to handle spatiotemporal dynamics, making it suitable for real-time deployment in diverse UAV platforms.

Method: The framework combines state-of-the-art neural networks, each chosen for a distinct sub-problem in aerial vehicle analysis. RetinexNet normalizes the illumination of each input frame during preprocessing. HRNet performs semantic segmentation, cleanly separating vehicles from their surroundings. YOLOv11 delivers fast, high-precision vehicle detection, and Deep SORT provides reliable tracking without losing individual vehicle identities. CSRNet performs vehicle counting that is robust to obstacles and traffic congestion. LSTM models capture vehicle motion over time to forecast future positions. During feature extraction, DenseNet and SuperPoint embeddings are combined and refined with an AutoEncoder. Finally, attention-based Vision Transformer models classify the vehicles seen from above. Every component is developed and integrated to improve performance when the UAV operates in real-world conditions.
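Because the pipeline is modular, it reads naturally as a chain of stage interfaces. The sketch below wires stand-in callables in the published order; every stage name and signature is a hypothetical placeholder, not an API of the cited models.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class UAVTrafficPipeline:
    """Chain UAV-frame processing stages in the order described in the paper."""
    enhance: Callable[[Any], Any]      # illumination normalization (RetinexNet role)
    detect: Callable[[Any], list]      # vehicle boxes (YOLOv11 role)
    track: Callable[[list], list]      # persistent IDs (Deep SORT role)
    count: Callable[[Any], int]        # density-based count (CSRNet role)
    classify: Callable[[list], list]   # vehicle categories (ViT role)
    history: list = field(default_factory=list)

    def step(self, frame: Any) -> dict:
        frame = self.enhance(frame)
        tracks = self.track(self.detect(frame))
        self.history.append(tracks)    # kept for LSTM-style trajectory prediction
        return {"tracks": tracks,
                "count": self.count(frame),
                "classes": self.classify(tracks)}

# Wire trivial stand-ins to show the data flow.
pipe = UAVTrafficPipeline(enhance=lambda f: f,
                          detect=lambda f: [(0, 0, 10, 10)],
                          track=lambda boxes: [(1, b) for b in boxes],
                          count=lambda f: 1,
                          classify=lambda tr: ["car" for _ in tr])
print(pipe.step("frame-0"))
```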

Results: Our proposed framework significantly improves the accuracy, reliability, and efficiency of vehicle analysis from UAV imagery. Our pipeline was rigorously evaluated on two well-known datasets, AU-AIR and Roundabout. On the AU-AIR dataset, the system achieved a detection accuracy of 97.8%, a tracking accuracy of 96.5%, and a classification accuracy of 98.4%. Similarly, on the Roundabout dataset, it reached 96.9% detection accuracy, 94.4% tracking accuracy, and 97.7% classification accuracy. These results surpass previous benchmarks and demonstrate robust performance across diverse aerial traffic scenarios. The integration of advanced models (YOLOv11 for detection, HRNet for segmentation, Deep SORT for tracking, CSRNet for counting, LSTM for trajectory prediction, and Vision Transformers for classification) allows the framework to maintain high accuracy even under challenging conditions such as occlusion, variable lighting, and scale changes.

Discussion: The results show that the selected deep learning components are robust enough to meet the challenges of aerial vehicle analysis, delivering reliable and precise results across all of the above tasks. Combining several advanced models keeps the system running smoothly even when handling issues such as occluded and variably sized objects.

{"title":"Integrated neural network framework for multi-object detection and recognition using UAV imagery.","authors":"Mohammed Alshehri, Tingting Xue, Ghulam Mujtaba, Yahya AlQahtani, Nouf Abdullah Almujally, Ahmad Jalal, Hui Liu","doi":"10.3389/fnbot.2025.1643011","DOIUrl":"10.3389/fnbot.2025.1643011","url":null,"abstract":"<p><strong>Introduction: </strong>Accurate vehicle analysis from aerial imagery has become increasingly vital for emerging technologies and public service applications such as intelligent traffic management, urban planning, autonomous navigation, and military surveillance. However, analyzing UAV-captured video poses several inherent challenges, such as the small size of target vehicles, occlusions, cluttered urban backgrounds, motion blur, and fluctuating lighting conditions which hinder the accuracy and consistency of conventional perception systems. To address these complexities, our research proposes a fully end-to-end deep learning-driven perception pipeline specifically optimized for UAV-based traffic monitoring. The proposed framwork integrates multiple advanced modules: RetinexNet for preprocessing, segmentation using HRNet to preserve high-resolution semantic information, and vehicle detection using the YOLOv11 framework. Deep SORT is employed for efficient vehicle tracking, while CSRNet facilitates high-density vehicle counting. LSTM networks are integrated to predict vehicle trajectories based on temporal patterns, and a combination of DenseNet and SuperPoint is utilized for robust feature extraction. Finally, classification is performed using Vision Transformers (ViTs), leveraging attention mechanisms to ensure accurate recognition across diverse categories. The modular yet unified architecture is designed to handle spatiotemporal dynamics, making it suitable for real-time deployment in diverse UAV platforms.</p><p><strong>Method: </strong>The framework suggests using today's best neural networks that are made to solve different problems in aerial vehicle analysis. RetinexNet is used in preprocessing to make the lighting of each input frame consistent. Using HRNet for semantic segmentation allows for accurate splitting between vehicles and their surroundings. YOLOv11 provides high precision and quick vehicle detection and Deep SORT allows reliable tracking without losing track of individual cars. CSRNet are used for vehicle counting that is unaffected by obstacles or traffic jams. LSTM models capture how a car moves in time to forecast future positions. Combining DenseNet and SuperPoint embeddings that were improved with an AutoEncoder is done during feature extraction. In the end, using an attention function, Vision Transformer-based models classify vehicles seen from above. Every part of the system is developed and included to give the improved performance when the UAV is being used in real life.</p><p><strong>Results: </strong>Our proposed framework significantly improves the accuracy, reliability, and efficiency of vehicle analysis from UAV imagery. Our pipeline was rigorously evaluated on two famous datasets, AU-AIR and Roundabout. On the AU-AIR dataset, the system achieved a detection accuracy of 97.8%, a tracking accuracy of 96.5%, and a classification accuracy of 98.4%. 
Similarly, on the Roundabout dataset, it reached 96.9% det","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1643011"},"PeriodicalIF":2.8,"publicationDate":"2025-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12343587/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144845568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0