Frontiers in Neurorobotics最新文献_第2页

NAN-DETR: noising multi-anchor makes DETR better for object detection. NAN-DETR：噪声多锚使 DETR 更好地用于物体检测。

IF 2.6 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Neurorobotics

Pub Date : 2024-10-14 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1484088

Zixin Huang, Xuesong Tao, Xinyuan Liu

Object detection plays a crucial role in robotic vision, focusing on accurately identifying and localizing objects within images. However, many existing methods encounter limitations, particularly when it comes to effectively implementing a one-to-many matching strategy. To address these challenges, we propose NAN-DETR (Noising Multi-Anchor Detection Transformer), an innovative framework based on DETR (Detection Transformer). NAN-DETR introduces three key improvements to transformer-based object detection: a decoder-based multi-anchor strategy, a centralization noising mechanism, and the integration of Complete Intersection over Union (CIoU) loss. The multi-anchor strategy leverages multiple anchors per object, significantly enhancing detection accuracy by improving the one-to-many matching process. The centralization noising mechanism mitigates conflicts among anchors by injecting controlled noise into the detection boxes, thereby increasing the robustness of the model. Additionally, CIoU loss, which incorporates both aspect ratio and spatial distance in its calculations, results in more precise bounding box predictions compared to the conventional IoU loss. Although NAN-DETR may not drastically improve real-time processing capabilities, its exceptional performance positions it as a highly reliable solution for diverse object detection scenarios.

物体检测在机器人视觉中起着至关重要的作用，其重点是准确识别和定位图像中的物体。然而，许多现有方法都存在局限性，尤其是在有效实施一对多匹配策略时。为了应对这些挑战，我们提出了基于 DETR（检测变换器）的创新框架 NAN-DETR（噪声多锚检测变换器）。NAN-DETR 对基于变换器的物体检测引入了三项关键改进：基于解码器的多锚（multi-anchor）策略、集中噪声机制以及完整交叉联合（CIoU）损失的集成。多锚策略利用每个对象的多个锚点，通过改进一对多的匹配过程显著提高了检测精度。集中噪声机制通过向检测盒注入受控噪声来缓解锚点之间的冲突，从而提高模型的鲁棒性。此外，CIoU 丢失在计算中同时考虑了长宽比和空间距离，因此与传统的 IoU 丢失相比，CIoU 丢失能更精确地预测边界框。尽管 NAN-DETR 可能无法大幅提高实时处理能力，但其卓越的性能使其成为适用于各种物体检测场景的高度可靠的解决方案。

{"title":"NAN-DETR: noising multi-anchor makes DETR better for object detection.","authors":"Zixin Huang, Xuesong Tao, Xinyuan Liu","doi":"10.3389/fnbot.2024.1484088","DOIUrl":"10.3389/fnbot.2024.1484088","url":null,"abstract":"Object detection plays a crucial role in robotic vision, focusing on accurately identifying and localizing objects within images. However, many existing methods encounter limitations, particularly when it comes to effectively implementing a one-to-many matching strategy. To address these challenges, we propose NAN-DETR (Noising Multi-Anchor Detection Transformer), an innovative framework based on DETR (Detection Transformer). NAN-DETR introduces three key improvements to transformer-based object detection: a decoder-based multi-anchor strategy, a centralization noising mechanism, and the integration of Complete Intersection over Union (CIoU) loss. The multi-anchor strategy leverages multiple anchors per object, significantly enhancing detection accuracy by improving the one-to-many matching process. The centralization noising mechanism mitigates conflicts among anchors by injecting controlled noise into the detection boxes, thereby increasing the robustness of the model. Additionally, CIoU loss, which incorporates both aspect ratio and spatial distance in its calculations, results in more precise bounding box predictions compared to the conventional IoU loss. Although NAN-DETR may not drastically improve real-time processing capabilities, its exceptional performance positions it as a highly reliable solution for diverse object detection scenarios.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1484088"},"PeriodicalIF":2.6,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11513373/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142521681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Noisy Dueling Double Deep Q-Network algorithm for autonomous underwater vehicle path planning. 用于自主水下航行器路径规划的噪声决斗双深 Q 网络算法。

IF 2.6 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Neurorobotics

Pub Date : 2024-10-14 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1466571

Xu Liao, Le Li, Chuangxia Huang, Xian Zhao, Shumin Tan

How to improve the success rate of autonomous underwater vehicle (AUV) path planning and reduce travel time as much as possible is a very challenging and crucial problem in the practical applications of AUV in the complex ocean current environment. Traditional reinforcement learning algorithms lack exploration of the environment, and the strategies learned by the agent may not generalize well to other different environments. To address these challenges, we propose a novel AUV path planning algorithm named the Noisy Dueling Double Deep Q-Network (ND3QN) algorithm by modifying the reward function and introducing a noisy network, which generalizes the traditional D3QN algorithm. Compared with the classical algorithm [e.g., Rapidly-exploring Random Trees Star (RRT*), DQN, and D3QN], with simulation experiments conducted in realistic terrain and ocean currents, the proposed ND3QN algorithm demonstrates the outstanding characteristics of a higher success rate of AUV path planning, shorter travel time, and smoother paths.

在复杂的洋流环境中，如何提高自主潜水器（AUV）路径规划的成功率并尽可能缩短航行时间，是 AUV 实际应用中一个极具挑战性的关键问题。传统的强化学习算法缺乏对环境的探索，代理学习到的策略可能无法很好地推广到其他不同的环境中。为了应对这些挑战，我们通过修改奖励函数和引入噪声网络，提出了一种新型的 AUV 路径规划算法，即噪声决斗双深 Q 网络（ND3QN）算法，该算法是对传统 D3QN 算法的泛化。通过在真实地形和洋流中进行仿真实验，与经典算法[如快速探索随机树星（RRT*）、DQN 和 D3QN]相比，所提出的 ND3QN 算法具有 AUV 路径规划成功率更高、行进时间更短、路径更平滑等突出特点。

引用次数: 0

CAM-Vtrans: real-time sports training utilizing multi-modal robot data. CAM-Vtrans：利用多模态机器人数据进行实时运动训练。

IF 2.6 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Neurorobotics

Pub Date : 2024-10-11 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1453571

Hong LinLin, Lee Sangheang, Song GuanTing

Introduction: Assistive robots and human-robot interaction have become integral parts of sports training. However, existing methods often fail to provide real-time and accurate feedback, and they often lack integration of comprehensive multi-modal data.

Methods: To address these issues, we propose a groundbreaking and innovative approach: CAM-Vtrans-Cross-Attention Multi-modal Visual Transformer. By leveraging the strengths of state-of-the-art techniques such as Visual Transformers (ViT) and models like CLIP, along with cross-attention mechanisms, CAM-Vtrans harnesses the power of visual and textual information to provide athletes with highly accurate and timely feedback. Through the utilization of multi-modal robot data, CAM-Vtrans offers valuable assistance, enabling athletes to optimize their performance while minimizing potential injury risks. This novel approach represents a significant advancement in the field, offering an innovative solution to overcome the limitations of existing methods and enhance the precision and efficiency of sports training programs.

简介辅助机器人和人机交互已成为体育训练不可或缺的一部分。然而，现有的方法往往无法提供实时、准确的反馈，而且往往缺乏对综合多模态数据的整合：为了解决这些问题，我们提出了一种突破性的创新方法：CAM-Vtrans-Cross-Attention Multi-modal Visual Transformer。通过利用视觉转换器（ViT）等先进技术和 CLIP 等模型以及交叉注意机制的优势，CAM-Vtrans 利用视觉和文本信息的力量为运动员提供高度准确和及时的反馈。通过利用多模态机器人数据，CAM-Vtrans 提供了宝贵的帮助，使运动员能够优化其表现，同时将潜在的受伤风险降至最低。这种新颖的方法代表了该领域的重大进步，为克服现有方法的局限性、提高运动训练计划的精确性和效率提供了创新解决方案。

{"title":"CAM-Vtrans: real-time sports training utilizing multi-modal robot data.","authors":"Hong LinLin, Lee Sangheang, Song GuanTing","doi":"10.3389/fnbot.2024.1453571","DOIUrl":"10.3389/fnbot.2024.1453571","url":null,"abstract":"Introduction: Assistive robots and human-robot interaction have become integral parts of sports training. However, existing methods often fail to provide real-time and accurate feedback, and they often lack integration of comprehensive multi-modal data.Methods: To address these issues, we propose a groundbreaking and innovative approach: CAM-Vtrans-Cross-Attention Multi-modal Visual Transformer. By leveraging the strengths of state-of-the-art techniques such as Visual Transformers (ViT) and models like CLIP, along with cross-attention mechanisms, CAM-Vtrans harnesses the power of visual and textual information to provide athletes with highly accurate and timely feedback. Through the utilization of multi-modal robot data, CAM-Vtrans offers valuable assistance, enabling athletes to optimize their performance while minimizing potential injury risks. This novel approach represents a significant advancement in the field, offering an innovative solution to overcome the limitations of existing methods and enhance the precision and efficiency of sports training programs.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1453571"},"PeriodicalIF":2.6,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11502466/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142516399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Sports-ACtrans Net: research on multimodal robotic sports action recognition driven via ST-GCN. Sports-ACtrans Net：通过 ST-GCN 驱动的多模态机器人运动动作识别研究。

IF 2.6 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Neurorobotics

Pub Date : 2024-10-11 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1443432

Qi Lu

Introduction: Accurately recognizing and understanding human motion actions presents a key challenge in the development of intelligent sports robots. Traditional methods often encounter significant drawbacks, such as high computational resource requirements and suboptimal real-time performance. To address these limitations, this study proposes a novel approach called Sports-ACtrans Net.

Methods: In this approach, the Swin Transformer processes visual data to extract spatial features, while the Spatio-Temporal Graph Convolutional Network (ST-GCN) models human motion as graphs to handle skeleton data. By combining these outputs, a comprehensive representation of motion actions is created. Reinforcement learning is employed to optimize the action recognition process, framing it as a sequential decision-making problem. Deep Q-learning is utilized to learn the optimal policy, thereby enhancing the robot's ability to accurately recognize and engage in motion.

Results and discussion: Experiments demonstrate significant improvements over state-of-the-art methods. This research advances the fields of neural computation, computer vision, and neuroscience, aiding in the development of intelligent robotic systems capable of understanding and participating in sports activities.

简介准确识别和理解人类运动动作是开发智能运动机器人的关键挑战。传统方法往往存在严重缺陷，如计算资源要求高、实时性不理想等。为了解决这些局限性，本研究提出了一种名为 Sports-ACtrans Net.Methods 的新方法：在这种方法中，斯文变换器（Swin Transformer）处理视觉数据以提取空间特征，而时空图卷积网络（ST-GCN）将人体运动建模为图形以处理骨架数据。通过将这些输出组合起来，就能创建一个全面的运动动作表示。强化学习用于优化动作识别过程，将其视为一个连续决策问题。利用深度 Q-learning 学习最优策略，从而提高机器人准确识别和参与运动的能力：实验表明，与最先进的方法相比，该方法有了显著改进。这项研究推动了神经计算、计算机视觉和神经科学领域的发展，有助于开发能够理解和参与体育活动的智能机器人系统。

{"title":"Sports-ACtrans Net: research on multimodal robotic sports action recognition driven via ST-GCN.","authors":"Qi Lu","doi":"10.3389/fnbot.2024.1443432","DOIUrl":"10.3389/fnbot.2024.1443432","url":null,"abstract":"Introduction: Accurately recognizing and understanding human motion actions presents a key challenge in the development of intelligent sports robots. Traditional methods often encounter significant drawbacks, such as high computational resource requirements and suboptimal real-time performance. To address these limitations, this study proposes a novel approach called Sports-ACtrans Net.Methods: In this approach, the Swin Transformer processes visual data to extract spatial features, while the Spatio-Temporal Graph Convolutional Network (ST-GCN) models human motion as graphs to handle skeleton data. By combining these outputs, a comprehensive representation of motion actions is created. Reinforcement learning is employed to optimize the action recognition process, framing it as a sequential decision-making problem. Deep Q-learning is utilized to learn the optimal policy, thereby enhancing the robot's ability to accurately recognize and engage in motion.Results and discussion: Experiments demonstrate significant improvements over state-of-the-art methods. This research advances the fields of neural computation, computer vision, and neuroscience, aiding in the development of intelligent robotic systems capable of understanding and participating in sports activities.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1443432"},"PeriodicalIF":2.6,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11502397/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142498770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

The SocialAI school: a framework leveraging developmental psychology toward artificial socio-cultural agents. SocialAI 学校：利用发展心理学实现人工社会文化代理的框架。

IF 2.6 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Neurorobotics

Pub Date : 2024-10-09 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1396359

Grgur Kovač, Rémy Portelas, Peter Ford Dominey, Pierre-Yves Oudeyer

Developmental psychologists have long-established socio-cognitive abilities as fundamental to human intelligence and development. These abilities enable individuals to enter, learn from, and contribute to a surrounding culture. This drives the process of cumulative cultural evolution, which is responsible for humanity's most remarkable achievements. AI research on social interactive agents mostly concerns the emergence of culture in a multi-agent setting (often without a strong grounding in developmental psychology). We argue that AI research should be informed by psychology and study socio-cognitive abilities enabling to enter a culture as well. We draw inspiration from the work of Michael Tomasello and Jerome Bruner, who studied socio-cognitive development and emphasized the influence of a cultural environment on intelligence. We outline a broader set of concepts than those currently studied in AI to provide a foundation for research in artificial social intelligence. Those concepts include social cognition (joint attention, perspective taking), communication, social learning, formats, and scaffolding. To facilitate research in this domain, we present The SocialAI school-a tool that offers a customizable parameterized suite of procedurally generated environments. This tool simplifies experimentation with the introduced concepts. Additionally, these environments can be used both with multimodal RL agents, or with pure-text Large Language Models (LLMs) as interactive agents. Through a series of case studies, we demonstrate the versatility of the SocialAI school for studying both RL and LLM-based agents. Our motivation is to engage the AI community around social intelligence informed by developmental psychology, and to provide a user-friendly resource and tool for initial investigations in this direction. Refer to the project website for code and additional resources: https://sites.google.com/view/socialai-school.

长期以来，发展心理学家一直认为社会认知能力是人类智力和发展的基础。这些能力使个人能够进入周围的文化，从中学习并做出贡献。这推动了文化的累积进化过程，而人类最杰出的成就正是由这一过程促成的。有关社会互动代理的人工智能研究大多涉及多代理环境中文化的出现（通常没有发展心理学的坚实基础）。我们认为，人工智能研究应借鉴心理学知识，研究进入文化的社会认知能力。我们从迈克尔-托马塞罗（Michael Tomasello）和杰罗姆-布鲁纳（Jerome Bruner）的研究中汲取灵感，他们研究社会认知发展，强调文化环境对智力的影响。我们概述了一套比目前人工智能研究更广泛的概念，为人工社会智能的研究奠定了基础。这些概念包括社会认知（共同注意、视角把握）、交流、社会学习、格式和支架。为了促进这一领域的研究，我们推出了 SocialAI 学校--一种提供可定制参数化程序生成环境套件的工具。该工具简化了对所引入概念的实验。此外，这些环境既可用于多模态 RL 代理，也可用于纯文本大语言模型（LLM）交互代理。通过一系列案例研究，我们展示了 SocialAI 学校在研究基于 RL 和 LLM 的代理方面的多功能性。我们的动机是让人工智能社区参与到以发展心理学为基础的社会智能中来，并为这一方向的初步研究提供用户友好型资源和工具。有关代码和其他资源，请访问项目网站：https://sites.google.com/view/socialai-school。

{"title":"The SocialAI school: a framework leveraging developmental psychology toward artificial socio-cultural agents.","authors":"Grgur Kovač, Rémy Portelas, Peter Ford Dominey, Pierre-Yves Oudeyer","doi":"10.3389/fnbot.2024.1396359","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1396359","url":null,"abstract":"Developmental psychologists have long-established socio-cognitive abilities as fundamental to human intelligence and development. These abilities enable individuals to enter, learn from, and contribute to a surrounding culture. This drives the process of cumulative cultural evolution, which is responsible for humanity's most remarkable achievements. AI research on social interactive agents mostly concerns the emergence of culture in a multi-agent setting (often without a strong grounding in developmental psychology). We argue that AI research should be informed by psychology and study socio-cognitive abilities enabling to enter a culture as well. We draw inspiration from the work of Michael Tomasello and Jerome Bruner, who studied socio-cognitive development and emphasized the influence of a cultural environment on intelligence. We outline a broader set of concepts than those currently studied in AI to provide a foundation for research in artificial social intelligence. Those concepts include social cognition (joint attention, perspective taking), communication, social learning, formats, and scaffolding. To facilitate research in this domain, we present The SocialAI school-a tool that offers a customizable parameterized suite of procedurally generated environments. This tool simplifies experimentation with the introduced concepts. Additionally, these environments can be used both with multimodal RL agents, or with pure-text Large Language Models (LLMs) as interactive agents. Through a series of case studies, we demonstrate the versatility of the SocialAI school for studying both RL and LLM-based agents. Our motivation is to engage the AI community around social intelligence informed by developmental psychology, and to provide a user-friendly resource and tool for initial investigations in this direction. Refer to the project website for code and additional resources: https://sites.google.com/view/socialai-school.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1396359"},"PeriodicalIF":2.6,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11496287/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142498771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Fast reconstruction of milling temperature field based on CNN-GRU machine learning models. 基于 CNN-GRU 机器学习模型的铣削温度场快速重建。

IF 2.6 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Neurorobotics

Pub Date : 2024-09-27 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1448482

Fengyuan Ma, Haoyu Wang, Mingfeng E, Zhongjin Sha, Xingshu Wang, Yunxian Cui, Junwei Yin

With the development of intelligent manufacturing technology, robots have become more widespread in the field of milling processing. When milling difficult-to-machine alloy materials, the localized high temperature and large temperature gradient at the front face of the tool lead to shortened tool life and poor machining quality. The existing temperature field reconstruction methods have many assumptions, large arithmetic volume and long solution time. In this paper, an inverse heat conduction problem solution model based on Gated Convolutional Recurrent Neural Network (CNN-GRU) is proposed for reconstructing the temperature field of the tool during milling. In order to ensure the speed and accuracy of the reconstruction, we propose to utilize the inverse heat conduction problem solution model constructed by knowledge distillation (KD) and compression acceleration, which achieves a significant reduction of the training time with a small loss of optimality and ensures the accuracy and efficiency of the prediction model. With different levels of random noise added to the model input data, CNN-GRU + KD is noise-resistant and still shows good robustness and stability under noisy data. The temperature field reconstruction of the milling tool is carried out for three different working conditions, and the curve fitting excellence under the three conditions is 0.97 at the highest, and the root mean square error is 1.43°C at the minimum, respectively, and the experimental results show that the model is feasible and effective in carrying out the temperature field reconstruction of the milling tool and is of great significance in improving the accuracy of the milling machining robot.

随着智能制造技术的发展，机器人在铣削加工领域的应用越来越广泛。在铣削难加工合金材料时，刀具前端面的局部高温和较大的温度梯度会导致刀具寿命缩短和加工质量下降。现有的温度场重建方法存在假设条件多、算术量大、求解时间长等问题。本文提出了一种基于门控卷积递归神经网络（CNN-GRU）的反热传导问题求解模型，用于重建铣削过程中的刀具温度场。为了保证重构的速度和准确性，我们提出利用知识蒸馏（KD）和压缩加速构建的反热传导问题求解模型，在损失少量最优性的情况下显著缩短了训练时间，保证了预测模型的准确性和效率。在模型输入数据中加入不同程度的随机噪声后，CNN-GRU + KD 的抗噪能力很强，在高噪声数据下仍表现出良好的鲁棒性和稳定性。对三种不同工况下的铣刀温度场进行了重构，三种工况下的曲线拟合优度最高分别为 0.97，均方根误差最小分别为 1.43°C，实验结果表明该模型对铣刀温度场的重构是可行且有效的，对提高铣削加工机器人的精度具有重要意义。

{"title":"Fast reconstruction of milling temperature field based on CNN-GRU machine learning models.","authors":"Fengyuan Ma, Haoyu Wang, Mingfeng E, Zhongjin Sha, Xingshu Wang, Yunxian Cui, Junwei Yin","doi":"10.3389/fnbot.2024.1448482","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1448482","url":null,"abstract":"With the development of intelligent manufacturing technology, robots have become more widespread in the field of milling processing. When milling difficult-to-machine alloy materials, the localized high temperature and large temperature gradient at the front face of the tool lead to shortened tool life and poor machining quality. The existing temperature field reconstruction methods have many assumptions, large arithmetic volume and long solution time. In this paper, an inverse heat conduction problem solution model based on Gated Convolutional Recurrent Neural Network (CNN-GRU) is proposed for reconstructing the temperature field of the tool during milling. In order to ensure the speed and accuracy of the reconstruction, we propose to utilize the inverse heat conduction problem solution model constructed by knowledge distillation (KD) and compression acceleration, which achieves a significant reduction of the training time with a small loss of optimality and ensures the accuracy and efficiency of the prediction model. With different levels of random noise added to the model input data, CNN-GRU + KD is noise-resistant and still shows good robustness and stability under noisy data. The temperature field reconstruction of the milling tool is carried out for three different working conditions, and the curve fitting excellence under the three conditions is 0.97 at the highest, and the root mean square error is 1.43°C at the minimum, respectively, and the experimental results show that the model is feasible and effective in carrying out the temperature field reconstruction of the milling tool and is of great significance in improving the accuracy of the milling machining robot.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1448482"},"PeriodicalIF":2.6,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11466942/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142462936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

ACA-Net: adaptive context-aware network for basketball action recognition. ACA-Net：用于篮球动作识别的自适应情境感知网络。

IF 2.6 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Neurorobotics

Pub Date : 2024-09-25 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1471327

Yaolei Zhang, Fei Zhang, Yuanli Zhou, Xiao Xu

The advancements in intelligent action recognition can be instrumental in developing autonomous robotic systems capable of analyzing complex human activities in real-time, contributing to the growing field of robotics that operates in dynamic environments. The precise recognition of basketball players' actions using artificial intelligence technology can provide valuable assistance and guidance to athletes, coaches, and analysts, and can help referees make fairer decisions during games. However, unlike action recognition in simpler scenarios, the background in basketball is similar and complex, the differences between various actions are subtle, and lighting conditions are inconsistent, making action recognition in basketball a challenging task. To address this problem, an Adaptive Context-Aware Network (ACA-Net) for basketball player action recognition is proposed in this paper. It contains a Long Short-term Adaptive (LSTA) module and a Triplet Spatial-Channel Interaction (TSCI) module to extract effective features at the temporal, spatial, and channel levels. The LSTA module adaptively learns global and local temporal features of the video. The TSCI module enhances the feature representation by learning the interaction features between space and channels. We conducted extensive experiments on the popular basketball action recognition datasets SpaceJam and Basketball-51. The results show that ACA-Net outperforms the current mainstream methods, achieving 89.26% and 92.05% in terms of classification accuracy on the two datasets, respectively. ACA-Net's adaptable architecture also holds potential for real-world applications in autonomous robotics, where accurate recognition of complex human actions in unstructured environments is crucial for tasks such as automated game analysis, player performance evaluation, and enhanced interactive broadcasting experiences.

智能动作识别技术的进步有助于开发能够实时分析复杂人类活动的自主机器人系统，从而推动在动态环境中运行的机器人技术领域不断发展。利用人工智能技术对篮球运动员的动作进行精确识别，可以为运动员、教练员和分析人员提供有价值的帮助和指导，并有助于裁判员在比赛中做出更公平的裁决。然而，与简单场景中的动作识别不同，篮球比赛中的背景相似而复杂，各种动作之间的差异微妙，光照条件也不一致，因此篮球比赛中的动作识别是一项具有挑战性的任务。针对这一问题，本文提出了一种用于篮球运动员动作识别的自适应上下文感知网络（ACA-Net）。它包含一个长短期自适应（LSTA）模块和一个三重空间-信道交互（TSCI）模块，用于提取时间、空间和信道层面的有效特征。LSTA 模块能自适应地学习视频的全局和局部时间特征。TSCI 模块通过学习空间和通道之间的交互特征来增强特征表示。我们在流行的篮球动作识别数据集 SpaceJam 和 Basketball-51 上进行了大量实验。结果表明，ACA-Net 优于目前的主流方法，在这两个数据集上的分类准确率分别达到了 89.26% 和 92.05%。ACA-Net 的适应性架构在自主机器人的实际应用中也具有潜力，在非结构化环境中准确识别复杂的人类动作对于自动游戏分析、球员表现评估和增强互动广播体验等任务至关重要。

{"title":"ACA-Net: adaptive context-aware network for basketball action recognition.","authors":"Yaolei Zhang, Fei Zhang, Yuanli Zhou, Xiao Xu","doi":"10.3389/fnbot.2024.1471327","DOIUrl":"10.3389/fnbot.2024.1471327","url":null,"abstract":"The advancements in intelligent action recognition can be instrumental in developing autonomous robotic systems capable of analyzing complex human activities in real-time, contributing to the growing field of robotics that operates in dynamic environments. The precise recognition of basketball players' actions using artificial intelligence technology can provide valuable assistance and guidance to athletes, coaches, and analysts, and can help referees make fairer decisions during games. However, unlike action recognition in simpler scenarios, the background in basketball is similar and complex, the differences between various actions are subtle, and lighting conditions are inconsistent, making action recognition in basketball a challenging task. To address this problem, an Adaptive Context-Aware Network (ACA-Net) for basketball player action recognition is proposed in this paper. It contains a Long Short-term Adaptive (LSTA) module and a Triplet Spatial-Channel Interaction (TSCI) module to extract effective features at the temporal, spatial, and channel levels. The LSTA module adaptively learns global and local temporal features of the video. The TSCI module enhances the feature representation by learning the interaction features between space and channels. We conducted extensive experiments on the popular basketball action recognition datasets SpaceJam and Basketball-51. The results show that ACA-Net outperforms the current mainstream methods, achieving 89.26% and 92.05% in terms of classification accuracy on the two datasets, respectively. ACA-Net's adaptable architecture also holds potential for real-world applications in autonomous robotics, where accurate recognition of complex human actions in unstructured environments is crucial for tasks such as automated game analysis, player performance evaluation, and enhanced interactive broadcasting experiences.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1471327"},"PeriodicalIF":2.6,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461453/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142389755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Swimtrans Net: a multimodal robotic system for swimming action recognition driven via Swin-Transformer. Swimtrans Net：通过斯温变换器驱动的游泳动作识别多模态机器人系统。

IF 2.6 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Neurorobotics

Pub Date : 2024-09-24 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1452019

He Chen, Xiaoyu Yue

Introduction: Currently, using machine learning methods for precise analysis and improvement of swimming techniques holds significant research value and application prospects. The existing machine learning methods have improved the accuracy of action recognition to some extent. However, they still face several challenges such as insufficient data feature extraction, limited model generalization ability, and poor real-time performance.

Methods: To address these issues, this paper proposes an innovative approach called Swimtrans Net: A multimodal robotic system for swimming action recognition driven via Swin-Transformer. By leveraging the powerful visual data feature extraction capabilities of Swin-Transformer, Swimtrans Net effectively extracts swimming image information. Additionally, to meet the requirements of multimodal tasks, we integrate the CLIP model into the system. Swin-Transformer serves as the image encoder for CLIP, and through fine-tuning the CLIP model, it becomes capable of understanding and interpreting swimming action data, learning relevant features and patterns associated with swimming. Finally, we introduce transfer learning for pre-training to reduce training time and lower computational resources, thereby providing real-time feedback to swimmers.

Results and discussion: Experimental results show that Swimtrans Net has achieved a 2.94% improvement over the current state-of-the-art methods in swimming motion analysis and prediction, making significant progress. This study introduces an innovative machine learning method that can help coaches and swimmers better understand and improve swimming techniques, ultimately improving swimming performance.

导言：目前，利用机器学习方法对游泳技术进行精确分析和改进具有重要的研究价值和应用前景。现有的机器学习方法在一定程度上提高了动作识别的准确性。然而，它们仍然面临着数据特征提取不足、模型泛化能力有限、实时性差等挑战：为了解决这些问题，本文提出了一种名为 Swimtrans Net 的创新方法：方法：针对这些问题，本文提出了一种名为 Swimtrans Net 的创新方法：通过 Swin-Transformer 驱动的游泳动作识别多模态机器人系统。通过利用 Swin-Transformer 强大的视觉数据特征提取功能，Swimtrans Net 可有效提取游泳图像信息。此外，为了满足多模态任务的要求，我们在系统中集成了 CLIP 模型。Swin-Transformer 可作为 CLIP 的图像编码器，通过微调 CLIP 模型，它能够理解和解释游泳动作数据，学习与游泳相关的特征和模式。最后，我们引入迁移学习进行预训练，以减少训练时间和降低计算资源，从而为游泳者提供实时反馈：实验结果表明，在游泳运动分析和预测方面，Swimtrans Net 比目前最先进的方法提高了 2.94%，取得了显著进步。这项研究介绍了一种创新的机器学习方法，可以帮助教练和游泳运动员更好地理解和改进游泳技术，最终提高游泳成绩。

{"title":"Swimtrans Net: a multimodal robotic system for swimming action recognition driven via Swin-Transformer.","authors":"He Chen, Xiaoyu Yue","doi":"10.3389/fnbot.2024.1452019","DOIUrl":"10.3389/fnbot.2024.1452019","url":null,"abstract":"Introduction: Currently, using machine learning methods for precise analysis and improvement of swimming techniques holds significant research value and application prospects. The existing machine learning methods have improved the accuracy of action recognition to some extent. However, they still face several challenges such as insufficient data feature extraction, limited model generalization ability, and poor real-time performance.Methods: To address these issues, this paper proposes an innovative approach called Swimtrans Net: A multimodal robotic system for swimming action recognition driven via Swin-Transformer. By leveraging the powerful visual data feature extraction capabilities of Swin-Transformer, Swimtrans Net effectively extracts swimming image information. Additionally, to meet the requirements of multimodal tasks, we integrate the CLIP model into the system. Swin-Transformer serves as the image encoder for CLIP, and through fine-tuning the CLIP model, it becomes capable of understanding and interpreting swimming action data, learning relevant features and patterns associated with swimming. Finally, we introduce transfer learning for pre-training to reduce training time and lower computational resources, thereby providing real-time feedback to swimmers.Results and discussion: Experimental results show that Swimtrans Net has achieved a 2.94% improvement over the current state-of-the-art methods in swimming motion analysis and prediction, making significant progress. This study introduces an innovative machine learning method that can help coaches and swimmers better understand and improve swimming techniques, ultimately improving swimming performance.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1452019"},"PeriodicalIF":2.6,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11458561/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142389758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Human in the collaborative loop: a strategy for integrating human activity recognition and non-invasive brain-machine interfaces to control collaborative robots. 协作回路中的人：整合人类活动识别和非侵入式脑机接口以控制协作机器人的策略。

IF 2.6 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Neurorobotics

Pub Date : 2024-09-24 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1383089

Artur Pilacinski, Lukas Christ, Marius Boshoff, Ioannis Iossifidis, Patrick Adler, Michael Miro, Bernd Kuhlenkötter, Christian Klaes

Human activity recognition (HAR) and brain-machine interface (BMI) are two emerging technologies that can enhance human-robot collaboration (HRC) in domains such as industry or healthcare. HAR uses sensors or cameras to capture and analyze the movements and actions of humans, while BMI uses human brain signals to decode action intentions. Both technologies face challenges impacting accuracy, reliability, and usability. In this article, we review the state-of-the-art techniques and methods for HAR and BMI and highlight their strengths and limitations. We then propose a hybrid framework that fuses HAR and BMI data, which can integrate the complementary information from the brain and body motion signals and improve the performance of human state decoding. We also discuss our hybrid method's potential benefits and implications for HRC.

人类活动识别（HAR）和脑机接口（BMI）是两项新兴技术，可增强工业或医疗保健等领域的人机协作（HRC）。HAR 使用传感器或摄像头来捕捉和分析人类的动作和行动，而 BMI 则使用人脑信号来解码行动意图。这两种技术都面临着影响准确性、可靠性和可用性的挑战。在本文中，我们回顾了 HAR 和 BMI 的最新技术和方法，并强调了它们的优势和局限性。然后，我们提出了一种融合 HAR 和 BMI 数据的混合框架，它可以整合大脑和身体运动信号的互补信息，提高人类状态解码的性能。我们还讨论了混合方法的潜在优势和对 HRC 的影响。

引用次数: 0

MLFGCN: short-term residential load forecasting via graph attention temporal convolution network. MLFGCN：通过图注意时间卷积网络进行短期住宅负荷预测。

IF 2.6 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Neurorobotics

Pub Date : 2024-09-23 eCollection Date: 2024-01-01 DOI: 10.3389/fnbot.2024.1461403

Ding Feng, Dengao Li, Yu Zhou, Wei Wang

Introduction: Residential load forecasting is a challenging task due to the random fluctuations caused by complex correlations and individual differences. The existing short-term load forecasting models usually introduce external influencing factors such as climate and date. However, these additional information not only bring computational burden to the model, but also have uncertainty. To address these issues, we propose a novel multi-level feature fusion model based on graph attention temporal convolutional network (MLFGCN) for short-term residential load forecasting.

Methods: The proposed MLFGCN model fully considers the potential long-term dependencies in a single load series and the correlations between multiple load series, and does not require any additional information to be added. Temporal convolutional network (TCN) with gating mechanism is introduced to learn potential long-term dependencies in the original load series. In addition, we design two graph attentive convolutional modules to capture potential multi-level dependencies in load data. Finally, the outputs of each module are fused through an information fusion layer to obtain the highly accurate forecasting results.

Results: We conduct validation experiments on two real-world datasets. The results show that the proposed MLFGCN model achieves 0.25, 7.58% and 0.50 for MAE, MAPE and RMSE, respectively. These values are significantly better than those of baseline models.

Discussion: The MLFGCN algorithm proposed in this paper can significantly improve the accuracy of short-term residential load forecasting. This is achieved through high-quality feature reconstruction, comprehensive information graph construction and spatiotemporal features capture.

引言由于复杂的相关性和个体差异导致的随机波动，居民负荷预测是一项具有挑战性的任务。现有的短期负荷预测模型通常会引入气候和日期等外部影响因素。然而，这些附加信息不仅给模型带来了计算负担，还具有不确定性。针对这些问题，我们提出了一种基于图注意时序卷积网络（MLFGCN）的新型多层次特征融合模型，用于短期居民负荷预测：方法：所提出的 MLFGCN 模型充分考虑了单个负荷序列中潜在的长期依赖性以及多个负荷序列之间的相关性，并且不需要添加任何额外信息。我们引入了具有门控机制的时序卷积网络（TCN）来学习原始负载序列中的潜在长期依赖关系。此外，我们还设计了两个图注意卷积模块，以捕捉负载数据中潜在的多级依赖关系。最后，通过信息融合层对每个模块的输出进行融合，从而获得高精度的预测结果：我们在两个真实世界数据集上进行了验证实验。结果表明，所提出的 MLFGCN 模型的 MAE、MAPE 和 RMSE 分别达到了 0.25、7.58% 和 0.50。这些数值明显优于基线模型：本文提出的 MLFGCN 算法可显著提高短期居民负荷预测的准确性。本文提出的 MLFGCN 算法通过高质量的特征重构、全面的信息图构建和时空特征捕捉实现了这一目标。

{"title":"MLFGCN: short-term residential load forecasting via graph attention temporal convolution network.","authors":"Ding Feng, Dengao Li, Yu Zhou, Wei Wang","doi":"10.3389/fnbot.2024.1461403","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1461403","url":null,"abstract":"Introduction: Residential load forecasting is a challenging task due to the random fluctuations caused by complex correlations and individual differences. The existing short-term load forecasting models usually introduce external influencing factors such as climate and date. However, these additional information not only bring computational burden to the model, but also have uncertainty. To address these issues, we propose a novel multi-level feature fusion model based on graph attention temporal convolutional network (MLFGCN) for short-term residential load forecasting.Methods: The proposed MLFGCN model fully considers the potential long-term dependencies in a single load series and the correlations between multiple load series, and does not require any additional information to be added. Temporal convolutional network (TCN) with gating mechanism is introduced to learn potential long-term dependencies in the original load series. In addition, we design two graph attentive convolutional modules to capture potential multi-level dependencies in load data. Finally, the outputs of each module are fused through an information fusion layer to obtain the highly accurate forecasting results.Results: We conduct validation experiments on two real-world datasets. The results show that the proposed MLFGCN model achieves 0.25, 7.58% and 0.50 for MAE, MAPE and RMSE, respectively. These values are significantly better than those of baseline models.Discussion: The MLFGCN algorithm proposed in this paper can significantly improve the accuracy of short-term residential load forecasting. This is achieved through high-quality feature reconstruction, comprehensive information graph construction and spatiotemporal features capture.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1461403"},"PeriodicalIF":2.6,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11457015/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142389757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0