
Frontiers in Neurorobotics: Latest Publications

A robust and effective framework for 3D scene reconstruction and high-quality rendering in nasal endoscopy surgery.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-06-27 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1630728
Xueqin Ji, Shuting Zhao, Di Liu, Feng Wang, Xinrong Chen

In nasal endoscopic surgery, the narrow nasal cavity restricts the surgical field of view and the manipulation of surgical instruments. Precise real-time intraoperative navigation, which provides accurate 3D information, therefore plays a crucial role in avoiding critical areas dense with blood vessels and nerves. Although endoscopic 3D reconstruction methods have made significant progress, their application to nasal scenarios still faces numerous challenges. On the one hand, high-quality, annotated nasal endoscopy datasets are lacking. On the other hand, issues such as motion blur and soft-tissue deformation complicate the reconstruction process. To tackle these challenges, a series of nasal endoscopy examination videos is collected, with the pose recorded for each frame. Additionally, a novel model named Mip-EndoGS is proposed, which integrates 3D Gaussian Splatting for reconstruction and rendering with a diffusion module that reduces image blurring in endoscopic data. Meanwhile, incorporating an adaptive low-pass filter into the rendering pipeline mitigates the aliasing artifacts (jagged edges) that arise during rendering. Extensive quantitative and visual experiments show that the proposed model can reconstruct 3D scenes within the nasal cavity in real time, offering surgeons more detailed and precise information about the surgical scene. Moreover, the proposed approach holds great potential for integration with AR-based surgical navigation systems to enhance intraoperative guidance.
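The abstract does not specify Mip-EndoGS's exact filter, so the following is only a minimal numpy sketch of the screen-space low-pass (anti-aliasing) filtering idea used in 3D Gaussian Splatting pipelines: each splat's projected 2D covariance is dilated by a small isotropic kernel, with an energy-preserving opacity rescale. The kernel variance of 0.3 px² and all array shapes here are assumptions, not values from the paper.

```python
import numpy as np

def low_pass_filter_2d_gaussians(cov2d, kernel_var=0.3):
    """Dilate projected 2D Gaussian covariances with a small isotropic
    kernel so every splat covers at least ~1 pixel, suppressing aliasing.

    cov2d: (N, 2, 2) screen-space covariance matrices, one per splat.
    kernel_var: variance of the anti-aliasing kernel in px^2 (hypothetical).
    Returns the filtered covariances and a per-splat opacity scale that
    keeps each Gaussian's total energy unchanged after dilation.
    """
    cov_filtered = cov2d + kernel_var * np.eye(2)  # broadcast over N splats
    det_before = np.linalg.det(cov2d)
    det_after = np.linalg.det(cov_filtered)
    opacity_scale = np.sqrt(np.maximum(det_before, 1e-12) / det_after)
    return cov_filtered, opacity_scale

# Example: 100 random splats with positive-definite 2x2 covariances.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2, 2))
cov = A @ A.transpose(0, 2, 1) + 0.05 * np.eye(2)
cov_f, alpha_scale = low_pass_filter_2d_gaussians(cov)
```

Small splats (nearly sub-pixel) get the largest opacity reduction, which is what removes the jagged-edge artifacts without visibly changing large splats.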

{"title":"A robust and effective framework for 3D scene reconstruction and high-quality rendering in nasal endoscopy surgery.","authors":"Xueqin Ji, Shuting Zhao, Di Liu, Feng Wang, Xinrong Chen","doi":"10.3389/fnbot.2025.1630728","DOIUrl":"10.3389/fnbot.2025.1630728","url":null,"abstract":"<p><p>In nasal endoscopic surgery, the narrow nasal cavity restricts the surgical field of view and the manipulation of surgical instruments. Therefore, precise real-time intraoperative navigation, which can provide precise 3D information, plays a crucial role in avoiding critical areas with dense blood vessels and nerves. Although significant progress has been made in endoscopic 3D reconstruction methods, their application in nasal scenarios still faces numerous challenges. On the one hand, there is a lack of high-quality, annotated nasal endoscopy datasets. On the other hand, issues such as motion blur and soft tissue deformations complicate the nasal endoscopy reconstruction process. To tackle these challenges, a series of nasal endoscopy examination videos are collected, and the pose information for each frame is recorded. Additionally, a novel model named Mip-EndoGS is proposed, which integrates 3D Gaussian Splatting for reconstruction and rendering and a diffusion module to reduce image blurring in endoscopic data. Meanwhile, by incorporating an adaptive low-pass filter into the rendering pipeline, the aliasing artifacts (jagged edges) are mitigated, which occur during the rendering process. Extensive quantitative and visual experiments show that the proposed model is capable of reconstructing 3D scenes within the nasal cavity in real-time, thereby offering surgeons more detailed and precise information about the surgical scene. Moreover, the proposed approach holds great potential for integration with AR-based surgical navigation systems to enhance intraoperative guidance.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1630728"},"PeriodicalIF":2.6,"publicationDate":"2025-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12245865/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144626010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Understanding human co-manipulation via motion and haptic information to enable future physical human-robotic collaborations.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-06-19 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1480399
Kody Shaw, John L Salmon, Marc D Killpack

Human teams intuitively and effectively collaborate to move large, heavy, or unwieldy objects, yet the literature offers only a limited understanding of this interaction. This gap is especially problematic given the goal of enabling human-robot teams to work together. Therefore, to better understand how human teams cooperate and eventually enable intuitive human-robot interaction, this paper examines four sub-components of collaborative manipulation (co-manipulation) using motion and haptics. We define co-manipulation as a group of two or more agents collaboratively moving an object. We present a study in which a large object is co-manipulated while we vary the number of participants (two or three), their roles (leaders or followers), and the degrees of freedom required to complete the defined motion of the object. In analyzing the results, we focus on four key components related to motion and haptics. First, we define a static (rest) state and demonstrate a method of detecting transitions between it and an active state, in which one or more agents move toward an intended goal. Second, we analyze a variety of signals (e.g., force, acceleration) during movements in each of the six rigid-body degrees of freedom of the co-manipulated object; these data let us identify the signals that best correlate with the team's desired motion. Third, we examine the completion percentage of each task, which indicates which motion objectives can be communicated via haptic feedback. Finally, we define a metric to determine whether participants split two-degree-of-freedom tasks into separate degrees of freedom or take the most direct path. Together, these four components lay the groundwork for intuitive human-robot interaction.
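The abstract describes detecting rest-to-active transitions but not the criterion used; a simple and common approach is hysteresis thresholding on the interaction-force magnitude. The sketch below illustrates that idea only; the threshold values and the synthetic force trace are assumptions, not the paper's method.

```python
import numpy as np

def detect_state_transitions(force, rest_thresh=2.0, active_thresh=5.0):
    """Label each sample of a force-magnitude signal as rest (0) or
    active (1) with hysteresis, returning labels and transition indices.

    force: 1-D array of interaction-force magnitudes.
    rest_thresh / active_thresh: hypothetical hysteresis bounds (N); the
    gap between them prevents chattering near a single threshold.
    """
    state = 0
    labels = np.zeros(len(force), dtype=int)
    transitions = []
    for i, f in enumerate(force):
        if state == 0 and f > active_thresh:
            state = 1            # rest -> active: force rose above upper bound
            transitions.append(i)
        elif state == 1 and f < rest_thresh:
            state = 0            # active -> rest: force fell below lower bound
            transitions.append(i)
        labels[i] = state
    return labels, transitions

# Example: synthetic trace with one burst of activity between t=3s and t=7s.
t = np.linspace(0, 10, 1000)
rng = np.random.default_rng(1)
force = 1.0 + 6.0 * ((t > 3) & (t < 7)) + 0.3 * rng.normal(size=t.size)
labels, transitions = detect_state_transitions(np.abs(force))
```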

{"title":"Understanding human co-manipulation via motion and haptic information to enable future physical human-robotic collaborations.","authors":"Kody Shaw, John L Salmon, Marc D Killpack","doi":"10.3389/fnbot.2025.1480399","DOIUrl":"10.3389/fnbot.2025.1480399","url":null,"abstract":"<p><p>Human teams intuitively and effectively collaborate to move large, heavy, or unwieldy objects. However, understanding of this interaction in literature is limited. This is especially problematic given our goal to enable human-robot teams to work together. Therefore, to better understand how human teams work together to eventually enable intuitive human-robot interaction, in this paper we examine four sub-components of collaborative manipulation (co-manipulation), using motion and haptics. We define co-manipulation as a group of two or more agents collaboratively moving an object. We present a study that uses a large object for co-manipulation as we vary the number of participants (two or three) and the roles of the participants (leaders or followers), and the degrees of freedom necessary to complete the defined motion for the object. In analyzing the results, we focus on four key components related to motion and haptics. Specifically, we first define and examine a static or rest state to demonstrate a method of detecting transitions between the static state and an active state, where one or more agents are moving toward an intended goal. Secondly, we analyze a variety of signals (e.g. force, acceleration, etc.) during movements in each of the six rigid-body degrees of freedom of the co-manipulated object. This data allows us to identify the best signals that correlate with the desired motion of the team. Third, we examine the completion percentage of each task. The completion percentage for each task can be used to determine which motion objectives can be communicated via haptic feedback. Finally, we define a metric to determine if participants divide two degree-of-freedom tasks into separate degrees of freedom or if they take the most direct path. These four components contribute to the necessary groundwork for advancing intuitive human-robot interaction.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1480399"},"PeriodicalIF":2.6,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12222233/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144559877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multimodal fusion image enhancement technique and CFEC-YOLOv7 for underwater target detection algorithm research.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-06-19 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1616919
Xiaorong Qiu, Yingzhong Shi

The underwater environment is more complex than that on land: underwater images suffer severe static and dynamic blurring, which reduces the recognition accuracy of underwater targets and fails to meet the needs of underwater environment detection. First, for the static blurring problem, we propose an adaptive color compensation algorithm and an improved MSR algorithm. Second, for dynamic blur, we adopt the Restormer network to remove the blur caused by the combined effects of camera shake, defocus, and relative motion displacement. The feasibility of our underwater enhancement method is then verified through qualitative analysis, quantitative analysis, and underwater target detection on the enhanced dataset. Finally, we propose a target recognition network suited to complex underwater environments: local and global information is fused through the CCBC module and the ECLOU loss function to improve localization accuracy, and the FasterNet module is introduced to reduce redundant computation and parameter count. Experimental results show that the CFEC-YOLOv7 model and the proposed underwater image enhancement method perform excellently, adapt well to the underwater target recognition task, and have good application prospects.
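The paper's *improved* MSR is not specified in the abstract; as a reference point, here is a minimal sketch of the classic Multi-Scale Retinex it builds on, which estimates illumination with Gaussian blurs at several scales and subtracts it in log space. The scale values are common defaults, not the paper's settings.

```python
import cv2
import numpy as np

def multi_scale_retinex(img_bgr, sigmas=(15, 80, 250)):
    """Classic MSR: average of log(I) - log(I * G_sigma) over several
    Gaussian scales, then rescale to [0, 255] for display.

    img_bgr: uint8 BGR image as loaded by cv2.imread.
    sigmas: hypothetical Gaussian scales in pixels (small = detail,
            large = global illumination).
    """
    img = img_bgr.astype(np.float64) + 1.0  # offset avoids log(0)
    msr = np.zeros_like(img)
    for sigma in sigmas:
        blur = cv2.GaussianBlur(img, (0, 0), sigma)  # ksize derived from sigma
        msr += np.log(img) - np.log(blur + 1.0)
    msr /= len(sigmas)
    mn, mx = msr.min(), msr.max()
    return ((msr - mn) / (mx - mn + 1e-12) * 255).astype(np.uint8)

# enhanced = multi_scale_retinex(cv2.imread("underwater.jpg"))
```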

{"title":"Multimodal fusion image enhancement technique and CFEC-YOLOv7 for underwater target detection algorithm research.","authors":"Xiaorong Qiu, Yingzhong Shi","doi":"10.3389/fnbot.2025.1616919","DOIUrl":"10.3389/fnbot.2025.1616919","url":null,"abstract":"<p><p>The underwater environment is more complex than that on land, resulting in severe static and dynamic blurring in underwater images, reducing the recognition accuracy of underwater targets and failing to meet the needs of underwater environment detection. Firstly, for the static blurring problem, we propose an adaptive color compensation algorithm and an improved MSR algorithm. Secondly, for the problem of dynamic blur, we adopt the Restormer network to eliminate the dynamic blur caused by the combined effects of camera shake, camera out-of-focus and relative motion displacement, etc. then, through qualitative analysis, quantitative analysis and underwater target detection on the enhanced dataset, the feasibility of our underwater enhancement method is verified. Finally, we propose a target recognition network suitable for the complex underwater environment. The local and global information is fused through the CCBC module and the ECLOU loss function to improve the positioning accuracy. The FasterNet module is introduced to reduce redundant computations and parameter counting. The experimental results show that the CFEC-YOLOv7 model and the underwater image enhancement method proposed by us exhibit excellent performance, can better adapt to the underwater target recognition task, and have a good application prospect.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1616919"},"PeriodicalIF":2.6,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12222134/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144559876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
User recommendation method integrating hierarchical graph attention network with multimodal knowledge graph.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-06-18 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1587973
Xiaofei Han, Xin Dou

In common graph neural networks (GNNs), incorporating social-network information effectively exploits interactions between users, but it often overlooks the deeper semantic relationships between items and fails to integrate visual and textual features. This limitation can restrict the diversity and accuracy of recommendation results. To address this, the present study combines a knowledge graph, a GNN, and multimodal information to enhance the feature representations of both users and items. The knowledge graph not only clarifies the underlying logic behind user interests and preferences but also helps address the cold-start problem for new users and items. Moreover, to improve recommendation accuracy, the visual and textual features of items are incorporated as supplementary information. A user recommendation model is therefore proposed that integrates a hierarchical graph attention network with a multimodal knowledge graph. The model consists of four key components: a collaborative knowledge-graph neural layer, an image feature extraction layer, a text feature extraction layer, and a prediction layer. The first three layers extract user and item features, and the recommendation is completed in the prediction layer. Experimental results on two public datasets demonstrate that the proposed model significantly outperforms existing recommendation methods.
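The abstract names a hierarchical graph attention network without detailing its layers; below is a minimal single-head graph attention layer (GAT-style) sketch showing only the attention aggregation such models build on. The dimensions, the dense adjacency representation, and the example graph are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Minimal single-head graph attention: score each edge from the
    concatenated endpoint embeddings, softmax over neighbors, aggregate."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) 0/1 matrix with self-loops.
        h = self.W(x)                                    # (N, out_dim)
        N = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(-1, N, -1),
                           h.unsqueeze(0).expand(N, -1, -1)], dim=-1)
        e = F.leaky_relu(self.attn(pairs).squeeze(-1))   # (N, N) raw scores
        e = e.masked_fill(adj == 0, float("-inf"))       # neighbors only
        alpha = torch.softmax(e, dim=-1)                 # attention weights
        return alpha @ h                                 # aggregated embeddings

# Example: 5 user/item nodes, 8-dim features, random graph with self-loops
# (self-loops guarantee every softmax row has at least one finite score).
x = torch.randn(5, 8)
adj = ((torch.rand(5, 5) > 0.5).float() + torch.eye(5) > 0).float()
out = GraphAttentionLayer(8, 16)(x, adj)
```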

{"title":"User recommendation method integrating hierarchical graph attention network with multimodal knowledge graph.","authors":"Xiaofei Han, Xin Dou","doi":"10.3389/fnbot.2025.1587973","DOIUrl":"10.3389/fnbot.2025.1587973","url":null,"abstract":"<p><p>In common graph neural network (GNN), although incorporating social network information effectively utilizes interactions between users, it often overlooks the deeper semantic relationships between items and fails to integrate visual and textual feature information. This limitation can restrict the diversity and accuracy of recommendation results. To address this, the present study combines knowledge graph, GNN, and multimodal information to enhance feature representations of both users and items. The inclusion of knowledge graph not only provides a better understanding of the underlying logic behind user interests and preferences but also aids in addressing the cold-start problem for new users and items. Moreover, in improving recommendation accuracy, visual and textual features of items are incorporated as supplementary information. Therefore, a user recommendation model is proposed that integrates hierarchical graph attention network with multimodal knowledge graph. The model consists of four key components: a collaborative knowledge graph neural layer, an image feature extraction layer, a text feature extraction layer, and a prediction layer. The first three layers extract user and item features, and the recommendation is completed in the prediction layer. Experimental results based on two public datasets demonstrate that the proposed model significantly outperforms existing recommendation methods in terms of recommendation performance.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1587973"},"PeriodicalIF":2.6,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12213718/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144553235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Context-Aware Enhanced Feature Refinement for small object detection with Deformable DETR.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-06-10 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1588565
Donghao Shi, Cunbin Zhao, Jianwen Shao, Minjie Feng, Lei Luo, Bing Ouyang, Jiamin Huang

Small object detection is a critical task in applications like autonomous driving and ship black-smoke detection. While Deformable DETR has advanced small object detection, its reliance on CNNs for feature extraction restricts global context understanding and yields suboptimal feature representations. It also struggles to detect small objects that occupy only a few pixels because of large size disparities. To overcome these challenges, we propose the Context-Aware Enhanced Feature Refinement Deformable DETR, an improved Deformable DETR network. Our approach introduces Mask Attention in the backbone to improve feature extraction while effectively suppressing irrelevant background information. Furthermore, we propose a Context-Aware Enhanced Feature Refinement Encoder to address small objects with limited pixel representation. Experimental results demonstrate that our method outperforms the baseline, achieving a 2.1% improvement in mAP.
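The paper's Mask Attention module is not defined in the abstract; the sketch below shows only the generic mechanism such modules share: an additive mask on attention logits so softmax assigns (near-)zero weight to background positions. The shapes, the 1/0 mask convention, and the toy inputs are assumptions.

```python
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with background positions suppressed.

    q, k, v: (B, L, D) query/key/value tensors.
    mask: (B, L) with 1 = keep token, 0 = suppress (treated as background).
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (B, L, L) logits
    scores = scores.masked_fill(mask[:, None, :] == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                 # masked aggregation

# Example: 2 sequences of 6 tokens; trailing tokens masked as background.
q = k = v = torch.randn(2, 6, 32)
mask = torch.tensor([[1, 1, 1, 1, 0, 0],
                     [1, 1, 1, 1, 1, 0]])
out = masked_attention(q, k, v, mask)
```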

{"title":"Context-Aware Enhanced Feature Refinement for small object detection with Deformable DETR.","authors":"Donghao Shi, Cunbin Zhao, Jianwen Shao, Minjie Feng, Lei Luo, Bing Ouyang, Jiamin Huang","doi":"10.3389/fnbot.2025.1588565","DOIUrl":"10.3389/fnbot.2025.1588565","url":null,"abstract":"<p><p>Small object detection is a critical task in applications like autonomous driving and ship black smoke detection. While Deformable DETR has advanced small object detection, it faces limitations due to its reliance on CNNs for feature extraction, which restricts global context understanding and results in suboptimal feature representation. Additionally, it struggles with detecting small objects that occupy only a few pixels due to significant size disparities. To overcome these challenges, we propose the Context-Aware Enhanced Feature Refinement Deformable DETR, an improved Deformable DETR network. Our approach introduces Mask Attention in the backbone to improve feature extraction while effectively suppressing irrelevant background information. Furthermore, we propose a Context-Aware Enhanced Feature Refinement Encoder to address the issue of small objects with limited pixel representation. Experimental results demonstrate that our method outperforms the baseline, achieving a 2.1% improvement in mAP.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1588565"},"PeriodicalIF":2.6,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12185399/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144484070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Depth-aware unpaired image-to-image translation for autonomous driving test scenario generation using a dual-branch GAN.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-05-30 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1603964
Donghao Shi, Chenxin Zhao, Cunbin Zhao, Zhou Fang, Chonghao Yu, Jian Li, Minjie Feng

Reliable visual perception is essential for autonomous driving test scenario generation, yet adverse weather and lighting variations pose significant challenges to simulation robustness and generalization. Traditional unpaired image-to-image translation methods primarily rely on RGB-based transformations, often resulting in geometric distortions and loss of structural consistency, which can negatively impact the realism and accuracy of generated test scenarios. To address these limitations, we propose a Depth-Aware Dual-Branch Generative Adversarial Network (DAB-GAN) that explicitly incorporates depth information to preserve spatial structures during scenario generation. The dual-branch generator processes both RGB and depth inputs, ensuring geometric fidelity, while a self-attention mechanism enhances spatial dependencies and local detail refinement. This enables the creation of realistic and structure-preserving test environments that are crucial for evaluating autonomous driving perception systems, especially under adverse weather conditions. Experimental results demonstrate that DAB-GAN outperforms existing unpaired image-to-image translation methods, achieving superior visual fidelity and maintaining depth-aware structural integrity. This approach provides a robust framework for generating diverse and challenging test scenarios, enhancing the development and validation of autonomous driving systems under various real-world conditions.
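DAB-GAN's generator architecture is not given in the abstract; the toy PyTorch sketch below illustrates only the dual-branch idea it describes: separate convolutional stems for the RGB image and the depth map, fused by channel concatenation so depth cues constrain the spatial structure. All layer sizes are invented for illustration.

```python
import torch
import torch.nn as nn

class DualBranchEncoder(nn.Module):
    """Toy dual-branch encoder: one stem per modality, concat fusion."""

    def __init__(self, feat=32):
        super().__init__()
        def stem(in_ch):
            # Two stride-2 convs: halve resolution twice, grow channels.
            return nn.Sequential(
                nn.Conv2d(in_ch, feat, 4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(feat, feat * 2, 4, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )
        self.rgb_branch = stem(3)    # processes the 3-channel RGB image
        self.depth_branch = stem(1)  # processes the 1-channel depth map
        self.fuse = nn.Conv2d(feat * 4, feat * 4, 3, padding=1)

    def forward(self, rgb, depth):
        f = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.fuse(f)          # fused, geometry-aware feature map

# Example: 256x256 inputs -> 64x64 fused feature map.
enc = DualBranchEncoder()
feats = enc(torch.randn(1, 3, 256, 256), torch.randn(1, 1, 256, 256))
```

In a full translation GAN this fused feature map would feed a decoder and a self-attention block; here it only demonstrates why the two modalities are encoded separately before fusion.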

{"title":"Depth-aware unpaired image-to-image translation for autonomous driving test scenario generation using a dual-branch GAN.","authors":"Donghao Shi, Chenxin Zhao, Cunbin Zhao, Zhou Fang, Chonghao Yu, Jian Li, Minjie Feng","doi":"10.3389/fnbot.2025.1603964","DOIUrl":"10.3389/fnbot.2025.1603964","url":null,"abstract":"<p><p>Reliable visual perception is essential for autonomous driving test scenario generation, yet adverse weather and lighting variations pose significant challenges to simulation robustness and generalization. Traditional unpaired image-to-image translation methods primarily rely on RGB-based transformations, often resulting in geometric distortions and loss of structural consistency, which can negatively impact the realism and accuracy of generated test scenarios. To address these limitations, we propose a Depth-Aware Dual-Branch Generative Adversarial Network (DAB-GAN) that explicitly incorporates depth information to preserve spatial structures during scenario generation. The dual-branch generator processes both RGB and depth inputs, ensuring geometric fidelity, while a self-attention mechanism enhances spatial dependencies and local detail refinement. This enables the creation of realistic and structure-preserving test environments that are crucial for evaluating autonomous driving perception systems, especially under adverse weather conditions. Experimental results demonstrate that DAB-GAN outperforms existing unpaired image-to-image translation methods, achieving superior visual fidelity and maintaining depth-aware structural integrity. This approach provides a robust framework for generating diverse and challenging test scenarios, enhancing the development and validation of autonomous driving systems under various real-world conditions.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1603964"},"PeriodicalIF":2.6,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12162506/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144301898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Gait analysis system for assessing abnormal patterns in individuals with hemiparetic stroke during robot-assisted gait training: a criterion-related validity study in healthy adults.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-05-21 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1558009
Issei Nakashima, Daisuke Imoto, Satoshi Hirano, Hitoshi Konosu, Yohei Otaka

Introduction: Gait robots have the potential to analyze gait characteristics during gait training using mounted sensors, in addition to providing robotic assistance to the individual's movements. However, no systems have been proposed to analyze gait performance during robot-assisted gait training. Our newly developed gait robot, "Welwalk WW-2000" (WW-2000), is equipped with a gait analysis system that analyzes abnormal gait patterns during robot-assisted gait training. We previously investigated the validity of the index values for nine abnormal gait patterns. Here, we propose new index values for four abnormal gait patterns: anterior trunk tilt, excessive trunk shift over the affected side, excessive knee joint flexion, and swing difficulty. We investigated the criterion validity of these new index values in the WW-2000 gait analysis system in healthy adults.

Methods: Twelve healthy participants, while wearing the robot, simulated four abnormal gait patterns manifested by individuals with hemiparetic stroke. Each participant performed 16 gait trials: four severity grades for each of the four abnormal gait patterns. Twenty strides were recorded per trial using the gait analysis system in the WW-2000 and video cameras. Abnormal gait patterns were assessed with two measures: the index value calculated for each stride by the WW-2000 gait analysis system, and the assessor's severity score for each stride. The correlation between the two measures was evaluated using the Spearman rank correlation coefficient for each gait pattern in each participant (a minimal sketch of this analysis follows the abstract).

Results: The median (minimum to maximum) values of Spearman rank correlation coefficient among the 12 participants between the index value calculated using the WW-2000 gait analysis system and the assessor's severity scores for anterior trunk tilt, excessive trunk shifts over the affected side, excessive knee joint flexion, and swing difficulty were 0.892 (0.749-0.969), 0.859 (0.439-0.923), 0.920 (0.738-0.969), and 0.681 (0.391-0.889), respectively.

Discussion: In addition to nine previously validated abnormal gait patterns, the WW-2000 gait analysis system captured, with high validity, four new abnormal gait patterns observed in individuals with hemiparetic stroke. Assessing abnormal gait patterns is important, as improving them contributes to stroke rehabilitation.

Clinical trial registration: https://jrct.niph.go.jp, identifier jRCT 042190109.
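As referenced in the Methods above, criterion validity here is a per-participant, per-pattern rank correlation between the robot's stride-wise index value and the assessor's stride-wise severity score. The sketch below shows that computation with scipy; the stride data are synthetic placeholders, not the study's data.

```python
import numpy as np
from scipy.stats import spearmanr

# Synthetic stand-in for one participant and one gait pattern:
# 20 strides, assessor severity graded 1-4, robot index loosely tracking it.
rng = np.random.default_rng(42)
severity = rng.integers(1, 5, size=20)                 # assessor scores
index_value = severity + rng.normal(0, 0.5, size=20)   # robot-derived index

rho, p = spearmanr(index_value, severity)
print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")
# The study repeats this per pattern and participant, then reports the
# median rho across the 12 participants (e.g., 0.892 for anterior trunk tilt).
```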

{"title":"Gait analysis system for assessing abnormal patterns in individuals with hemiparetic stroke during robot-assisted gait training: a criterion-related validity study in healthy adults.","authors":"Issei Nakashima, Daisuke Imoto, Satoshi Hirano, Hitoshi Konosu, Yohei Otaka","doi":"10.3389/fnbot.2025.1558009","DOIUrl":"10.3389/fnbot.2025.1558009","url":null,"abstract":"<p><strong>Introduction: </strong>Gait robots have the potential to analyze gait characteristics during gait training using mounted sensors in addition to robotic assistance of the individual's movements. However, no systems have been proposed to analyze gait performance during robot-assisted gait training. Our newly developed gait robot,\" Welwalk WW-2000 (WW-2000)\" is equipped with a gait analysis system to analyze abnormal gait patterns during robot-assisted gait training. We previously investigated the validity of the index values for the nine abnormal gait patterns. Here, we proposed new index values for four abnormal gait patterns, which are anterior trunk tilt, excessive trunk shifts over the affected side, excessive knee joint flexion, and swing difficulty; we investigated the criterion validity of the WW-2000 gait analysis system in healthy adults for these new index values.</p><p><strong>Methods: </strong>Twelve healthy participants simulated four abnormal gait patterns manifested in individuals with hemiparetic stroke while wearing the robot. Each participant was instructed to perform 16 gait trials, with four grades of severity for each of the four abnormal gait patterns. Twenty strides were recorded for each gait trial using a gait analysis system in the WW-2000 and video cameras. Abnormal gait patterns were assessed using the two parameters: the index values calculated for each stride from the WW-2000 gait analysis system, and assessor's severity scores for each stride. The correlation of the index values between the two methods was evaluated using the Spearman rank correlation coefficient for each gait pattern in each participant.</p><p><strong>Results: </strong>The median (minimum to maximum) values of Spearman rank correlation coefficient among the 12 participants between the index value calculated using the WW-2000 gait analysis system and the assessor's severity scores for anterior trunk tilt, excessive trunk shifts over the affected side, excessive knee joint flexion, and swing difficulty were 0.892 (0.749-0.969), 0.859 (0.439-0.923), 0.920 (0.738-0.969), and 0.681 (0.391-0.889), respectively.</p><p><strong>Discussion: </strong>The WW-2000 gait analysis system captured four new abnormal gait patterns observed in individuals with hemiparetic stroke with high validity, in addition to nine previously validated abnormal gait patterns. 
Assessing abnormal gait patterns is important as improving them contributes to stroke rehabilitation.</p><p><strong>Clinical trial registration: </strong>https://jrct.niph.go.jp, identifier jRCT 042190109.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1558009"},"PeriodicalIF":2.6,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133724/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144225249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hexapod robot motion planning investigation under the influence of multi-dimensional terrain features.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-05-21 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1605938
Chen Chen, Junbo Lin, Bo You, Jiayu Li, Biao Gao

To address the challenges arising from the coupled interaction between multi-dimensional terrain features (encompassing both the geometric and physical properties of complex field environments) and the locomotion stability of hexapod robots, this paper presents a comprehensive motion planning framework incorporating multi-dimensional terrain information. The proposed methodology systematically extracts multi-dimensional geometric and physical terrain features from a multi-layered environmental map. From these features, a traversal cost map is synthesized, and an enhanced A* algorithm incorporating terrain traversal metrics is developed to make path planning safer in complex field environments. Furthermore, the framework introduces a foothold cost map derived from the multi-dimensional terrain data, coupled with a fault-tolerant free-gait planning algorithm based on foothold cost evaluation. This enables dynamic gait modulation that enhances overall locomotion stability while maintaining safe trajectory planning. The efficacy of the proposed framework is validated through both simulation studies and physical experiments on a hexapod robotic platform. Experimental results demonstrate that, compared with conventional hexapod motion planning approaches, the proposed multi-dimensional terrain-aware planning framework significantly enhances both locomotion safety and stability in complex field environments.
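The paper's enhancements to A* and its cost synthesis are not detailed in the abstract; the sketch below shows only the baseline mechanism it extends: grid A* in which each step pays the traversal cost of the entered cell, so the planner detours around hard terrain rather than taking the geometrically shortest route. The grid, costs, and heuristic scaling are illustrative assumptions.

```python
import heapq
import itertools
import numpy as np

def a_star_with_terrain_cost(cost_map, start, goal):
    """Grid A* where step cost = traversal cost of the entered cell.

    cost_map: (H, W) array; higher = harder terrain, np.inf = impassable.
    start, goal: (row, col) tuples. Returns the path or None.
    """
    H, W = cost_map.shape
    c_min = float(np.min(cost_map[np.isfinite(cost_map)]))

    def h(n):  # admissible: Manhattan distance times the cheapest step cost
        return (abs(n[0] - goal[0]) + abs(n[1] - goal[1])) * c_min

    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    open_set = [(h(start), next(tie), 0.0, start, None)]
    came_from, g_best = {}, {start: 0.0}
    while open_set:
        _, _, g, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue  # already expanded via a cheaper route
        came_from[node] = parent
        if node == goal:  # reconstruct by walking parent links back
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < H and 0 <= nc < W and np.isfinite(cost_map[nr, nc]):
                ng = g + float(cost_map[nr, nc])
                if ng < g_best.get((nr, nc), float("inf")):
                    g_best[(nr, nc)] = ng
                    heapq.heappush(open_set,
                                   (ng + h((nr, nc)), next(tie), ng, (nr, nc), node))
    return None  # goal unreachable

# Example: 10x10 map with a high-cost ridge the path should detour around.
cmap = np.ones((10, 10))
cmap[4, 1:9] = 50.0
print(a_star_with_terrain_cost(cmap, (0, 0), (9, 9)))
```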

{"title":"Hexapod robot motion planning investigation under the influence of multi-dimensional terrain features.","authors":"Chen Chen, Junbo Lin, Bo You, Jiayu Li, Biao Gao","doi":"10.3389/fnbot.2025.1605938","DOIUrl":"10.3389/fnbot.2025.1605938","url":null,"abstract":"<p><p>To address the challenges arising from the coupled interactions between multi-dimensional terrain features-encompassing both geometric and physical properties of complex field environments-and the locomotion stability of hexapod robots, this paper presents a comprehensive motion planning framework incorporating multi-dimensional terrain information. The proposed methodology systematically extracts multi-dimensional geometric and physical terrain features from a multi-layered environmental map. Based on these features, a traversal cost map is synthesized, and an enhanced A* algorithm is developed that incorporates terrain traversal metrics to optimize path planning safety across complex field environments. Furthermore, the framework introduces a foothold cost map derived from multi-dimensional terrain data, coupled with a fault-tolerant free gait planning algorithm based on foothold cost evaluation. This approach enables dynamic gait modulation to enhance overall locomotion stability while maintaining safe trajectory planning. The efficacy of the proposed framework is validated through both simulation studies and physical experiments on a hexapod robotic platform. Experimental results demonstrate that, compared to conventional hexapod motion planning approaches, the proposed multi-dimensional terrain-aware planning framework significantly enhances both locomotion safety and stability across complex field environments.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1605938"},"PeriodicalIF":2.6,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133957/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144225250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Analysis and experiment of a positioning and pointing mechanism based on the stick-slip driving principle.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-05-15 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1567291
Yongqi Zhu, Juan Li, Jianbin Huang, Weida Li, Gai Liu, Lining Sun

Introduction: Traditional positioning and pointing mechanisms often struggle to achieve high speed and high resolution simultaneously, and their travel range is typically constrained. To overcome these challenges, this study proposes a novel positioning and pointing mechanism driven by piezoelectric ceramics. The mechanism achieves both high speed and high resolution by combining two driving principles, resonance and stick-slip; this paper focuses on analyzing the stick-slip driving principle.

Methods: We propose a configuration for the drive module within the positioning and pointing mechanism. By applying a low-frequency sawtooth-wave excitation to the piezoelectric ceramics, the mechanism achieves high resolution based on the stick-slip driving principle (a toy simulation of this drive follows the abstract). First, a simplified dynamic model of the drive module is established. The motion of the drive module under stick-slip driving is divided into a stick phase and a slip phase. Static and transient dynamic analyses of each phase yield the relationship among the output shaft angle, the resolution, and the driving voltage. During the stick phase, the output shaft angle and the driving voltage are approximately linearly related, while in the slip phase the relationship is nonlinear due to impact forces and vibrations. Finally, a prototype of the positioning and pointing mechanism is designed, and an experimental platform is constructed to test its resolution.

Results: We construct a prototype of a dual-axis positioning and pointing mechanism composed of multiple drive modules and test its resolution under two control methods: synchronous control and independent control. With synchronous control, the output shaft achieves a resolution of 0.38 μrad; with independent control, it reaches 0.0276 μrad.

Discussion: The research results show that the positioning and pointing mechanism proposed in this study achieves high resolution through the stick-slip driving principle, offering a novel approach for the advancement of such mechanisms.
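As mentioned in the Methods above, the stick-slip drive works because the output follows the piezo during the slow ramp of the sawtooth (stick) but, owing to inertia, slips during the fast flyback, leaving one micro-step of net motion per cycle. The toy simulation below illustrates that mechanism only; the drive frequency, slip-efficiency factor, and all gains are hypothetical and not from the paper's dynamic model.

```python
import numpy as np
from scipy.signal import sawtooth

# Sawtooth excitation: slow rise (stick) for 95% of the cycle, fast fall (slip).
fs, f_drive, T = 100_000, 100, 0.05          # sample rate (Hz), drive freq, duration
t = np.arange(0, T, 1 / fs)
piezo = sawtooth(2 * np.pi * f_drive * t, width=0.95)

# Idealized output: accumulate motion only while the piezo ramps upward,
# scaled by a crude slip-efficiency factor (fraction of ramp retained).
step_per_cycle = 0.8                          # hypothetical efficiency
vel = np.gradient(piezo, t)                   # piezo velocity
stick = vel > 0                               # stick phase = slow rising ramp
angle = np.cumsum(np.where(stick, vel, 0.0)) / fs * step_per_cycle
# 'angle' rises in near-linear micro-steps: one step per sawtooth cycle,
# which is why resolution scales with the drive voltage per cycle.
```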

{"title":"Analysis and experiment of a positioning and pointing mechanism based on the stick-slip driving principle.","authors":"Yongqi Zhu, Juan Li, Jianbin Huang, Weida Li, Gai Liu, Lining Sun","doi":"10.3389/fnbot.2025.1567291","DOIUrl":"10.3389/fnbot.2025.1567291","url":null,"abstract":"<p><strong>Introduction: </strong>Traditional positioning and pointing mechanisms often face limitations in simultaneously achieving high speed and high resolution, and their travel range is typically constrained. To overcome these challenges, we propose a novel positioning and pointing mechanism driven by piezoelectric ceramics in this study. This mechanism is capable of achieving both high speed and high resolution by using two driving principles: resonance and stick-slip. This paper will focus on analyzing the stick-slip driving principle.</p><p><strong>Methods: </strong>We propose a configuration of the drive module within the positioning and pointing mechanism. By applying a low-frequency sawtooth wave excitation to the piezoelectric ceramics, the mechanism achieves high resolution based on the stick-slip driving principle. First, a simplified dynamic model of the drive module is established. The motion process of the drive module in stick-slip driving is divided into the stick phase and slip phase. With static and transient dynamic analyses conducted for each phase, the relationship between the output shaft angle, resolution, and driving voltage is derived. It is observed that during the stick phase, the output shaft angle and the driving voltage exhibit an approximately linear relationship, while in the slip phase, the output shaft angle and the driving voltage display nonlinearity due to impact forces and vibrations. Finally, a prototype of the positioning and pointing mechanism is designed, and an experimental platform is constructed to test the resolution of the prototype.</p><p><strong>Results: </strong>We construct a prototype of a dual-axis positioning and pointing mechanism composed of multiple drive modules and conduct resolution tests using two control methods: synchronous control and independent control. When synchronous control is used, the output shaft achieves a resolution of 0.38<i>μrad</i>, while with independent control, the resolution of the output shaft reaches 0.0276<i>μrad</i>.</p><p><strong>Discussion: </strong>The research results show that the positioning and pointing mechanism proposed in this study achieves high resolution through stick-slip driving principle, offering a novel approach for the advancement of such mechanisms.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1567291"},"PeriodicalIF":2.6,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12119557/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144181186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
TS-Resformer: a model based on multimodal fusion for the classification of music signals.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-05-13 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1568811
Yilin Zhang

The amount of music in different genres grows year by year. Manual classification is costly and requires music-domain professionals to hand-design features, some of which lack generality for genre classification. Deep learning has produced many results in music classification, but existing methods still suffer from insufficient extraction of music feature information, low genre-classification accuracy, loss of time-series information, and slow training. To address the effect of music duration on genre-classification accuracy, we build Log Mel spectrograms from audio clips of different cut durations. After discarding incomplete audio, we design data augmentation with different slice durations and verify its effect on accuracy and training time through comparison experiments. On this basis, the audio signal is divided into frames, windowed, and short-time Fourier transformed, and the Log Mel spectrum is obtained using a Mel filter bank and logarithmic compression. To address the loss of temporal information, insufficient feature extraction, and low classification accuracy in genre classification, we first propose a Res-Transformer model that fuses a residual network with Transformer encoder layers. The model has two branches: the left branch is an improved residual network that strengthens spectral feature extraction and network expressiveness while reducing dimensionality; the right branch uses four Transformer encoder layers to extract the time-series information of the Log Mel spectrum. The output vectors of the two branches are concatenated and fed into the classifier for genre classification. To further improve classification accuracy, we then propose the TS-Resformer model, which builds on Res-Transformer by combining different attention mechanisms and a time-frequency attention mechanism that employs filters of different scales to fully extract low-level music features along the time and frequency dimensions as its inputs. Finally, experiments show that this method achieves 90.23% accuracy on the FMA-small dataset, an improvement in classification accuracy over the classical model.
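The preprocessing pipeline described above (framing, windowing, STFT, Mel filtering, log compression) is standard and can be reproduced with librosa; the sketch below uses common default parameters (22.05 kHz, 2048-point FFT, hop 512, 128 Mel bands), which are assumptions rather than the paper's exact settings.

```python
import librosa
import numpy as np

def log_mel_spectrogram(path, clip_seconds=30.0, n_mels=128):
    """Load a clip of fixed duration, then frame/window/STFT, apply a Mel
    filter bank, and log-compress, yielding a (n_mels, frames) Log Mel
    spectrogram suitable as 2-D network input."""
    y, sr = librosa.load(path, sr=22050, duration=clip_seconds)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=n_mels
    )
    return librosa.power_to_db(mel, ref=np.max)  # logarithmic compression

# log_mel = log_mel_spectrogram("track.mp3")   # hypothetical input file
```

Varying `clip_seconds` reproduces the different cut durations whose effect on accuracy and training time the paper compares.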

{"title":"TS-Resformer: a model based on multimodal fusion for the classification of music signals.","authors":"Yilin Zhang","doi":"10.3389/fnbot.2025.1568811","DOIUrl":"10.3389/fnbot.2025.1568811","url":null,"abstract":"<p><p>The number of music of different genres is increasing year by year, and manual classification is costly and requires professionals in the field of music to manually design features, some of which lack the generality of music genre classification. Deep learning has had a large number of scientific research results in the field of music classification, but the existing deep learning methods still have the problems of insufficient extraction of music feature information, low accuracy rate of music genres, loss of time series information, and slow training. To address the problem that different music durations affect the accuracy of music genre classification, we form a Log Mel spectrum with music audio data of different cut durations. After discarding incomplete audio, we design data enhancement with different slicing durations and verify its effect on accuracy and training time through comparison experiments. Based on this, the audio signal is divided into frames, windowed and short-time Fourier transformed, and then the Log Mel spectrum is obtained by using the Mel filter and logarithmic compression. Aiming at the problems of loss of time information, insufficient feature extraction, and low classification accuracy in music genre classification, firstly, we propose a Res-Transformer model that fuses the residual network with the Transformer coding layer. The model consists of two branches, the left branch is an improved residual network, which enhances the spectral feature extraction ability and network expression ability and realizes the dimensionality reduction; the right branch uses four Transformer coding layers to extract the time-series information of the Log Mel spectrum. The output vectors of the two branches are spliced and input into the classifier to realize music genre classification. Then, to further improve the classification accuracy of the model, we propose the TS-Resformer model based on the Res-Transformer model, combined with different attention mechanisms, and design the time-frequency attention mechanism, which employs different scales of filters to fully extract the low-level music features from the two dimensions of time and frequency as the input to the time-frequency attention mechanism, respectively. Finally, experiments show that the accuracy of this method is 90.23% on the FMA-small dataset, which is an improvement in classification accuracy compared with the classical model.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1568811"},"PeriodicalIF":2.6,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12106318/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144157987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0