Enhanced LSTM-based robotic agent for load forecasting in low-voltage distributed photovoltaic power distribution network
Pub Date: 2024-05-31 | DOI: 10.3389/fnbot.2024.1431643
Xudong Zhang, Junlong Wang, Jun Wang, Hao Wang, Lijun Lu
To ensure the safe operation and dispatching control of a low-voltage distributed photovoltaic (PV) power distribution network (PDN), this study addresses the load forecasting problem of the PDN. Based on deep learning technology, this paper proposes a robot-assisted load forecasting method for low-voltage distributed PV PDNs using an enhanced long short-term memory (LSTM) network. The method employs frequency domain decomposition (FDD) to obtain boundary points and incorporates a dense layer after the LSTM layer to better extract data features. The LSTM predicts the low-frequency and high-frequency components separately, enabling the model to precisely capture the voltage variation patterns across different frequency components and thereby achieve high-precision voltage prediction. Validated on a historical operation data set of a low-voltage distributed PV-PDN in Guangdong Province, experimental results demonstrate that the proposed "FDD+LSTM" model outperforms both recurrent neural network and support vector machine models in prediction accuracy on both the 1 h and 4 h time scales. The model precisely forecasts voltage across different seasons and time scales, which has practical value in promoting the development of the PDN and the related technology industry chain.
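A minimal sketch of the "FDD+LSTM" pipeline described in the abstract: split a voltage/load series into low- and high-frequency components at a boundary point via an FFT mask, then fit one LSTM (with a trailing dense layer) per component and sum the two forecasts. The boundary index, window length, and layer sizes are illustrative assumptions, not the paper's actual hyperparameters.

```python
import numpy as np
import torch
import torch.nn as nn

def frequency_domain_decompose(series: np.ndarray, boundary: int):
    """Split a 1D series into low/high-frequency parts via an FFT mask."""
    spectrum = np.fft.rfft(series)
    low = spectrum.copy()
    low[boundary:] = 0          # keep only bins below the boundary point
    high = spectrum - low       # remainder is the high-frequency component
    return np.fft.irfft(low, n=len(series)), np.fft.irfft(high, n=len(series))

class LSTMForecaster(nn.Module):
    """LSTM followed by a dense layer; one instance per frequency component."""
    def __init__(self, hidden: int = 64):  # hidden size is an assumption
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.dense = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))
    def forward(self, x):                 # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.dense(out[:, -1, :])  # predict the next step

# Usage: forecast = model_low(low_window) + model_high(high_window)
```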
{"title":"Frontiers | Enhanced LSTM-based robotic agent for load forecasting in low-voltage distributed photovoltaic power distribution network","authors":"Xudong Zhang, Junlong Wang, Jun Wang, Hao Wang, Lijun Lu","doi":"10.3389/fnbot.2024.1431643","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1431643","url":null,"abstract":"To ensure the safe operation and dispatching control of a low-voltage distributed photovoltaic (PV) power distribution network (PDN), the load forecasting problem of the PDN is studied in this study. Based on deep learning technology, this paper proposes a robot-assisted load forecasting method for low-voltage distributed photovoltaic power distribution networks using enhanced long short-term memory (LSTM). This method employs the frequency domain decomposition (FDD) to obtain boundary points and incorporates a dense layer following the LSTM layer to better extract data features. The LSTM is used to predict low-frequency and high-frequency components separately, enabling the model to precisely capture the voltage variation patterns across different frequency components, thereby achieving high-precision voltage prediction. By verifying the historical operation data set of a low-voltage distributed PV-PDN in Guangdong Province, experimental results demonstrate that the proposed “FDD+LSTM” model outperforms both recurrent neural network and support vector machine models in terms of prediction accuracy on both time scales of 1 h and 4 h. Precisely forecast the voltage in different seasons and time scales, which has a certain value in promoting the development of the PDN and related technology industry chain.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"55 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141587450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On designing a configurable UAV autopilot for unmanned quadrotors
Pub Date: 2024-05-30 | DOI: 10.3389/fnbot.2024.1363366
Ali Bhar, Mounir Sayadi
Unmanned Aerial Vehicles (UAVs) and quadrotors are being used in an increasing number of applications. The detection and management of forest fires is continually improved by incorporating new, economical technologies in order to prevent ecological degradation and disasters. Using an inner-outer loop design, this paper discusses an attitude and altitude controller for a quadrotor. Although the quadrotor is a highly nonlinear system, its dynamics can be simplified by making several assumptions. The quadrotor autopilot is developed using the nonlinear feedback linearization technique together with LQR, SMC, PD, and PID controllers. These approaches are often used to improve control and reject disturbances. PD-PID controllers are also deployed by intelligent algorithms for the tracking and surveillance of smoke or fire. In this paper, the efficiency of a combined PD-PID controller with adjustable parameters is studied. The performance was assessed by simulation using MATLAB Simulink. The computational study conducted to assess the proposed approach showed that the PD-PID combination presented in this paper yields promising outcomes.
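A hedged sketch of the combined PD-PID scheme with adjustable gains discussed above: a PID loop for altitude and a PD loop for attitude, with a crude saturation on the outputs. The gains, time step, and saturation limit are placeholder values, not those tuned in the paper.

```python
class PID:
    """Textbook discrete PID with adjustable gains kp, ki, kd."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0
    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

class PD(PID):
    def __init__(self, kp, kd, dt):
        super().__init__(kp, 0.0, kd, dt)   # PD is PID with zero integral gain

def control_step(alt_ctrl: PID, roll_ctrl: PD, z_ref, z, phi_ref, phi, u_max=15.0):
    """One inner-outer loop update: altitude thrust (PID) and roll torque (PD)."""
    thrust = alt_ctrl.step(z_ref - z)
    torque = roll_ctrl.step(phi_ref - phi)
    # crude actuator saturation; u_max is an illustrative limit
    return max(-u_max, min(u_max, thrust)), max(-u_max, min(u_max, torque))
```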
{"title":"On designing a configurable UAV autopilot for unmanned quadrotors","authors":"Ali Bhar, Mounir Sayadi","doi":"10.3389/fnbot.2024.1363366","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1363366","url":null,"abstract":"Unmanned Aerial Vehicles (UAVs) and quadrotors are being used in an increasing number of applications. The detection and management of forest fires is continually improved by the incorporation of new economical technologies in order to prevent ecological degradation and disasters. Using an inner-outer loop design, this paper discusses an attitude and altitude controller for a quadrotor. As a highly nonlinear system, quadrotor dynamics can be simplified by assuming several assumptions. Quadrotor autopilot is developed using nonlinear feedback linearization technique, LQR, SMC, PD, and PID controllers. Often, these approaches are used to improve control and to reject disturbances. PD-PID controllers are also deployed in the tracking and surveillance of smoke or fire by intelligent algorithms. In this paper, the efficiency using a combined PD-PID controllers with adjustable parameters have been studied. The performance was assessed by simulation using matlab Simulink. The computational study conducted to assess the proposed approach showed that the PD-PID combination presented in this paper yields promising outcomes.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"60 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141191804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Editorial: Recent advances in image fusion and quality improvement for cyber-physical systems, volume II
Pub Date: 2024-05-29 | eCollection Date: 2024-01-01 | DOI: 10.3389/fnbot.2024.1422982
Xin Jin, Shin-Jye Lee, Michal Wozniak, Qian Jiang
{"title":"Editorial: Recent advances in image fusion and quality improvement for cyber-physical systems, volume II.","authors":"Xin Jin, Shin-Jye Lee, Michal Wozniak, Qian Jiang","doi":"10.3389/fnbot.2024.1422982","DOIUrl":"10.3389/fnbot.2024.1422982","url":null,"abstract":"","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1422982"},"PeriodicalIF":3.1,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11167091/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141310513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimization of robotic path planning and navigation point configuration based on convolutional neural networks
Pub Date: 2024-05-20 | DOI: 10.3389/fnbot.2024.1406658
Jian Wu, Huan Li, Bangjie Li, Xiaolong Zheng, Daqiao Zhang
This study introduces a novel approach for enhancing robotic path planning and navigation by optimizing point configuration through convolutional neural networks (CNNs). Faced with the challenge of precise area coverage and the inefficiency of traditional traversal and intelligent algorithms (e.g., genetic algorithms, particle swarm optimization) in point layout, we propose a CNN-based optimization model. This model not only tackles the issues of speed and accuracy in point configuration with Gaussian distribution characteristics but also significantly improves the robot's capability to efficiently navigate and cover designated areas with high precision. Our methodology begins by defining a coverage index, followed by an optimization model that integrates polygon image features with the variability of the Gaussian distribution. The proposed CNN model is trained with datasets generated from systematic point configurations and then predicts optimal layouts for enhanced navigation. Our method achieves an error of <8% on the test dataset. The results validate the effectiveness of the proposed model in achieving efficient and accurate path planning for robotic systems.
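A small sketch of the coverage-index notion the abstract defines: the fraction of area cells that fall within the sensing radius of at least one navigation point. The grid resolution and radius are assumptions for illustration; the paper's actual index may be defined differently.

```python
import numpy as np

def coverage_index(points: np.ndarray, width: float, height: float,
                   radius: float, resolution: int = 100) -> float:
    """points: (N, 2) array of candidate navigation-point coordinates."""
    xs = np.linspace(0, width, resolution)
    ys = np.linspace(0, height, resolution)
    gx, gy = np.meshgrid(xs, ys)                      # grid cell centres
    cells = np.stack([gx.ravel(), gy.ravel()], axis=1)
    # distance from every cell to every point: shape (cells, points)
    d = np.linalg.norm(cells[:, None, :] - points[None, :, :], axis=2)
    covered = d.min(axis=1) <= radius
    return covered.mean()                             # share of covered cells

# e.g. coverage_index(np.array([[2.0, 2.0], [7.0, 7.0]]), 10.0, 10.0, radius=3.0)
```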
{"title":"Optimization of robotic path planning and navigation point configuration based on convolutional neural networks","authors":"Jian Wu, Huan Li, Bangjie Li, Xiaolong Zheng, Daqiao Zhang","doi":"10.3389/fnbot.2024.1406658","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1406658","url":null,"abstract":"This study introduces a novel approach for enhancing robotic path planning and navigation by optimizing point configuration through convolutional neural networks (CNNs). Faced with the challenge of precise area coverage and the inefficiency of traditional traversal and intelligent algorithms (e.g., genetic algorithms, particle swarm optimization) in point layout, we proposed a CNN-based optimization model. This model not only tackles the issues of speed and accuracy in point configuration with Gaussian distribution characteristics but also significantly improves the robot's capability to efficiently navigate and cover designated areas with high precision. Our methodology begins with defining a coverage index, followed by an optimization model that integrates polygon image features with the variability of Gaussian distribution. The proposed CNN model is trained with datasets generated from systematic point configurations, which then predicts optimal layouts for enhanced navigation. Our method achieves an experimental result error of <8% on the test dataset. The results validate effectiveness of the proposed model in achieving efficient and accurate path planning for robotic systems.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"26 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A framework for neurosymbolic robot action planning using large language models
Pub Date: 2024-05-15 | DOI: 10.3389/fnbot.2024.1342786
Alessio Capitanelli, Fulvio Mastrogiovanni
Symbolic task planning is a widely used approach to enforce robot autonomy due to its ease of understanding and deployment in engineered robot architectures. However, techniques for symbolic task planning are difficult to scale to real-world, highly dynamic, human-robot collaboration scenarios because of poor performance in planning domains where action effects may not be immediate, or when frequent re-planning is needed due to changed circumstances in the robot workspace. The long-term validity of plans, plan length, and planning time can hinder the robot's efficiency and negatively affect the overall fluency of human-robot interaction. We present a framework, which we refer to as Teriyaki, specifically aimed at bridging the gap between symbolic task planning and machine learning approaches. The rationale is to train Large Language Models (LLMs), namely GPT-3, into a neurosymbolic task planner compatible with the Planning Domain Definition Language (PDDL), and then leverage their generative capabilities to overcome a number of limitations inherent to symbolic task planners. Potential benefits include (i) better scalability as planning domain complexity increases, since LLM response time scales linearly with the combined length of the input and the output rather than super-linearly as in the case of symbolic task planners, and (ii) the ability to synthesize a plan action by action instead of end-to-end, making each action available for execution as soon as it is generated instead of waiting for the whole plan to be available, which in turn enables concurrent planning and execution. In the past year, significant efforts have been devoted by the research community to evaluating the overall cognitive capabilities of LLMs, with mixed success. With Teriyaki, instead, we aim to provide overall planning performance comparable to traditional planners in specific planning domains, while leveraging LLM capabilities on other metrics, specifically those related to their short- and mid-term generative capabilities, which are used to build a look-ahead predictive planning model. Preliminary results in selected domains show that our method can: (i) solve 95.5% of problems in a test data set of 1,000 samples; (ii) produce plans up to 13.5% shorter than a traditional symbolic planner; and (iii) reduce average overall waiting times for plan availability by up to 61.4%.
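A minimal sketch of the action-by-action generation idea described in point (ii): consume a streaming LLM completion and hand each PDDL action to the executor as soon as its closing parenthesis arrives, instead of waiting for the full plan. `stream_completion` is a hypothetical stand-in for whatever LLM client is used; it is assumed to yield text chunks.

```python
def actions_from_stream(stream_completion, prompt: str):
    """Yield complete PDDL actions, e.g. '(pick block-a table)', incrementally."""
    buffer = ""
    depth = 0
    for chunk in stream_completion(prompt):   # hypothetical streaming API
        for ch in chunk:
            if ch == "(":
                depth += 1
            if depth > 0:
                buffer += ch                  # accumulate text inside an action
            if ch == ")" and depth > 0:
                depth -= 1
                if depth == 0:                # one balanced action is complete
                    yield buffer.strip()
                    buffer = ""

# Usage: for action in actions_from_stream(client, pddl_prompt): execute(action)
# so execution of the first action can overlap generation of the rest.
```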
{"title":"A framework for neurosymbolic robot action planning using large language models","authors":"Alessio Capitanelli, Fulvio Mastrogiovanni","doi":"10.3389/fnbot.2024.1342786","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1342786","url":null,"abstract":"Symbolic task planning is a widely used approach to enforce robot autonomy due to its ease of understanding and deployment in engineered robot architectures. However, techniques for symbolic task planning are difficult to scale in real-world, highly dynamic, human-robot collaboration scenarios because of the poor performance in planning domains where action effects may not be immediate, or when frequent re-planning is needed due to changed circumstances in the robot workspace. The validity of plans in the long term, plan length, and planning time could hinder the robot's efficiency and negatively affect the overall human-robot interaction's fluency. We present a framework, which we refer to as Teriyaki, specifically aimed at bridging the gap between symbolic task planning and machine learning approaches. The rationale is training Large Language Models (LLMs), namely GPT-3, into a neurosymbolic task planner compatible with the Planning Domain Definition Language (PDDL), and then leveraging its generative capabilities to overcome a number of limitations inherent to symbolic task planners. Potential benefits include (i) a better scalability in so far as the planning domain complexity increases, since LLMs' response time linearly scales with the combined length of the input and the output, instead of super-linearly as in the case of symbolic task planners, and (ii) the ability to synthesize a plan action-by-action instead of end-to-end, and to make each action available for execution as soon as it is generated instead of waiting for the whole plan to be available, which in turn enables concurrent planning and execution. In the past year, significant efforts have been devoted by the research community to evaluate the overall cognitive capabilities of LLMs, with alternate successes. Instead, with Teriyaki we aim to providing an overall planning performance comparable to traditional planners in specific planning domains, while leveraging LLMs capabilities in other metrics, specifically those related to their short- and mid-term generative capabilities, which are used to build a look-ahead predictive planning model. Preliminary results in selected domains show that our method can: (i) solve 95.5% of problems in a test data set of 1,000 samples; (ii) produce plans up to 13.5% shorter than a traditional symbolic planner; (iii) reduce average overall waiting times for a plan availability by up to 61.4%.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"119 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141259394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fine-grained image classification method based on hybrid attention module
Pub Date: 2024-05-03 | DOI: 10.3389/fnbot.2024.1391791
Weixiang Lu, Ying Yang, Lei Yang
To efficiently capture feature information in fine-grained image classification tasks, this study introduces a new network model for fine-grained image classification that utilizes a hybrid attention approach. The model is built upon a hybrid attention module (MA) and, with the assistance of an attention erasure module (EA), can adaptively enhance the prominent areas in the image and capture more detailed image information. Specifically, for fine-grained image classification tasks, this study designs an attention module capable of applying the attention mechanism to both the channel and spatial dimensions. This highlights the important regions and key feature channels in the image, allowing the extraction of distinct local features. Furthermore, this study presents an attention erasure module (EA) that removes significant areas of the image based on the identified features, thus shifting focus to additional feature details within the image and improving the diversity and completeness of the features. Moreover, this study enhances the pooling layer of ResNet50 to enlarge the receptive region and improve the network's ability to extract features from its shallower layers. For fine-grained image classification, this study extracts a variety of features and merges them effectively to create the final feature representation. To assess the effectiveness of the proposed model, experiments were conducted on three publicly available fine-grained image classification datasets: Stanford Cars, FGVC-Aircraft, and CUB-200-2011. The method achieved classification accuracies of 92.8%, 94.0%, and 88.2% on these datasets, respectively. Compared with existing approaches, the method's efficiency is significantly improved, demonstrating higher accuracy and robustness.
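A hedged PyTorch sketch of the two ideas named above: a hybrid attention module that weights both channels and spatial positions, and an erasure step that masks out the most salient region so later passes attend to secondary details. The reduction ratio, kernel size, and erase threshold are illustrative assumptions, not the paper's MA/EA design.

```python
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    """Channel attention followed by spatial attention (CBAM-style sketch)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):                                 # x: (B, C, H, W)
        ca = self.channel_mlp(x.mean(dim=(2, 3)))         # channel weights (B, C)
        x = x * ca[:, :, None, None]
        sa = self.spatial_conv(torch.cat(                 # avg + max descriptors
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1))
        return x * sa, sa                                 # keep map for erasure

def erase_salient(x, spatial_map, threshold: float = 0.8):
    """Zero out the most attended region, forcing focus onto other details."""
    mask = spatial_map < threshold * spatial_map.amax(dim=(2, 3), keepdim=True)
    return x * mask
```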
{"title":"Fine-grained image classification method based on hybrid attention module","authors":"Weixiang Lu, Ying Yang, Lei Yang","doi":"10.3389/fnbot.2024.1391791","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1391791","url":null,"abstract":"To efficiently capture feature information in tasks of fine-grained image classification, this study introduces a new network model for fine-grained image classification, which utilizes a hybrid attention approach. The model is built upon a hybrid attention module (MA), and with the assistance of the attention erasure module (EA), it can adaptively enhance the prominent areas in the image and capture more detailed image information. Specifically, for tasks involving fine-grained image classification, this study designs an attention module capable of applying the attention mechanism to both the channel and spatial dimensions. This highlights the important regions and key feature channels in the image, allowing for the extraction of distinct local features. Furthermore, this study presents an attention erasure module (EA) that can remove significant areas in the image based on the features identified; thus, shifting focus to additional feature details within the image and improving the diversity and completeness of the features. Moreover, this study enhances the pooling layer of ResNet50 to augment the perceptual region and the capability to extract features from the network’s less deep layers. For the objective of fine-grained image classification, this study extracts a variety of features and merges them effectively to create the final feature representation. To assess the effectiveness of the proposed model, experiments were conducted on three publicly available fine-grained image classification datasets: Stanford Cars, FGVC-Aircraft, and CUB-200–2011. The method achieved classification accuracies of 92.8, 94.0, and 88.2% on these datasets, respectively. In comparison with existing approaches, the efficiency of this method has significantly improved, demonstrating higher accuracy and robustness.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"89 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fusion inception and transformer network for continuous estimation of finger kinematics from surface electromyography
Pub Date: 2024-05-03 | DOI: 10.3389/fnbot.2024.1305605
Chuang Lin, Xiaobing Zhang
Decoding surface electromyography (sEMG) to recognize human movement intentions enables stable, natural, and consistent control in the field of human-computer interaction (HCI). In this paper, we present a novel deep learning (DL) model, named the fusion inception and transformer network (FIT), which effectively models both local and global information in sequence data by fully leveraging the capabilities of Inception and Transformer networks. From the publicly available Ninapro dataset, we selected surface EMG signals from six typical hand grasping maneuvers in 10 subjects to predict the values of the 10 most important joint angles in the hand. Our model's performance, assessed through Pearson's correlation coefficient (PCC), root mean square error (RMSE), and R-squared (R²) metrics, was compared with the temporal convolutional network (TCN), long short-term memory network (LSTM), and bidirectional encoder representations from transformers (BERT) models. We also measured the training and inference times of the models. The results show that FIT is the most performant, with excellent estimation accuracy and low computational cost. Our model contributes to the development of HCI technology and has significant practical value.
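A minimal sketch, under assumed dimensions, of fusing Inception-style multi-scale 1D convolutions (local features) with a Transformer encoder (global dependencies) for sEMG-to-joint-angle regression. The 10-angle output matches the task described above; the number of EMG channels, kernel sizes, and layer widths are placeholders, not the FIT architecture itself.

```python
import torch
import torch.nn as nn

class InceptionBlock1D(nn.Module):
    """Parallel 1D convolutions with different kernel sizes, concatenated."""
    def __init__(self, in_ch: int, branch_ch: int = 16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(in_ch, branch_ch, k, padding=k // 2) for k in (1, 3, 5)])
    def forward(self, x):                      # x: (B, channels, time)
        return torch.cat([b(x) for b in self.branches], dim=1)

class FITSketch(nn.Module):
    def __init__(self, emg_channels: int = 12, n_angles: int = 10):
        super().__init__()
        self.inception = InceptionBlock1D(emg_channels)    # local features (48 ch)
        layer = nn.TransformerEncoderLayer(d_model=48, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)  # global
        self.head = nn.Linear(48, n_angles)
    def forward(self, x):                      # x: (B, emg_channels, time)
        feats = self.inception(x).transpose(1, 2)          # (B, time, 48)
        return self.head(self.transformer(feats)[:, -1])   # (B, n_angles)
```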
{"title":"Fusion inception and transformer network for continuous estimation of finger kinematics from surface electromyography","authors":"Chuang Lin, Xiaobing Zhang","doi":"10.3389/fnbot.2024.1305605","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1305605","url":null,"abstract":"Decoding surface electromyography (sEMG) to recognize human movement intentions enables us to achieve stable, natural and consistent control in the field of human computer interaction (HCI). In this paper, we present a novel deep learning (DL) model, named fusion inception and transformer network (FIT), which effectively models both local and global information on sequence data by fully leveraging the capabilities of Inception and Transformer networks. In the publicly available Ninapro dataset, we selected surface EMG signals from six typical hand grasping maneuvers in 10 subjects for predicting the values of the 10 most important joint angles in the hand. Our model’s performance, assessed through Pearson’s correlation coefficient (PCC), root mean square error (RMSE), and R-squared (R<jats:sup>2</jats:sup>) metrics, was compared with temporal convolutional network (TCN), long short-term memory network (LSTM), and bidirectional encoder representation from transformers model (BERT). Additionally, we also calculate the training time and the inference time of the models. The results show that FIT is the most performant, with excellent estimation accuracy and low computational cost. Our model contributes to the development of HCI technology and has significant practical value.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"1 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mining local and global spatiotemporal features for tactile object recognition
Pub Date: 2024-05-03 | DOI: 10.3389/fnbot.2024.1387428
Xiaoliang Qian, Wei Deng, Wei Wang, Yucui Liu, Liying Jiang
Tactile object recognition (TOR) is highly important for the environmental perception of robots. Previous works usually utilize single-scale convolution, which cannot simultaneously extract the local and global spatiotemporal features of tactile data, leading to low accuracy in TOR tasks. To address this problem, this article proposes a local and global residual (LGR-18) network that mainly consists of multiple local and global convolution (LGC) blocks. An LGC block contains two pairs of local convolution (LC) and global convolution (GC) modules. The LC module mainly utilizes a temporal shift operation and a 2D convolution layer to extract local spatiotemporal features. The GC module extracts global spatiotemporal features by fusing multiple 1D and 2D convolutions, which expand the receptive field in the temporal and spatial dimensions. Consequently, our LGR-18 network can extract local-global spatiotemporal features without using 3D convolutions, which usually require a large number of parameters. The effectiveness of the LC module, GC module, and LGC block is verified by ablation studies. Quantitative comparisons with state-of-the-art methods reveal the excellent capability of our method.
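A hedged sketch of the temporal shift operation the LC module is described as using: shift a fraction of channels one step forward or backward along time so that a subsequent 2D convolution mixes neighbouring frames at zero extra parameter cost. The 1/8 shift fraction is a common choice in temporal-shift modules and an assumption here.

```python
import torch

def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """x: (batch, time, channels, H, W) tactile frame sequence."""
    b, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # rest left untouched
    return out

# After shifting, frames can be folded into the batch axis and passed through an
# ordinary nn.Conv2d, which then sees local spatiotemporal context without 3D convs.
```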
{"title":"Mining local and global spatiotemporal features for tactile object recognition","authors":"Xiaoliang Qian, Wei Deng, Wei Wang, Yucui Liu, Liying Jiang","doi":"10.3389/fnbot.2024.1387428","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1387428","url":null,"abstract":"The tactile object recognition (TOR) is highly important for environmental perception of robots. The previous works usually utilize single scale convolution which cannot simultaneously extract local and global spatiotemporal features of tactile data, which leads to low accuracy in TOR task. To address above problem, this article proposes a local and global residual (LGR-18) network which is mainly consisted of multiple local and global convolution (LGC) blocks. An LGC block contains two pairs of local convolution (LC) and global convolution (GC) modules. The LC module mainly utilizes a temporal shift operation and a 2D convolution layer to extract local spatiotemporal features. The GC module extracts global spatiotemporal features by fusing multiple 1D and 2D convolutions which can expand the receptive field in temporal and spatial dimensions. Consequently, our LGR-18 network can extract local-global spatiotemporal features without using 3D convolutions which usually require a large number of parameters. The effectiveness of LC module, GC module and LGC block is verified by ablation studies. Quantitative comparisons with state-of-the-art methods reveal the excellent capability of our method.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"27 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3D hand pose and mesh estimation via a generic Topology-aware Transformer model
Pub Date: 2024-05-03 | DOI: 10.3389/fnbot.2024.1395652
Shaoqi Yu, Yintong Wang, Lili Chen, Xiaolin Zhang, Jiamao Li
In Human-Robot Interaction (HRI), accurate 3D hand pose and mesh estimation are critically important. However, inferring reasonable and accurate poses under severe self-occlusion and high self-similarity remains an inherent challenge. To alleviate the ambiguity caused by invisible and similar joints during HRI, we propose a new Topology-aware Transformer network named HandGCNFormer, which takes a depth image as input and incorporates prior knowledge of hand kinematic topology into the network while modeling long-range contextual information. Specifically, we propose a novel Graphformer decoder with an additional Node-offset Graph Convolutional layer (NoffGConv). The Graphformer decoder optimizes the synergy between the Transformer and the GCN, capturing long-range dependencies and local topological connections between joints. On top of that, we replace the standard MLP prediction head with a novel Topology-aware head to better exploit local topological constraints for more reasonable and accurate poses. Our method achieves state-of-the-art 3D hand pose estimation performance on four challenging datasets: Hands2017, NYU, ICVL, and MSRA. To further demonstrate the effectiveness and scalability of our proposed Graphformer decoder and Topology-aware head, we extend our framework to HandGCNFormer-Mesh for the 3D hand mesh estimation task. The extended framework efficiently integrates a shape regressor with the original Graphformer decoder and Topology-aware head, producing MANO parameters. The results on the HO-3D dataset, which contains varied and challenging occlusions, show that HandGCNFormer-Mesh achieves competitive results compared with previous state-of-the-art 3D hand mesh estimation methods.
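A speculative PyTorch sketch in the spirit of the node-offset graph convolution named above: aggregate joint features over the hand skeleton's adjacency, then add a per-node offset branch that refines each joint embedding independently. The actual NoffGConv layer may differ; the adjacency normalization and residual wiring here are assumptions.

```python
import torch
import torch.nn as nn

class NodeOffsetGraphConv(nn.Module):
    def __init__(self, dim: int, adjacency: torch.Tensor):
        """adjacency: (num_joints, num_joints) hand-skeleton connectivity."""
        super().__init__()
        deg = adjacency.sum(dim=1, keepdim=True).clamp(min=1)
        self.register_buffer("adj_norm", adjacency / deg)  # row-normalised A
        self.aggregate = nn.Linear(dim, dim)                # neighbour mixing
        self.offset = nn.Linear(dim, dim)                   # per-node refinement

    def forward(self, x):                  # x: (B, num_joints, dim)
        neighbor = torch.einsum("ij,bjd->bid", self.adj_norm, x)
        # graph aggregation + residual identity + learned node-wise offset
        return self.aggregate(neighbor) + x + self.offset(x)
```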
{"title":"3D hand pose and mesh estimation via a generic Topology-aware Transformer model","authors":"Shaoqi Yu, Yintong Wang, Lili Chen, Xiaolin Zhang, Jiamao Li","doi":"10.3389/fnbot.2024.1395652","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1395652","url":null,"abstract":"In Human-Robot Interaction (HRI), accurate 3D hand pose and mesh estimation hold critical importance. However, inferring reasonable and accurate poses in severe self-occlusion and high self-similarity remains an inherent challenge. In order to alleviate the ambiguity caused by invisible and similar joints during HRI, we propose a new Topology-aware Transformer network named HandGCNFormer with depth image as input, incorporating prior knowledge of hand kinematic topology into the network while modeling long-range contextual information. Specifically, we propose a novel Graphformer decoder with an additional Node-offset Graph Convolutional layer (NoffGConv). The Graphformer decoder optimizes the synergy between the Transformer and GCN, capturing long-range dependencies and local topological connections between joints. On top of that, we replace the standard MLP prediction head with a novel Topology-aware head to better exploit local topological constraints for more reasonable and accurate poses. Our method achieves state-of-the-art 3D hand pose estimation performance on four challenging datasets, including Hands2017, NYU, ICVL, and MSRA. To further demonstrate the effectiveness and scalability of our proposed Graphformer Decoder and Topology aware head, we extend our framework to HandGCNFormer-Mesh for the 3D hand mesh estimation task. The extended framework efficiently integrates a shape regressor with the original Graphformer Decoder and Topology aware head, producing Mano parameters. The results on the HO-3D dataset, which contains various and challenging occlusions, show that our HandGCNFormer-Mesh achieves competitive results compared to previous state-of-the-art 3D hand mesh estimation methods.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"45 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Curiosity model policy optimization for robotic manipulator tracking control with input saturation in uncertain environment
Pub Date: 2024-05-01 | DOI: 10.3389/fnbot.2024.1376215
Tu Wang, Fujie Wang, Zhongye Xie, Feiyan Qin
In uncertain environments with robot input saturation, both model-based reinforcement learning (MBRL) and traditional controllers struggle to perform control tasks optimally. In this study, an algorithmic framework called Curiosity Model Policy Optimization (CMPO) is proposed by combining curiosity with a model-based approach, where tracking errors are reduced by training agents on the control gains of traditional model-free controllers. First, a metric for judging positive and negative curiosity is proposed. Constrained optimization is employed to update the curiosity ratio, which improves the efficiency of agent training. Next, the novelty distance buffer ratio is defined to reduce the bias between the environment and the model. Finally, CMPO is simulated against traditional controllers and baseline MBRL algorithms in a robotic environment designed with non-linear rewards. The experimental results illustrate that the algorithm achieves superior tracking performance and generalization capabilities.
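A hedged sketch of the curiosity bookkeeping described above: curiosity is scored as the prediction error of a learned dynamics model, judged positive when the error falls in a moderate range (worth exploring) and negative when it explodes (likely noise), and the curiosity ratio is updated under a simple clipping constraint. The thresholds and update rule are illustrative assumptions, not the paper's constrained optimizer.

```python
import numpy as np

def curiosity_sign(pred_next, true_next, low=0.01, high=1.0):
    """Positive curiosity for moderate model error, negative for extreme error."""
    error = float(np.linalg.norm(np.asarray(pred_next) - np.asarray(true_next)))
    return (1 if low <= error <= high else -1), error

def update_curiosity_ratio(ratio, sign, step=0.05, lo=0.0, hi=0.5):
    """Nudge the intrinsic-reward weight, clipped to a feasible interval."""
    return float(np.clip(ratio + sign * step, lo, hi))

# Shaped reward (assumed form): extrinsic + ratio * error when sign > 0,
# otherwise the extrinsic reward alone.
```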
{"title":"Curiosity model policy optimization for robotic manipulator tracking control with input saturation in uncertain environment","authors":"Tu Wang, Fujie Wang, Zhongye Xie, Feiyan Qin","doi":"10.3389/fnbot.2024.1376215","DOIUrl":"https://doi.org/10.3389/fnbot.2024.1376215","url":null,"abstract":"In uncertain environments with robot input saturation, both model-based reinforcement learning (MBRL) and traditional controllers struggle to perform control tasks optimally. In this study, an algorithmic framework of Curiosity Model Policy Optimization (CMPO) is proposed by combining curiosity and model-based approach, where tracking errors are reduced via training agents on control gains for traditional model-free controllers. To begin with, a metric for judging positive and negative curiosity is proposed. Constrained optimization is employed to update the curiosity ratio, which improves the efficiency of agent training. Next, the novelty distance buffer ratio is defined to reduce bias between the environment and the model. Finally, CMPO is simulated with traditional controllers and baseline MBRL algorithms in the robotic environment designed with non-linear rewards. The experimental results illustrate that the algorithm achieves superior tracking performance and generalization capabilities.","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"12 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}