Pub Date: 2024-10-09. eCollection Date: 2024-01-01. DOI: 10.3389/fnbot.2024.1396359
Grgur Kovač, Rémy Portelas, Peter Ford Dominey, Pierre-Yves Oudeyer
Developmental psychologists have long established socio-cognitive abilities as fundamental to human intelligence and development. These abilities enable individuals to enter, learn from, and contribute to a surrounding culture. This drives the process of cumulative cultural evolution, which is responsible for humanity's most remarkable achievements. AI research on social interactive agents mostly concerns the emergence of culture in a multi-agent setting (often without a strong grounding in developmental psychology). We argue that AI research should be informed by psychology and should also study the socio-cognitive abilities that enable an individual to enter a culture. We draw inspiration from the work of Michael Tomasello and Jerome Bruner, who studied socio-cognitive development and emphasized the influence of a cultural environment on intelligence. We outline a broader set of concepts than those currently studied in AI to provide a foundation for research in artificial social intelligence. These concepts include social cognition (joint attention, perspective taking), communication, social learning, formats, and scaffolding. To facilitate research in this domain, we present the SocialAI school, a tool that offers a customizable, parameterized suite of procedurally generated environments and simplifies experimentation with the introduced concepts. These environments can be used with both multimodal RL agents and pure-text Large Language Models (LLMs) as interactive agents. Through a series of case studies, we demonstrate the versatility of the SocialAI school for studying both RL and LLM-based agents. Our motivation is to engage the AI community around social intelligence informed by developmental psychology, and to provide a user-friendly resource and tool for initial investigations in this direction. Refer to the project website for code and additional resources: https://sites.google.com/view/socialai-school.
Title: The SocialAI school: a framework leveraging developmental psychology toward artificial socio-cultural agents
Journal: Frontiers in Neurorobotics, vol. 18, article 1396359. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11496287/pdf/
With the development of intelligent manufacturing technology, robots have become more widespread in the field of milling. When milling difficult-to-machine alloy materials, localized high temperature and a large temperature gradient at the front face of the tool lead to shortened tool life and poor machining quality. Existing temperature-field reconstruction methods rely on numerous assumptions, are computationally expensive, and take a long time to solve. In this paper, an inverse heat conduction problem solution model based on a gated convolutional recurrent neural network (CNN-GRU) is proposed for reconstructing the temperature field of the tool during milling. To ensure both the speed and accuracy of the reconstruction, the model is compressed and accelerated with knowledge distillation (KD), which substantially reduces training time at a small cost in optimality while preserving the accuracy and efficiency of the prediction model. With different levels of random noise added to the model input data, CNN-GRU + KD remains noise-resistant and shows good robustness and stability. Temperature-field reconstruction of the milling tool was carried out for three different working conditions; the goodness of fit reached up to 0.97 and the root mean square error was as low as 1.43°C. The experimental results show that the model is feasible and effective for reconstructing the milling-tool temperature field and is of great significance for improving the accuracy of milling robots.
Pub Date: 2024-09-27. DOI: 10.3389/fnbot.2024.1448482
Title: Fast reconstruction of milling temperature field based on CNN-GRU machine learning models
Authors: Fengyuan Ma, Haoyu Wang, Mingfeng E, Zhongjin Sha, Xingshu Wang, Yunxian Cui, Junwei Yin
Journal: Frontiers in Neurorobotics, vol. 18, article 1448482. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11466942/pdf/
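The knowledge distillation step this entry relies on is not spelled out in the abstract, but the standard KD objective it names can be sketched: the student is trained to match the teacher's temperature-softened output distribution. Everything below (logit values, the temperature T=4, array shapes) is illustrative, not the paper's implementation:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax along the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation loss: T^2 * mean KL(teacher || student)."""
    p = softmax(teacher_logits, T)  # softened teacher distribution
    q = softmax(student_logits, T)  # softened student distribution
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()

# Toy batch: a small student mimicking a larger teacher's outputs
teacher = np.array([[2.0, 0.5, -1.0], [0.1, 1.8, -0.3]])
student = np.array([[1.5, 0.4, -0.8], [0.0, 1.5, -0.2]])
loss = kd_loss(student, teacher)
```

In practice this soft-target term is combined with the ordinary supervised loss on ground-truth temperatures; the T^2 factor keeps gradient magnitudes comparable as T grows.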
Pub Date: 2024-09-25. eCollection Date: 2024-01-01. DOI: 10.3389/fnbot.2024.1471327
Yaolei Zhang, Fei Zhang, Yuanli Zhou, Xiao Xu
The advancements in intelligent action recognition can be instrumental in developing autonomous robotic systems capable of analyzing complex human activities in real-time, contributing to the growing field of robotics that operates in dynamic environments. The precise recognition of basketball players' actions using artificial intelligence technology can provide valuable assistance and guidance to athletes, coaches, and analysts, and can help referees make fairer decisions during games. However, unlike action recognition in simpler scenarios, basketball scenes have complex and highly similar backgrounds, subtle differences between actions, and inconsistent lighting conditions, making action recognition in basketball a challenging task. To address this problem, an Adaptive Context-Aware Network (ACA-Net) for basketball player action recognition is proposed in this paper. It contains a Long Short-term Adaptive (LSTA) module and a Triplet Spatial-Channel Interaction (TSCI) module to extract effective features at the temporal, spatial, and channel levels. The LSTA module adaptively learns global and local temporal features of the video. The TSCI module enhances the feature representation by learning the interaction features between space and channels. We conducted extensive experiments on the popular basketball action recognition datasets SpaceJam and Basketball-51. The results show that ACA-Net outperforms the current mainstream methods, achieving 89.26% and 92.05% classification accuracy on the two datasets, respectively. ACA-Net's adaptable architecture also holds potential for real-world applications in autonomous robotics, where accurate recognition of complex human actions in unstructured environments is crucial for tasks such as automated game analysis, player performance evaluation, and enhanced interactive broadcasting experiences.
Title: ACA-Net: adaptive context-aware network for basketball action recognition
Journal: Frontiers in Neurorobotics, vol. 18, article 1471327. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461453/pdf/
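The abstract describes channel-level feature reweighting without giving the TSCI module's internals. As a loose analogue only (not the authors' module), a squeeze-and-excitation-style channel attention illustrates the general idea: pool each channel to a scalar, pass the pooled vector through a small bottleneck MLP, and rescale the channels. The weight shapes and toy feature map below are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feature_map, w1, w2):
    """SE-style channel attention: global-average-pool each channel,
    run the pooled vector through a bottleneck MLP, rescale channels."""
    # feature_map: (channels, height, width)
    squeeze = feature_map.mean(axis=(1, 2))               # (C,)
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))  # (C,) gates in (0,1)
    return feature_map * excite[:, None, None]

rng = np.random.default_rng(0)
fmap = rng.normal(size=(4, 8, 8))  # toy feature map, 4 channels
w1 = rng.normal(size=(2, 4))       # bottleneck: reduce 4 -> 2
w2 = rng.normal(size=(4, 2))       # bottleneck: expand 2 -> 4
out = channel_attention(fmap, w1, w2)
```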
Pub Date: 2024-09-24. eCollection Date: 2024-01-01. DOI: 10.3389/fnbot.2024.1452019
He Chen, Xiaoyu Yue
Introduction: Currently, using machine learning methods for precise analysis and improvement of swimming techniques holds significant research value and application prospects. The existing machine learning methods have improved the accuracy of action recognition to some extent. However, they still face several challenges such as insufficient data feature extraction, limited model generalization ability, and poor real-time performance.
Methods: To address these issues, this paper proposes an innovative approach called Swimtrans Net: a multimodal robotic system for swimming action recognition driven via Swin-Transformer. By leveraging the powerful visual feature extraction capabilities of Swin-Transformer, Swimtrans Net effectively extracts swimming image information. Additionally, to meet the requirements of multimodal tasks, we integrate the CLIP model into the system. Swin-Transformer serves as the image encoder for CLIP, and through fine-tuning, the CLIP model becomes capable of understanding and interpreting swimming action data, learning the features and patterns associated with swimming. Finally, we introduce transfer learning for pre-training to reduce training time and computational resource requirements, thereby providing real-time feedback to swimmers.
Results and discussion: Experimental results show that Swimtrans Net has achieved a 2.94% improvement over the current state-of-the-art methods in swimming motion analysis and prediction, making significant progress. This study introduces an innovative machine learning method that can help coaches and swimmers better understand and improve swimming techniques, ultimately improving swimming performance.
Title: Swimtrans Net: a multimodal robotic system for swimming action recognition driven via Swin-Transformer
Journal: Frontiers in Neurorobotics, vol. 18, article 1452019. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11458561/pdf/
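The entry above pairs a Swin-Transformer image encoder with CLIP. The CLIP-style matching step itself, scoring image embeddings against candidate text embeddings via temperature-scaled cosine similarity, can be sketched generically; the embedding size, temperature value, and random vectors below are placeholders, not the paper's data or weights:

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def clip_similarity(image_emb, text_emb, temperature=0.07):
    """CLIP-style matching: cosine similarity between L2-normalized
    image and text embeddings, softmaxed over text candidates."""
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    txt = l2_normalize(np.asarray(text_emb, dtype=float))
    logits = img @ txt.T / temperature        # (n_images, n_texts)
    logits -= logits.max(axis=-1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)  # rows are probability dists

# Hypothetical embeddings: 2 swimming frames vs. 3 action prompts
image_emb = np.random.default_rng(0).normal(size=(2, 8))
text_emb = np.random.default_rng(1).normal(size=(3, 8))
probs = clip_similarity(image_emb, text_emb)
```

In a real system the image rows would come from the (fine-tuned) Swin encoder and the text rows from CLIP's text encoder over action-label prompts.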
Pub Date: 2024-09-24. eCollection Date: 2024-01-01. DOI: 10.3389/fnbot.2024.1383089
Artur Pilacinski, Lukas Christ, Marius Boshoff, Ioannis Iossifidis, Patrick Adler, Michael Miro, Bernd Kuhlenkötter, Christian Klaes
Human activity recognition (HAR) and brain-machine interfaces (BMIs) are two emerging technologies that can enhance human-robot collaboration (HRC) in domains such as industry and healthcare. HAR uses sensors or cameras to capture and analyze human movements and actions, while BMI decodes action intentions from human brain signals. Both technologies face challenges that impact accuracy, reliability, and usability. In this article, we review state-of-the-art techniques and methods for HAR and BMI and highlight their strengths and limitations. We then propose a hybrid framework that fuses HAR and BMI data, integrating the complementary information from brain and body-motion signals to improve the performance of human state decoding. We also discuss our hybrid method's potential benefits and implications for HRC.
Title: Human in the collaborative loop: a strategy for integrating human activity recognition and non-invasive brain-machine interfaces to control collaborative robots
Journal: Frontiers in Neurorobotics, vol. 18, article 1383089. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11458527/pdf/
Pub Date: 2024-09-23. eCollection Date: 2024-01-01. DOI: 10.3389/fnbot.2024.1461403
Ding Feng, Dengao Li, Yu Zhou, Wei Wang
Introduction: Residential load forecasting is a challenging task due to random fluctuations caused by complex correlations and individual differences. Existing short-term load forecasting models usually introduce external influencing factors such as climate and date. However, this additional information not only adds computational burden to the model but also carries uncertainty. To address these issues, we propose a novel multi-level feature fusion model based on a graph attention temporal convolutional network (MLFGCN) for short-term residential load forecasting.
Methods: The proposed MLFGCN model fully considers both the potential long-term dependencies within a single load series and the correlations between multiple load series, and requires no additional information. A temporal convolutional network (TCN) with a gating mechanism is introduced to learn potential long-term dependencies in the original load series. In addition, we design two graph attentive convolutional modules to capture potential multi-level dependencies in the load data. Finally, the outputs of each module are fused through an information fusion layer to obtain highly accurate forecasts.
Results: We conduct validation experiments on two real-world datasets. The results show that the proposed MLFGCN model achieves 0.25, 7.58% and 0.50 for MAE, MAPE and RMSE, respectively. These values are significantly better than those of baseline models.
Discussion: The MLFGCN algorithm proposed in this paper can significantly improve the accuracy of short-term residential load forecasting. This is achieved through high-quality feature reconstruction, comprehensive information graph construction, and spatiotemporal feature capture.
Title: MLFGCN: short-term residential load forecasting via graph attention temporal convolution network
Journal: Frontiers in Neurorobotics, vol. 18, article 1461403. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11457015/pdf/
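The gated TCN named in the Methods section above typically follows the WaveNet-style pattern: a causal convolution passed through a tanh filter, multiplied by a sigmoid gate, plus a residual connection. A minimal one-channel sketch (the kernel length, random weights, and hourly-load framing are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def causal_conv1d(x, w):
    """Left-padded (causal) 1-D convolution: output[t] depends only on x[<=t]."""
    k = len(w)
    xp = np.concatenate([np.zeros(k - 1), x])
    return np.array([xp[t:t + k] @ w[::-1] for t in range(len(x))])

def gated_tcn_block(x, w_filter, w_gate):
    """Gated activation: tanh(filter conv) * sigmoid(gate conv), with a
    residual connection so the block refines rather than replaces x."""
    f = np.tanh(causal_conv1d(x, w_filter))
    g = 1.0 / (1.0 + np.exp(-causal_conv1d(x, w_gate)))
    return x + f * g

rng = np.random.default_rng(0)
load = rng.normal(size=24)  # toy series: one day of hourly load readings
out = gated_tcn_block(load, rng.normal(size=3), rng.normal(size=3))
```

Stacking such blocks with increasing dilation is what lets a TCN cover long-term dependencies without recurrence.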
Pub Date: 2024-09-11. DOI: 10.3389/fnbot.2024.1429952
Shaowen Cheng, Yongbin Jin, Yanhong Liang, Lei Jiang, Hongtao Wang
Robot control in complex and unpredictable scenarios presents challenges in adaptability, robustness, and human-robot interaction. These scenarios often require robots to handle unknown objects in unstructured environments with high levels of uncertainty. Traditional control methods, such as automatic control, may not be suitable due to their limited adaptability and reliance on prior knowledge. Human-in-the-loop methods face issues such as insufficient feedback, increased failure rates due to noise and delays, and a lack of operator immersion, preventing human-level performance. This study proposes a shared control framework that combines the advantages of teleoperation and automatic control to achieve a trade-off between efficiency and adaptability. We developed a linear model to compare the three control methods and analyzed the impact of position noise and communication delays on performance. A real-world implementation of the shared control system demonstrates its effectiveness in object grasping and manipulation tasks. The results suggest that shared control can significantly improve grasping efficiency while maintaining adaptability in task execution for practical robotics applications.
Title: An efficient grasping shared control architecture for unpredictable and unspecified tasks
Journal: Frontiers in Neurorobotics
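The abstract's linear comparison model is not given, but the core of shared control, blending a (possibly noisy) teleoperation command with an autonomous controller's command, can be sketched as a convex combination. The blending weight alpha, the noise model, and the toy 2-D velocity commands are assumptions for illustration:

```python
import numpy as np

def shared_control(u_human, u_auto, noise_std=0.0, alpha=0.5, rng=None):
    """Blend a teleoperation command with an autonomous command.
    alpha=1 is pure teleoperation, alpha=0 is pure automatic control."""
    rng = rng or np.random.default_rng()
    u_h = np.asarray(u_human, dtype=float)
    if noise_std > 0:  # simulate sensing/communication noise on the human side
        u_h = u_h + rng.normal(scale=noise_std, size=u_h.shape)
    return alpha * u_h + (1.0 - alpha) * np.asarray(u_auto, dtype=float)

# Toy 2-D end-effector velocity commands (hypothetical units)
u = shared_control([1.0, 0.0], [0.6, 0.2], alpha=0.5)
```

Sweeping alpha (and noise_std) in such a model is one simple way to study the efficiency-vs-adaptability trade-off the entry describes.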
Pub Date: 2024-09-11. DOI: 10.3389/fnbot.2024.1442080
Ziang Du, Xia Ye, Pujie Zhao
Physiological signal recognition is crucial in emotion recognition, and recent advancements in multi-modal fusion have enabled the integration of various physiological signals for improved recognition tasks. However, current models for emotion recognition from highly complex multi-modal signals are limited by their fusion methods and insufficient attention mechanisms, preventing further gains in classification performance. To address these challenges, we propose a new model framework named Signal Channel Attention Network (SCA-Net), which comprises three main components: an encoder, an attention fusion module, and a decoder. In the attention fusion module, we developed five types of attention mechanisms inspired by existing research and performed comparative experiments on the public dataset MAHNOB-HCI. All of these experiments demonstrate that the attention modules added to our baseline model improve both accuracy and F1 score. We also conducted ablation experiments within the most effective attention fusion module to verify the benefits of multi-modal fusion. Additionally, we adjusted the training process for the different attention fusion modules by employing different early-stopping parameters to prevent overfitting.
Title: A novel signal channel attention network for multi-modal emotion recognition
Pub Date : 2024-09-02 DOI: 10.3389/fnbot.2024.1451055
Yanchun Xie, Xue Zhao, Yang Jiang, Yao Wu, Hailong Yu
This paper introduces flexible control and trajectory planning for medical two-arm surgical robots, and employs effective collision detection methods to ensure safety and precision during tasks. First, the Denavit-Hartenberg (DH) method is employed to establish relative rotation matrices between coordinate systems, determining the relative relationships of each joint link. A neural network based on a multilayer perceptron is proposed to solve the forward kinematics problem (FKP) in real time. Second, a universal interpolator based on Non-Uniform Rational B-Splines (NURBS) is developed, capable of handling any geometric shape to ensure smooth and flexible motion trajectories. Finally, we developed a generalized momentum observer to detect external collisions, eliminating the need for external sensors and thereby reducing mechanical complexity and cost. The experiments verify the effectiveness of the kinematics solution and trajectory planning, demonstrating that the improved momentum torque observer can significantly reduce system overshoot, enabling the two-arm surgical robot to perform precise and safe surgical tasks under algorithmic guidance.
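For readers unfamiliar with the DH method the abstract relies on, the standard per-link transform and its chaining into forward kinematics can be sketched as follows. This is a generic textbook construction, not the paper's code; the two-link parameters at the end are an invented example:

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Standard Denavit-Hartenberg homogeneous transform for one link."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(dh_params):
    """Chain per-joint DH transforms into the end-effector pose."""
    T = np.eye(4)
    for theta, d, a, alpha in dh_params:
        T = T @ dh_transform(theta, d, a, alpha)
    return T

# Hypothetical two-link planar arm: first joint rotated 90 degrees,
# both links of unit length, all in one plane.
params = [(np.pi / 2, 0.0, 1.0, 0.0), (0.0, 0.0, 1.0, 0.0)]
T = forward_kinematics(params)
print(np.round(T[:3, 3], 3))  # end-effector position, ~ (0, 2, 0)
```

The paper's multilayer-perceptron solver would approximate this chained map (or its real-time evaluation for a more complex arm); the closed-form chain above is what such a network is trained against.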
Title: Flexible control and trajectory planning of medical two-arm surgical robot
Pub Date : 2024-08-30 DOI: 10.3389/fnbot.2024.1448538
Muhammad Ovais Yusuf, Muhammad Hanzla, Naif Al Mudawi, Touseef Sadiq, Bayan Alabdullah, Hameedur Rahman, Asaad Algarni
Introduction: Advanced traffic monitoring systems face significant challenges in vehicle detection and classification. Conventional methods often require substantial computational resources and struggle to adapt to diverse data collection methods.
Methods: This research introduces an innovative technique for classifying and recognizing vehicles in aerial image sequences. The proposed model encompasses several phases, starting with image enhancement through noise reduction and Contrast Limited Adaptive Histogram Equalization (CLAHE). Following this, contour-based segmentation and Fuzzy C-means (FCM) segmentation are applied to identify foreground objects. Vehicle detection and identification are performed using EfficientDet. For feature extraction, Accelerated KAZE (AKAZE), Oriented FAST and Rotated BRIEF (ORB), and Scale Invariant Feature Transform (SIFT) are utilized. Object classification is achieved through a Convolutional Neural Network (CNN) and a ResNet (Residual Network).
Results: The proposed method demonstrates improved performance over previous approaches. Experiments on the Vehicle Aerial Imagery from a Drone (VAID) and Unmanned Aerial Vehicle Intruder Dataset (UAVID) datasets reveal that the model achieves an accuracy of 96.6% on UAVID and 97% on VAID.
Discussion: The results indicate that the proposed model significantly enhances vehicle detection and classification in aerial images, surpassing existing methods and offering notable improvements for traffic monitoring systems.
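Of the pipeline stages above, Fuzzy C-means is the one with a compact closed-form update, so here is a minimal NumPy sketch of the standard FCM iteration (alternating center and membership updates). This is the generic algorithm on toy 1-D "intensity" data, not the paper's implementation or parameters:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=50, seed=0):
    """Minimal Fuzzy C-means clustering sketch.

    X: (n, d) data; c: number of clusters; m: fuzzifier (> 1).
    Returns (centers, memberships) with memberships shaped (n, c).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)               # rows sum to 1
    for _ in range(iters):
        Um = U ** m
        # Centers: membership-weighted means of the data.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Memberships: inverse-distance update of the standard FCM objective.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)              # guard against div-by-zero
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# Two well-separated toy blobs standing in for foreground/background pixels.
X = np.array([[0.0], [0.2], [0.1], [10.0], [10.2], [9.9]])
centers, U = fuzzy_c_means(X)
print(np.sort(centers.ravel()))  # centers near the two blob means
```

In the paper's setting the same update would run over pixel features after CLAHE enhancement, with soft memberships thresholded to separate foreground vehicles from background.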
Title: Target detection and classification via EfficientDet and CNN over unmanned aerial vehicles