Pub Date: 2024-12-13 | eCollection Date: 2024-01-01 | DOI: 10.3389/fnbot.2024.1513488
Lv Yongyin, Yu Caixia
Introduction: Segmentation tasks in computer vision play a crucial role in various applications, ranging from object detection to medical imaging and cultural heritage preservation. Traditional approaches, including convolutional neural networks (CNNs) and standard transformer-based models, have achieved significant success; however, they often face challenges in capturing fine-grained details and maintaining efficiency across diverse datasets. These methods struggle with balancing precision and computational efficiency, especially when dealing with complex patterns and high-resolution images.
Methods: To address these limitations, we propose a novel segmentation model that integrates a hierarchical vision transformer backbone with multi-scale self-attention, cascaded attention decoding, and diffusion-based robustness enhancement. Our approach aims to capture both local details and global contexts effectively while maintaining lower computational overhead.
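For intuition, a generic multi-scale self-attention block over a hierarchical feature map might be sketched as below in PyTorch; the module name, pooling scales, and residual fusion are assumptions for illustration, not the authors' implementation.

```python
# Minimal, illustrative sketch (not the paper's code) of multi-scale self-attention
# over one stage of a hierarchical vision-transformer feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSelfAttention(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attn = nn.ModuleList(
            [nn.MultiheadAttention(channels, num_heads, batch_first=True) for _ in scales]
        )
        self.proj = nn.Conv2d(channels * len(scales), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from one backbone stage
        b, c, h, w = x.shape
        outputs = []
        for scale, attn in zip(self.scales, self.attn):
            # Pool to a coarser grid so attention covers a wider effective context.
            xs = F.adaptive_avg_pool2d(x, (max(h // scale, 1), max(w // scale, 1)))
            hs, ws = xs.shape[-2:]
            tokens = xs.flatten(2).transpose(1, 2)             # (B, hs*ws, C)
            out, _ = attn(tokens, tokens, tokens)              # self-attention at this scale
            out = out.transpose(1, 2).reshape(b, c, hs, ws)
            outputs.append(F.interpolate(out, size=(h, w), mode="bilinear", align_corners=False))
        return self.proj(torch.cat(outputs, dim=1)) + x        # residual fusion of all scales

# Usage: fused = MultiScaleSelfAttention(256)(torch.randn(2, 256, 64, 64))
```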
Results and discussion: Experiments conducted on four diverse datasets, including Ancient Architecture, MS COCO, Cityscapes, and ScanNet, demonstrate that our model outperforms state-of-the-art methods in accuracy, recall, and computational efficiency. The results highlight the model's ability to generalize well across different tasks and provide robust segmentation, even in challenging scenarios. Our work paves the way for more efficient and precise segmentation techniques, making it valuable for applications where both detail and speed are critical.
{"title":"Cross-attention swin-transformer for detailed segmentation of ancient architectural color patterns.","authors":"Lv Yongyin, Yu Caixia","doi":"10.3389/fnbot.2024.1513488","DOIUrl":"10.3389/fnbot.2024.1513488","url":null,"abstract":"<p><strong>Introduction: </strong>Segmentation tasks in computer vision play a crucial role in various applications, ranging from object detection to medical imaging and cultural heritage preservation. Traditional approaches, including convolutional neural networks (CNNs) and standard transformer-based models, have achieved significant success; however, they often face challenges in capturing fine-grained details and maintaining efficiency across diverse datasets. These methods struggle with balancing precision and computational efficiency, especially when dealing with complex patterns and high-resolution images.</p><p><strong>Methods: </strong>To address these limitations, we propose a novel segmentation model that integrates a hierarchical vision transformer backbone with multi-scale self-attention, cascaded attention decoding, and diffusion-based robustness enhancement. Our approach aims to capture both local details and global contexts effectively while maintaining lower computational overhead.</p><p><strong>Results and discussion: </strong>Experiments conducted on four diverse datasets, including Ancient Architecture, MS COCO, Cityscapes, and ScanNet, demonstrate that our model outperforms state-of-the-art methods in accuracy, recall, and computational efficiency. The results highlight the model's ability to generalize well across different tasks and provide robust segmentation, even in challenging scenarios. Our work paves the way for more efficient and precise segmentation techniques, making it valuable for applications where both detail and speed are critical.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1513488"},"PeriodicalIF":2.6,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11707421/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142947470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-10 | eCollection Date: 2024-01-01 | DOI: 10.3389/fnbot.2024.1485640
Xiaoguang Li, Yaqi Chu, Xuejian Wu
Non-invasive brain-computer interfaces (BCIs) hold great promise in the field of neurorehabilitation: they are easy to use and do not require surgery, particularly those based on motor imagery electroencephalography (EEG). However, motor imagery EEG signals often have a low signal-to-noise ratio and limited spatial and temporal resolution. Traditional deep neural networks typically focus only on the spatial and temporal features of EEG, resulting in relatively low decoding accuracy for motor imagery tasks. To address these challenges, this paper proposes a 3D convolutional neural network (P-3DCNN) decoding method that jointly learns spatial-frequency feature maps from the frequency and spatial domains of the EEG signals. First, the Welch method is used to calculate the frequency-band power spectrum of the EEG, and a 2D matrix representing the spatial topology of the electrodes is constructed; the spatial-frequency representations are then generated through cubic interpolation. Next, a 3D CNN with 1D and 2D convolutional layers in series is designed to optimize the convolutional kernel parameters and effectively learn the spatial-frequency features of the EEG. Batch normalization and dropout are also applied to improve the training speed and classification performance of the network. Finally, the proposed method is compared experimentally with a range of classic machine learning and deep learning techniques. The results show an average decoding accuracy of 86.69%, surpassing other advanced networks, which demonstrates the effectiveness of our approach in decoding motor imagery EEG and offers valuable insights for the development of BCIs.
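The spatial-frequency "pictures" described above can be illustrated with a rough sketch (not the paper's code): Welch band power is computed per channel, scattered onto assumed 2D electrode positions, and cubically interpolated onto a dense grid. The toy electrode layout and the mu/beta band edges below are assumptions.

```python
# Illustrative sketch of building spatial-frequency maps from EEG epochs.
import numpy as np
from scipy.signal import welch
from scipy.interpolate import griddata

def band_power(eeg, fs, band):
    """eeg: (n_channels, n_samples); returns mean Welch power in `band` per channel."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs)            # psd: (n_channels, n_freqs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[:, mask].mean(axis=1)

def topo_map(values, positions, grid_size=32):
    """Scatter per-channel values at 2D electrode positions and interpolate (cubic)."""
    xi = np.linspace(0, 1, grid_size)
    gx, gy = np.meshgrid(xi, xi)
    return griddata(positions, values, (gx, gy), method="cubic", fill_value=0.0)

if __name__ == "__main__":
    fs, n_ch = 250, 22
    rng = np.random.default_rng(0)
    eeg = rng.standard_normal((n_ch, 4 * fs))             # 4 s of toy EEG
    positions = rng.uniform(0.1, 0.9, size=(n_ch, 2))     # assumed electrode (x, y) layout
    mu = topo_map(band_power(eeg, fs, (8, 13)), positions)     # mu-band image
    beta = topo_map(band_power(eeg, fs, (13, 30)), positions)  # beta-band image
    pictures = np.stack([mu, beta])                        # stacked inputs for a 3D CNN
    print(pictures.shape)                                  # (2, 32, 32)
```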
{"title":"3D convolutional neural network based on spatial-spectral feature pictures learning for decoding motor imagery EEG signal.","authors":"Xiaoguang Li, Yaqi Chu, Xuejian Wu","doi":"10.3389/fnbot.2024.1485640","DOIUrl":"10.3389/fnbot.2024.1485640","url":null,"abstract":"<p><p>Non-invasive brain-computer interfaces (BCI) hold great promise in the field of neurorehabilitation. They are easy to use and do not require surgery, particularly in the area of motor imagery electroencephalography (EEG). However, motor imagery EEG signals often have a low signal-to-noise ratio and limited spatial and temporal resolution. Traditional deep neural networks typically only focus on the spatial and temporal features of EEG, resulting in relatively low decoding and accuracy rates for motor imagery tasks. To address these challenges, this paper proposes a 3D Convolutional Neural Network (P-3DCNN) decoding method that jointly learns spatial-frequency feature maps from the frequency and spatial domains of the EEG signals. First, the Welch method is used to calculate the frequency band power spectrum of the EEG, and a 2D matrix representing the spatial topology distribution of the electrodes is constructed. These spatial-frequency representations are then generated through cubic interpolation of the temporal EEG data. Next, the paper designs a 3DCNN network with 1D and 2D convolutional layers in series to optimize the convolutional kernel parameters and effectively learn the spatial-frequency features of the EEG. Batch normalization and dropout are also applied to improve the training speed and classification performance of the network. Finally, through experiments, the proposed method is compared to various classic machine learning and deep learning techniques. The results show an average decoding accuracy rate of 86.69%, surpassing other advanced networks. This demonstrates the effectiveness of our approach in decoding motor imagery EEG and offers valuable insights for the development of BCI.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1485640"},"PeriodicalIF":2.6,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11667157/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142885708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-04 | eCollection Date: 2024-01-01 | DOI: 10.3389/fnbot.2024.1481297
Xiaoxia Xie, Yuan Jia, Tiande Ma
User perception of mobile games is crucial for improving the user experience and thus enhancing game profitability. However, the sparse data captured in games can lead to unstable model performance. This paper proposes a new method, the balanced graph factorization machine (BGFM), which builds on existing algorithms while accounting for data imbalance and important high-dimensional features. The data categories are first balanced by Borderline-SMOTE oversampling, and the features are then represented naturally in a graph-structured way. A highlight is that BGFM contains interaction mechanisms for aggregating beneficial features, with the results represented as edges in the graph. Next, BGFM combines factorization machine (FM) and graph neural network strategies to concatenate sequential feature interactions in the graph, using an attention mechanism that assigns inter-feature weights. Experiments were conducted on the collected game perception dataset, and the performance of the proposed BGFM was compared with that of eight state-of-the-art models; BGFM significantly surpassed all of them in AUC, precision, recall, and F-measure.
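The class-balancing step can be illustrated with imbalanced-learn's Borderline-SMOTE, as in the sketch below; the toy dataset and parameters are assumptions, and the graph construction and attention components of BGFM are not shown.

```python
# Minimal sketch of Borderline-SMOTE oversampling for an imbalanced dataset.
import numpy as np
from collections import Counter
from imblearn.over_sampling import BorderlineSMOTE

rng = np.random.default_rng(42)
# Toy imbalanced data: 900 negative vs. 100 positive samples with 16 features.
X = rng.standard_normal((1000, 16))
y = np.array([0] * 900 + [1] * 100)

sampler = BorderlineSMOTE(random_state=42, k_neighbors=5)
X_res, y_res = sampler.fit_resample(X, y)

print(Counter(y))      # before: Counter({0: 900, 1: 100})
print(Counter(y_res))  # after: approximately balanced classes
```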
{"title":"An improved graph factorization machine based on solving unbalanced game perception.","authors":"Xiaoxia Xie, Yuan Jia, Tiande Ma","doi":"10.3389/fnbot.2024.1481297","DOIUrl":"10.3389/fnbot.2024.1481297","url":null,"abstract":"<p><p>The user perception of mobile game is crucial for improving user experience and thus enhancing game profitability. The sparse data captured in the game can lead to sporadic performance of the model. This paper proposes a new method, the balanced graph factorization machine (BGFM), based on existing algorithms, considering the data imbalance and important high-dimensional features. The data categories are first balanced by Borderline-SMOTE oversampling, and then features are represented naturally in a graph-structured way. The highlight is that the BGFM contains interaction mechanisms for aggregating beneficial features. The results are represented as edges in the graph. Next, BGFM combines factorization machine (FM) and graph neural network strategies to concatenate any sequential feature interactions of features in the graph with an attention mechanism that assigns inter-feature weights. Experiments were conducted on the collected game perception dataset. The performance of proposed BGFM was compared with eight state-of-the-art models, significantly surpassing all of them by AUC, precision, recall, and F-measure indices.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1481297"},"PeriodicalIF":2.6,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11652536/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142853907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-04 | eCollection Date: 2024-01-01 | DOI: 10.3389/fnbot.2024.1443678
Yawar Abbas, Naif Al Mudawi, Bayan Alabdullah, Touseef Sadiq, Asaad Algarni, Hameedur Rahman, Ahmad Jalal
Introduction: Recognizing human actions is crucial for allowing machines to understand and interpret human behavior, with applications spanning video-based surveillance systems, human-robot collaboration, sports analysis systems, and entertainment. The immense diversity in human movement and appearance poses a significant challenge in this field, especially when dealing with drone-recorded (RGB) videos. Factors such as dynamic backgrounds, motion blur, occlusions, varying video capture angles, and exposure issues greatly complicate recognition tasks.
Methods: In this study, we suggest a method that addresses these challenges in RGB videos captured by drones. Our approach begins by segmenting the video into individual frames, followed by preprocessing steps applied to these RGB frames. The preprocessing aims to reduce computational costs, optimize image quality, and enhance foreground objects while removing the background.
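As a rough illustration of this preprocessing stage (not the authors' exact pipeline), the sketch below reads drone video frames with OpenCV and suppresses the background with a MOG2 subtractor; the video path and morphology kernel are assumptions.

```python
# Sketch: extract frames from a drone video and keep background-suppressed foreground.
import cv2

def extract_foreground_frames(video_path="drone_clip.mp4", max_frames=100):
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    frames = []
    while cap.isOpened() and len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)                        # foreground mask
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove speckle noise
        foreground = cv2.bitwise_and(frame, frame, mask=mask)  # background removed
        frames.append(foreground)
    cap.release()
    return frames
```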
Result: This results in improved visibility of foreground objects while eliminating background noise. Next, we employ the YOLOv9 detection algorithm to identify human bodies within the images. From the grayscale silhouette, we extract the human skeleton and identify 15 key landmarks: the head, neck, belly button, and the left and right shoulders, elbows, wrists, hips, knees, and ankles. From these points we extract specific positions, angular and distance relationships between them, as well as 3D point clouds and fiducial points. Subsequently, we optimize these features using the kernel discriminant analysis (KDA) optimizer, followed by classification with a convolutional neural network (CNN). To validate our system, we conducted experiments on three benchmark datasets: UAV-Human, UCF, and Drone-Action.
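The distance and angle descriptors can be sketched as follows; the keypoint indices and the chosen angle triplet are assumptions rather than the authors' exact feature set.

```python
# Illustrative geometric features from 15 skeleton keypoints: pairwise distances and joint angles.
import numpy as np

def pairwise_distances(keypoints: np.ndarray) -> np.ndarray:
    """keypoints: (15, 2) pixel coordinates; returns the 105 unique pairwise distances."""
    diffs = keypoints[:, None, :] - keypoints[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(keypoints), k=1)
    return dists[iu]

def joint_angle(a, b, c) -> float:
    """Angle (radians) at point b formed by segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return float(np.arccos(np.clip(cosang, -1.0, 1.0)))

kp = np.random.rand(15, 2)               # toy skeleton (head, neck, shoulders, ...)
features = np.concatenate([
    pairwise_distances(kp),
    [joint_angle(kp[2], kp[4], kp[6])],  # e.g. shoulder-elbow-wrist angle (assumed indices)
])
print(features.shape)                    # (106,)
```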
Discussion: On these datasets, the proposed model achieved action recognition accuracies of 0.68, 0.75, and 0.83, respectively.
{"title":"Unmanned aerial vehicles for human detection and recognition using neural-network model.","authors":"Yawar Abbas, Naif Al Mudawi, Bayan Alabdullah, Touseef Sadiq, Asaad Algarni, Hameedur Rahman, Ahmad Jalal","doi":"10.3389/fnbot.2024.1443678","DOIUrl":"10.3389/fnbot.2024.1443678","url":null,"abstract":"<p><strong>Introduction: </strong>Recognizing human actions is crucial for allowing machines to understand and recognize human behavior, with applications spanning video based surveillance systems, human-robot collaboration, sports analysis systems, and entertainment. The immense diversity in human movement and appearance poses a significant challenge in this field, especially when dealing with drone-recorded (RGB) videos. Factors such as dynamic backgrounds, motion blur, occlusions, varying video capture angles, and exposure issues greatly complicate recognition tasks.</p><p><strong>Methods: </strong>In this study, we suggest a method that addresses these challenges in RGB videos captured by drones. Our approach begins by segmenting the video into individual frames, followed by preprocessing steps applied to these RGB frames. The preprocessing aims to reduce computational costs, optimize image quality, and enhance foreground objects while removing the background.</p><p><strong>Result: </strong>This results in improved visibility of foreground objects while eliminating background noise. Next, we employ the YOLOv9 detection algorithm to identify human bodies within the images. From the grayscale silhouette, we extract the human skeleton and identify 15 important locations, such as the head, neck, shoulders (left and right), elbows, wrists, hips, knees, ankles, and hips (left and right), and belly button. By using all these points, we extract specific positions, angular and distance relationships between them, as well as 3D point clouds and fiducial points. Subsequently, we optimize this data using the kernel discriminant analysis (KDA) optimizer, followed by classification using a deep neural network (CNN). To validate our system, we conducted experiments on three benchmark datasets: UAV-Human, UCF, and Drone-Action.</p><p><strong>Discussion: </strong>On these datasets, our suggested model produced corresponding action recognition accuracies of 0.68, 0.75, and 0.83.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1443678"},"PeriodicalIF":2.6,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11652500/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142853909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-04 | eCollection Date: 2024-01-01 | DOI: 10.3389/fnbot.2024.1462023
Xinyu Jiang, Chenfei Ma, Kianoush Nazarpour
Introduction: Myoelectric control systems translate different patterns of electromyographic (EMG) signals into control commands for diverse human-machine interfaces via hand gesture recognition, enabling intuitive control of prostheses and immersive interactions in the metaverse. Arm position is a confounding factor that leads to variability in EMG characteristics. Developing a model whose characteristics and performance are invariant across postures could greatly promote the translation of myoelectric control into real-world practice.
Methods: Here we propose a self-calibrating random forest (RF) model which can (1) be pre-trained on data from many users, then one-shot calibrated on a new user and (2) self-calibrate in an unsupervised and autonomous way to adapt to varying arm positions.
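One way to approximate the pre-train-then-calibrate idea (this is not the paper's self-calibrating update rule) is to grow extra trees on a new user's single calibration batch using scikit-learn's warm_start, as in the sketch below; the toy feature dimensions and tree counts are assumptions.

```python
# Rough approximation of "pre-train on many users, one-shot calibrate on a new user".
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Toy EMG features: pooled pre-training users vs. one new user's calibration batch.
X_pool, y_pool = rng.standard_normal((5000, 32)), rng.integers(0, 5, 5000)
X_new, y_new = rng.standard_normal((50, 32)), rng.integers(0, 5, 50)

rf = RandomForestClassifier(n_estimators=200, warm_start=True, random_state=0)
rf.fit(X_pool, y_pool)                       # pre-train on pooled multi-user data

# One-shot calibration: warm_start keeps the existing trees and fits only the
# additional ones on data that now includes the new user's calibration batch.
rf.n_estimators += 50
rf.fit(np.vstack([X_pool, X_new]), np.concatenate([y_pool, y_new]))

print(rf.score(X_new, y_new))
```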
Results: Analyses on data from 86 participants (66 for pre-training and 20 in real-time evaluation experiments) demonstrate the high generalisability of the proposed RF architecture to varying arm positions.
Discussion: Our work promotes the use of simple, explainable, efficient, and parallelisable models for posture-invariant myoelectric control.
{"title":"Posture-invariant myoelectric control with self-calibrating random forests.","authors":"Xinyu Jiang, Chenfei Ma, Kianoush Nazarpour","doi":"10.3389/fnbot.2024.1462023","DOIUrl":"10.3389/fnbot.2024.1462023","url":null,"abstract":"<p><strong>Introduction: </strong>Myoelectric control systems translate different patterns of electromyographic (EMG) signals into the control commands of diverse human-machine interfaces via hand gesture recognition, enabling intuitive control of prosthesis and immersive interactions in the metaverse. The effect of arm position is a confounding factor leading to the variability of EMG characteristics. Developing a model with its characteristics and performance invariant across postures, could largely promote the translation of myoelectric control into real world practice.</p><p><strong>Methods: </strong>Here we propose a self-calibrating random forest (RF) model which can (1) be pre-trained on data from many users, then one-shot calibrated on a new user and (2) self-calibrate in an unsupervised and autonomous way to adapt to varying arm positions.</p><p><strong>Results: </strong>Analyses on data from 86 participants (66 for pre-training and 20 in real-time evaluation experiments) demonstrate the high generalisability of the proposed RF architecture to varying arm positions.</p><p><strong>Discussion: </strong>Our work promotes the use of simple, explainable, efficient and parallelisable model for posture-invariant myoelectric control.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1462023"},"PeriodicalIF":2.6,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11652494/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142853908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-03 | eCollection Date: 2024-01-01 | DOI: 10.3389/fnbot.2024.1491721
Rodrigo Vieira, Plinio Moreno, Athanasios Vourvopoulos
As robots become integral to various sectors, improving human-robot collaboration is crucial, particularly in anticipating human actions to enhance safety and efficiency. Electroencephalographic (EEG) signals offer a promising solution, as they can detect brain activity preceding movement by over a second, enabling predictive capabilities in robots. This study explores how EEG can be used for action anticipation in human-robot interaction (HRI), leveraging its high temporal resolution and modern deep learning techniques. We evaluated multiple deep learning classification models on a motor imagery (MI) dataset, achieving up to 80.90% accuracy. These results were further validated in a pilot experiment, where actions were accurately predicted several hundred milliseconds before execution. This research demonstrates the potential of combining EEG with deep learning to enhance real-time collaborative tasks, paving the way for safer and more efficient human-robot interactions.
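For illustration only, a minimal motor-imagery CNN of the kind such studies evaluate might look like the sketch below; the shapes are toy values and this is not one of the models compared in the paper.

```python
# Minimal, assumed CNN baseline for motor-imagery EEG epochs (channels x time) in PyTorch.
import torch
import torch.nn as nn

class TinyMICNN(nn.Module):
    def __init__(self, n_channels=22, n_samples=500, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 25), padding=(0, 12)),  # temporal filters
            nn.Conv2d(16, 32, kernel_size=(n_channels, 1)),          # spatial filters
            nn.BatchNorm2d(32),
            nn.ELU(),
            nn.AvgPool2d(kernel_size=(1, 4)),
            nn.Dropout(0.5),
        )
        self.classifier = nn.Linear(32 * (n_samples // 4), n_classes)

    def forward(self, x):                  # x: (batch, 1, n_channels, n_samples)
        z = self.features(x)
        return self.classifier(z.flatten(1))

logits = TinyMICNN()(torch.randn(8, 1, 22, 500))
print(logits.shape)                        # torch.Size([8, 2])
```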
{"title":"EEG-based action anticipation in human-robot interaction: a comparative pilot study.","authors":"Rodrigo Vieira, Plinio Moreno, Athanasios Vourvopoulos","doi":"10.3389/fnbot.2024.1491721","DOIUrl":"10.3389/fnbot.2024.1491721","url":null,"abstract":"<p><p>As robots become integral to various sectors, improving human-robot collaboration is crucial, particularly in anticipating human actions to enhance safety and efficiency. Electroencephalographic (EEG) signals offer a promising solution, as they can detect brain activity preceding movement by over a second, enabling predictive capabilities in robots. This study explores how EEG can be used for action anticipation in human-robot interaction (HRI), leveraging its high temporal resolution and modern deep learning techniques. We evaluated multiple Deep Learning classification models on a motor imagery (MI) dataset, achieving up to 80.90% accuracy. These results were further validated in a pilot experiment, where actions were accurately predicted several hundred milliseconds before execution. This research demonstrates the potential of combining EEG with deep learning to enhance real-time collaborative tasks, paving the way for safer and more efficient human-robot interactions.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1491721"},"PeriodicalIF":2.6,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649676/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142845975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-27 | eCollection Date: 2024-01-01 | DOI: 10.3389/fnbot.2024.1362444
Naïg Chenais, Arno Görgen
Digital immersive technologies have become increasingly prominent in clinical research and practice, including medical communication and technical education, serious games for health, psychotherapy, and interfaces for neurorehabilitation. The worldwide enthusiasm for digital health and digital therapeutics has prompted the development and testing of numerous applications and interaction methods. Nevertheless, the lack of consistency in the approaches and the peculiarity of the constructed environments contribute to an increasing disparity between the eagerness for new immersive designs and the long-term clinical adoption of these technologies. Several challenges emerge in aligning the different priorities of virtual environment designers and clinicians. This article seeks to examine the utilization and mechanics of medical immersive interfaces based on extended reality and highlight specific design challenges. The transfer of skills from virtual to clinical environments is often confounded by perceptual and attractiveness factors. We argue that a multidisciplinary approach to development and testing, along with a comprehensive acknowledgement of the shared mechanisms that underlie immersive training, are essential for the sustainable integration of extended reality into clinical settings. The present review discusses the application of a multilevel sensory framework to extended reality design, with the aim of developing brain-centered immersive interfaces tailored for therapeutic and educational purposes. Such a framework must include broader design questions, such as the integration of digital technologies into psychosocial care models, clinical validation, and related ethical concerns. We propose that efforts to bridge the virtual gap should include mixed methodologies and neurodesign approaches, integrating user behavioral and physiological feedback into iterative design phases.
{"title":"Immersive interfaces for clinical applications: current status and future perspective.","authors":"Naïg Chenais, Arno Görgen","doi":"10.3389/fnbot.2024.1362444","DOIUrl":"10.3389/fnbot.2024.1362444","url":null,"abstract":"<p><p>Digital immersive technologies have become increasingly prominent in clinical research and practice, including medical communication and technical education, serious games for health, psychotherapy, and interfaces for neurorehabilitation. The worldwide enthusiasm for digital health and digital therapeutics has prompted the development and testing of numerous applications and interaction methods. Nevertheless, the lack of consistency in the approaches and the peculiarity of the constructed environments contribute to an increasing disparity between the eagerness for new immersive designs and the long-term clinical adoption of these technologies. Several challenges emerge in aligning the different priorities of virtual environment designers and clinicians. This article seeks to examine the utilization and mechanics of medical immersive interfaces based on extended reality and highlight specific design challenges. The transfer of skills from virtual to clinical environments is often confounded by perceptual and attractiveness factors. We argue that a multidisciplinary approach to development and testing, along with a comprehensive acknowledgement of the shared mechanisms that underlie immersive training, are essential for the sustainable integration of extended reality into clinical settings. The present review discusses the application of a multilevel sensory framework to extended reality design, with the aim of developing brain-centered immersive interfaces tailored for therapeutic and educational purposes. Such a framework must include broader design questions, such as the integration of digital technologies into psychosocial care models, clinical validation, and related ethical concerns. We propose that efforts to bridge the virtual gap should include mixed methodologies and neurodesign approaches, integrating user behavioral and physiological feedback into iterative design phases.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1362444"},"PeriodicalIF":2.6,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631914/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142812874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-26 | eCollection Date: 2024-01-01 | DOI: 10.3389/fnbot.2024.1439195
Zhang Juan, Jing Zhang, Ming Gao
Introduction: With the rapid development of the tourism industry, the demand for accurate and personalized travel route recommendations has significantly increased. However, traditional methods often fail to effectively integrate visual and sequential information, leading to recommendations that are both less accurate and less personalized.
Methods: This paper introduces SelfAM-Vtrans, a novel algorithm that leverages multimodal data, combining visual Transformers, LSTMs, and self-attention mechanisms, to enhance the accuracy and personalization of travel route recommendations. SelfAM-Vtrans integrates visual and sequential information by employing a visual Transformer to extract features from travel images, thereby capturing spatial relationships within them. Concurrently, a Long Short-Term Memory (LSTM) network encodes sequential data to capture the temporal dependencies within travel sequences. To effectively merge these two modalities, a self-attention mechanism fuses the visual features and sequential encodings, thoroughly accounting for their interdependencies. Based on this fused representation, a classification or regression model is trained using real travel datasets to recommend optimal travel routes.
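A minimal sketch of the fusion step, with assumed dimensions and module names: visual-Transformer patch tokens and LSTM-encoded travel sequences are concatenated and fused with self-attention before a recommendation head. This is an illustration of the general pattern, not the SelfAM-Vtrans implementation.

```python
# Sketch: fusing image tokens and sequence encodings with self-attention in PyTorch.
import torch
import torch.nn as nn

class VisualSequenceFusion(nn.Module):
    def __init__(self, dim=256, num_heads=8, n_routes=20):
        super().__init__()
        self.lstm = nn.LSTM(input_size=dim, hidden_size=dim, batch_first=True)
        self.fuse = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.head = nn.Linear(dim, n_routes)                 # route classification head

    def forward(self, image_tokens, sequence_embeddings):
        # image_tokens: (B, n_patches, dim) from a visual Transformer
        # sequence_embeddings: (B, seq_len, dim) embedded travel history
        seq_enc, _ = self.lstm(sequence_embeddings)
        joint = torch.cat([image_tokens, seq_enc], dim=1)     # joint token sequence
        fused, _ = self.fuse(joint, joint, joint)             # self-attention over both modalities
        return self.head(fused.mean(dim=1))                   # (B, n_routes)

scores = VisualSequenceFusion()(torch.randn(4, 49, 256), torch.randn(4, 10, 256))
print(scores.shape)                                           # torch.Size([4, 20])
```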
Results and discussion: The algorithm was rigorously evaluated through experiments conducted on real-world travel datasets, and its performance was benchmarked against other route recommendation methods. The results demonstrate that SelfAM-Vtrans significantly outperforms traditional approaches in terms of both recommendation accuracy and personalization. By comprehensively incorporating both visual and sequential data, this method offers travelers more tailored and precise route suggestions, thereby enriching the overall travel experience.
{"title":"A multimodal travel route recommendation system leveraging visual Transformers and self-attention mechanisms.","authors":"Zhang Juan, Jing Zhang, Ming Gao","doi":"10.3389/fnbot.2024.1439195","DOIUrl":"10.3389/fnbot.2024.1439195","url":null,"abstract":"<p><strong>Introduction: </strong>With the rapid development of the tourism industry, the demand for accurate and personalized travel route recommendations has significantly increased. However, traditional methods often fail to effectively integrate visual and sequential information, leading to recommendations that are both less accurate and less personalized.</p><p><strong>Methods: </strong>This paper introduces SelfAM-Vtrans, a novel algorithm that leverages multimodal data-combining visual Transformers, LSTMs, and self-attention mechanisms-to enhance the accuracy and personalization of travel route recommendations. SelfAM-Vtrans integrates visual and sequential information by employing a visual Transformer to extract features from travel images, thereby capturing spatial relationships within them. Concurrently, a Long Short-Term Memory (LSTM) network encodes sequential data to capture the temporal dependencies within travel sequences. To effectively merge these two modalities, a self-attention mechanism fuses the visual features and sequential encodings, thoroughly accounting for their interdependencies. Based on this fused representation, a classification or regression model is trained using real travel datasets to recommend optimal travel routes.</p><p><strong>Results and discussion: </strong>The algorithm was rigorously evaluated through experiments conducted on real-world travel datasets, and its performance was benchmarked against other route recommendation methods. The results demonstrate that SelfAM-Vtrans significantly outperforms traditional approaches in terms of both recommendation accuracy and personalization. By comprehensively incorporating both visual and sequential data, this method offers travelers more tailored and precise route suggestions, thereby enriching the overall travel experience.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1439195"},"PeriodicalIF":2.6,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11628496/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142806763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-21 | eCollection Date: 2024-01-01 | DOI: 10.3389/fnbot.2024.1479694
Jie Chang, Zhenmeng Wang, Chao Yan
Introduction: In recent years, with the rapid development of artificial intelligence technology, the field of music education has begun to explore new teaching models. Traditional music education research has focused primarily on single-modal tasks such as note recognition and instrument performance techniques, often overlooking the importance of multimodal data integration and interactive teaching. Existing methods often struggle to handle multimodal data effectively and cannot fully utilize visual, auditory, and textual information for comprehensive analysis, which limits teaching effectiveness.
Methods: To address these challenges, this project introduces MusicARLtrans Net, a multimodal interactive music education agent system driven by reinforcement learning. The system integrates Speech-to-Text (STT) technology to achieve accurate transcription of user voice commands, utilizes the ALBEF (Align Before Fuse) model for aligning and integrating multimodal data, and applies reinforcement learning to optimize teaching strategies.
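As a heavily simplified illustration of reinforcement-learning-driven teaching strategy selection (not the system's actual RL formulation), an epsilon-greedy bandit over an assumed set of teaching actions could look like the sketch below; the action names and reward signal are assumptions.

```python
# Toy epsilon-greedy bandit for choosing a teaching action from learner feedback.
import random

ACTIONS = ["repeat_exercise", "slow_tempo_demo", "new_piece", "theory_explanation"]

class TeachingBandit:
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {a: 0 for a in ACTIONS}
        self.values = {a: 0.0 for a in ACTIONS}

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)                   # explore
        return max(ACTIONS, key=lambda a: self.values[a])   # exploit best estimate

    def update(self, action, reward):
        # Incremental mean update of the action's estimated value.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

bandit = TeachingBandit()
for _ in range(100):
    action = bandit.select()
    reward = random.random()          # stand-in for learner performance/satisfaction
    bandit.update(action, reward)
print(max(bandit.values, key=bandit.values.get))
```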
Results and discussion: By effectively combining auditory, visual, and textual information, this approach provides a personalized, real-time, feedback-driven interactive learning experience. The system collects and annotates multimodal data related to music education, trains and integrates the various modules, and ultimately delivers an efficient and intelligent music education agent. Experimental results demonstrate that MusicARLtrans Net significantly outperforms traditional methods, achieving an accuracy of 96.77% on the LibriSpeech dataset and 97.55% on the MS COCO dataset, with marked improvements in recall, F1 score, and AUC metrics. These results highlight the system's superiority in speech recognition accuracy, multimodal data understanding, and teaching strategy optimization, which together lead to enhanced learning outcomes and user satisfaction. The findings hold substantial academic and practical significance, demonstrating the potential of advanced AI-driven systems in revolutionizing music education.
{"title":"MusicARLtrans Net: a multimodal agent interactive music education system driven via reinforcement learning.","authors":"Jie Chang, Zhenmeng Wang, Chao Yan","doi":"10.3389/fnbot.2024.1479694","DOIUrl":"10.3389/fnbot.2024.1479694","url":null,"abstract":"<p><strong>Introduction: </strong>In recent years, with the rapid development of artificial intelligence technology, the field of music education has begun to explore new teaching models. Traditional music education research methods have primarily focused on single-modal studies such as note recognition and instrument performance techniques, often overlooking the importance of multimodal data integration and interactive teaching. Existing methods often struggle with handling multimodal data effectively, unable to fully utilize visual, auditory, and textual information for comprehensive analysis, which limits the effectiveness of teaching.</p><p><strong>Methods: </strong>To address these challenges, this project introduces MusicARLtrans Net, a multimodal interactive music education agent system driven by reinforcement learning. The system integrates Speech-to-Text (STT) technology to achieve accurate transcription of user voice commands, utilizes the ALBEF (Align Before Fuse) model for aligning and integrating multimodal data, and applies reinforcement learning to optimize teaching strategies.</p><p><strong>Results and discussion: </strong>This approach provides a personalized and real-time feedback interactive learning experience by effectively combining auditory, visual, and textual information. The system collects and annotates multimodal data related to music education, trains and integrates various modules, and ultimately delivers an efficient and intelligent music education agent. Experimental results demonstrate that MusicARLtrans Net significantly outperforms traditional methods, achieving an accuracy of <b>96.77%</b> on the LibriSpeech dataset and <b>97.55%</b> on the MS COCO dataset, with marked improvements in recall, F1 score, and AUC metrics. These results highlight the system's superiority in speech recognition accuracy, multimodal data understanding, and teaching strategy optimization, which together lead to enhanced learning outcomes and user satisfaction. The findings hold substantial academic and practical significance, demonstrating the potential of advanced AI-driven systems in revolutionizing music education.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1479694"},"PeriodicalIF":2.6,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11617572/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142785067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-20 | eCollection Date: 2024-01-01 | DOI: 10.3389/fnbot.2024.1483131
Ni Wang
Introduction: With the development of globalization and the increasing importance of English in international communication, effectively improving English writing skills has become a key focus in language learning. Traditional methods for English writing guidance and error correction have predominantly relied on rule-based approaches or statistical models, such as conventional language models and basic machine learning algorithms. While these methods can aid learners in improving their writing quality to some extent, they often suffer from limitations such as inflexibility, insufficient contextual understanding, and an inability to handle multimodal information. These shortcomings restrict their effectiveness in more complex linguistic environments.
Methods: To address these challenges, this study introduces ETG-ALtrans, a multimodal robot-assisted English writing guidance and error correction technology based on an improved ALBEF model and VGG19 architecture, enhanced by reinforcement learning. The approach leverages VGG19 to extract visual features and integrates them with the ALBEF model, achieving precise alignment and fusion of images and text. This enhances the model's ability to comprehend context. Furthermore, by incorporating reinforcement learning, the model can adaptively refine its correction strategies, thereby optimizing the effectiveness of writing guidance.
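A minimal sketch of the visual-feature step, assuming torchvision's pretrained VGG19 and an arbitrary 512-dimensional projection; the ALBEF-style image-text alignment and the reinforcement-learning component are not shown.

```python
# Sketch: extracting convolutional features from a pretrained VGG19 for later fusion.
import torch
import torch.nn as nn
from torchvision import models

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
backbone = vgg.features.eval()                 # convolutional part only, frozen here
project = nn.Linear(512, 512)                  # map pooled features to a shared space

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)        # stand-in for a preprocessed input image
    fmap = backbone(image)                     # (1, 512, 7, 7)
    pooled = fmap.mean(dim=(2, 3))             # global average pooling -> (1, 512)

visual_embedding = project(pooled)             # ready for fusion with text embeddings
print(visual_embedding.shape)                  # torch.Size([1, 512])
```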
Results and discussion: Experimental results demonstrate that the proposed ETG-ALtrans method significantly improves the accuracy of English writing error correction and the intelligence level of writing guidance in multimodal data scenarios. Compared to traditional methods, this approach not only enhances the precision of writing suggestions but also better caters to the personalized needs of learners, thereby effectively improving their writing skills. This research is of significant importance in the field of language learning technology and offers new perspectives and methodologies for the development of future English writing assistance tools.
{"title":"Multimodal robot-assisted English writing guidance and error correction with reinforcement learning.","authors":"Ni Wang","doi":"10.3389/fnbot.2024.1483131","DOIUrl":"10.3389/fnbot.2024.1483131","url":null,"abstract":"<p><strong>Introduction: </strong>With the development of globalization and the increasing importance of English in international communication, effectively improving English writing skills has become a key focus in language learning. Traditional methods for English writing guidance and error correction have predominantly relied on rule-based approaches or statistical models, such as conventional language models and basic machine learning algorithms. While these methods can aid learners in improving their writing quality to some extent, they often suffer from limitations such as inflexibility, insufficient contextual understanding, and an inability to handle multimodal information. These shortcomings restrict their effectiveness in more complex linguistic environments.</p><p><strong>Methods: </strong>To address these challenges, this study introduces ETG-ALtrans, a multimodal robot-assisted English writing guidance and error correction technology based on an improved ALBEF model and VGG19 architecture, enhanced by reinforcement learning. The approach leverages VGG19 to extract visual features and integrates them with the ALBEF model, achieving precise alignment and fusion of images and text. This enhances the model's ability to comprehend context. Furthermore, by incorporating reinforcement learning, the model can adaptively refine its correction strategies, thereby optimizing the effectiveness of writing guidance.</p><p><strong>Results and discussion: </strong>Experimental results demonstrate that the proposed ETG-ALtrans method significantly improves the accuracy of English writing error correction and the intelligence level of writing guidance in multimodal data scenarios. Compared to traditional methods, this approach not only enhances the precision of writing suggestions but also better caters to the personalized needs of learners, thereby effectively improving their writing skills. This research is of significant importance in the field of language learning technology and offers new perspectives and methodologies for the development of future English writing assistance tools.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"18 ","pages":"1483131"},"PeriodicalIF":2.6,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11614782/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142779207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}