
Frontiers in Neurorobotics: Latest Publications

A robust and effective framework for 3D scene reconstruction and high-quality rendering in nasal endoscopy surgery.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-06-27 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1630728
Xueqin Ji, Shuting Zhao, Di Liu, Feng Wang, Xinrong Chen

In nasal endoscopic surgery, the narrow nasal cavity restricts the surgical field of view and the manipulation of surgical instruments. Precise real-time intraoperative navigation, which provides accurate 3D information, therefore plays a crucial role in avoiding critical areas dense with blood vessels and nerves. Although endoscopic 3D reconstruction methods have made significant progress, their application to nasal scenarios still faces numerous challenges. On the one hand, high-quality, annotated nasal endoscopy datasets are lacking. On the other hand, issues such as motion blur and soft-tissue deformation complicate the reconstruction process. To tackle these challenges, a series of nasal endoscopy examination videos is collected, with the pose recorded for each frame. Additionally, a novel model named Mip-EndoGS is proposed, which integrates 3D Gaussian Splatting for reconstruction and rendering with a diffusion module that reduces image blurring in endoscopic data. Meanwhile, incorporating an adaptive low-pass filter into the rendering pipeline mitigates the aliasing artifacts (jagged edges) that arise during rendering. Extensive quantitative and visual experiments show that the proposed model can reconstruct 3D scenes within the nasal cavity in real time, offering surgeons more detailed and precise information about the surgical scene. Moreover, the proposed approach holds great potential for integration with AR-based surgical navigation systems to enhance intraoperative guidance.
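The abstract does not specify Mip-EndoGS's exact filter, so the following is only a minimal numpy sketch of the screen-space low-pass (anti-aliasing) filtering idea used in 3D Gaussian Splatting pipelines: each splat's projected 2D covariance is dilated by a small isotropic kernel, with an energy-preserving opacity rescale. The kernel variance of 0.3 px² and all array shapes here are assumptions, not values from the paper.

```python
import numpy as np

def low_pass_filter_2d_gaussians(cov2d, kernel_var=0.3):
    """Dilate projected 2D Gaussian covariances with a small isotropic
    kernel so every splat covers at least ~1 pixel, suppressing aliasing.

    cov2d: (N, 2, 2) screen-space covariance matrices, one per splat.
    kernel_var: variance of the anti-aliasing kernel in px^2 (hypothetical).
    Returns the filtered covariances and a per-splat opacity scale that
    keeps each Gaussian's total energy unchanged after dilation.
    """
    cov_filtered = cov2d + kernel_var * np.eye(2)  # broadcast over N splats
    det_before = np.linalg.det(cov2d)
    det_after = np.linalg.det(cov_filtered)
    opacity_scale = np.sqrt(np.maximum(det_before, 1e-12) / det_after)
    return cov_filtered, opacity_scale

# Example: 100 random splats with positive-definite 2x2 covariances.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2, 2))
cov = A @ A.transpose(0, 2, 1) + 0.05 * np.eye(2)
cov_f, alpha_scale = low_pass_filter_2d_gaussians(cov)
```

Small splats (nearly sub-pixel) get the largest opacity reduction, which is what removes the jagged-edge artifacts without visibly changing large splats.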

{"title":"A robust and effective framework for 3D scene reconstruction and high-quality rendering in nasal endoscopy surgery.","authors":"Xueqin Ji, Shuting Zhao, Di Liu, Feng Wang, Xinrong Chen","doi":"10.3389/fnbot.2025.1630728","DOIUrl":"10.3389/fnbot.2025.1630728","url":null,"abstract":"<p><p>In nasal endoscopic surgery, the narrow nasal cavity restricts the surgical field of view and the manipulation of surgical instruments. Therefore, precise real-time intraoperative navigation, which can provide precise 3D information, plays a crucial role in avoiding critical areas with dense blood vessels and nerves. Although significant progress has been made in endoscopic 3D reconstruction methods, their application in nasal scenarios still faces numerous challenges. On the one hand, there is a lack of high-quality, annotated nasal endoscopy datasets. On the other hand, issues such as motion blur and soft tissue deformations complicate the nasal endoscopy reconstruction process. To tackle these challenges, a series of nasal endoscopy examination videos are collected, and the pose information for each frame is recorded. Additionally, a novel model named Mip-EndoGS is proposed, which integrates 3D Gaussian Splatting for reconstruction and rendering and a diffusion module to reduce image blurring in endoscopic data. Meanwhile, by incorporating an adaptive low-pass filter into the rendering pipeline, the aliasing artifacts (jagged edges) are mitigated, which occur during the rendering process. Extensive quantitative and visual experiments show that the proposed model is capable of reconstructing 3D scenes within the nasal cavity in real-time, thereby offering surgeons more detailed and precise information about the surgical scene. Moreover, the proposed approach holds great potential for integration with AR-based surgical navigation systems to enhance intraoperative guidance.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1630728"},"PeriodicalIF":2.6,"publicationDate":"2025-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12245865/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144626010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Understanding human co-manipulation via motion and haptic information to enable future physical human-robotic collaborations.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-06-19 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1480399
Kody Shaw, John L Salmon, Marc D Killpack

Human teams intuitively and effectively collaborate to move large, heavy, or unwieldy objects, yet the literature offers only a limited understanding of this interaction. This gap is especially problematic given the goal of enabling human-robot teams to work together. Therefore, to better understand how human teams cooperate and eventually enable intuitive human-robot interaction, this paper examines four sub-components of collaborative manipulation (co-manipulation) using motion and haptics. We define co-manipulation as a group of two or more agents collaboratively moving an object. We present a study in which a large object is co-manipulated while we vary the number of participants (two or three), their roles (leaders or followers), and the degrees of freedom required to complete the defined motion of the object. In analyzing the results, we focus on four key components related to motion and haptics. First, we define a static (rest) state and demonstrate a method of detecting transitions between it and an active state, in which one or more agents move toward an intended goal. Second, we analyze a variety of signals (e.g., force, acceleration) during movements in each of the six rigid-body degrees of freedom of the co-manipulated object; these data let us identify the signals that best correlate with the team's desired motion. Third, we examine the completion percentage of each task, which indicates which motion objectives can be communicated via haptic feedback. Finally, we define a metric to determine whether participants split two-degree-of-freedom tasks into separate degrees of freedom or take the most direct path. Together, these four components lay the groundwork for intuitive human-robot interaction.
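The abstract describes detecting rest-to-active transitions but not the criterion used; a simple and common approach is hysteresis thresholding on the interaction-force magnitude. The sketch below illustrates that idea only; the threshold values and the synthetic force trace are assumptions, not the paper's method.

```python
import numpy as np

def detect_state_transitions(force, rest_thresh=2.0, active_thresh=5.0):
    """Label each sample of a force-magnitude signal as rest (0) or
    active (1) with hysteresis, returning labels and transition indices.

    force: 1-D array of interaction-force magnitudes.
    rest_thresh / active_thresh: hypothetical hysteresis bounds (N); the
    gap between them prevents chattering near a single threshold.
    """
    state = 0
    labels = np.zeros(len(force), dtype=int)
    transitions = []
    for i, f in enumerate(force):
        if state == 0 and f > active_thresh:
            state = 1            # rest -> active: force rose above upper bound
            transitions.append(i)
        elif state == 1 and f < rest_thresh:
            state = 0            # active -> rest: force fell below lower bound
            transitions.append(i)
        labels[i] = state
    return labels, transitions

# Example: synthetic trace with one burst of activity between t=3s and t=7s.
t = np.linspace(0, 10, 1000)
rng = np.random.default_rng(1)
force = 1.0 + 6.0 * ((t > 3) & (t < 7)) + 0.3 * rng.normal(size=t.size)
labels, transitions = detect_state_transitions(np.abs(force))
```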

{"title":"Understanding human co-manipulation via motion and haptic information to enable future physical human-robotic collaborations.","authors":"Kody Shaw, John L Salmon, Marc D Killpack","doi":"10.3389/fnbot.2025.1480399","DOIUrl":"10.3389/fnbot.2025.1480399","url":null,"abstract":"<p><p>Human teams intuitively and effectively collaborate to move large, heavy, or unwieldy objects. However, understanding of this interaction in literature is limited. This is especially problematic given our goal to enable human-robot teams to work together. Therefore, to better understand how human teams work together to eventually enable intuitive human-robot interaction, in this paper we examine four sub-components of collaborative manipulation (co-manipulation), using motion and haptics. We define co-manipulation as a group of two or more agents collaboratively moving an object. We present a study that uses a large object for co-manipulation as we vary the number of participants (two or three) and the roles of the participants (leaders or followers), and the degrees of freedom necessary to complete the defined motion for the object. In analyzing the results, we focus on four key components related to motion and haptics. Specifically, we first define and examine a static or rest state to demonstrate a method of detecting transitions between the static state and an active state, where one or more agents are moving toward an intended goal. Secondly, we analyze a variety of signals (e.g. force, acceleration, etc.) during movements in each of the six rigid-body degrees of freedom of the co-manipulated object. This data allows us to identify the best signals that correlate with the desired motion of the team. Third, we examine the completion percentage of each task. The completion percentage for each task can be used to determine which motion objectives can be communicated via haptic feedback. Finally, we define a metric to determine if participants divide two degree-of-freedom tasks into separate degrees of freedom or if they take the most direct path. These four components contribute to the necessary groundwork for advancing intuitive human-robot interaction.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1480399"},"PeriodicalIF":2.6,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12222233/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144559877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multimodal fusion image enhancement technique and CFEC-YOLOv7 for underwater target detection algorithm research.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-06-19 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1616919
Xiaorong Qiu, Yingzhong Shi

The underwater environment is more complex than that on land: underwater images suffer severe static and dynamic blurring, which reduces the recognition accuracy of underwater targets and fails to meet the needs of underwater environment detection. First, for the static blurring problem, we propose an adaptive color compensation algorithm and an improved MSR algorithm. Second, for dynamic blur, we adopt the Restormer network to remove the blur caused by the combined effects of camera shake, defocus, and relative motion displacement. The feasibility of our underwater enhancement method is then verified through qualitative analysis, quantitative analysis, and underwater target detection on the enhanced dataset. Finally, we propose a target recognition network suited to complex underwater environments: local and global information is fused through the CCBC module and the ECLOU loss function to improve localization accuracy, and the FasterNet module is introduced to reduce redundant computation and parameter count. Experimental results show that the CFEC-YOLOv7 model and the proposed underwater image enhancement method perform excellently, adapt well to the underwater target recognition task, and have good application prospects.
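The paper's *improved* MSR is not specified in the abstract; as a reference point, here is a minimal sketch of the classic Multi-Scale Retinex it builds on, which estimates illumination with Gaussian blurs at several scales and subtracts it in log space. The scale values are common defaults, not the paper's settings.

```python
import cv2
import numpy as np

def multi_scale_retinex(img_bgr, sigmas=(15, 80, 250)):
    """Classic MSR: average of log(I) - log(I * G_sigma) over several
    Gaussian scales, then rescale to [0, 255] for display.

    img_bgr: uint8 BGR image as loaded by cv2.imread.
    sigmas: hypothetical Gaussian scales in pixels (small = detail,
            large = global illumination).
    """
    img = img_bgr.astype(np.float64) + 1.0  # offset avoids log(0)
    msr = np.zeros_like(img)
    for sigma in sigmas:
        blur = cv2.GaussianBlur(img, (0, 0), sigma)  # ksize derived from sigma
        msr += np.log(img) - np.log(blur + 1.0)
    msr /= len(sigmas)
    mn, mx = msr.min(), msr.max()
    return ((msr - mn) / (mx - mn + 1e-12) * 255).astype(np.uint8)

# enhanced = multi_scale_retinex(cv2.imread("underwater.jpg"))
```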

{"title":"Multimodal fusion image enhancement technique and CFEC-YOLOv7 for underwater target detection algorithm research.","authors":"Xiaorong Qiu, Yingzhong Shi","doi":"10.3389/fnbot.2025.1616919","DOIUrl":"10.3389/fnbot.2025.1616919","url":null,"abstract":"<p><p>The underwater environment is more complex than that on land, resulting in severe static and dynamic blurring in underwater images, reducing the recognition accuracy of underwater targets and failing to meet the needs of underwater environment detection. Firstly, for the static blurring problem, we propose an adaptive color compensation algorithm and an improved MSR algorithm. Secondly, for the problem of dynamic blur, we adopt the Restormer network to eliminate the dynamic blur caused by the combined effects of camera shake, camera out-of-focus and relative motion displacement, etc. then, through qualitative analysis, quantitative analysis and underwater target detection on the enhanced dataset, the feasibility of our underwater enhancement method is verified. Finally, we propose a target recognition network suitable for the complex underwater environment. The local and global information is fused through the CCBC module and the ECLOU loss function to improve the positioning accuracy. The FasterNet module is introduced to reduce redundant computations and parameter counting. The experimental results show that the CFEC-YOLOv7 model and the underwater image enhancement method proposed by us exhibit excellent performance, can better adapt to the underwater target recognition task, and have a good application prospect.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1616919"},"PeriodicalIF":2.6,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12222134/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144559876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
User recommendation method integrating hierarchical graph attention network with multimodal knowledge graph.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-06-18 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1587973
Xiaofei Han, Xin Dou

In common graph neural networks (GNNs), incorporating social-network information effectively exploits interactions between users, but it often overlooks the deeper semantic relationships between items and fails to integrate visual and textual features. This limitation can restrict the diversity and accuracy of recommendation results. To address this, the present study combines a knowledge graph, a GNN, and multimodal information to enhance the feature representations of both users and items. The knowledge graph not only clarifies the underlying logic behind user interests and preferences but also helps address the cold-start problem for new users and items. Moreover, to improve recommendation accuracy, the visual and textual features of items are incorporated as supplementary information. A user recommendation model is therefore proposed that integrates a hierarchical graph attention network with a multimodal knowledge graph. The model consists of four key components: a collaborative knowledge-graph neural layer, an image feature extraction layer, a text feature extraction layer, and a prediction layer. The first three layers extract user and item features, and the recommendation is completed in the prediction layer. Experimental results on two public datasets demonstrate that the proposed model significantly outperforms existing recommendation methods.
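The abstract names a hierarchical graph attention network without detailing its layers; below is a minimal single-head graph attention layer (GAT-style) sketch showing only the attention aggregation such models build on. The dimensions, the dense adjacency representation, and the example graph are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Minimal single-head graph attention: score each edge from the
    concatenated endpoint embeddings, softmax over neighbors, aggregate."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) 0/1 matrix with self-loops.
        h = self.W(x)                                    # (N, out_dim)
        N = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(-1, N, -1),
                           h.unsqueeze(0).expand(N, -1, -1)], dim=-1)
        e = F.leaky_relu(self.attn(pairs).squeeze(-1))   # (N, N) raw scores
        e = e.masked_fill(adj == 0, float("-inf"))       # neighbors only
        alpha = torch.softmax(e, dim=-1)                 # attention weights
        return alpha @ h                                 # aggregated embeddings

# Example: 5 user/item nodes, 8-dim features, random graph with self-loops
# (self-loops guarantee every softmax row has at least one finite score).
x = torch.randn(5, 8)
adj = ((torch.rand(5, 5) > 0.5).float() + torch.eye(5) > 0).float()
out = GraphAttentionLayer(8, 16)(x, adj)
```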

{"title":"User recommendation method integrating hierarchical graph attention network with multimodal knowledge graph.","authors":"Xiaofei Han, Xin Dou","doi":"10.3389/fnbot.2025.1587973","DOIUrl":"10.3389/fnbot.2025.1587973","url":null,"abstract":"<p><p>In common graph neural network (GNN), although incorporating social network information effectively utilizes interactions between users, it often overlooks the deeper semantic relationships between items and fails to integrate visual and textual feature information. This limitation can restrict the diversity and accuracy of recommendation results. To address this, the present study combines knowledge graph, GNN, and multimodal information to enhance feature representations of both users and items. The inclusion of knowledge graph not only provides a better understanding of the underlying logic behind user interests and preferences but also aids in addressing the cold-start problem for new users and items. Moreover, in improving recommendation accuracy, visual and textual features of items are incorporated as supplementary information. Therefore, a user recommendation model is proposed that integrates hierarchical graph attention network with multimodal knowledge graph. The model consists of four key components: a collaborative knowledge graph neural layer, an image feature extraction layer, a text feature extraction layer, and a prediction layer. The first three layers extract user and item features, and the recommendation is completed in the prediction layer. Experimental results based on two public datasets demonstrate that the proposed model significantly outperforms existing recommendation methods in terms of recommendation performance.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1587973"},"PeriodicalIF":2.6,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12213718/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144553235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Context-Aware Enhanced Feature Refinement for small object detection with Deformable DETR.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-06-10 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1588565
Donghao Shi, Cunbin Zhao, Jianwen Shao, Minjie Feng, Lei Luo, Bing Ouyang, Jiamin Huang

Small object detection is a critical task in applications like autonomous driving and ship black-smoke detection. While Deformable DETR has advanced small object detection, its reliance on CNNs for feature extraction restricts global context understanding and yields suboptimal feature representations. It also struggles to detect small objects that occupy only a few pixels because of large size disparities. To overcome these challenges, we propose the Context-Aware Enhanced Feature Refinement Deformable DETR, an improved Deformable DETR network. Our approach introduces Mask Attention in the backbone to improve feature extraction while effectively suppressing irrelevant background information. Furthermore, we propose a Context-Aware Enhanced Feature Refinement Encoder to address small objects with limited pixel representation. Experimental results demonstrate that our method outperforms the baseline, achieving a 2.1% improvement in mAP.
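The paper's Mask Attention module is not defined in the abstract; the sketch below shows only the generic mechanism such modules share: an additive mask on attention logits so softmax assigns (near-)zero weight to background positions. The shapes, the 1/0 mask convention, and the toy inputs are assumptions.

```python
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with background positions suppressed.

    q, k, v: (B, L, D) query/key/value tensors.
    mask: (B, L) with 1 = keep token, 0 = suppress (treated as background).
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (B, L, L) logits
    scores = scores.masked_fill(mask[:, None, :] == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                 # masked aggregation

# Example: 2 sequences of 6 tokens; trailing tokens masked as background.
q = k = v = torch.randn(2, 6, 32)
mask = torch.tensor([[1, 1, 1, 1, 0, 0],
                     [1, 1, 1, 1, 1, 0]])
out = masked_attention(q, k, v, mask)
```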

{"title":"Context-Aware Enhanced Feature Refinement for small object detection with Deformable DETR.","authors":"Donghao Shi, Cunbin Zhao, Jianwen Shao, Minjie Feng, Lei Luo, Bing Ouyang, Jiamin Huang","doi":"10.3389/fnbot.2025.1588565","DOIUrl":"10.3389/fnbot.2025.1588565","url":null,"abstract":"<p><p>Small object detection is a critical task in applications like autonomous driving and ship black smoke detection. While Deformable DETR has advanced small object detection, it faces limitations due to its reliance on CNNs for feature extraction, which restricts global context understanding and results in suboptimal feature representation. Additionally, it struggles with detecting small objects that occupy only a few pixels due to significant size disparities. To overcome these challenges, we propose the Context-Aware Enhanced Feature Refinement Deformable DETR, an improved Deformable DETR network. Our approach introduces Mask Attention in the backbone to improve feature extraction while effectively suppressing irrelevant background information. Furthermore, we propose a Context-Aware Enhanced Feature Refinement Encoder to address the issue of small objects with limited pixel representation. Experimental results demonstrate that our method outperforms the baseline, achieving a 2.1% improvement in mAP.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1588565"},"PeriodicalIF":2.6,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12185399/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144484070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Depth-aware unpaired image-to-image translation for autonomous driving test scenario generation using a dual-branch GAN.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-05-30 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1603964
Donghao Shi, Chenxin Zhao, Cunbin Zhao, Zhou Fang, Chonghao Yu, Jian Li, Minjie Feng

Reliable visual perception is essential for autonomous driving test scenario generation, yet adverse weather and lighting variations pose significant challenges to simulation robustness and generalization. Traditional unpaired image-to-image translation methods primarily rely on RGB-based transformations, often resulting in geometric distortions and loss of structural consistency, which can negatively impact the realism and accuracy of generated test scenarios. To address these limitations, we propose a Depth-Aware Dual-Branch Generative Adversarial Network (DAB-GAN) that explicitly incorporates depth information to preserve spatial structures during scenario generation. The dual-branch generator processes both RGB and depth inputs, ensuring geometric fidelity, while a self-attention mechanism enhances spatial dependencies and local detail refinement. This enables the creation of realistic and structure-preserving test environments that are crucial for evaluating autonomous driving perception systems, especially under adverse weather conditions. Experimental results demonstrate that DAB-GAN outperforms existing unpaired image-to-image translation methods, achieving superior visual fidelity and maintaining depth-aware structural integrity. This approach provides a robust framework for generating diverse and challenging test scenarios, enhancing the development and validation of autonomous driving systems under various real-world conditions.
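DAB-GAN's generator architecture is not given in the abstract; the toy PyTorch sketch below illustrates only the dual-branch idea it describes: separate convolutional stems for the RGB image and the depth map, fused by channel concatenation so depth cues constrain the spatial structure. All layer sizes are invented for illustration.

```python
import torch
import torch.nn as nn

class DualBranchEncoder(nn.Module):
    """Toy dual-branch encoder: one stem per modality, concat fusion."""

    def __init__(self, feat=32):
        super().__init__()
        def stem(in_ch):
            # Two stride-2 convs: halve resolution twice, grow channels.
            return nn.Sequential(
                nn.Conv2d(in_ch, feat, 4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(feat, feat * 2, 4, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )
        self.rgb_branch = stem(3)    # processes the 3-channel RGB image
        self.depth_branch = stem(1)  # processes the 1-channel depth map
        self.fuse = nn.Conv2d(feat * 4, feat * 4, 3, padding=1)

    def forward(self, rgb, depth):
        f = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.fuse(f)          # fused, geometry-aware feature map

# Example: 256x256 inputs -> 64x64 fused feature map.
enc = DualBranchEncoder()
feats = enc(torch.randn(1, 3, 256, 256), torch.randn(1, 1, 256, 256))
```

In a full translation GAN this fused feature map would feed a decoder and a self-attention block; here it only demonstrates why the two modalities are encoded separately before fusion.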

{"title":"Depth-aware unpaired image-to-image translation for autonomous driving test scenario generation using a dual-branch GAN.","authors":"Donghao Shi, Chenxin Zhao, Cunbin Zhao, Zhou Fang, Chonghao Yu, Jian Li, Minjie Feng","doi":"10.3389/fnbot.2025.1603964","DOIUrl":"10.3389/fnbot.2025.1603964","url":null,"abstract":"<p><p>Reliable visual perception is essential for autonomous driving test scenario generation, yet adverse weather and lighting variations pose significant challenges to simulation robustness and generalization. Traditional unpaired image-to-image translation methods primarily rely on RGB-based transformations, often resulting in geometric distortions and loss of structural consistency, which can negatively impact the realism and accuracy of generated test scenarios. To address these limitations, we propose a Depth-Aware Dual-Branch Generative Adversarial Network (DAB-GAN) that explicitly incorporates depth information to preserve spatial structures during scenario generation. The dual-branch generator processes both RGB and depth inputs, ensuring geometric fidelity, while a self-attention mechanism enhances spatial dependencies and local detail refinement. This enables the creation of realistic and structure-preserving test environments that are crucial for evaluating autonomous driving perception systems, especially under adverse weather conditions. Experimental results demonstrate that DAB-GAN outperforms existing unpaired image-to-image translation methods, achieving superior visual fidelity and maintaining depth-aware structural integrity. This approach provides a robust framework for generating diverse and challenging test scenarios, enhancing the development and validation of autonomous driving systems under various real-world conditions.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1603964"},"PeriodicalIF":2.6,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12162506/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144301898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Gait analysis system for assessing abnormal patterns in individuals with hemiparetic stroke during robot-assisted gait training: a criterion-related validity study in healthy adults.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-05-21 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1558009
Issei Nakashima, Daisuke Imoto, Satoshi Hirano, Hitoshi Konosu, Yohei Otaka

Introduction: Gait robots have the potential to analyze gait characteristics during gait training using mounted sensors, in addition to providing robotic assistance to the individual's movements. However, no systems have been proposed to analyze gait performance during robot-assisted gait training. Our newly developed gait robot, "Welwalk WW-2000" (WW-2000), is equipped with a gait analysis system that analyzes abnormal gait patterns during robot-assisted gait training. We previously investigated the validity of the index values for nine abnormal gait patterns. Here, we propose new index values for four abnormal gait patterns: anterior trunk tilt, excessive trunk shift over the affected side, excessive knee joint flexion, and swing difficulty. We investigated the criterion validity of these new index values in the WW-2000 gait analysis system in healthy adults.

Methods: Twelve healthy participants, while wearing the robot, simulated four abnormal gait patterns manifested by individuals with hemiparetic stroke. Each participant performed 16 gait trials: four severity grades for each of the four abnormal gait patterns. Twenty strides were recorded per trial using the gait analysis system in the WW-2000 and video cameras. Abnormal gait patterns were assessed with two measures: the index value calculated for each stride by the WW-2000 gait analysis system, and the assessor's severity score for each stride. The correlation between the two measures was evaluated using the Spearman rank correlation coefficient for each gait pattern in each participant (a minimal sketch of this analysis follows the abstract).

Results: The median (minimum to maximum) values of Spearman rank correlation coefficient among the 12 participants between the index value calculated using the WW-2000 gait analysis system and the assessor's severity scores for anterior trunk tilt, excessive trunk shifts over the affected side, excessive knee joint flexion, and swing difficulty were 0.892 (0.749-0.969), 0.859 (0.439-0.923), 0.920 (0.738-0.969), and 0.681 (0.391-0.889), respectively.

Discussion: In addition to nine previously validated abnormal gait patterns, the WW-2000 gait analysis system captured, with high validity, four new abnormal gait patterns observed in individuals with hemiparetic stroke. Assessing abnormal gait patterns is important, as improving them contributes to stroke rehabilitation.

Clinical trial registration: https://jrct.niph.go.jp, identifier jRCT 042190109.
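As referenced in the Methods above, criterion validity here is a per-participant, per-pattern rank correlation between the robot's stride-wise index value and the assessor's stride-wise severity score. The sketch below shows that computation with scipy; the stride data are synthetic placeholders, not the study's data.

```python
import numpy as np
from scipy.stats import spearmanr

# Synthetic stand-in for one participant and one gait pattern:
# 20 strides, assessor severity graded 1-4, robot index loosely tracking it.
rng = np.random.default_rng(42)
severity = rng.integers(1, 5, size=20)                 # assessor scores
index_value = severity + rng.normal(0, 0.5, size=20)   # robot-derived index

rho, p = spearmanr(index_value, severity)
print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")
# The study repeats this per pattern and participant, then reports the
# median rho across the 12 participants (e.g., 0.892 for anterior trunk tilt).
```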

{"title":"Gait analysis system for assessing abnormal patterns in individuals with hemiparetic stroke during robot-assisted gait training: a criterion-related validity study in healthy adults.","authors":"Issei Nakashima, Daisuke Imoto, Satoshi Hirano, Hitoshi Konosu, Yohei Otaka","doi":"10.3389/fnbot.2025.1558009","DOIUrl":"10.3389/fnbot.2025.1558009","url":null,"abstract":"<p><strong>Introduction: </strong>Gait robots have the potential to analyze gait characteristics during gait training using mounted sensors in addition to robotic assistance of the individual's movements. However, no systems have been proposed to analyze gait performance during robot-assisted gait training. Our newly developed gait robot,\" Welwalk WW-2000 (WW-2000)\" is equipped with a gait analysis system to analyze abnormal gait patterns during robot-assisted gait training. We previously investigated the validity of the index values for the nine abnormal gait patterns. Here, we proposed new index values for four abnormal gait patterns, which are anterior trunk tilt, excessive trunk shifts over the affected side, excessive knee joint flexion, and swing difficulty; we investigated the criterion validity of the WW-2000 gait analysis system in healthy adults for these new index values.</p><p><strong>Methods: </strong>Twelve healthy participants simulated four abnormal gait patterns manifested in individuals with hemiparetic stroke while wearing the robot. Each participant was instructed to perform 16 gait trials, with four grades of severity for each of the four abnormal gait patterns. Twenty strides were recorded for each gait trial using a gait analysis system in the WW-2000 and video cameras. Abnormal gait patterns were assessed using the two parameters: the index values calculated for each stride from the WW-2000 gait analysis system, and assessor's severity scores for each stride. The correlation of the index values between the two methods was evaluated using the Spearman rank correlation coefficient for each gait pattern in each participant.</p><p><strong>Results: </strong>The median (minimum to maximum) values of Spearman rank correlation coefficient among the 12 participants between the index value calculated using the WW-2000 gait analysis system and the assessor's severity scores for anterior trunk tilt, excessive trunk shifts over the affected side, excessive knee joint flexion, and swing difficulty were 0.892 (0.749-0.969), 0.859 (0.439-0.923), 0.920 (0.738-0.969), and 0.681 (0.391-0.889), respectively.</p><p><strong>Discussion: </strong>The WW-2000 gait analysis system captured four new abnormal gait patterns observed in individuals with hemiparetic stroke with high validity, in addition to nine previously validated abnormal gait patterns. 
Assessing abnormal gait patterns is important as improving them contributes to stroke rehabilitation.</p><p><strong>Clinical trial registration: </strong>https://jrct.niph.go.jp, identifier jRCT 042190109.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1558009"},"PeriodicalIF":2.6,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133724/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144225249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hexapod robot motion planning investigation under the influence of multi-dimensional terrain features.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-05-21 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1605938
Chen Chen, Junbo Lin, Bo You, Jiayu Li, Biao Gao

To address the challenges arising from the coupled interaction between multi-dimensional terrain features (encompassing both the geometric and physical properties of complex field environments) and the locomotion stability of hexapod robots, this paper presents a comprehensive motion planning framework incorporating multi-dimensional terrain information. The proposed methodology systematically extracts multi-dimensional geometric and physical terrain features from a multi-layered environmental map. From these features, a traversal cost map is synthesized, and an enhanced A* algorithm incorporating terrain traversal metrics is developed to make path planning safer in complex field environments. Furthermore, the framework introduces a foothold cost map derived from the multi-dimensional terrain data, coupled with a fault-tolerant free-gait planning algorithm based on foothold cost evaluation. This enables dynamic gait modulation that enhances overall locomotion stability while maintaining safe trajectory planning. The efficacy of the proposed framework is validated through both simulation studies and physical experiments on a hexapod robotic platform. Experimental results demonstrate that, compared with conventional hexapod motion planning approaches, the proposed multi-dimensional terrain-aware planning framework significantly enhances both locomotion safety and stability in complex field environments.
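The paper's enhancements to A* and its cost synthesis are not detailed in the abstract; the sketch below shows only the baseline mechanism it extends: grid A* in which each step pays the traversal cost of the entered cell, so the planner detours around hard terrain rather than taking the geometrically shortest route. The grid, costs, and heuristic scaling are illustrative assumptions.

```python
import heapq
import itertools
import numpy as np

def a_star_with_terrain_cost(cost_map, start, goal):
    """Grid A* where step cost = traversal cost of the entered cell.

    cost_map: (H, W) array; higher = harder terrain, np.inf = impassable.
    start, goal: (row, col) tuples. Returns the path or None.
    """
    H, W = cost_map.shape
    c_min = float(np.min(cost_map[np.isfinite(cost_map)]))

    def h(n):  # admissible: Manhattan distance times the cheapest step cost
        return (abs(n[0] - goal[0]) + abs(n[1] - goal[1])) * c_min

    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    open_set = [(h(start), next(tie), 0.0, start, None)]
    came_from, g_best = {}, {start: 0.0}
    while open_set:
        _, _, g, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue  # already expanded via a cheaper route
        came_from[node] = parent
        if node == goal:  # reconstruct by walking parent links back
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < H and 0 <= nc < W and np.isfinite(cost_map[nr, nc]):
                ng = g + float(cost_map[nr, nc])
                if ng < g_best.get((nr, nc), float("inf")):
                    g_best[(nr, nc)] = ng
                    heapq.heappush(open_set,
                                   (ng + h((nr, nc)), next(tie), ng, (nr, nc), node))
    return None  # goal unreachable

# Example: 10x10 map with a high-cost ridge the path should detour around.
cmap = np.ones((10, 10))
cmap[4, 1:9] = 50.0
print(a_star_with_terrain_cost(cmap, (0, 0), (9, 9)))
```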

{"title":"Hexapod robot motion planning investigation under the influence of multi-dimensional terrain features.","authors":"Chen Chen, Junbo Lin, Bo You, Jiayu Li, Biao Gao","doi":"10.3389/fnbot.2025.1605938","DOIUrl":"10.3389/fnbot.2025.1605938","url":null,"abstract":"<p><p>To address the challenges arising from the coupled interactions between multi-dimensional terrain features-encompassing both geometric and physical properties of complex field environments-and the locomotion stability of hexapod robots, this paper presents a comprehensive motion planning framework incorporating multi-dimensional terrain information. The proposed methodology systematically extracts multi-dimensional geometric and physical terrain features from a multi-layered environmental map. Based on these features, a traversal cost map is synthesized, and an enhanced A* algorithm is developed that incorporates terrain traversal metrics to optimize path planning safety across complex field environments. Furthermore, the framework introduces a foothold cost map derived from multi-dimensional terrain data, coupled with a fault-tolerant free gait planning algorithm based on foothold cost evaluation. This approach enables dynamic gait modulation to enhance overall locomotion stability while maintaining safe trajectory planning. The efficacy of the proposed framework is validated through both simulation studies and physical experiments on a hexapod robotic platform. Experimental results demonstrate that, compared to conventional hexapod motion planning approaches, the proposed multi-dimensional terrain-aware planning framework significantly enhances both locomotion safety and stability across complex field environments.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1605938"},"PeriodicalIF":2.6,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133957/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144225250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Analysis and experiment of a positioning and pointing mechanism based on the stick-slip driving principle.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-05-15 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1567291
Yongqi Zhu, Juan Li, Jianbin Huang, Weida Li, Gai Liu, Lining Sun

Introduction: Traditional positioning and pointing mechanisms often struggle to achieve high speed and high resolution simultaneously, and their travel range is typically constrained. To overcome these challenges, this study proposes a novel positioning and pointing mechanism driven by piezoelectric ceramics. The mechanism achieves both high speed and high resolution by combining two driving principles, resonance and stick-slip; this paper focuses on analyzing the stick-slip driving principle.

Methods: We propose a configuration for the drive module within the positioning and pointing mechanism. By applying a low-frequency sawtooth-wave excitation to the piezoelectric ceramics, the mechanism achieves high resolution based on the stick-slip driving principle (a toy simulation of this drive follows the abstract). First, a simplified dynamic model of the drive module is established. The motion of the drive module under stick-slip driving is divided into a stick phase and a slip phase. Static and transient dynamic analyses of each phase yield the relationship among the output shaft angle, the resolution, and the driving voltage. During the stick phase, the output shaft angle and the driving voltage are approximately linearly related, while in the slip phase the relationship is nonlinear due to impact forces and vibrations. Finally, a prototype of the positioning and pointing mechanism is designed, and an experimental platform is constructed to test its resolution.

Results: We construct a prototype of a dual-axis positioning and pointing mechanism composed of multiple drive modules and test its resolution under two control methods: synchronous control and independent control. With synchronous control, the output shaft achieves a resolution of 0.38 μrad; with independent control, it reaches 0.0276 μrad.

Discussion: The research results show that the positioning and pointing mechanism proposed in this study achieves high resolution through the stick-slip driving principle, offering a novel approach for the advancement of such mechanisms.
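As mentioned in the Methods above, the stick-slip drive works because the output follows the piezo during the slow ramp of the sawtooth (stick) but, owing to inertia, slips during the fast flyback, leaving one micro-step of net motion per cycle. The toy simulation below illustrates that mechanism only; the drive frequency, slip-efficiency factor, and all gains are hypothetical and not from the paper's dynamic model.

```python
import numpy as np
from scipy.signal import sawtooth

# Sawtooth excitation: slow rise (stick) for 95% of the cycle, fast fall (slip).
fs, f_drive, T = 100_000, 100, 0.05          # sample rate (Hz), drive freq, duration
t = np.arange(0, T, 1 / fs)
piezo = sawtooth(2 * np.pi * f_drive * t, width=0.95)

# Idealized output: accumulate motion only while the piezo ramps upward,
# scaled by a crude slip-efficiency factor (fraction of ramp retained).
step_per_cycle = 0.8                          # hypothetical efficiency
vel = np.gradient(piezo, t)                   # piezo velocity
stick = vel > 0                               # stick phase = slow rising ramp
angle = np.cumsum(np.where(stick, vel, 0.0)) / fs * step_per_cycle
# 'angle' rises in near-linear micro-steps: one step per sawtooth cycle,
# which is why resolution scales with the drive voltage per cycle.
```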

{"title":"Analysis and experiment of a positioning and pointing mechanism based on the stick-slip driving principle.","authors":"Yongqi Zhu, Juan Li, Jianbin Huang, Weida Li, Gai Liu, Lining Sun","doi":"10.3389/fnbot.2025.1567291","DOIUrl":"10.3389/fnbot.2025.1567291","url":null,"abstract":"<p><strong>Introduction: </strong>Traditional positioning and pointing mechanisms often face limitations in simultaneously achieving high speed and high resolution, and their travel range is typically constrained. To overcome these challenges, we propose a novel positioning and pointing mechanism driven by piezoelectric ceramics in this study. This mechanism is capable of achieving both high speed and high resolution by using two driving principles: resonance and stick-slip. This paper will focus on analyzing the stick-slip driving principle.</p><p><strong>Methods: </strong>We propose a configuration of the drive module within the positioning and pointing mechanism. By applying a low-frequency sawtooth wave excitation to the piezoelectric ceramics, the mechanism achieves high resolution based on the stick-slip driving principle. First, a simplified dynamic model of the drive module is established. The motion process of the drive module in stick-slip driving is divided into the stick phase and slip phase. With static and transient dynamic analyses conducted for each phase, the relationship between the output shaft angle, resolution, and driving voltage is derived. It is observed that during the stick phase, the output shaft angle and the driving voltage exhibit an approximately linear relationship, while in the slip phase, the output shaft angle and the driving voltage display nonlinearity due to impact forces and vibrations. Finally, a prototype of the positioning and pointing mechanism is designed, and an experimental platform is constructed to test the resolution of the prototype.</p><p><strong>Results: </strong>We construct a prototype of a dual-axis positioning and pointing mechanism composed of multiple drive modules and conduct resolution tests using two control methods: synchronous control and independent control. When synchronous control is used, the output shaft achieves a resolution of 0.38<i>μrad</i>, while with independent control, the resolution of the output shaft reaches 0.0276<i>μrad</i>.</p><p><strong>Discussion: </strong>The research results show that the positioning and pointing mechanism proposed in this study achieves high resolution through stick-slip driving principle, offering a novel approach for the advancement of such mechanisms.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1567291"},"PeriodicalIF":2.6,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12119557/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144181186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
TS-Resformer: a model based on multimodal fusion for the classification of music signals.
IF 2.6 | CAS Zone 4, Computer Science | Q3 (Computer Science, Artificial Intelligence) | Pub Date: 2025-05-13 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnbot.2025.1568811
Yilin Zhang

The amount of music in different genres grows year by year. Manual classification is costly and requires music-domain professionals to hand-design features, some of which lack generality for genre classification. Deep learning has produced many results in music classification, but existing methods still suffer from insufficient extraction of music feature information, low genre-classification accuracy, loss of time-series information, and slow training. To address the effect of music duration on genre-classification accuracy, we build Log Mel spectrograms from audio clips of different cut durations. After discarding incomplete audio, we design data augmentation with different slice durations and verify its effect on accuracy and training time through comparison experiments. On this basis, the audio signal is divided into frames, windowed, and short-time Fourier transformed, and the Log Mel spectrum is obtained using a Mel filter bank and logarithmic compression. To address the loss of temporal information, insufficient feature extraction, and low classification accuracy in genre classification, we first propose a Res-Transformer model that fuses a residual network with Transformer encoder layers. The model has two branches: the left branch is an improved residual network that strengthens spectral feature extraction and network expressiveness while reducing dimensionality; the right branch uses four Transformer encoder layers to extract the time-series information of the Log Mel spectrum. The output vectors of the two branches are concatenated and fed into the classifier for genre classification. To further improve classification accuracy, we then propose the TS-Resformer model, which builds on Res-Transformer by combining different attention mechanisms and a time-frequency attention mechanism that employs filters of different scales to fully extract low-level music features along the time and frequency dimensions as its inputs. Finally, experiments show that this method achieves 90.23% accuracy on the FMA-small dataset, an improvement in classification accuracy over the classical model.
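The preprocessing pipeline described above (framing, windowing, STFT, Mel filtering, log compression) is standard and can be reproduced with librosa; the sketch below uses common default parameters (22.05 kHz, 2048-point FFT, hop 512, 128 Mel bands), which are assumptions rather than the paper's exact settings.

```python
import librosa
import numpy as np

def log_mel_spectrogram(path, clip_seconds=30.0, n_mels=128):
    """Load a clip of fixed duration, then frame/window/STFT, apply a Mel
    filter bank, and log-compress, yielding a (n_mels, frames) Log Mel
    spectrogram suitable as 2-D network input."""
    y, sr = librosa.load(path, sr=22050, duration=clip_seconds)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=n_mels
    )
    return librosa.power_to_db(mel, ref=np.max)  # logarithmic compression

# log_mel = log_mel_spectrogram("track.mp3")   # hypothetical input file
```

Varying `clip_seconds` reproduces the different cut durations whose effect on accuracy and training time the paper compares.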

{"title":"TS-Resformer: a model based on multimodal fusion for the classification of music signals.","authors":"Yilin Zhang","doi":"10.3389/fnbot.2025.1568811","DOIUrl":"10.3389/fnbot.2025.1568811","url":null,"abstract":"<p><p>The number of music of different genres is increasing year by year, and manual classification is costly and requires professionals in the field of music to manually design features, some of which lack the generality of music genre classification. Deep learning has had a large number of scientific research results in the field of music classification, but the existing deep learning methods still have the problems of insufficient extraction of music feature information, low accuracy rate of music genres, loss of time series information, and slow training. To address the problem that different music durations affect the accuracy of music genre classification, we form a Log Mel spectrum with music audio data of different cut durations. After discarding incomplete audio, we design data enhancement with different slicing durations and verify its effect on accuracy and training time through comparison experiments. Based on this, the audio signal is divided into frames, windowed and short-time Fourier transformed, and then the Log Mel spectrum is obtained by using the Mel filter and logarithmic compression. Aiming at the problems of loss of time information, insufficient feature extraction, and low classification accuracy in music genre classification, firstly, we propose a Res-Transformer model that fuses the residual network with the Transformer coding layer. The model consists of two branches, the left branch is an improved residual network, which enhances the spectral feature extraction ability and network expression ability and realizes the dimensionality reduction; the right branch uses four Transformer coding layers to extract the time-series information of the Log Mel spectrum. The output vectors of the two branches are spliced and input into the classifier to realize music genre classification. Then, to further improve the classification accuracy of the model, we propose the TS-Resformer model based on the Res-Transformer model, combined with different attention mechanisms, and design the time-frequency attention mechanism, which employs different scales of filters to fully extract the low-level music features from the two dimensions of time and frequency as the input to the time-frequency attention mechanism, respectively. Finally, experiments show that the accuracy of this method is 90.23% on the FMA-small dataset, which is an improvement in classification accuracy compared with the classical model.</p>","PeriodicalId":12628,"journal":{"name":"Frontiers in Neurorobotics","volume":"19 ","pages":"1568811"},"PeriodicalIF":2.6,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12106318/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144157987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0