
Latest Publications in IEEE Transactions on Cognitive and Developmental Systems

MAT: Morphological Adaptive Transformer for Universal Morphology Policy Learning
IF 5.0 | CAS Tier 3 (Computer Science) | Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2024-04-01 | DOI: 10.1109/TCDS.2024.3383158
Boyu Li;Haoran Li;Yuanheng Zhu;Dongbin Zhao
Agent-agnostic reinforcement learning aims to learn a universal control policy that can simultaneously control a set of robots with different morphologies. Recent studies have suggested that the transformer model can address variations in state and action spaces caused by different morphologies, and that morphology information is necessary to improve policy performance. However, existing methods have limitations in exploiting morphological information: the rationality of observation integration cannot be guaranteed. We propose the morphological adaptive transformer (MAT), a transformer-based universal control algorithm that can adapt to various morphologies without any modifications. MAT includes two essential components: functional position encoding (FPE) and a morphological attention mechanism (MAM). The FPE provides robust and consistent positional prior information for limb observations to avoid limb confusion and implicitly obtains functional descriptions of limbs. The MAM enhances the attribute prior information of limbs, improves the correlation between observations, and makes the policy attend to more limbs. We combine observations with prior information to help the policy adapt to robot morphology, thereby optimizing its performance on unknown morphologies. Experiments on agent-agnostic tasks in the Gym MuJoCo environment demonstrate that our algorithm assigns more reasonable morphological prior information to each limb, and its performance is comparable to the prior state-of-the-art algorithm while generalizing better.
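The MAM idea of steering attention with limb priors can be illustrated with a minimal sketch. The additive-bias scheme, shapes, and values below are illustrative assumptions, not the paper's implementation: a per-limb prior term is simply summed into the attention logits before the softmax.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_with_morph_bias(q, keys, values, morph_bias):
    """Single-query scaled dot-product attention with an additive
    morphology-prior bias on the logits (illustrative only)."""
    d = len(q)
    logits = [
        sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) + b
        for k, b in zip(keys, morph_bias)
    ]
    w = softmax(logits)
    # Weighted sum of value vectors.
    out = [sum(wi * v[j] for wi, v in zip(w, values))
           for j in range(len(values[0]))]
    return out, w

# Two "limbs" with identical keys: only the prior bias differs.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
out, w = attention_with_morph_bias(q, keys, values, morph_bias=[2.0, 0.0])
print(w[0] > w[1])  # True: the prior shifts attention toward limb 0
```

With identical keys the dot products tie, so the attention weights are decided entirely by the prior term, which is the intuition behind giving the policy extra per-limb information.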
Citations: 0
Measuring Human Comfort in Human-Robot Collaboration via Wearable Sensing
IF 5.0 | CAS Tier 3 (Computer Science) | Q1 (Computer Science) | Pub Date: 2024-03-29 | DOI: 10.1109/tcds.2024.3383296
Yuchen Yan, Haotian Su, Yunyi Jia
Citations: 0
Deep Neural Networks for Automatic Sleep Stage Classification and Consciousness Assessment in Patients With Disorder of Consciousness
IF 5.0 | CAS Tier 3 (Computer Science) | Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2024-03-26 | DOI: 10.1109/TCDS.2024.3382109
Jiahui Pan;Yangzuyi Yu;Jianhui Wu;Xinjie Zhou;Yanbin He;Yuanqing Li
Disorders of consciousness (DOC) are often related to serious changes in sleep structure. This article presents a sleep evaluation algorithm that scores the sleep structure of DOC patients to assist in assessing their consciousness level. The sleep evaluation algorithm is divided into two parts: 1) an automatic sleep staging model: convolutional neural networks (CNNs) extract signal features from the electroencephalogram (EEG) and electrooculogram (EOG), and a bidirectional long short-term memory (Bi-LSTM) network with an attention mechanism learns sequential information; and 2) consciousness assessment: the automated sleep staging results are used to extract consciousness-related sleep features, which a support vector machine (SVM) classifier uses to assess consciousness. In this study, the CNN-BiLSTM model with an attention sleep network (CBASleepNet) was evaluated using the sleep-EDF and MASS datasets. The experimental results demonstrated the effectiveness of the proposed model, which outperformed similar models. Moreover, CBASleepNet was applied to sleep staging in DOC patients through transfer learning and fine-tuning. Consciousness assessments were conducted on seven minimally conscious state (MCS) patients and four vegetative state (VS)/unresponsive wakefulness syndrome (UWS) patients, achieving an overall accuracy of 81.8%. The sleep evaluation algorithm can be used to evaluate patients' consciousness level effectively.
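The second stage turns a staged hypnogram into a fixed-length feature vector for the SVM. A minimal sketch of that step follows; the specific features here (stage proportions plus a transition rate) are illustrative assumptions, not the paper's exact feature set.

```python
from collections import Counter

STAGES = ["W", "N1", "N2", "N3", "REM"]

def sleep_features(hypnogram):
    """Convert a per-epoch stage sequence into a feature vector:
    one proportion per stage plus a normalized stage-transition count
    (illustrative features only)."""
    n = len(hypnogram)
    counts = Counter(hypnogram)
    props = [counts.get(s, 0) / n for s in STAGES]
    transitions = sum(a != b for a, b in zip(hypnogram, hypnogram[1:]))
    return props + [transitions / max(n - 1, 1)]

# Eight 30-s epochs of a toy night.
feats = sleep_features(["W", "W", "N1", "N2", "N2", "N3", "REM", "W"])
print(feats)
```

A vector like this (one row per patient or per night) is the kind of tabular input an SVM classifier can consume directly.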
Citations: 0
EventAugment: Learning Augmentation Policies From Asynchronous Event-Based Data
IF 5.0 | CAS Tier 3 (Computer Science) | Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2024-03-22 | DOI: 10.1109/TCDS.2024.3380907
Fuqiang Gu;Jiarui Dou;Mingyan Li;Xianlei Long;Songtao Guo;Chao Chen;Kai Liu;Xianlong Jiao;Ruiyuan Li
Data augmentation is an effective way to overcome the overfitting problem of deep learning models. However, most existing studies on data augmentation work on framelike data (e.g., images), and few tackle event-based data. Event-based data differ from framelike data, rendering augmentation techniques designed for framelike data unsuitable for event-based data. This work deals with data augmentation for event-based object classification and semantic segmentation, which are important for self-driving and robot manipulation. Specifically, we introduce EventAugment, a new method to augment asynchronous event-based data by automatically learning augmentation policies. We first identify 13 types of operations for augmenting event-based data. Next, we formulate the problem of finding optimal augmentation policies as a hyperparameter optimization problem. To tackle this problem, we propose a random-search-based framework. Finally, we evaluate the proposed method on six public datasets: N-Caltech101, N-Cars, ST-MNIST, N-MNIST, DVSGesture, and DDD17. Experimental results demonstrate that EventAugment yields substantial performance improvements for both deep neural network-based and spiking neural network-based models, with gains of up to approximately 4%. Notably, EventAugment outperforms state-of-the-art methods in terms of overall performance.
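The hyperparameter-optimization framing can be sketched as a toy random search over policies. The operation names and the stand-in objective below are assumptions; the paper's actual 13 operations and validation-based scoring pipeline are not reproduced here.

```python
import random

# Hypothetical operation names standing in for the paper's 13 ops.
OPS = ["drop_events", "flip_x", "flip_y", "time_jitter", "spatial_shift"]

def sample_policy(rng, n_sub=2):
    """Sample a policy: n_sub (operation, probability, magnitude) triples."""
    return [(rng.choice(OPS), rng.uniform(0, 1), rng.uniform(0, 1))
            for _ in range(n_sub)]

def random_search(score_fn, n_trials=50, seed=0):
    """Random search over policies; in practice score_fn would train a
    model with the policy and return a validation-accuracy proxy."""
    rng = random.Random(seed)
    best_policy, best_score = None, float("-inf")
    for _ in range(n_trials):
        policy = sample_policy(rng)
        score = score_fn(policy)
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score

# Toy stand-in objective: prefer moderate probabilities/magnitudes.
toy_score = lambda p: -sum((prob - 0.5) ** 2 + (mag - 0.5) ** 2
                           for _, prob, mag in p)
policy, score = random_search(toy_score)
print(policy)
```

Random search is attractive here because each trial is independent, so policy evaluation parallelizes trivially across workers.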
Citations: 0
Emergence of Human Oculomotor Behavior in a Cable-Driven Biomimetic Robotic Eye Using Optimal Control
IF 5.0 | CAS Tier 3 (Computer Science) | Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2024-03-18 | DOI: 10.1109/TCDS.2024.3376072
Reza Javanmard Alitappeh;Akhil John;Bernardo Dias;A. John van Opstal;Alexandre Bernardino
This article explores the application of model-based optimal control principles to understanding stereotyped human oculomotor behaviors. Using a realistic model of the human eye with a six-muscle cable-driven actuation system, we tackle the novel challenges of controlling a system with six degrees of freedom. We apply nonlinear optimal control techniques to optimize the accuracy, energy, and duration of eye-movement trajectories. Employing a recurrent neural network to emulate the system dynamics, we focus on generating rapid, unconstrained saccadic eye movements. Remarkably, our model replicates the realistic 3-D rotational kinematics and dynamics observed in human saccades, with the six cables organizing themselves into appropriate antagonistic muscle pairs, resembling the primate oculomotor system.
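The three optimization criteria (accuracy, energy, duration) are typically combined into a single scalar trajectory cost. The sketch below shows one generic way to do that; the weights and functional form are assumptions, not the paper's cost function.

```python
def saccade_cost(trajectory, controls, target, dt,
                 w_acc=1.0, w_energy=0.01, w_time=0.1):
    """Illustrative scalar cost trading off endpoint accuracy,
    control energy, and movement duration (weights are assumptions)."""
    end = trajectory[-1]
    accuracy = sum((e - t) ** 2 for e, t in zip(end, target))   # terminal error
    energy = sum(sum(u_i ** 2 for u_i in u) for u in controls) * dt  # sum |u|^2 dt
    duration = len(controls) * dt
    return w_acc * accuracy + w_energy * energy + w_time * duration

# A short trajectory that lands exactly on target pays only energy + time.
traj = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
ctrl = [(1.0, 1.0), (1.0, 1.0)]
c = saccade_cost(traj, ctrl, target=(1.0, 1.0), dt=0.01)
print(c)
```

An optimizer minimizing such a cost naturally produces fast movements that still hit the target without wasting muscle effort, which is the trade-off the article studies.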
Citations: 0
The Inadequacy of Reinforcement Learning From Human Feedback—Radicalizing Large Language Models via Semantic Vulnerabilities
IF 5.0 | CAS Tier 3 (Computer Science) | Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2024-03-18 | DOI: 10.1109/TCDS.2024.3377445
Timothy R. McIntosh;Teo Susnjak;Tong Liu;Paul Watters;Malka N. Halgamuge
This study is an empirical investigation into the semantic vulnerabilities of four popular pretrained commercial large language models (LLMs) to ideological manipulation. Using tactics reminiscent of human semantic conditioning in psychology, we induced and assessed ideological misalignments, and their retention, in four commercial pretrained LLMs, in response to 30 controversial questions that spanned a broad ideological and social spectrum, encompassing both extreme left- and right-wing viewpoints. Such semantic vulnerabilities arise from fundamental limitations in LLMs' capability to comprehend detailed linguistic variations, making them susceptible to ideological manipulation through targeted semantic exploits. We observed the effect of reinforcement learning from human feedback (RLHF) on LLMs' initial answers, but highlighted the limitations of RLHF in two respects: 1) its inability to fully mitigate the impact of ideological conditioning prompts, leading to only partial alleviation of LLM semantic vulnerabilities; and 2) its inadequacy in representing a diverse set of "human values," often reflecting the predefined values of certain groups controlling the LLMs. Our findings provide empirical evidence of semantic vulnerabilities inherent in current LLMs, challenge both the robustness and the adequacy of RLHF as a mainstream method for aligning LLMs with human values, and underscore the need for a multidisciplinary approach in developing ethical and resilient artificial intelligence (AI).
Citations: 0
A Two-Stage Foveal Vision Tracker Based on Transformer Model
IF 5.0 | CAS Tier 3 (Computer Science) | Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2024-03-18 | DOI: 10.1109/TCDS.2024.3377642
Guang Han;Jianshu Ma;Ziyang Li;Haitao Zhao
With the development of transformer visual models, attention-based trackers have shown highly competitive performance in the field of object tracking. However, in some tracking scenarios, especially those with multiple similar objects, the performance of existing trackers is often not satisfactory. In order to improve the performance of trackers in such scenarios, inspired by the fovea vision structure and its visual characteristics, this article proposes a novel foveal vision tracker (FVT). FVT combines the process of human eye fixation and object tracking, pruning based on the distance to the object rather than attention scores. This pruning method allows the receptive field of the feature extraction network to focus on the object, excluding background interference. FVT divides the feature extraction network into two stages: local and global, and introduces the local recursive module (LRM) and the view elimination module (VEM). LRM is used to enhance foreground features in the local stage, while VEM generates circular fovea-like visual field masks in the global stage and prunes tokens outside the mask, guiding the model to focus attention on high-information regions of the object. Experimental results on multiple object tracking datasets demonstrate that the proposed FVT achieves stronger object discrimination capability in the feature extraction stage, improves tracking accuracy and robustness in complex scenes, and achieves a significant accuracy improvement with an area overlap (AO) of 72.6% on the generic object tracking (GOT)-10k dataset.
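The core pruning idea (distance to the object, rather than attention scores, decides which tokens survive) can be sketched as follows; the token layout, radius, and data are illustrative assumptions.

```python
import math

def prune_tokens_by_distance(tokens, centers, object_center, radius):
    """Keep only tokens whose patch center lies within `radius` of the
    estimated object center, emulating a circular fovea-like field
    (distance-based pruning, not attention-score pruning)."""
    return [(tok, c) for tok, c in zip(tokens, centers)
            if math.dist(c, object_center) <= radius]

# Four image-patch tokens with their patch-center coordinates.
tokens = ["t0", "t1", "t2", "t3"]
centers = [(0, 0), (1, 0), (5, 5), (0, 2)]
kept = prune_tokens_by_distance(tokens, centers,
                                object_center=(0, 0), radius=2.5)
print([t for t, _ in kept])  # t2 at (5, 5) falls outside and is pruned
```

Because the criterion is purely geometric, background tokens far from the object are dropped even when they happen to attract high attention, which is exactly the failure mode in scenes with similar distractor objects.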
Citations: 0
Converting Artificial Neural Networks to Ultralow-Latency Spiking Neural Networks for Action Recognition
IF 5.0 | CAS Tier 3 (Computer Science) | Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2024-03-14 | DOI: 10.1109/TCDS.2024.3375620
Hong You;Xian Zhong;Wenxuan Liu;Qi Wei;Wenxin Huang;Zhaofei Yu;Tiejun Huang
Spiking neural networks (SNNs) have garnered significant attention for their potential in ultralow-power event-driven neuromorphic hardware implementations. One effective strategy for obtaining SNNs involves the conversion of artificial neural networks (ANNs) to SNNs. However, existing research on ANN–SNN conversion has predominantly focused on the image classification task, leaving the action recognition task largely unexplored. In this article, we investigate the performance degradation of SNNs on the action recognition task. Through in-depth analysis, we propose a framework called scalable dual threshold mapping (SDM) that effectively overcomes three types of conversion errors. By mitigating these conversion errors, we are able to reduce the time required for the spike firing rate of SNNs to align with the activation values of ANNs. Consequently, our method enables the generation of accurate and ultralow-latency SNNs. We conduct extensive evaluations on multiple action recognition datasets, including University of Central Florida (UCF)-101 and Human Motion DataBase (HMDB)-51. Through rigorous experiments and analysis, we demonstrate the effectiveness of our approach. Notably, SDM achieves a remarkable Top-1 accuracy of 92.94% on UCF-101 while requiring ultralow latency (four time steps), highlighting its high performance with reduced computational requirements.
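The rate-coding intuition behind ANN–SNN conversion can be sketched with a single integrate-and-fire neuron: driven by a constant input equal to an ANN activation, its firing rate over enough time steps approximates that activation. This is a generic illustration of the conversion principle, not the paper's SDM framework.

```python
def spike_rate_from_activation(activation, threshold, t_steps):
    """Integrate-and-fire neuron with soft reset: each step adds the
    (constant) input to the membrane potential; a spike fires when the
    threshold is crossed. The scaled firing rate approximates
    min(activation, threshold) as t_steps grows."""
    v, spikes = 0.0, 0
    for _ in range(t_steps):
        v += activation
        if v >= threshold:
            spikes += 1
            v -= threshold  # soft reset keeps residual charge
    return spikes / t_steps * threshold

rate = spike_rate_from_activation(activation=0.6, threshold=1.0, t_steps=1000)
print(rate)  # close to 0.6
```

The gap between this rate and the true activation shrinks with more time steps, which is why naive conversions need long latencies and why frameworks that shrink the gap at few steps (as SDM aims to, down to four) matter.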
Cited by: 0
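The SDM framework above is specific to the paper, but the rate-coding equivalence that all rate-based ANN–SNN conversion builds on can be sketched with a soft-reset integrate-and-fire neuron: over enough time steps its firing rate approximates a clipped ReLU of the input, and shrinking that "enough" is exactly what low-latency conversion methods target. A minimal illustrative sketch (the function name and parameters are ours, not the paper's):

```python
def if_neuron_rate(inp, threshold=1.0, steps=100):
    """Average firing rate of a soft-reset integrate-and-fire neuron
    driven by a constant input current.

    Over many steps the rate (scaled by the threshold) approximates
    clip(inp, 0, threshold) -- the clipped-ReLU equivalence behind
    rate-based ANN-SNN conversion. With fewer steps the approximation
    error grows, which is the latency/accuracy tradeoff that
    conversion methods work to reduce.
    """
    v = 0.0          # membrane potential
    spikes = 0
    for _ in range(steps):
        v += inp                 # integrate the input current
        if v >= threshold:
            v -= threshold       # soft reset keeps the residual charge
            spikes += 1
    return spikes / steps * threshold
```

With `inp=0.3` the rate settles near 0.3, while `inp=1.5` saturates at one spike per step, mirroring how ANN activations above the conversion threshold are clipped.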
EEG-based Auditory Attention Detection with Spiking Graph Convolutional Network
IF 5 Computer Science (CAS Tier 3) Q1 Computer Science Pub Date : 2024-03-12 DOI: 10.1109/tcds.2024.3376433
Siqi Cai, Ran Zhang, Malu Zhang, Jibin Wu, Haizhou Li
{"title":"EEG-based Auditory Attention Detection with Spiking Graph Convolutional Network","authors":"Siqi Cai, Ran Zhang, Malu Zhang, Jibin Wu, Haizhou Li","doi":"10.1109/tcds.2024.3376433","DOIUrl":"https://doi.org/10.1109/tcds.2024.3376433","url":null,"abstract":"","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140115810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
Robust Perception-Based Visual Simultaneous Localization and Tracking in Dynamic Environments
IF 5 Computer Science (CAS Tier 3) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-02-28 DOI: 10.1109/TCDS.2024.3371073
Song Peng;Teng Ran;Liang Yuan;Jianbo Zhang;Wendong Xiao
Visual simultaneous localization and mapping (SLAM) in dynamic scenes is a prerequisite for robot-related applications. Most existing SLAM algorithms focus mainly on dynamic object rejection, which discards part of the valuable information and is prone to failure in complex environments. This article proposes a semantic visual SLAM system that incorporates rigid object tracking. A robust scene perception framework is designed, which gives autonomous robots the ability to perceive scenes in a way similar to human cognition. Specifically, we propose a two-stage mask revision method to generate a fine mask of the object. Based on the revised mask, we propose a semantic and geometric constraint (SAG) strategy, which provides a fast and robust way to perceive dynamic rigid objects. Then, the motion tracking of rigid objects is integrated into the SLAM pipeline, and a novel bundle adjustment is constructed to jointly optimize camera localization and object six-degree-of-freedom (DoF) poses. Finally, the proposed algorithm is evaluated on the publicly available KITTI dataset, the Oxford Multimotion dataset, and real-world scenarios. The proposed algorithm achieves $\text{RPE}_{\text{t}}$ less than 0.07 m per frame and $\text{RPE}_{\text{R}}$ of about $0.03^{\circ}$ per frame on the KITTI dataset. The experimental results reveal that the proposed algorithm achieves more accurate localization and more robust tracking than state-of-the-art SLAM algorithms in challenging dynamic scenarios.
{"title":"Robust Perception-Based Visual Simultaneous Localization and Tracking in Dynamic Environments","authors":"Song Peng;Teng Ran;Liang Yuan;Jianbo Zhang;Wendong Xiao","doi":"10.1109/TCDS.2024.3371073","DOIUrl":"10.1109/TCDS.2024.3371073","url":null,"abstract":"Visual simultaneous localization and mapping (SLAM) in dynamic scenes is a prerequisite for robot-related applications. Most of the existing SLAM algorithms mainly focus on dynamic object rejection, which makes part of the valuable information lost and prone to failure in complex environments. This article proposes a semantic visual SLAM system that incorporates rigid object tracking. A robust scene perception frame is designed, which gives autonomous robots the ability to perceive scenes similar to human cognition. Specifically, we propose a two-stage mask revision method to generate fine mask of the object. Based on the revised mask, we propose a semantic and geometric constraint (SAG) strategy, which provides a fast and robust way to perceive dynamic rigid objects. Then, the motion tracking of rigid objects is integrated into the SLAM pipeline, and a novel bundle adjustment is constructed to optimize camera localization and object six-degree of freedom (DoF) poses. Finally, the evaluation of the proposed algorithm is performed on publicly available KITTI dataset, Oxford Multimotion dataset, and real-world scenarios. The proposed algorithm achieves the comprehensive performance of \u0000<inline-formula><tex-math>$text{RPE}_{text{t}}$</tex-math></inline-formula>\u0000 less than 0.07 m per frame and \u0000<inline-formula><tex-math>$text{RPE}_{text{R}}$</tex-math></inline-formula>\u0000 about 0.03\u0000<inline-formula><tex-math>${}^{circ}$</tex-math></inline-formula>\u0000 per frame in the KITTI dataset. 
The experimental results reveal that the proposed algorithm enables accurate localization and robust tracking than state-of-the-art SLAM algorithms in challenging dynamic scenarios.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140002820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
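The RPE_t and RPE_R figures quoted in the abstract are the standard per-frame relative pose errors: compare the ground-truth and estimated motion between consecutive frames, then take the translation norm and rotation angle of the residual transform. A minimal NumPy sketch of that metric, assuming both trajectories are given as lists of 4×4 SE(3) matrices (function name is ours, not the paper's):

```python
import numpy as np

def relative_pose_error(gt, est):
    """Per-frame relative pose error between consecutive frames.

    gt, est: sequences of 4x4 homogeneous pose matrices (world <- camera).
    Returns (RPE_t in metres, RPE_R in degrees), one value per frame pair.
    """
    t_errs, r_errs = [], []
    for i in range(len(gt) - 1):
        # Relative motion between frames i and i+1 in each trajectory.
        dg = np.linalg.inv(gt[i]) @ gt[i + 1]
        de = np.linalg.inv(est[i]) @ est[i + 1]
        # Residual transform: identity when the estimate is perfect.
        e = np.linalg.inv(dg) @ de
        t_errs.append(np.linalg.norm(e[:3, 3]))
        # Rotation angle from the trace of the residual rotation matrix.
        cos = (np.trace(e[:3, :3]) - 1.0) / 2.0
        r_errs.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return np.array(t_errs), np.array(r_errs)
```

Averaging (or taking the RMSE of) these per-frame values yields the single numbers typically reported on KITTI-style benchmarks.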