IEEE Transactions on Cognitive and Developmental Systems最新文献_第9页

Deep Neural Networks for Automatic Sleep Stage Classification and Consciousness Assessment in Patients With Disorder of Consciousness 用于意识障碍患者自动睡眠阶段分类和意识评估的深度神经网络

IF 5 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-26 DOI: 10.1109/TCDS.2024.3382109

Jiahui Pan;Yangzuyi Yu;Jianhui Wu;Xinjie Zhou;Yanbin He;Yuanqing Li

Disorders of consciousness (DOC) are often related to serious changes in sleep structure. This article presents a sleep evaluation algorithm that scores the sleep structure of DOC patients to assist in assessing their consciousness level. The sleep evaluation algorithm is divided into two parts: 1) automatic sleep staging model: convolutional neural networks (CNNs) are employed for the extraction of signal features from electroencephalogram (EEG) and electrooculogram (EOG), and bidirectional long short-term memory (Bi-LSTM) with attention mechanism is applied to learn sequential information; and 2) consciousness assessment: automated sleep staging results are used to extract consciousness-related sleep features that are utilized by a support vector machine (SVM) classifier to assess consciousness. In this study, the CNN-BiLSTM model with an attention sleep network (CBASleepNet) was conducted using the sleep-EDF and MASS datasets. The experimental results demonstrated the effectiveness of the proposed model, which outperformed similar models. Moreover, CBASleepNet was applied to sleep staging in DOC patients through transfer learning and fine-tuning. Consciousness assessments were conducted on seven minimally conscious state (MCS) patients and four vegetative state (VS)/unresponsive wakefulness syndrome (UWS) patients, achieving an overall accuracy of 81.8%. The sleep evaluation algorithm can be used to evaluate the consciousness level of patients effectively.

意识障碍（DOC）通常与睡眠结构的严重变化有关。本文介绍了一种睡眠评估算法，可对 DOC 患者的睡眠结构进行评分，以帮助评估其意识水平。该睡眠评估算法分为两部分：1）自动睡眠分期模型：采用卷积神经网络（CNN）从脑电图（EEG）和脑电图（EOG）中提取信号特征，并应用具有注意力机制的双向长短期记忆（Bi-LSTM）学习序列信息；2）意识评估：利用自动睡眠分期结果提取与意识相关的睡眠特征，并利用支持向量机（SVM）分类器评估意识。在本研究中，使用睡眠-EDF 和 MASS 数据集对带有注意力睡眠网络（CBASleepNet）的 CNN-BiLSTM 模型进行了实验。实验结果证明了所提出模型的有效性，其表现优于同类模型。此外，通过迁移学习和微调，CBASleepNet 被应用于 DOC 患者的睡眠分期。对七名微意识状态（MCS）患者和四名植物人状态（VS）/无反应清醒综合征（UWS）患者进行了意识评估，总体准确率达到 81.8%。该睡眠评估算法可用于有效评估患者的意识水平。

{"title":"Deep Neural Networks for Automatic Sleep Stage Classification and Consciousness Assessment in Patients With Disorder of Consciousness","authors":"Jiahui Pan;Yangzuyi Yu;Jianhui Wu;Xinjie Zhou;Yanbin He;Yuanqing Li","doi":"10.1109/TCDS.2024.3382109","DOIUrl":"10.1109/TCDS.2024.3382109","url":null,"abstract":"Disorders of consciousness (DOC) are often related to serious changes in sleep structure. This article presents a sleep evaluation algorithm that scores the sleep structure of DOC patients to assist in assessing their consciousness level. The sleep evaluation algorithm is divided into two parts: 1) automatic sleep staging model: convolutional neural networks (CNNs) are employed for the extraction of signal features from electroencephalogram (EEG) and electrooculogram (EOG), and bidirectional long short-term memory (Bi-LSTM) with attention mechanism is applied to learn sequential information; and 2) consciousness assessment: automated sleep staging results are used to extract consciousness-related sleep features that are utilized by a support vector machine (SVM) classifier to assess consciousness. In this study, the CNN-BiLSTM model with an attention sleep network (CBASleepNet) was conducted using the sleep-EDF and MASS datasets. The experimental results demonstrated the effectiveness of the proposed model, which outperformed similar models. Moreover, CBASleepNet was applied to sleep staging in DOC patients through transfer learning and fine-tuning. Consciousness assessments were conducted on seven minimally conscious state (MCS) patients and four vegetative state (VS)/unresponsive wakefulness syndrome (UWS) patients, achieving an overall accuracy of 81.8%. The sleep evaluation algorithm can be used to evaluate the consciousness level of patients effectively.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1589-1603"},"PeriodicalIF":5.0,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140315148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Long-Term and Short-Term Opponent Intention Inference for Football Multiplayer Policy Learning 足球多人策略学习的长期和短期对手意图推断

IF 5 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-22 DOI: 10.1109/TCDS.2024.3404061

Shijie Wang;Zhiqiang Pu;Yi Pan;Boyin Liu;Hao Ma;Jianqiang Yi

A highly competitive and confrontational football match is full of strategic and tactical challenges. Therefore, player's cognition on their opponents’ strategies and tactics is quite crucial. However, the match's complexity results in that the opponents’ intentions are often changeable. Under these circumstances, how to discriminate and predict the opponents’ intentions of future actions and tactics is an important problem for football players’ decision-making. Considering that the opponents’ cognitive processes involve deliberative and reactive processes, a long-term and short-term opponent intention inference (LS-OII) method for football multiplayer policy learning is proposed. First, to capture the cognition about opponents’ deliberative process, we design an opponent tactics deduction module for inferring the opponents’ long-term tactical intentions from a macro perspective. Furthermore, an opponent decision prediction module is designed to infer the opponents’ short-term decision which often yields rapid and direct impacts on football matches. Additionally, an opponent-driven incentive module is designed to enhance the players’ causal awareness of the opponents’ intentions, further to improve the players exploration capabilities and effectively obtain outstanding policies. Representative results demonstrate that the LS-OII method significantly enhances the efficacy of players’ strategies in the Google Research Football environment, thereby affirming the superiority of our method.

一场高度竞争和对抗性的足球比赛充满了战略和战术挑战。因此，玩家对对手战略战术的认知是至关重要的。然而，比赛的复杂性导致对手的意图往往是多变的。在这种情况下，如何辨别和预测对手未来行动和战术的意图是足球运动员决策的重要问题。考虑到对手的认知过程包括深思熟虑和反应性过程，提出了一种用于足球多人策略学习的长期和短期对手意图推理（LS-OII）方法。首先，为了捕捉对对手商议过程的认知，我们设计了对手战术演绎模块，从宏观角度推断对手的长期战术意图。此外，设计了对手决策预测模块来推断对手的短期决策，这些决策通常会对足球比赛产生快速而直接的影响。另外，设计对手驱动的激励模块，增强玩家对对手意图的因果意识，进而提高玩家的探索能力，有效获取优秀政策。代表性结果表明，LS-OII方法在谷歌Research Football环境下显著提高了球员策略的有效性，从而肯定了我们方法的优越性。

{"title":"Long-Term and Short-Term Opponent Intention Inference for Football Multiplayer Policy Learning","authors":"Shijie Wang;Zhiqiang Pu;Yi Pan;Boyin Liu;Hao Ma;Jianqiang Yi","doi":"10.1109/TCDS.2024.3404061","DOIUrl":"10.1109/TCDS.2024.3404061","url":null,"abstract":"A highly competitive and confrontational football match is full of strategic and tactical challenges. Therefore, player's cognition on their opponents’ strategies and tactics is quite crucial. However, the match's complexity results in that the opponents’ intentions are often changeable. Under these circumstances, how to discriminate and predict the opponents’ intentions of future actions and tactics is an important problem for football players’ decision-making. Considering that the opponents’ cognitive processes involve deliberative and reactive processes, a long-term and short-term opponent intention inference (LS-OII) method for football multiplayer policy learning is proposed. First, to capture the cognition about opponents’ deliberative process, we design an opponent tactics deduction module for inferring the opponents’ long-term tactical intentions from a macro perspective. Furthermore, an opponent decision prediction module is designed to infer the opponents’ short-term decision which often yields rapid and direct impacts on football matches. Additionally, an opponent-driven incentive module is designed to enhance the players’ causal awareness of the opponents’ intentions, further to improve the players exploration capabilities and effectively obtain outstanding policies. Representative results demonstrate that the LS-OII method significantly enhances the efficacy of players’ strategies in the Google Research Football environment, thereby affirming the superiority of our method.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"2055-2069"},"PeriodicalIF":5.0,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141149785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

EventAugment: Learning Augmentation Policies From Asynchronous Event-Based Data EventAugment：从基于事件的异步数据中学习增强策略

IF 5 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-22 DOI: 10.1109/TCDS.2024.3380907

Fuqiang Gu;Jiarui Dou;Mingyan Li;Xianlei Long;Songtao Guo;Chao Chen;Kai Liu;Xianlong Jiao;Ruiyuan Li

Data augmentation is an effective way to overcome the overfitting problem of deep learning models. However, most existing studies on data augmentation work on framelike data (e.g., images), and few tackles with event-based data. Event-based data are different from framelike data, rendering the augmentation techniques designed for framelike data unsuitable for event-based data. This work deals with data augmentation for event-based object classification and semantic segmentation, which is important for self-driving and robot manipulation. Specifically, we introduce EventAugment, a new method to augment asynchronous event-based data by automatically learning augmentation policies. We first identify 13 types of operations for augmenting event-based data. Next, we formulate the problem of finding optimal augmentation policies as a hyperparameter optimization problem. To tackle this problem, we propose a random search-based framework. Finally, we evaluate the proposed method on six public datasets including N-Caltech101, N-Cars, ST-MNIST, N-MNIST, DVSGesture, and DDD17. Experimental results demonstrate that EventAugment exhibits substantial performance improvements for both deep neural network-based and spiking neural network-based models, with gains of up to approximately 4%. Notably, EventAugment outperform state-of-the-art methods in terms of overall performance.

数据增强是克服深度学习模型过拟合问题的有效方法。然而，现有的数据增强研究大多针对框架类数据（如图像），很少涉及基于事件的数据。基于事件的数据不同于框架类数据，因此为框架类数据设计的增强技术不适合基于事件的数据。这项工作涉及基于事件的对象分类和语义分割的数据增强，这对自动驾驶和机器人操纵非常重要。具体来说，我们引入了 EventAugment，这是一种通过自动学习增强策略来增强基于事件的异步数据的新方法。我们首先确定了 13 种增强基于事件数据的操作。接下来，我们将寻找最佳增强策略的问题表述为一个超参数优化问题。为了解决这个问题，我们提出了一个基于随机搜索的框架。最后，我们在 N-Caltech101、N-Cars、ST-MNIST、N-MNIST、DVSGesture 和 DDD17 等六个公共数据集上评估了所提出的方法。实验结果表明，EventAugment 对基于深度神经网络的模型和基于尖峰神经网络的模型都有显著的性能提升，提升幅度高达约 4%。值得注意的是，EventAugment 在整体性能方面优于最先进的方法。

{"title":"EventAugment: Learning Augmentation Policies From Asynchronous Event-Based Data","authors":"Fuqiang Gu;Jiarui Dou;Mingyan Li;Xianlei Long;Songtao Guo;Chao Chen;Kai Liu;Xianlong Jiao;Ruiyuan Li","doi":"10.1109/TCDS.2024.3380907","DOIUrl":"10.1109/TCDS.2024.3380907","url":null,"abstract":"Data augmentation is an effective way to overcome the overfitting problem of deep learning models. However, most existing studies on data augmentation work on framelike data (e.g., images), and few tackles with event-based data. Event-based data are different from framelike data, rendering the augmentation techniques designed for framelike data unsuitable for event-based data. This work deals with data augmentation for event-based object classification and semantic segmentation, which is important for self-driving and robot manipulation. Specifically, we introduce EventAugment, a new method to augment asynchronous event-based data by automatically learning augmentation policies. We first identify 13 types of operations for augmenting event-based data. Next, we formulate the problem of finding optimal augmentation policies as a hyperparameter optimization problem. To tackle this problem, we propose a random search-based framework. Finally, we evaluate the proposed method on six public datasets including N-Caltech101, N-Cars, ST-MNIST, N-MNIST, DVSGesture, and DDD17. Experimental results demonstrate that EventAugment exhibits substantial performance improvements for both deep neural network-based and spiking neural network-based models, with gains of up to approximately 4%. Notably, EventAugment outperform state-of-the-art methods in terms of overall performance.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1521-1532"},"PeriodicalIF":5.0,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140199013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

EEG Decoding Based on Normalized Mutual Information for Motor Imagery Brain–Computer Interfaces 基于归一化互信息的脑电解码用于运动图像脑机接口

IF 5 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-20 DOI: 10.1109/TCDS.2024.3401717

Chao Tang;Dongyao Jiang;Lujuan Dang;Badong Chen

In current research, noninvasive brain–computer interfaces (BCIs) typically rely on electroencephalogram (EEG) signals to measure brain activity. Motor imagery EEG decoding is an important research field of BCIs. Although multichannel EEG signals provide higher resolution, they contain noise and redundant data unrelated to the task, which affect the performance of BCI systems. We investigate the interactions between EEG signals from dependence analysis to improve the classification accuracy. In this article, a novel channel selection method based on normalized mutual information (NMI) is first proposed to select the informative channels. Then, a histogram of oriented gradient is applied to feature extraction in the rearranged NMI matrices. Finally, a support vector machine with a radial basis function kernel is used for the classification of different motor imagery tasks. Four publicly available BCI datasets are employed to evaluate the effectiveness of the proposed method. The experimental results show that the proposed decoding scheme significantly improves classification accuracy and outperforms other competing methods.

在目前的研究中，无创脑机接口（bci）通常依靠脑电图（EEG）信号来测量大脑活动。运动意象脑电解码是脑机接口的一个重要研究领域。虽然多通道脑电信号具有较高的分辨率，但其中含有与任务无关的噪声和冗余数据，影响脑机接口系统的性能。通过相关性分析研究脑电信号之间的相互作用，提高分类精度。本文首次提出了一种基于归一化互信息（NMI）的信道选择方法来选择信息信道。然后，将定向梯度直方图应用于重排NMI矩阵的特征提取。最后，利用径向基函数核支持向量机对不同的运动图像任务进行分类。使用四个公开可用的BCI数据集来评估所提出方法的有效性。实验结果表明，所提出的解码方案显著提高了分类精度，优于其他竞争方法。

{"title":"EEG Decoding Based on Normalized Mutual Information for Motor Imagery Brain–Computer Interfaces","authors":"Chao Tang;Dongyao Jiang;Lujuan Dang;Badong Chen","doi":"10.1109/TCDS.2024.3401717","DOIUrl":"10.1109/TCDS.2024.3401717","url":null,"abstract":"In current research, noninvasive brain–computer interfaces (BCIs) typically rely on electroencephalogram (EEG) signals to measure brain activity. Motor imagery EEG decoding is an important research field of BCIs. Although multichannel EEG signals provide higher resolution, they contain noise and redundant data unrelated to the task, which affect the performance of BCI systems. We investigate the interactions between EEG signals from dependence analysis to improve the classification accuracy. In this article, a novel channel selection method based on normalized mutual information (NMI) is first proposed to select the informative channels. Then, a histogram of oriented gradient is applied to feature extraction in the rearranged NMI matrices. Finally, a support vector machine with a radial basis function kernel is used for the classification of different motor imagery tasks. Four publicly available BCI datasets are employed to evaluate the effectiveness of the proposed method. The experimental results show that the proposed decoding scheme significantly improves classification accuracy and outperforms other competing methods.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"1997-2007"},"PeriodicalIF":5.0,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141149933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Adaptive Multiview Graph Convolutional Network for 3-D Point Cloud Classification and Segmentation 用于三维点云分类和分割的自适应多视图图卷积网络

IF 5 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-20 DOI: 10.1109/TCDS.2024.3403900

Wanhao Niu;Haowen Wang;Chungang Zhuang

Point cloud classification and segmentation are crucial tasks for point cloud processing and have wide range of applications, such as autonomous driving and robot grasping. Some pioneering methods, including PointNet, VoxNet, and DGCNN, have made substantial advancements. However, most of these methods overlook the geometric relationships between points at large distances from different perspectives within the point cloud. This oversight constrains feature extraction capabilities and consequently limits any further improvements in classification and segmentation accuracy. To address this issue, we propose an adaptive multiview graph convolutional network (AM-GCN), which comprehensively synthesizes both the global geometric features of the point cloud and the local features within the projection planes of multiple views through an adaptive graph construction method. First, an adaptive rotation module in AM-GCN is proposed to predict a more favorable angle of view for projection. Then, a multilevel feature extraction network can flexibly be constructed by spatial-based or spectral-based graph convolution layers. Finally, AM-GCN is evaluated on ModelNet40 for classification, ShapeNetPart for part segmentation, ScanNetv2 and S3DIS for scene segmentation, which demonstrates the robustness of the AM-GCN with competitive performance compared with existing methods. It is worth noting that it performs state-of-the-art performance in many categories.

点云分类和分割是点云处理的关键任务，在自动驾驶和机器人抓取等领域有着广泛的应用。一些开创性的方法，包括PointNet、VoxNet和DGCNN，已经取得了实质性的进展。然而，这些方法大多忽略了点云内不同角度远距离点之间的几何关系。这种疏忽限制了特征提取能力，从而限制了分类和分割精度的进一步改进。为了解决这一问题，我们提出了一种自适应多视图图卷积网络（AM-GCN），该网络通过自适应图构建方法综合了点云的全局几何特征和多视图投影平面内的局部特征。首先，提出了AM-GCN中的自适应旋转模块来预测更有利的投影视角；然后，通过基于空间或基于频谱的图卷积层，可以灵活地构建多层特征提取网络。最后，在ModelNet40分类、ShapeNetPart零件分割、ScanNetv2和S3DIS场景分割上对AM-GCN进行了评估，结果表明，与现有方法相比，AM-GCN具有较强的鲁棒性和竞争力。值得注意的是，它在许多类别中表现出最先进的性能。

{"title":"Adaptive Multiview Graph Convolutional Network for 3-D Point Cloud Classification and Segmentation","authors":"Wanhao Niu;Haowen Wang;Chungang Zhuang","doi":"10.1109/TCDS.2024.3403900","DOIUrl":"10.1109/TCDS.2024.3403900","url":null,"abstract":"Point cloud classification and segmentation are crucial tasks for point cloud processing and have wide range of applications, such as autonomous driving and robot grasping. Some pioneering methods, including PointNet, VoxNet, and DGCNN, have made substantial advancements. However, most of these methods overlook the geometric relationships between points at large distances from different perspectives within the point cloud. This oversight constrains feature extraction capabilities and consequently limits any further improvements in classification and segmentation accuracy. To address this issue, we propose an adaptive multiview graph convolutional network (AM-GCN), which comprehensively synthesizes both the global geometric features of the point cloud and the local features within the projection planes of multiple views through an adaptive graph construction method. First, an adaptive rotation module in AM-GCN is proposed to predict a more favorable angle of view for projection. Then, a multilevel feature extraction network can flexibly be constructed by spatial-based or spectral-based graph convolution layers. Finally, AM-GCN is evaluated on ModelNet40 for classification, ShapeNetPart for part segmentation, ScanNetv2 and S3DIS for scene segmentation, which demonstrates the robustness of the AM-GCN with competitive performance compared with existing methods. It is worth noting that it performs state-of-the-art performance in many categories.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"2043-2054"},"PeriodicalIF":5.0,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141149825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Emergence of Human Oculomotor Behavior in a Cable-Driven Biomimetic Robotic Eye Using Optimal Control 利用优化控制在线缆驱动的仿生机器人眼球中出现人类眼球运动行为

IF 5 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-18 DOI: 10.1109/TCDS.2024.3376072

Reza Javanmard Alitappeh;Akhil John;Bernardo Dias;A. John van Opstal;Alexandre Bernardino

This article explores the application of model-based optimal control principles in understanding stereotyped human oculomotor behaviors. Using a realistic model of the human eye with a six-muscle cable-driven actuation system, we tackle the novel challenges of addressing a system with six degrees of freedom. We apply nonlinear optimal control techniques to optimize accuracy, energy, and duration of eye-movement trajectories. Employing a recurrent neural network to emulate system dynamics, we focus on generating rapid, unconstrained saccadic eye-movements. Remarkably, our model replicates realistic 3-D rotational kinematics and dynamics observed in human saccades, with the six cables organizing themselves into appropriate antagonistic muscle pairs, resembling the primate oculomotor system.

本文探讨了基于模型的最优控制原理在理解人类刻板眼球运动行为中的应用。我们使用一个具有六肌肉拉索驱动执行系统的现实人眼模型，解决了处理具有六个自由度的系统所面临的新挑战。我们应用非线性优化控制技术来优化眼球运动轨迹的精度、能量和持续时间。我们采用递归神经网络来模拟系统动力学，重点是产生快速、无约束的眼球运动。值得注意的是，我们的模型复制了在人类眼球运动中观察到的真实三维旋转运动学和动力学，六条缆线组织成适当的拮抗肌肉对，类似于灵长类动物的眼球运动系统。

引用次数: 0

The Inadequacy of Reinforcement Learning From Human Feedback—Radicalizing Large Language Models via Semantic Vulnerabilities 从人类反馈中强化学习的不足--通过语义漏洞激化大型语言模型

IF 5 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-18 DOI: 10.1109/TCDS.2024.3377445

Timothy R. McIntosh;Teo Susnjak;Tong Liu;Paul Watters;Malka N. Halgamuge

This study is an empirical investigation into the semantic vulnerabilities of four popular pretrained commercial large language models (LLMs) to ideological manipulation. Using tactics reminiscent of human semantic conditioning in psychology, we have induced and assessed ideological misalignments and their retention in four commercial pretrained LLMs, in response to 30 controversial questions that spanned a broad ideological and social spectrum, encompassing both extreme left- and right-wing viewpoints. Such semantic vulnerabilities arise due to fundamental limitations in LLMs’ capability to comprehend detailed linguistic variations, making them susceptible to ideological manipulation through targeted semantic exploits. We observed reinforcement learning from human feedback (RLHF) in effect to LLM initial answers, but highlighted the limitations of RLHF in two aspects: 1) its inability to fully mitigate the impact of ideological conditioning prompts, leading to partial alleviation of LLM semantic vulnerabilities; and 2) its inadequacy in representing a diverse set of “human values,” often reflecting the predefined values of certain groups controlling the LLMs. Our findings have provided empirical evidence of semantic vulnerabilities inherent in current LLMs, challenged both the robustness and the adequacy of RLHF as a mainstream method for aligning LLMs with human values, and underscored the need for a multidisciplinary approach in developing ethical and resilient artificial intelligence (AI).

本研究是对四种流行的预训练商业大语言模型（LLM）在意识形态操纵下的语义脆弱性进行的实证调查。我们采用与心理学中的人类语义条件反射类似的策略，诱导并评估了四种商业预训练大语言模型在回答 30 个有争议的问题时的意识形态错位及其保留情况，这些问题涉及广泛的意识形态和社会范围，包括极左和极右观点。这种语义漏洞的产生是由于 LLMs 理解语言细节变化的能力存在根本性的限制，这使得它们很容易被有针对性的语义利用来进行意识形态操纵。我们观察到从人类反馈中强化学习（RLHF）对 LLM 初始答案的影响，但强调了 RLHF 在两个方面的局限性：1）它无法完全缓解意识形态条件提示的影响，导致部分减轻了 LLM 的语义漏洞；以及 2）它无法充分体现多样化的 "人类价值观"，往往反映的是控制 LLM 的某些群体的预定义价值观。我们的研究结果提供了当前 LLM 固有语义漏洞的实证证据，对 RLHF 作为使 LLM 符合人类价值观的主流方法的稳健性和适当性提出了质疑，并强调了在开发符合伦理和具有弹性的人工智能（AI）时采用多学科方法的必要性。

{"title":"The Inadequacy of Reinforcement Learning From Human Feedback—Radicalizing Large Language Models via Semantic Vulnerabilities","authors":"Timothy R. McIntosh;Teo Susnjak;Tong Liu;Paul Watters;Malka N. Halgamuge","doi":"10.1109/TCDS.2024.3377445","DOIUrl":"10.1109/TCDS.2024.3377445","url":null,"abstract":"This study is an empirical investigation into the semantic vulnerabilities of four popular pretrained commercial large language models (LLMs) to ideological manipulation. Using tactics reminiscent of human semantic conditioning in psychology, we have induced and assessed ideological misalignments and their retention in four commercial pretrained LLMs, in response to 30 controversial questions that spanned a broad ideological and social spectrum, encompassing both extreme left- and right-wing viewpoints. Such semantic vulnerabilities arise due to fundamental limitations in LLMs’ capability to comprehend detailed linguistic variations, making them susceptible to ideological manipulation through targeted semantic exploits. We observed reinforcement learning from human feedback (RLHF) in effect to LLM initial answers, but highlighted the limitations of RLHF in two aspects: 1) its inability to fully mitigate the impact of ideological conditioning prompts, leading to partial alleviation of LLM semantic vulnerabilities; and 2) its inadequacy in representing a diverse set of “human values,” often reflecting the predefined values of certain groups controlling the LLMs. Our findings have provided empirical evidence of semantic vulnerabilities inherent in current LLMs, challenged both the robustness and the adequacy of RLHF as a mainstream method for aligning LLMs with human values, and underscored the need for a multidisciplinary approach in developing ethical and resilient artificial intelligence (AI).","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1561-1574"},"PeriodicalIF":5.0,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140166472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Two-Stage Foveal Vision Tracker Based on Transformer Model 基于变压器模型的两级眼窝视觉跟踪器

IF 5 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-18 DOI: 10.1109/TCDS.2024.3377642

Guang Han;Jianshu Ma;Ziyang Li;Haitao Zhao

With the development of transformer visual models, attention-based trackers have shown highly competitive performance in the field of object tracking. However, in some tracking scenarios, especially those with multiple similar objects, the performance of existing trackers is often not satisfactory. In order to improve the performance of trackers in such scenarios, inspired by the fovea vision structure and its visual characteristics, this article proposes a novel foveal vision tracker (FVT). FVT combines the process of human eye fixation and object tracking, pruning based on the distance to the object rather than attention scores. This pruning method allows the receptive field of the feature extraction network to focus on the object, excluding background interference. FVT divides the feature extraction network into two stages: local and global, and introduces the local recursive module (LRM) and the view elimination module (VEM). LRM is used to enhance foreground features in the local stage, while VEM generates circular fovea-like visual field masks in the global stage and prunes tokens outside the mask, guiding the model to focus attention on high-information regions of the object. Experimental results on multiple object tracking datasets demonstrate that the proposed FVT achieves stronger object discrimination capability in the feature extraction stage, improves tracking accuracy and robustness in complex scenes, and achieves a significant accuracy improvement with an area overlap (AO) of 72.6% on the generic object tracking (GOT)-10k dataset.

随着变换器视觉模型的发展，基于注意力的跟踪器在物体跟踪领域显示出了极具竞争力的性能。然而，在某些跟踪场景中，特别是在有多个相似物体的场景中，现有跟踪器的性能往往不能令人满意。为了提高跟踪器在这类场景中的性能，本文受眼窝视觉结构及其视觉特性的启发，提出了一种新型眼窝视觉跟踪器（FVT）。FVT 结合了人眼固定和物体跟踪的过程，根据到物体的距离而不是注意力分数进行剪枝。这种剪枝方法能让特征提取网络的感受野聚焦于目标，排除背景干扰。FVT 将特征提取网络分为两个阶段：局部和全局，并引入了局部递归模块（LRM）和视图消除模块（VEM）。局部递归模块用于增强局部阶段的前景特征，而全局递归模块则在全局阶段生成类似眼窝的圆形视场遮罩，并剪除遮罩外的标记，引导模型将注意力集中在物体的高信息区域。在多个物体跟踪数据集上的实验结果表明，所提出的 FVT 在特征提取阶段实现了更强的物体识别能力，提高了复杂场景中的跟踪精度和鲁棒性，并在通用物体跟踪（GOT）-10k 数据集上实现了显著的精度提高，区域重叠率（AO）达到 72.6%。

{"title":"A Two-Stage Foveal Vision Tracker Based on Transformer Model","authors":"Guang Han;Jianshu Ma;Ziyang Li;Haitao Zhao","doi":"10.1109/TCDS.2024.3377642","DOIUrl":"10.1109/TCDS.2024.3377642","url":null,"abstract":"With the development of transformer visual models, attention-based trackers have shown highly competitive performance in the field of object tracking. However, in some tracking scenarios, especially those with multiple similar objects, the performance of existing trackers is often not satisfactory. In order to improve the performance of trackers in such scenarios, inspired by the fovea vision structure and its visual characteristics, this article proposes a novel foveal vision tracker (FVT). FVT combines the process of human eye fixation and object tracking, pruning based on the distance to the object rather than attention scores. This pruning method allows the receptive field of the feature extraction network to focus on the object, excluding background interference. FVT divides the feature extraction network into two stages: local and global, and introduces the local recursive module (LRM) and the view elimination module (VEM). LRM is used to enhance foreground features in the local stage, while VEM generates circular fovea-like visual field masks in the global stage and prunes tokens outside the mask, guiding the model to focus attention on high-information regions of the object. Experimental results on multiple object tracking datasets demonstrate that the proposed FVT achieves stronger object discrimination capability in the feature extraction stage, improves tracking accuracy and robustness in complex scenes, and achieves a significant accuracy improvement with an area overlap (AO) of 72.6% on the generic object tracking (GOT)-10k dataset.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1575-1588"},"PeriodicalIF":5.0,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140166867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Converting Artificial Neural Networks to Ultralow-Latency Spiking Neural Networks for Action Recognition 将人工神经网络转换为超低延迟尖峰神经网络以进行动作识别

IF 5 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-14 DOI: 10.1109/TCDS.2024.3375620

Hong You;Xian Zhong;Wenxuan Liu;Qi Wei;Wenxin Huang;Zhaofei Yu;Tiejun Huang

Spiking neural networks (SNNs) have garnered significant attention for their potential in ultralow-power event-driven neuromorphic hardware implementations. One effective strategy for obtaining SNNs involves the conversion of artificial neural networks (ANNs) to SNNs. However, existing research on ANN–SNN conversion has predominantly focused on image classification task, leaving the exploration of action recognition task limited. In this article, we investigate the performance degradation of SNNs on action recognition task. Through in-depth analysis, we propose a framework called scalable dual threshold mapping (SDM) that effectively overcomes three types of conversion errors. By effectively mitigating these conversion errors, we are able to reduce the time required for the spike firing rate of SNNs to align with the activation values of ANNs. Consequently, our method enables the generation of accurate and ultralow-latency SNNs. We conduct extensive evaluations on multiple action recognition datasets, including University of Central Florida (UCF)-101 and Human Motion DataBase (HMDB)-51. Through rigorous experiments and analysis, we demonstrate the effectiveness of our approach. Notably, SDM achieves a remarkable Top-1 accuracy of 92.94% on UCF-101 while requiring ultralow latency (four time steps), highlighting its high performance with reduced computational requirements.

尖峰神经网络（SNN）因其在超低功耗事件驱动神经形态硬件实现方面的潜力而备受关注。获得尖峰神经网络的有效策略之一是将人工神经网络（ANN）转换为尖峰神经网络。然而，现有的 ANN-SNN 转换研究主要集中在图像分类任务上，对动作识别任务的探索十分有限。本文研究了 SNN 在动作识别任务中的性能退化问题。通过深入分析，我们提出了一种称为可扩展双阈值映射（SDM）的框架，它能有效克服三种类型的转换误差。通过有效缓解这些转换误差，我们能够缩短 SNNs 的尖峰发射率与 ANNs 的激活值保持一致所需的时间。因此，我们的方法能够生成准确且超低延迟的 SNN。我们在多个动作识别数据集上进行了广泛的评估，包括中佛罗里达大学（UCF）-101 和人类运动数据库（HMDB）-51。通过严格的实验和分析，我们证明了我们方法的有效性。值得注意的是，在 UCF-101 数据集上，SDM 的 Top-1 准确率高达 92.94%，而延迟时间却极短（4 个时间步），这凸显了该方法在降低计算要求的同时还具有很高的性能。

{"title":"Converting Artificial Neural Networks to Ultralow-Latency Spiking Neural Networks for Action Recognition","authors":"Hong You;Xian Zhong;Wenxuan Liu;Qi Wei;Wenxin Huang;Zhaofei Yu;Tiejun Huang","doi":"10.1109/TCDS.2024.3375620","DOIUrl":"10.1109/TCDS.2024.3375620","url":null,"abstract":"Spiking neural networks (SNNs) have garnered significant attention for their potential in ultralow-power event-driven neuromorphic hardware implementations. One effective strategy for obtaining SNNs involves the conversion of artificial neural networks (ANNs) to SNNs. However, existing research on ANN–SNN conversion has predominantly focused on image classification task, leaving the exploration of action recognition task limited. In this article, we investigate the performance degradation of SNNs on action recognition task. Through in-depth analysis, we propose a framework called scalable dual threshold mapping (SDM) that effectively overcomes three types of conversion errors. By effectively mitigating these conversion errors, we are able to reduce the time required for the spike firing rate of SNNs to align with the activation values of ANNs. Consequently, our method enables the generation of accurate and ultralow-latency SNNs. We conduct extensive evaluations on multiple action recognition datasets, including University of Central Florida (UCF)-101 and Human Motion DataBase (HMDB)-51. Through rigorous experiments and analysis, we demonstrate the effectiveness of our approach. Notably, SDM achieves a remarkable Top-1 accuracy of 92.94% on UCF-101 while requiring ultralow latency (four time steps), highlighting its high performance with reduced computational requirements.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1533-1545"},"PeriodicalIF":5.0,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140153797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

EEG-Based Auditory Attention Detection With Spiking Graph Convolutional Network 利用尖峰图卷积网络进行基于脑电图的听觉注意力检测

IF 5 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-12 DOI: 10.1109/TCDS.2024.3376433

Siqi Cai;Ran Zhang;Malu Zhang;Jibin Wu;Haizhou Li

Decoding auditory attention from brain activities, such as electroencephalography (EEG), sheds light on solving the machine cocktail party problem. However, effective representation of EEG signals remains a challenge. One of the reasons is that the current feature extraction techniques have not fully exploited the spatial information along the EEG signals. EEG signals reflect the collective dynamics of brain activities across different regions. The intricate interactions among these channels, rather than individual EEG channels alone, reflect the distinctive features of brain activities. In this study, we propose a spiking graph convolutional network (SGCN), which captures the spatial features of multichannel EEG in a biologically plausible manner. Comprehensive experiments were conducted on two publicly available datasets. Results demonstrate that the proposed SGCN achieves competitive auditory attention detection (AAD) performance in low-latency and low-density EEG settings. As it features low power consumption, the SGCN has the potential for practical implementation in intelligent hearing aids and other brain–computer interfaces (BCIs).

从脑电图（EEG）等大脑活动中解码听觉注意力，有助于解决机器鸡尾酒会问题。然而，脑电信号的有效表示仍然是一项挑战。原因之一是目前的特征提取技术没有充分利用脑电信号的空间信息。脑电信号反映了不同区域大脑活动的集体动态。这些信道之间错综复杂的相互作用，而不是单个脑电图信道，反映了大脑活动的显著特征。在本研究中，我们提出了一种尖峰图卷积网络（SGCN），它能以一种生物学上合理的方式捕捉多通道脑电图的空间特征。我们在两个公开数据集上进行了综合实验。结果表明，所提出的 SGCN 在低延迟和低密度脑电图设置下实现了具有竞争力的听觉注意力检测（AAD）性能。由于 SGCN 功耗低，因此有望在智能助听器和其他脑机接口（BCI）中得到实际应用。

{"title":"EEG-Based Auditory Attention Detection With Spiking Graph Convolutional Network","authors":"Siqi Cai;Ran Zhang;Malu Zhang;Jibin Wu;Haizhou Li","doi":"10.1109/TCDS.2024.3376433","DOIUrl":"10.1109/TCDS.2024.3376433","url":null,"abstract":"Decoding auditory attention from brain activities, such as electroencephalography (EEG), sheds light on solving the machine cocktail party problem. However, effective representation of EEG signals remains a challenge. One of the reasons is that the current feature extraction techniques have not fully exploited the spatial information along the EEG signals. EEG signals reflect the collective dynamics of brain activities across different regions. The intricate interactions among these channels, rather than individual EEG channels alone, reflect the distinctive features of brain activities. In this study, we propose a spiking graph convolutional network (SGCN), which captures the spatial features of multichannel EEG in a biologically plausible manner. Comprehensive experiments were conducted on two publicly available datasets. Results demonstrate that the proposed SGCN achieves competitive auditory attention detection (AAD) performance in low-latency and low-density EEG settings. As it features low power consumption, the SGCN has the potential for practical implementation in intelligent hearing aids and other brain–computer interfaces (BCIs).","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 5","pages":"1698-1706"},"PeriodicalIF":5.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140115810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0