Pub Date: 2024-03-26 | DOI: 10.1109/TCDS.2024.3382109
Jiahui Pan;Yangzuyi Yu;Jianhui Wu;Xinjie Zhou;Yanbin He;Yuanqing Li
Disorders of consciousness (DOC) are often accompanied by serious changes in sleep structure. This article presents a sleep evaluation algorithm that scores the sleep structure of DOC patients to assist in assessing their level of consciousness. The algorithm consists of two parts: 1) an automatic sleep staging model, in which convolutional neural networks (CNNs) extract signal features from the electroencephalogram (EEG) and electrooculogram (EOG), and a bidirectional long short-term memory (Bi-LSTM) network with an attention mechanism learns sequential information; and 2) consciousness assessment, in which the automated sleep staging results are used to extract consciousness-related sleep features that a support vector machine (SVM) classifier uses to assess consciousness. In this study, the CNN-BiLSTM attention sleep network (CBASleepNet) was trained and evaluated on the Sleep-EDF and MASS datasets. The experimental results demonstrated the effectiveness of the proposed model, which outperformed similar models. Moreover, CBASleepNet was applied to sleep staging in DOC patients through transfer learning and fine-tuning. Consciousness assessments were conducted on seven minimally conscious state (MCS) patients and four vegetative state (VS)/unresponsive wakefulness syndrome (UWS) patients, achieving an overall accuracy of 81.8%. The sleep evaluation algorithm can thus be used to evaluate patients' level of consciousness effectively.
Title: Deep Neural Networks for Automatic Sleep Stage Classification and Consciousness Assessment in Patients With Disorder of Consciousness (IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 4, pp. 1589-1603)
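The staging pipeline described above (CNN feature extraction per epoch, then a Bi-LSTM with attention over the sequence) can be sketched in PyTorch as follows. This is a minimal illustration assuming 30-s two-channel (EEG+EOG) epochs sampled at 100 Hz, five sleep stages, and an additive attention layer; it is not the authors' exact CBASleepNet configuration.

```python
# Minimal sketch (assumed layer sizes and epoch format; not the paper's exact model).
import torch
import torch.nn as nn

class SleepStager(nn.Module):
    def __init__(self, in_channels=2, n_stages=5, hidden=64):
        super().__init__()
        # CNN feature extractor over raw EEG+EOG epochs
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=50, stride=6), nn.ReLU(),
            nn.MaxPool1d(8),
            nn.Conv1d(32, 64, kernel_size=8), nn.ReLU(),
            nn.AdaptiveAvgPool1d(32),
        )
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)    # additive attention over time steps
        self.head = nn.Linear(2 * hidden, n_stages)

    def forward(self, x):                       # x: (batch, channels, samples)
        f = self.cnn(x).transpose(1, 2)         # (batch, time, features)
        h, _ = self.bilstm(f)                   # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights per time step
        ctx = (w * h).sum(dim=1)                # weighted context vector
        return self.head(ctx)                   # sleep-stage logits

logits = SleepStager()(torch.randn(4, 2, 3000))  # 4 epochs, EEG+EOG, 30 s @ 100 Hz
print(logits.shape)  # torch.Size([4, 5])
```

The attention weights let the classifier emphasize the most stage-discriminative segments of each epoch; the downstream SVM-based consciousness assessment stage is omitted here.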
Pub Date: 2024-03-22 | DOI: 10.1109/TCDS.2024.3404061
Shijie Wang;Zhiqiang Pu;Yi Pan;Boyin Liu;Hao Ma;Jianqiang Yi
A highly competitive and confrontational football match is full of strategic and tactical challenges, so players' cognition of their opponents' strategies and tactics is crucial. However, the complexity of a match means that opponents' intentions change frequently. Under these circumstances, discriminating and predicting the opponents' intended actions and tactics is an important problem for football players' decision-making. Considering that opponents' cognitive processes involve both deliberative and reactive components, a long-term and short-term opponent intention inference (LS-OII) method for football multiplayer policy learning is proposed. First, to capture the opponents' deliberative process, we design an opponent tactics deduction module that infers their long-term tactical intentions from a macro perspective. Furthermore, an opponent decision prediction module is designed to infer the opponents' short-term decisions, which often have rapid and direct impacts on the match. Additionally, an opponent-driven incentive module enhances the players' causal awareness of the opponents' intentions, further improving the players' exploration capabilities and helping them obtain outstanding policies. Representative results demonstrate that the LS-OII method significantly enhances the efficacy of players' strategies in the Google Research Football environment, affirming the superiority of our method.
Title: Long-Term and Short-Term Opponent Intention Inference for Football Multiplayer Policy Learning (IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 6, pp. 2055-2069)
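As a rough picture of the short-term half of this design, the sketch below predicts an opponent's next action from a short observation history and converts prediction error into an opponent-driven incentive. The observation size, the 19-action space, and the surprise-as-reward rule are illustrative assumptions, not the paper's LS-OII modules.

```python
# Hypothetical short-term opponent decision predictor (all sizes invented).
import torch
import torch.nn as nn

class OpponentDecisionPredictor(nn.Module):
    def __init__(self, obs_dim=43, n_actions=19, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)  # next-action logits

    def forward(self, obs_seq):            # obs_seq: (batch, time, obs_dim)
        _, h = self.encoder(obs_seq)       # final hidden state summarizes history
        return self.head(h.squeeze(0))     # (batch, n_actions)

pred = OpponentDecisionPredictor()
logits = pred(torch.randn(8, 10, 43))      # 8 rollouts, 10-step histories
true_actions = torch.randint(0, 19, (8,))  # observed opponent actions
# Cross-entropy as surprise; higher surprise -> larger exploration incentive.
incentive = nn.functional.cross_entropy(logits, true_actions, reduction="none")
print(incentive.shape)  # torch.Size([8])
```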
Pub Date: 2024-03-22 | DOI: 10.1109/TCDS.2024.3380907
Fuqiang Gu;Jiarui Dou;Mingyan Li;Xianlei Long;Songtao Guo;Chao Chen;Kai Liu;Xianlong Jiao;Ruiyuan Li
Data augmentation is an effective way to overcome the overfitting problem of deep learning models. However, most existing studies on data augmentation address framelike data (e.g., images), and few tackle event-based data. Event-based data differ from framelike data, rendering augmentation techniques designed for the latter unsuitable. This work deals with data augmentation for event-based object classification and semantic segmentation, which is important for self-driving and robot manipulation. Specifically, we introduce EventAugment, a new method that augments asynchronous event-based data by automatically learning augmentation policies. We first identify 13 types of operations for augmenting event-based data. Next, we formulate the problem of finding optimal augmentation policies as a hyperparameter optimization problem. To tackle this problem, we propose a random search-based framework. Finally, we evaluate the proposed method on six public datasets: N-Caltech101, N-Cars, ST-MNIST, N-MNIST, DVSGesture, and DDD17. Experimental results demonstrate that EventAugment yields substantial performance improvements for both deep neural network-based and spiking neural network-based models, with gains of up to approximately 4%. Notably, EventAugment outperforms state-of-the-art methods in overall performance.
Title: EventAugment: Learning Augmentation Policies From Asynchronous Event-Based Data (IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 4, pp. 1521-1532)
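The policy-search loop can be pictured as: define operations over (x, y, t, polarity) event tuples, sample candidate policies, and keep the one with the best validation score. The toy sketch below uses three invented operations and a placeholder evaluator, standing in for the paper's 13 operations and its hyperparameter-optimization formulation.

```python
# Toy random search over event-data augmentation policies (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def shift_xy(ev, dx, dy):              # ev: (N, 4) array of (x, y, t, polarity)
    out = ev.copy(); out[:, 0] += dx; out[:, 1] += dy; return out

def scale_time(ev, s):
    out = ev.copy(); out[:, 2] *= s; return out

def drop_events(ev, p):
    keep = rng.random(len(ev)) > p     # randomly discard a fraction of events
    return ev[keep]

OPS = {"shift_xy": lambda ev, m: shift_xy(ev, m, m),
       "scale_time": lambda ev, m: scale_time(ev, 1.0 + m / 10.0),
       "drop": lambda ev, m: drop_events(ev, m / 20.0)}

def sample_policy(n_ops=2):
    # a policy = a sequence of (operation, magnitude) pairs
    return [(rng.choice(list(OPS)), rng.integers(1, 10)) for _ in range(n_ops)]

def apply_policy(ev, policy):
    for name, mag in policy:
        ev = OPS[name](ev, mag)
    return ev

def validation_score(policy):          # placeholder: train and evaluate a model
    return rng.random()

best = max((sample_policy() for _ in range(50)), key=validation_score)
ev = rng.random((1000, 4)) * np.array([128, 128, 1e6, 1])  # toy event stream
print(best, apply_policy(ev, best).shape)
```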
Pub Date: 2024-03-20 | DOI: 10.1109/TCDS.2024.3401717
Chao Tang;Dongyao Jiang;Lujuan Dang;Badong Chen
In current research, noninvasive brain–computer interfaces (BCIs) typically rely on electroencephalogram (EEG) signals to measure brain activity. Motor imagery EEG decoding is an important research field of BCIs. Although multichannel EEG signals provide higher resolution, they contain noise and redundant data unrelated to the task, which degrade the performance of BCI systems. We investigate the interactions between EEG signals through dependence analysis to improve classification accuracy. In this article, a novel channel selection method based on normalized mutual information (NMI) is first proposed to select the informative channels. Then, a histogram of oriented gradients is applied to extract features from the rearranged NMI matrices. Finally, a support vector machine with a radial basis function kernel classifies the different motor imagery tasks. Four publicly available BCI datasets are employed to evaluate the effectiveness of the proposed method. The experimental results show that the proposed decoding scheme significantly improves classification accuracy and outperforms other competing methods.
Title: EEG Decoding Based on Normalized Mutual Information for Motor Imagery Brain–Computer Interfaces (IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 6, pp. 1997-2007)
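A bare-bones version of NMI-based channel ranking might discretize each channel, fill a pairwise NMI matrix, and keep the channels sharing the most information with the rest, as sketched below. The histogram estimator, bin count, and top-8 selection rule are assumptions, not the paper's exact procedure.

```python
# Rough NMI channel-ranking sketch (assumed estimator and selection rule).
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def discretize(x, bins=16):
    # bin a continuous channel so NMI can be estimated from label counts
    return np.digitize(x, np.histogram_bin_edges(x, bins))

def nmi_matrix(eeg):                        # eeg: (channels, samples)
    d = [discretize(ch) for ch in eeg]
    n = len(d)
    m = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            m[i, j] = m[j, i] = normalized_mutual_info_score(d[i], d[j])
    return m

eeg = np.random.randn(22, 1000)             # e.g., a 22-channel motor imagery trial
M = nmi_matrix(eeg)
scores = M.sum(axis=1)                      # channels sharing the most information
selected = np.argsort(scores)[::-1][:8]     # keep the top-8 informative channels
print(selected)
```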
Pub Date: 2024-03-20 | DOI: 10.1109/TCDS.2024.3403900
Wanhao Niu;Haowen Wang;Chungang Zhuang
Point cloud classification and segmentation are crucial tasks for point cloud processing and have a wide range of applications, such as autonomous driving and robot grasping. Pioneering methods, including PointNet, VoxNet, and DGCNN, have made substantial advancements. However, most of these methods overlook the geometric relationships between points at large distances, viewed from different perspectives within the point cloud. This oversight constrains feature extraction capabilities and consequently limits further improvements in classification and segmentation accuracy. To address this issue, we propose an adaptive multiview graph convolutional network (AM-GCN), which synthesizes both the global geometric features of the point cloud and the local features within the projection planes of multiple views through an adaptive graph construction method. First, an adaptive rotation module in AM-GCN predicts a more favorable angle of view for projection. Then, a multilevel feature extraction network can be flexibly constructed from spatial-based or spectral-based graph convolution layers. Finally, AM-GCN is evaluated on ModelNet40 for classification, ShapeNetPart for part segmentation, and ScanNetv2 and S3DIS for scene segmentation, demonstrating its robustness and competitive performance compared with existing methods. Notably, it achieves state-of-the-art performance in many categories.
Title: Adaptive Multiview Graph Convolutional Network for 3-D Point Cloud Classification and Segmentation (IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 6, pp. 2043-2054)
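For intuition about the spatial-graph branch, the sketch below builds a plain k-NN graph over a point cloud and applies one graph-convolution step that concatenates each point's feature with the mean of its neighbors. AM-GCN's adaptive multiview construction and rotation module are considerably richer than this toy step.

```python
# Toy k-NN graph construction plus one spatial graph-convolution step.
import torch

def knn_graph(points, k=8):                 # points: (N, 3)
    dist = torch.cdist(points, points)      # pairwise Euclidean distances
    return dist.topk(k + 1, largest=False).indices[:, 1:]  # drop self-loops

def graph_conv(feats, neighbors, weight):   # feats: (N, F), neighbors: (N, k)
    agg = feats[neighbors].mean(dim=1)      # average neighbor features
    return torch.relu(torch.cat([feats, agg], dim=-1) @ weight)

pts = torch.randn(1024, 3)
nbrs = knn_graph(pts)
W = torch.randn(6, 32) * 0.1                # (2*F_in, F_out) with F_in=3 coords
out = graph_conv(pts, nbrs, W)
print(out.shape)                            # torch.Size([1024, 32])
```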
Pub Date: 2024-03-18 | DOI: 10.1109/TCDS.2024.3376072
Reza Javanmard Alitappeh;Akhil John;Bernardo Dias;A. John van Opstal;Alexandre Bernardino
This article explores the application of model-based optimal control principles to understanding stereotyped human oculomotor behaviors. Using a realistic model of the human eye with a six-muscle cable-driven actuation system, we tackle the novel challenges of a system with six degrees of freedom. We apply nonlinear optimal control techniques to optimize the accuracy, energy, and duration of eye-movement trajectories. Employing a recurrent neural network to emulate the system dynamics, we focus on generating rapid, unconstrained saccadic eye movements. Remarkably, our model replicates the realistic 3-D rotational kinematics and dynamics observed in human saccades, with the six cables organizing themselves into appropriate antagonistic muscle pairs, resembling the primate oculomotor system.
Title: Emergence of Human Oculomotor Behavior in a Cable-Driven Biomimetic Robotic Eye Using Optimal Control (IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 4, pp. 1546-1560). Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474482
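Work in this vein typically minimizes a cost trading off terminal accuracy, motor effort, and movement duration. The toy evaluation below uses a crude first-order stand-in for the eye plant; the dynamics, weights, and horizon are invented for illustration and are not the article's model.

```python
# Illustrative saccade cost: accuracy error + integrated effort + duration.
import numpy as np

def saccade_cost(u, x0, x_target, dt=0.001, alpha=1e-3, beta=0.5):
    """u: (T, 6) cable commands; a first-order plant stands in for the eye."""
    x = x0.copy()
    for ut in u:                       # crude forward simulation of the plant
        x = x + dt * (-x + ut[:3])     # hypothetical 3-D rotational state update
    accuracy = np.sum((x - x_target) ** 2)   # terminal gaze error
    effort = alpha * np.sum(u ** 2) * dt     # integrated motor effort
    duration = beta * len(u) * dt            # movement-time penalty
    return accuracy + effort + duration

u = np.zeros((250, 6))                 # a 250-ms command sequence for six cables
print(saccade_cost(u, np.zeros(3), np.array([0.2, 0.0, 0.0])))
```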
Pub Date: 2024-03-18 | DOI: 10.1109/TCDS.2024.3377445
Timothy R. McIntosh;Teo Susnjak;Tong Liu;Paul Watters;Malka N. Halgamuge
This study is an empirical investigation into the semantic vulnerabilities of four popular pretrained commercial large language models (LLMs) to ideological manipulation. Using tactics reminiscent of human semantic conditioning in psychology, we induced and assessed ideological misalignments, and their retention, in four commercial pretrained LLMs, in response to 30 controversial questions spanning a broad ideological and social spectrum, encompassing both extreme left- and right-wing viewpoints. Such semantic vulnerabilities arise from fundamental limitations in LLMs' capability to comprehend detailed linguistic variations, making them susceptible to ideological manipulation through targeted semantic exploits. We observed the effect of reinforcement learning from human feedback (RLHF) on the LLMs' initial answers but highlighted two limitations of RLHF: 1) its inability to fully mitigate the impact of ideological conditioning prompts, which leads to only partial alleviation of LLM semantic vulnerabilities; and 2) its inadequacy in representing a diverse set of "human values," often reflecting the predefined values of certain groups controlling the LLMs. Our findings provide empirical evidence of semantic vulnerabilities inherent in current LLMs, challenge both the robustness and the adequacy of RLHF as a mainstream method for aligning LLMs with human values, and underscore the need for a multidisciplinary approach to developing ethical and resilient artificial intelligence (AI).
Title: The Inadequacy of Reinforcement Learning From Human Feedback—Radicalizing Large Language Models via Semantic Vulnerabilities (IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 4, pp. 1561-1574)
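Schematically, the probing protocol amounts to scoring a baseline answer, scoring the answer after an ideological conditioning prompt, and re-asking to test retention. The stubs below only frame that loop; query_llm and stance_score are hypothetical placeholders, not the authors' instruments.

```python
# Schematic probe loop; both stubs are hypothetical placeholders.
from typing import Tuple

def query_llm(prompt: str) -> str:
    raise NotImplementedError   # call the commercial LLM under test here

def stance_score(answer: str) -> float:
    raise NotImplementedError   # e.g., rate the answer on a left-right scale

def conditioning_effect(question: str, conditioning: str) -> Tuple[float, float]:
    baseline = stance_score(query_llm(question))
    conditioned = stance_score(query_llm(conditioning + "\n" + question))
    retained = stance_score(query_llm(question))   # re-ask to test retention
    return conditioned - baseline, retained - baseline
```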
Pub Date: 2024-03-18 | DOI: 10.1109/TCDS.2024.3377642
Guang Han;Jianshu Ma;Ziyang Li;Haitao Zhao
With the development of transformer visual models, attention-based trackers have shown highly competitive performance in the field of object tracking. However, in some tracking scenarios, especially those with multiple similar objects, the performance of existing trackers is often unsatisfactory. To improve performance in such scenarios, and inspired by the structure and visual characteristics of the fovea, this article proposes a novel foveal vision tracker (FVT). FVT combines the process of human eye fixation with object tracking, pruning tokens based on their distance to the object rather than on attention scores. This pruning method lets the receptive field of the feature extraction network focus on the object, excluding background interference. FVT divides the feature extraction network into two stages, local and global, and introduces the local recursive module (LRM) and the view elimination module (VEM). The LRM enhances foreground features in the local stage, while the VEM generates circular fovea-like visual field masks in the global stage and prunes tokens outside the mask, guiding the model to focus attention on high-information regions of the object. Experimental results on multiple object tracking datasets demonstrate that the proposed FVT achieves stronger object discrimination in the feature extraction stage, improves tracking accuracy and robustness in complex scenes, and achieves a significant accuracy improvement, with an area overlap (AO) of 72.6% on the generic object tracking (GOT)-10k dataset.
Title: A Two-Stage Foveal Vision Tracker Based on Transformer Model (IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 4, pp. 1575-1588)
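The distance-based pruning idea, keeping only tokens inside a circular fovea-like region around the predicted object center, can be illustrated as below. The 14x14 token grid, the radius, and the normalized coordinates are made-up values rather than FVT's actual configuration.

```python
# Illustrative distance-based token pruning with a circular fovea-like mask.
import torch

def foveal_prune(tokens, centers, grid=14, radius=0.35):
    """tokens: (B, N, D) with N = grid*grid patch tokens;
    centers: (B, 2) normalized (x, y) object centers."""
    ys, xs = torch.meshgrid(
        torch.linspace(0, 1, grid), torch.linspace(0, 1, grid), indexing="ij")
    coords = torch.stack([xs.flatten(), ys.flatten()], dim=-1)       # (N, 2)
    dist = torch.cdist(coords.unsqueeze(0), centers.unsqueeze(1)).squeeze(-1)
    keep = dist <= radius                    # (B, N): tokens inside the fovea
    return [t[m] for t, m in zip(tokens, keep)]  # variable-length token sets

tokens = torch.randn(2, 196, 256)
kept = foveal_prune(tokens, torch.tensor([[0.5, 0.5], [0.2, 0.8]]))
print([k.shape[0] for k in kept])            # tokens surviving per image
```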
Pub Date: 2024-03-14 | DOI: 10.1109/TCDS.2024.3375620
Hong You;Xian Zhong;Wenxuan Liu;Qi Wei;Wenxin Huang;Zhaofei Yu;Tiejun Huang
Spiking neural networks (SNNs) have garnered significant attention for their potential in ultralow-power event-driven neuromorphic hardware implementations. One effective strategy for obtaining SNNs is the conversion of artificial neural networks (ANNs) to SNNs. However, existing research on ANN–SNN conversion has predominantly focused on image classification, leaving action recognition largely unexplored. In this article, we investigate the performance degradation of SNNs on the action recognition task. Through in-depth analysis, we propose a framework called scalable dual threshold mapping (SDM) that effectively overcomes three types of conversion errors. By mitigating these conversion errors, we reduce the time required for the spike firing rates of SNNs to align with the activation values of ANNs. Consequently, our method enables the generation of accurate and ultralow-latency SNNs. We conduct extensive evaluations on multiple action recognition datasets, including University of Central Florida (UCF)-101 and Human Motion DataBase (HMDB)-51. Through rigorous experiments and analysis, we demonstrate the effectiveness of our approach. Notably, SDM achieves a remarkable Top-1 accuracy of 92.94% on UCF-101 while requiring ultralow latency (four time steps), highlighting its high performance with reduced computational requirements.
Title: Converting Artificial Neural Networks to Ultralow-Latency Spiking Neural Networks for Action Recognition (IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 4, pp. 1533-1545)
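The premise behind rate-based ANN-SNN conversion is that an integrate-and-fire neuron's firing rate over T time steps approximates a clipped ReLU, and the mismatch at small T is the kind of conversion error such frameworks must remove. Below is a toy single-neuron demonstration with a single threshold and soft reset; SDM's dual-threshold mapping is not reproduced here.

```python
# Toy rate-coding demo: firing rate over T steps vs. a clipped ReLU.
import numpy as np

def if_neuron_rate(z, T=4, v_thresh=1.0):
    """z: constant input current per step (the ANN pre-activation)."""
    v, spikes = 0.0, 0
    for _ in range(T):
        v += z                     # integrate input
        if v >= v_thresh:
            spikes += 1
            v -= v_thresh          # soft reset keeps the residual charge
    return spikes / T              # firing rate in [0, 1]

for z in [0.0, 0.3, 0.6, 1.2]:
    print(z, if_neuron_rate(z), max(0.0, min(1.0, z)))  # rate vs. clipped ReLU
```

At T=4 the rate quantizes the activation coarsely (e.g., 0.25 instead of 0.3), which illustrates why aligning rates with ANN activations at very low latency is the hard part of conversion.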
Pub Date: 2024-03-12 | DOI: 10.1109/TCDS.2024.3376433
Siqi Cai;Ran Zhang;Malu Zhang;Jibin Wu;Haizhou Li
Decoding auditory attention from brain activities, such as electroencephalography (EEG), sheds light on solving the machine cocktail party problem. However, effective representation of EEG signals remains a challenge. One reason is that current feature extraction techniques have not fully exploited the spatial information in EEG signals. EEG signals reflect the collective dynamics of brain activities across different regions, and the intricate interactions among channels, rather than individual channels alone, carry the distinctive features of those activities. In this study, we propose a spiking graph convolutional network (SGCN), which captures the spatial features of multichannel EEG in a biologically plausible manner. Comprehensive experiments were conducted on two publicly available datasets. The results demonstrate that the proposed SGCN achieves competitive auditory attention detection (AAD) performance in low-latency and low-density EEG settings. Because it features low power consumption, the SGCN has the potential for practical implementation in intelligent hearing aids and other brain–computer interfaces (BCIs).
Title: EEG-Based Auditory Attention Detection With Spiking Graph Convolutional Network (IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 5, pp. 1698-1706)
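Combining graph aggregation over EEG channels with spiking dynamics can be pictured as follows: aggregate each time step's channel values over a channel graph, project them, and drive leaky integrate-and-fire units whose firing rates act as features. The adjacency, constants, and sizes below are invented for illustration and do not reproduce the SGCN.

```python
# Conceptual spiking graph-convolution sketch over EEG channels.
import torch

def lif_step(v, i, decay=0.9, v_thresh=1.0):
    v = decay * v + i                  # leak and integrate
    s = (v >= v_thresh).float()        # emit spikes at threshold
    return v * (1 - s), s              # hard reset where spiked

A = torch.rand(64, 64); A = (A + A.T) / 2       # symmetric channel graph
A = A / A.sum(dim=1, keepdim=True)              # row-normalized adjacency
W = torch.randn(64, 32) * 0.1                   # channel-to-feature projection

x = torch.randn(100, 64)               # 100 time steps, 64 EEG channels
v = torch.zeros(32); spikes = []
for t in range(100):
    i = (A @ x[t]) @ W                 # aggregate neighbors, then project
    v, s = lif_step(v, i)
    spikes.append(s)
rate = torch.stack(spikes).mean(0)     # per-neuron firing rate as a feature
print(rate.shape)                      # torch.Size([32])
```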