IEEE Transactions on Cognitive and Developmental Systems: Latest Articles, Page 10

Small Object Detection Based on Microscale Perception and Enhancement-Location Feature Pyramid

IF 5 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-07 DOI: 10.1109/TCDS.2024.3397684

Guang Han;Chenwei Guo;Ziyang Li;Haitao Zhao

Images captured by unmanned aerial vehicles (UAVs) contain many small objects with significant scale variation and uneven distribution, so existing algorithms suffer from high rates of missed and false detections of small objects in drone images. This article proposes a new object detection algorithm based on microscale perception and an enhancement-location feature pyramid. The microscale perception module replaces the original convolution module in the backbone and changes the receptive field through two dilation branches with different dilation rates and an adjustment switch branch. To better match the size and shape of sampled targets, weighted deformable convolution is employed. The enhancement-location feature pyramid module aggregates the features from each layer to obtain balanced semantic information and refines the aggregated features to strengthen their representational ability. Moreover, because lower-layer features are beneficial for locating small objects, a bottom-up branch is added to enhance the localization of small objects. Additionally, specific image cropping and combining techniques alter the target distribution of the training data, making the model more sensitive to small objects and improving its robustness. Finally, a sample balance strategy combines focal loss with a sample extraction control method to balance both the easy/hard sample imbalance and the long-tailed interclass sample imbalance during training. Experimental results show that the proposed algorithm achieves a mean average precision of 35.9% on the VisDrone2019 dataset, a 14.2% improvement over the baseline Cascade RCNN, and demonstrates better performance in detecting small objects in drone images. Compared with advanced algorithms of recent years, it also achieves state-of-the-art detection accuracy.
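As a rough illustration of the focal loss mentioned in the abstract, the sketch below implements the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function name and the default values of gamma and alpha are assumptions made for this example; the abstract does not state the settings used in the article, nor how the loss is combined with the sample extraction control method.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class, strictly between 0 and 1
    y: ground-truth label, 0 or 1
    gamma, alpha: illustrative defaults, not values taken from the article
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.95, 1)  # confidently correct positive
hard = focal_loss(0.30, 1)  # poorly classified positive
```

The modulating factor makes the hard sample contribute far more to the loss than the easy one, which is the mechanism the abstract relies on to counter the easy/hard imbalance.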


Volume 16, Issue 6, pp. 1982-1996

Citations: 0

LITE-SNN: Leveraging Inherent Dynamics to Train Energy-Efficient Spiking Neural Networks for Sequential Learning

IF 5 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-03 DOI: 10.1109/TCDS.2024.3396431

Nitin Rathi;Kaushik Roy

Spiking neural networks (SNNs) are gaining popularity for their promise of low-power machine intelligence on event-driven neuromorphic hardware. SNNs have achieved performance comparable to that of artificial neural networks (ANNs) on static tasks (image classification) with lower compute energy. In this work, we explore the inherent dynamics of SNNs for sequential tasks such as gesture recognition, sentiment analysis, and sequence-to-sequence learning on data from dynamic vision sensors (DVSs) and natural language processing (NLP). Sequential data are generally processed with complex recurrent neural networks (RNNs) [long short-term memory/gated recurrent unit (LSTM/GRU)] with explicit feedback connections and internal states to handle long-term dependencies. The neuron models in SNNs, integrate-and-fire (IF) or leaky integrate-and-fire (LIF), have internal states (membrane potential) that can be efficiently leveraged for sequential tasks. The membrane potential in the IF/LIF neuron integrates the incoming current and outputs an event (or spike) when the potential crosses a threshold value. Since SNNs compute with highly sparse spike-based spatiotemporal data, their energy per inference is lower than that of LSTMs/GRUs. We also show that SNNs require fewer parameters than LSTMs/GRUs, resulting in smaller models and faster inference. We observe the problem of vanishing gradients in vanilla SNNs for longer sequences and implement a convolutional SNN with attention layers to perform sequence-to-sequence learning tasks. The inherent recurrence in SNNs, in addition to the fully parallelized convolutional operations, provides additional mechanisms to model sequential dependencies that lead to better accuracy than convolutional neural networks (CNNs) with ReLU activations. We evaluate SNNs on gesture recognition from the IBM DVS dataset, sentiment analysis from the IMDB movie reviews dataset, and German-to-English translation from the Multi30k dataset.
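The IF/LIF behavior described in the abstract (integrate the incoming current, emit a spike when the membrane potential crosses a threshold) can be sketched in discrete time as follows. This is a minimal illustration, not the authors' implementation: the function name, the multiplicative leak factor, and the hard reset to zero after a spike are all assumptions.

```python
def lif_run(currents, leak=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over a sequence of inputs.

    currents: input current at each time step
    leak: factor applied to the membrane potential each step (1.0 gives a plain IF neuron)
    threshold: potential at which the neuron emits a spike
    Returns a list of 0/1 spike events, one per time step.
    """
    v = 0.0  # membrane potential, the neuron's internal state
    spikes = []
    for i in currents:
        v = leak * v + i       # leak the old potential, then integrate the input
        if v >= threshold:
            spikes.append(1)
            v = 0.0            # hard reset (assumed; a soft reset would subtract the threshold)
        else:
            spikes.append(0)
    return spikes


if_spikes = lif_run([0.6, 0.6, 0.0, 0.6, 0.6], leak=1.0)
lif_spikes = lif_run([0.6, 0.6, 0.0, 0.6, 0.6], leak=0.5)
```

Because the potential carries over between time steps, the neuron's output depends on the input history, which is the inherent recurrence the abstract refers to. With a leak below 1.0 the same input produces fewer spikes.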


Volume 16, Issue 6, pp. 1905-1914

Citations: 0

Machine Unlearning for Seizure Prediction

IF 5 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-01 DOI: 10.1109/TCDS.2024.3395663

Chenghao Shao;Chang Li;Rencheng Song;Xiang Liu;Ruobing Qian;Xun Chen

In recent years, companies and organizations have been required to provide individuals with the right to be forgotten to alleviate privacy concerns. In machine learning, this requires researchers not only to delete data from databases but also to remove the information those data contributed to trained models. Thus, machine unlearning is becoming an emerging research problem. In the seizure prediction field, prediction applications are mostly built on private electroencephalogram (EEG) signals. To provide the right to be forgotten, we propose a machine unlearning method for seizure prediction. Our proposed unlearning method is based on knowledge distillation, using two teacher models to guide the student model toward the model-level unlearning objective. One teacher model is used to induce the student model to forget the data information of patients who have requested unlearning (forgetting patients), while the other teacher model is used to enable the student model to retain the data information of the other patients (remaining patients). Experiments were conducted on the CHBMIT and Kaggle databases. Results show that our proposed unlearning method can effectively make trained ML models forget the information of forgetting patients while maintaining satisfactory performance on remaining patients. To the best of our knowledge, this is the first work on machine unlearning in the seizure prediction field.
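One way to picture the two-teacher scheme is a per-sample distillation loss in which the student matches a different teacher depending on whose data the sample comes from. The sketch below is only an assumed form: the abstract does not give the loss, so the KL-divergence objective, the temperature, and every name here are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def unlearning_loss(student_logits, forget_teacher_logits, retain_teacher_logits,
                    is_forgetting_patient, temperature=2.0):
    """Distill from the teacher that matches the sample's patient group (assumed form)."""
    if is_forgetting_patient:
        teacher_logits = forget_teacher_logits  # teacher that carries no knowledge of this patient
    else:
        teacher_logits = retain_teacher_logits  # teacher that preserves the original behavior
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return kl_divergence(teacher, student)


# A student that still predicts confidently on a forgetting patient is penalized
# when the forgetting teacher is uninformative (uniform) on that sample.
retain_case = unlearning_loss([2.0, 0.0], [0.0, 0.0], [2.0, 0.0], False)
forget_case = unlearning_loss([2.0, 0.0], [0.0, 0.0], [2.0, 0.0], True)
```

Minimizing such a loss over both patient groups would pull the student toward the uninformed teacher on forgetting patients and toward the original model on remaining patients.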


Volume 16, Issue 6, pp. 1969-1981

Citations: 0

Robust Perception-Based Visual Simultaneous Localization and Tracking in Dynamic Environments

IF 5 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-02-28 DOI: 10.1109/TCDS.2024.3371073

Song Peng;Teng Ran;Liang Yuan;Jianbo Zhang;Wendong Xiao

Visual simultaneous localization and mapping (SLAM) in dynamic scenes is a prerequisite for robot-related applications. Most existing SLAM algorithms focus on rejecting dynamic objects, which discards valuable information and makes them prone to failure in complex environments. This article proposes a semantic visual SLAM system that incorporates rigid object tracking. A robust scene perception framework is designed, which gives autonomous robots the ability to perceive scenes in a way similar to human cognition. Specifically, we propose a two-stage mask revision method to generate a fine mask of the object. Based on the revised mask, we propose a semantic and geometric constraint (SAG) strategy, which provides a fast and robust way to perceive dynamic rigid objects. Then, the motion tracking of rigid objects is integrated into the SLAM pipeline, and a novel bundle adjustment is constructed to optimize camera localization and object six-degree-of-freedom (DoF) poses. Finally, the proposed algorithm is evaluated on the publicly available KITTI dataset, the Oxford Multimotion dataset, and real-world scenarios. The proposed algorithm achieves an overall $\text{RPE}_{\text{t}}$ of less than 0.07 m per frame and an $\text{RPE}_{\text{R}}$ of about 0.03$^{\circ}$ per frame on the KITTI dataset. The experimental results reveal that the proposed algorithm achieves more accurate localization and more robust tracking than state-of-the-art SLAM algorithms in challenging dynamic scenarios.


Volume 16, Issue 4, pp. 1507-1520

Citations: 0

Brain Connectivity Analysis for EEG-Based Face Perception Task

IF 5 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-02-27 DOI: 10.1109/TCDS.2024.3370635

Debashis Das Chakladar;Nikhil R. Pal

Face perception is considered a highly developed visual recognition skill in human beings. Most face perception studies have used functional magnetic resonance imaging to identify the brain cortices related to face perception. However, brain connectivity networks for face perception have not yet been studied using electroencephalography (EEG). In the proposed framework, a correlation-tree traversal-based channel selection algorithm is first developed to identify the “optimum” EEG channels by removing the highly correlated EEG channels from the input channel set. Next, the effective brain connectivity network among those “optimum” EEG channels is built using multivariate transfer entropy (TE) while participants watch different face stimuli (i.e., famous, unfamiliar, and scrambled). We map EEG channels to their corresponding brain regions for generalization purposes and identify the active brain regions for each face stimulus. To find the stimulus-wise brain dynamics, the information transfer among the identified brain regions is estimated using several graph measures [global efficiency (GE) and transitivity]. Our model achieves a mean GE of 0.800, 0.695, and 0.581 for famous, unfamiliar, and scrambled faces, respectively. Identifying face perception-specific brain regions will enhance understanding of the EEG-based face-processing system. Understanding the brain networks of famous, unfamiliar, and scrambled faces can be useful in criminal investigation applications.
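Global efficiency, one of the graph measures named in the abstract, is commonly defined as the average inverse shortest-path length over all ordered pairs of distinct nodes. The sketch below computes it for a directed, unweighted graph, which is an assumption: the article's network is derived from transfer entropy and may be weighted. The region labels are hypothetical placeholders.

```python
from collections import deque


def global_efficiency(adj):
    """Global efficiency of a directed, unweighted graph.

    adj: dict mapping each node to the nodes it sends information to.
         Every node must appear as a key, even if it has no outgoing edges.
    Unreachable pairs contribute 0 (infinite distance).
    """
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for src in nodes:
        # Breadth-first search gives shortest hop counts from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != src)
    return total / (n * (n - 1))


# Hypothetical three-region networks, for illustration only.
fully_connected = {
    "frontal": ["parietal", "occipital"],
    "parietal": ["frontal", "occipital"],
    "occipital": ["frontal", "parietal"],
}
chain = {"occipital": ["parietal"], "parietal": ["frontal"], "frontal": []}
ge_full = global_efficiency(fully_connected)
ge_chain = global_efficiency(chain)
```

A higher value indicates that information can travel between regions in fewer hops, which matches the ordering the abstract reports across famous, unfamiliar, and scrambled faces.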


Volume 16, Issue 4, pp. 1494-1506

Citations: 0

D-FaST: Cognitive Signal Decoding With Disentangled Frequency–Spatial–Temporal Attention

IF 5 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-02-26 DOI: 10.1109/TCDS.2024.3370261

WeiGuo Chen;Changjian Wang;Kele Xu;Yuan Yuan;Yanru Bai;Dongsong Zhang

Cognitive language processing (CLP), situated at the intersection of natural language processing (NLP) and cognitive science, plays an increasingly pivotal role in artificial intelligence, cognitive intelligence, and brain science. Among the essential areas of investigation in CLP, cognitive signal decoding (CSD) has made remarkable progress, yet challenges remain: insufficient global dynamic representation capability and deficiencies in multidomain feature integration. In this article, we introduce a novel paradigm for CLP referred to as disentangled frequency–spatial–temporal attention (D-FaST). Specifically, we present a novel cognitive signal decoder that operates on disentangled frequency–space–time domain attention. This decoder comprises three key components: frequency domain feature extraction employing multiview attention (MVA), spatial domain feature extraction utilizing dynamic brain connection graph attention, and temporal feature extraction relying on local time sliding window attention. These components are integrated within a novel disentangled framework. Additionally, to encourage advancements in this field, we have created a new CLP dataset, MNRED. We then conducted an extensive series of experiments, evaluating D-FaST's performance on MNRED as well as on publicly available datasets including ZuCo, BCIC IV-2A, and BCIC IV-2B. Our experimental results demonstrate that D-FaST significantly outperforms existing methods on both our dataset and traditional CSD datasets, establishing a state-of-the-art accuracy of 78.72% on MNRED and raising accuracy to 78.35% on ZuCo, 74.85% on BCIC IV-2A, and 76.81% on BCIC IV-2B.
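Local sliding window attention is usually implemented by masking the attention matrix so that each time step attends only to its neighbors. The sketch below builds such a mask under the assumption of a symmetric window. The function name and the window parameter are hypothetical, and the abstract does not specify how D-FaST defines its window.

```python
def sliding_window_mask(seq_len, window):
    """Boolean attention mask for local sliding window attention.

    mask[i][j] is True when time step i may attend to time step j,
    i.e. when the two steps are at most `window` positions apart.
    """
    return [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]


mask = sliding_window_mask(seq_len=4, window=1)
```

Restricting attention this way keeps the number of attended positions per step constant as the sequence grows, at the cost of leaving long-range interactions to other components.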


Volume 16, Issue 4, pp. 1476-1493

Citations: 0

DTCM: Deep Transformer Capsule Mutual Distillation for Multivariate Time Series Classification

IF 5 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-02-26 DOI: 10.1109/TCDS.2024.3370219

Zhiwen Xiao;Xin Xu;Huanlai Xing;Bowen Zhao;Xinhan Wang;Fuhong Song;Rong Qu;Li Feng

This article proposes a dual-network-based feature extractor, the perceptive capsule network (PCapN), for multivariate time series classification (MTSC). It comprises a local feature network (LFN) and a global relation network (GRN). The LFN has two heads (i.e., Head_A and Head_B), each containing two squash convolutional neural network (CNN) blocks and one dynamic routing block to extract local features from the data and mine the connections among them. The GRN consists of two capsule-based transformer blocks and one dynamic routing block to capture the global patterns of each variable and correlate the useful information of multiple variables. However, PCapN is difficult to deploy directly on mobile devices because of its high demand for computing resources. This article therefore designs a lightweight capsule network (LCapN) to mimic the cumbersome PCapN. To promote knowledge transfer from PCapN to LCapN, this article proposes a deep transformer capsule mutual (DTCM) distillation method. It is targeted and offline, using one- and two-way operations to supervise the knowledge distillation (KD) process for the dual-network-based student and teacher models. Experimental results show that the proposed PCapN and DTCM achieve excellent top-1 accuracy on the University of East Anglia 2018 (UEA2018) datasets.
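Capsule networks typically apply a "squash" nonlinearity that preserves a vector's direction while mapping its length into the range [0, 1). The sketch below shows the standard capsule-network form of that function. Whether the squash CNN blocks in PCapN use exactly this form is an assumption, and the `eps` stabilizer is added only for this example.

```python
import math


def squash(s, eps=1e-9):
    """Standard capsule squash: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).

    s: list of floats representing one capsule's raw output vector
    eps: small constant that avoids division by zero for the zero vector
    """
    sq_norm = sum(x * x for x in s)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in s]


v = squash([3.0, 4.0])                      # input length 5
v_length = math.sqrt(sum(x * x for x in v))
```

Short vectors are shrunk toward zero and long vectors approach unit length, so the output length can be read as the probability that the feature the capsule represents is present.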

Citations: 0

Agree to Disagree: Exploring Partial Semantic Consistency Against Visual Deviation for Compositional Zero-Shot Learning

IF 5 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-02-20 DOI: 10.1109/TCDS.2024.3367957

Xiangyu Li;Xu Yang;Xi Wang;Cheng Deng

Compositional zero-shot learning (CZSL) aims to recognize novel concepts composed of known subconcepts. The task remains challenging because the intricate interaction between subconcepts is entangled with their corresponding visual features, which reduces recognition accuracy. In addition, the domain gap between training and testing data leads to poor model generalization. In this article, we tackle these problems by exploring partial semantic consistency (PSC) to eliminate visual deviation and thereby guarantee the discrimination and generalization of representations. Considering the complicated interaction between subconcepts and their visual features, we decompose seen images into visual elements according to their labels and obtain the instance-level subdeviations from compositions, which are used to extract the category-level primitives of subconcepts. Furthermore, we present a multiscale concept composition (MSCC) approach that produces virtual samples from two aspects, increasing the sufficiency and diversity of samples so that the proposed model can generalize to novel compositions. Extensive experiments indicate that our method significantly outperforms state-of-the-art approaches on three benchmark datasets.
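To make the CZSL setting concrete, here is a small sketch of what "novel concepts from known subconcepts" means: every attribute and every object appears in training, but some attribute-object pairs never appear together. It only illustrates the problem setup, not the PSC or MSCC methods; the function name and the toy labels are hypothetical.

```python
from itertools import product

def novel_compositions(seen_pairs):
    """Return attribute-object pairs built from seen primitives
    that never occur together in the seen (training) pairs."""
    attributes = sorted({attr for attr, _ in seen_pairs})
    objects = sorted({obj for _, obj in seen_pairs})
    seen = set(seen_pairs)
    return [pair for pair in product(attributes, objects) if pair not in seen]

# Toy training labels (hypothetical): three seen compositions.
seen = [("red", "apple"), ("green", "apple"), ("red", "car")]
print(novel_compositions(seen))  # [('green', 'car')]
```

A CZSL model is expected to recognize a pair such as ("green", "car") at test time even though it only ever saw "green" and "car" in other combinations.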

Citations: 0

Compressed Video Anomaly Detection of Human Behavior Based on Abnormal Region Determination

IF 5 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-02-20 DOI: 10.1109/TCDS.2024.3367493

Lijun He;Miao Zhang;Hao Liu;Liejun Wang;Fan Li

Video anomaly detection has a wide range of applications in video monitoring scenarios. Existing image-domain anomaly detection algorithms usually require fully decoding the received videos, complex information extraction, and complex network structures, which makes them difficult to deploy directly. In this article, we focus on anomaly detection performed directly on compressed videos. Compressed videos need not be fully decoded, and auxiliary information can be obtained from them directly, which keeps computational complexity low. We propose a compressed video anomaly detection algorithm based on accurate abnormal region determination (ARD-VAD), which is suitable for deployment on edge servers. First, to keep overall complexity low and save storage space, we sparsely sample two kinds of prior knowledge from compressed videos: I-frames, which represent appearance information, and motion vectors (MVs), which represent motion information. Based on the sampled information, we design a two-branch network structure consisting of an MV reconstruction branch and a future I-frame prediction branch. Specifically, the two branches are connected by an attention network based on the MV residuals, which guides the prediction network to focus on abnormal regions. Furthermore, to emphasize abnormal regions, we develop an adaptive abnormal region determination module based on motion intensity, represented by the second derivative of the MV. This module enlarges the difference between the generated frame and the current frame in the real anomaly region. Experiments show that our algorithm achieves a good balance between performance and complexity.
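The abstract describes motion intensity as the second derivative of the MV but does not give the exact formula. One plausible reading, sketched here purely as an assumption, is a discrete second difference of a block's motion vector over consecutive frames, followed by a threshold. The function names, the per-block input layout, and the threshold are all hypothetical.

```python
import math

def motion_intensity(mvs):
    """Magnitude of the discrete second difference of a motion-vector sequence.

    mvs: list of (dx, dy) motion vectors for one block over consecutive frames.
    Returns one intensity value per interior frame (len(mvs) - 2 values).
    """
    intensities = []
    for prev, cur, nxt in zip(mvs, mvs[1:], mvs[2:]):
        ax = nxt[0] - 2 * cur[0] + prev[0]  # second difference in x
        ay = nxt[1] - 2 * cur[1] + prev[1]  # second difference in y
        intensities.append(math.hypot(ax, ay))
    return intensities

def abnormal_mask(intensities, threshold):
    """Flag frames whose motion intensity exceeds a chosen threshold."""
    return [value > threshold for value in intensities]

steady = [(1, 0), (2, 0), (3, 0), (4, 0)]  # constant change in motion
sudden = [(0, 0), (0, 0), (3, 4), (0, 0)]  # abrupt burst of motion
print(motion_intensity(steady))  # [0.0, 0.0]
print(motion_intensity(sudden))  # [5.0, 10.0]
```

Smoothly changing motion yields a second difference near zero, while an abrupt start or stop produces a large value, which is why a quantity of this kind can highlight candidate abnormal regions without decoding full frames.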

Citations: 0

Deep Reinforcement Learning With Multicritic TD3 for Decentralized Multirobot Path Planning

IF 5 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-02-20 DOI: 10.1109/TCDS.2024.3368055

Heqing Yin;Chang Wang;Chao Yan;Xiaojia Xiang;Boliang Cai;Changyun Wei

Centralized multirobot path planning is a prevalent approach in which a global planner computes feasible paths for each robot using shared information. However, this approach is limited by communication constraints and computational complexity. To address these challenges, we introduce a novel decentralized multirobot path planning approach that eliminates the need to share the states and intentions of robots. Our approach uses deep reinforcement learning and features an asynchronous multicritic twin delayed deep deterministic policy gradient (AMC-TD3) algorithm, which enhances the original gated recurrent unit (GRU)-attention-based TD3 algorithm by incorporating a multicritic network and an asynchronous training mechanism. By training each critic with a unique reward function, our learned policy enables each robot to navigate toward its long-term objective without colliding with other robots in complex environments. Furthermore, our reward function, grounded in social norms, allows the robots to naturally avoid each other in congested situations. Specifically, we train three critics to encourage each robot to achieve its long-term navigation goal, maintain its moving direction, and avoid collisions with other robots. Our model can learn an end-to-end navigation policy without relying on an accurate map or any localization information, which makes it highly adaptable to various environments. Simulation results show that the proposed approach surpasses the baselines in several environments with different levels of complexity and robot populations.
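As a rough illustration of the "one critic per reward function" idea, the sketch below computes the standard TD3 clipped double-Q target once per critic, each with its own reward signal. The TD3 target formula is the standard one; how AMC-TD3 combines the critics when updating the actor is not stated in the abstract and is not shown. The critic names and the dictionary layout are hypothetical.

```python
def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target used by TD3: bootstrap from the smaller
    of the two target-critic estimates to reduce overestimation."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)

def multicritic_targets(rewards, done, twin_q_next, gamma=0.99):
    """One target per critic, each driven by its own reward signal.

    rewards: dict name -> scalar reward (names here are hypothetical).
    twin_q_next: dict name -> (q1_next, q2_next) from that critic's target networks.
    """
    return {
        name: td3_target(rewards[name], done, *twin_q_next[name], gamma=gamma)
        for name in rewards
    }

# Hypothetical single transition with three reward channels.
rewards = {"goal": 1.0, "heading": 0.5, "collision": -1.0}
twin_q_next = {"goal": (2.0, 3.0), "heading": (1.0, 0.5), "collision": (0.0, -1.0)}
targets = multicritic_targets(rewards, done=False, twin_q_next=twin_q_next, gamma=0.9)
```

Keeping a separate target per reward channel means each critic learns the value of one behavior (reaching the goal, holding direction, avoiding collisions) rather than a single blended return.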

Citations: 0