Pub Date: 2024-04-01 DOI: 10.1109/TCDS.2024.3383158
Boyu Li;Haoran Li;Yuanheng Zhu;Dongbin Zhao
Agent-agnostic reinforcement learning aims to learn a universal control policy that can simultaneously control a set of robots with different morphologies. Recent studies have suggested that transformer models can handle the variations in state and action spaces caused by different morphologies, and that morphology information is necessary to improve policy performance. However, existing methods are limited in how they exploit morphological information, and they cannot guarantee that observations are integrated in a reasonable way. We propose the morphological adaptive transformer (MAT), a transformer-based universal control algorithm that can adapt to various morphologies without any modifications. MAT includes two essential components: functional position encoding (FPE) and a morphological attention mechanism (MAM). The FPE provides robust and consistent positional prior information for limb observations to avoid limb confusion and implicitly obtain functional descriptions of limbs. The MAM enhances the attribute prior information of limbs, improves the correlation between observations, and makes the policy pay attention to more limbs. We combine observations with prior information to help the policy adapt to the morphology of robots, thereby optimizing its performance on unknown morphologies. Experiments on agent-agnostic tasks in the Gym MuJoCo environment demonstrate that our algorithm can assign more reasonable morphological prior information to each limb, and that its performance is comparable to the prior state-of-the-art algorithm with better generalization.
{"title":"MAT: Morphological Adaptive Transformer for Universal Morphology Policy Learning","authors":"Boyu Li;Haoran Li;Yuanheng Zhu;Dongbin Zhao","doi":"10.1109/TCDS.2024.3383158","DOIUrl":"10.1109/TCDS.2024.3383158","url":null,"PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140593972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-29 DOI: 10.1109/tcds.2024.3383296
Yuchen Yan, Haotian Su, Yunyi Jia
{"title":"Measuring Human Comfort in Human-Robot Collaboration via Wearable Sensing","authors":"Yuchen Yan, Haotian Su, Yunyi Jia","doi":"10.1109/tcds.2024.3383296","DOIUrl":"https://doi.org/10.1109/tcds.2024.3383296","url":null,"abstract":"","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2024-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140594186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-26 DOI: 10.1109/TCDS.2024.3382109
Jiahui Pan;Yangzuyi Yu;Jianhui Wu;Xinjie Zhou;Yanbin He;Yuanqing Li
Disorders of consciousness (DOC) are often related to serious changes in sleep structure. This article presents a sleep evaluation algorithm that scores the sleep structure of DOC patients to assist in assessing their consciousness level. The sleep evaluation algorithm is divided into two parts: 1) an automatic sleep staging model, in which convolutional neural networks (CNNs) extract signal features from the electroencephalogram (EEG) and electrooculogram (EOG), and a bidirectional long short-term memory (Bi-LSTM) network with an attention mechanism learns sequential information; and 2) consciousness assessment, in which the automated sleep staging results are used to extract consciousness-related sleep features that a support vector machine (SVM) classifier uses to assess consciousness. In this study, the CNN-BiLSTM model with an attention sleep network (CBASleepNet) was evaluated on the sleep-EDF and MASS datasets. The experimental results demonstrated the effectiveness of the proposed model, which outperformed similar models. Moreover, CBASleepNet was applied to sleep staging in DOC patients through transfer learning and fine-tuning. Consciousness assessments were conducted on seven minimally conscious state (MCS) patients and four vegetative state (VS)/unresponsive wakefulness syndrome (UWS) patients, achieving an overall accuracy of 81.8%. The sleep evaluation algorithm can be used to evaluate the consciousness level of patients effectively.
{"title":"Deep Neural Networks for Automatic Sleep Stage Classification and Consciousness Assessment in Patients With Disorder of Consciousness","authors":"Jiahui Pan;Yangzuyi Yu;Jianhui Wu;Xinjie Zhou;Yanbin He;Yuanqing Li","doi":"10.1109/TCDS.2024.3382109","DOIUrl":"10.1109/TCDS.2024.3382109","url":null,"PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140315148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-22 DOI: 10.1109/TCDS.2024.3380907
Fuqiang Gu;Jiarui Dou;Mingyan Li;Xianlei Long;Songtao Guo;Chao Chen;Kai Liu;Xianlong Jiao;Ruiyuan Li
Data augmentation is an effective way to overcome the overfitting problem of deep learning models. However, most existing studies on data augmentation work on framelike data (e.g., images), and few tackle event-based data. Event-based data are different from framelike data, rendering the augmentation techniques designed for framelike data unsuitable for event-based data. This work deals with data augmentation for event-based object classification and semantic segmentation, which is important for self-driving and robot manipulation. Specifically, we introduce EventAugment, a new method to augment asynchronous event-based data by automatically learning augmentation policies. We first identify 13 types of operations for augmenting event-based data. Next, we formulate the problem of finding optimal augmentation policies as a hyperparameter optimization problem. To tackle this problem, we propose a random search-based framework. Finally, we evaluate the proposed method on six public datasets: N-Caltech101, N-Cars, ST-MNIST, N-MNIST, DVSGesture, and DDD17. Experimental results demonstrate that EventAugment yields substantial performance improvements for both deep neural network-based and spiking neural network-based models, with gains of up to approximately 4%. Notably, EventAugment outperforms state-of-the-art methods in terms of overall performance.
{"title":"EventAugment: Learning Augmentation Policies From Asynchronous Event-Based Data","authors":"Fuqiang Gu;Jiarui Dou;Mingyan Li;Xianlei Long;Songtao Guo;Chao Chen;Kai Liu;Xianlong Jiao;Ruiyuan Li","doi":"10.1109/TCDS.2024.3380907","DOIUrl":"10.1109/TCDS.2024.3380907","url":null,"PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140199013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-18 DOI: 10.1109/TCDS.2024.3376072
Reza Javanmard Alitappeh;Akhil John;Bernardo Dias;A. John van Opstal;Alexandre Bernardino
This article explores the application of model-based optimal control principles in understanding stereotyped human oculomotor behaviors. Using a realistic model of the human eye with a six-muscle cable-driven actuation system, we tackle the novel challenges of addressing a system with six degrees of freedom. We apply nonlinear optimal control techniques to optimize accuracy, energy, and duration of eye-movement trajectories. Employing a recurrent neural network to emulate system dynamics, we focus on generating rapid, unconstrained saccadic eye-movements. Remarkably, our model replicates realistic 3-D rotational kinematics and dynamics observed in human saccades, with the six cables organizing themselves into appropriate antagonistic muscle pairs, resembling the primate oculomotor system.
{"title":"Emergence of Human Oculomotor Behavior in a Cable-Driven Biomimetic Robotic Eye Using Optimal Control","authors":"Reza Javanmard Alitappeh;Akhil John;Bernardo Dias;A. John van Opstal;Alexandre Bernardino","doi":"10.1109/TCDS.2024.3376072","DOIUrl":"10.1109/TCDS.2024.3376072","url":null,"PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474482","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140166438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-18 DOI: 10.1109/TCDS.2024.3377445
Timothy R. McIntosh;Teo Susnjak;Tong Liu;Paul Watters;Malka N. Halgamuge
This study is an empirical investigation into the semantic vulnerabilities of four popular pretrained commercial large language models (LLMs) to ideological manipulation. Using tactics reminiscent of human semantic conditioning in psychology, we induced and assessed ideological misalignments and their retention in these four LLMs in response to 30 controversial questions that spanned a broad ideological and social spectrum, encompassing both extreme left- and right-wing viewpoints. Such semantic vulnerabilities arise from fundamental limitations in LLMs’ capability to comprehend detailed linguistic variations, making them susceptible to ideological manipulation through targeted semantic exploits. We observed the effect of reinforcement learning from human feedback (RLHF) on the LLMs’ initial answers, but highlighted the limitations of RLHF in two aspects: 1) its inability to fully mitigate the impact of ideological conditioning prompts, leading to only partial alleviation of LLM semantic vulnerabilities; and 2) its inadequacy in representing a diverse set of “human values,” often reflecting the predefined values of certain groups controlling the LLMs. Our findings provide empirical evidence of semantic vulnerabilities inherent in current LLMs, challenge both the robustness and the adequacy of RLHF as a mainstream method for aligning LLMs with human values, and underscore the need for a multidisciplinary approach in developing ethical and resilient artificial intelligence (AI).
{"title":"The Inadequacy of Reinforcement Learning From Human Feedback—Radicalizing Large Language Models via Semantic Vulnerabilities","authors":"Timothy R. McIntosh;Teo Susnjak;Tong Liu;Paul Watters;Malka N. Halgamuge","doi":"10.1109/TCDS.2024.3377445","DOIUrl":"10.1109/TCDS.2024.3377445","url":null,"PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140166472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-18 DOI: 10.1109/TCDS.2024.3377642
Guang Han;Jianshu Ma;Ziyang Li;Haitao Zhao
With the development of transformer visual models, attention-based trackers have shown highly competitive performance in the field of object tracking. However, in some tracking scenarios, especially those with multiple similar objects, the performance of existing trackers is often not satisfactory. In order to improve the performance of trackers in such scenarios, inspired by the fovea vision structure and its visual characteristics, this article proposes a novel foveal vision tracker (FVT). FVT combines the process of human eye fixation and object tracking, pruning based on the distance to the object rather than attention scores. This pruning method allows the receptive field of the feature extraction network to focus on the object, excluding background interference. FVT divides the feature extraction network into two stages: local and global, and introduces the local recursive module (LRM) and the view elimination module (VEM). LRM is used to enhance foreground features in the local stage, while VEM generates circular fovea-like visual field masks in the global stage and prunes tokens outside the mask, guiding the model to focus attention on high-information regions of the object. Experimental results on multiple object tracking datasets demonstrate that the proposed FVT achieves stronger object discrimination capability in the feature extraction stage, improves tracking accuracy and robustness in complex scenes, and achieves a significant accuracy improvement with an area overlap (AO) of 72.6% on the generic object tracking (GOT)-10k dataset.
{"title":"A Two-Stage Foveal Vision Tracker Based on Transformer Model","authors":"Guang Han;Jianshu Ma;Ziyang Li;Haitao Zhao","doi":"10.1109/TCDS.2024.3377642","DOIUrl":"10.1109/TCDS.2024.3377642","url":null,"PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140166867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-14 DOI: 10.1109/TCDS.2024.3375620
Hong You;Xian Zhong;Wenxuan Liu;Qi Wei;Wenxin Huang;Zhaofei Yu;Tiejun Huang
Spiking neural networks (SNNs) have garnered significant attention for their potential in ultralow-power event-driven neuromorphic hardware implementations. One effective strategy for obtaining SNNs is to convert artificial neural networks (ANNs) to SNNs. However, existing research on ANN–SNN conversion has predominantly focused on image classification tasks, leaving action recognition tasks largely unexplored. In this article, we investigate the performance degradation of SNNs on action recognition tasks. Through in-depth analysis, we propose a framework called scalable dual threshold mapping (SDM) that effectively overcomes three types of conversion errors. By mitigating these conversion errors, we reduce the time required for the spike firing rate of SNNs to align with the activation values of ANNs. Consequently, our method enables the generation of accurate and ultralow-latency SNNs. We conduct extensive evaluations on multiple action recognition datasets, including University of Central Florida (UCF)-101 and Human Motion DataBase (HMDB)-51, and demonstrate the effectiveness of our approach. Notably, SDM achieves a Top-1 accuracy of 92.94% on UCF-101 with ultralow latency (four time steps), highlighting its high performance with reduced computational requirements.
{"title":"Converting Artificial Neural Networks to Ultralow-Latency Spiking Neural Networks for Action Recognition","authors":"Hong You;Xian Zhong;Wenxuan Liu;Qi Wei;Wenxin Huang;Zhaofei Yu;Tiejun Huang","doi":"10.1109/TCDS.2024.3375620","DOIUrl":"10.1109/TCDS.2024.3375620","url":null,"PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":null,"pages":null},"PeriodicalIF":5.0,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140153797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-28 DOI: 10.1109/TCDS.2024.3371073
Song Peng;Teng Ran;Liang Yuan;Jianbo Zhang;Wendong Xiao
Visual simultaneous localization and mapping (SLAM) in dynamic scenes is a prerequisite for robot-related applications. Most existing SLAM algorithms focus on rejecting dynamic objects, which discards valuable information and makes them prone to failure in complex environments. This article proposes a semantic visual SLAM system that incorporates rigid object tracking. A robust scene perception framework is designed, which gives autonomous robots the ability to perceive scenes in a way similar to human cognition. Specifically, we propose a two-stage mask revision method to generate a fine mask of the object. Based on the revised mask, we propose a semantic and geometric constraint (SAG) strategy, which provides a fast and robust way to perceive dynamic rigid objects. Then, the motion tracking of rigid objects is integrated into the SLAM pipeline, and a novel bundle adjustment is constructed to optimize camera localization and six-degree-of-freedom (DoF) object poses. Finally, the proposed algorithm is evaluated on the publicly available KITTI dataset, the Oxford Multimotion dataset, and real-world scenarios. The proposed algorithm achieves the comprehensive performance of $\text{RPE}_{\text{t}}$