Fully supervised semantic segmentation requires detailed annotation of every pixel, which is time-consuming and laborious. To address this problem, this article performs the semantic segmentation task using only image-level categorical annotation. Existing methods based on image-level annotation usually use class activation maps (CAMs) as a first step to locate the target objects: by training a classifier, the presence and rough location of objects in an image can be found effectively. However, CAMs suffer from two problems: 1) they focus excessively on specific regions, capturing only the most prominent and discriminative areas of an object; and 2) frequently co-occurring background regions are easily misinterpreted as foreground, so foreground and background become confused. This article introduces cross language image matching based on out-of-distribution data and a convolutional block attention module (CLODA). It adopts the dual-branch concept of the cross language image matching framework and adds a convolutional block attention module to the attention branch to alleviate the excessive focus of CAMs on discriminative object regions. Feeding out-of-distribution data into the out-of-distribution branch helps the classification network reduce its misinterpretation of background regions. Cross pseudosupervision between the two branches optimizes the regions of interest learned by the attention branch. Experimental results show that the pseudomasks generated by the proposed network achieve 75.3% mean Intersection over Union (mIoU) on the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) 2012 training set. A segmentation network trained with these pseudomasks reaches 72.3% and 72.1% mIoU on the PASCAL VOC 2012 validation and test sets, respectively.
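For context, the class activation mapping step that this and similar weakly supervised methods start from can be sketched as follows. This is a generic illustration (model, layer choice, and class index are placeholders), not the CLODA implementation itself:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Generic CAM computation: weight the final conv feature maps by the
# classifier weights of the target class, then normalize to [0, 1].
# This is the standard first step of CAM-based weak supervision,
# not the CLODA-specific pipeline.
model = models.resnet50(weights="IMAGENET1K_V2").eval()

def class_activation_map(image, class_idx):
    # Forward through the backbone up to the last conv block.
    feats = torch.nn.Sequential(*list(model.children())[:-2])(image)  # (1, C, H, W)
    weights = model.fc.weight[class_idx]                              # (C,)
    cam = torch.einsum("c,bchw->bhw", weights, feats)                 # weighted sum over channels
    cam = F.relu(cam)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)          # normalize to [0, 1]
    # Upsample to the input resolution so it can seed a pseudomask.
    return F.interpolate(cam.unsqueeze(1), size=image.shape[-2:], mode="bilinear")[0, 0]

# usage: cam = class_activation_map(torch.randn(1, 3, 224, 224), class_idx=12)
```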
{"title":"Attention Mechanism and Out-of-Distribution Data on Cross Language Image Matching for Weakly Supervised Semantic Segmentation","authors":"Chi-Chia Sun;Jing-Ming Guo;Chen-Hung Chung;Bo-Yu Chen","doi":"10.1109/TCDS.2024.3382914","DOIUrl":"10.1109/TCDS.2024.3382914","url":null,"abstract":"The fully supervised semantic segmentation requires detailed annotation of each pixel, which is time-consuming and laborious at the pixel-by-pixel level. To solve this problem, the direction of this article is to perform the semantic segmentation task by using image-level categorical annotation. Existing methods using image level annotation usually use class activation maps (CAMs) to find the location of the target object as the first step. By training a classifier, the presence of objects in the image can be searched effectively. However, CAMs appear that as follows: 1) objects are excessively focused on specific regions, capturing only the most prominent and critical areas and 2) it is easy to misinterpret the frequently occurring background regions, the foreground and background are confused. This article introduces cross language image matching based on out-of-distribution data and convolutional block attention module (CLODA), the concept of double branching in the cross language image matching framework, and adds a convolutional attention module to the attention branch to solve the problem of excess focus on objects in the CAMs. Importing out-of-distribution data on out of distribution branches helps classification networks improve misinterpretation of areas of focus. Optimizing regions of interest for attentional branch learning using cross pseudosupervision on two branches. Experimental results show that the pseudomasks generated by the proposed network can achieve 75.3% in mean Intersection over Union (mIoU) with the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) 2012 training set. The performance of the segmentation network trained with the pseudomasks is up to 72.3% and 72.1% in mIoU on the validation and testing set of PASCAL VOC 2012.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1604-1610"},"PeriodicalIF":5.0,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140593955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-02. DOI: 10.1109/TCDS.2024.3383952
Sonal Kumar;Arijit Sur;Rashmi Dutta Baruah
Successive proposals of self-supervised training schemes (STSs) continue to emerge, each taking a step closer to developing a universal foundation model. In this process, unsupervised downstream tasks are recognized as one way to validate the quality of visual features learned with self-supervised training. However, unsupervised dense semantic segmentation has yet to be explored as a downstream task, even though it can utilize and evaluate the quality of the semantic information introduced into patch-level feature representations during self-supervised training of vision transformers. Therefore, we propose a novel data-driven framework, DatUS, to perform unsupervised dense semantic segmentation (DSS) as a downstream task. DatUS generates semantically consistent pseudosegmentation masks for an unlabeled image dataset without using visual priors or synchronized data. The experiments show that the proposed framework achieves the highest MIoU (24.90) and average F1 score (36.3) when choosing DINOv2, and the highest pixel accuracy (62.18) when choosing DINO, as the STS on the training set of the SUIM dataset. It also outperforms state-of-the-art methods for the unsupervised DSS task with 15.02% MIoU, 21.47% pixel accuracy, and 16.06% average F1 score on the validation set of the SUIM dataset, and achieves a competitive level of accuracy on the large-scale COCO dataset.
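As a rough illustration of the kind of patch-level processing DatUS builds on, the sketch below extracts patch tokens from a self-supervised ViT (DINO weights via timm) and clusters them with k-means into a coarse segment map. It is a simplified stand-in, not the actual DatUS pipeline, and the model name and cluster count are assumptions:

```python
import torch
import timm
from sklearn.cluster import KMeans

# Simplified illustration of patch-level pseudosegmentation from a
# self-supervised ViT: embed 16x16 patches, then cluster the patch
# tokens with k-means. DatUS itself builds semantically consistent
# masks with additional steps; this only shows the core idea.
vit = timm.create_model("vit_small_patch16_224.dino", pretrained=True).eval()

def patch_pseudo_segmentation(image, num_segments=5):
    with torch.no_grad():
        tokens = vit.forward_features(image)        # (1, 1 + 196, D), CLS token first
    patch_tokens = tokens[0, 1:]                    # drop CLS -> (196, D)
    labels = KMeans(n_clusters=num_segments, n_init=10).fit_predict(patch_tokens.numpy())
    return labels.reshape(14, 14)                   # coarse 14x14 segment map

# usage: seg = patch_pseudo_segmentation(torch.randn(1, 3, 224, 224))
```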
{"title":"DatUS: Data-Driven Unsupervised Semantic Segmentation With Pretrained Self-Supervised Vision Transformer","authors":"Sonal Kumar;Arijit Sur;Rashmi Dutta Baruah","doi":"10.1109/TCDS.2024.3383952","DOIUrl":"10.1109/TCDS.2024.3383952","url":null,"abstract":"Successive proposals of several self-supervised training schemes (STSs) continue to emerge, taking one step closer to developing a universal foundation model. In this process, unsupervised downstream tasks are recognized as one of the evaluation methods to validate the quality of visual features learned with self-supervised training. However, unsupervised dense semantic segmentation has yet to be explored as a downstream task, which can utilize and evaluate the quality of semantic information introduced in patch-level feature representations during self-supervised training of vision transformers. Therefore, we propose a novel data-driven framework, DatUS, to perform unsupervised dense semantic segmentation (DSS) as a downstream task. DatUS generates semantically consistent pseudosegmentation masks for an unlabeled image dataset without using visual prior or synchronized data. The experiment shows that the proposed framework achieves the highest MIoU (24.90) and average F1 score (36.3) by choosing DINOv2 and the highest pixel accuracy (62.18) by choosing DINO as the STS on the training set of SUIM dataset. It also outperforms state-of-the-art methods for the unsupervised DSS task with 15.02% MIoU, 21.47% pixel accuracy, and 16.06% average F1 score on the validation set of SUIM dataset. It achieves a competitive level of accuracy for a large-scale COCO dataset.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 5","pages":"1775-1788"},"PeriodicalIF":5.0,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140593839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-02. DOI: 10.1109/TCDS.2024.3384269
Yuqi Liu;Qichao Zhang;Yinfeng Gao;Dongbin Zhao
Learning an efficient and safe driving strategy in a traffic-heavy intersection scenario and generalizing it to different intersections remains a challenging task for autonomous driving. This is because road structure differs across intersections, and autonomous vehicles need to generalize the strategies they have learned in the training environments. This requires the autonomous vehicle to capture not only the interactions between agents but also the relationships between agents and the map effectively. To address this challenge, we present a technique that integrates the information of high-definition (HD) maps and traffic participants into vector representations, called lane graph vectorization (LGV). To construct a driving policy for intersection navigation, we incorporate LGV into the twin-delayed deep deterministic policy gradient (TD3) algorithm with prioritized experience replay (PER). To train and validate the proposed algorithm, we construct a gym environment for intersection navigation within the high-fidelity CARLA simulator, integrating dense interactive traffic flow and various generalization test intersection scenarios. Experimental results demonstrate the effectiveness of LGV for intersection navigation tasks and show that it outperforms the state of the art in our proposed scenarios.
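For reference, a minimal proportional prioritized experience replay buffer of the kind typically paired with TD3 is sketched below; the alpha/beta values and structure are illustrative and may differ from the authors' configuration:

```python
import numpy as np

# Minimal proportional prioritized experience replay (PER) buffer of the
# kind commonly combined with TD3; hyperparameters are illustrative.
class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities, self.pos = [], np.zeros(capacity), 0

    def add(self, transition):
        max_p = self.priorities.max() if self.data else 1.0   # new samples get max priority
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[:len(self.data)] ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        weights = (len(self.data) * p[idx]) ** (-beta)        # importance-sampling weights
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # priorities follow the magnitude of the TD errors
        self.priorities[idx] = np.abs(td_errors) + eps
```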
{"title":"Deep-Reinforcement-Learning-Based Driving Policy at Intersections Utilizing Lane Graph Networks","authors":"Yuqi Liu;Qichao Zhang;Yinfeng Gao;Dongbin Zhao","doi":"10.1109/TCDS.2024.3384269","DOIUrl":"10.1109/TCDS.2024.3384269","url":null,"abstract":"Learning an efficient and safe driving strategy in a traffic-heavy intersection scenario and generalizing it to different intersections remains a challenging task for autonomous driving. This is because there are differences in the structure of roads at different intersections, and autonomous vehicles need to generalize the strategies they have learned in the training environments. This requires the autonomous vehicle to capture not only the interactions between agents but also the relationships between agents and the map effectively. To address this challenge, we present a technique that integrates the information of high-definition (HD) maps and traffic participants into vector representations, called lane graph vectorization (LGV). In order to construct a driving policy for intersection navigation, we incorporate LGV into the twin-delayed deep deterministic policy gradient (TD3) algorithm with prioritized experience replay (PER). To train and validate the proposed algorithm, we construct a gym environment for intersection navigation within the high-fidelity CARLA simulator, integrating dense interactive traffic flow and various generalization test intersection scenarios. Experimental results demonstrate the effectiveness of LGV for intersection navigation tasks and outperform the state-of-the-art in our proposed scenarios.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 5","pages":"1759-1774"},"PeriodicalIF":5.0,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140594174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-01. DOI: 10.1109/TCDS.2024.3383428
Yangfan Hu;Qian Zheng;Gang Pan
To address the energy bottleneck in deep neural networks (DNNs), the research community has developed binary neural networks (BNNs) and spiking neural networks (SNNs) from different perspectives. To combine the advantages of both BNNs and SNNs for better energy efficiency, this article proposes BitSNNs, which leverage binary weights, single-step inference, and activation sparsity. During the development of BitSNNs, we observed performance degradation in deep ResNets due to the gradient approximation error. To mitigate this issue, we delve into the learning process and propose the utilization of a hardtanh function before activation binarization. Additionally, this article investigates the critical role of activation sparsity in BitSNNs for energy efficiency, a topic often overlooked in the existing literature. Our study reveals strategies to strike a balance between accuracy and energy consumption during the training/testing stage, potentially benefiting applications in edge computing. Notably, our proposed method achieves state-of-the-art performance while significantly reducing energy consumption.
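To illustrate the role of a hardtanh placed before activation binarization, the sketch below implements sign binarization with a straight-through estimator whose backward pass follows the hardtanh derivative. It is a generic formulation, not necessarily the exact BitSNN one:

```python
import torch

# Sign-based activation binarization with a straight-through estimator:
# gradients are passed through only where the pre-activation lies in
# [-1, 1], mirroring the hardtanh placed before binarization described
# above. The exact BitSNN formulation may differ.
class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # derivative of hardtanh: 1 inside [-1, 1], 0 outside
        return grad_output * (x.abs() <= 1).float()

def binary_activation(x):
    # clamp first (hardtanh), then binarize with the STE
    return BinarizeSTE.apply(torch.nn.functional.hardtanh(x))
```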
{"title":"BitSNNs: Revisiting Energy-Efficient Spiking Neural Networks","authors":"Yangfan Hu;Qian Zheng;Gang Pan","doi":"10.1109/TCDS.2024.3383428","DOIUrl":"10.1109/TCDS.2024.3383428","url":null,"abstract":"To address the energy bottleneck in deep neural networks (DNNs), the research community has developed binary neural networks (BNNs) and spiking neural networks (SNNs) from different perspectives. To combine the advantages of both BNNs and SNNs for better energy efficiency, this article proposes BitSNNs, which leverage binary weights, single-step inference, and activation sparsity. During the development of BitSNNs, we observed performance degradation in deep ResNets due to the gradient approximation error. To mitigate this issue, we delve into the learning process and propose the utilization of a hardtanh function before activation binarization. Additionally, this article investigates the critical role of activation sparsity in BitSNNs for energy efficiency, a topic often overlooked in the existing literature. Our study reveals strategies to strike a balance between accuracy and energy consumption during the training/testing stage, potentially benefiting applications in edge computing. Notably, our proposed method achieves state-of-the-art performance while significantly reducing energy consumption.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 5","pages":"1736-1747"},"PeriodicalIF":5.0,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140593968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-01. DOI: 10.1109/TCDS.2024.3383158
Boyu Li;Haoran Li;Yuanheng Zhu;Dongbin Zhao
Agent-agnostic reinforcement learning aims to learn a universal control policy that can simultaneously control a set of robots with different morphologies. Recent studies have suggested that the transformer model can address variations in state and action spaces caused by different morphologies, and that morphology information is necessary to improve policy performance. However, existing methods have limitations in exploiting morphological information, and the soundness of their observation integration cannot be guaranteed. We propose the morphological adaptive transformer (MAT), a transformer-based universal control algorithm that can adapt to various morphologies without any modifications. MAT includes two essential components: functional position encoding (FPE) and a morphological attention mechanism (MAM). The FPE provides robust and consistent positional prior information for limb observations to avoid limb confusion and implicitly obtains functional descriptions of limbs. The MAM enhances the attribute prior information of limbs, improves the correlation between observations, and makes the policy pay attention to more limbs. We combine observations with prior information to help the policy adapt to the morphology of robots, thereby improving its performance on unknown morphologies. Experiments on agent-agnostic tasks in the Gym MuJoCo environment demonstrate that our algorithm assigns more reasonable morphological prior information to each limb, and that its performance is comparable to the prior state-of-the-art algorithm with better generalization.
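As a schematic of how a per-limb morphological prior can reshape transformer attention, the sketch below adds a prior bias to single-head attention over limb tokens; the actual FPE and MAM designs in MAT are more involved, and all names here are illustrative:

```python
import torch
import torch.nn.functional as F

# Schematic single-head attention over limb tokens with an additive prior
# bias, illustrating how morphological prior information can reshape the
# attention pattern. Not the actual FPE/MAM implementation.
def limb_attention(limb_tokens, prior_bias):
    # limb_tokens: (num_limbs, d), prior_bias: (num_limbs, num_limbs)
    d = limb_tokens.shape[-1]
    q = k = v = limb_tokens
    scores = q @ k.T / d**0.5 + prior_bias      # prior raises attention toward selected limbs
    return F.softmax(scores, dim=-1) @ v

# usage with a hypothetical 6-limb agent and a uniform (zero) prior:
out = limb_attention(torch.randn(6, 64), torch.zeros(6, 6))
```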
{"title":"MAT: Morphological Adaptive Transformer for Universal Morphology Policy Learning","authors":"Boyu Li;Haoran Li;Yuanheng Zhu;Dongbin Zhao","doi":"10.1109/TCDS.2024.3383158","DOIUrl":"10.1109/TCDS.2024.3383158","url":null,"abstract":"Agent-agnostic reinforcement learning aims to learn a universal control policy that can simultaneously control a set of robots with different morphologies. Recent studies have suggested that using the transformer model can address variations in state and action spaces caused by different morphologies, and morphology information is necessary to improve policy performance. However, existing methods have limitations in exploiting morphological information, where the rationality of observation integration cannot be guaranteed. We propose morphological adaptive transformer (MAT), a transformer-based universal control algorithm that can adapt to various morphologies without any modifications. MAT includes two essential components: functional position encoding (FPE) and morphological attention mechanism (MAM). The FPE provides robust and consistent positional prior information for limb observation to avoid limb confusion and implicitly obtain functional descriptions of limbs. The MAM enhances the attribute prior information of limbs, improves the correlation between observations, and makes the policy pay attention to more limbs. We combine observation with prior information to help policy adapt to the morphology of robots, thereby optimizing its performance with unknown morphologies. Experiments on agent-agnostic tasks in Gym MuJoCo environment demonstrate that our algorithm can assign more reasonable morphological prior information to each limb, and the performance of our algorithm is comparable to the prior state-of-the-art algorithm with better generalization.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1611-1621"},"PeriodicalIF":5.0,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140593972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-29. DOI: 10.1109/TCDS.2024.3383296
Yuchen Yan;Haotian Su;Yunyi Jia
The development of collaborative robots has enabled a safer and more efficient human–robot collaboration (HRC) manufacturing environment. Tremendous research efforts have been devoted to improving user safety and robot working efficiency since the debut of collaborative robots. However, human comfort in HRC scenarios has not been thoroughly discussed, even though it is critically important to user acceptance of collaborative robots. Previous studies mostly utilize subjective rating methods to evaluate how human comfort varies as one robot factor changes, yet such methods are limited for evaluating comfort online. Other studies leverage wearable sensors to collect physiological signals and detect human emotions, but few of them build a human comfort model for HRC scenarios. In this study, we designed an online comfort model for HRC using wearable sensing data. The model uses physiological signals acquired from wearable sensing and calculates in-situ human comfort levels based on our developed algorithms. We conducted experiments in realistic HRC tasks, and the prediction results demonstrated the effectiveness of the proposed approach in identifying human comfort levels in HRC.
{"title":"Measuring Human Comfort in Human–Robot Collaboration via Wearable Sensing","authors":"Yuchen Yan;Haotian Su;Yunyi Jia","doi":"10.1109/TCDS.2024.3383296","DOIUrl":"10.1109/TCDS.2024.3383296","url":null,"abstract":"The development of collaborative robots has enabled a safer and more efficient human–robot collaboration (HRC) manufacturing environment. Tremendous research efforts have been conducted to improve user safety and robot working efficiency after the debut of collaborative robots. However, human comfort in HRC scenarios has not been thoroughly discussed but is critically important to the user acceptance of collaborative robots. Previous studies mostly utilize the subjective rating method to evaluate how human comfort varies as one robot factor changes, yet such method is limited in evaluating comfort online. Some other studies leverage wearable sensors to collect physiological signals to detect human emotions, but few of them implement this for a human comfort model in HRC scenarios. In this study, we designed an online comfort model for HRC using wearable sensing data. The model uses physiological signals acquired from wearable sensing and calculates the in-situ human comfort levels based on our developed algorithms. We have conducted experiments in realistic HRC tasks, and the prediction results demonstrated the effectiveness of the proposed approach in identifying human comfort levels in HRC.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 5","pages":"1748-1758"},"PeriodicalIF":5.0,"publicationDate":"2024-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140594186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-26. DOI: 10.1109/TCDS.2024.3382109
Jiahui Pan;Yangzuyi Yu;Jianhui Wu;Xinjie Zhou;Yanbin He;Yuanqing Li
Disorders of consciousness (DOC) are often related to serious changes in sleep structure. This article presents a sleep evaluation algorithm that scores the sleep structure of DOC patients to assist in assessing their consciousness level. The sleep evaluation algorithm is divided into two parts: 1) automatic sleep staging model: convolutional neural networks (CNNs) are employed to extract signal features from the electroencephalogram (EEG) and electrooculogram (EOG), and a bidirectional long short-term memory (Bi-LSTM) network with an attention mechanism is applied to learn sequential information; and 2) consciousness assessment: the automated sleep staging results are used to extract consciousness-related sleep features, which a support vector machine (SVM) classifier then uses to assess consciousness. In this study, the CNN-BiLSTM model with an attention sleep network (CBASleepNet) was evaluated on the sleep-EDF and MASS datasets. The experimental results demonstrated the effectiveness of the proposed model, which outperformed similar models. Moreover, CBASleepNet was applied to sleep staging in DOC patients through transfer learning and fine-tuning. Consciousness assessments were conducted on seven minimally conscious state (MCS) patients and four vegetative state (VS)/unresponsive wakefulness syndrome (UWS) patients, achieving an overall accuracy of 81.8%. The sleep evaluation algorithm can thus be used to evaluate patients' consciousness level effectively.
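A schematic of the described CNN + Bi-LSTM + attention staging pipeline is sketched below; layer sizes, sequence length, and the many-to-one classification head are placeholders rather than the exact CBASleepNet architecture:

```python
import torch
import torch.nn as nn

# Schematic CNN + Bi-LSTM + attention sleep stager in the spirit of the
# pipeline described above (layer sizes are placeholders, not the paper's).
class SleepStager(nn.Module):
    def __init__(self, in_channels=2, hidden=128, num_stages=5):
        super().__init__()
        # per-epoch CNN feature extractor over raw EEG/EOG
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=50, stride=6), nn.ReLU(),
            nn.MaxPool1d(8), nn.Conv1d(64, 128, kernel_size=8), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.bilstm = nn.LSTM(128, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)           # additive attention over the epoch sequence
        self.head = nn.Linear(2 * hidden, num_stages)

    def forward(self, x):                              # x: (batch, seq_len, channels, samples)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)   # per-epoch features
        seq, _ = self.bilstm(feats)                         # temporal context across epochs
        w = torch.softmax(self.attn(seq), dim=1)            # attention weights per epoch
        context = (w * seq).sum(dim=1)                      # weighted summary of the sequence
        return self.head(context)                           # stage logits for the target epoch

# usage: logits = SleepStager()(torch.randn(4, 20, 2, 3000))   # 20 epochs of 30-s signals
```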
{"title":"Deep Neural Networks for Automatic Sleep Stage Classification and Consciousness Assessment in Patients With Disorder of Consciousness","authors":"Jiahui Pan;Yangzuyi Yu;Jianhui Wu;Xinjie Zhou;Yanbin He;Yuanqing Li","doi":"10.1109/TCDS.2024.3382109","DOIUrl":"10.1109/TCDS.2024.3382109","url":null,"abstract":"Disorders of consciousness (DOC) are often related to serious changes in sleep structure. This article presents a sleep evaluation algorithm that scores the sleep structure of DOC patients to assist in assessing their consciousness level. The sleep evaluation algorithm is divided into two parts: 1) automatic sleep staging model: convolutional neural networks (CNNs) are employed for the extraction of signal features from electroencephalogram (EEG) and electrooculogram (EOG), and bidirectional long short-term memory (Bi-LSTM) with attention mechanism is applied to learn sequential information; and 2) consciousness assessment: automated sleep staging results are used to extract consciousness-related sleep features that are utilized by a support vector machine (SVM) classifier to assess consciousness. In this study, the CNN-BiLSTM model with an attention sleep network (CBASleepNet) was conducted using the sleep-EDF and MASS datasets. The experimental results demonstrated the effectiveness of the proposed model, which outperformed similar models. Moreover, CBASleepNet was applied to sleep staging in DOC patients through transfer learning and fine-tuning. Consciousness assessments were conducted on seven minimally conscious state (MCS) patients and four vegetative state (VS)/unresponsive wakefulness syndrome (UWS) patients, achieving an overall accuracy of 81.8%. The sleep evaluation algorithm can be used to evaluate the consciousness level of patients effectively.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1589-1603"},"PeriodicalIF":5.0,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140315148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-22. DOI: 10.1109/TCDS.2024.3380907
Fuqiang Gu;Jiarui Dou;Mingyan Li;Xianlei Long;Songtao Guo;Chao Chen;Kai Liu;Xianlong Jiao;Ruiyuan Li
Data augmentation is an effective way to overcome the overfitting problem of deep learning models. However, most existing studies on data augmentation work on framelike data (e.g., images), and few tackle event-based data. Event-based data differ from framelike data, rendering the augmentation techniques designed for framelike data unsuitable. This work deals with data augmentation for event-based object classification and semantic segmentation, which is important for self-driving and robot manipulation. Specifically, we introduce EventAugment, a new method to augment asynchronous event-based data by automatically learning augmentation policies. We first identify 13 types of operations for augmenting event-based data. Next, we formulate the problem of finding optimal augmentation policies as a hyperparameter optimization problem. To tackle this problem, we propose a random-search-based framework. Finally, we evaluate the proposed method on six public datasets: N-Caltech101, N-Cars, ST-MNIST, N-MNIST, DVSGesture, and DDD17. Experimental results demonstrate that EventAugment yields substantial performance improvements for both deep neural network-based and spiking neural network-based models, with gains of up to approximately 4%. Notably, EventAugment outperforms state-of-the-art methods in terms of overall performance.
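The policy search can be pictured as the random-search loop below, which treats a policy as a small list of (operation, probability, magnitude) triples; the operation names and the train_and_evaluate callback are placeholders, not the paper's 13 operations or API:

```python
import random

# Generic random-search loop over event-data augmentation policies, in the
# spirit of treating policy search as hyperparameter optimization. The
# operation set and the evaluation function are illustrative placeholders.
OPS = ["drop_events", "shift_time", "flip_x", "add_noise"]   # placeholder op set

def sample_policy(num_subpolicies=2):
    # each sub-policy is (operation, probability, magnitude)
    return [(random.choice(OPS), random.uniform(0, 1), random.uniform(0, 1))
            for _ in range(num_subpolicies)]

def random_search(train_and_evaluate, num_trials=50):
    best_policy, best_score = None, float("-inf")
    for _ in range(num_trials):
        policy = sample_policy()
        score = train_and_evaluate(policy)        # e.g., validation accuracy with this policy
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score
```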
{"title":"EventAugment: Learning Augmentation Policies From Asynchronous Event-Based Data","authors":"Fuqiang Gu;Jiarui Dou;Mingyan Li;Xianlei Long;Songtao Guo;Chao Chen;Kai Liu;Xianlong Jiao;Ruiyuan Li","doi":"10.1109/TCDS.2024.3380907","DOIUrl":"10.1109/TCDS.2024.3380907","url":null,"abstract":"Data augmentation is an effective way to overcome the overfitting problem of deep learning models. However, most existing studies on data augmentation work on framelike data (e.g., images), and few tackles with event-based data. Event-based data are different from framelike data, rendering the augmentation techniques designed for framelike data unsuitable for event-based data. This work deals with data augmentation for event-based object classification and semantic segmentation, which is important for self-driving and robot manipulation. Specifically, we introduce EventAugment, a new method to augment asynchronous event-based data by automatically learning augmentation policies. We first identify 13 types of operations for augmenting event-based data. Next, we formulate the problem of finding optimal augmentation policies as a hyperparameter optimization problem. To tackle this problem, we propose a random search-based framework. Finally, we evaluate the proposed method on six public datasets including N-Caltech101, N-Cars, ST-MNIST, N-MNIST, DVSGesture, and DDD17. Experimental results demonstrate that EventAugment exhibits substantial performance improvements for both deep neural network-based and spiking neural network-based models, with gains of up to approximately 4%. Notably, EventAugment outperform state-of-the-art methods in terms of overall performance.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1521-1532"},"PeriodicalIF":5.0,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140199013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-18. DOI: 10.1109/TCDS.2024.3376072
Reza Javanmard Alitappeh;Akhil John;Bernardo Dias;A. John van Opstal;Alexandre Bernardino
This article explores the application of model-based optimal control principles in understanding stereotyped human oculomotor behaviors. Using a realistic model of the human eye with a six-muscle cable-driven actuation system, we tackle the novel challenges of addressing a system with six degrees of freedom. We apply nonlinear optimal control techniques to optimize accuracy, energy, and duration of eye-movement trajectories. Employing a recurrent neural network to emulate system dynamics, we focus on generating rapid, unconstrained saccadic eye-movements. Remarkably, our model replicates realistic 3-D rotational kinematics and dynamics observed in human saccades, with the six cables organizing themselves into appropriate antagonistic muscle pairs, resembling the primate oculomotor system.
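A schematic of the kind of trajectory cost such an approach optimizes is given below: endpoint accuracy, control energy, and duration terms accumulated through a learned dynamics surrogate. The dynamics_rnn callable, the state layout, and the weights are placeholders, not the authors' trained model or settings:

```python
import torch

# Schematic saccade-trajectory cost combining endpoint accuracy, control
# energy, and duration, rolled out through a learned dynamics surrogate.
# `dynamics_rnn` stands in for a trained recurrent model of the eye plant.
def trajectory_cost(controls, x0, target, dynamics_rnn,
                    w_acc=1.0, w_energy=1e-3, w_dur=1e-2):
    x, hidden, cost = x0, None, 0.0
    for u in controls:                            # controls: (T, 6) cable commands
        x, hidden = dynamics_rnn(x, u, hidden)    # predicted next eye state
        cost = cost + w_energy * (u ** 2).sum()   # penalize muscle effort
        cost = cost + w_dur                       # each extra step adds duration cost
    cost = cost + w_acc * ((x[:3] - target) ** 2).sum()   # endpoint gaze error
    return cost

# usage (with a trained surrogate): J = trajectory_cost(u_seq, x0, goal, dynamics_rnn)
```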
{"title":"Emergence of Human Oculomotor Behavior in a Cable-Driven Biomimetic Robotic Eye Using Optimal Control","authors":"Reza Javanmard Alitappeh;Akhil John;Bernardo Dias;A. John van Opstal;Alexandre Bernardino","doi":"10.1109/TCDS.2024.3376072","DOIUrl":"10.1109/TCDS.2024.3376072","url":null,"abstract":"This article explores the application of model-based optimal control principles in understanding stereotyped human oculomotor behaviors. Using a realistic model of the human eye with a six-muscle cable-driven actuation system, we tackle the novel challenges of addressing a system with six degrees of freedom. We apply nonlinear optimal control techniques to optimize accuracy, energy, and duration of eye-movement trajectories. Employing a recurrent neural network to emulate system dynamics, we focus on generating rapid, unconstrained saccadic eye-movements. Remarkably, our model replicates realistic 3-D rotational kinematics and dynamics observed in human saccades, with the six cables organizing themselves into appropriate antagonistic muscle pairs, resembling the primate oculomotor system.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1546-1560"},"PeriodicalIF":5.0,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474482","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140166438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-18. DOI: 10.1109/TCDS.2024.3377445
Timothy R. McIntosh;Teo Susnjak;Tong Liu;Paul Watters;Malka N. Halgamuge
This study is an empirical investigation into the semantic vulnerabilities of four popular pretrained commercial large language models (LLMs) to ideological manipulation. Using tactics reminiscent of human semantic conditioning in psychology, we induced and assessed ideological misalignments, and their retention, in four commercial pretrained LLMs in response to 30 controversial questions that spanned a broad ideological and social spectrum, encompassing both extreme left- and right-wing viewpoints. Such semantic vulnerabilities arise from fundamental limitations in LLMs' capability to comprehend detailed linguistic variations, making them susceptible to ideological manipulation through targeted semantic exploits. We observed the effect of reinforcement learning from human feedback (RLHF) on the LLMs' initial answers, but highlight the limitations of RLHF in two respects: 1) its inability to fully mitigate the impact of ideological conditioning prompts, leading to only partial alleviation of LLM semantic vulnerabilities; and 2) its inadequacy in representing a diverse set of “human values,” often reflecting the predefined values of the particular groups controlling the LLMs. Our findings provide empirical evidence of semantic vulnerabilities inherent in current LLMs, challenge both the robustness and the adequacy of RLHF as a mainstream method for aligning LLMs with human values, and underscore the need for a multidisciplinary approach to developing ethical and resilient artificial intelligence (AI).
{"title":"The Inadequacy of Reinforcement Learning From Human Feedback—Radicalizing Large Language Models via Semantic Vulnerabilities","authors":"Timothy R. McIntosh;Teo Susnjak;Tong Liu;Paul Watters;Malka N. Halgamuge","doi":"10.1109/TCDS.2024.3377445","DOIUrl":"10.1109/TCDS.2024.3377445","url":null,"abstract":"This study is an empirical investigation into the semantic vulnerabilities of four popular pretrained commercial large language models (LLMs) to ideological manipulation. Using tactics reminiscent of human semantic conditioning in psychology, we have induced and assessed ideological misalignments and their retention in four commercial pretrained LLMs, in response to 30 controversial questions that spanned a broad ideological and social spectrum, encompassing both extreme left- and right-wing viewpoints. Such semantic vulnerabilities arise due to fundamental limitations in LLMs’ capability to comprehend detailed linguistic variations, making them susceptible to ideological manipulation through targeted semantic exploits. We observed reinforcement learning from human feedback (RLHF) in effect to LLM initial answers, but highlighted the limitations of RLHF in two aspects: 1) its inability to fully mitigate the impact of ideological conditioning prompts, leading to partial alleviation of LLM semantic vulnerabilities; and 2) its inadequacy in representing a diverse set of “human values,” often reflecting the predefined values of certain groups controlling the LLMs. Our findings have provided empirical evidence of semantic vulnerabilities inherent in current LLMs, challenged both the robustness and the adequacy of RLHF as a mainstream method for aligning LLMs with human values, and underscored the need for a multidisciplinary approach in developing ethical and resilient artificial intelligence (AI).","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1561-1574"},"PeriodicalIF":5.0,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140166472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}