IEEE Transactions on Cognitive and Developmental Systems: Latest Articles, Page 8

IEEE Transactions on Cognitive and Developmental Systems Information for Authors

IF 5 | Zone 3, Computer Science | Q1 Computer Science

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-04-04 DOI: 10.1109/TCDS.2024.3373155

Citations: 0

Attention Mechanism and Out-of-Distribution Data on Cross Language Image Matching for Weakly Supervised Semantic Segmentation

IF 5 | Zone 3, Computer Science | Q1 Computer Science, Artificial Intelligence

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-04-02 DOI: 10.1109/TCDS.2024.3382914

Chi-Chia Sun;Jing-Ming Guo;Chen-Hung Chung;Bo-Yu Chen

Fully supervised semantic segmentation requires detailed pixel-by-pixel annotation, which is time-consuming and laborious. To address this problem, this article performs semantic segmentation using only image-level categorical annotation. Existing methods that use image-level annotation usually begin by locating the target object with class activation maps (CAMs): training a classifier makes it possible to identify where objects are present in the image. However, CAMs have two shortcomings: 1) they focus excessively on specific regions, capturing only the most prominent and critical areas of an object; and 2) they easily misinterpret frequently occurring background regions, confusing foreground and background. This article introduces cross language image matching based on out-of-distribution data and convolutional block attention module (CLODA), which brings a dual-branch design into the cross language image matching framework. A convolutional attention module is added to the attention branch to counter the excessive focus of the CAMs, and out-of-distribution data is fed to the out-of-distribution branch to help the classification network correct its misinterpretation of the regions it attends to. Cross pseudosupervision between the two branches then optimizes the regions of interest learned by the attention branch. Experimental results show that the pseudomasks generated by the proposed network achieve 75.3% mean Intersection over Union (mIoU) on the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) 2012 training set. A segmentation network trained with these pseudomasks reaches 72.3% and 72.1% mIoU on the PASCAL VOC 2012 validation and test sets, respectively.
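The mIoU figures quoted in this abstract can be read against a small worked example. The sketch below is not the authors' evaluation code; it assumes predictions and ground truth are given as flat lists of integer class labels (one per pixel) and averages IoU over the classes that appear in either list. Benchmark tooling for PASCAL VOC typically accumulates a confusion matrix over the whole dataset instead, so treat this only as an illustration of the metric.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two flat lists of class labels."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both lists
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 6-pixel example with three classes (values are illustrative only).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.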

Vol. 16, no. 4, pp. 1604-1610.

Citations: 0

DatUS: Data-Driven Unsupervised Semantic Segmentation With Pretrained Self-Supervised Vision Transformer

IF 5 | Zone 3, Computer Science | Q1 Computer Science, Artificial Intelligence

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-04-02 DOI: 10.1109/TCDS.2024.3383952

Sonal Kumar;Arijit Sur;Rashmi Dutta Baruah

Self-supervised training schemes (STSs) continue to be proposed, each taking a step closer to a universal foundation model. In this process, unsupervised downstream tasks are recognized as one way to validate the quality of visual features learned with self-supervised training. However, unsupervised dense semantic segmentation has yet to be explored as a downstream task, even though it can utilize and evaluate the quality of the semantic information introduced into patch-level feature representations during self-supervised training of vision transformers. Therefore, we propose a novel data-driven framework, DatUS, to perform unsupervised dense semantic segmentation (DSS) as a downstream task. DatUS generates semantically consistent pseudosegmentation masks for an unlabeled image dataset without using visual priors or synchronized data. Experiments show that, on the training set of the SUIM dataset, the proposed framework achieves the highest MIoU (24.90) and average F1 score (36.3) with DINOv2 as the STS, and the highest pixel accuracy (62.18) with DINO as the STS. It also outperforms state-of-the-art methods for the unsupervised DSS task by 15.02% MIoU, 21.47% pixel accuracy, and 16.06% average F1 score on the validation set of the SUIM dataset. It achieves a competitive level of accuracy on the large-scale COCO dataset.

Vol. 16, no. 5, pp. 1775-1788.

Citations: 0

Deep-Reinforcement-Learning-Based Driving Policy at Intersections Utilizing Lane Graph Networks

IF 5 | Zone 3, Computer Science | Q1 Computer Science, Artificial Intelligence

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-04-02 DOI: 10.1109/TCDS.2024.3384269

Yuqi Liu;Qichao Zhang;Yinfeng Gao;Dongbin Zhao

Learning an efficient and safe driving strategy in a traffic-heavy intersection scenario and generalizing it to different intersections remains a challenging task for autonomous driving. Road structure differs from one intersection to another, so autonomous vehicles need to generalize the strategies they have learned in the training environments. This requires the autonomous vehicle to effectively capture not only the interactions between agents but also the relationships between agents and the map. To address this challenge, we present a technique that integrates the information of high-definition (HD) maps and traffic participants into vector representations, called lane graph vectorization (LGV). To construct a driving policy for intersection navigation, we incorporate LGV into the twin-delayed deep deterministic policy gradient (TD3) algorithm with prioritized experience replay (PER). To train and validate the proposed algorithm, we construct a gym environment for intersection navigation within the high-fidelity CARLA simulator, integrating dense interactive traffic flow and various generalization test intersection scenarios. Experimental results demonstrate the effectiveness of LGV for intersection navigation tasks and show that it outperforms the state of the art in our proposed scenarios.
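The abstract names prioritized experience replay (PER) but does not say which variant or hyperparameters are used. As a rough illustration only, the sketch below shows the commonly used proportional form, in which transition i is sampled with probability p_i^alpha / sum_k p_k^alpha and reweighted by (N * P(i))^(-beta). The function names and the values alpha=0.6 and beta=0.4 are assumptions made for this example, and the weights are normalized by the batch maximum, which is one of several conventions.

```python
import random


def per_probabilities(priorities, alpha=0.6):
    """Sampling probabilities proportional to priority**alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=random):
    """Draw indices by priority and return importance-sampling weights."""
    probs = per_probabilities(priorities, alpha)
    n = len(priorities)
    idx = rng.choices(range(n), weights=probs, k=batch_size)
    # Importance weights correct the bias from non-uniform sampling.
    weights = [(n * probs[i]) ** (-beta) for i in idx]
    max_w = max(weights)
    return idx, [w / max_w for w in weights]  # normalized by the batch maximum


# Illustrative priorities, e.g. absolute TD errors of three stored transitions.
indices, is_weights = per_sample([1.0, 2.0, 4.0], batch_size=5,
                                 rng=random.Random(0))
print(indices, is_weights)
```

With alpha=0 the sampling becomes uniform, and with beta=1 the weights fully compensate for the non-uniform sampling.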

Vol. 16, no. 5, pp. 1759-1774.

Citations: 0

BitSNNs: Revisiting Energy-Efficient Spiking Neural Networks

IF 5 | Zone 3, Computer Science | Q1 Computer Science, Artificial Intelligence

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-04-01 DOI: 10.1109/TCDS.2024.3383428

Yangfan Hu;Qian Zheng;Gang Pan

To address the energy bottleneck in deep neural networks (DNNs), the research community has developed binary neural networks (BNNs) and spiking neural networks (SNNs) from different perspectives. To combine the advantages of both BNNs and SNNs for better energy efficiency, this article proposes BitSNNs, which leverage binary weights, single-step inference, and activation sparsity. During the development of BitSNNs, we observed performance degradation in deep ResNets due to the gradient approximation error. To mitigate this issue, we delve into the learning process and propose the utilization of a hardtanh function before activation binarization. Additionally, this article investigates the critical role of activation sparsity in BitSNNs for energy efficiency, a topic often overlooked in the existing literature. Our study reveals strategies to strike a balance between accuracy and energy consumption during the training/testing stage, potentially benefiting applications in edge computing. Notably, our proposed method achieves state-of-the-art performance while significantly reducing energy consumption.
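To make the idea of applying a hardtanh function before activation binarization more concrete, here is a minimal scalar sketch. It is not the BitSNN formulation, which the abstract does not spell out. The firing threshold of 0, the {0, 1} spike encoding, the straight-through treatment of the step function in the backward pass, and all function names are assumptions made for illustration.

```python
def hardtanh(x, lo=-1.0, hi=1.0):
    """Clip x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def binarize_activation(x, threshold=0.0):
    """Forward pass: clip with hardtanh, then emit a {0, 1} spike."""
    return 1.0 if hardtanh(x) > threshold else 0.0


def surrogate_grad(x, lo=-1.0, hi=1.0):
    """Backward pass under a straight-through assumption.

    The step function is treated as the identity, so the gradient is that
    of hardtanh: 1 inside [lo, hi] and 0 outside.
    """
    return 1.0 if lo <= x <= hi else 0.0


def binarize_weight(w):
    """Binary weight in {-1, +1}."""
    return 1.0 if w >= 0 else -1.0


def activation_sparsity(spikes):
    """Fraction of silent (zero) activations."""
    return sum(1 for s in spikes if s == 0.0) / len(spikes)


pre_activations = [0.3, -2.0, 2.5, -0.1]
spikes = [binarize_activation(x) for x in pre_activations]
print(spikes, activation_sparsity(spikes))
```

In this toy setting a higher sparsity value means fewer spikes, which is the quantity the abstract links to energy consumption.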

Citations: 0

MAT: Morphological Adaptive Transformer for Universal Morphology Policy Learning

IF 5 | Zone 3, Computer Science | Q1 Computer Science, Artificial Intelligence

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-04-01 DOI: 10.1109/TCDS.2024.3383158

Boyu Li;Haoran Li;Yuanheng Zhu;Dongbin Zhao

Agent-agnostic reinforcement learning aims to learn a universal control policy that can simultaneously control a set of robots with different morphologies. Recent studies have suggested that the transformer model can address the variations in state and action spaces caused by different morphologies, and that morphology information is necessary to improve policy performance. However, existing methods are limited in how they exploit morphological information, and the rationality of observation integration cannot be guaranteed. We propose the morphological adaptive transformer (MAT), a transformer-based universal control algorithm that can adapt to various morphologies without any modifications. MAT includes two essential components: functional position encoding (FPE) and a morphological attention mechanism (MAM). The FPE provides robust and consistent positional prior information for limb observations to avoid limb confusion and implicitly obtain functional descriptions of limbs. The MAM enhances the attribute prior information of limbs, improves the correlation between observations, and makes the policy attend to more limbs. We combine observations with prior information to help the policy adapt to the morphology of robots, thereby optimizing its performance on unknown morphologies. Experiments on agent-agnostic tasks in the Gym MuJoCo environment demonstrate that our algorithm can assign more reasonable morphological prior information to each limb, and that its performance is comparable to the prior state-of-the-art algorithm with better generalization.

Vol. 16, no. 4, pp. 1611-1621.

Citations: 0

Control With Style: Style Embedding-Based Variational Autoencoder for Controlled Stylized Caption Generation Framework

IF 5 | Zone 3, Computer Science | Q1 Computer Science, Artificial Intelligence

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-30 DOI: 10.1109/TCDS.2024.3405573

Dhruv Sharma;Chhavi Dhiman;Dinesh Kumar

Automatic image captioning is a computationally intensive and structurally complicated task that describes the contents of an image in the form of a natural language sentence. Recent methods have focused mainly on describing the factual content of images, ignoring the different emotions and styles (romantic, humorous, angry, etc.) associated with an image. To overcome this, a few works have incorporated style-based caption generation that captures the variability in the generated descriptions. This article presents a style embedding-based variational autoencoder for controlled stylized caption generation framework (RFCG+SE-VAE-CSCG), which generates controlled text-based stylized descriptions of images. It works in two phases: 1) refined factual caption generation (RFCG); and 2) SE-VAE-CSCG. The former defines an encoder–decoder model for generating refined factual captions, whereas the latter presents an SE-VAE for controlled stylized caption generation. The overall framework generates style-based descriptions of images by leveraging bag of captions (BoCs). Moreover, by using a controlled text generation model, the proposed work efficiently learns disentangled representations and generates realistic stylized descriptions of images. Experiments on MSCOCO, Flickr30K, and FlickrStyle10K provide state-of-the-art results for both refined and style-based caption generation, supported by an ablation study.

Vol. 16, no. 6, pp. 2032-2042.

Citations: 0

Deep Reinforcement Learning for Autonomous Driving Based on Safety Experience Replay

IF 5 | Zone 3, Computer Science | Q1 Computer Science, Artificial Intelligence

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-30 DOI: 10.1109/TCDS.2024.3405896

Xiaohan Huang;Yuhu Cheng;Qiang Yu;Xuesong Wang

In autonomous driving, safety has always been a top priority, and with the growing application of deep reinforcement learning (DRL) to autonomous driving in recent years, ensuring the safety of algorithms has become an indispensable concern. Reinforcement learning (RL) interacts with the environment through trial and error, so without safety constraints it may produce unsafe driving behavior. Such behavior could cause the vehicle to deviate from its driving path or even collide, leading to catastrophic accidents. Therefore, this article proposes a reinforcement learning algorithm based on a safety experience replay mechanism, whose primary aim is to enhance the safety of reinforcement learning in autonomous driving. First, the ego vehicle conducts a preliminary exploration of the environment to collect data. Based on the task-completion performance observed in each data trajectory, safety labels of different levels are assigned to all state-action pairs, which establishes a safety experience buffer. Next, a safety-critic network is constructed and trained by randomly sampling from the safety experience buffer. This enables the network to quantitatively evaluate the safety of driving actions, achieving the goal of safe driving for the ego vehicle. Experimental results indicate that the proposed method effectively reduces driving risks and improves task success rates compared with conventional reinforcement learning algorithms.
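The buffer-building step described in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the outcome names, the three numeric safety levels, the buffer capacity, and the `SafetyBuffer` class itself are all hypothetical, since the abstract only says that labels of different levels are assigned per trajectory and that the safety critic is trained on random samples from the buffer.

```python
import random

# Hypothetical mapping from a trajectory's outcome to a safety level.
SAFETY_LEVELS = {"success": 1.0, "off_route": 0.5, "collision": 0.0}


class SafetyBuffer:
    """Stores (state, action, safety_label) tuples for safety-critic training."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.data = []

    def add_trajectory(self, transitions, outcome):
        """Label every state-action pair in a trajectory by its outcome."""
        label = SAFETY_LEVELS[outcome]
        for state, action in transitions:
            self.data.append((state, action, label))
        if len(self.data) > self.capacity:
            # Keep only the most recent entries.
            self.data = self.data[-self.capacity:]

    def sample(self, batch_size, rng=random):
        """Uniform random minibatch, as used to train a safety critic."""
        k = min(batch_size, len(self.data))
        return rng.sample(self.data, k)


buf = SafetyBuffer(capacity=3)
buf.add_trajectory([((0,), 0), ((1,), 1)], "success")
buf.add_trajectory([((2,), 0), ((3,), 1)], "collision")
print(buf.sample(2, rng=random.Random(0)))
```

A safety critic would then regress from (state, action) to the stored label, giving a numeric safety estimate for candidate driving actions.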

Vol. 16, no. 6, pp. 2070-2084.

Citations: 0

Progressive Transfer Learning for Dexterous In-Hand Manipulation With Multifingered Anthropomorphic Hand

IF 5 | Zone 3, Computer Science | Q1 Computer Science, Artificial Intelligence

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-29 DOI: 10.1109/TCDS.2024.3406730

Yongkang Luo;Wanyi Li;Peng Wang;Haonan Duan;Wei Wei;Jia Sun

Dexterous in-hand manipulation poses significant challenges for a multifingered anthropomorphic hand due to the high-dimensional state and action spaces, as well as the intricate contact patterns between the fingers and objects. Although deep reinforcement learning has made moderate progress and demonstrated strong potential for manipulation, it faces certain challenges, including large-scale data collection and high sample complexity. In particular, even slight changes to a scene require re-collecting vast amounts of data and running numerous iterations of fine-tuning. Humans, by contrast, can quickly transfer their learned manipulation skills to different scenarios with minimal supervision. Inspired by this flexible transfer learning capability, we propose a novel framework called progressive transfer learning (PTL) for dexterous in-hand manipulation. This framework efficiently utilizes the collected trajectories and the dynamics model trained on a source dataset. It adopts progressive neural networks for dynamics model transfer learning on samples selected using a new method based on dynamics properties, rewards, and trajectory scores. Experimental results on contact-rich anthropomorphic hand manipulation tasks demonstrate that our method can efficiently and effectively learn in-hand manipulation skills with just a few online attempts and adjustment learning in the new scene. Moreover, compared to learning from scratch, our method reduces training time costs by 85%.
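Progressive neural networks, which the abstract adopts for dynamics model transfer, keep a column trained on the source task frozen and add a new column whose layers also receive lateral input from the frozen column. The sketch below shows that forward pass for two columns with one hidden layer each. It is a simplified illustration, not the PTL dynamics model: the layer sizes, weight values, and function names are assumptions made for this example.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def target_forward(x, W1_src, W1_tgt, W2_tgt, U2_lat):
    """Forward pass of the new (target) column.

    W1_src is the frozen first layer of the source column. The target
    output layer adds a lateral term U2_lat @ h_src to its own
    W2_tgt @ h_tgt, so source features are reused without being modified.
    """
    h_src = relu(matvec(W1_src, x))  # frozen source features
    h_tgt = relu(matvec(W1_tgt, x))  # trainable target features
    own = matvec(W2_tgt, h_tgt)
    lateral = matvec(U2_lat, h_src)
    return [a + b for a, b in zip(own, lateral)]


# Illustrative weights only.
x = [1.0, -1.0]
W1_src = [[1.0, 0.0], [0.0, 1.0]]
W1_tgt = [[0.0, -1.0], [1.0, 1.0]]
W2_tgt = [[2.0, 3.0]]
U2_lat = [[0.5, 0.5]]
print(target_forward(x, W1_src, W1_tgt, W2_tgt, U2_lat))
```

Only `W1_tgt`, `W2_tgt`, and `U2_lat` would be updated on the new scene in this setup, which is why the source knowledge is preserved while the target column adapts.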

Volume 16, Issue 6, Pages 2019-2031

Citations: 0

Measuring Human Comfort in Human–Robot Collaboration via Wearable Sensing

IF 5 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems

Pub Date : 2024-03-29 DOI: 10.1109/TCDS.2024.3383296

Yuchen Yan;Haotian Su;Yunyi Jia

The development of collaborative robots has enabled a safer and more efficient human–robot collaboration (HRC) manufacturing environment. Since the debut of collaborative robots, considerable research effort has gone into improving user safety and robot working efficiency. Human comfort in HRC scenarios, however, has not been thoroughly studied, although it is critical to user acceptance of collaborative robots. Previous studies mostly use subjective ratings to evaluate how human comfort varies as a single robot factor changes, yet such a method is limited in evaluating comfort online. Other studies leverage wearable sensors to collect physiological signals and detect human emotions, but few apply this to a human comfort model in HRC scenarios. In this study, we designed an online comfort model for HRC using wearable sensing data. The model takes physiological signals acquired from wearable sensing and calculates in-situ human comfort levels using algorithms we developed. We conducted experiments on realistic HRC tasks, and the prediction results demonstrate the effectiveness of the proposed approach in identifying human comfort levels in HRC.

Volume 16, Issue 5, Pages 1748-1758

Citations: 0