
CAAI Transactions on Intelligence Technology: Latest Articles

A self-learning human-machine cooperative control method based on driver intention recognition
IF 8.4; Zone 2 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2024-04-01; DOI: 10.1049/cit2.12313
Yan Jiang, Yuyan Ding, Xinglong Zhang, Xin Xu, Junwen Huang

Human-machine cooperative control has become an important area of intelligent driving, where driver intention recognition and dynamic control authority allocation are key factors for improving the performance of cooperative decision-making and control. In this paper, an online learning method is proposed for human-machine cooperative control, which introduces a priority control parameter in the reward function to achieve optimal allocation of control authority under different driver intentions and driving safety conditions. Firstly, a two-layer LSTM-based sequence prediction algorithm is proposed to recognise the driver's lane change (LC) intention for human-machine cooperative steering control. Secondly, an online reinforcement learning method is developed for optimising the steering authority to reduce driver workload and improve driving safety. The driver-in-the-loop simulation results show that our method can accurately predict the driver's LC intention in cooperative driving and effectively compensate for the driver's non-optimal driving actions. The experimental results on a real intelligent vehicle further demonstrate the online optimisation capability of the proposed RL-based control authority allocation algorithm and its effectiveness in improving driving safety.
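
To make the notion of steering-authority allocation concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes a common shared-control formulation in which the final steering command is a convex blend of the driver's and the controller's commands. The function name `blend_steering` and the parameter `alpha` are hypothetical; in the paper, the allocation is optimised online by reinforcement learning rather than fixed by hand.

```python
def blend_steering(driver_cmd: float, machine_cmd: float, alpha: float) -> float:
    """Blend driver and controller steering commands.

    alpha is a hypothetical driver-authority weight in [0, 1]:
    1.0 gives the driver full control, 0.0 gives the controller full control.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Convex combination of the two steering commands.
    return alpha * driver_cmd + (1.0 - alpha) * machine_cmd
```

Under this assumption, a learning agent would adjust `alpha` over time, for example lowering it when the driver's action looks unsafe.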

Vol. 9, No. 5, pp. 1101-1115. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12313
Citations: 0
Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis
IF 8.4; Zone 2 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2024-04-01; DOI: 10.1049/cit2.12333
Vinura Dhananjaya, Surangika Ranathunga, Sanath Jayasena

Pre-trained multilingual language models (PMLMs) such as mBERT and XLM-R have shown good cross-lingual transferability. However, they are not specifically trained to capture cross-lingual signals concerning sentiment words. This poses a disadvantage for low-resource languages (LRLs) that are under-represented in these models. To better fine-tune these models for sentiment classification in LRLs, a novel intermediate task fine-tuning (ITFT) technique based on a sentiment lexicon of a high-resource language (HRL) is introduced. The authors experiment with the LRLs Sinhala, Tamil and Bengali on a 3-class sentiment classification task and show that this method outperforms vanilla fine-tuning of the PMLM. It also outperforms or is on par with basic ITFT that relies on an HRL sentiment classification dataset.

Vol. 9, No. 5, pp. 1116-1125. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12333
Citations: 0
Edge-guided representation learning for underwater object detection
IF 8.4; Zone 2 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2024-04-01; DOI: 10.1049/cit2.12325
Linhui Dai, Hong Liu, Pinhao Song, Hao Tang, Runwei Ding, Shengquan Li

Underwater object detection (UOD) is crucial for marine economic development, environmental protection, and the planet's sustainable development. The main challenges of this task arise from low contrast, small objects, and mimicry of aquatic organisms. The key to addressing these challenges is to focus the model on obtaining more discriminative information. The authors observe that the edges of underwater objects are highly distinctive, so objects can be distinguished from low-contrast or mimicry environments based on their edges. Motivated by this observation, an Edge-guided Representation Learning Network, termed ERL-Net, is proposed that aims to achieve discriminative representation learning and aggregation under the guidance of edge cues. Firstly, an edge-guided attention module is introduced to model the explicit boundary information, which generates more discriminative features. Secondly, a hierarchical feature aggregation module is proposed to aggregate the multi-scale discriminative features by regrouping them into three levels, effectively aggregating global and local information for locating and recognising underwater objects. Finally, a wide and asymmetric receptive field block is proposed to enable features to have a wider receptive field, allowing the model to focus on smaller object information. Comprehensive experiments on three challenging underwater datasets show that the method achieves superior performance on the UOD task.

Vol. 9, No. 5, pp. 1078-1091. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12325
Citations: 0
BTSC: Binary tree structure convolution layers for building interpretable decision-making deep CNN
IF 8.4; Zone 2 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2024-03-31; DOI: 10.1049/cit2.12328
Yuqi Wang, Dawei Dai, Da Liu, Shuyin Xia, Guoyin Wang

Although deep convolutional neural networks (DCNNs) have achieved great success in the computer vision field, such models are considered to lack interpretability in decision-making. One of the fundamental issues is that their decision mechanism is considered to be a “black-box” operation. The authors design the binary tree structure convolution (BTSC) module and control the activation level of particular neurons to build an interpretable DCNN model. First, the authors design a BTSC module, in which each parent node generates two independent child layers, and then integrate it into a normal DCNN model. The main advantages of the BTSC are as follows: 1) child nodes of different parent nodes do not interfere with each other; 2) parent and child nodes can inherit knowledge. Second, considering the activation level of neurons, the authors design an information coding objective to guide neural nodes to learn the particular information coding that is expected. Through experiments, the authors verify that: 1) the decisions made by both the ResNet and DenseNet models can be explained well based on the "decision information flow path" (known as the decision-path) formed in the BTSC module; 2) the decision-path can reasonably interpret the decision reversal mechanism (robustness mechanism) of the DCNN model; 3) the credibility of decision-making can be measured by the matching degree between the actual and expected decision-paths.

Vol. 9, No. 5, pp. 1331-1345. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12328
Citations: 0
UKF-MOT: An unscented Kalman filter-based 3D multi-object tracker
IF 8.4; Zone 2 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2024-03-29; DOI: 10.1049/cit2.12315
Meng Liu, Jianwei Niu, Yu Liu

Multi-object tracking in autonomous driving is a non-linear problem. To better address the tracking problem, this paper leveraged an unscented Kalman filter to predict the object's state. In the association stage, the Mahalanobis distance was employed as an affinity metric, and a Non-minimum Suppression method was designed for matching. With the detections fed into the tracker and continuous ‘predicting-matching’ steps, the states of each object at different time steps were described as their own continuous trajectories. We conducted extensive experiments to evaluate tracking accuracy on three challenging datasets (KITTI, nuScenes and Waymo). The experimental results demonstrated that our method effectively achieved multi-object tracking with satisfactory accuracy and real-time efficiency.
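
The Mahalanobis distance used as the affinity metric can be illustrated with a short sketch. This is a generic implementation under stated assumptions, not the authors' code: `detection` is a measurement vector, `predicted` is the filter's predicted measurement, and `innovation_cov` is the corresponding covariance matrix. All three names, and the helper `solve_linear`, are hypothetical. The distance is sqrt(d^T S^-1 d) with d = detection - predicted.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def mahalanobis(detection, predicted, innovation_cov):
    """Distance between a detection and a predicted measurement,
    scaled by the uncertainty in innovation_cov."""
    diff = [d - p for d, p in zip(detection, predicted)]
    # Solve S w = diff instead of forming the inverse explicitly.
    weighted = solve_linear(innovation_cov, diff)
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))
```

Unlike a plain Euclidean distance, this metric discounts offsets along directions in which the prediction is uncertain, which is why it is a natural fit for associating detections with filter-predicted states.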

Vol. 9, No. 4, pp. 1031-1041. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12315
Citations: 0
Transfer force perception skills to robot-assisted laminectomy via imitation learning from human demonstrations
IF 8.4; Zone 2 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2024-03-29; DOI: 10.1049/cit2.12331
Meng Li, Xiaozhi Qi, Xiaoguang Han, Ying Hu, Bing Li, Yu Zhao, Jianwei Zhang

A comparative study of two force perception skill learning approaches for robot-assisted spinal surgery, the impedance model method and the imitation learning (IL) method, is presented. The impedance model method develops separate models for the surgeon and patient, incorporating spring-damper and bone-grinding models. Expert surgeons' feature parameters are collected and mapped using support vector regression and image navigation techniques. The imitation learning approach utilises long short-term memory (LSTM) networks and addresses accurate data labelling challenges with custom models. Experimental results demonstrate skill recognition rates of 63.61%–74.62% for the impedance model approach, which relies on manual feature extraction. Conversely, the imitation learning approach achieves a force perception recognition rate of 91.06%, outperforming the impedance model on curved bone surfaces. The findings demonstrate the potential of imitation learning to enhance skill acquisition in robot-assisted spinal surgery by eliminating the laborious process of manual feature extraction.

Vol. 9, No. 4, pp. 903-916. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12331
Citations: 0
Rethinking multi-spatial information for transferable adversarial attacks on speaker recognition systems
IF 5.1; Zone 2 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2024-03-29; DOI: 10.1049/cit2.12295
Junjian Zhang, Hao Tan, Le Wang, Yaguan Qian, Zhaoquan Gu

Adversarial attacks pose significant security concerns for intelligent systems, such as speaker recognition systems (SRSs). Most attacks assume the neural networks in the systems are known beforehand, while black-box attacks are proposed without such information to meet practical situations. Existing black-box attacks improve transferability by integrating multiple models or training on multiple datasets, but these methods are costly. Motivated by the optimisation strategy with spatial information on the perturbed paths and samples, we propose a Dual Spatial Momentum Iterative Fast Gradient Sign Method (DS-MI-FGSM) to improve the transferability of black-box attacks against SRSs. Specifically, DS-MI-FGSM only needs a single data sample and one model as the input; by extending to the data and model neighbouring spaces, it generates adversarial examples against the integrated models. To reduce the risk of overfitting, DS-MI-FGSM also introduces gradient masking to improve transferability. The authors conduct extensive experiments on the speaker recognition task, and the results demonstrate the effectiveness of their method, which can achieve up to a 92% attack success rate on the victim model in black-box scenarios with only one known model.
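
For orientation, the sketch below shows the baseline momentum iterative FGSM update that DS-MI-FGSM builds on, as it is commonly described: accumulate the L1-normalised gradient with a decay factor, step along the sign of the accumulated gradient, and clip to an epsilon-ball around the original input. It does not implement the dual-space extension or the gradient masking from the paper. The function `mi_fgsm`, its parameters, and the toy gradient function in the usage note are hypothetical.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mi_fgsm(x0, grad_fn, eps, steps, mu=1.0):
    """Baseline MI-FGSM on a plain list of floats.

    x0      -- original input
    grad_fn -- returns the loss gradient with respect to the input
    eps     -- maximum L-infinity perturbation
    steps   -- number of iterations
    mu      -- momentum decay factor
    """
    alpha = eps / steps  # per-step size
    x = list(x0)
    g = [0.0] * len(x0)  # accumulated (momentum) gradient
    for _ in range(steps):
        grad = grad_fn(x)
        l1 = sum(abs(v) for v in grad) or 1.0  # avoid division by zero
        g = [mu * gi + gr / l1 for gi, gr in zip(g, grad)]
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]
        # Keep the adversarial example within eps of the original input.
        x = [min(max(xi, o - eps), o + eps) for xi, o in zip(x, x0)]
    return x
```

As a usage example, with a toy linear loss whose gradient is the constant vector `[1.0, -2.0, 0.5]`, calling `mi_fgsm([0.0, 0.0, 0.0], lambda x: [1.0, -2.0, 0.5], eps=0.3, steps=3)` pushes each coordinate to the edge of the epsilon-ball in the direction of the gradient's sign.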

Vol. 9, No. 3, pp. 620-631. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12295
Citations: 0
Mutual information oriented deep skill chaining for multi-agent reinforcement learning
IF 8.4; Zone 2 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2024-03-28; DOI: 10.1049/cit2.12322
Zaipeng Xie, Cheng Ji, Chentai Qiao, WenZhan Song, Zewen Li, Yufeng Zhang, Yujing Zhang

Multi-agent reinforcement learning relies on reward signals to guide the policy networks of individual agents. However, in high-dimensional continuous spaces, the non-stationary environment can provide outdated experiences that hinder convergence, resulting in ineffective training performance for multi-agent systems. To tackle this issue, a novel reinforcement learning scheme, Mutual Information Oriented Deep Skill Chaining (MioDSC), is proposed that generates an optimised cooperative policy by incorporating intrinsic rewards based on mutual information to improve exploration efficiency. These rewards encourage agents to diversify their learning process by engaging in actions that increase the mutual information between their actions and the environment state. In addition, MioDSC can generate cooperative policies using the options framework, allowing agents to learn and reuse complex action sequences and accelerating the convergence speed of multi-agent learning. MioDSC was evaluated in the multi-agent particle environment and the StarCraft multi-agent challenge at varying difficulty levels. The experimental results demonstrate that MioDSC outperforms state-of-the-art methods and is robust across various multi-agent system tasks with high stability.
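
The quantity behind the intrinsic reward, the mutual information between actions and environment states, can be illustrated on discrete samples. The sketch below is a plug-in estimator over observed (action, state) pairs and is only an illustration: the paper works in high-dimensional continuous spaces, where such counting is not applicable, and how MioDSC actually estimates the term is not stated in the abstract. The function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(A; S) in nats from (action, state) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    a_counts = Counter(a for a, _ in pairs)
    s_counts = Counter(s for _, s in pairs)
    mi = 0.0
    for (a, s), c in joint.items():
        p_joint = c / n
        # p(a, s) / (p(a) p(s)) simplifies to c * n / (count_a * count_s).
        ratio = c * n / (a_counts[a] * s_counts[s])
        mi += p_joint * math.log(ratio)
    return mi
```

When actions and states are independent the estimate is zero, and when the state is fully determined by a uniformly chosen binary action it equals ln 2, so rewarding higher values favours actions that have a distinguishable effect on the environment.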

Vol. 9, No. 4, pp. 1014-1030. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12322
Citations: 0
UDT: U-shaped deformable transformer for subarachnoid haemorrhage image segmentation
IF 5.1; Zone 2 (Computer Science); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2024-03-25; DOI: 10.1049/cit2.12302
Wei Xie, Lianghao Jin, Shiqi Hua, Hao Sun, Bo Sun, Zhigang Tu, Jun Liu

Subarachnoid haemorrhage (SAH), mostly caused by the rupture of an intracranial aneurysm, is a common disease with a high fatality rate. SAH lesions are generally diffusely distributed, vary widely in scale, and have irregular edges. These characteristics make SAH segmentation a challenging task. To cope with these difficulties, a U-shaped deformable transformer (UDT) is proposed for SAH segmentation. First, a multi-scale deformable attention (MSDA) module models the diffuseness and scale variation of SAH lesions: it fuses features at different scales and dynamically adjusts the attention field of each element to generate discriminative multi-scale features. Second, a cross deformable attention-based skip connection (CDASC) module models the irregular edges of SAH lesions: it uses the spatial details in encoder features to refine the spatial information of decoder features. Third, the MSDA and CDASC modules are embedded into the Res-UNet backbone to construct the proposed UDT. Extensive experiments are conducted on the self-built SAH-CT dataset and two public medical datasets (GlaS and MoNuSeg). Experimental results show that the presented UDT achieves state-of-the-art performance.
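
The idea of adjusting the attention field of each element dynamically can be illustrated with a single-scale, single-head deformable attention step: a query samples the feature map at a few fractional offsets around its reference point and mixes the samples with softmax weights. This is a minimal sketch, not the paper's MSDA module. The function names are hypothetical, and the offsets and logits are passed in directly here, whereas a deformable attention layer would normally predict them from the query features.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate feat of shape (H, W, C) at a fractional location (y, x)."""
    h, w, _ = feat.shape
    y = float(np.clip(y, 0, h - 1))           # clamp to the feature map
    x = float(np.clip(x, 0, w - 1))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feat[y0, x0] + dx * feat[y0, x1]
    bottom = (1 - dx) * feat[y1, x0] + dx * feat[y1, x1]
    return (1 - dy) * top + dy * bottom

def deformable_attend(feat, ref, offsets, logits):
    """Mix K samples taken at ref + offsets[k] using softmax(logits) as weights."""
    logits = np.asarray(logits, dtype=float)
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    samples = np.stack(
        [bilinear_sample(feat, ref[0] + dy, ref[1] + dx) for dy, dx in offsets]
    )                                         # shape (K, C)
    return weights @ samples                  # shape (C,)

# Toy feature map whose value at (y, x) is 4*y + x, so interpolation is easy to check.
feat = np.arange(16, dtype=float).reshape(4, 4, 1)
out = deformable_attend(feat, ref=(1.0, 1.0), offsets=[(0.0, 0.0), (0.5, 0.5)], logits=[0.0, 0.0])
```

A multi-scale variant would repeat the sampling over several feature maps of different resolution and normalise the weights across all of them.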

Published in CAAI Transactions on Intelligence Technology, vol. 9, no. 3, pp. 756-768. Publication date: 2024-03-25. DOI: 10.1049/cit2.12302. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12302
Citations: 0
Improved organs at risk segmentation based on modified U-Net with self-attention and consistency regularisation
IF 8.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-03-25 DOI: 10.1049/cit2.12303
Maksym Manko, Anton Popov, Juan Manuel Gorriz, Javier Ramirez

Cancer is one of the leading causes of death in the world, and radiotherapy is one of the treatment options. Radiotherapy planning starts with delineating the affected area from healthy organs, called organs at risk (OAR). A new approach to automatic OAR segmentation in the chest cavity in Computed Tomography (CT) images is presented. The proposed approach builds on a modified U-Net architecture with a ResNet-34 encoder, which serves as the baseline in this work. A new two-branch CS-SA U-Net architecture is proposed. It consists of two parallel U-Net models in which self-attention blocks that use cosine similarity as the query-key similarity function (CS-SA blocks) are inserted between the encoder and decoder, which enables the use of consistency regularisation. On the publicly available SegTHOR benchmark dataset, the proposed solution demonstrates state-of-the-art performance for OAR segmentation in CT images in terms of the Dice coefficient (oesophagus 0.8714, heart 0.9516, trachea 0.9286, aorta 0.9510) and Hausdorff distance (oesophagus 0.2541, heart 0.1514, trachea 0.1722, aorta 0.1114), and it significantly outperforms the baseline. The approach is demonstrated to be viable for improving the quality of OAR segmentation for radiotherapy planning.
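
To make the CS-SA idea concrete, the sketch below shows self-attention in which the query-key score is the cosine similarity rather than a scaled dot product. It is an assumption-laden illustration, not the authors' block: the function name, the temperature `tau`, the projection matrices, and the single-head layout are all chosen for this example.

```python
import numpy as np

def cosine_self_attention(x, w_q, w_k, w_v, tau=1.0, eps=1e-8):
    """Self-attention over tokens x (N, D) with cosine similarity as the score."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # L2-normalise queries and keys so their dot product is a cosine in [-1, 1].
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + eps)
    scores = (q @ k.T) / tau                      # hypothetical temperature scaling
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # row-wise softmax
    return attn @ v, attn

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                       # 5 tokens with 8 features each
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, attn = cosine_self_attention(x, w_q, w_k, w_v)
```

Because queries and keys are normalised, the attention weights do not depend on feature magnitude, only on direction; the temperature then controls how peaked the softmax can become.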

Published in CAAI Transactions on Intelligence Technology, vol. 9, no. 4, pp. 850-865. Publication date: 2024-03-25. DOI: 10.1049/cit2.12303. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12303
Citations: 0