
Latest publications from CAAI Transactions on Intelligence Technology

Considering spatiotemporal evolutionary information in dynamic multi-objective optimisation
IF 5.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-06-02 | DOI: 10.1049/cit2.12249
Qinqin Fan, Min Jiang, Wentao Huang, Qingchao Jiang
Citations: 0
Pruning method for dendritic neuron model based on dendrite layer significance constraints
IF 5.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-06-01 | DOI: 10.1049/cit2.12234
Xudong Luo, Xiaohao Wen, Yan Li, Quanfu Li

The dendritic neural model (DNM) mimics the non-linearity of synapses in the human brain to simulate the information processing mechanisms and procedures of neurons. This enhances the understanding of biological nervous systems and the applicability of the model in various fields. However, the existing DNM suffers from high complexity and limited generalisation capability. To address these issues, a DNM pruning method with dendrite layer significance constraints is proposed. This method not only evaluates the significance of dendrite layers but also concentrates the significance of the trained model into a few dendrite layers, allowing low-significance dendrite layers to be removed. Simulation experiments on six UCI datasets demonstrate that the proposed method surpasses existing pruning methods in terms of network size and generalisation performance.
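As a rough illustration of significance-based pruning (the scoring function, keep ratio, and array layout here are assumptions for illustration, not the authors' exact formulation), a dendrite layer can be scored and low-significance layers dropped:

```python
import numpy as np

def prune_dendrite_layers(weights, keep_ratio=0.5):
    """weights: (n_layers, n_synapses) synaptic weights of a trained model.

    Score each dendrite layer by its mean absolute weight (an assumed
    significance measure) and keep only the top fraction of layers."""
    significance = np.abs(weights).mean(axis=1)
    n_keep = max(1, int(np.ceil(keep_ratio * len(weights))))
    keep = np.argsort(significance)[::-1][:n_keep]   # most significant first
    return np.sort(keep)                             # indices of surviving layers

w = np.array([[0.9, 0.8], [0.01, 0.02], [0.5, 0.4], [0.0, 0.03]])
print(prune_dendrite_layers(w, keep_ratio=0.5))  # [0 2]
```

Only the two high-magnitude layers survive; the near-zero layers are removed.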

Citations: 1
Needle detection and localisation for robot-assisted subretinal injection using deep learning
CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-29 | DOI: 10.1049/cit2.12242
Mingchuan Zhou, Xiangyu Guo, Matthias Grimm, Elias Lochner, Zhongliang Jiang, Abouzar Eslami, Juan Ye, Nassir Navab, Alois Knoll, Mohammad Ali Nasseri
Subretinal injection is a complicated task for retinal surgeons to operate manually. In this paper, we demonstrate a robust framework for needle detection and localisation in robot-assisted subretinal injection using microscope-integrated Optical Coherence Tomography with deep learning. Five convolutional neural networks with different architectures were evaluated; the main difference between the architectures is the amount of information they receive at the input layer. When evaluated on ex-vivo pig eyes, the top-performing network successfully detected all needles in the dataset and localised them with an Intersection over Union value of 0.55. The algorithm was evaluated by comparing the depth of the top and bottom edges of the predicted bounding box to the ground truth. This analysis showed that the top edge can be used to predict the depth of the needle with a maximum error of 8.5 μm.
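The Intersection over Union metric used to assess localisation can be computed for axis-aligned bounding boxes as follows (a generic sketch, not the paper's evaluation code):

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

pred, gt = (0, 0, 10, 10), (5, 0, 15, 10)
print(round(iou(pred, gt), 3))  # 0.333
```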
Citations: 1
An object detection approach with residual feature fusion and second-order term attention mechanism
IF 5.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-27 | DOI: 10.1049/cit2.12236
Cuijin Li, Zhong Qu, Shengye Wang

Automatically detecting and locating small, remotely occluded objects in images of complex traffic environments is a valuable and challenging research task. Since bounding-box localisation is not sufficiently accurate and it is difficult to distinguish overlapping and occluded objects, the authors propose a network model with a second-order term attention mechanism and an occlusion loss. First, the backbone network is built on CSPDarkNet53. Then, a method is designed for the feature extraction network based on an item-wise attention mechanism, which uses the filtered weighted feature vector to replace the original residual fusion and adds a second-order term to reduce the information loss during fusion and accelerate the convergence of the model. Finally, an object occlusion regression loss function is studied to reduce missed detections caused by dense objects. Extensive experimental results demonstrate that the authors' method achieves state-of-the-art performance without reducing the detection speed: the mAP@.5 is 85.8% on the Foggy_cityscapes dataset and 97.8% on the KITTI dataset.
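A minimal sketch of how a second-order (elementwise product) term might augment a weighted fusion in place of a plain residual sum; the attention form and weighting below are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def second_order_fusion(x, f):
    """Fuse a feature map x with a transformed map f.

    A plain residual block would return x + f. Here an item-wise attention
    weight gates the two inputs, and an elementwise product adds a
    second-order interaction term (all forms are illustrative assumptions)."""
    w = sigmoid(x * f)                    # item-wise attention weights
    return w * x + (1 - w) * f + x * f    # gated fusion + second-order term

x = np.array([1.0])
f = np.array([2.0])
print(second_order_fusion(x, f))
```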

Citations: 0
A multiple sensitive attributes data publishing method with guaranteed information utility
IF 5.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-27 | DOI: 10.1049/cit2.12235
Haibin Zhu, Tong Yi, Songtao Shang, Minyong Shi, Zhucheng Li, Wenqian Shang

Data publishing methods can provide information for analysis while preserving privacy. Multiple-sensitive-attribute data publishing methods that preserve the relationship between sensitive attributes may keep many records from being grouped, resulting in a high record-suppression ratio. Another category of methods reduces the possibility of record suppression by breaking the relationship between sensitive attributes, but then cannot provide the sensitive-attribute associations for analysis. Hence, existing multiple-sensitive-attribute data publishing fails to fully account for comprehensive information utility. To acquire a guaranteed information utility, this article defines a comprehensive information loss that considers both the suppression of records and the relationship between sensitive attributes. A heuristic method is leveraged to discover the optimal anonymity scheme with the lowest comprehensive information loss. The experimental results verify the practicality of the proposed data publishing method with multiple sensitive attributes, which can guarantee information utility compared with previous methods.
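The trade-off that a comprehensive information loss captures can be sketched as a weighted sum of the record-suppression ratio and the sensitive-attribute association loss (the weight `alpha` and the linear form are illustrative assumptions, not the article's definition):

```python
def comprehensive_loss(n_suppressed, n_total, association_loss, alpha=0.5):
    """Combine two competing losses into one score to minimise.

    association_loss in [0, 1]: how much of the sensitive-attribute
    relationship the anonymity scheme destroys (assumed normalised)."""
    suppression_ratio = n_suppressed / n_total
    return alpha * suppression_ratio + (1 - alpha) * association_loss

# Scheme A suppresses many records but keeps associations intact;
# scheme B suppresses none but breaks all associations.
a = comprehensive_loss(40, 100, association_loss=0.0)
b = comprehensive_loss(0, 100, association_loss=1.0)
print(a, b)  # 0.2 0.5 -> scheme A has the lower comprehensive loss
```

A heuristic search would score each candidate anonymity scheme this way and keep the minimiser.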

Citations: 0
3D reconstruction and defect pattern recognition of bonding wire based on stereo vision
IF 5.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-26 | DOI: 10.1049/cit2.12240
Naigong Yu, Hongzheng Li, Qiao Xu, Ouattara Sie, Essaf Firdaous

Non-destructive detection of wire bonding defects in integrated circuits (ICs) is critical for ensuring product quality after packaging. Image-processing-based methods do not provide a detailed evaluation of the three-dimensional defects of the bonding wire. Therefore, a method for the 3D reconstruction and pattern recognition of wire defects based on stereo vision, which can achieve non-destructive detection of bonding wire defects, is proposed. The contour features of bonding wires and other electronic components in the depth image are analysed to complete the 3D reconstruction of the bonding wires. In particular, to filter the noisy point cloud and obtain an accurate point cloud of the bonding wire surface, a point cloud segmentation method based on spatial surface feature detection (SFD) is proposed; SFD can extract more distinct features from the bonding wire surface during the point cloud segmentation process. Furthermore, in the defect detection process, a directional discretisation descriptor with multiple local normal vectors is designed for defect pattern recognition of bonding wires. The descriptor combines local and global features of the wire and can describe the spatial variation trends and structural features of wires. The experimental results show that the method can complete the 3D reconstruction and defect pattern recognition of bonding wires, with an average defect-recognition accuracy of 96.47%, which meets the production requirements of bonding wire defect detection.
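A toy version of surface-feature-based point cloud segmentation: points whose local neighbourhood shows large height variation are kept as wire candidates, while near-planar substrate points are filtered out. The radius, threshold, and variance criterion are assumptions for illustration, not the paper's SFD:

```python
import numpy as np

def segment_wire_points(points, radius=1.0, z_var_thresh=0.01):
    """points: (n, 3) array of (x, y, z) samples from a depth map.

    Keep indices whose (x, y)-neighbourhood has high z-variance, i.e.
    points lying on a raised, curved structure rather than a flat plane."""
    keep = []
    for i, p in enumerate(points):
        d = np.linalg.norm(points[:, :2] - p[:2], axis=1)  # planar distances
        neigh = points[d < radius]                          # local neighbourhood
        if neigh[:, 2].var() > z_var_thresh:
            keep.append(i)
    return keep

pts = np.array([[0.0, 0.0, 0.0],   # substrate point next to a wire point
                [0.5, 0.0, 1.0],   # raised wire point
                [5.0, 5.0, 0.0],   # isolated flat substrate
                [10.0, 10.0, 0.0]])
print(segment_wire_points(pts))  # [0, 1]
```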

Citations: 0
DeepCNN: Spectro-temporal feature representation for speech emotion recognition
IF 5.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-26 | DOI: 10.1049/cit2.12233
Nasir Saleem, Jiechao Gao, Rizwana Irfan, Ahmad Almadhor, Hafiz Tayyab Rauf, Yudong Zhang, Seifedine Kadry

Speech emotion recognition (SER) is an important research problem in human-computer interaction systems. The representation and extraction of features are significant challenges in SER systems. Despite the promising results of recent studies, they generally do not leverage progressive fusion techniques for effective feature representation and increasing receptive fields. To mitigate this problem, this article proposes DeepCNN, which fuses the spectral and temporal features of emotional speech by parallelising convolutional neural networks (CNNs) with a convolution layer-based transformer. Two parallel CNNs are applied to extract the spectral-feature (2D-CNN) and temporal-feature (1D-CNN) representations. A 2D-convolution layer-based transformer module extracts spectro-temporal features and concatenates them with the features from the parallel CNNs. The learnt low-level concatenated features are then applied to a deep framework of convolutional blocks, which retrieves a high-level feature representation and subsequently categorises the emotional states using an attention gated recurrent unit and a classification layer. This fusion technique yields a deeper hierarchical feature representation at a lower computational cost while simultaneously expanding the filter depth and reducing the feature map. The Berlin Database of Emotional Speech (EMO-BD) and Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets are used in experiments to recognise distinct speech emotions. With efficient spectral and temporal feature representation, the proposed SER model achieves 94.2% accuracy on the EMO-BD dataset and 81.1% accuracy on the IEMOCAP dataset. The proposed SER system, DeepCNN, outperforms the baseline SER systems in terms of emotion recognition accuracy on the EMO-BD and IEMOCAP datasets.
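The parallel spectral/temporal split can be caricatured with simple per-band and per-frame statistics standing in for the 2D-CNN and 1D-CNN branches before concatenation (shapes and pooling are illustrative assumptions only, not the DeepCNN layers):

```python
import numpy as np

def extract_and_fuse(spectrogram):
    """spectrogram: (freq_bins, time_frames) log-mel-style features.

    The spectral branch pools over time (per-band energy, stand-in for the
    2D-CNN); the temporal branch pools over frequency (per-frame energy,
    stand-in for the 1D-CNN); the fused vector concatenates both."""
    spectral = spectrogram.mean(axis=1)   # (freq_bins,)
    temporal = spectrogram.mean(axis=0)   # (time_frames,)
    return np.concatenate([spectral, temporal])

spec = np.random.rand(40, 100)            # 40 bands, 100 frames
print(extract_and_fuse(spec).shape)       # (140,)
```

A classifier head would then operate on the 140-dimensional fused vector.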

Citations: 1
An advanced discrete-time RNN for handling discrete time-varying matrix inversion: From model design to disturbance-suppression analysis
IF 5.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-24 | DOI: 10.1049/cit2.12229
Yang Shi, Qiaowen Shi, Xinwei Cao, Bin Li, Xiaobing Sun, Dimitrios K. Gerontitis

Time-varying matrix inversion is an important field of matrix research, and many research achievements have been obtained. In the process of solving time-varying matrix inversion, disturbances inevitably exist; thus, a model that can suppress disturbances while solving the problem is required. In this paper, an advanced continuous-time recurrent neural network (RNN) model based on a double-integral RNN design formula is proposed for solving continuous time-varying matrix inversion, with an incomparable disturbance-suppression property. For digital hardware applications, the corresponding advanced discrete-time RNN model is derived from discretisation formulas. Theoretical analysis demonstrates that both the advanced continuous-time RNN model and the corresponding advanced discrete-time RNN model have global and exponential convergence performance and excel at suppressing different disturbances. Finally, inspiring experiments, including two numerical experiments and a practical experiment, are presented to demonstrate the effectiveness and superiority of the advanced discrete-time RNN model for solving discrete time-varying matrix inversion with disturbance suppression.
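For contrast with the paper's double-integral RNN, the classical Newton-Schulz iteration below inverts a static matrix; it tracks neither a time-varying A(t) nor disturbances, which is precisely what the proposed models add:

```python
import numpy as np

def newton_schulz_inverse(A, steps=30):
    """Iteratively approximate A^{-1} via X_{k+1} = X_k (2I - A X_k).

    The scaled-transpose initialisation guarantees convergence of the
    classical scheme; quadratic convergence makes 30 steps ample here."""
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(A.shape[0])
    for _ in range(steps):
        X = X @ (2 * I - A @ X)
    return X

A = np.array([[4.0, 1.0], [2.0, 3.0]])
X = newton_schulz_inverse(A)
print(np.allclose(X @ A, np.eye(2), atol=1e-8))  # True
```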

Citations: 2
Multi-scale cross-domain alignment for person image generation
IF 5.1 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-24 | DOI: 10.1049/cit2.12224
Liyuan Ma, Tingwei Gao, Haibin Shen, Kejie Huang

Person image generation aims to generate images that maintain the original human appearance in different target poses. Recent works have revealed that the critical element in achieving this task is the alignment of the appearance domain and the pose domain. Previous alignment methods, such as appearance flow warping, correspondence learning and cross attention, often encounter challenges in producing fine texture details: they are limited in accurately estimating appearance flows due to the lack of a global receptive field, or they can only perform cross-domain alignment on high-level feature maps with small spatial dimensions, since the computational complexity increases quadratically with larger feature sizes. In this article, the significance of multi-scale alignment, in both low-level and high-level domains, for ensuring reliable cross-domain alignment of appearance and pose is demonstrated. To this end, a novel and effective method named Multi-scale Cross-domain Alignment (MCA) is proposed. Firstly, MCA adopts a global context aggregation transformer to model multi-scale interaction between pose and appearance inputs, employing pair-wise window-based cross attention. Furthermore, leveraging the integrated global source information for each target position, MCA applies a flexible flow prediction head and point correlation to effectively conduct warping and fusing for final transformed person image generation. The proposed MCA achieves superior performance to other methods on two popular datasets, which verifies the effectiveness of the approach.
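Pair-wise window-based cross attention between pose queries and appearance keys/values can be sketched as follows (single head, no learnt projections; the window size and dimensions are illustrative assumptions, not the MCA module):

```python
import numpy as np

def window_cross_attention(pose, appearance, window=4):
    """pose, appearance: (n_tokens, dim), n_tokens divisible by window.

    Each pose window attends only to the appearance tokens of the SAME
    window (pair-wise windows), keeping the cost linear in token count."""
    n, d = pose.shape
    out = np.empty_like(pose)
    for s in range(0, n, window):
        q = pose[s:s + window]
        k = v = appearance[s:s + window]
        scores = q @ k.T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)      # softmax over the window
        out[s:s + window] = w @ v              # attended appearance features
    return out

p = np.random.rand(8, 16)
a = np.random.rand(8, 16)
print(window_cross_attention(p, a).shape)  # (8, 16)
```

Running the same routine on feature maps of several resolutions would give the multi-scale alignment the abstract describes.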

生成人物图像的目的是在不同的目标姿势下生成保持人物原始外观的图像。最近的研究表明,实现这一任务的关键因素是外观域和姿势域的对齐。以往的配准方法,如外观流扭曲、对应学习和交叉注意等,在生成精细纹理细节时往往会遇到挑战。由于缺乏全局感受野,这些方法在准确估计外观流方面存在局限性。另外,这些方法只能在空间维度较小的高级特征图上进行跨域配准,因为计算复杂度会随着特征尺寸的增大而呈二次曲线增加。本文论证了在低级和高级域中进行多尺度配准对于确保外观和姿势可靠的跨域配准的重要性。为此,本文提出了一种新颖有效的方法,名为多尺度跨域配准(MCA)。首先,MCA 采用全局上下文聚合转换器来模拟姿态和外观输入之间的多尺度交互,该转换器采用了基于窗口的成对交叉关注。此外,MCA 利用每个目标位置的综合全局源信息,采用灵活的流预测头和点相关性,有效地进行扭曲和融合,从而生成最终的变换后人物图像。我们提出的 MCA 在两个流行的数据集上取得了优于其他方法的性能,这验证了我们方法的有效性。
{"title":"Multi-scale cross-domain alignment for person image generation","authors":"Liyuan Ma,&nbsp;Tingwei Gao,&nbsp;Haibin Shen,&nbsp;Kejie Huang","doi":"10.1049/cit2.12224","DOIUrl":"10.1049/cit2.12224","url":null,"abstract":"<p>Person image generation aims to generate images that maintain the original human appearance in different target poses. Recent works have revealed that the critical element in achieving this task is the alignment of appearance domain and pose domain. Previous alignment methods, such as appearance flow warping, correspondence learning and cross attention, often encounter challenges when it comes to producing fine texture details. These approaches suffer from limitations in accurately estimating appearance flows due to the lack of global receptive field. Alternatively, they can only perform cross-domain alignment on high-level feature maps with small spatial dimensions since the computational complexity increases quadratically with larger feature sizes. In this article, the significance of multi-scale alignment, in both low-level and high-level domains, for ensuring reliable cross-domain alignment of appearance and pose is demonstrated. To this end, a novel and effective method, named Multi-scale Cross-domain Alignment (MCA) is proposed. Firstly, MCA adopts global context aggregation transformer to model multi-scale interaction between pose and appearance inputs, which employs pair-wise window-based cross attention. Furthermore, leveraging the integrated global source information for each target position, MCA applies flexible flow prediction head and point correlation to effectively conduct warping and fusing for final transformed person image generation. Our proposed MCA achieves superior performance on two popular datasets compared to other methods, which verifies the effectiveness of our approach.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 2","pages":"374-387"},"PeriodicalIF":5.1,"publicationDate":"2023-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12224","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74017388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A multi-modal clustering method for traditional Chinese medicine clinical data via media convergence 基于媒体融合的中医临床数据多模态聚类方法
IF 5.1 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-05-22 DOI: 10.1049/cit2.12230
Jingna Si, Ziwei Tian, Dongmei Li, Lei Zhang, Lei Yao, Wenjuan Jiang, Jia Liu, Runshun Zhang, Xiaoping Zhang

Media convergence is a media transformation led by technological innovation. Applying media convergence technology to the study of clustering in Chinese medicine can fully exploit the advantages of media fusion: obtaining consistent and complementary information across multiple modalities provides technical support for clustering. This article presents an approach based on Media Convergence and Graph convolution Encoder Clustering (MCGEC) for traditional Chinese medicine (TCM) clinical data. It feeds modal information and the graph structure derived from media information into a multi-modal graph convolution encoder to obtain a media feature representation learnt from multiple modalities. MCGEC captures latent information from the various modalities by fusion and optimises the feature representations and network architecture with the learnt clustering labels. Experiments are conducted on real-world multi-modal TCM clinical data, including images and text. MCGEC improves clustering results compared to both generic single-modal clustering methods and more advanced multi-modal clustering methods. Integrating multimedia features into clustering algorithms offers significant benefits over single-modal approaches that simply concatenate features from different modalities, and provides practical technical support for multi-modal clustering in the TCM field.
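The graph-convolution step inside such an encoder can be pictured with a minimal sketch: one vanilla GCN propagation step (adjacency with self-loops, symmetric normalisation, then a linear map and ReLU) applied to naively concatenated features from two modalities. This is a toy under stated assumptions, not the MCGEC architecture; `fuse`, `gcn_layer` and all parameters are illustrative.

```python
def fuse(text_feats, image_feats):
    """Naive multi-modal fusion: concatenate the per-node feature
    vectors of two modalities (MCGEC's learnt fusion is richer)."""
    return [t + v for t, v in zip(text_feats, image_feats)]

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: add self-loops, symmetrically
    normalise the adjacency (D^-1/2 (A+I) D^-1/2), propagate neighbour
    features, then apply a linear map followed by ReLU."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in a]
    a = [[a[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5) for j in range(n)]
         for i in range(n)]
    # aggregate neighbour features: agg = A_norm @ feats
    agg = [[sum(a[i][k] * feats[k][j] for k in range(n))
            for j in range(len(feats[0]))] for i in range(n)]
    # linear map + ReLU: out = relu(agg @ weight)
    return [[max(0.0, sum(agg[i][k] * weight[k][j]
                          for k in range(len(weight))))
             for j in range(len(weight[0]))] for i in range(n)]
```

Stacking such layers lets each clinical record's representation absorb information from related records before the clustering head assigns labels; in MCGEC those labels are in turn used to refine the representations.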

{"title":"A multi-modal clustering method for traditional Chinese medicine clinical data via media convergence","authors":"Jingna Si,&nbsp;Ziwei Tian,&nbsp;Dongmei Li,&nbsp;Lei Zhang,&nbsp;Lei Yao,&nbsp;Wenjuan Jiang,&nbsp;Jia Liu,&nbsp;Runshun Zhang,&nbsp;Xiaoping Zhang","doi":"10.1049/cit2.12230","DOIUrl":"https://doi.org/10.1049/cit2.12230","url":null,"abstract":"<p>Media convergence is a media change led by technological innovation. Applying media convergence technology to the study of clustering in Chinese medicine can significantly exploit the advantages of media fusion. Obtaining consistent and complementary information among multiple modalities through media convergence can provide technical support for clustering. This article presents an approach based on Media Convergence and Graph convolution Encoder Clustering (MCGEC) for traditional Chinese medicine (TCM) clinical data. It feeds modal information and graph structure from media information into a multi-modal graph convolution encoder to obtain the media feature representation learnt from multiple modalities. MCGEC captures latent information from various modalities by fusion and optimises the feature representations and network architecture with learnt clustering labels. The experiment is conducted on real-world multi-modal TCM clinical data, including information like images and text. MCGEC has improved clustering results compared to the generic single-modal clustering methods and the current more advanced multi-modal clustering methods. MCGEC applied to TCM clinical datasets can achieve better results. Integrating multimedia features into clustering algorithms offers significant benefits compared to single-modal clustering approaches that simply concatenate features from different modalities. It provides practical technical support for multi-modal clustering in the TCM field incorporating multimedia features.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"8 2","pages":"390-400"},"PeriodicalIF":5.1,"publicationDate":"2023-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12230","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50141384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1