The dendritic neuron model (DNM) mimics the non-linearity of synapses in the human brain to simulate the information processing mechanisms and procedures of neurons. This enhances the understanding of biological nervous systems and the applicability of the model in various fields. However, the existing DNM suffers from high complexity and limited generalisation capability. To address these issues, a DNM pruning method with dendrite layer significance constraints is proposed. The method evaluates the significance of each dendrite layer and concentrates the significance of the trained model onto a small number of dendrite layers, so that low-significance dendrite layers can be removed. Simulation experiments on six UCI datasets demonstrate that our method surpasses existing pruning methods in terms of network size and generalisation performance.
{"title":"Pruning method for dendritic neuron model based on dendrite layer significance constraints","authors":"Xudong Luo, Xiaohao Wen, Yan Li, Quanfu Li","doi":"10.1049/cit2.12234","DOIUrl":"https://doi.org/10.1049/cit2.12234","url":null,"abstract":"<p>The dendritic neural model (DNM) mimics the non-linearity of synapses in the human brain to simulate the information processing mechanisms and procedures of neurons. This enhances the understanding of biological nervous systems and the applicability of the model in various fields. However, the existing DNM suffers from high complexity and limited generalisation capability. To address these issues, a DNM pruning method with dendrite layer significance constraints is proposed. This method not only evaluates the significance of dendrite layers but also allocates the significance of a few dendrite layers in the trained model to a few dendrite layers, allowing the removal of low-significance dendrite layers. The simulation experiments on six UCI datasets demonstrate that our method surpasses existing pruning methods in terms of network size and generalisation performance.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"8 2","pages":"308-318"},"PeriodicalIF":5.1,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12234","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50116313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mingchuan Zhou, Xiangyu Guo, Matthias Grimm, Elias Lochner, Zhongliang Jiang, Abouzar Eslami, Juan Ye, Nassir Navab, Alois Knoll, Mohammad Ali Nasseri
Subretinal injection is a complicated task for retinal surgeons to perform manually. In this paper we demonstrate a robust framework for needle detection and localisation in robot-assisted subretinal injection using microscope-integrated Optical Coherence Tomography with deep learning. Five convolutional neural networks with different architectures were evaluated; the main difference between the architectures is the amount of information they receive at the input layer. When evaluated on ex-vivo pig eyes, the top-performing network successfully detected all needles in the dataset and localised them with an Intersection over Union value of 0.55. The algorithm was further evaluated by comparing the depths of the top and bottom edges of the predicted bounding box with the ground truth. This analysis showed that the top edge can be used to predict the depth of the needle with a maximum error of 8.5 μm.
{"title":"Needle detection and localisation for robot‐assisted subretinal injection using deep learning","authors":"Mingchuan Zhou, Xiangyu Guo, Matthias Grimm, Elias Lochner, Zhongliang Jiang, Abouzar Eslami, Juan Ye, Nassir Navab, Alois Knoll, Mohammad Ali Nasseri","doi":"10.1049/cit2.12242","DOIUrl":"https://doi.org/10.1049/cit2.12242","url":null,"abstract":"Abstract Subretinal injection is a complicated task for retinal surgeons to operate manually. In this paper we demonstrate a robust framework for needle detection and localisation in robot‐assisted subretinal injection using microscope‐integrated Optical Coherence Tomography with deep learning. Five convolutional neural networks with different architectures were evaluated. The main differences between the architectures are the amount of information they receive at the input layer. When evaluated on ex‐vivo pig eyes, the top performing network successfully detected all needles in the dataset and localised them with an Intersection over Union value of 0.55. The algorithm was evaluated by comparing the depth of the top and bottom edge of the predicted bounding box to the ground truth. This analysis showed that the top edge can be used to predict the depth of the needle with a maximum error of 8.5 μm.","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135741431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatically detecting and locating small, distant, occluded objects in images of complex traffic environments is a valuable and challenging research task. Because bounding box localisation is not sufficiently accurate and overlapping and occluded objects are difficult to distinguish, the authors propose a network model with a second-order term attention mechanism and an occlusion loss. First, the backbone network is built on CSPDarkNet53. Then, a feature extraction network based on an item-wise attention mechanism is designed, which uses the filtered weighted feature vector to replace the original residual fusion and adds a second-order term to reduce information loss during fusion and accelerate the convergence of the model. Finally, an object occlusion regression loss function is introduced to reduce missed detections caused by densely packed objects. Extensive experimental results demonstrate that the authors' method achieves state-of-the-art performance without reducing detection speed, reaching an mAP@.5 of 85.8% on the Foggy_cityscapes dataset and 97.8% on the KITTI dataset.
{"title":"An object detection approach with residual feature fusion and second-order term attention mechanism","authors":"Cuijin Li, Zhong Qu, Shengye Wang","doi":"10.1049/cit2.12236","DOIUrl":"10.1049/cit2.12236","url":null,"abstract":"<p>Automatically detecting and locating remote occlusion small objects from the images of complex traffic environments is a valuable and challenging research. Since the boundary box location is not sufficiently accurate and it is difficult to distinguish overlapping and occluded objects, the authors propose a network model with a second-order term attention mechanism and occlusion loss. First, the backbone network is built on CSPDarkNet53. Then a method is designed for the feature extraction network based on an item-wise attention mechanism, which uses the filtered weighted feature vector to replace the original residual fusion and adds a second-order term to reduce the information loss in the process of fusion and accelerate the convergence of the model. Finally, an objected occlusion regression loss function is studied to reduce the problems of missed detections caused by dense objects. Sufficient experimental results demonstrate that the authors’ method achieved state-of-the-art performance without reducing the detection speed. The <i>mAP@</i>.5 of the method is 85.8% on the Foggy_cityscapes dataset and the <i>mAP@</i>.5 of the method is 97.8% on the KITTI dataset.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 2","pages":"411-424"},"PeriodicalIF":5.1,"publicationDate":"2023-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12236","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78498318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data publishing methods can provide information for analysis while preserving privacy. Methods for publishing data with multiple sensitive attributes that preserve the relationship between the sensitive attributes may prevent many records from being grouped, leading to a high record suppression ratio. Another category of methods reduces the possibility of record suppression by breaking the relationship between sensitive attributes, but then cannot provide the sensitive-attribute associations for analysis. Hence, existing multiple-sensitive-attribute data publishing fails to fully account for comprehensive information utility. To guarantee information utility, this article defines a comprehensive information loss that considers both the suppression of records and the relationship between sensitive attributes. A heuristic method is used to discover the anonymity scheme with the lowest comprehensive information loss. The experimental results verify the practicality of the proposed data publishing method with multiple sensitive attributes, which guarantees information utility compared with previous methods.
{"title":"A multiple sensitive attributes data publishing method with guaranteed information utility","authors":"Haibin Zhu, Tong Yi, Songtao Shang, Minyong Shi, Zhucheng Li, Wenqian Shang","doi":"10.1049/cit2.12235","DOIUrl":"https://doi.org/10.1049/cit2.12235","url":null,"abstract":"<p>Data publishing methods can provide available information for analysis while preserving privacy. The multiple sensitive attributes data publishing, which preserves the relationship between sensitive attributes, may keep many records from being grouped and bring in a high record suppression ratio. Another category of multiple sensitive attributes data publishing, which reduces the possibility of record suppression by breaking the relationship between sensitive attributes, cannot provide the sensitive attributes association for analysis. Hence, the existing multiple sensitive attributes data publishing fails to fully account for the comprehensive information utility. To acquire a guaranteed information utility, this article defines comprehensive information loss that considers both the suppression of records and the relationship between sensitive attributes. A heuristic method is leveraged to discover the optimal anonymity scheme that has the lowest comprehensive information loss. The experimental results verify the practice of the proposed data publishing method with multiple sensitive attributes. The proposed method can guarantee information utility when compared with previous ones.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"8 2","pages":"288-296"},"PeriodicalIF":5.1,"publicationDate":"2023-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12235","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50146513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Non-destructive detection of wire-bonding defects in integrated circuits (ICs) is critical for ensuring product quality after packaging. Image-processing-based methods do not provide a detailed evaluation of the three-dimensional defects of the bonding wire. Therefore, a method for 3D reconstruction and pattern recognition of wire defects based on stereo vision, which achieves non-destructive detection of bonding wire defects, is proposed. The contour features of bonding wires and other electronic components in the depth image are analysed to complete the 3D reconstruction of the bonding wires. In particular, to filter the noisy point cloud and obtain an accurate point cloud of the bonding wire surface, a point cloud segmentation method based on spatial surface feature detection (SFD) is proposed; SFD extracts more distinct features from the bonding wire surface during the point cloud segmentation process. Furthermore, in the defect detection process, a directional discretisation descriptor with multiple local normal vectors is designed for defect pattern recognition of the bonding wires. The descriptor combines local and global features of the wire and can describe its spatial variation trends and structural features. The experimental results show that the method completes the 3D reconstruction and defect pattern recognition of bonding wires with an average defect-recognition accuracy of 96.47%, which meets the production requirements of bonding wire defect detection.
{"title":"3D reconstruction and defect pattern recognition of bonding wire based on stereo vision","authors":"Naigong Yu, Hongzheng Li, Qiao Xu, Ouattara Sie, Essaf Firdaous","doi":"10.1049/cit2.12240","DOIUrl":"10.1049/cit2.12240","url":null,"abstract":"<p>Non-destructive detection of wire bonding defects in integrated circuits (IC) is critical for ensuring product quality after packaging. Image-processing-based methods do not provide a detailed evaluation of the three-dimensional defects of the bonding wire. Therefore, a method of 3D reconstruction and pattern recognition of wire defects based on stereo vision, which can achieve non-destructive detection of bonding wire defects is proposed. The contour features of bonding wires and other electronic components in the depth image is analysed to complete the 3D reconstruction of the bonding wires. Especially to filter the noisy point cloud and obtain an accurate point cloud of the bonding wire surface, a point cloud segmentation method based on spatial surface feature detection (SFD) was proposed. SFD can extract more distinct features from the bonding wire surface during the point cloud segmentation process. Furthermore, in the defect detection process, a directional discretisation descriptor with multiple local normal vectors is designed for defect pattern recognition of bonding wires. The descriptor combines local and global features of wire and can describe the spatial variation trends and structural features of wires. The experimental results show that the method can complete the 3D reconstruction and defect pattern recognition of bonding wires, and the average accuracy of defect recognition is 96.47%, which meets the production requirements of bonding wire defect detection.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 2","pages":"348-364"},"PeriodicalIF":5.1,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12240","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85004744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nasir Saleem, Jiechao Gao, Rizwana Irfan, Ahmad Almadhor, Hafiz Tayyab Rauf, Yudong Zhang, Seifedine Kadry
Speech emotion recognition (SER) is an important research problem in human-computer interaction systems. The representation and extraction of features are significant challenges in SER systems. Despite the promising results of recent studies, they generally do not leverage progressive fusion techniques for effective feature representation and increased receptive fields. To mitigate this problem, this article proposes DeepCNN, which fuses spectral and temporal features of emotional speech by parallelising convolutional neural networks (CNNs) and a convolution-layer-based transformer. Two parallel CNNs are applied to extract spectral (2D-CNN) and temporal (1D-CNN) feature representations. A 2D-convolution-layer-based transformer module extracts spectro-temporal features and concatenates them with the features from the parallel CNNs. The learnt low-level concatenated features are then passed to a deep framework of convolutional blocks, which retrieves a high-level feature representation and subsequently categorises the emotional states using an attention gated recurrent unit and a classification layer. This fusion technique yields a deeper hierarchical feature representation at a lower computational cost while simultaneously expanding the filter depth and reducing the feature map size. The Berlin Database of Emotional Speech (EMO-BD) and Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets are used in experiments to recognise distinct speech emotions. With efficient spectral and temporal feature representation, the proposed SER model achieves 94.2% accuracy on the EMO-BD dataset and 81.1% accuracy on the IEMOCAP dataset. The proposed SER system, DeepCNN, outperforms baseline SER systems in terms of emotion recognition accuracy on both datasets.
{"title":"DeepCNN: Spectro-temporal feature representation for speech emotion recognition","authors":"Nasir Saleem, Jiechao Gao, Rizwana Irfan, Ahmad Almadhor, Hafiz Tayyab Rauf, Yudong Zhang, Seifedine Kadry","doi":"10.1049/cit2.12233","DOIUrl":"https://doi.org/10.1049/cit2.12233","url":null,"abstract":"<p>Speech emotion recognition (SER) is an important research problem in human-computer interaction systems. The representation and extraction of features are significant challenges in SER systems. Despite the promising results of recent studies, they generally do not leverage progressive fusion techniques for effective feature representation and increasing receptive fields. To mitigate this problem, this article proposes DeepCNN, which is a fusion of spectral and temporal features of emotional speech by parallelising convolutional neural networks (CNNs) and a convolution layer-based transformer. Two parallel CNNs are applied to extract the spectral features (2D-CNN) and temporal features (1D-CNN) representations. A 2D-convolution layer-based transformer module extracts spectro-temporal features and concatenates them with features from parallel CNNs. The learnt low-level concatenated features are then applied to a deep framework of convolutional blocks, which retrieves high-level feature representation and subsequently categorises the emotional states using an attention gated recurrent unit and classification layer. This fusion technique results in a deeper hierarchical feature representation at a lower computational cost while simultaneously expanding the filter depth and reducing the feature map. The Berlin Database of Emotional Speech (EMO-BD) and Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets are used in experiments to recognise distinct speech emotions. With efficient spectral and temporal feature representation, the proposed SER model achieves 94.2% accuracy for different emotions on the EMO-BD and 81.1% accuracy on the IEMOCAP dataset respectively. The proposed SER system, DeepCNN, outperforms the baseline SER systems in terms of emotion recognition accuracy on the EMO-BD and IEMOCAP datasets.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"8 2","pages":"401-417"},"PeriodicalIF":5.1,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12233","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50154670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yang Shi, Qiaowen Shi, Xinwei Cao, Bin Li, Xiaobing Sun, Dimitrios K. Gerontitis
Time-varying matrix inversion is an important field of matrix research in which many results have been obtained. When solving time-varying matrix inversion, disturbances inevitably exist; thus, a model that can suppress disturbances while solving the problem is required. In this paper, an advanced continuous-time recurrent neural network (RNN) model based on a double-integral RNN design formula is proposed for solving continuous time-varying matrix inversion, offering a superior disturbance-suppression property. For digital hardware applications, the corresponding advanced discrete-time RNN model is derived using discretisation formulas. Theoretical analysis demonstrates that both the advanced continuous-time RNN model and the corresponding advanced discrete-time RNN model have global and exponential convergence and are effective at suppressing different disturbances. Finally, illustrative experiments, including two numerical experiments and a practical experiment, demonstrate the effectiveness and superiority of the advanced discrete-time RNN model for solving discrete time-varying matrix inversion with disturbance suppression.
{"title":"An advanced discrete-time RNN for handling discrete time-varying matrix inversion: Form model design to disturbance-suppression analysis","authors":"Yang Shi, Qiaowen Shi, Xinwei Cao, Bin Li, Xiaobing Sun, Dimitrios K. Gerontitis","doi":"10.1049/cit2.12229","DOIUrl":"https://doi.org/10.1049/cit2.12229","url":null,"abstract":"<p>Time-varying matrix inversion is an important field of matrix research, and lots of research achievements have been obtained. In the process of solving time-varying matrix inversion, disturbances inevitably exist, thus, a model that can suppress disturbance while solving the problem is required. In this paper, an advanced continuous-time recurrent neural network (RNN) model based on a double integral RNN design formula is proposed for solving continuous time-varying matrix inversion, which has incomparable disturbance-suppression property. For digital hardware applications, the corresponding advanced discrete-time RNN model is proposed based on the discretisation formulas. As a result of theoretical analysis, it is demonstrated that the advanced continuous-time RNN model and the corresponding advanced discrete-time RNN model have global and exponential convergence performance, and they are excellent for suppressing different disturbances. Finally, inspiring experiments, including two numerical experiments and a practical experiment, are presented to demonstrate the effectiveness and superiority of the advanced discrete-time RNN model for solving discrete time-varying matrix inversion with disturbance-suppression.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"8 3","pages":"607-621"},"PeriodicalIF":5.1,"publicationDate":"2023-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12229","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50154054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Person image generation aims to generate images that preserve the original person's appearance in different target poses. Recent works have revealed that the critical element in achieving this task is the alignment of the appearance domain and the pose domain. Previous alignment methods, such as appearance flow warping, correspondence learning and cross attention, often struggle to produce fine texture details. These approaches either estimate appearance flows inaccurately because they lack a global receptive field, or can only perform cross-domain alignment on high-level feature maps with small spatial dimensions because the computational complexity grows quadratically with feature size. In this article, the significance of multi-scale alignment, in both low-level and high-level domains, for reliable cross-domain alignment of appearance and pose is demonstrated. To this end, a novel and effective method named Multi-scale Cross-domain Alignment (MCA) is proposed. First, MCA adopts a global context aggregation transformer to model multi-scale interaction between pose and appearance inputs, employing pair-wise window-based cross attention. Furthermore, leveraging the integrated global source information for each target position, MCA applies a flexible flow prediction head and point correlation to effectively perform warping and fusion for the final transformed person image. The proposed MCA achieves superior performance to other methods on two popular datasets, which verifies the effectiveness of the approach.
{"title":"Multi-scale cross-domain alignment for person image generation","authors":"Liyuan Ma, Tingwei Gao, Haibin Shen, Kejie Huang","doi":"10.1049/cit2.12224","DOIUrl":"10.1049/cit2.12224","url":null,"abstract":"<p>Person image generation aims to generate images that maintain the original human appearance in different target poses. Recent works have revealed that the critical element in achieving this task is the alignment of appearance domain and pose domain. Previous alignment methods, such as appearance flow warping, correspondence learning and cross attention, often encounter challenges when it comes to producing fine texture details. These approaches suffer from limitations in accurately estimating appearance flows due to the lack of global receptive field. Alternatively, they can only perform cross-domain alignment on high-level feature maps with small spatial dimensions since the computational complexity increases quadratically with larger feature sizes. In this article, the significance of multi-scale alignment, in both low-level and high-level domains, for ensuring reliable cross-domain alignment of appearance and pose is demonstrated. To this end, a novel and effective method, named Multi-scale Cross-domain Alignment (MCA) is proposed. Firstly, MCA adopts global context aggregation transformer to model multi-scale interaction between pose and appearance inputs, which employs pair-wise window-based cross attention. Furthermore, leveraging the integrated global source information for each target position, MCA applies flexible flow prediction head and point correlation to effectively conduct warping and fusing for final transformed person image generation. Our proposed MCA achieves superior performance on two popular datasets than other methods, which verifies the effectiveness of our approach.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 2","pages":"374-387"},"PeriodicalIF":5.1,"publicationDate":"2023-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12224","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74017388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jingna Si, Ziwei Tian, Dongmei Li, Lei Zhang, Lei Yao, Wenjuan Jiang, Jia Liu, Runshun Zhang, Xiaoping Zhang
Media convergence is a media transformation led by technological innovation. Applying media convergence technology to the study of clustering in traditional Chinese medicine (TCM) can significantly exploit the advantages of media fusion: obtaining consistent and complementary information across multiple modalities through media convergence provides technical support for clustering. This article presents an approach based on Media Convergence and Graph convolution Encoder Clustering (MCGEC) for TCM clinical data. It feeds modal information and the graph structure from media information into a multi-modal graph convolution encoder to obtain a media feature representation learnt from multiple modalities. MCGEC captures latent information from the various modalities by fusion and optimises the feature representations and network architecture with the learnt clustering labels. The experiments are conducted on real-world multi-modal TCM clinical data, including images and text. MCGEC improves clustering results compared with both generic single-modal clustering methods and more advanced multi-modal clustering methods. Integrating multimedia features into the clustering algorithm offers significant benefits over single-modal clustering approaches that simply concatenate features from different modalities, and provides practical technical support for multi-modal clustering in the TCM field.
{"title":"A multi-modal clustering method for traditonal Chinese medicine clinical data via media convergence","authors":"Jingna Si, Ziwei Tian, Dongmei Li, Lei Zhang, Lei Yao, Wenjuan Jiang, Jia Liu, Runshun Zhang, Xiaoping Zhang","doi":"10.1049/cit2.12230","DOIUrl":"https://doi.org/10.1049/cit2.12230","url":null,"abstract":"<p>Media convergence is a media change led by technological innovation. Applying media convergence technology to the study of clustering in Chinese medicine can significantly exploit the advantages of media fusion. Obtaining consistent and complementary information among multiple modalities through media convergence can provide technical support for clustering. This article presents an approach based on Media Convergence and Graph convolution Encoder Clustering (MCGEC) for traditonal Chinese medicine (TCM) clinical data. It feeds modal information and graph structure from media information into a multi-modal graph convolution encoder to obtain the media feature representation learnt from multiple modalities. MCGEC captures latent information from various modalities by fusion and optimises the feature representations and network architecture with learnt clustering labels. The experiment is conducted on real-world multi-modal TCM clinical data, including information like images and text. MCGEC has improved clustering results compared to the generic single-modal clustering methods and the current more advanced multi-modal clustering methods. MCGEC applied to TCM clinical datasets can achieve better results. Integrating multimedia features into clustering algorithms offers significant benefits compared to single-modal clustering approaches that simply concatenate features from different modalities. It provides practical technical support for multi-modal clustering in the TCM field incorporating multimedia features.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"8 2","pages":"390-400"},"PeriodicalIF":5.1,"publicationDate":"2023-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12230","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50141384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}