Jian Zhang, Yonghong Zhang, Shanshan Liu, Xuquan Ji, Sizhuo Liu, Zhuofu Li, Baoduo Geng, Weishi Li, Tianmiao Wang
Laminectomy is one of the most common posterior spinal operations. Because the lamina lies adjacent to important tissues such as nerves, damage to those tissues during cutting can cause serious complications and even paralysis. To prevent such injuries, ultrasonic bone scalpels and surgical robots have been introduced into spinal laminectomy, and many researchers have studied methods for recognising the state of the bone tissue. Currently, almost all such methods rely on signals collected by high-precision sensors installed at the end of the surgical robot. However, previous methods could not accurately identify the state of spinal bone tissue. Here, the identification of bone tissue status is treated as a time series classification task, and the LSTM-FCN classification algorithm is used to process fusion signals composed of force and cutting depth signals, achieving accurate classification of the lamina bone tissue status. The proposed method reached an accuracy of up to 98.85% in identifying the cutting state in porcine spinal laminectomy, and the maximum penetration distance could be kept within 0.6 mm, which the authors consider safe and usable in practice.
"Safety control strategy of spinal lamina cutting based on force and cutting depth signals." CAAI Transactions on Intelligence Technology, vol. 9, no. 4, pp. 894-902. Published 2024-04-26. DOI: 10.1049/cit2.12341 (https://doi.org/10.1049/cit2.12341). PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12341
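The abstract above treats bone-state recognition as time series classification over fused force and cutting depth signals. A minimal sketch of how such a fused, windowed input could be prepared is given below. The z-score normalisation, the window length, the stride and the function names are all assumptions made for illustration; the paper's actual preprocessing and its LSTM-FCN model are not reproduced here.

```python
from typing import List, Sequence, Tuple

# One fused sample per time step: (normalised force, normalised depth).
FusedWindow = List[Tuple[float, float]]


def zscore(values: Sequence[float]) -> List[float]:
    """Scale a signal to zero mean and unit variance (assumed preprocessing)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5 or 1.0  # a constant signal maps to all zeros
    return [(v - mean) / std for v in values]


def fuse_and_window(
    force: Sequence[float],
    depth: Sequence[float],
    window: int,
    stride: int,
) -> List[FusedWindow]:
    """Pair the two signals step by step and cut them into sliding windows.

    Each returned window is one candidate input for a time series
    classifier; the window and stride values are hypothetical.
    """
    if len(force) != len(depth):
        raise ValueError("force and depth must be sampled at the same time steps")
    fused = list(zip(zscore(force), zscore(depth)))
    last_start = len(fused) - window
    return [fused[s:s + window] for s in range(0, last_start + 1, stride)]
```

Each window would then receive one bone-state label; how the labels are assigned is specific to the paper's experimental protocol and is not assumed here.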
Zeyu Zhao, Nan Gao, Zhi Zeng, Guixuan Zhang, Jie Liu, Shuwu Zhang
Generating co-speech gestures for interactive digital humans remains challenging because of the non-deterministic nature of the problem. The authors observe that gestures generated from speech audio or text by existing neural methods often contain less variation in movement than expected, which makes them appear slow or dull. To address this, a new generative model coupled with memory networks acting as dynamic dictionaries is proposed for speech-driven gesture generation with improved diversity. More specifically, the dictionary network dynamically stores connections between text and pose features in a list of key-value pairs, which serves as the memory that the pose generation network looks up; the pose generation network then merges the matching pose features with the input audio features to generate the final pose sequences. To make the improvements more accurately measurable, a new objective evaluation metric for gesture diversity that removes the influence of low-quality motions is also proposed and tested. Quantitative and qualitative experiments demonstrate that the proposed architecture succeeds in generating gestures with improved diversity.
"Improving diversity of speech-driven gesture generation with memory networks as dynamic dictionaries." CAAI Transactions on Intelligence Technology, vol. 9, no. 5, pp. 1275-1289. Published 2024-04-22. DOI: 10.1049/cit2.12321. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12321
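The abstract above describes a memory of key-value pairs that the pose generation network looks up. One common way to read from such a memory is a soft, attention-style lookup, sketched below. The dot-product scoring, the softmax weighting and the function names are assumptions for illustration, not the authors' architecture.

```python
import math
from typing import List, Sequence, Tuple

# Memory entry: (key derived from text features, value holding pose features).
MemoryEntry = Tuple[Sequence[float], Sequence[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lookup(query: Sequence[float], memory: Sequence[MemoryEntry]) -> List[float]:
    """Return the attention-weighted mix of the stored values.

    Keys that align with the query (by dot product) contribute more of
    their value to the result.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key, _ in memory]
    weights = softmax(scores)
    dim = len(memory[0][1])
    return [
        sum(w * value[i] for w, (_, value) in zip(weights, memory))
        for i in range(dim)
    ]
```

In a full model the retrieved vector would be merged with audio features before decoding poses; that merging step is left out of this sketch.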
The rise of online-to-offline (O2O) e-commerce has brought tremendous opportunities to the logistics industry. In O2O logistics, it is essential to detect anomalous merchants with fraudulent shipping behaviours, such as sending other merchants' packages for profit using their own discounted rates. Detecting them helps reduce the financial losses of platforms and keeps the business environment healthy. Existing anomaly detection studies have mainly focused on online fraud, such as fraudulent purchase and comment behaviours in e-commerce. However, these methods are not suitable for anomalous merchant detection in logistics because package-sending behaviour involves more complex online and offline operations and because offline deployment in logistics requires interpretability. MultiDet, a semi-supervised multi-view fusion-based anomaly detection framework for O2O logistics, is proposed; it consists of a basic version, SemiDet, and an attention-enhanced multi-view fusion model. In SemiDet, pair-wise data augmentation is first conducted to improve model robustness and address the challenge of limited labelled anomaly instances. SemiDet then computes an anomaly score for each merchant with an auto-encoder framework. Considering the multiple relationships among logistics merchants, a multi-view attention fusion-based anomaly detection network is further designed to capture merchants' mutual influences and improve detection performance. A post-hoc perturbation-based interpretation model is designed to output the importance of different views and ensure the trustworthiness of end-to-end anomaly detection. The framework is evaluated on an eight-month real-world dataset collected from one of the largest logistics platforms in China, involving 6128 merchants and 16 million historical order consignor records in Beijing. Experimental results show that the proposed model outperforms the baselines on both the AUC-ROC and AUC-PR metrics.
"Trustworthy semi-supervised anomaly detection for online-to-offline logistics business in merchant identification." Yong Li, Shuhang Wang, Shijie Xu, Jiao Yin. CAAI Transactions on Intelligence Technology, vol. 9, no. 3, pp. 544-556. Published 2024-04-14. DOI: 10.1049/cit2.12301. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12301
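The evaluation above is reported in AUC-ROC and AUC-PR. As a reminder of what the first metric measures, the sketch below computes AUC-ROC directly from its pairwise definition: the probability that a randomly chosen anomalous merchant receives a higher anomaly score than a randomly chosen normal one, with ties counted as one half. The function name and the toy scores are illustrative only, and the quadratic loop is meant for clarity rather than for 6128 merchants' worth of production use.

```python
from typing import Sequence


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC from the pairwise (Mann-Whitney) definition.

    labels: 1 for anomalous, 0 for normal.
    scores: higher means more anomalous.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of anomalous and normal merchants.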
Syed Konain Abbas, Muhammad Usman Ghani Khan, Jia Zhu, Raheem Sarwar, Naif R. Aljohani, Ibrahim A. Hameed, Muhammad Umair Hassan
Transportation systems primarily depend on vehicular flow on roads. Developed countries have shifted towards automated signal control, which manages and updates signal synchronisation automatically. In contrast, traffic in underdeveloped countries is mainly governed by manual traffic light systems. Because these manual systems cannot make real-time decisions, they cause numerous problems and waste substantial resources such as time, energy, and fuel. In this work, we propose an algorithm that determines traffic signal durations from real-time vehicle density, obtained from live closed-circuit television (CCTV) camera feeds adjacent to traffic signals. The algorithm automates the traffic light system, making decisions based on vehicle density and employing Faster R-CNN for vehicle detection. Additionally, we have created a local dataset from live streams of Punjab Safe City cameras in collaboration with the local police authority. The proposed algorithm achieves a class accuracy of 96.6% and a vehicle detection accuracy of 95.7%. Across both day and night modes, the proposed method maintains an average precision, recall, F1 score, and vehicle detection accuracy of 0.94, 0.98, 0.96 and 0.95, respectively. The proposed method outperforms state-of-the-art methods on all evaluation metrics.
"Vision based intelligent traffic light management system using Faster R-CNN." CAAI Transactions on Intelligence Technology, vol. 9, no. 4, pp. 932-947. Published 2024-04-10. DOI: 10.1049/cit2.12309. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12309
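The abstract above sets signal durations from real-time vehicle density. The paper's actual timing rule is not given in the abstract, so the sketch below uses a simple proportional rule as a stand-in: every approach gets a guaranteed minimum green time, and the rest of the cycle is shared in proportion to the detected vehicle counts. The cycle length, the minimum green time and the function name are hypothetical; in the described system the counts would come from a detector such as Faster R-CNN.

```python
from typing import List, Sequence


def green_durations(
    vehicle_counts: Sequence[int],
    cycle_seconds: float = 120.0,  # hypothetical total cycle length
    min_green: float = 10.0,       # hypothetical guaranteed green per approach
) -> List[float]:
    """Split one signal cycle across approaches in proportion to density."""
    approaches = len(vehicle_counts)
    if approaches == 0:
        raise ValueError("at least one approach is required")
    if any(count < 0 for count in vehicle_counts):
        raise ValueError("vehicle counts cannot be negative")
    spare = cycle_seconds - approaches * min_green
    if spare < 0:
        raise ValueError("cycle is too short for the minimum green times")
    total = sum(vehicle_counts)
    if total == 0:
        # Empty junction: fall back to an even split.
        return [cycle_seconds / approaches] * approaches
    return [min_green + spare * count / total for count in vehicle_counts]
```

For example, counts of 30, 10, 0 and 0 vehicles on four approaches would give green times of 70, 30, 10 and 10 seconds under these assumed settings.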
Chunwei Tian, Xuanyu Zhang, Qi Zhang, Mingming Yang, Zhaojie Ju
Convolutional neural networks depend on deep network architectures to extract accurate information for image super-resolution. However, the information obtained by these networks cannot fully represent the predicted high-quality images for complex scenes. A dynamic network for image super-resolution (DSRNet) is presented, which contains a residual enhancement block, a wide enhancement block, a feature refinement block and a construction block. The residual enhancement block uses a residual enhanced architecture to facilitate hierarchical features for image super-resolution. To improve the robustness of the obtained super-resolution model for complex scenes, the wide enhancement block uses a dynamic architecture to learn more robust information, which improves the model's applicability to varying scenes. To prevent interference between components in the wide enhancement block, the refinement block uses a stacked architecture to accurately learn the obtained features. A residual learning operation is also embedded in the refinement block to prevent the long-term dependency problem. Finally, the construction block reconstructs high-quality images. The designed heterogeneous architecture not only facilitates richer structural information but is also lightweight, making it suitable for mobile digital devices. Experimental results show that the method is competitive in terms of performance, super-resolution recovery time and complexity. The code of DSRNet can be obtained at https://github.com/hellloxiaotian/DSRNet.
"Image super-resolution via dynamic network." CAAI Transactions on Intelligence Technology, vol. 9, no. 4, pp. 837-849. Published 2024-04-08. DOI: 10.1049/cit2.12297 (https://doi.org/10.1049/cit2.12297). PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12297
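The abstract above mentions a residual learning operation embedded in the refinement block. In its generic form, residual learning adds a block's input back onto its output, so the block only has to learn a correction. The sketch below shows that idea on plain lists of floats; it is not DSRNet code, and the `residual` helper is a hypothetical name. The authors' implementation is in the repository linked in the abstract.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]
Block = Callable[[Features], Features]


def residual(block: Block) -> Callable[[Features], List[float]]:
    """Wrap a feature transform F so that it computes x + F(x)."""

    def wrapped(x: Features) -> List[float]:
        fx = block(x)
        if len(fx) != len(x):
            raise ValueError("a skip connection needs matching input and output sizes")
        return [a + b for a, b in zip(x, fx)]

    return wrapped
```

If the wrapped block outputs zeros, the input passes through unchanged, which is why skip connections help preserve information across many stacked layers.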
Drug-drug interaction (DDI) prediction is a crucial issue in molecular biology. Traditional methods of observing drug-drug interactions through medical experiments require significant resources and labour. The authors present a Medical Knowledge Graph Question Answering model, dubbed MedKGQA, that predicts DDI by applying machine reading comprehension (MRC) to closed-domain literature and constructing a knowledge graph of “drug-protein” triplets from open-domain documents. The model vectorises the drug-protein target attributes in the graph using entity embeddings and establishes directed connections between drug and protein entities based on the metabolic interaction pathways of protein targets in the human body. This aligns multiple sources of external knowledge and uses them to train the graph neural network. Without bells and whistles, the proposed model achieved a 4.5% improvement in DDI prediction accuracy over previous state-of-the-art models on the QAngaroo MedHop dataset. Experimental results demonstrate the efficiency and effectiveness of the model and verify the feasibility of integrating external knowledge in MRC tasks.
"Medical knowledge graph question answering for drug-drug interaction prediction based on multi-hop machine reading comprehension." Peng Gao, Feng Gao, Jian-Cheng Ni, Yu Wang, Fei Wang, Qiquan Zhang. CAAI Transactions on Intelligence Technology, vol. 9, no. 5, pp. 1217-1228. Published 2024-04-02. DOI: 10.1049/cit2.12332. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12332
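The abstract above builds a knowledge graph from "drug-protein" triplets with directed connections between drug and protein entities. A minimal sketch of that data structure is shown below, together with a simple two-hop query that finds proteins targeted by both drugs in a pair. The entity names, the relation label and the helper functions are placeholders, and the query is only an illustration of multi-hop reasoning over the graph, not the MedKGQA model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triplet = Tuple[str, str, str]  # (head entity, relation, tail entity)
Graph = Dict[str, Set[Tuple[str, str]]]


def build_graph(triplets: Iterable[Triplet]) -> Graph:
    """Store each triplet as a directed, labelled edge head -> tail."""
    graph: Graph = defaultdict(set)
    for head, relation, tail in triplets:
        graph[head].add((relation, tail))
    return graph


def shared_targets(graph: Graph, drug_a: str, drug_b: str) -> Set[str]:
    """Proteins reachable in one hop from both drugs (drug -> protein <- drug)."""
    targets_a = {tail for _, tail in graph.get(drug_a, ())}
    targets_b = {tail for _, tail in graph.get(drug_b, ())}
    return targets_a & targets_b
```

A shared target is only a hint that two drugs might interact; the paper's model learns from entity embeddings and a graph neural network rather than from this kind of set intersection.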
The support vector machine (SVM) is a binary classifier widely used in machine learning. However, previous SVMs neglect the latent structure of the data, which can limit the performance of the SVM and its extensions. To address this issue, the authors propose a novel SVM with discriminative low-rank embedding (LRSVM) that finds a discriminative latent low-rank subspace more suitable for SVM classification. Extension models of LRSVM are introduced by imposing different orthogonality constraints to prevent computational inaccuracies. A detailed derivation of the authors’ iterative algorithms, which essentially solve the SVM on the low-rank subspace, is given. Additionally, some theorems and properties of the proposed models are presented. It is worth mentioning that the subproblems of the proposed algorithms are equivalent to standard or weighted linear discriminant analysis (LDA) problems. This indicates that the projection subspaces obtained by the authors’ algorithms are more suitable for SVM classification than those from the LDA method. A convergence analysis of the proposed algorithms is also provided. Furthermore, the authors conduct experiments on various machine learning data sets to evaluate the algorithms. The experimental results show that the authors’ algorithms perform significantly better than other algorithms, indicating their superior ability on classification tasks.
"Support vector machine with discriminative low-rank embedding." Guangfei Liang, Zhihui Lai, Heng Kong. CAAI Transactions on Intelligence Technology, vol. 9, no. 5, pp. 1249-1262. Published 2024-04-02. DOI: 10.1049/cit2.12329. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12329
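The abstract above describes solving an SVM on a learned low-rank subspace. To make the idea concrete, the sketch below evaluates the standard hinge loss of a linear classifier after projecting each sample with a matrix `P`. This is a generic illustration only: the symbols `P`, `w` and `b` and the function names are assumptions, and the LRSVM objective, its orthogonality constraints and its iterative solver are not reproduced.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # d rows, k columns, with k < d


def project(P: Matrix, x: Sequence[float]) -> List[float]:
    """Map a d-dimensional sample to k dimensions via z = P^T x."""
    k = len(P[0])
    return [sum(P[i][j] * x[i] for i in range(len(x))) for j in range(k)]


def hinge_loss(
    w: Sequence[float],
    b: float,
    P: Matrix,
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],  # each label is +1 or -1
) -> float:
    """Mean hinge loss max(0, 1 - y (w . z + b)) in the projected subspace."""
    total = 0.0
    for x, y in zip(samples, labels):
        z = project(P, x)
        margin = y * (sum(wi * zi for wi, zi in zip(w, z)) + b)
        total += max(0.0, 1.0 - margin)
    return total / len(samples)
```

A method of this kind would alternate between updating the classifier `(w, b)` and the projection `P`; the update rules are what the paper derives and are deliberately left out here.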
Image classification algorithms are commonly based on the independent and identically distributed (i.i.d.) assumption, but in practice the out-of-distribution (OOD) problem is widespread: the contexts of images seen at prediction time are often unseen during training. In this case, existing models trained under the i.i.d. assumption generalise poorly. Causal inference is an important method for learning causal associations that are invariant across environments, thereby improving the generalisation ability of the model. However, existing methods usually require partitioning the data into environments to learn invariant features, and these partitions are mostly imbalanced owing to a lack of constraints. In this paper, we propose a balanced causal learning framework (BCL) that addresses both how to divide the dataset in a balanced way and how to balance training after the division. BCL automatically generates fine-grained balanced data partitions in an unsupervised manner and balances the training difficulty of different classes, thereby enhancing the generalisation ability of models in different environments. Experiments on the OOD datasets NICO and NICO++ demonstrate that BCL achieves stable predictions on OOD data. We also find that models using BCL focus more accurately on the foreground of images than models trained with the existing causal inference method, which effectively improves generalisation.
"Causal inference for out-of-distribution recognition via sample balancing." Yuqing Wang, Xiangxian Li, Yannan Liu, Xiao Cao, Xiangxu Meng, Lei Meng. CAAI Transactions on Intelligence Technology, vol. 9, no. 5, pp. 1172-1184. Published 2024-04-02. DOI: 10.1049/cit2.12311. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12311
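The abstract above argues that environment partitions produced without constraints tend to be imbalanced. One simple way to quantify that imbalance is sketched below: count the samples in every (class, environment) cell and compare the largest cell with the smallest. This diagnostic, its name and the toy labels are assumptions for illustration; it is not part of BCL, which generates its partitions in an unsupervised manner.

```python
from collections import Counter
from typing import Hashable, Sequence


def imbalance_ratio(
    labels: Sequence[Hashable],
    environments: Sequence[Hashable],
) -> float:
    """Largest (class, environment) cell size divided by the smallest.

    Returns 1.0 for a perfectly balanced partition and infinity when
    some class is missing from some environment.
    """
    if len(labels) != len(environments):
        raise ValueError("one environment id is required per sample")
    cells = Counter(zip(labels, environments))
    sizes = [
        cells[(label, env)]
        for label in set(labels)
        for env in set(environments)
    ]
    smallest = min(sizes)
    if smallest == 0:
        return float("inf")
    return max(sizes) / smallest
```

A ratio far above 1.0 means some classes dominate certain environments, which is the situation the balanced partitioning in the abstract is meant to avoid.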
Rihao Chang, Yongtao Ma, Tong Hao, Weijie Wang, Weizhi Nie
The surge in 3D modelling has led to a pronounced research emphasis on 3D shape retrieval, and numerous approaches have been proposed. Nevertheless, cross-modal 3D shape retrieval remains difficult because of inherent disparities between modalities. The authors introduce the notion of “geometric words”, which serve as elementary constituents that represent entities through combinations. To build the knowledge graph, the authors use geometric words as nodes and connect them via shape categories and geometry attributes. A graph embedding method is then devised for knowledge acquisition, and an effective similarity measure is introduced for retrieval. Importantly, each 3D or 2D entity can anchor its geometric words within the knowledge graph, thereby serving as a link between cross-domain data. As a result, the authors’ approach supports multiple cross-domain 3D shape retrieval tasks. The authors evaluate the proposed method on the ModelNet40 and ShapeNetCore55 datasets, covering 3D shape retrieval and cross-domain retrieval scenarios, and use the established cross-modal dataset MI3DOR to assess cross-modal 3D shape retrieval. The experimental results, together with comparisons against state-of-the-art techniques, demonstrate the superiority of the authors’ approach.
{"title":"3D shape knowledge graph for cross-domain 3D shape retrieval","authors":"Rihao Chang, Yongtao Ma, Tong Hao, Weijie Wang, Weizhi Nie","doi":"10.1049/cit2.12326","DOIUrl":"https://doi.org/10.1049/cit2.12326","url":null,"abstract":"<p>The surge in 3D modelling has led to a pronounced research emphasis on the field of 3D shape retrieval. Numerous contemporary approaches have been put forth to tackle this intricate challenge. Nevertheless, effectively addressing the intricacies of cross-modal 3D shape retrieval remains a formidable undertaking, owing to inherent modality-based disparities. The authors present an innovative notion—termed “geometric words”—which functions as elemental constituents for representing entities through combinations. To establish the knowledge graph, the authors employ geometric words as nodes, connecting them via shape categories and geometry attributes. Subsequently, a unique graph embedding method for knowledge acquisition is devised. Finally, an effective similarity measure is introduced for retrieval purposes. Importantly, each 3D or 2D entity can anchor its geometric terms within the knowledge graph, thereby serving as a link between cross-domain data. As a result, the authors’ approach facilitates multiple cross-domain 3D shape retrieval tasks. The authors evaluate the proposed method's performance on the ModelNet40 and ShapeNetCore55 datasets, encompassing scenarios related to 3D shape retrieval and cross-domain retrieval. Furthermore, the authors employ the established cross-modal dataset (MI3DOR) to assess cross-modal 3D shape retrieval. 
The resulting experimental outcomes, in conjunction with comparisons against state-of-the-art techniques, clearly highlight the superiority of the authors’ approach.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 5","pages":"1199-1216"},"PeriodicalIF":8.4,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12326","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142561695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zong-Shan Wang, Shi-Jin Li, Hong-Wei Ding, Gaurav Dhiman, Peng Hou, Ai-Shan Li, Peng Hu, Zhi-Jun Yang, Jie Wang
The Equilibrium Optimiser (EO) has been demonstrated to be one of the metaheuristic algorithms that can effectively solve global optimisation problems. Balancing the paradox between exploration and exploitation operations while enhancing the ability to jump out of the local optimum are two key points to be addressed in EO research. To alleviate these limitations, an EO variant named adaptive elite-guided Equilibrium Optimiser (AEEO) is introduced. Specifically, the adaptive elite-guided search mechanism enhances the balance between exploration and exploitation. The modified mutualism phase reinforces the information interaction among particles and local optima avoidance. The cooperation of these two mechanisms boosts the overall performance of the basic EO. The AEEO is subjected to competitive experiments with state-of-the-art algorithms and modified algorithms on 23 classical benchmark functions and the IEEE CEC 2017 function test suite. Experimental results demonstrate that AEEO outperforms several well-performing EO variants, DE variants, PSO variants, SSA variants, and GWO variants in terms of convergence speed and accuracy. In addition, the AEEO algorithm is used for the edge server (ES) placement problem in mobile edge computing (MEC) environments. The experimental results show that the authors’ approach outperforms the representative approaches compared in terms of access latency and deployment cost.
{"title":"Elite-guided equilibrium optimiser based on information enhancement: Algorithm and mobile edge computing applications","authors":"Zong-Shan Wang, Shi-Jin Li, Hong-Wei Ding, Gaurav Dhiman, Peng Hou, Ai-Shan Li, Peng Hu, Zhi-Jun Yang, Jie Wang","doi":"10.1049/cit2.12316","DOIUrl":"10.1049/cit2.12316","url":null,"abstract":"<p>The Equilibrium Optimiser (EO) has been demonstrated to be one of the metaheuristic algorithms that can effectively solve global optimisation problems. Balancing the paradox between exploration and exploitation operations while enhancing the ability to jump out of the local optimum are two key points to be addressed in EO research. To alleviate these limitations, an EO variant named adaptive elite-guided Equilibrium Optimiser (AEEO) is introduced. Specifically, the adaptive elite-guided search mechanism enhances the balance between exploration and exploitation. The modified mutualism phase reinforces the information interaction among particles and local optima avoidance. The cooperation of these two mechanisms boosts the overall performance of the basic EO. The AEEO is subjected to competitive experiments with state-of-the-art algorithms and modified algorithms on 23 classical benchmark functions and the IEEE CEC 2017 function test suite. Experimental results demonstrate that AEEO outperforms several well-performing EO variants, DE variants, PSO variants, SSA variants, and GWO variants in terms of convergence speed and accuracy. In addition, the AEEO algorithm is used for the edge server (ES) placement problem in mobile edge computing (MEC) environments. 
The experimental results show that the authors’ approach outperforms the representative approaches compared in terms of access latency and deployment cost.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 5","pages":"1126-1171"},"PeriodicalIF":8.4,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12316","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140770093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}