Pub Date: 2024-08-14 | DOI: 10.1007/s40747-024-01602-0
Shaoguang Zhang, Jianguang Lu, Xianghong Tang
In molecular biology, graph representation learning is crucial for molecular structure analysis. However, a lack of spatial structure information makes it difficult to recognise functional groups and distinguish isomers. To address these problems, we design a novel graph representation learning method based on a spatial structure information extraction Transformer (SSET). The SSET model comprises the Edge Feature Fusion Subgraph Spatial Structure Extractor (ETSE) module and the Positional Information Encoding Graph Transformer (PEGT) module. The ETSE module extracts spatial structural information by fusing edge features and generating the most-value subgraph (Mv-subgraph). The PEGT module, built on the graph transformer, encodes positional information to resolve the indistinguishability of nodes with identical features. In addition, the SSET model reduces computational complexity by operating on subgraphs. Experiments on real datasets show that the SSET model, built on the graph transformer, considerably improves graph representation learning.
Title: Molecular subgraph representation learning based on spatial structure transformer. Journal: Complex & Intelligent Systems.
Pub Date: 2024-08-14 | DOI: 10.1007/s40747-024-01572-3
Likun Zhang, Jinbao Li, Benqian Zhang, Yahong Guo
A multi-exit network is an important technique for achieving adaptive inference by dynamically allocating computational resources based on different input samples. Existing works mainly treat the final classifier as the teacher, enhancing classification accuracy by transferring its knowledge to the intermediate classifiers. However, this traditional self-distillation training strategy only utilizes the knowledge contained in the final classifier, neglecting potentially distinctive knowledge in the other classifiers. To address this limitation, we propose a novel multi-level collaborative self-distillation learning strategy (MLCSD) that extracts knowledge from all the classifiers. MLCSD learns the weight coefficient of each classifier's contribution during training, thus constructing a more comprehensive and effective teacher tailored to each classifier. These new teachers transfer the knowledge back to each classifier through distillation, thereby further improving the network's inference efficiency. We conduct experiments on three datasets: CIFAR10, CIFAR100, and Tiny-ImageNet. Compared with the baseline network that employs traditional self-distillation, our MLCSD-Net based on ResNet18 improves the average classification accuracy by 1.18%. The experimental results demonstrate that MLCSD-Net improves the inference efficiency of adaptive inference applications, such as anytime prediction and budgeted batch classification. Code is available at https://github.com/deepzlk/MLCSD-Net.
Title: A multi-level collaborative self-distillation learning for improving adaptive inference efficiency. Journal: Complex & Intelligent Systems.
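The collaborative self-distillation idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the exit logits, the temperature, and the per-exit weight vectors are hypothetical values chosen for illustration, and the weights are fixed here whereas the abstract says MLCSD learns them during training. The sketch builds one teacher per exit as a weighted average of all exits' softened predictions and measures a KL distillation loss for each exit.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def build_teacher(exit_probs, weights):
    """Weighted average of every exit's probability vector."""
    norm = sum(weights)
    n_classes = len(exit_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, exit_probs)) / norm
        for c in range(n_classes)
    ]

def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student) between two probability vectors."""
    return sum(t * math.log((t + eps) / (s + eps)) for t, s in zip(teacher, student))

# Hypothetical logits from three exits of a multi-exit network (3 classes).
exit_logits = [[1.0, 0.5, 0.2], [2.0, 0.3, 0.1], [3.0, 0.2, 0.1]]
temperature = 2.0  # assumed softening temperature
exit_probs = [softmax(logits, temperature) for logits in exit_logits]

# One weight vector per exit, so each exit gets its own teacher.
# Assumption: fixed values stand in for the learned coefficients.
teacher_weights = [[0.2, 0.3, 0.5], [0.1, 0.3, 0.6], [0.1, 0.2, 0.7]]
teachers = [build_teacher(exit_probs, w) for w in teacher_weights]

# Distillation loss that would pull each exit towards its tailored teacher.
distill_losses = [kl_divergence(t, p) for t, p in zip(teachers, exit_probs)]
```

In a real training loop these losses would be added to the usual cross-entropy terms, and the weight vectors would be trainable parameters rather than constants.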
Pub Date: 2024-08-14 | DOI: 10.1007/s40747-024-01573-2
Kang Haiyan, Wang Jiakang
With the rapid growth of big data, extracting meaningful knowledge from data is crucial for machine learning. Existing Swarm Learning data collaboration models face challenges such as data security, model security, high communication overhead, and model performance optimization. To address these challenges, we propose Swarm Mutual Learning (SML). Firstly, we introduce an Adaptive Mutual Distillation Algorithm that dynamically controls the learning intensity based on distillation weights and strength, enhancing the efficiency of knowledge extraction and transfer during mutual distillation. Secondly, we design a Global Parameter Aggregation Algorithm based on homomorphic encryption, coupled with a Dynamic Gradient Decomposition Algorithm using singular value decomposition. This allows the model to aggregate parameters in ciphertext while significantly reducing communication overhead during uploads and downloads. Finally, we validate the proposed methods on real datasets, demonstrating their effectiveness and efficiency in model updates. On the MNIST and CIFAR-10 datasets, the local model accuracies reached 95.02% and 55.26%, respectively, surpassing those of the comparative models. Furthermore, while ensuring the security of the aggregation process, we significantly reduced the communication overhead for uploading and downloading.
Title: Swarm mutual learning. Journal: Complex & Intelligent Systems.
Pub Date: 2024-08-14 | DOI: 10.1007/s40747-024-01601-1
He Yang, Cong Jiang, Yun Song, Wendong Fan, Zelin Deng, Xinke Bai
Traffic prediction is crucial to intelligent transportation systems. However, accurate traffic prediction still faces challenges: it is difficult to extract the dynamic spatial–temporal correlations of traffic flow and to capture the specific traffic pattern of each sub-region. In this paper, a temporal attention recurrent graph convolutional neural network (TARGCN) is proposed to address these issues. The proposed TARGCN model fuses a node-embedded graph convolutional (Emb-GCN) layer, a gated recurrent unit (GRU) layer, and a temporal attention (TA) layer into one framework to exploit both dynamic spatial correlations between traffic nodes and temporal dependencies between time slices. In the Emb-GCN layer, node embedding matrix and node parameter learning techniques are employed to extract spatial correlations between traffic nodes at a fine-grained level and to learn the specific traffic pattern of each node. Following this, a series of gated recurrent units are stacked as a GRU layer to simultaneously capture spatial and temporal features from the traffic flow of adjacent nodes in the past few time slices. Furthermore, an attention layer is applied in the temporal dimension to extend the receptive field of the GRU. The combination of the Emb-GCN, GRU, and TA layers enables the proposed framework to exploit not only the spatial–temporal dependencies but also the degree of interconnectedness between traffic nodes, which substantially benefits prediction. Experiments on public traffic datasets PEMSD4 and PEMSD8 demonstrate the effectiveness of the proposed method. Compared with state-of-the-art baselines, it achieves average improvements of 4.62% and 5.78% on PEMS03, 3.08% and 0.37% on PEMSD4, and 5.08% and 0.28% on PEMSD8. For long-term prediction in particular, results for the 60-min interval show that the proposed method has a more notable advantage over the compared benchmarks. The PyTorch implementation is publicly available at https://github.com/csust-sonie/TARGCN.
Title: TARGCN: temporal attention recurrent graph convolutional neural network for traffic prediction. Journal: Complex & Intelligent Systems.
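To make the node-embedding idea in the abstract above concrete, here is a minimal sketch of one common way to derive a graph from learned node embeddings: a row-normalised adjacency softmax(ReLU(E Eᵀ)) followed by a single propagation step. The abstract does not give TARGCN's exact formula, so this construction, the embedding values, and the function names are assumptions for illustration only.

```python
import math

def adaptive_adjacency(embeddings):
    """Row-normalised adjacency softmax(relu(E @ E^T)) from node embeddings."""
    n = len(embeddings)
    scores = [
        [max(0.0, sum(a * b for a, b in zip(embeddings[i], embeddings[j])))
         for j in range(n)]
        for i in range(n)
    ]
    adjacency = []
    for row in scores:
        peak = max(row)
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        adjacency.append([e / total for e in exps])
    return adjacency

def graph_conv(adjacency, features, weight):
    """One propagation step: A @ X @ W, written with plain lists."""
    n_nodes, n_in, n_out = len(features), len(weight), len(weight[0])
    mixed = [
        [sum(adjacency[i][k] * features[k][f] for k in range(n_nodes))
         for f in range(n_in)]
        for i in range(n_nodes)
    ]
    return [
        [sum(row[f] * weight[f][o] for f in range(n_in)) for o in range(n_out)]
        for row in mixed
    ]

# Hypothetical embeddings for three traffic nodes; nodes 0 and 1 are similar.
E = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
X = [[10.0], [12.0], [30.0]]   # one flow reading per node
W = [[1.0]]                    # identity weight keeps the example readable

A = adaptive_adjacency(E)
H = graph_conv(A, X, W)
```

Because each row of A sums to one, every output in H is a weighted average of the node readings, with similar nodes contributing more to each other.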
Pub Date: 2024-08-14 | DOI: 10.1007/s40747-024-01580-3
Lang Zhang, Zhan Ao Huang, Canghong Shi, Hongjiang Ma, Xiaojie Li, Xi Wu
Prohibited item detection is crucial for the safety of public places. Deep learning, one of the mainstream methods in prohibited item detection tasks, has shown performance far beyond traditional prohibited item detection methods. However, most neural network architectures in deep learning still lack sufficient local feature representation ability for overlapping and small targets, and ignore the semantic conflicts caused by direct feature fusion. In this paper, we propose MFPIDet, a novel prohibited item detection neural network architecture based on an improved YOLOV7, to achieve reliable prohibited item detection in complex environments. Specifically, a multi-scale attention module (MAM) backbone is proposed to filter the redundant information of target regions and to enhance the local feature representation ability of overlapping objects. Here, to reduce the redundant information of target regions, a squeeze-excitation (SE) block is used to filter the background. Then, to enhance the feature expression ability of overlapping objects, a multi-scale feature extraction module (MFEM) is designed for local feature representation. In addition, to obtain richer context information, we design an adaptive fusion feature pyramid network (AF-FPN) that combines the adaptive context information fusion module (ACIFM) with the feature fusion module (FFM) to improve the neck structure of YOLOV7. The proposed method is validated on the PIDray dataset, and the test results show that our method obtains the highest mAP (68.7%), a 3.5% improvement over YOLOV7. Our approach provides a new design pattern for prohibited item detection in complex environments and shows the development potential of deep learning in related fields.
Title: MFPIDet: improved YOLOV7 architecture based on multi-scale feature fusion for prohibited item detection in complex environment. Journal: Complex & Intelligent Systems.
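The squeeze-excitation (SE) block mentioned in the abstract above is a general channel-attention technique, and a minimal sketch may help show what "filtering" channels means. This is the generic SE computation (global average pooling, two fully connected layers, sigmoid gating), not the MFPIDet code. The tiny feature map and the weight matrices W1 and W2 are hypothetical, and bias terms are omitted for brevity.

```python
import math

def se_block(feature_map, w1, w2):
    """Generic squeeze-and-excitation over a list of C channels (each H x W).

    w1 has shape (C/r) x C and w2 has shape C x (C/r); biases are omitted.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields a gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return scaled, gates

# Hypothetical 2-channel, 2x2 feature map.
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],   # channel 0, mean 2.5
    [[0.0, 0.0], [0.0, 0.0]],   # channel 1, mean 0.0
]
W1 = [[1.0, 1.0]]        # reduces 2 channels to 1 hidden unit
W2 = [[1.0], [-1.0]]     # expands back to 2 channel gates

out, gates = se_block(fmap, W1, W2)
```

With these illustrative weights the first channel receives a gate close to one and the second a gate close to zero, which is the mechanism by which an SE block can suppress uninformative channels.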
Pub Date: 2024-08-13 | DOI: 10.1007/s40747-024-01574-1
Shangkun Liu, Minghao Zou, Ning Liu, Yanxin Li, Weimin Zheng
The success of current deep learning models depends on a large number of precise labels. However, in medical image segmentation, acquiring precise labels is labor-intensive and time-consuming. Hence, the challenge of training a high-performance model on datasets containing noisy labels has attracted significant research interest. Some existing methods are unable to exclude samples containing noisy labels, and others still place high requirements on datasets. To solve this problem, we propose a noisy label learning method for medical image segmentation that uses a mixture of high- and low-quality labels and is based on the mean teacher architecture. Firstly, considering the teacher model's capacity to aggregate all previously learned information after each training step, we propose to leverage the teacher model to adaptively correct noisy labels during the training phase. Secondly, to enhance the model's robustness, we propose to inject feature perturbations into the student model. This strategy aims to bolster the model's ability to handle variations in input data and improve its resilience to noisy labels. Finally, we simulate noisy labels by corrupting labels in two medical image datasets: the Automated Cardiac Diagnosis Challenge (ACDC) dataset and the 3D Left Atrium (LA) dataset. Experiments show that the proposed method is effective. With a noise ratio of 0.8, the mean Dice score of our proposed method is 2.58% and 0.31% higher than that of other methods on the ACDC and LA datasets, respectively.
Title: A teacher-guided early-learning method for medical image segmentation from noisy labels. Journal: Complex & Intelligent Systems.
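Two standard ingredients named in the abstract above, the mean teacher update and the Dice score, can be sketched in a few lines. This is a generic illustration under assumed values, not the paper's method: the flat parameter lists, the smoothing factor alpha, and the toy masks are hypothetical, and the adaptive label-correction rule itself is not reproduced because the abstract does not specify it.

```python
def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher update: teacher <- alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]

def dice_score(pred, target, eps=1e-8):
    """Dice coefficient between two flattened binary masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)

# Hypothetical flattened parameters; alpha=0.5 only to make the numbers easy to follow.
teacher_params = [0.0, 0.0]
student_params = [1.0, -1.0]
for _ in range(3):
    teacher_params = ema_update(teacher_params, student_params, alpha=0.5)

# Toy segmentation masks: one true positive, one false positive.
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 0, 0]
dice = dice_score(pred_mask, true_mask)
```

Because the teacher is an exponential moving average of the student, it changes slowly and accumulates information from earlier training steps, which is the property the abstract relies on when using the teacher to correct noisy labels.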
Pub Date: 2024-08-13 | DOI: 10.1007/s40747-024-01598-7
Zhijun Fu, Bao Ma, Dengfeng Zhao, Yuming Yin
This study is the first devoted to seeking an online optimal tracking solution for unknown nonlinear singularly perturbed systems based on a single network adaptive critic (SNAC) design. Firstly, a novel identifier based on a more efficient parametric multi-time scales differential neural network (PMTSDNN) is developed to obtain the unknown system dynamics. Then, based on the identification results, an online optimal tracking controller consisting of an adaptive steady control term and an optimal feedback control term is developed by using SNAC to solve the Hamilton–Jacobi–Bellman (HJB) equation online. A new learning law that takes the filtered parameter identification error into account is developed for the PMTSDNN identifier and the SNAC, realizing online synchronous learning and fast convergence. A Lyapunov approach is used to ensure the convergence of the overall closed-loop system consisting of the PMTSDNN identifier, the SNAC, and the optimal tracking control policy. Three examples are provided to illustrate the effectiveness of the investigated method.
Title: Online optimal tracking control of unknown nonlinear singularly perturbed systems using single network adaptive critic with improved learning. Journal: Complex & Intelligent Systems.
Named Entity Recognition (NER) is fundamental in natural language processing and involves identifying entity spans and types within a sentence. In nested NER, entities contain other entities, which poses a significant challenge that is especially pronounced for medical named entities because of the intricate nesting patterns inherent in medical terminology. Existing studies cannot capture interdependencies among different entity categories, resulting in inadequate performance on nested NER tasks. To address this problem, we propose a novel Layer-based architecture with Segmentation-aware Relational Graph Convolutional Network (LSRGCN) for nested NER in the medical domain. LSRGCN comprises two key modules: a shared segmentation-aware encoder and a multi-layer conditional random field decoder. The former provides token representations that include boundary information from sentence segmentation. The latter learns the connections between different entity classes and improves recognition accuracy through secondary decoding. We conduct experiments on four datasets. Experimental results demonstrate the effectiveness of our model. Additionally, extensive studies are conducted to enhance our understanding of the model and its capabilities.
Title: Segmentation-aware relational graph convolutional network with multi-layer CRF for nested named entity recognition. Authors: Daojun Han, Zemin Wang, Yunsong Li, Xiangbo Ma, Juntao Zhang. Pub Date: 2024-08-10 | DOI: 10.1007/s40747-024-01551-8. Journal: Complex & Intelligent Systems.
Pub Date: 2024-08-10 | DOI: 10.1007/s40747-024-01578-x
Lang Xiong, Liyun Su, Shiyi Zeng, Xiangjing Li, Tong Wang, Feng Zhao
Spatial–temporal data is widely available in intelligent transportation systems, and accurately handling non-stationarity in spatial–temporal regression is critical. In most traffic flow prediction research, non-stationary deep spatial–temporal regression is typically formulated as a spatial–temporal graph modeling problem. However, several issues remain: (1) coupled spatial–temporal regression makes it infeasible to accurately learn the dependencies of diverse modalities; (2) the intricate stacking of deep spatial–temporal network modules limits interpretability and transferability; (3) the ability to model dynamic spatial–temporal relationships is inadequate. To tackle these challenges, we propose a novel unified spatial–temporal regression framework named Generalized Spatial–Temporal Regression Graph Convolutional Transformer (GSTRGCT), which extends the panel model from spatial econometrics and combines it with deep neural networks to effectively model non-stationary relationships in spatial–temporal regression. To address the coupling in existing deep spatial–temporal networks, we introduce tensor decomposition to explicitly decompose the panel model into a tensor product of spatial regression on the spatial hyper-plane and temporal regression on the temporal hyper-plane. On the spatial hyper-plane, we present a dynamic adaptive spatial weight network (DASWNN) to capture global and local spatial correlations. Specifically, DASWNN adopts a spatial weight neural network (SWNN) to learn the semantic global spatial correlation and dynamically adjusts the locally changing spatial correlation by multiplying spatial node embeddings. On the temporal hyper-plane, we introduce the Auto-Correlation attention mechanism to capture period-based temporal dependence. Extensive experiments on two real-world traffic datasets show that GSTRGCT consistently outperforms other competitive methods, by an average of 62% and 59% in predictive performance, respectively.
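The abstract says only that local spatial correlation is adjusted "by multiplying" spatial node embeddings. As a rough illustration of that idea, the sketch below builds a row-normalised spatial weight matrix from the inner products of node embeddings and uses it in one spatial regression step. The ReLU and softmax normalisation, the function names, and the tensor shapes are assumptions borrowed from common adaptive-adjacency constructions; they are not confirmed as the DASWNN design.

```python
import numpy as np


def adaptive_spatial_weights(node_emb: np.ndarray) -> np.ndarray:
    """Row-normalised spatial weight matrix from node embeddings.

    node_emb has shape (num_nodes, emb_dim). Pairwise scores are inner
    products of embeddings; ReLU clips negative scores to zero and a
    row-wise softmax makes every row sum to 1.
    (ReLU + softmax are assumptions, not taken from the abstract.)
    """
    scores = np.maximum(node_emb @ node_emb.T, 0.0)       # (N, N)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


def spatial_regression(x: np.ndarray, node_emb: np.ndarray, coef: np.ndarray) -> np.ndarray:
    """One spatial step: mix node signals with the weights, then apply coef.

    x: (num_nodes, in_features), coef: (in_features, out_features).
    """
    return adaptive_spatial_weights(node_emb) @ x @ coef


# Hypothetical toy sizes: 5 road sensors, 3-dim embeddings, 2 input features.
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 3))
x = rng.normal(size=(5, 2))
coef = rng.normal(size=(2, 4))

W = adaptive_spatial_weights(emb)
out = spatial_regression(x, emb, coef)
```

Because the weights depend only on the embeddings, they can be learned end to end and recomputed whenever the embeddings change, which is one simple way to obtain a dynamic spatial weight matrix.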
{"title":"Generalized spatial–temporal regression graph convolutional transformer for traffic forecasting","authors":"Lang Xiong, Liyun Su, Shiyi Zeng, Xiangjing Li, Tong Wang, Feng Zhao","doi":"10.1007/s40747-024-01578-x","DOIUrl":"https://doi.org/10.1007/s40747-024-01578-x","url":null,"PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"103 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141910465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-10 DOI: 10.1007/s40747-024-01575-0
Guowei Zhang, Xincheng Tang, Li Wang, Huankang Cui, Teng Fei, Hulin Tang, Shangfeng Jiang
Self-supervised monocular depth estimation has attracted sustained attention because it does not require ground-truth data. Designing a lightweight architecture capable of fast inference is crucial for deployment on mobile devices. Current networks effectively integrate Convolutional Neural Networks (CNNs) with Transformers, achieving significant improvements in accuracy. However, this advantage comes at the cost of a larger model and significantly slower inference. In this study, we propose a network named Repmono, which includes an LCKT module with a large convolutional kernel and a RepTM module based on structural reparameterisation. By combining these two modules, our network achieves both local and global feature extraction with fewer parameters and significantly faster inference. Our network, with 2.31 MB of parameters, shows significant accuracy improvements over Monodepth2 in experiments on the KITTI dataset. With uniform input dimensions, our network’s inference speed is 53.7% faster than R-MSFM6, 60.1% faster than Monodepth2, and 81.1% faster than MonoVIT-small. Our code is available at https://github.com/txc320382/Repmono.
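Structural reparameterisation, in general, trains a multi-branch block and then folds the branches into a single convolution for inference. The sketch below shows the basic identity for one assumed case, a parallel 3x3 and 1x1 convolution merged into one 3x3 kernel. The branch layout, function names, and sizes are illustrative assumptions rather than the RepTM design, and biases and batch-norm folding are left out for brevity.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive stride-1 'same' convolution (cross-correlation, no bias).

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k.
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def merge_branches(w3: np.ndarray, w1: np.ndarray) -> np.ndarray:
    """Fold a 1x1 branch into a 3x3 branch.

    Zero-padding the 1x1 kernel to 3x3 places its weight at the kernel
    centre, so adding the two kernels reproduces the sum of both branches.
    """
    w1_padded = np.pad(w1, ((0, 0), (0, 0), (1, 1), (1, 1)))
    return w3 + w1_padded


# Hypothetical toy sizes: 2 input channels, 3 output channels, 6x6 feature map.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 6, 6))
w3 = rng.normal(size=(3, 2, 3, 3))
w1 = rng.normal(size=(3, 2, 1, 1))

two_branch = conv2d_same(x, w3) + conv2d_same(x, w1)  # training-time form
fused = conv2d_same(x, merge_branches(w3, w1))        # inference-time form
```

The fused form computes the same output with a single convolution, which is why this family of techniques can reduce inference cost without changing what the trained block computes.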
{"title":"Repmono: a lightweight self-supervised monocular depth estimation architecture for high-speed inference","authors":"Guowei Zhang, Xincheng Tang, Li Wang, Huankang Cui, Teng Fei, Hulin Tang, Shangfeng Jiang","doi":"10.1007/s40747-024-01575-0","DOIUrl":"https://doi.org/10.1007/s40747-024-01575-0","url":null,"PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"12 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141910475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}