
CAAI Transactions on Intelligence Technology: Latest Articles

Sophisticated Ensemble Deep Learning Approaches for Multilabel Retinal Disease Classification in Medical Imaging
IF 7.3; Zone 2, Computer Science; Q1 Computer Science, Artificial Intelligence; Pub Date: 2025-04-09; DOI: 10.1049/cit2.70012
Asghar Amir, Tariqullah Jan, Mohammad Haseeb Zafar, Shadan Khan Khattak

This paper introduces a novel ensemble Deep learning (DL)-based Multi-Label Retinal Disease Classification (MLRDC) system, known for its high accuracy and efficiency. Utilising a stacking ensemble approach, and integrating DenseNet201, EfficientNetB4, EfficientNetB3 and EfficientNetV2S models, exceptional performance in retinal disease classification is achieved. The proposed MLRDC model, leveraging DL as the meta-model, outperforms individual base detectors, with DenseNet201 and EfficientNetV2S achieving an accuracy of 96.5%, precision of 98.6%, recall of 97.1%, and F1 score of 97.8%. Weighted multilabel classifiers in the ensemble exhibit an average accuracy of 90.6%, precision of 98.3%, recall of 91.2%, and F1 score of 94.6%, whereas unweighted models achieve an average accuracy of 90%, precision of 98.6%, recall of 93.1%, and F1 score of 95.7%. Employing Logistic Regression (LR) as the meta-model, the proposed MLRDC system achieves an accuracy of 93.5%, precision of 98.2%, recall of 93.9%, and F1 score of 96%, with a minimal loss of 0.029. These results highlight the superiority of the proposed model over benchmark state-of-the-art ensembles, emphasising its practical applicability in medical image classification.
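
As a quick consistency check on the figures quoted above, F1 is the harmonic mean of precision and recall. The sketch below recomputes F1 from the precision/recall pairs reported for the DL meta-model and the LR meta-model. The function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract.
dl_meta = f1_score(0.986, 0.971)  # DL meta-model
lr_meta = f1_score(0.982, 0.939)  # LR meta-model

print(f"DL meta-model F1: {dl_meta:.3f}")
print(f"LR meta-model F1: {lr_meta:.3f}")
```

Both values agree, after rounding, with the 97.8% and 96% F1 scores stated in the abstract.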

CAAI Transactions on Intelligence Technology, vol. 10, no. 4, pp. 1159-1173. PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70012
Citations: 0
Geometry-Enhanced Implicit Function for Detailed Clothed Human Reconstruction With RGB-D Input
IF 7.3; Zone 2, Computer Science; Q1 Computer Science, Artificial Intelligence; Pub Date: 2025-04-03; DOI: 10.1049/cit2.70009
Pengpeng Liu, Zhi Zeng, Qisheng Wang, Min Chen, Guixuan Zhang

Realistic human reconstruction embraces an extensive range of applications as depth sensors advance. However, current state-of-the-art methods with RGB-D input still suffer from artefacts, such as noisy surfaces, non-human shapes, and depth ambiguity, especially for the invisible parts. The authors observe the main issue is the lack of geometric semantics without using depth input priors fully. This paper focuses on improving the representation ability of implicit function, exploring an effective method to utilise depth-related semantics effectively and efficiently. The proposed geometry-enhanced implicit function enhances the geometric semantics with the extra voxel-aligned features from point clouds, promoting the completion of missing parts for unseen regions while preserving the local details on the input. For incorporating multi-scale pixel-aligned and voxel-aligned features, the authors use the Squeeze-and-Excitation attention to capture and fully use channel interdependencies. For the multi-view reconstruction, the proposed depth-enhanced attention explicitly excites the network to “sense” the geometric structure for a more reasonable feature aggregation. Experiments and results show that our method outperforms current RGB and depth-based SOTA methods on the challenging data from Twindom and Thuman3.0, and achieves a detailed and completed human reconstruction, balancing performance and efficiency well.

CAAI Transactions on Intelligence Technology, vol. 10, no. 3, pp. 858-870. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70009
Citations: 0
Large Language Models With Contrastive Decoding Algorithm for Hallucination Mitigation in Low-Resource Languages
IF 7.3; Zone 2, Computer Science; Q1 Computer Science, Artificial Intelligence; Pub Date: 2025-04-03; DOI: 10.1049/cit2.70004
Zan Hongying, Arifa Javed, Muhammad Abdullah, Javed Rashid, Muhammad Faheem

Neural machine translation (NMT) has advanced with deep learning and large-scale multilingual models, yet translating low-resource languages often lacks sufficient training data and leads to hallucinations. This often results in translated content that diverges significantly from the source text. This research proposes a refined Contrastive Decoding (CD) algorithm that dynamically adjusts weights of log probabilities from strong expert and weak amateur models to mitigate hallucinations in low-resource NMT and improve translation quality. Advanced large language NMT models, including ChatGLM and LLaMA, are fine-tuned and implemented for their superior contextual understanding and cross-lingual capabilities. The refined CD algorithm evaluates multiple candidate translations using BLEU score, semantic similarity, and Named Entity Recognition accuracy. Extensive experimental results show substantial improvements in translation quality and a significant reduction in hallucination rates. Fine-tuned models achieve higher evaluation metrics compared to baseline models and state-of-the-art models. An ablation study confirms the contributions of each methodological component and highlights the effectiveness of the refined CD algorithm and advanced models in mitigating hallucinations. Notably, the refined methodology increased the BLEU score by approximately 30% compared to baseline models.
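
The general idea of contrastive decoding, scoring each candidate token by the expert's log probability minus a weighted amateur log probability, can be sketched as follows. This is a minimal illustration with a fixed weight, not the refined algorithm of the paper, which adjusts the weights dynamically. The function name `contrastive_pick` and the parameters `alpha` and `beta` are assumptions made for this example, and the token probabilities are toy values.

```python
import math


def contrastive_pick(expert, amateur, alpha=1.0, beta=0.1):
    """Pick the next token by contrasting expert and amateur probabilities.

    expert, amateur: dicts mapping token -> probability (all > 0, same keys).
    alpha: weight on the amateur log probability (hypothetical, fixed here).
    beta: plausibility cutoff relative to the expert's top probability.
    """
    cutoff = beta * max(expert.values())
    # Only tokens the expert itself finds plausible remain candidates.
    candidates = [tok for tok, p in expert.items() if p >= cutoff]

    def score(tok):
        # Higher when the expert prefers the token more than the amateur does.
        return math.log(expert[tok]) - alpha * math.log(amateur[tok])

    return max(candidates, key=score)


# Toy next-token distributions (illustrative values only).
expert = {"Paris": 0.5, "London": 0.3, "the": 0.2}
amateur = {"Paris": 0.2, "London": 0.2, "the": 0.6}
print(contrastive_pick(expert, amateur))
```

The plausibility cutoff matters because a token with a tiny expert probability can still have a large expert-to-amateur ratio; filtering first keeps such tokens from being selected.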

CAAI Transactions on Intelligence Technology, vol. 10, no. 4, pp. 1104-1117. PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70004
Citations: 0
Layer-Level Adaptive Gradient Perturbation Protecting Deep Learning Based on Differential Privacy
IF 7.3; Zone 2, Computer Science; Q1 Computer Science, Artificial Intelligence; Pub Date: 2025-04-03; DOI: 10.1049/cit2.70008
Zhang Xiangfei, Zhang Qingchen, Jiang Liming

Deep learning’s widespread dependence on large datasets raises privacy concerns due to the potential presence of sensitive information. Differential privacy stands out as a crucial method for preserving privacy, garnering significant interest for its ability to offer robust and verifiable privacy safeguards during data training. However, classic differentially private learning introduces the same level of noise into the gradients across training iterations, which affects the trade-off between model utility and privacy guarantees. To address this issue, an adaptive differential privacy mechanism is proposed in this paper, which dynamically adjusts the privacy budget at the layer level as training progresses to resist membership inference attacks. Specifically, an equal privacy budget is initially allocated to each layer. Subsequently, as training advances, the privacy budget for layers closer to the output is reduced (adding more noise), while the budget for layers closer to the input is increased. The adjustment magnitude depends on the training iterations and is automatically determined based on the iteration count. This dynamic allocation provides a simple process for adjusting privacy budgets, alleviating the burden on users to tweak parameters and ensuring that privacy preservation strategies align with training progress. Extensive experiments on five well-known datasets indicate that the proposed method outperforms competing methods in terms of accuracy and resilience against membership inference attacks.
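
The allocation pattern described above (equal shares at the start, then a shift of budget from output-side layers to input-side layers that grows with the iteration count) might look like the sketch below. The abstract does not give the actual schedule, so the linear tilt, the constant total budget, and the names `layer_budgets` and `max_shift` are all assumptions made purely for illustration. No privacy accounting is performed here.

```python
def layer_budgets(total_eps, num_layers, step, total_steps, max_shift=0.5):
    """Split total_eps across layers; index 0 is the layer closest to the input.

    At step 0 every layer receives the same share. As training advances, a
    linear tilt moves budget from output-side layers to input-side layers.
    max_shift (hypothetical, must be < 1) caps the relative change at the end.
    """
    if num_layers == 1:
        return [total_eps]
    base = total_eps / num_layers
    shift = max_shift * step / total_steps  # grows with the iteration count
    budgets = []
    for layer in range(num_layers):
        # tilt runs from +1 (input side) to -1 (output side) and sums to zero,
        # so the total budget stays constant in this sketch.
        tilt = 1 - 2 * layer / (num_layers - 1)
        budgets.append(base * (1 + shift * tilt))
    return budgets


start = layer_budgets(4.0, 4, step=0, total_steps=100)
end = layer_budgets(4.0, 4, step=100, total_steps=100)
print(start)
print(end)
```

A smaller budget for a layer corresponds to more noise on that layer's gradients, which matches the abstract's note that output-side layers receive more noise as training proceeds.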

CAAI Transactions on Intelligence Technology, vol. 10, no. 3, pp. 929-944. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70008
Citations: 0
Sep-NMS: Unlocking the Aptitude of Two-Stage Referring Expression Comprehension
IF 7.3; Zone 2, Computer Science; Q1 Computer Science, Artificial Intelligence; Pub Date: 2025-04-02; DOI: 10.1049/cit2.70007
Jing Wang, Zhikang Wang, Xiaojie Wang, Fangxiang Feng, Bo Yang

Referring expression comprehension (REC) aims to locate a specific region in an image described by a natural language. Existing two-stage methods generate multiple candidate proposals in the first stage, followed by selecting one of these proposals as the grounding result in the second stage. Nevertheless, the number of candidate proposals generated in the first stage significantly exceeds ground truth and the recall of critical objects is inadequate, thereby enormously limiting the overall network performance. To address the above issues, the authors propose an innovative method termed Separate Non-Maximum Suppression (Sep-NMS) for two-stage REC. Particularly, Sep-NMS models information from the two stages independently and collaboratively, ultimately achieving an overall improvement in comprehension and identification of the target objects. Specifically, the authors propose a Ref-Relatedness module for filtering referent proposals rigorously, decreasing the redundancy of referent proposals. A CLIP$^{\dagger}$ Relatedness module based on robust multimodal pre-trained encoders is built to precisely assess the relevance between language and proposals to improve the recall of critical objects. It is worth mentioning that the authors are the pioneers in utilising a multimodal pre-training model for proposal filtering in the first stage. Moreover, an Information Fusion module is designed to effectively amalgamate the multimodal information across two stages, ensuring maximum utilisation of the available information. Extensive experiments demonstrate that the approach achieves competitive performance with previous state-of-the-art methods. The datasets used are publicly available: RefCOCO, RefCOCO+: https://doi.org/10.1007/978-3-319-46475-6_5 and RefCOCOg: https://doi.org/10.1109/CVPR.2016.9.
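
For context, the classic greedy non-maximum suppression that first-stage proposal filtering usually starts from can be sketched as below. This is the standard baseline procedure, not the Sep-NMS method of the paper, whose details the abstract does not give. The box format `(x1, y1, x2, y2)`, the function names, and the threshold value are assumptions for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Toy proposals: the second box heavily overlaps the first one.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

In this toy case the second box overlaps the first with an IoU of 81/119 and is suppressed, while the distant third box survives.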

CAAI Transactions on Intelligence Technology, vol. 10, no. 4, pp. 1049-1061. PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70007
Citations: 0
Molecular Retrosynthesis Top-K Prediction Based on the Latent Generation Process
IF 7.3; Zone 2, Computer Science; Q1 Computer Science, Artificial Intelligence; Pub Date: 2025-04-01; DOI: 10.1049/cit2.70005
Yupeng Liu, Han Zhang, Rui Hu

In the field of organic synthesis, the core objective of retrosynthetic methods is to deduce possible synthetic routes and precursor molecules for complex target molecules. Traditional retrosynthetic methods, such as template-based retrosynthesis, have high accuracy and interpretability in specific types of reactions but are limited by the scope of the template library, making it difficult to adapt to new or uncommon reaction types. Moreover, sequence-to-sequence retrosynthetic prediction methods, although they enhance the flexibility of prediction, often overlook the complexity of molecular graph structures and the actual interactions between atoms, which limits the accuracy and reliability of the predictions. To address these limitations, this paper proposes a Molecular Retrosynthesis Top-k Prediction based on the Latent Generation Process (MRLGP) that uses latent variables from graph neural networks to model the generation process and produce a diverse set of reactants. Utilising an encoding method based on Graphormer, the authors have also introduced topology-aware positional encoding to better capture the interactions between atomic nodes in the molecular graph structure, thereby more accurately simulating the retrosynthetic process. The MRLGP model significantly enhances the accuracy and diversity of predictions by correlating discrete latent variables with the reactant generation process and progressively constructing molecular graphs using a variational autoregressive decoder. Experimental results on benchmark datasets such as USPTO-50K, USPTO-Full, and USPTO-DIVERSE demonstrate that MRLGP outperforms baseline models on multiple Top-k evaluation metrics. Additionally, ablation experiments conducted on the USPTO-50K dataset further validate the effectiveness of the methods used in the encoder and decoder parts of the model.
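
The Top-k metric mentioned above counts a prediction as correct when the true reactant set appears among a model's first k ranked candidates. A minimal sketch follows; the function name `top_k_accuracy` and the placeholder strings are assumptions for illustration, and a real evaluation would first canonicalise the molecular representations before comparing them.

```python
def top_k_accuracy(ranked_predictions, ground_truth, k):
    """Fraction of targets whose true answer is among the first k predictions.

    ranked_predictions: one list of candidates per target, best first.
    ground_truth: the true answer for each target, in the same order.
    """
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, ground_truth)
        if truth in preds[:k]
    )
    return hits / len(ground_truth)


# Placeholder labels stand in for canonicalised reactant strings.
predictions = [["A", "B", "C"], ["X", "Y", "Z"], ["P", "Q", "R"]]
truths = ["A", "Z", "M"]
print(top_k_accuracy(predictions, truths, k=1))
print(top_k_accuracy(predictions, truths, k=3))
```

Top-k accuracy can only stay the same or rise as k grows, which is why results are typically reported for several values of k side by side.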

CAAI Transactions on Intelligence Technology, vol. 10, no. 3, pp. 902-911. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70005
Citations: 0
SG-TE: Spatial Guidance and Temporal Enhancement Network for Facial-Bodily Emotion Recognition
IF 7.3; Zone 2, Computer Science; Q1 Computer Science, Artificial Intelligence; Pub Date: 2025-03-26; DOI: 10.1049/cit2.70006
Zhong Huang, Danni Zhang, Fuji Ren, Min Hu, Juan Liu, Haitao Yu

To overcome the deficiencies of single-modal emotion recognition based on facial expression or bodily posture in natural scenes, a spatial guidance and temporal enhancement (SG-TE) network is proposed for facial-bodily emotion recognition. First, ResNet50, DNN and spatial transformer models are used to capture facial texture vectors, bodily skeleton vectors and whole-body geometric vectors, and an intraframe correlation attention guidance (S-CAG) mechanism, which guides the facial texture vector and the bodily skeleton vector by the whole-body geometric vector, is designed to exploit the spatial potential emotional correlation between face and posture. Second, an interframe significant segment enhancement (T-SSE) structure is embedded into a temporal transformer to enhance high emotional intensity frame information and avoid emotional asynchrony. Finally, an adaptive weight assignment (M-AWA) strategy is constructed to realise facial-bodily fusion. The experimental results on the BabyRobot Emotion Dataset (BRED) and Context-Aware Emotion Recognition (CAER) dataset indicate that the proposed network reaches accuracies of 81.61% and 89.39%, which are 9.61% and 9.46% higher than those of the baseline network, respectively. Compared with the state-of-the-art methods, the proposed method achieves 7.73% and 20.57% higher accuracy than single-modal methods based on facial expression or bodily posture, respectively, and 2.16% higher accuracy than the dual-modal methods based on facial-bodily fusion. Therefore, the proposed method, which adaptively fuses the complementary information of face and posture, improves the quality of emotion recognition in real-world scenarios.

Citations: 0
A Method for Automatic Feature Points Extraction of Pelvic Surface Based on PointMLP_RegNet
IF 7.3 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-03-14 DOI: 10.1049/cit2.70003
Wei Kou, Rui Zhou, Hongmiao Zhang, Jianwen Cheng, Chi Zhu, Shaolong Kuang, Lihai Zhang, Lining Sun

The success of robot-assisted pelvic fracture reduction surgery relies heavily on the accuracy of 3D/3D feature-based registration. This process involves extracting anatomical feature points from pre-operative 3D images, which can be challenging because of the complex and variable structure of the pelvis. PointMLP_RegNet, a modified PointMLP, was introduced to address this issue. It retains the feature extraction module of PointMLP but replaces the classification layer with a regression layer, so that it predicts the coordinates of feature points instead of class labels. A flowchart for an automatic feature point extraction method was presented, and a series of experiments was conducted on a clinical pelvic dataset to confirm the accuracy and effectiveness of the method. PointMLP_RegNet extracted feature points more accurately, with 8 out of 10 points showing errors of less than 4 mm and the remaining two less than 5 mm. Compared to PointNet++ and PointNet, it exhibited higher accuracy, robustness and space efficiency. The proposed method will improve the accuracy of anatomical feature point extraction, enhance intra-operative registration precision and facilitate the widespread clinical application of robot-assisted pelvic fracture reduction.
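
The per-point error figures quoted in the abstract can be illustrated with a minimal sketch of how such an evaluation is commonly computed: the Euclidean distance between each predicted and reference landmark, followed by a count of points under a threshold. This is not the authors' evaluation code. The function names `landmark_errors` and `count_within` are hypothetical, and the coordinates are made-up values chosen only to make the arithmetic easy to follow.

```python
import math

def landmark_errors(predicted, reference):
    """Euclidean distance (same unit as the inputs, here mm) per landmark pair."""
    return [math.dist(p, r) for p, r in zip(predicted, reference)]

def count_within(errors, threshold_mm):
    """Number of landmarks whose error is strictly below the threshold."""
    return sum(1 for e in errors if e < threshold_mm)

# Illustrative coordinates only (assumption, not data from the paper).
reference = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
predicted = [(1.0, 2.0, 2.0), (10.0, 3.0, 4.0), (0.0, 10.0, 0.5)]

errors = landmark_errors(predicted, reference)
under_4mm = count_within(errors, 4.0)
print(errors, under_4mm)
```

Reporting the count under each threshold, as the abstract does, gives a per-landmark view that a single mean error would hide.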

Citations: 0
Enhancing patient rehabilitation predictions with a hybrid anomaly detection model: Density-based clustering and interquartile range methods
IF 7.3 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-03-05 DOI: 10.1049/cit2.70000
Murad Ali Khan, Jong-Hyun Jang, Naeem Iqbal, Harun Jamil, Syed Shehryar Ali Naqvi, Salabat Khan, Jae-Chul Kim, Do-Hyeun Kim

In recent years, there has been a concerted effort to improve anomaly detection techniques, particularly in the context of high-dimensional, distributed clinical data. Analysing patient data within clinical settings reveals a pronounced focus on refining diagnostic accuracy, personalising treatment plans, and optimising resource allocation to enhance clinical outcomes. Nonetheless, this domain faces unique challenges, such as irregular data collection, inconsistent data quality, and patient-specific structural variations. To address these challenges, this paper proposes a novel hybrid approach that integrates heuristic and stochastic methods for anomaly detection in patient clinical data. The strategy first applies HPO-based optimal Density-Based Spatial Clustering of Applications with Noise to cluster patient exercise data, facilitating efficient anomaly identification. Subsequently, a stochastic method based on the Interquartile Range filters unreliable data points, ensuring that medical tools and professionals receive only the most pertinent and accurate information. The primary objective of this study is to equip healthcare professionals and researchers with a robust tool for managing extensive, high-dimensional clinical datasets, enabling effective isolation and removal of aberrant data points. Furthermore, a regression model has been developed using Automated Machine Learning (AutoML) to assess the impact of the ensemble abnormal pattern detection approach. Various statistical error estimation techniques validate the efficacy of the hybrid approach alongside AutoML. Experimental results show that applying this hybrid model to patient rehabilitation data leads to a notable enhancement in AutoML performance, with an average improvement of 0.041 in the $R^{2}$ score, surpassing the effectiveness of traditional regression models.
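
The second stage described above, filtering by interquartile range, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the density-based clustering stage is omitted, the function name `iqr_filter` is hypothetical, the fence multiplier `k=1.5` is the conventional Tukey value rather than one taken from the paper, and the sample readings are invented.

```python
from statistics import quantiles

def iqr_filter(values, k=1.5):
    """Split values into (kept, removed) using the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    removed = [v for v in values if v < low or v > high]
    return kept, removed

# Invented exercise readings with one obviously aberrant value (assumption).
readings = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
kept, removed = iqr_filter(readings)
print(kept, removed)
```

In a pipeline like the one the abstract outlines, a filter of this kind would presumably run on the points that survive clustering, so that only the remaining values are passed to the downstream regression model.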

Citations: 0
Contrastive learning for nested Chinese Named Entity Recognition via template words
IF 7.3 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-16 DOI: 10.1049/cit2.12403
Yuke Wang, Qiao Liu, Tingting Dai, Junjie Lang, Ling Lu, Yinong Chen

Existing Chinese named entity recognition (NER) research utilises 1D lexicon-based sequence labelling frameworks, which can only recognise flat entities. While lexicons serve as prior knowledge and enhance semantic information, they are limited by incomplete coverage and by their resource requirements. This paper proposes a template-based classification (TC) model to avoid lexicon issues and to identify nested entities. TC provides a template word for each entity type and uses contrastive learning to integrate the common characteristics of entities in the same category. Contrastive learning makes template words the centre points of their category in the vector space, thus improving generalisation ability. Additionally, TC presents a 2D table-filling label scheme that classifies entities based on the attention distribution of template words. The proposed decoder algorithm enables TC to recognise flat and nested entities simultaneously. Experimental results show that TC achieves state-of-the-art performance on five Chinese datasets.
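
To make the 2D table-filling idea concrete, the sketch below shows how a span table can hold flat and nested entities at the same time, which a 1D tag sequence cannot. It is only an illustration of the data structure under stated assumptions: the abstract does not describe the paper's decoder or its attention-based scoring, so the table is filled by hand here. The function name `decode_table`, the labels and the example sentence are all hypothetical.

```python
def decode_table(tokens, table):
    """Read entities from a span table.

    table[i][j] holds the label of the span tokens[i..j] (inclusive),
    or None when that span is not an entity. Only cells with i <= j are read.
    """
    entities = []
    n = len(tokens)
    for i in range(n):
        for j in range(i, n):
            label = table[i][j]
            if label is not None:
                entities.append((i, j, label, "".join(tokens[i:j + 1])))
    return entities

# Hand-filled example (assumption): a location nested inside an organisation.
tokens = list("北京大学")
n = len(tokens)
table = [[None] * n for _ in range(n)]
table[0][1] = "LOC"  # span 0..1, "北京"
table[0][3] = "ORG"  # span 0..3, "北京大学"

entities = decode_table(tokens, table)
print(entities)
```

Because every (start, end) pair has its own cell, overlapping spans never compete for the same label slot, which is the property that allows nested entities to be recovered.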

Citations: 0