Mapping Chinese Medical Entities to the Unified Medical Language System

Health Data Science. Pub Date: 2023-03-30. eCollection Date: 2023-01-01. DOI: 10.34133/hds.0011
Luming Chen, Yifan Qi, Aiping Wu, Lizong Deng, Taijiao Jiang

Abstract

Background: Chinese medical entities have not been organized comprehensively because well-developed terminology systems are lacking, which makes it difficult to process Chinese medical texts for fine-grained medical knowledge representation. An efficient way to unify Chinese medical terminologies is to map Chinese medical entities to their English counterparts in the Unified Medical Language System (UMLS). However, such mappings have not been investigated sufficiently in previous research. In this study, we explore strategies for mapping Chinese medical entities to the UMLS and systematically evaluate the mapping performance.

Methods: First, Chinese medical entities are translated to English using multiple web-based translation engines. Then, 3 mapping strategies are investigated: (a) string-based, (b) semantic-based, and (c) string and semantic similarity combined. In addition, cross-lingual pretrained language models are applied to map Chinese medical entities to UMLS concepts without translation. All of these strategies are evaluated on the ICD10-CN, Chinese Human Phenotype Ontology (CHPO), and RealWorld datasets.
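As a rough illustration of strategy (c), the sketch below ranks candidate concept names for one translated entity by a weighted sum of a string score and a semantic score. It is a minimal sketch under stated assumptions, not the authors' implementation: the string score is a small hand-rolled word-level TF-IDF cosine, the semantic scores are made-up placeholders standing in for embedding similarities from a model such as SapBERT, and the weight `alpha`, the function names, and the candidate list are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a term and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(names):
    """Smoothed inverse document frequencies over the candidate names."""
    n = len(names)
    df = Counter(tok for name in names for tok in set(tokenize(name)))
    idf = {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}
    default = math.log(1 + n) + 1.0  # weight for tokens unseen in the candidates
    return idf, default


def tfidf_vector(text, idf, default):
    """Sparse TF-IDF bag-of-words vector, stored as a dict."""
    counts = Counter(tokenize(text))
    return {tok: tf * idf.get(tok, default) for tok, tf in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query, names, semantic_scores, alpha=0.5, k=5):
    """Rank names by alpha * semantic + (1 - alpha) * string similarity.

    semantic_scores[i] is assumed to be a precomputed similarity between
    the query and names[i] (for example, cosine of embedding vectors).
    """
    idf, default = fit_idf(names)
    q_vec = tfidf_vector(query, idf, default)
    scored = []
    for name, sem in zip(names, semantic_scores):
        string_score = cosine(q_vec, tfidf_vector(name, idf, default))
        scored.append((name, alpha * sem + (1 - alpha) * string_score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Illustrative data only: an English translation of a Chinese entity,
# three candidate concept names, and placeholder semantic scores.
query = "type 2 diabetes mellitus"
names = ["Diabetes Mellitus, Type 2", "Diabetes Mellitus, Type 1", "Hypertension"]
placeholder_semantic = [0.95, 0.80, 0.10]

ranked = rank_candidates(query, names, placeholder_semantic, alpha=0.5, k=5)
for name, score in ranked:
    print(f"{score:.3f}  {name}")
```

The string score catches near-identical wording regardless of word order, while the semantic score is meant to catch synonyms that share few tokens. In practice `alpha` would be tuned on held-out data; the value 0.5 here is arbitrary.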

Results: The linear combination method based on the SapBERT and term frequency-inverse document frequency (TF-IDF) bag-of-words models performs the best on all evaluation datasets, with top-5 accuracies of 91.85%, 82.44%, and 78.43% on the ICD10-CN, CHPO, and RealWorld datasets, respectively.
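For readers unfamiliar with the metric, top-k accuracy is commonly computed as the fraction of entities whose gold concept appears among the first k ranked candidates. The sketch below follows that common definition; the abstract does not spell out the exact evaluation protocol, so treat this as an assumption, and note that the function name and the concept identifiers are hypothetical.

```python
def top_k_accuracy(ranked_predictions, gold_ids, k=5):
    """Fraction of entities whose gold concept is in the top-k predictions.

    ranked_predictions[i] is the ranked list of candidate concept IDs for
    entity i (best first); gold_ids[i] is that entity's correct concept ID.
    """
    if len(ranked_predictions) != len(gold_ids):
        raise ValueError("predictions and gold labels must have equal length")
    if not gold_ids:
        return 0.0
    hits = sum(
        1 for ranked, gold in zip(ranked_predictions, gold_ids)
        if gold in ranked[:k]
    )
    return hits / len(gold_ids)


# Made-up identifiers for four entities, three candidates each.
predictions = [
    ["A", "B", "C"],
    ["D", "E", "F"],
    ["G", "H", "I"],
    ["J", "K", "L"],
]
gold = ["A", "E", "Z", "L"]

acc_at_1 = top_k_accuracy(predictions, gold, k=1)
acc_at_3 = top_k_accuracy(predictions, gold, k=3)
print(acc_at_1, acc_at_3)
```

With this toy data only the first entity is correct at rank 1, while three of the four gold concepts appear within the top 3.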

Conclusions: In our study, we explore strategies for mapping Chinese medical entities to the UMLS and identify a satisfactory linear combination method. Our investigation will facilitate Chinese medical entity normalization and inspire research that focuses on Chinese medical ontology development.
