利用基于transformer的模型和关联数据在放射学中进行深度表型分析。

IF 4.9 2区 医学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Computer methods and programs in biomedicine Pub Date : 2025-01-03 DOI:10.1016/j.cmpb.2024.108567
Lluís-F. Hurtado , Luis Marco-Ruiz , Encarna Segarra , Maria Jose Castro-Bleda , Aurelia Bustos-Moreno , Maria de la Iglesia-Vayá , Juan Francisco Vallalta-Rueda
{"title":"利用基于transformer的模型和关联数据在放射学中进行深度表型分析。","authors":"Lluís-F. Hurtado ,&nbsp;Luis Marco-Ruiz ,&nbsp;Encarna Segarra ,&nbsp;Maria Jose Castro-Bleda ,&nbsp;Aurelia Bustos-Moreno ,&nbsp;Maria de la Iglesia-Vayá ,&nbsp;Juan Francisco Vallalta-Rueda","doi":"10.1016/j.cmpb.2024.108567","DOIUrl":null,"url":null,"abstract":"<div><h3>Background and Objective:</h3><div>Despite significant investments in the normalization and the standardization of Electronic Health Records (EHRs), free text is still the rule rather than the exception in clinical notes. The use of free text has implications in data reuse methods used for supporting clinical research since the query mechanisms used in cohort definition and patient matching are mainly based on structured data and clinical terminologies. This study aims to develop a method for the secondary use of clinical text by: (a) using Natural Language Processing (NLP) for tagging clinical notes with biomedical terminology; and (b) designing an ontology that maps and classifies all the identified tags to various terminologies and allows for running phenotyping queries.</div></div><div><h3>Methods and Results:</h3><div>Transformers-based NLP Models, concretely pre-trained RoBERTa language models, were used to process radiology reports and annotate them identifying elements matching UMLS Concept Unique Identifiers (CUIs) definitions. CUIs were mapped into several biomedical ontologies useful for phenotyping (e.g., SNOMED-CT, HPO, ICD-10, FMA, LOINC, and ICPC2, among others) and represented as a lightweight ontology using OWL (Web Ontology Language) constructs. This process resulted in a Linked Knowledge Base (LKB), which allows running expressive queries to retrieve reports that comply with specific criteria using automatic reasoning.</div></div><div><h3>Conclusion:</h3><div>Although phenotyping tools mostly rely on relational databases, the combination of NLP and Linked Data technologies allows us to build scalable knowledge bases using standard ontologies from the Web of data. Our approach enables us to execute a pipeline which input is free text and automatically maps identified entities to a LKB that allows answering phenotyping queries. In this work, we have only used Spanish radiology reports, although it is extensible to other languages for which suitable corpora are available. This is particularly valuable in regional and national systems dealing with large research databases from different registries and cohorts and plays an essential role in the scalability of large data reuse infrastructures that require indexing and governing distributed data sources.</div></div>","PeriodicalId":10624,"journal":{"name":"Computer methods and programs in biomedicine","volume":"260 ","pages":"Article 108567"},"PeriodicalIF":4.9000,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Leveraging Transformers-based models and linked data for deep phenotyping in radiology\",\"authors\":\"Lluís-F. Hurtado ,&nbsp;Luis Marco-Ruiz ,&nbsp;Encarna Segarra ,&nbsp;Maria Jose Castro-Bleda ,&nbsp;Aurelia Bustos-Moreno ,&nbsp;Maria de la Iglesia-Vayá ,&nbsp;Juan Francisco Vallalta-Rueda\",\"doi\":\"10.1016/j.cmpb.2024.108567\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background and Objective:</h3><div>Despite significant investments in the normalization and the standardization of Electronic Health Records (EHRs), free text is still the rule rather than the exception in clinical notes. The use of free text has implications in data reuse methods used for supporting clinical research since the query mechanisms used in cohort definition and patient matching are mainly based on structured data and clinical terminologies. This study aims to develop a method for the secondary use of clinical text by: (a) using Natural Language Processing (NLP) for tagging clinical notes with biomedical terminology; and (b) designing an ontology that maps and classifies all the identified tags to various terminologies and allows for running phenotyping queries.</div></div><div><h3>Methods and Results:</h3><div>Transformers-based NLP Models, concretely pre-trained RoBERTa language models, were used to process radiology reports and annotate them identifying elements matching UMLS Concept Unique Identifiers (CUIs) definitions. CUIs were mapped into several biomedical ontologies useful for phenotyping (e.g., SNOMED-CT, HPO, ICD-10, FMA, LOINC, and ICPC2, among others) and represented as a lightweight ontology using OWL (Web Ontology Language) constructs. This process resulted in a Linked Knowledge Base (LKB), which allows running expressive queries to retrieve reports that comply with specific criteria using automatic reasoning.</div></div><div><h3>Conclusion:</h3><div>Although phenotyping tools mostly rely on relational databases, the combination of NLP and Linked Data technologies allows us to build scalable knowledge bases using standard ontologies from the Web of data. Our approach enables us to execute a pipeline which input is free text and automatically maps identified entities to a LKB that allows answering phenotyping queries. In this work, we have only used Spanish radiology reports, although it is extensible to other languages for which suitable corpora are available. This is particularly valuable in regional and national systems dealing with large research databases from different registries and cohorts and plays an essential role in the scalability of large data reuse infrastructures that require indexing and governing distributed data sources.</div></div>\",\"PeriodicalId\":10624,\"journal\":{\"name\":\"Computer methods and programs in biomedicine\",\"volume\":\"260 \",\"pages\":\"Article 108567\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-01-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer methods and programs in biomedicine\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0169260724005601\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer methods and programs in biomedicine","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169260724005601","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

摘要

背景和目的:尽管在电子健康记录(EHRs)的规范化和标准化方面进行了大量投资,但在临床记录中,自由文本仍然是一种规则,而不是例外。由于队列定义和患者匹配中使用的查询机制主要基于结构化数据和临床术语,因此使用自由文本对用于支持临床研究的数据重用方法具有影响。本研究旨在开发一种临床文本的二次使用方法:(a)使用自然语言处理(NLP)用生物医学术语标记临床笔记;(b)设计一个本体,将所有已识别的标签映射和分类到各种术语,并允许运行表型查询。方法和结果:基于transformer的NLP模型,具体的预训练RoBERTa语言模型,用于处理放射学报告,并对其进行注释,以识别与UMLS概念唯一标识符(gui)定义匹配的元素。将gui映射为几种对表型分析有用的生物医学本体(例如,SNOMED-CT、HPO、ICD-10、FMA、LOINC和ICPC2等),并使用OWL (Web ontology Language)结构表示为轻量级本体。这个过程产生了一个链接知识库(link Knowledge Base, LKB),它允许运行表达性查询来检索使用自动推理符合特定标准的报告。结论:虽然表型工具主要依赖于关系数据库,但NLP和关联数据技术的结合使我们能够使用来自数据网络的标准本体构建可扩展的知识库。我们的方法使我们能够执行一个管道,它的输入是自由文本,并自动将识别的实体映射到允许回答表型查询的LKB。在这项工作中,我们只使用了西班牙语放射学报告,尽管它可以扩展到其他语言,因为有合适的语料库可用。这在处理来自不同登记和队列的大型研究数据库的区域和国家系统中特别有价值,并在需要索引和管理分布式数据源的大型数据重用基础设施的可伸缩性方面发挥重要作用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Leveraging Transformers-based models and linked data for deep phenotyping in radiology

Background and Objective:

Despite significant investments in the normalization and the standardization of Electronic Health Records (EHRs), free text is still the rule rather than the exception in clinical notes. The use of free text has implications in data reuse methods used for supporting clinical research since the query mechanisms used in cohort definition and patient matching are mainly based on structured data and clinical terminologies. This study aims to develop a method for the secondary use of clinical text by: (a) using Natural Language Processing (NLP) for tagging clinical notes with biomedical terminology; and (b) designing an ontology that maps and classifies all the identified tags to various terminologies and allows for running phenotyping queries.

Methods and Results:

Transformers-based NLP Models, concretely pre-trained RoBERTa language models, were used to process radiology reports and annotate them identifying elements matching UMLS Concept Unique Identifiers (CUIs) definitions. CUIs were mapped into several biomedical ontologies useful for phenotyping (e.g., SNOMED-CT, HPO, ICD-10, FMA, LOINC, and ICPC2, among others) and represented as a lightweight ontology using OWL (Web Ontology Language) constructs. This process resulted in a Linked Knowledge Base (LKB), which allows running expressive queries to retrieve reports that comply with specific criteria using automatic reasoning.

Conclusion:

Although phenotyping tools mostly rely on relational databases, the combination of NLP and Linked Data technologies allows us to build scalable knowledge bases using standard ontologies from the Web of data. Our approach enables us to execute a pipeline which input is free text and automatically maps identified entities to a LKB that allows answering phenotyping queries. In this work, we have only used Spanish radiology reports, although it is extensible to other languages for which suitable corpora are available. This is particularly valuable in regional and national systems dealing with large research databases from different registries and cohorts and plays an essential role in the scalability of large data reuse infrastructures that require indexing and governing distributed data sources.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Computer methods and programs in biomedicine
Computer methods and programs in biomedicine 工程技术-工程:生物医学
CiteScore
12.30
自引率
6.60%
发文量
601
审稿时长
135 days
期刊介绍: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.
期刊最新文献
Editorial Board A Markov Chain methodology for care pathway mapping using health insurance data, a study case on pediatric TBI Towards clinical prediction with transparency: An explainable AI approach to survival modelling in residential aged care A novel endoscopic posterior cervical decompression and interbody fusion technique: Feasibility and biomechanical analysis Nonlinear dose-response relationship in tDCS-induced brain network synchrony: A resting-state whole-brain model analysis
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1