癫痫患者电子病历的符号学提取和基于机器学习的分类:回顾性分析。

IF 3.1 3区 医学 Q2 MEDICAL INFORMATICS JMIR Medical Informatics Pub Date : 2024-10-17 DOI:10.2196/57727
Yilin Xia, Mengqiao He, Sijia Basang, Leihao Sha, Zijie Huang, Ling Jin, Yifei Duan, Yusha Tang, Hua Li, Wanlin Lai, Lei Chen
{"title":"癫痫患者电子病历的符号学提取和基于机器学习的分类:回顾性分析。","authors":"Yilin Xia, Mengqiao He, Sijia Basang, Leihao Sha, Zijie Huang, Ling Jin, Yifei Duan, Yusha Tang, Hua Li, Wanlin Lai, Lei Chen","doi":"10.2196/57727","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Obtaining and describing semiology efficiently and classifying seizure types correctly are crucial for the diagnosis and treatment of epilepsy. Nevertheless, there exists an inadequacy in related informatics resources and decision support tools.</p><p><strong>Objective: </strong>We developed a symptom entity extraction tool and an epilepsy semiology ontology (ESO) and used machine learning to achieve an automated binary classification of epilepsy in this study.</p><p><strong>Methods: </strong>Using present history data of electronic health records from the Southwest Epilepsy Center in China, we constructed an ESO and a symptom-entity extraction tool to extract seizure duration, seizure symptoms, and seizure frequency from the unstructured text by combining manual annotation with natural language processing techniques. In addition, we achieved automatic classification of patients in the study cohort with high accuracy based on the extracted seizure feature data using multiple machine learning methods.</p><p><strong>Results: </strong>Data included present history from 10,925 cases between 2010 and 2020. Six annotators labeled a total of 2500 texts to obtain 5844 words of semiology and construct an ESO with 702 terms. Based on the ontology, the extraction tool achieved an accuracy rate of 85% in symptom extraction. Furthermore, we trained a stacking ensemble learning model combining XGBoost and random forest with an F1-score of 75.03%. The random forest model had the highest area under the curve (0.985).</p><p><strong>Conclusions: </strong>This work demonstrated the feasibility of natural language processing-assisted structural extraction of epilepsy medical record texts and downstream tasks, providing open ontology resources for subsequent related work.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e57727"},"PeriodicalIF":3.1000,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11501417/pdf/","citationCount":"0","resultStr":"{\"title\":\"Semiology Extraction and Machine Learning-Based Classification of Electronic Health Records for Patients With Epilepsy: Retrospective Analysis.\",\"authors\":\"Yilin Xia, Mengqiao He, Sijia Basang, Leihao Sha, Zijie Huang, Ling Jin, Yifei Duan, Yusha Tang, Hua Li, Wanlin Lai, Lei Chen\",\"doi\":\"10.2196/57727\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Obtaining and describing semiology efficiently and classifying seizure types correctly are crucial for the diagnosis and treatment of epilepsy. Nevertheless, there exists an inadequacy in related informatics resources and decision support tools.</p><p><strong>Objective: </strong>We developed a symptom entity extraction tool and an epilepsy semiology ontology (ESO) and used machine learning to achieve an automated binary classification of epilepsy in this study.</p><p><strong>Methods: </strong>Using present history data of electronic health records from the Southwest Epilepsy Center in China, we constructed an ESO and a symptom-entity extraction tool to extract seizure duration, seizure symptoms, and seizure frequency from the unstructured text by combining manual annotation with natural language processing techniques. In addition, we achieved automatic classification of patients in the study cohort with high accuracy based on the extracted seizure feature data using multiple machine learning methods.</p><p><strong>Results: </strong>Data included present history from 10,925 cases between 2010 and 2020. Six annotators labeled a total of 2500 texts to obtain 5844 words of semiology and construct an ESO with 702 terms. Based on the ontology, the extraction tool achieved an accuracy rate of 85% in symptom extraction. Furthermore, we trained a stacking ensemble learning model combining XGBoost and random forest with an F1-score of 75.03%. The random forest model had the highest area under the curve (0.985).</p><p><strong>Conclusions: </strong>This work demonstrated the feasibility of natural language processing-assisted structural extraction of epilepsy medical record texts and downstream tasks, providing open ontology resources for subsequent related work.</p>\",\"PeriodicalId\":56334,\"journal\":{\"name\":\"JMIR Medical Informatics\",\"volume\":\"12 \",\"pages\":\"e57727\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2024-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11501417/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JMIR Medical Informatics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.2196/57727\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MEDICAL INFORMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Medical Informatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2196/57727","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
引用次数: 0

摘要

背景:有效获取和描述符号学,正确分类癫痫发作类型对癫痫的诊断和治疗至关重要。然而,在相关的信息学资源和决策支持工具方面存在不足。目的:在本研究中,我们开发了一种症状实体提取工具和癫痫符号学本体(ESO),并利用机器学习实现癫痫的自动二分类。方法:利用中国西南癫痫中心现有的电子病历数据,构建ESO和症状实体提取工具,结合人工标注和自然语言处理技术,从非结构化文本中提取发作持续时间、发作症状和发作频率。此外,我们利用多种机器学习方法,基于提取的癫痫特征数据,实现了对研究队列患者的高精度自动分类。结果:数据包括2010年至2020年10925例患者的病史。6位注释者对2500个文本进行标注,得到5844个符号学词汇,构建了一个包含702个术语的ESO。在本体的基础上,该提取工具对症状的提取准确率达到85%。此外,我们训练了一个结合XGBoost和随机森林的堆叠集成学习模型,f1得分为75.03%。随机森林模型曲线下面积最大(0.985)。结论:本工作证明了自然语言处理辅助癫痫病案文本结构提取及下游任务的可行性,为后续相关工作提供了开放的本体资源。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Semiology Extraction and Machine Learning-Based Classification of Electronic Health Records for Patients With Epilepsy: Retrospective Analysis.

Background: Obtaining and describing semiology efficiently and classifying seizure types correctly are crucial for the diagnosis and treatment of epilepsy. Nevertheless, there exists an inadequacy in related informatics resources and decision support tools.

Objective: We developed a symptom entity extraction tool and an epilepsy semiology ontology (ESO) and used machine learning to achieve an automated binary classification of epilepsy in this study.

Methods: Using present history data of electronic health records from the Southwest Epilepsy Center in China, we constructed an ESO and a symptom-entity extraction tool to extract seizure duration, seizure symptoms, and seizure frequency from the unstructured text by combining manual annotation with natural language processing techniques. In addition, we achieved automatic classification of patients in the study cohort with high accuracy based on the extracted seizure feature data using multiple machine learning methods.

Results: Data included present history from 10,925 cases between 2010 and 2020. Six annotators labeled a total of 2500 texts to obtain 5844 words of semiology and construct an ESO with 702 terms. Based on the ontology, the extraction tool achieved an accuracy rate of 85% in symptom extraction. Furthermore, we trained a stacking ensemble learning model combining XGBoost and random forest with an F1-score of 75.03%. The random forest model had the highest area under the curve (0.985).

Conclusions: This work demonstrated the feasibility of natural language processing-assisted structural extraction of epilepsy medical record texts and downstream tasks, providing open ontology resources for subsequent related work.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
JMIR Medical Informatics
JMIR Medical Informatics Medicine-Health Informatics
CiteScore
7.90
自引率
3.10%
发文量
173
审稿时长
12 weeks
期刊介绍: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal which focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, ehealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (emphasizing more on applications for clinicians and health professionals rather than consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers which are more technical or more formative than what would be published in the Journal of Medical Internet Research.
期刊最新文献
Robust Automated Harmonization of Heterogeneous Data Through Ensemble Machine Learning: Algorithm Development and Validation Study. The Impact of Data Control and Delayed Discounting on the Public's Willingness to Share Different Types of Health Care Data: Empirical Study. Classifying Unstructured Text in Electronic Health Records for Mental Health Prediction Models: Large Language Model Evaluation Study. A Dynamic Adaptive Ensemble Learning Framework for Noninvasive Mild Cognitive Impairment Detection: Development and Validation Study. Interpretable Machine Learning Model for Predicting Postpartum Depression: Retrospective Study.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1