癫痫患者电子病历的符号学提取和基于机器学习的分类：回顾性分析。

IF 3.8 3区医学 Q2 MEDICAL INFORMATICS JMIR Medical Informatics Pub Date : 2024-10-17 DOI:10.2196/57727

Yilin Xia, Mengqiao He, Sijia Basang, Leihao Sha, Zijie Huang, Ling Jin, Yifei Duan, Yusha Tang, Hua Li, Wanlin Lai, Lei Chen

{"title":"癫痫患者电子病历的符号学提取和基于机器学习的分类：回顾性分析。","authors":"Yilin Xia, Mengqiao He, Sijia Basang, Leihao Sha, Zijie Huang, Ling Jin, Yifei Duan, Yusha Tang, Hua Li, Wanlin Lai, Lei Chen","doi":"10.2196/57727","DOIUrl":null,"url":null,"abstract":"Background: Obtaining and describing semiology efficiently and classifying seizure types correctly are crucial for the diagnosis and treatment of epilepsy. Nevertheless, there exists an inadequacy in related informatics resources and decision support tools.Objective: We developed a symptom entity extraction tool and an epilepsy semiology ontology (ESO) and used machine learning to achieve an automated binary classification of epilepsy in this study.Methods: Using present history data of electronic health records from the Southwest Epilepsy Center in China, we constructed an ESO and a symptom-entity extraction tool to extract seizure duration, seizure symptoms, and seizure frequency from the unstructured text by combining manual annotation with natural language processing techniques. In addition, we achieved automatic classification of patients in the study cohort with high accuracy based on the extracted seizure feature data using multiple machine learning methods.Results: Data included present history from 10,925 cases between 2010 and 2020. Six annotators labeled a total of 2500 texts to obtain 5844 words of semiology and construct an ESO with 702 terms. Based on the ontology, the extraction tool achieved an accuracy rate of 85% in symptom extraction. Furthermore, we trained a stacking ensemble learning model combining XGBoost and random forest with an F1-score of 75.03%. The random forest model had the highest area under the curve (0.985).Conclusions: This work demonstrated the feasibility of natural language processing-assisted structural extraction of epilepsy medical record texts and downstream tasks, providing open ontology resources for subsequent related work.","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e57727"},"PeriodicalIF":3.8000,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11501417/pdf/","citationCount":"0","resultStr":"{\"title\":\"Semiology Extraction and Machine Learning-Based Classification of Electronic Health Records for Patients With Epilepsy: Retrospective Analysis.\",\"authors\":\"Yilin Xia, Mengqiao He, Sijia Basang, Leihao Sha, Zijie Huang, Ling Jin, Yifei Duan, Yusha Tang, Hua Li, Wanlin Lai, Lei Chen\",\"doi\":\"10.2196/57727\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Obtaining and describing semiology efficiently and classifying seizure types correctly are crucial for the diagnosis and treatment of epilepsy. Nevertheless, there exists an inadequacy in related informatics resources and decision support tools.Objective: We developed a symptom entity extraction tool and an epilepsy semiology ontology (ESO) and used machine learning to achieve an automated binary classification of epilepsy in this study.Methods: Using present history data of electronic health records from the Southwest Epilepsy Center in China, we constructed an ESO and a symptom-entity extraction tool to extract seizure duration, seizure symptoms, and seizure frequency from the unstructured text by combining manual annotation with natural language processing techniques. In addition, we achieved automatic classification of patients in the study cohort with high accuracy based on the extracted seizure feature data using multiple machine learning methods.Results: Data included present history from 10,925 cases between 2010 and 2020. Six annotators labeled a total of 2500 texts to obtain 5844 words of semiology and construct an ESO with 702 terms. Based on the ontology, the extraction tool achieved an accuracy rate of 85% in symptom extraction. Furthermore, we trained a stacking ensemble learning model combining XGBoost and random forest with an F1-score of 75.03%. The random forest model had the highest area under the curve (0.985).Conclusions: This work demonstrated the feasibility of natural language processing-assisted structural extraction of epilepsy medical record texts and downstream tasks, providing open ontology resources for subsequent related work.\",\"PeriodicalId\":56334,\"journal\":{\"name\":\"JMIR Medical Informatics\",\"volume\":\"12 \",\"pages\":\"e57727\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2024-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11501417/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JMIR Medical Informatics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.2196/57727\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MEDICAL INFORMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Medical Informatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2196/57727","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}

引用次数: 0

摘要

背景：有效获取和描述符号学，正确分类癫痫发作类型对癫痫的诊断和治疗至关重要。然而，在相关的信息学资源和决策支持工具方面存在不足。目的：在本研究中，我们开发了一种症状实体提取工具和癫痫符号学本体（ESO），并利用机器学习实现癫痫的自动二分类。方法：利用中国西南癫痫中心现有的电子病历数据，构建ESO和症状实体提取工具，结合人工标注和自然语言处理技术，从非结构化文本中提取发作持续时间、发作症状和发作频率。此外，我们利用多种机器学习方法，基于提取的癫痫特征数据，实现了对研究队列患者的高精度自动分类。结果：数据包括2010年至2020年10925例患者的病史。6位注释者对2500个文本进行标注，得到5844个符号学词汇，构建了一个包含702个术语的ESO。在本体的基础上，该提取工具对症状的提取准确率达到85%。此外，我们训练了一个结合XGBoost和随机森林的堆叠集成学习模型，f1得分为75.03%。随机森林模型曲线下面积最大（0.985）。结论：本工作证明了自然语言处理辅助癫痫病案文本结构提取及下游任务的可行性，为后续相关工作提供了开放的本体资源。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Semiology Extraction and Machine Learning-Based Classification of Electronic Health Records for Patients With Epilepsy: Retrospective Analysis.

Background: Obtaining and describing semiology efficiently and classifying seizure types correctly are crucial for the diagnosis and treatment of epilepsy. Nevertheless, there exists an inadequacy in related informatics resources and decision support tools.

Objective: We developed a symptom entity extraction tool and an epilepsy semiology ontology (ESO) and used machine learning to achieve an automated binary classification of epilepsy in this study.

Methods: Using present history data of electronic health records from the Southwest Epilepsy Center in China, we constructed an ESO and a symptom-entity extraction tool to extract seizure duration, seizure symptoms, and seizure frequency from the unstructured text by combining manual annotation with natural language processing techniques. In addition, we achieved automatic classification of patients in the study cohort with high accuracy based on the extracted seizure feature data using multiple machine learning methods.

Results: Data included present history from 10,925 cases between 2010 and 2020. Six annotators labeled a total of 2500 texts to obtain 5844 words of semiology and construct an ESO with 702 terms. Based on the ontology, the extraction tool achieved an accuracy rate of 85% in symptom extraction. Furthermore, we trained a stacking ensemble learning model combining XGBoost and random forest with an F1-score of 75.03%. The random forest model had the highest area under the curve (0.985).

Conclusions: This work demonstrated the feasibility of natural language processing-assisted structural extraction of epilepsy medical record texts and downstream tasks, providing open ontology resources for subsequent related work.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

JMIR Medical Informatics Medicine-Health Informatics

CiteScore

7.90

自引率

3.10%

发文量

173

审稿时长

12 weeks

期刊介绍： JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal which focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, ehealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (emphasizing more on applications for clinicians and health professionals rather than consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers which are more technical or more formative than what would be published in the Journal of Medical Internet Research.