从初级保健电子病历中识别疾病队列的数据挖掘方法:一例糖尿病

Q3 Computer Science Open Bioinformatics Journal Pub Date : 2017-12-12 DOI:10.2174/1875036201710010016
Ebenezer S. Owusu Adjah, O. Montvida, Julius Agbeve, S. Paul
{"title":"从初级保健电子病历中识别疾病队列的数据挖掘方法:一例糖尿病","authors":"Ebenezer S. Owusu Adjah, O. Montvida, Julius Agbeve, S. Paul","doi":"10.2174/1875036201710010016","DOIUrl":null,"url":null,"abstract":"Background: Identification of diseased patients from primary care based electronic medical records (EMRs) has methodological challenges that may impact epidemiologic inferences. Objective: To compare deterministic clinically guided selection algorithms with probabilistic machine learning (ML) methodologies for their ability to identify patients with type 2 diabetes mellitus (T2DM) from large population based EMRs from nationally representative primary care database. Methods: Four cohorts of patients with T2DM were defined by deterministic approach based on disease codes. The database was mined for a set of best predictors of T2DM and the performance of six ML algorithms were compared based on cross-validated true positive rate, true negative rate, and area under receiver operating characteristic curve. Results: In the database of 11,018,025 research suitable individuals, 379 657 (3.4%) were coded to have T2DM. Logistic Regression classifier was selected as best ML algorithm and resulted in a cohort of 383,330 patients with potential T2DM. Eighty-three percent (83%) of this cohort had a T2DM code, and 16% of the patients with T2DM code were not included in this ML cohort. Of those in the ML cohort without disease code, 52% had at least one measure of elevated glucose level and 22% had received at least one prescription for antidiabetic medication. Conclusion: Deterministic cohort selection based on disease coding potentially introduces significant mis-classification problem. ML techniques allow testing for potential disease predictors, and under meaningful data input, are able to identify diseased cohorts in a holistic way.","PeriodicalId":38956,"journal":{"name":"Open Bioinformatics Journal","volume":"10 1","pages":"16-27"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Data Mining Approach to Identify Disease Cohorts from Primary Care Electronic Medical Records: A Case of Diabetes Mellitus\",\"authors\":\"Ebenezer S. Owusu Adjah, O. Montvida, Julius Agbeve, S. Paul\",\"doi\":\"10.2174/1875036201710010016\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Identification of diseased patients from primary care based electronic medical records (EMRs) has methodological challenges that may impact epidemiologic inferences. Objective: To compare deterministic clinically guided selection algorithms with probabilistic machine learning (ML) methodologies for their ability to identify patients with type 2 diabetes mellitus (T2DM) from large population based EMRs from nationally representative primary care database. Methods: Four cohorts of patients with T2DM were defined by deterministic approach based on disease codes. The database was mined for a set of best predictors of T2DM and the performance of six ML algorithms were compared based on cross-validated true positive rate, true negative rate, and area under receiver operating characteristic curve. Results: In the database of 11,018,025 research suitable individuals, 379 657 (3.4%) were coded to have T2DM. Logistic Regression classifier was selected as best ML algorithm and resulted in a cohort of 383,330 patients with potential T2DM. Eighty-three percent (83%) of this cohort had a T2DM code, and 16% of the patients with T2DM code were not included in this ML cohort. Of those in the ML cohort without disease code, 52% had at least one measure of elevated glucose level and 22% had received at least one prescription for antidiabetic medication. Conclusion: Deterministic cohort selection based on disease coding potentially introduces significant mis-classification problem. ML techniques allow testing for potential disease predictors, and under meaningful data input, are able to identify diseased cohorts in a holistic way.\",\"PeriodicalId\":38956,\"journal\":{\"name\":\"Open Bioinformatics Journal\",\"volume\":\"10 1\",\"pages\":\"16-27\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-12-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Open Bioinformatics Journal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2174/1875036201710010016\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Open Bioinformatics Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2174/1875036201710010016","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 22

摘要

背景:从基于初级保健的电子医疗记录(EMR)中识别患病患者存在方法学挑战,可能会影响流行病学推断。目的:比较确定性临床指导选择算法和概率机器学习(ML)方法在从具有全国代表性的初级保健数据库中基于大规模人群的电子病历中识别2型糖尿病(T2DM)患者的能力。方法:以疾病编码为基础,采用确定性方法确定4组T2DM患者。在数据库中挖掘了一组T2DM的最佳预测因子,并基于交叉验证的真阳性率、真阴性率和受试者工作特征曲线下面积对六种ML算法的性能进行了比较。结果:在11018025个适合研究的个体的数据库中,379657(3.4%)被编码为患有T2DM。Logistic回归分类器被选为最佳ML算法,并产生了383330名潜在T2DM患者的队列。该队列中83%(83%)有T2DM代码,16%的T2DM代码患者不包括在该ML队列中。在没有疾病代码的ML队列中,52%的人至少有一次血糖水平升高,22%的人至少接受过一次抗糖尿病药物处方。结论:基于疾病编码的确定性队列选择可能会引入重大的错误分类问题。ML技术允许测试潜在的疾病预测因素,并且在有意义的数据输入下,能够以整体的方式识别患病队列。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Data Mining Approach to Identify Disease Cohorts from Primary Care Electronic Medical Records: A Case of Diabetes Mellitus
Background: Identification of diseased patients from primary care based electronic medical records (EMRs) has methodological challenges that may impact epidemiologic inferences. Objective: To compare deterministic clinically guided selection algorithms with probabilistic machine learning (ML) methodologies for their ability to identify patients with type 2 diabetes mellitus (T2DM) from large population based EMRs from nationally representative primary care database. Methods: Four cohorts of patients with T2DM were defined by deterministic approach based on disease codes. The database was mined for a set of best predictors of T2DM and the performance of six ML algorithms were compared based on cross-validated true positive rate, true negative rate, and area under receiver operating characteristic curve. Results: In the database of 11,018,025 research suitable individuals, 379 657 (3.4%) were coded to have T2DM. Logistic Regression classifier was selected as best ML algorithm and resulted in a cohort of 383,330 patients with potential T2DM. Eighty-three percent (83%) of this cohort had a T2DM code, and 16% of the patients with T2DM code were not included in this ML cohort. Of those in the ML cohort without disease code, 52% had at least one measure of elevated glucose level and 22% had received at least one prescription for antidiabetic medication. Conclusion: Deterministic cohort selection based on disease coding potentially introduces significant mis-classification problem. ML techniques allow testing for potential disease predictors, and under meaningful data input, are able to identify diseased cohorts in a holistic way.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Open Bioinformatics Journal
Open Bioinformatics Journal Computer Science-Computer Science (miscellaneous)
CiteScore
2.40
自引率
0.00%
发文量
4
期刊介绍: The Open Bioinformatics Journal is an Open Access online journal, which publishes research articles, reviews/mini-reviews, letters, clinical trial studies and guest edited single topic issues in all areas of bioinformatics and computational biology. The coverage includes biomedicine, focusing on large data acquisition, analysis and curation, computational and statistical methods for the modeling and analysis of biological data, and descriptions of new algorithms and databases. The Open Bioinformatics Journal, a peer reviewed journal, is an important and reliable source of current information on the developments in the field. The emphasis will be on publishing quality articles rapidly and freely available worldwide.
期刊最新文献
Decision-making Support System for Predicting and Eliminating Malnutrition and Anemia Immunoinformatics Approach for the Design of Chimeric Vaccine Against Whitmore Disease A New Deep Learning Model based on Neuroimaging for Predicting Alzheimer's Disease Early Prediction of Covid-19 Samples from Chest X-ray Images using Deep Learning Approach Electronic Health Record (EHR) System Development for Study on EHR Data-based Early Prediction of Diabetes Using Machine Learning Algorithms
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1