Susanne Ibing, Julian Hugo, Florian Borchert, Linea Schmidt, Caroline Benson, Allison A. Marshall, Colleen Chasteau, Ujunwa Korie, Diana Paguay, Jan Philipp Sachs, Bernhard Y. Renard, Judy H. Cho, Erwin P. Böttinger, Ryan C. Ungaro
Artificial Intelligence in Medicine, Volume 159, Article 103032. Published 2024-11-21. DOI: 10.1016/j.artmed.2024.103032
Impact factor 6.1; JCR Q1 (Computer Science, Artificial Intelligence)
https://www.sciencedirect.com/science/article/pii/S0933365724002744
Electronic Health Records-based identification of newly diagnosed Crohn’s Disease cases
Background:
Early diagnosis and treatment of Crohn’s Disease are associated with decreased risk of surgery and complications. However, diagnostic delay is frequently seen in clinical practice. To better understand Crohn’s Disease risk factors and disease indicators, we identified, described, and predicted incident Crohn’s Disease patients based on the Electronic Health Record data of the Mount Sinai Health System.
Methods:
We developed two phenotyping algorithms based on structured Electronic Health Record data (i.e., coded diagnoses, medication prescriptions, and healthcare utilization) and on a simpler and a more advanced approach to information extraction from clinical notes, using data from 2011 to 2023. We conducted an ablation study for the classification task, varying the models, prediction time points, data inputs, text encoding methods, and case-control matching variables.
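The abstract does not describe how controls were matched to cases, so the following is only a minimal sketch of one common approach: exact matching on a set of variables, drawing controls without replacement. The field names (`sex`, `age_band`, `race`), the matching ratio, and the helper `match_controls` are hypothetical and are not taken from the study. The sketch shows how dropping one matching variable, here race, changes the pool from which controls are drawn.

```python
import random
from collections import defaultdict


def match_controls(cases, pool, keys, ratio=5, seed=0):
    """Exact-match up to `ratio` controls per case on the given keys.

    `cases` and `pool` are lists of dicts; each control is used at most once.
    """
    rng = random.Random(seed)
    # Group candidate controls by their values on the matching variables.
    strata = defaultdict(list)
    for person in pool:
        strata[tuple(person[k] for k in keys)].append(person)
    for members in strata.values():
        rng.shuffle(members)

    matched = {}
    for case in cases:
        bucket = strata[tuple(case[k] for k in keys)]
        # Draw without replacement; fewer than `ratio` if the stratum runs out.
        matched[case["id"]] = [bucket.pop() for _ in range(min(ratio, len(bucket)))]
    return matched


# Hypothetical toy records; field names and values are illustrative only.
cases = [{"id": "c1", "sex": "F", "age_band": "30-39", "race": "A"}]
pool = [
    {"id": f"p{i}", "sex": "F", "age_band": "30-39", "race": "A" if i % 2 else "B"}
    for i in range(8)
]

# Same case, two matching schemes: with and without race as a matching variable.
with_race = match_controls(cases, pool, keys=("sex", "age_band", "race"), ratio=3)
without_race = match_controls(cases, pool, keys=("sex", "age_band"), ratio=3)
```

With race among the keys, every matched control shares the case's race value. Without it, controls come from the wider stratum, so race can differ between cases and controls and may then act as a predictive feature.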
Results:
We identified 247 incident Crohn’s Disease cases and 1221 matched controls and validated our cohorts through manual chart review. A second control cohort (n = 1235) was created without matching on race. Gastrointestinal symptoms were significantly overrepresented in cases at least 180 days before the first coded Crohn’s Disease diagnosis. Adding text-based features to the clinical prediction models improved their overall performance. However, adding race as a matching variable had a larger effect on model performance than the choice of modeling algorithm or input data, with a difference in area under the receiver operating characteristic curve (AUROC) of 0.09 between the best-performing models.
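For readers less familiar with the metric, the sketch below computes AUROC from its rank interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control. It then takes the difference between two models. The labels and scores are made up for illustration and are not study data. The function `auroc` is a hypothetical helper, not the authors' evaluation code.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random case outranks a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    # Compare every case score against every control score.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = case, 0 = control) and scores for two hypothetical models.
labels = [1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
model_b = [0.9, 0.5, 0.2, 0.6, 0.3, 0.4, 0.1]

# Difference in AUROC between the two models on the same labels.
delta = auroc(labels, model_a) - auroc(labels, model_b)
```

The pairwise comparison is quadratic in cohort size, which is acceptable for a cohort of this scale. A rank-based implementation would be preferable for much larger data.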
Conclusion:
We demonstrate the feasibility of identifying newly diagnosed Crohn’s Disease patients within a United States health system using Electronic Health Records. For the predictive modeling task, cases and controls were distinguished with only modest performance, even though various state-of-the-art methods were applied to features from structured and unstructured data. Our findings suggest that adding information from clinical notes, in a supervised or unsupervised manner, benefits cohort creation and predictive modeling.
About the journal:
Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care.
Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.