Using SNOMED to automate clinical concept mapping
Shaun Gupta, Frederik Dieleman, P. Long, O. Doyle, N. Leavitt
Proceedings of the ACM Conference on Health, Inference, and Learning, 2020-04-02. DOI: 10.1145/3368555.3384453
Citations: 1
Abstract
The International Classification of Diseases (ICD) is a widely used diagnostic ontology for the classification of health disorders and a valuable resource for healthcare analytics. However, ICD is an evolving ontology subject to periodic revisions (e.g., from ICD-9-CM to ICD-10-CM), and complete crosswalks between versions are not available. While clinical experts can create custom mappings across ICD versions, this process is both time-consuming and costly. We propose an automated solution that facilitates interoperability without sacrificing accuracy. Our solution leverages the SNOMED-CT ontology, in which medical concepts are organised in a directed acyclic graph. We use this graph to map ICD-9-CM to ICD-10-CM by associating codes with clinical concepts in SNOMED using a nearest-neighbors search in combination with natural language processing. To assess the impact of our method, we compared the performance of a gradient boosted tree model (XGBoost), developed to classify patients with exocrine pancreatic insufficiency (EPI), when using features constructed by our solution versus features constructed by clinically driven methods. The dataset comprised 23,204 EPI patients and 277,324 non-EPI patients, with data spanning October 2011 to April 2017. Our algorithm generated clinical predictors whose stability across the ICD-9-CM to ICD-10-CM transition point was comparable to that of predictors built from ICD-9-CM/ICD-10-CM mappings generated by clinical experts. Preliminary modeling results showed highly similar performance for models based on the SNOMED mapping versus the clinically defined mapping (71% precision at 20% recall for both models). Overall, the framework does not compromise accuracy at either the individual code level or the model level, while obviating the need for time-consuming manual mapping.
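The abstract describes the mapping only at a high level. The following is a minimal sketch of one plausible reading of it, not the authors' implementation. Several assumptions apply. The concept IDs (`S_CHRONIC_PANCREATITIS` and so on) are hypothetical placeholders, not real SNOMED CT identifiers. A bag-of-words cosine similarity stands in for the unspecified NLP step. Each code is attached to its single nearest concept by text similarity. A breadth-first search over the concept graph then returns the ICD-10-CM codes anchored closest to the ICD-9-CM code's concept. The small code sets are toy inputs, not an authoritative crosswalk.

```python
import math
import re
from collections import Counter

# Hypothetical SNOMED-like fragment: concept id -> (description, parent ids).
# These ids are illustrative placeholders, NOT real SNOMED CT identifiers.
CONCEPTS = {
    "S_PANCREAS_DISORDER": ("disorder of pancreas", []),
    "S_CHRONIC_PANCREATITIS": ("chronic pancreatitis", ["S_PANCREAS_DISORDER"]),
    "S_ALCOHOLIC_CHRONIC_PANCREATITIS": ("alcohol induced chronic pancreatitis", ["S_CHRONIC_PANCREATITIS"]),
    "S_CYSTIC_FIBROSIS": ("cystic fibrosis", []),
}

# Toy code sets; descriptions serve only as text input for the sketch.
ICD9 = {
    "577.1": "Chronic pancreatitis",
    "277.00": "Cystic fibrosis without mention of meconium ileus",
}
ICD10 = {
    "K86.0": "Alcohol-induced chronic pancreatitis",
    "K86.1": "Other chronic pancreatitis",
    "E84.9": "Cystic fibrosis, unspecified",
}

# Assumed list of generic ICD filler words that carry little clinical meaning.
STOPWORDS = {"of", "other", "unspecified", "without", "mention"}


def bag_of_words(text):
    """Lower-case word counts with stopwords removed (stand-in for the NLP step)."""
    return Counter(t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest_concept(description, concepts=CONCEPTS):
    """1-nearest-neighbour search: the concept whose description best matches the code's."""
    query = bag_of_words(description)
    return max(concepts, key=lambda cid: cosine(query, bag_of_words(concepts[cid][0])))


def neighbours(concepts):
    """Undirected adjacency (parents and children) of the concept DAG."""
    adj = {cid: set() for cid in concepts}
    for cid, (_, parents) in concepts.items():
        for p in parents:
            adj[cid].add(p)
            adj[p].add(cid)
    return adj


def map_icd9_to_icd10(icd9_code, icd9=ICD9, icd10=ICD10, concepts=CONCEPTS):
    """Map one ICD-9-CM code to the ICD-10-CM codes anchored closest to it in the graph."""
    # Anchor every ICD-10-CM code to its nearest concept.
    anchored = {}
    for code, desc in icd10.items():
        anchored.setdefault(nearest_concept(desc, concepts), []).append(code)
    adj = neighbours(concepts)
    # Breadth-first search outward from the ICD-9-CM code's concept; stop at the
    # first hop distance at which any ICD-10-CM code is found.
    start = nearest_concept(icd9[icd9_code], concepts)
    frontier, seen = [start], {start}
    while frontier:
        hits = sorted(c for cid in frontier for c in anchored.get(cid, []))
        if hits:
            return hits
        next_frontier = []
        for cid in frontier:
            for n in sorted(adj[cid]):
                if n not in seen:
                    seen.add(n)
                    next_frontier.append(n)
        frontier = next_frontier
    return []  # no ICD-10-CM code reachable from this concept


for code in ICD9:
    print(code, "->", map_icd9_to_icd10(code))
# prints: 577.1 -> ['K86.1']
# prints: 277.00 -> ['E84.9']
```

On this toy data each ICD-9-CM code finds an ICD-10-CM code at its own concept. Suppose `K86.1` were removed from `ICD10`. The search would then step one hop to the child concept and return `['K86.0']`. Searching in both directions (parents and children) is a design choice of this sketch, not something the abstract specifies.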