Stephanie Garies, Matt Taylor, Boglarka Soos, Cliff Lindeman, Neil Drummond, Anh Pham, Zhi Aponte-Hao, Tyler Williamson
CMAJ Open. Published 2023-10-31. doi:10.9778/cmajo.20220235. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10620009/pdf/
Using machine learning to standardize medication records in a pan-Canadian electronic medical record database: a data-driven algorithm study focused on antibiotics prescribed in primary care.
Background: Most antibiotics dispensed by community pharmacies in Canada are prescribed by family physicians, but using the prescribing information contained within primary care electronic medical records (EMRs) for secondary purposes can be challenging owing to variable data quality. We used antibiotic medications as an exemplar to validate a machine-learning approach for cleaning and coding medication data in a pan-Canadian primary care EMR database.
Methods: The Canadian Primary Care Sentinel Surveillance Network database contained an estimated 42 million medication records. We mapped each record to an Anatomical Therapeutic Chemical (ATC) code by applying a semisupervised classification model developed using reference standard labels derived from the Health Canada Drug Product Database. We validated the resulting ATC codes in a subset of antibiotic records (16 119 unique strings) to determine whether the algorithm correctly classified the medication according to manual review of the original medication record.
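The abstract does not describe the model's features or learning algorithm, so the following is only a minimal sketch of one semisupervised technique, self-training with pseudo-labels, and not the authors' method. It assumes character-trigram cosine similarity as the matching rule. The seed entries stand in for reference labels of the kind derived from the Drug Product Database; the function names, the 0.5 threshold, and the sample strings are all hypothetical.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Count character n-grams of a lowercased, space-padded string."""
    s = f"  {text.lower().strip()} "
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = math.sqrt(sum(v * v for v in a.values()) *
                     sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def self_train(labelled, unlabelled, threshold=0.5, max_rounds=5):
    """Assign codes to unlabelled strings by iterative pseudo-labelling.

    labelled:   list of (text, code) pairs (the seed reference labels)
    unlabelled: list of free-text medication strings
    Returns a dict mapping each confidently matched string to a code.
    """
    known = [(ngrams(text), code) for text, code in labelled]
    assigned = {}
    remaining = list(unlabelled)
    for _ in range(max_rounds):
        newly = []
        for text in remaining:
            vec = ngrams(text)
            # Best-matching known example and its code.
            score, code = max((cosine(vec, kv), kc) for kv, kc in known)
            if score >= threshold:
                newly.append((text, vec, code))
        if not newly:
            break  # nothing confident enough; stop early
        for text, vec, code in newly:
            assigned[text] = code
            known.append((vec, code))  # pseudo-label joins the training pool
        done = {text for text, _, _ in newly}
        remaining = [t for t in remaining if t not in done]
    return assigned


# Hypothetical seed labels (real ATC codes, made-up product strings).
seed = [
    ("amoxicillin 500 mg capsule", "J01CA04"),
    ("atorvastatin 20 mg tablet", "C10AA05"),
]
# Hypothetical free-text EMR strings.
emr_strings = [
    "AMOXICILLIN 500 MG CAPSULE",
    "amoxicillin 500mg cap",
    "see chart notes",
]
result = self_train(seed, emr_strings)
print(result)
```

Strings that never reach the threshold stay unassigned rather than receiving a forced code, which is the kind of case that a manual review step would need to catch.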
Results: In the antibiotic subset, the algorithm showed high validity (sensitivity 99.5%, specificity 92.4%, positive predictive value 98.6%, negative predictive value 97.0%) in classifying whether the medication was an antibiotic.
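The four validity measures reported above all come from a 2×2 table that compares the algorithm's antibiotic/non-antibiotic call against manual review. The sketch below shows the standard definitions. The counts are invented for illustration and are not the study's data, and the function name is hypothetical.

```python
def validity_metrics(tp, fp, fn, tn):
    """Compute validity measures from 2x2 confusion-matrix counts.

    tp: algorithm says antibiotic, manual review agrees
    fp: algorithm says antibiotic, manual review disagrees
    fn: algorithm says non-antibiotic, manual review says antibiotic
    tn: algorithm says non-antibiotic, manual review agrees
    """
    return {
        "sensitivity": tp / (tp + fn),  # true antibiotics correctly flagged
        "specificity": tn / (tn + fp),  # non-antibiotics correctly excluded
        "ppv": tp / (tp + fp),          # flagged records that are antibiotics
        "npv": tn / (tn + fn),          # excluded records that are not antibiotics
    }


# Illustrative counts only; NOT taken from the study.
metrics = validity_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are properties of the classifier itself, whereas the predictive values also depend on how common antibiotics are in the validated subset.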
Interpretation: Our machine-learning algorithm classified unstructured antibiotic medication data from primary care with a high degree of accuracy. Access to cleaned EMR data can support important secondary uses, including community-based antibiotic prescribing surveillance and practice improvement.