Communicating exploratory unsupervised machine learning analysis in age clustering for paediatric disease.

BMJ Health & Care Informatics (IF 4.1; JCR Q1, Health Care Sciences & Services). Pub Date: 2024-07-29. DOI: 10.1136/bmjhci-2023-100963
Joshua William Spear, Eleni Pissaridou, Stuart Bowyer, William A Bryant, Daniel Key, John Booth, Anastasia Spiridou, Spiros Denaxas, Rebecca Pope, Andrew M Taylor, Harry Hemingway, Neil J Sebire
BMJ Health & Care Informatics, volume 31, issue 1. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11288139/pdf/

Communicating exploratory unsupervised machine learning analysis in age clustering for paediatric disease.

Background: Despite the increasing availability of electronic healthcare record (EHR) data and the wide availability of plug-and-play machine learning (ML) Application Programming Interfaces, the adoption of data-driven decision-making within routine hospital workflows has thus far remained limited. Through the lens of deriving clusters of diagnoses by age, this study investigated the type of ML analysis that can be performed using EHR data and how results could be communicated to lay stakeholders.

Methods: After preprocessing, observational EHR data from a tertiary paediatric hospital, containing 61 522 unique patients and 3315 unique ICD-10 diagnosis codes, were used. K-means clustering was applied to identify age distributions of patient diagnoses. The final model was selected using quantitative metrics and expert assessment of the clinical validity of the clusters. Additionally, uncertainty over preprocessing decisions was analysed.
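
The clustering step described in the Methods can be illustrated with a minimal sketch. The abstract does not say how age distributions were encoded or how the model was configured, so everything below is an assumption made for illustration: the diagnosis codes (`DX_A` to `DX_H`) and their ages are made-up toy data, each diagnosis is represented by the proportion of its records in one-year age bins, and k-means is written out in plain Python with a deterministic farthest-point seeding so the example is reproducible. In practice a library implementation such as scikit-learn's `KMeans` would likely be used instead.

```python
from math import dist

# Hypothetical toy data: ages (in years) at diagnosis for made-up diagnosis codes.
AGES_BY_DIAGNOSIS = {
    "DX_A": [0.1, 0.2, 0.3, 0.5, 0.8],
    "DX_B": [0.2, 0.4, 0.6, 0.9, 1.1],
    "DX_C": [1.5, 2.2, 2.8, 3.1, 4.0],
    "DX_D": [1.2, 2.0, 3.5, 3.9, 4.5],
    "DX_E": [5.5, 6.0, 7.5, 8.0, 9.0, 10.0, 11.0, 12.0],
    "DX_F": [5.1, 6.2, 7.0, 8.5, 9.5, 10.5, 11.5, 12.5],
    "DX_G": [13.5, 14.0, 15.0, 16.0, 17.5],
    "DX_H": [13.1, 14.5, 15.5, 16.2, 17.0],
}

N_BINS = 18  # assumed encoding: one-year bins covering ages 0 to 18


def age_profile(ages):
    """Proportion of a diagnosis's records that fall in each one-year age bin."""
    counts = [0] * N_BINS
    for age in ages:
        counts[min(int(age), N_BINS - 1)] += 1
    return [c / len(ages) for c in counts]


def farthest_point_init(points, k):
    """Deterministic seeding: take the first point, then repeatedly add the
    point that is farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so the run has converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


codes = list(AGES_BY_DIAGNOSIS)
profiles = [age_profile(AGES_BY_DIAGNOSIS[code]) for code in codes]
labels = dict(zip(codes, kmeans(profiles, k=4)))

for code, label in labels.items():
    print(code, "-> cluster", label)
```

On this toy data, diagnoses whose ages fall in the same band end up in the same cluster. The choice of `k=4` simply mirrors the four clusters reported in the Findings; the study itself selected its final model using quantitative metrics and expert review, which this sketch does not attempt to reproduce.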

Findings: Four age clusters of diseases were identified, broadly aligning to the age ranges 0 to 1, 1 to 5, 5 to 13 and 13 to 18 years. Diagnoses within the clusters aligned with existing knowledge regarding the propensity of presentation at different ages, and sequential clusters presented known disease progressions. The results validated similar methodologies within the literature. The impact of uncertainty induced by preprocessing decisions was large at the level of individual diagnoses but not at a population level. Strategies for mitigating, or communicating, this uncertainty were successfully demonstrated.

Conclusion: Unsupervised ML applied to EHR data identifies clinically relevant age distributions of diagnoses, which can augment existing decision-making. However, biases within healthcare datasets dramatically impact results if not appropriately mitigated or communicated.

Source journal metrics: CiteScore 6.10; self-citation rate 4.90%; articles published: 40; review time: 18 weeks.