Predicting Autism Spectrum Disorder: Transformer-Based Deep Learning Ensemble Framework Using Health Administrative & Birth Registry Data

Kevin Dick, Emily Kaczmarek, Robin Ducharme, Alexa C Bowie, Alysha L. J. Dingwall-Harvey, Heather Howley, Steven Hawken, Mark C Walker, Christine M Armour
medRxiv - Health Systems and Quality Improvement. Published 2024-07-05. DOI: 10.1101/2024.07.03.24309684 (https://doi.org/10.1101/2024.07.03.24309684)

Abstract

Background
Early diagnosis and access to resources, support, and therapy are critical for improving long-term outcomes for children with autism spectrum disorder (ASD). ASD is typically detected using a case-finding approach based on symptoms and family history, which results in many delayed or missed diagnoses. While population-based screening would be ideal for early identification, available screening tools have limited accuracy. This study aims to determine whether machine learning models applied to health administrative and birth registry data can identify young children (aged 18 months to 5 years) who are at increased likelihood of developing ASD.

Methods
We assembled the study cohort using individually linked maternal-newborn data from the Better Outcomes Registry and Network (BORN) Ontario database. The cohort included all live births in Ontario, Canada, between April 1, 2006, and March 31, 2018, linked to datasets from Newborn Screening Ontario (NSO), Prenatal Screening Ontario (PSO), and the Canadian Institute for Health Information (CIHI), namely the Discharge Abstract Database (DAD) and the National Ambulatory Care Reporting System (NACRS). The NSO and PSO datasets provided screening biomarker values and outcomes, while DAD and NACRS contained diagnosis codes and intervention codes for mothers and offspring. Extreme Gradient Boosting models and large-scale ensembled Transformer deep learning models were developed to predict ASD diagnosis between 18 and 60 months of age. Using explainable artificial intelligence methods, we determined the factors that contribute most to increased likelihood of ASD at both the individual and population levels.

Results
The final study cohort included 703,894 mother-offspring pairs, with 10,964 identified cases of ASD. The best-performing ensemble of Transformer models achieved an area under the receiver operating characteristic curve of 69.6% for predicting ASD diagnosis, with a sensitivity of 70.9% and a specificity of 56.9%. We found that the model can be used to identify an enriched pool of children with the greatest likelihood of developing ASD, demonstrating the feasibility of this approach.

Conclusions
This study highlights the feasibility of employing machine learning models and routinely collected health data to systematically identify young children at high likelihood of developing ASD. Ensemble Transformer models applied to health administrative and birth registry data offer a promising avenue for universal ASD screening. Such early detection enables targeted and formal assessment for timely diagnosis and early access to resources, support, or therapy.
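
To make the "enriched pool" idea concrete, the following is a minimal arithmetic sketch in Python, not the authors' code. It assumes that the reported sensitivity and specificity can be applied at the cohort-wide prevalence implied by the abstract (10,964 cases among 703,894 pairs); the study may have computed these metrics on a held-out subset with a different case mix, so the derived values are illustrative only. The function name `screening_summary` and its return keys are hypothetical.

```python
# Figures reported in the abstract.
COHORT_SIZE = 703_894   # mother-offspring pairs
ASD_CASES = 10_964      # identified ASD cases
SENSITIVITY = 0.709     # reported sensitivity
SPECIFICITY = 0.569     # reported specificity


def screening_summary(n_total, n_cases, sensitivity, specificity):
    """Derive flagged-pool statistics from prevalence, sensitivity, specificity.

    Assumption: the operating point holds at the cohort-wide prevalence.
    """
    prevalence = n_cases / n_total
    # Fraction of all children who are flagged and have ASD.
    true_pos = sensitivity * prevalence
    # Fraction of all children who are flagged but do not have ASD.
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    flagged = true_pos + false_pos
    # Positive predictive value: share of flagged children who have ASD.
    ppv = true_pos / flagged
    return {
        "prevalence": prevalence,
        "flagged_fraction": flagged,
        "ppv": ppv,
        "enrichment": ppv / prevalence,  # ASD rate in pool vs. baseline
    }


if __name__ == "__main__":
    summary = screening_summary(COHORT_SIZE, ASD_CASES, SENSITIVITY, SPECIFICITY)
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions, the model would flag roughly 44% of children, and about 2.5% of the flagged children would have ASD, compared with a baseline prevalence of about 1.6%. That is an enrichment of roughly 1.6 times, which is consistent with the abstract's framing of the model as a way to narrow the pool for formal assessment rather than as a diagnostic test.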