Predicting Autism Spectrum Disorder: Transformer-Based Deep Learning Ensemble Framework Using Health Administrative & Birth Registry Data

medRxiv - Health Systems and Quality Improvement | Pub Date: 2024-07-05 | DOI: 10.1101/2024.07.03.24309684

Kevin Dick, Emily Kaczmarek, Robin Ducharme, Alexa C Bowie, Alysha L. J. Dingwall-Harvey, Heather Howley, Steven Hawken, Mark C Walker, Christine M Armour

Abstract

Background

Early diagnosis and access to resources, support, and therapy are critical for improving long-term outcomes for children with autism spectrum disorder (ASD). ASD is typically detected using a case-finding approach based on symptoms and family history, resulting in many delayed or missed diagnoses. While population-based screening would be ideal for early identification, available screening tools have limited accuracy. This study aims to determine whether machine learning models applied to health administrative and birth registry data can identify young children (aged 18 months to 5 years) who are at increased likelihood of developing ASD.

Methods

We assembled the study cohort using individually linked maternal-newborn data from the Better Outcomes Registry and Network (BORN) Ontario database. The cohort included all live births in Ontario, Canada, between April 1, 2006, and March 31, 2018, linked to datasets from Newborn Screening Ontario (NSO), Prenatal Screening Ontario (PSO), and the Canadian Institute for Health Information (CIHI), namely the Discharge Abstract Database (DAD) and the National Ambulatory Care Reporting System (NACRS). The NSO and PSO datasets provided screening biomarker values and outcomes, while DAD and NACRS contained diagnosis codes and intervention codes for mothers and offspring. Extreme Gradient Boosting models and large-scale ensembled Transformer deep learning models were developed to predict ASD diagnosis between 18 and 60 months of age. Using explainable artificial intelligence methods, we determined the factors that contribute most to an increased likelihood of ASD at both the individual and population levels.

Results

The final study cohort included 703,894 mother-offspring pairs, with 10,964 identified cases of ASD. The best-performing ensemble of Transformer models achieved an area under the receiver operating characteristic curve of 69.6% for predicting ASD diagnosis, with a sensitivity of 70.9% and a specificity of 56.9%. We determined that our model can be used to identify an enriched pool of children with the greatest likelihood of developing ASD, demonstrating the feasibility of this approach.

Conclusions

This study highlights the feasibility of employing machine learning models and routinely collected health data to systematically identify young children at high likelihood of developing ASD. Ensemble Transformer models applied to health administrative and birth registry data offer a promising avenue for universal ASD screening. Such early detection enables targeted and formal assessment for timely diagnosis and early access to resources, support, or therapy.
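The reported sensitivity and specificity can be related to the "enriched pool" claim with a short calculation. The sketch below assumes, purely for illustration, that the cohort-wide ASD prevalence (10,964 of 703,894) also holds in the population being screened and that the reported operating point transfers unchanged. The abstract states neither, and the function name `screening_summary` is hypothetical.

```python
def screening_summary(n_total, n_cases, sensitivity, specificity):
    """Derive population-level screening quantities from summary statistics.

    All quantities are expressed as fractions of the screened population.
    Assumption (not from the abstract): prevalence in the screened
    population equals n_cases / n_total.
    """
    prevalence = n_cases / n_total

    # Joint fractions of the population in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    flagged = true_pos + false_pos          # fraction the model flags
    ppv = true_pos / flagged                # P(ASD | flagged)
    npv = true_neg / (true_neg + false_neg) # P(no ASD | not flagged)

    return {
        "prevalence": prevalence,
        "flagged_fraction": flagged,
        "ppv": ppv,
        "npv": npv,
        "enrichment": ppv / prevalence,     # ASD rate in flagged pool vs. baseline
    }


# Figures taken from the abstract: cohort size, case count,
# sensitivity 70.9%, specificity 56.9%.
summary = screening_summary(703_894, 10_964, 0.709, 0.569)

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions, the ASD rate among flagged children is about 2.5%, compared with about 1.6% in the cohort as a whole. Most flagged children would still not go on to receive a diagnosis, which is consistent with the abstract's framing of the model as a way to prioritize formal assessment rather than to diagnose.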
