Evaluation of automated pediatric sleep stage classification using U-Sleep - a convolutional neural network

Ajay Kevat, Rylan Steinkey, Sadasivam Suresh, Warren R Ruehland, Jasneek Chawla, Philip I Terrill, Andrew Collaro, Kartik Iyer

medRxiv - Pediatrics, published 2024-08-20. DOI: 10.1101/2024.08.18.24312174 (https://doi.org/10.1101/2024.08.18.24312174)
Abstract
Study Objectives: U-Sleep is a publicly available automated sleep stager, but it has not been independently validated using pediatric data. We aimed to a) test the hypothesis that U-Sleep performance is equivalent to that of trained human scorers, using a concordance dataset of 50 pediatric polysomnogram excerpts scored by multiple trained scorers, and b) identify clinical and demographic characteristics that affect U-Sleep accuracy, using a clinical dataset of 3114 polysomnograms from a tertiary center.

Methods: Agreement between U-Sleep and gold-standard sleep staging of 30-second epochs was determined across both datasets. In the concordance dataset, the hypothesis of equivalence between human scorers and U-Sleep was tested using a Wilcoxon two one-sided tests (TOST) procedure. Multivariable regression and generalized additive modelling were applied to the clinical dataset to estimate the effects of age, comorbidities and polysomnographic findings on U-Sleep performance.

Results: In the concordance dataset, the median (interquartile range) Cohen's kappa agreement with gold scoring for 5-stage sleep staging was similar for U-Sleep and for individual trained humans, kappa = 0.79 (0.19) vs 0.78 (0.13) respectively, and satisfied statistical equivalence (TOST p < 0.01). In the clinical dataset, the median (interquartile range) agreement between U-Sleep 2.0 and clinical sleep staging was kappa = 0.69 (0.22). Modelling indicated lower performance for children <2 years, for those with medical comorbidities possibly altering sleep electroencephalography (kappa reduction = 0.07-0.15), and for those with decreased sleep efficiency or sleep-disordered breathing (kappa reduction = 0.1).

Conclusion: While U-Sleep algorithms showed statistically equivalent performance to trained scorers, accuracy was lower in children <2 years and in those with sleep-disordered breathing or comorbidities affecting electroencephalography. U-Sleep is suitable for pediatric clinical use provided automated staging is followed by expert clinician review.
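For readers unfamiliar with the agreement metric reported above, the following is a minimal sketch of Cohen's kappa computed over per-epoch stage labels. It is not the authors' code: the function name `cohens_kappa`, the stage label strings, and the ten-epoch example hypnograms are all hypothetical and chosen only for illustration. Kappa is the observed agreement corrected for the agreement expected by chance from each scorer's stage frequencies.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length sequences of epoch labels."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    if n == 0:
        raise ValueError("label sequences must not be empty")

    # Observed agreement: fraction of epochs given the same stage by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over stages of the product of each scorer's
    # marginal proportion for that stage.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[s] * count_b[s] for s in count_a) / (n * n)

    if p_e == 1.0:
        # Both scorers used one identical stage throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical 5-stage hypnograms (one label per 30-second epoch).
gold = ["W", "W", "N1", "N2", "N2", "N3", "N3", "REM", "REM", "N2"]
auto = ["W", "N1", "N1", "N2", "N2", "N3", "N2", "REM", "REM", "N2"]

print(f"{cohens_kappa(gold, auto):.2f}")  # prints 0.74
```

In this toy example the two hypnograms agree on 8 of 10 epochs (observed agreement 0.80) and the chance agreement is 0.22, giving kappa = 0.58 / 0.78, or about 0.74. A study of this kind would compute one kappa per recording and then summarize the distribution, for instance by its median and interquartile range.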