Bridging AI and Clinical Practice: Integrating Automated Sleep Scoring Algorithm with Uncertainty-Guided Physician Review

Impact factor: 3.0; Zone 2 (Medicine); JCR Q2, Clinical Neurology; Nature and Science of Sleep; Pub Date: 2024-05-27; DOI: 10.2147/nss.s455649

Michal Bechny, Giuliana Monachino, Luigi Fiorillo, Julia van der Meer, Markus H Schmidt, Claudio LA Bassetti, Athina Tzovara, Francesca D Faraci


Abstract

Purpose: This study aims to enhance the clinical use of automated sleep-scoring algorithms by incorporating an uncertainty estimation approach to efficiently assist clinicians in the manual review of predicted hypnograms, a necessity due to the notable inter-scorer variability inherent in polysomnography (PSG) databases. Our efforts target the extent of review required to achieve predefined agreement levels, examining both in-domain (ID) and out-of-domain (OOD) data, and considering subjects’ diagnoses.
Patients and Methods: A total of 19,578 PSGs from 13 open-access databases were used to train U-Sleep, a state-of-the-art sleep-scoring algorithm. We leveraged a comprehensive clinical database of an additional 8832 PSGs, covering a full spectrum of ages (0–91 years) and sleep disorders, to refine U-Sleep and to evaluate different uncertainty-quantification approaches, including our novel confidence network. The ID data consisted of PSGs scored by over 50 physicians, and the two OOD sets comprised recordings each scored by a unique senior physician.
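As an illustration of what an uncertainty-quantification baseline can look like, the sketch below derives two common confidence scores from a classifier's per-epoch class probabilities: the maximum softmax probability and a normalized-entropy score. This is a minimal sketch under stated assumptions, not the authors' confidence network, whose architecture the abstract does not describe. The five-stage label set, the example logits, and all function names are hypothetical.

```python
import math

# Hypothetical five-stage label set, used here only for illustration.
STAGES = ["W", "N1", "N2", "N3", "REM"]


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_prob_confidence(probs):
    """Confidence = probability of the predicted (most likely) stage."""
    return max(probs)


def entropy_confidence(probs):
    """Confidence = 1 - normalized Shannon entropy.

    Ranges from 0 (uniform distribution) to 1 (all mass on one stage).
    """
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - h / math.log(len(probs))


# Made-up logits for one 30-second epoch where N1 and N2 are hard to separate.
example_logits = [0.2, 1.1, 1.3, -0.5, 0.0]
example_probs = softmax(example_logits)
predicted_stage = STAGES[example_probs.index(max(example_probs))]

print(predicted_stage)
print(round(max_prob_confidence(example_probs), 3))
print(round(entropy_confidence(example_probs), 3))
```

Either score can be used to rank epochs from least to most confident. A learned confidence model, such as the one the study proposes, would replace these hand-written functions.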
Results: U-Sleep demonstrated robust performance, with Cohen’s kappa (κ) at 76.2% on ID and 73.8–78.8% on OOD data. The confidence network excelled at identifying uncertain predictions, achieving AUROC scores of 85.7% on ID and 82.5–85.6% on OOD data. Independently of sleep-disorder status, statistical evaluations revealed significant differences in confidence scores between concordant and discordant predictions, and significant correlations of confidence scores with classification performance metrics. To achieve κ ≥ 90% with physician intervention, physicians needed to examine less than 29.0% of uncertain epochs, substantially reducing their workload and facilitating near-perfect agreement.
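The review procedure reported above can be sketched as follows: rank epochs by confidence, let the physician rescore the least confident fraction, and recompute Cohen's kappa until the target agreement is reached. This is a simplified simulation under stated assumptions, not the study's evaluation code. It assumes the physician's label always replaces the prediction on reviewed epochs, and the toy hypnogram, the confidence values, and all function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)       # chance agreement
    if p_e == 1.0:                                       # both sequences use a single label
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def review_least_confident(pred, reference, confidence, fraction):
    """Replace the least confident `fraction` of predictions with physician labels."""
    n_review = round(fraction * len(pred))
    order = sorted(range(len(pred)), key=lambda i: confidence[i])
    corrected = list(pred)
    for i in order[:n_review]:
        corrected[i] = reference[i]
    return corrected


def min_review_fraction(pred, reference, confidence, target_kappa, step=0.01):
    """Smallest reviewed fraction (on a grid of `step`) that reaches the target kappa."""
    n_steps = int(round(1.0 / step))
    for s in range(n_steps + 1):
        fraction = s * step
        corrected = review_least_confident(pred, reference, confidence, fraction)
        if cohen_kappa(corrected, reference) >= target_kappa:
            return fraction
    return 1.0


# Toy ten-epoch example (made-up values): three errors, all with low confidence.
reference = ["W", "W", "N1", "N2", "N2", "N3", "REM", "REM", "N2", "W"]
pred = ["W", "W", "N2", "N2", "N2", "N2", "REM", "N1", "N2", "W"]
confidence = [0.95, 0.90, 0.40, 0.85, 0.80, 0.50, 0.90, 0.45, 0.70, 0.99]

print(cohen_kappa(pred, reference))
print(min_review_fraction(pred, reference, confidence, target_kappa=0.90, step=0.1))
```

The better the confidence score separates wrong from correct predictions, the smaller the fraction of epochs that must be reviewed to reach a given kappa. This is why the AUROC of the confidence estimate and the review workload are linked.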
Conclusion: Inter-scorer variability limits the accuracy of the scoring algorithms to ~80%. By integrating an uncertainty estimation with U-Sleep, we enhance the review of predicted hypnograms so that they align with the scoring preferences of the responsible physician. Validated across ID and OOD data and various sleep disorders, our approach offers a strategy to boost the usability of automated scoring tools in clinical settings.

Keywords: automated sleep scoring, uncertainty quantification, explainable AI, polysomnography, sleep medicine

Source journal

Nature and Science of Sleep (Neuroscience, Behavioral Neuroscience)

CiteScore: 5.70
Self-citation rate: 5.90%
Articles published: 245
Review time: 16 weeks

Journal description: Nature and Science of Sleep is an international, peer-reviewed, open access journal covering all aspects of sleep science and sleep medicine, including the neurophysiology and functions of sleep, the genetics of sleep, sleep and society, biological rhythms, dreaming, sleep disorders and therapy, and strategies to optimize healthy sleep. Specific topics covered in the journal include: the functions of sleep in humans and other animals; physiological and neurophysiological changes with sleep; the genetics of sleep and sleep differences; the neurotransmitters, receptors and pathways involved in controlling both sleep and wakefulness; behavioral and pharmacological interventions aimed at improving sleep, and improving wakefulness; sleep changes with development and with age; sleep and reproduction (e.g., changes across the menstrual cycle, with pregnancy and menopause); the science and nature of dreams; sleep disorders; impact of sleep and sleep disorders on health, daytime function and quality of life; sleep problems secondary to clinical disorders; interaction of society with sleep (e.g., consequences of shift work, occupational health, public health); the microbiome and sleep; chronotherapy; impact of circadian rhythms on sleep, physiology, cognition and health; mechanisms controlling circadian rhythms, centrally and peripherally; impact of circadian rhythm disruptions (including night shift work, jet lag and social jet lag) on sleep, physiology, cognition and health; behavioral and pharmacological interventions aimed at reducing adverse effects of circadian-related sleep disruption; assessment of technologies and biomarkers for measuring sleep and/or circadian rhythms; epigenetic markers of sleep or circadian disruption.