Refining sleep staging accuracy: transfer learning coupled with scorability models.

IF 5.6, Zone 2 Medicine, Q1 Medicine, Sleep. Pub Date: 2024-11-08. DOI: 10.1093/sleep/zsae202

Wolfgang Ganglberger, Samaneh Nasiri, Haoqi Sun, Soriul Kim, Chol Shin, M Brandon Westover, Robert J Thomas

Citations: 0

Abstract

Study objectives: This study aimed to (1) improve sleep staging accuracy through transfer learning (TL), to achieve or exceed human inter-expert agreement and (2) introduce a scorability model to assess the quality and trustworthiness of automated sleep staging.

Methods: A deep neural network (base model) was trained on a large multi-site polysomnography (PSG) dataset from the United States. TL was used to calibrate the model to a reduced montage and limited samples from the Korean Genome and Epidemiology Study (KoGES) dataset. Model performance was compared to inter-expert reliability among three human experts. A scorability assessment was developed to predict the agreement between the model and human experts.
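
The agreement comparison described in the Methods can be illustrated with a small calculation. The following is a minimal sketch, assuming that κ in this abstract denotes Cohen's kappa and that agreement among the three experts is summarized as the mean of the pairwise kappas; the abstract does not state either point. The function names and the toy hypnograms are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length sequences of stage labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Fraction of epochs on which the two raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(raters):
    """Average Cohen's kappa over all pairs of raters (an assumed summary)."""
    pairs = list(combinations(raters, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Invented per-epoch labels for illustration only (W, N1, N2, N3, R).
expert_1 = ["W", "N1", "N2", "N2", "N3", "R", "R", "W"]
expert_2 = ["W", "N2", "N2", "N2", "N3", "R", "N1", "W"]
expert_3 = ["W", "N1", "N2", "N3", "N3", "R", "R", "W"]
model = ["W", "N1", "N2", "N2", "N3", "R", "R", "N1"]

experts = [expert_1, expert_2, expert_3]
inter_expert = mean_pairwise_kappa(experts)
model_expert = sum(cohen_kappa(model, e) for e in experts) / len(experts)
print(f"inter-expert kappa: {inter_expert:.2f}")
print(f"model-expert kappa: {model_expert:.2f}")
```

Comparing `model_expert` with `inter_expert` mirrors the comparison in the abstract: a model is said to match expert-level performance when its agreement with the experts is at least as high as the experts' agreement with each other.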

Results: Initial sleep staging by the base model showed lower agreement with experts (κ = 0.55) compared to the inter-expert agreement (κ = 0.62). Calibration with 324 randomly sampled training cases matched expert agreement levels. Further targeted sampling improved performance, with models exceeding inter-expert agreement (κ = 0.70). The scorability assessment, combining biosignal quality and model confidence features, predicted model-expert agreement moderately well (R² = 0.42). Recordings with higher scorability scores demonstrated greater model-expert agreement than inter-expert agreement. Even with lower scorability scores, model performance was comparable to inter-expert agreement.
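
The R² reported for the scorability assessment measures how much of the variation in per-recording model-expert agreement the assessment explains. The abstract does not describe the form of the scorability model, so the following is only a sketch under stated assumptions: a single hypothetical feature (mean model confidence per recording), an ordinary least-squares line, and invented toy values that are not study data.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least-squares fit of y ≈ slope * x + intercept."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y_true)
    if ss_tot == 0:
        raise ValueError("y_true must not be constant")
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


# Invented values for illustration only: one entry per recording.
confidence = [0.62, 0.70, 0.75, 0.81, 0.88, 0.93]  # hypothetical feature
kappa = [0.48, 0.61, 0.58, 0.70, 0.74, 0.79]       # hypothetical agreement

slope, intercept = fit_line(confidence, kappa)
predicted = [slope * c + intercept for c in confidence]
print(f"R^2 on toy data: {r_squared(kappa, predicted):.2f}")
```

An R² of 1 would mean the predicted agreement matches the observed agreement exactly, and an R² of 0 would mean the prediction does no better than always guessing the mean. The study's assessment combines biosignal quality and model confidence features, so a faithful version would use several features rather than the single one assumed here.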

Conclusions: Fine-tuning a pretrained neural network through targeted TL significantly enhances sleep staging performance for an atypical montage, achieving and surpassing human expert agreement levels. The introduction of a scorability assessment provides a robust measure of reliability, ensuring quality control and enhancing the practical application of the system before deployment. This approach marks an important advancement in automated sleep analysis, demonstrating the potential for AI to exceed human performance in clinical settings.

Source journal

Sleep Medicine-Neurology (clinical)

CiteScore: 8.70

Self-citation rate: 10.70%

Journal description: SLEEP® publishes findings from studies conducted at any level of analysis, including genes, molecules, cells, physiology, neural systems and circuits, behavior and cognition, and self-report. SLEEP® publishes articles that use a wide variety of scientific approaches and address a broad range of topics. These may include, but are not limited to: basic and neuroscience studies of sleep and circadian mechanisms; in vitro and animal models of sleep, circadian rhythms, and human disorders; pre-clinical human investigations, including the measurement and manipulation of sleep and circadian rhythms; studies in clinical or population samples, which may address factors influencing sleep and circadian rhythms (e.g., development and aging, and social and environmental influences) and relationships between sleep, circadian rhythms, health, and disease; and clinical trials, epidemiology studies, implementation, and dissemination research.