Refining sleep staging accuracy: transfer learning coupled with scorability models.

Journal: Sleep | IF 5.6 | Zone 2 (Medicine) | JCR Q1 (Medicine) | Pub Date: 2024-11-08 | DOI: 10.1093/sleep/zsae202
Wolfgang Ganglberger, Samaneh Nasiri, Haoqi Sun, Soriul Kim, Chol Shin, M Brandon Westover, Robert J Thomas

Abstract

Study objectives: This study aimed to (1) improve sleep staging accuracy through transfer learning (TL), to achieve or exceed human inter-expert agreement and (2) introduce a scorability model to assess the quality and trustworthiness of automated sleep staging.

Methods: A deep neural network (base model) was trained on a large multi-site polysomnography (PSG) dataset from the United States. TL was used to calibrate the model to a reduced montage and limited samples from the Korean Genome and Epidemiology Study (KoGES) dataset. Model performance was compared to inter-expert reliability among three human experts. A scorability assessment was developed to predict the agreement between the model and human experts.
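
As an illustration of how agreement between two scorers can be quantified, the sketch below computes Cohen's kappa on epoch-by-epoch stage labels. It assumes a pairwise Cohen's kappa; the abstract reports κ values but does not state the exact procedure used for the three experts. The function name `cohen_kappa` and the toy labels are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of stage labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of epochs with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each scorer's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy labels invented for illustration: ten 30-second epochs.
expert = ["W", "W", "N1", "N2", "N2", "N2", "N3", "N3", "REM", "REM"]
model = ["W", "N1", "N1", "N2", "N2", "N3", "N3", "N3", "REM", "REM"]

kappa = cohen_kappa(expert, model)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.75"
```

Here 8 of 10 epochs match (observed agreement 0.80) and chance agreement is 0.20, so κ = (0.80 - 0.20) / (1 - 0.20) = 0.75.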

Results: Initial sleep staging by the base model showed lower agreement with experts (κ = 0.55) compared to the inter-expert agreement (κ = 0.62). Calibration with 324 randomly sampled training cases matched expert agreement levels. Further targeted sampling improved performance, with models exceeding inter-expert agreement (κ = 0.70). The scorability assessment, combining biosignal quality and model confidence features, predicted model-expert agreement moderately well (R² = 0.42). Recordings with higher scorability scores demonstrated greater model-expert agreement than inter-expert agreement. Even with lower scorability scores, model performance was comparable to inter-expert agreement.
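
To make the reported R² concrete, the sketch below computes the coefficient of determination between observed per-recording model-expert agreement and the agreement predicted by a scorability model. It illustrates only the metric: the abstract does not describe the regression model itself, and the function name `r_squared` and all numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after using the predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error when always predicting the mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Invented per-recording kappa values, for illustration only.
observed_kappa = [0.5, 0.6, 0.7, 0.8]
predicted_kappa = [0.55, 0.55, 0.75, 0.75]

r2 = r_squared(observed_kappa, predicted_kappa)
print(f"R^2 = {r2:.2f}")  # prints "R^2 = 0.80"
```

Read this way, the reported R² of 0.42 means the scorability features account for roughly 42% of the variance in model-expert agreement across recordings.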

Conclusions: Fine-tuning a pretrained neural network through targeted TL significantly enhances sleep staging performance for an atypical montage, achieving and surpassing human expert agreement levels. The introduction of a scorability assessment provides a robust measure of reliability, ensuring quality control and enhancing the practical application of the system before deployment. This approach marks an important advancement in automated sleep analysis, demonstrating the potential for AI to exceed human performance in clinical settings.
