Leveraging Unlabeled Clinical Data to Boost Performance of Risk Stratification Models for Suspected Acute Coronary Syndrome

Yutong Wu, David Conlan, Siegfried Perez, Anthony Nguyen

AMIA Annual Symposium Proceedings, vol. 2023, pp. 744-753. Published 2024-01-11.
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785873/pdf/
Abstract
The performance of deep learning models in the health domain is severely limited by the scarcity of labeled data, especially for specialized clinical tasks. Conversely, vast amounts of unlabeled clinical data are available and waiting to be exploited to improve deep learning models whose labeled training data are limited. This paper investigates the use of task-specific unlabeled data to boost the performance of classification models for the risk stratification of suspected acute coronary syndrome. By leveraging large numbers of unlabeled clinical notes during task-adaptive language model pretraining, valuable task-specific prior knowledge can be acquired. Starting from such pretrained models, task-specific fine-tuning with limited labeled data yields better performance. Extensive experiments demonstrate that language models pretrained on task-specific unlabeled data can significantly improve the performance of downstream models on specific classification tasks.
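Task-adaptive pretraining of this kind typically continues a BERT-style masked-language-model (MLM) objective on the unlabeled clinical notes before fine-tuning on the labeled classification task. As a minimal sketch of that objective, the function below applies the conventional 15% masking rate with the 80/10/10 mask/random/keep split; the masking scheme, token ids, and vocabulary size are illustrative assumptions, not details taken from the paper.

```python
# Sketch of the MLM masking step used in task-adaptive pretraining.
# Assumptions (not from the paper): BERT-style 15% masking with the
# standard 80/10/10 split, and hypothetical [MASK] id / vocab size.
import random

MASK_ID = 103       # hypothetical [MASK] token id
VOCAB_SIZE = 30522  # hypothetical vocabulary size


def mask_tokens(token_ids, mask_prob=0.15, rng=None):
    """Return (inputs, labels) for the MLM objective.

    labels hold -100 (ignored by the loss) at unmasked positions and
    the original token id at positions the model must predict.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            labels.append(tok)              # predict the original token
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK_ID)      # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.randrange(VOCAB_SIZE))  # 10%: random token
            else:
                inputs.append(tok)          # 10%: keep unchanged
        else:
            inputs.append(tok)
            labels.append(-100)             # not scored by the loss
    return inputs, labels
```

After this pretraining stage, the same encoder weights would be loaded with a classification head and fine-tuned on the limited labeled risk-stratification data.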