OptimCLM: Optimizing clinical language models for predicting patient outcomes via knowledge distillation, pruning and quantization

IF 3.7; Zone 2, Medicine; JCR Q2, Computer Science, Information Systems; International Journal of Medical Informatics, Volume 195, Article 105764; Pub Date: 2024-12-18; DOI: 10.1016/j.ijmedinf.2024.105764

Mohammad Junayed Hasan, Fuad Rahman, Nabeel Mohammed

Abstract

Background

Clinical Language Models (CLMs) possess the potential to reform traditional healthcare systems by aiding in clinical decision making and optimal resource utilization. They can enhance patient outcomes and help healthcare management through predictive clinical tasks. However, their real-world deployment is limited due to high computational cost at inference, in terms of both time and space complexity.

Objective

This study aims to develop and optimize an efficient framework that compresses CLMs without significant performance loss, reducing inference time and disk space and enabling real-world clinical applications.

Methods

We introduce OptimCLM, a framework for optimizing CLMs with ensemble learning, knowledge distillation (KD), pruning and quantization. Based on domain knowledge and performance, we select and combine the domain-adaptive CLMs DischargeBERT and COReBERT as the teacher ensemble model. We transfer the teacher's knowledge to two smaller generalist models, BERT-PKD and TinyBERT, and apply black-box KD, post-training unstructured pruning and post-training 8-bit model quantization to them. In an admission-to-discharge setting, we evaluate the framework on four clinical outcome prediction tasks (length of stay prediction, mortality prediction, diagnosis prediction and procedure prediction) using admission notes from the MIMIC-III clinical database.
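The three compression steps named above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: averaging the teachers' softened probabilities, the temperature and `alpha` values, the single-label softmax formulation, and the symmetric per-tensor quantization scheme are all assumptions chosen for illustration, and every function name here is hypothetical. "Black-box" is taken to mean that the student sees only the teachers' output logits, not their internal layers.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_targets(teacher_logits, temperature=2.0):
    """Average the softened output distributions of several teachers (assumed scheme)."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n_classes = len(probs[0])
    return [sum(p[i] for p in probs) / len(probs) for i in range(n_classes)]

def distillation_loss(student_logits, soft_targets, label, temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy on the hard label."""
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(soft_targets, student_soft) if t > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero the given fraction of smallest-magnitude weights."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned

def quantize_int8(weights):
    """Symmetric 8-bit quantization: integer codes in [-127, 127] plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]
```

In this sketch, pruning and quantization operate on an already trained weight list, which mirrors the "post-training" wording of the abstract. A multi-label task such as diagnosis prediction would more plausibly use per-label sigmoid outputs rather than a softmax.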

Results

The OptimCLM framework achieved up to 22.88× compression ratio and 28.7× inference speedup, with less than 5% and 2% loss in macro-averaged AUROC for TinyBERT and BERT-PKD, respectively. The teacher model outperformed five state-of-the-art models on all tasks. The optimized BERT-PKD model also outperformed them in most tasks.
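For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute macro-averaged AUROC (the unweighted mean of per-class AUROC) and a size-based compression ratio. The function names and any sizes passed in are hypothetical, and the abstract does not state whether the reported AUROC loss is absolute or relative, so no such calculation is included.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auroc(label_matrix, score_matrix):
    """Unweighted mean of per-class AUROC; rows are examples, columns are classes."""
    n_classes = len(label_matrix[0])
    per_class = [
        auroc([row[c] for row in label_matrix], [row[c] for row in score_matrix])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes

def compression_ratio(original_size, compressed_size):
    """Ratio of original to compressed model size, in the same unit (e.g. MB)."""
    return original_size / compressed_size
```

The pairwise formulation is quadratic in the number of examples and is meant only to make the definition concrete; a rank-based implementation would be preferable on a dataset the size of MIMIC-III.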

Conclusion

Our findings suggest that domain-specific fine-tuning with ensemble learning and KD is more effective than domain-specific pre-training for domain-knowledge transfer and text classification tasks. Thus, this work demonstrates the feasibility and potential of deploying optimized CLMs in healthcare settings and developing them with fewer computational resources.

Source journal

International Journal of Medical Informatics (Medicine; Computer Science: Information Systems)

CiteScore: 8.90

Self-citation rate: 4.10%

Articles published: 217

Review time: 42 days

Journal description: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; Educational computer-based programs pertaining to medical informatics or medicine in general; Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.