A machine learning case study to predict rare clinical event of interest: imbalanced data, interpretability, and practical considerations.

IF 1.2 · CAS Zone 4 (Medicine) · JCR Q4 PHARMACOLOGY & PHARMACY · Journal of Biopharmaceutical Statistics · Pub Date: 2024-06-11 · DOI: 10.1080/10543406.2024.2364722
Sheng Zhong, Jane Zhang, Jenny Jiao, Hongjian Zhu, Yunzhao Xing, Li Wang
Citations: 0

Abstract

Accurate prediction of a rare and clinically important event following study treatment is crucial in drug development. For instance, the rarity of an adverse event is often commensurate with the seriousness of its medical consequences, and delayed detection of a rare adverse event can pose significant or even life-threatening health risks to patients. In this machine learning case study, we use an example originating from a real clinical trial setting to demonstrate how to define and solve the rare clinical event prediction problem with machine learning in the pharmaceutical industry. The unique contributions of this work include a six-step investigation framework that facilitates communication with non-technical stakeholders, and an interpretation of model performance in terms of its practical consequences for patient screening in a future clinical trial. In terms of machine learning methodology, to split the data into training and test sets we adapt the rare-event stratified split approach (from scikit-learn) so that the multiple records of a patient are also split as a group. To handle the imbalanced data caused by rare events in model training, we employ cost-sensitive learning to give more weight to the minority class, and we use precision together with recall, rather than the raw accuracy rate, to capture prediction performance. Finally, we demonstrate how to apply state-of-the-art SHAP values to identify important risk factors and improve model interpretability.
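
The splitting, weighting, and evaluation ideas named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the synthetic data (20 patients with 3 records each, 4 of whom have one event record), the 25% test fraction, and the "balanced" weighting formula n_samples / (n_classes * class_count) are all assumptions chosen for illustration. In practice, scikit-learn's StratifiedGroupKFold and the class_weight option of its estimators cover similar ground.

```python
import random
from collections import Counter, defaultdict


def stratified_group_split(groups, labels, test_frac=0.25, seed=0):
    """Split record indices so that no patient is in both sets, stratifying
    patients by whether any of their records contains the rare event."""
    by_patient = defaultdict(list)
    for idx, patient in enumerate(groups):
        by_patient[patient].append(idx)

    # Patient-level strata: at least one event record vs. none.
    event_patients, other_patients = [], []
    for patient, idxs in by_patient.items():
        has_event = any(labels[i] == 1 for i in idxs)
        (event_patients if has_event else other_patients).append(patient)

    rng = random.Random(seed)
    test_patients = set()
    for stratum in (event_patients, other_patients):
        stratum = sorted(stratum)
        rng.shuffle(stratum)
        # Keep at least one patient per stratum on each side when possible.
        n_test = max(1, round(len(stratum) * test_frac)) if len(stratum) > 1 else 0
        test_patients.update(stratum[:n_test])

    train_idx = [i for i, p in enumerate(groups) if p not in test_patients]
    test_idx = [i for i, p in enumerate(groups) if p in test_patients]
    return train_idx, test_idx


def balanced_class_weights(labels):
    """Cost-sensitive weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * k) for cls, k in counts.items()}


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (rare event) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic data (hypothetical): patients 0-3 have an event in their first record.
groups = [p for p in range(20) for _ in range(3)]
labels = [1 if (p < 4 and r == 0) else 0 for p in range(20) for r in range(3)]

train_idx, test_idx = stratified_group_split(groups, labels)
weights = balanced_class_weights([labels[i] for i in train_idx])

# A classifier that never predicts the event looks accurate but finds nothing.
y_test = [labels[i] for i in test_idx]
always_negative = [0] * len(y_test)
accuracy = sum(t == p for t, p in zip(y_test, always_negative)) / len(y_test)
precision, recall = precision_recall(y_test, always_negative)
print(f"weights={weights} accuracy={accuracy:.3f} precision={precision} recall={recall}")
```

On this toy data the always-negative classifier reaches high accuracy while its recall is zero, which is why precision and recall are the more informative metrics for rare events. The class weights show how much more a missed event record would cost in training than a misclassified non-event record.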

Source journal
Journal of Biopharmaceutical Statistics
Category: Medicine, Statistics & Probability
CiteScore: 2.50
Self-citation rate: 18.20%
Articles published: 71
Review time: 6-12 weeks
Journal description: The Journal of Biopharmaceutical Statistics, a rapid publication journal, discusses quality applications of statistics in biopharmaceutical research and development. Now publishing six times per year, it includes expositions of statistical methodology with immediate applicability to biopharmaceutical research in the form of full-length and short manuscripts, review articles, selected/invited conference papers, short articles, and letters to the editor. Addressing timely and provocative topics important to the biostatistical profession, the journal covers: drug, device, and biological research and development; drug screening and drug design; assessment of pharmacological activity; pharmaceutical formulation and scale-up; preclinical safety assessment; bioavailability, bioequivalence, and pharmacokinetics; Phase I, II, and III clinical development, including complex innovative designs; premarket approval assessment of clinical safety; postmarketing surveillance; and big data, artificial intelligence, and their applications.
Latest articles from this journal
Sequential monitoring of cancer immunotherapy trial with random delayed treatment effect.
Directed Acyclic Graph Assisted Method For Estimating Average Treatment Effect.
Bayesian phase II adaptive randomization by jointly modeling efficacy and toxicity as time-to-event outcomes.
Interval estimation of relative risks for combined unilateral and bilateral correlated data.
Sample size estimation for recurrent event data using multifrailty and multilevel survival models.