Evaluation of four machine learning models for signal detection.

Therapeutic Advances in Drug Safety (IF 3.4; Zone 3, Medicine; JCR Q2, Pharmacology & Pharmacy) Pub Date: 2023-12-25 eCollection Date: 2023-01-01 DOI: 10.1177/20420986231219472
Daniel G Dauner, Eleazar Leal, Terrence J Adam, Rui Zhang, Joel F Farley
Citations: 0

Abstract


Background: Logistic regression-based signal detection algorithms have benefits over disproportionality analysis due to their ability to handle potential confounders and masking factors. Feature exploration and developing alternative machine learning algorithms can further strengthen signal detection.
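For context, disproportionality analysis compares how often an event is reported with a drug against how often it is reported with all other drugs. A minimal sketch follows, assuming the reporting odds ratio (ROR) as the measure and a lower 95% confidence bound above 1 as the signal criterion. The abstract does not say which disproportionality statistic or threshold the authors used, and the counts below are made up for illustration.

```python
import math


def reporting_odds_ratio(a, b, c, d):
    """Return (ROR, lower 95% bound, upper 95% bound) for a 2x2 report table.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports without the drug, with the event
    d: reports without the drug and without the event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - 1.96 * se)
    upper = math.exp(math.log(ror) + 1.96 * se)
    return ror, lower, upper


# Hypothetical counts, not taken from the study.
ror, lower, upper = reporting_odds_ratio(20, 80, 100, 800)
# Assumed signal rule: lower confidence bound above 1.
is_signal = lower > 1.0
```

A statistic like this looks at one drug-event pair in isolation, which is why the abstract contrasts it with regression models that can also take confounders and masking factors into account.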

Objectives: Our objective was to compare the signal detection performance of logistic regression, gradient-boosted trees, random forest and support vector machine models utilizing Food and Drug Administration adverse event reporting system data.

Design: Cross-sectional study.

Methods: The quarterly data extract files from 1 October 2017 through 31 December 2020 were downloaded. Due to an imbalanced outcome, two training sets were used: one stratified on the outcome variable and another using Synthetic Minority Oversampling Technique (SMOTE). A crude model and a model with tuned hyperparameters were developed for each algorithm. Model performance was compared against a reference set using accuracy, precision, F1 score, recall, the receiver operating characteristic area under the curve (ROCAUC), and the precision-recall curve area under the curve (PRCAUC).
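The oversampling step named above can be illustrated with a small sketch of the interpolation idea behind SMOTE: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is not the authors' pipeline; the function name `smote_sketch`, the parameters `k` and `seed`, and the data points are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority points by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_synthetic=6)
```

Because every synthetic point lies between two existing minority points, oversampling of this kind adds no information from outside the region the minority class already occupies.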

Results: Models trained on the balanced training set had higher accuracy, F1 score and recall compared to models trained on the SMOTE training set. When using the balanced training set, logistic regression, gradient-boosted trees, random forest and support vector machine models obtained similar performance evaluation metrics. The gradient-boosted trees hyperparameter tuned model had the highest ROCAUC (0.646) and the random forest crude model had the highest PRCAUC (0.839) when using the balanced training set.
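As a reminder of what the reported metrics measure, the sketch below computes accuracy, precision, recall, F1 score and ROCAUC in plain Python. The labels, scores and the 0.5 decision threshold are assumptions for illustration and are not taken from the study; PRCAUC is omitted to keep the example short.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = signal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC AUC as the share of positive-negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical reference labels and model scores.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.35, 0.6, 0.7, 0.2, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Under this pairwise definition, a ROCAUC of 0.5 corresponds to ranking positives and negatives no better than chance, which gives a baseline for reading the values reported in the abstract.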

Conclusion: All models trained on the balanced training set performed similarly. Logistic regression models had higher accuracy, precision and recall. Logistic regression, random forest and gradient-boosted trees hyperparameter tuned models had a PRCAUC ⩾ 0.8. All models had an ROCAUC ⩾ 0.5. Including both disproportionality analysis results and additional case report information in models resulted in higher performance evaluation metrics than disproportionality analysis alone.

Source journal
Therapeutic Advances in Drug Safety (Medicine: Pharmacology (medical))
CiteScore: 6.70
Self-citation rate: 4.50%
Articles published: 31
Review time: 9 weeks
Journal description: Therapeutic Advances in Drug Safety delivers the highest quality peer-reviewed articles, reviews, and scholarly comment on pioneering efforts and innovative studies pertaining to the safe use of drugs in patients. The journal has a strong clinical and pharmacological focus and is aimed at clinicians and researchers in drug safety, providing a forum in print and online for publishing the highest quality articles in this area. The editors welcome articles of current interest on research across all areas of drug safety, including therapeutic drug monitoring, pharmacoepidemiology, adverse drug reactions, drug interactions, pharmacokinetics, pharmacovigilance, medication/prescribing errors, risk management, ethics and regulation.