Empirical bayes model to combine signals of adverse drug reactions

R. Harpaz, W. DuMouchel, P. LePendu, N. Shah
{"title":"Empirical bayes model to combine signals of adverse drug reactions","authors":"R. Harpaz, W. DuMouchel, P. LePendu, N. Shah","doi":"10.1145/2487575.2488214","DOIUrl":null,"url":null,"abstract":"Data mining is a crucial tool for identifying risk signals of potential adverse drug reactions (ADRs). However, mining of ADR signals is currently limited to leveraging a single data source at a time. It is widely believed that combining ADR evidence from multiple data sources will result in a more accurate risk identification system. We present a methodology based on empirical Bayes modeling to combine ADR signals mined from ~5 million adverse event reports collected by the FDA, and healthcare data corresponding to 46 million patients' the main two types of information sources currently employed for signal detection. Based on four sets of test cases (gold standard), we demonstrate that our method leads to a statistically significant and substantial improvement in signal detection accuracy, averaging 40% over the use of each source independently, and an area under the ROC curve of 0.87. We also compare the method with alternative supervised learning approaches, and argue that our approach is preferable as it does not require labeled (training) samples whose availability is currently limited. To our knowledge, this is the first effort to combine signals from these two complementary data sources, and to demonstrate the benefits of a computationally integrative strategy for drug safety surveillance.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"96 3","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2487575.2488214","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 26

Abstract

Data mining is a crucial tool for identifying risk signals of potential adverse drug reactions (ADRs). However, mining of ADR signals is currently limited to leveraging a single data source at a time. It is widely believed that combining ADR evidence from multiple data sources will result in a more accurate risk identification system. We present a methodology based on empirical Bayes modeling to combine ADR signals mined from ~5 million adverse event reports collected by the FDA, and healthcare data corresponding to 46 million patients' the main two types of information sources currently employed for signal detection. Based on four sets of test cases (gold standard), we demonstrate that our method leads to a statistically significant and substantial improvement in signal detection accuracy, averaging 40% over the use of each source independently, and an area under the ROC curve of 0.87. We also compare the method with alternative supervised learning approaches, and argue that our approach is preferable as it does not require labeled (training) samples whose availability is currently limited. To our knowledge, this is the first effort to combine signals from these two complementary data sources, and to demonstrate the benefits of a computationally integrative strategy for drug safety surveillance.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
结合药物不良反应信号的经验贝叶斯模型
数据挖掘是识别潜在药物不良反应(adr)风险信号的关键工具。然而,ADR信号的挖掘目前仅限于一次利用单个数据源。人们普遍认为,将来自多个数据源的ADR证据结合起来,将会形成一个更准确的风险识别系统。我们提出了一种基于经验贝叶斯模型的方法,将从FDA收集的约500万份不良事件报告中挖掘的ADR信号与4600万患者的医疗数据相结合,这是目前用于信号检测的两种主要信息来源。基于四组测试用例(金标准),我们证明了我们的方法在信号检测精度方面具有统计显着性和实质性的提高,在独立使用每个源时平均为40%,ROC曲线下面积为0.87。我们还将该方法与其他监督学习方法进行了比较,并认为我们的方法更可取,因为它不需要目前可用性有限的标记(训练)样本。据我们所知,这是第一次将这两个互补数据源的信号结合起来,并展示了药物安全监测计算综合策略的好处。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A general bootstrap performance diagnostic Flexible and robust co-regularized multi-domain graph clustering Beyond myopic inference in big data pipelines Constrained stochastic gradient descent for large-scale least squares problem Inferring distant-time location in low-sampling-rate trajectories
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1