Large-scale regression-based pattern discovery: The example of screening the WHO global drug safety database

Statistical Analysis and Data Mining · Impact factor: 3.6 · CAS zone 4 (Mathematics) · JCR Q3 (Computer Science, Artificial Intelligence) · Pub Date: 2010-08-01 · DOI: 10.1002/SAM.V3:4

O. Caster, G. N. Norén, D. Madigan, A. Bate


Citations: 35

Abstract

Most measures of interestingness for patterns of co-occurring events are based on data projections onto contingency tables for the events of primary interest. As an alternative, this article presents the first implementation of shrinkage logistic regression for large-scale pattern discovery, with an evaluation of its usefulness in real-world binary transaction data. Regression accounts for the impact of other covariates that may confound or otherwise distort associations. The application considered is international adverse drug reaction (ADR) surveillance, in which large collections of reports on suspected ADRs are screened for interesting reporting patterns worthy of clinical follow-up. Our results show that regression-based pattern discovery does offer practical advantages. Specifically, it can eliminate false positives and false negatives due to other covariates. Furthermore, it identifies some established drug safety issues earlier than a measure based on contingency tables. While regression offers clear conceptual advantages, our results suggest that methods based on contingency tables will continue to play a key role in ADR surveillance, for two reasons: the failure of regression to identify some established drug safety concerns as early as the currently used measures, and the relative lack of transparency of the procedure used to estimate the regression coefficients. This suggests that shrinkage regression should be used in parallel with existing measures of interestingness in ADR surveillance and other large-scale pattern discovery applications. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 197-208, 2010
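The contrast the abstract draws between contingency-table measures and regression can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the report counts are hypothetical, the crude measure is a generic observed-to-expected ratio (the abstract does not name the measure used in practice), and the shrinkage is a simple L2 (ridge) penalty fitted by gradient descent, which may differ from the authors' shrinkage procedure. In the made-up data, drug A is mostly co-reported with drug B, and only drug B raises the ADR reporting rate.

```python
import math

# Hypothetical report counts: (drug_a, drug_b, n_reports, n_with_adr).
# Drug A is usually co-reported with drug B; only drug B raises the ADR rate.
GROUPS = [
    (1, 1, 400, 80),
    (1, 0, 100, 2),
    (0, 1, 100, 20),
    (0, 0, 1000, 20),
]


def observed_to_expected(groups, drug_index):
    """Crude contingency-table ratio for one drug, ignoring all other drugs."""
    total = sum(g[2] for g in groups)
    total_adr = sum(g[3] for g in groups)
    drug_reports = sum(g[2] for g in groups if g[drug_index] == 1)
    drug_adr = sum(g[3] for g in groups if g[drug_index] == 1)
    expected = drug_reports * total_adr / total
    return drug_adr / expected


def fit_ridge_logistic(groups, penalty=1.0, lr=1.0, steps=20000):
    """Gradient descent on the L2-penalized negative log-likelihood.

    Returns [intercept, coef_a, coef_b]; the intercept is not penalized.
    """
    total = sum(g[2] for g in groups)
    beta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for a, b, n, y in groups:
            x = (1.0, float(a), float(b))
            eta = sum(bj * xj for bj, xj in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for j in range(3):
                # Gradient contribution of a group of n reports with y ADRs.
                grad[j] += (n * p - y) * x[j]
        for j in (1, 2):
            # Shrink the drug coefficients toward zero.
            grad[j] += penalty * beta[j]
        beta = [bj - lr * gj / total for bj, gj in zip(beta, grad)]
    return beta


crude_a = observed_to_expected(GROUPS, 0)
intercept, coef_a, coef_b = fit_ridge_logistic(GROUPS)
print(f"crude O/E for drug A: {crude_a:.2f}")
print(f"adjusted odds ratio, drug A: {math.exp(coef_a):.2f}")
print(f"adjusted odds ratio, drug B: {math.exp(coef_b):.2f}")
```

In this toy data the crude ratio flags drug A because it travels with drug B, while the regression, which sees both drugs at once, assigns most of the association to drug B. This is the kind of false positive due to another covariate that the abstract says regression can eliminate.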




Source journal

Statistical Analysis and Data Mining (Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications)

CiteScore: 3.20

Self-citation rate: 7.70%

Journal description: Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques and discuss their application to real problems in such a way that they are accessible and beneficial to domain experts across science, engineering, and commerce. The focus of the journal is on papers that satisfy one or more of the following criteria: solve data analysis problems associated with massive, complex datasets; develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines, e.g., statistics, computer science, electrical engineering, and operations research; formulate and solve high-impact real-world problems that challenge existing paradigms via new statistical and/or computational models; provide surveys of prominent research topics.