Ensemble learning for score likelihood ratios under the common source problem

Federico Veneri, Danica M. Ommen
{"title":"Ensemble learning for score likelihood ratios under the common source problem","authors":"Federico Veneri, Danica M. Ommen","doi":"10.1002/sam.11637","DOIUrl":null,"url":null,"abstract":"Machine learning‐based score likelihood ratios (SLRs) have emerged as alternatives to traditional likelihood ratios and Bayes factors to quantify the value of evidence when contrasting two opposing propositions. When developing a conventional statistical model is infeasible, machine learning can be used to construct a (dis)similarity score for complex data and estimate the ratio of the conditional distributions of the scores. Under the common source problem, the opposing propositions address if two items come from the same source. To develop their SLRs, practitioners create datasets using pairwise comparisons from a background population sample. These comparisons result in a complex dependence structure that violates the independence assumption made by many popular methods. We propose a resampling step to remedy this lack of independence and an ensemble approach to enhance the performance of SLR systems. First, we introduce a source‐aware resampling plan to construct datasets where the independence assumption is met. Using these newly created sets, we train multiple base SLRs and aggregate their outputs into a final value of evidence. Our experimental results show that this ensemble SLR can outperform a traditional SLR approach in terms of the rate of misleading evidence and discriminatory power and present more consistent results.","PeriodicalId":342679,"journal":{"name":"Statistical Analysis and Data Mining: The ASA Data Science Journal","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Analysis and Data Mining: The ASA Data Science Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/sam.11637","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Machine learning‐based score likelihood ratios (SLRs) have emerged as alternatives to traditional likelihood ratios and Bayes factors to quantify the value of evidence when contrasting two opposing propositions. When developing a conventional statistical model is infeasible, machine learning can be used to construct a (dis)similarity score for complex data and estimate the ratio of the conditional distributions of the scores. Under the common source problem, the opposing propositions address if two items come from the same source. To develop their SLRs, practitioners create datasets using pairwise comparisons from a background population sample. These comparisons result in a complex dependence structure that violates the independence assumption made by many popular methods. We propose a resampling step to remedy this lack of independence and an ensemble approach to enhance the performance of SLR systems. First, we introduce a source‐aware resampling plan to construct datasets where the independence assumption is met. Using these newly created sets, we train multiple base SLRs and aggregate their outputs into a final value of evidence. Our experimental results show that this ensemble SLR can outperform a traditional SLR approach in terms of the rate of misleading evidence and discriminatory power and present more consistent results.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
共源问题下分数似然比的集成学习
基于机器学习的分数似然比(slr)已经成为传统似然比和贝叶斯因子的替代品,用于在对比两个对立命题时量化证据的价值。当开发传统统计模型不可行的时候,机器学习可以用来为复杂数据构建一个(非)相似度分数,并估计分数的条件分布的比率。在共同来源问题下,如果两个项目来自同一来源,则对立命题解决。为了开发他们的单反,从业者从背景人口样本中使用两两比较创建数据集。这些比较导致了复杂的依赖结构,违背了许多流行方法所做的独立性假设。我们提出了一个重采样步骤来弥补这种独立性的缺乏,并提出了一个集成方法来提高单反系统的性能。首先,我们引入源感知重采样计划来构建满足独立性假设的数据集。使用这些新创建的集合,我们训练多个基本单反,并将它们的输出汇总为最终的证据值。我们的实验结果表明,这种集成单反方法在误导证据率和区分能力方面优于传统的单反方法,并且呈现出更一致的结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Neural interval‐censored survival regression with feature selection Bayesian batch optimization for molybdenum versus tungsten inertial confinement fusion double shell target design Gaussian process selections in semiparametric multi‐kernel machine regression for multi‐pathway analysis An automated alignment algorithm for identification of the source of footwear impressions with common class characteristics Confidence bounds for threshold similarity graph in random variable network
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1