Efficient Estimation under Data Fusion

IF 2.4 · Zone 2 (Mathematics) · Q2 Biology · Biometrika · Pub Date: 2023-12-01 · Epub Date: 2023-02-06 · DOI: 10.1093/biomet/asad007
Sijia Li, Alex Luedtke
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10653189/pdf/
Citations: 7

Abstract


We aim to make inferences about a smooth, finite-dimensional parameter by fusing data from multiple sources together. Previous works have studied the estimation of a variety of parameters in similar data fusion settings, including in the estimation of the average treatment effect and average reward under a policy, with the majority of them merging one historical data source with covariates, actions, and rewards and one data source of the same covariates. In this work, we consider the general case where one or more data sources align with each part of the distribution of the target population, for example, the conditional distribution of the reward given actions and covariates. We describe potential gains in efficiency that can arise from fusing these data sources together in a single analysis, which we characterize by a reduction in the semiparametric efficiency bound. We also provide a general means to construct estimators that achieve these bounds. In numerical simulations, we illustrate marked improvements in efficiency from using our proposed estimators rather than their natural alternatives. Finally, we illustrate the magnitude of efficiency gains that can be realized in vaccine immunogenicity studies by fusing data from two HIV vaccine trials.
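
The efficiency gain described above can be illustrated with a toy simulation. The sketch below is not the authors' estimator; it assumes a hypothetical setting in which a small target sample and a larger auxiliary sample share the same conditional distribution of an outcome `y` given a covariate `x`, while the covariate distribution differs between the two. The linear outcome model, the sample sizes, and the function name `simulate` are all illustrative assumptions. It compares the target-only sample mean with a plug-in estimator whose regression is fitted on the pooled data.

```python
import numpy as np


def simulate(n_target=100, n_aux=1000, n_reps=2000, seed=0):
    """Monte Carlo comparison of two estimators of the target-population mean of y.

    Assumed (hypothetical) data-generating process: y = 1 + x + noise in both
    sources, so the conditional distribution of y given x is shared, while x
    is N(0, 1) in the target source and N(1, 1) in the auxiliary source.
    The true target mean of y is therefore 1.
    """
    rng = np.random.default_rng(seed)
    target_only = np.empty(n_reps)
    fused = np.empty(n_reps)
    for r in range(n_reps):
        # Small sample from the target population.
        x_t = rng.normal(0.0, 1.0, n_target)
        y_t = 1.0 + x_t + rng.normal(0.0, 2.0, n_target)
        # Larger auxiliary sample: same y | x, shifted covariate distribution.
        x_a = rng.normal(1.0, 1.0, n_aux)
        y_a = 1.0 + x_a + rng.normal(0.0, 2.0, n_aux)

        # Target-only estimator: the sample mean of y. This equals the plug-in
        # estimator from a least-squares line (with intercept) fitted on the
        # target sample alone, so both estimators use the same working model.
        target_only[r] = y_t.mean()

        # Fused estimator: fit the regression on the pooled data, then average
        # its predictions over the target covariates only.
        slope, intercept = np.polyfit(
            np.concatenate([x_t, x_a]), np.concatenate([y_t, y_a]), 1
        )
        fused[r] = (intercept + slope * x_t).mean()
    return target_only, fused


if __name__ == "__main__":
    target_only, fused = simulate()
    print("variance, target-only:", target_only.var())
    print("variance, fused:      ", fused.var())
```

In this toy setting both estimators are centred on the true mean, and the fused one has the smaller Monte Carlo variance because the regression is learned from more observations. The paper's framework is more general: it characterizes such gains through the semiparametric efficiency bound rather than through a parametric working model.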

Source journal: Biometrika (Biology)
CiteScore: 5.50 · Self-citation rate: 3.70% · Articles published: 56 · Review time: 6-12 weeks
About the journal: Biometrika is primarily a journal of statistics in which emphasis is placed on papers containing original theoretical contributions of direct or potential value in applications. From time to time, papers in bordering fields are also published.