Score test for assessing the conditional dependence in latent class models and its application to record linkage

Journal of the Royal Statistical Society Series C-Applied Statistics, vol. 71, no. 5, pp. 1663-1687. Pub Date: 2022-09-18. DOI: 10.1111/rssc.12590. Impact factor: 1.0; Zone 4 (Mathematics); JCR Q3 (Statistics & Probability).
Huiping Xu, Xiaochun Li, Zuoyi Zhang, Shaun Grannis

Abstract


The Fellegi–Sunter model has been widely used in probabilistic record linkage despite its often invalid conditional independence assumption. Prior research has demonstrated that conditional dependence latent class models yield improved match performance when using the correct conditional dependence structure. With a misspecified conditional dependence structure, these models can yield worse performance. It is, therefore, critically important to correctly identify the conditional dependence structure. Existing methods for identifying the conditional dependence structure include the correlation residual plot, the log-odds ratio check, and the bivariate residual, all of which have been shown to perform inadequately. Bootstrap bivariate residual approach and score test have also been proposed and found to have better performance, with the score test having greater power and lower computational burden. In this paper, we extend the score-test-based approach to account for different conditional dependence structures. Through a simulation study, we develop practical recommendations on the utilisation of the score test and assess the match performance with conditional dependence identified by the proposed method. Performance of the proposed method is further evaluated using a real-world record linkage example. Findings show that the proposed method leads to improved matching accuracy relative to the Fellegi–Sunter model.
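
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the Fellegi–Sunter model viewed as a two-class latent class model under conditional independence. It is not the authors' score test or code. The function names, the field-agreement probabilities `m` and `u`, and the match prevalence `pi` are illustrative assumptions, not values from the paper.

```python
import math

def match_weight(gamma, m, u):
    """Fellegi-Sunter log-likelihood-ratio weight for one record pair.

    gamma: 0/1 agreement indicators, one per linkage field.
    m[k]:  P(field k agrees | pair is a true match).
    u[k]:  P(field k agrees | pair is a non-match).
    Conditional independence lets the weight be a sum over fields.
    """
    total = 0.0
    for g, mk, uk in zip(gamma, m, u):
        total += math.log(mk / uk) if g else math.log((1 - mk) / (1 - uk))
    return total

def posterior_match_prob(gamma, m, u, pi):
    """Posterior probability of a match, where pi is the prior match prevalence."""
    lik_match = pi
    lik_nonmatch = 1 - pi
    for g, mk, uk in zip(gamma, m, u):
        # Under conditional independence each class likelihood is a product.
        lik_match *= mk if g else 1 - mk
        lik_nonmatch *= uk if g else 1 - uk
    return lik_match / (lik_match + lik_nonmatch)

# Hypothetical parameters for two linkage fields (assumed, not from the paper).
M_PROBS = [0.9, 0.8]
U_PROBS = [0.1, 0.2]
PREVALENCE = 0.5

if __name__ == "__main__":
    for pattern in ([1, 1], [1, 0], [0, 0]):
        w = match_weight(pattern, M_PROBS, U_PROBS)
        p = posterior_match_prob(pattern, M_PROBS, U_PROBS, PREVALENCE)
        print(pattern, round(w, 4), round(p, 4))
```

When fields are in fact dependent within a class (for example, two fields that tend to be wrong together), the product form above misstates the class likelihoods, which is the situation the conditional dependence models and the score test in the abstract are meant to address.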

Source journal
CiteScore: 2.50
Self-citation rate: 0.00%
Articles published: 76
Review time: >12 weeks
About the journal: The Journal of the Royal Statistical Society, Series C (Applied Statistics) is a journal of international repute for statisticians both inside and outside the academic world. The journal is concerned with papers which deal with novel solutions to real life statistical problems by adapting or developing methodology, or by demonstrating the proper application of new or existing statistical methods to them. At their heart therefore the papers in the journal are motivated by examples and statistical data of all kinds. The subject-matter covers the whole range of inter-disciplinary fields, e.g. applications in agriculture, genetics, industry, medicine and the physical sciences, and papers on design issues (e.g. in relation to experiments, surveys or observational studies). A deep understanding of statistical methodology is not necessary to appreciate the content. Although papers describing developments in statistical computing driven by practical examples are within its scope, the journal is not concerned with simply numerical illustrations or simulation studies. The emphasis of Series C is on case-studies of statistical analyses in practice.