Using Synthetic Data to Replace Linkage Derived Elements: A Case Study
Authors: Dean M Resnick, Christine S Cox, Lisa B Mirel
Journal: Health Services and Outcomes Research Methodology (Impact Factor 1.6; JCR Q3, Health Care Sciences & Services)
Publication type: Journal Article
Publication date: 2021-02-03
DOI: https://doi.org/10.1007/s10742-021-00241-z
While record linkage can expand analyses performable from survey microdata, it also incurs greater risk of privacy-encroaching disclosure. One way to mitigate this risk is to replace some of the information added through linkage with synthetic data elements. This paper describes a case study using the National Hospital Care Survey (NHCS), which collects patient records under a pledge of protecting patient privacy from a sample of U.S. hospitals for statistical analysis purposes. The NHCS data were linked to the National Death Index (NDI) to enhance the survey with mortality information. The added information from NDI linkage enables survival analyses related to hospitalization, but as the death information includes dates of death and detailed causes of death, having it joined with the patient records increases the risk of patient re-identification (albeit only for deceased persons). For this reason, an approach was tested to develop synthetic data that uses models from survival analysis to replace vital status and actual dates-of-death with synthetic values and uses classification tree analysis to replace actual causes of death with synthesized causes of death. The degree to which analyses performed on the synthetic data replicate results from analysis on the actual data is measured by comparing survival analysis parameter estimates from both data files. Because synthetic data only have value to the degree that they can be used to produce statistical estimates that are like those based on the actual data, this evaluation is an essential first step in assessing the potential utility of synthetic mortality data.
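The abstract's synthesize-then-compare idea can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the constant-hazard (exponential) survival model, the record fields `days`, `died`, and `max_days`, the helper names, and the toy data, which are simulated rather than drawn from NHCS or NDI. The sketch fits a hazard to right-censored records, replaces vital status and time to death with draws from the fitted model, and then compares the hazard estimated from each file. The cause-of-death step, which the abstract says uses classification tree analysis, is not covered.

```python
import random


def fit_exponential_hazard(records):
    """MLE of a constant hazard from right-censored data: deaths / total follow-up."""
    deaths = sum(1 for r in records if r["died"])
    exposure = sum(r["days"] for r in records)
    return deaths / exposure


def synthesize(records, hazard, rng):
    """Replace vital status and days-to-death with draws from the fitted model.

    Each record keeps its own follow-up window ("max_days", a hypothetical field).
    A synthetic death time beyond that window is treated as censored (alive).
    """
    synthetic = []
    for r in records:
        t = rng.expovariate(hazard)  # synthetic time to death, in days
        died = t <= r["max_days"]
        synthetic.append({
            "max_days": r["max_days"],
            "died": died,
            "days": t if died else r["max_days"],
        })
    return synthetic


def make_toy_actual(n, true_hazard, max_days, rng):
    """Simulated stand-in for the linked file (assumption: not real survey data)."""
    records = []
    for _ in range(n):
        t = rng.expovariate(true_hazard)
        died = t <= max_days
        records.append({
            "max_days": max_days,
            "died": died,
            "days": t if died else max_days,
        })
    return records


rng = random.Random(0)
actual = make_toy_actual(n=5000, true_hazard=0.002, max_days=365, rng=rng)

# Fit on the "actual" file, then generate the synthetic replacement values.
hazard_actual = fit_exponential_hazard(actual)
synthetic = synthesize(actual, hazard_actual, rng)

# Utility check: re-estimate the same parameter on the synthetic file.
hazard_synth = fit_exponential_hazard(synthetic)
relative_gap = abs(hazard_synth - hazard_actual) / hazard_actual

print(f"hazard (actual):    {hazard_actual:.6f}")
print(f"hazard (synthetic): {hazard_synth:.6f}")
print(f"relative gap:       {relative_gap:.2%}")
```

A small relative gap suggests that an analyst fitting this model to the synthetic file would reach similar conclusions, which is the sense of utility the abstract describes. A fuller evaluation would presumably condition on patient covariates and compare several parameter estimates rather than a single pooled hazard.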
About the journal:
The journal reflects the multidisciplinary nature of the field of health services and outcomes research. It addresses the needs of multiple, interlocking communities, including methodologists in statistics, econometrics, social and behavioral sciences; designers and analysts of health policy and health services research projects; and health care providers and policy makers who need to properly understand and evaluate the results of published research. The journal strives to enhance the level of methodologic rigor in health services and outcomes research and contributes to the development of methodologic standards in the field. In pursuing its main objective, the journal also provides a meeting ground for researchers from a number of traditional disciplines and fosters the development of new quantitative, qualitative, and mixed methods by statisticians, econometricians, health services researchers, and methodologists in other fields. Health Services and Outcomes Research Methodology publishes: Research papers on quantitative, qualitative, and mixed methods; Case Studies describing applications of quantitative and qualitative methodology in health services and outcomes research; Review Articles synthesizing and popularizing methodologic developments; Tutorials; Articles on computational issues and software reviews; Book reviews; and Notices. Special issues will be devoted to papers presented at important workshops and conferences.