Eric A Whitsel, P Miguel Quibrera, Richard L Smith, Diane J Catellier, Duanping Liao, Amanda C Henley, Gerardo Heiss
Epidemiologic Perspectives & Innovations : EP+I, 3:8. Published 2006-07-20. DOI: 10.1186/1742-5573-3-8 (https://doi.org/10.1186/1742-5573-3-8). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1557664/pdf/
Accuracy of commercial geocoding: assessment and implications.
Background: Published studies of geocoding accuracy often focus on a single geographic area, address source or vendor, do not adjust accuracy measures for address characteristics, and do not examine effects of inaccuracy on exposure measures. We addressed these issues in a Women's Health Initiative ancillary study, the Environmental Epidemiology of Arrhythmogenesis in WHI.
Results: Addresses in 49 U.S. states (n = 3,615) with established coordinates were geocoded by four vendors (A-D). There were important differences among vendors in address match rate (98%; 82%; 81%; 30%), concordance between established and vendor-assigned census tracts (85%; 88%; 87%; 98%) and distance between established and vendor-assigned coordinates (mean rho [meters]: 1809; 748; 704; 228). Mean rho was lowest among street-matched, complete, zip-coded, unedited and urban addresses, and addresses with North American Datum of 1983 or World Geodetic System of 1984 coordinates. In mixed models restricted to vendors with minimally acceptable match rates (A-C) and adjusted for address characteristics, within-address correlation, and among-vendor heteroscedasticity of rho, differences in mean rho were small for street-type matches (280; 268; 275), i.e. likely to bias results relying on them about equally for most applications. In contrast, differences between centroid-type matches were substantial in some vendor contrasts, but not others (5497; 4303; 4210; p for interaction < 10^-4), i.e. more likely to bias results differently in many applications. The adjusted odds of an address match were higher for vendor A versus C (odds ratio [OR] = 66, 95% confidence interval [CI]: 47, 93), but not for B versus C (OR = 1.1, 95% CI: 0.9, 1.3). The adjusted odds of census tract concordance were no higher for vendor A versus C (OR = 1.0, 95% CI: 0.9, 1.2) or B versus C (OR = 1.1, 95% CI: 0.9, 1.3). Misclassification of a related exposure measure, distance to the nearest highway, increased with mean rho. In the absence of confounding, non-differential misclassification of this distance biased its hypothetical association with coronary heart disease mortality toward the null.
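The two headline measures above, match rate and mean rho, can be illustrated with a minimal sketch. The abstract does not say how rho was computed; the haversine great-circle formula on a spherical Earth is assumed here, and the function names and input layout are hypothetical.

```python
import math

# Assumption: spherical Earth with the mean radius in meters.
EARTH_RADIUS_M = 6_371_008.8


def positional_error_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def summarize_vendor(established, geocoded):
    """Return (match_rate, mean_rho_m) for one vendor.

    established: list of (lat, lon) reference coordinates.
    geocoded: list of (lat, lon) vendor coordinates, or None where the vendor found no match.
    """
    errors = [
        positional_error_m(e[0], e[1], g[0], g[1])
        for e, g in zip(established, geocoded)
        if g is not None
    ]
    match_rate = len(errors) / len(established)
    mean_rho = sum(errors) / len(errors) if errors else float("nan")
    return match_rate, mean_rho
```

Mean rho is averaged over matched addresses only, so a vendor with a low match rate can show a small mean rho, which is the trade-off visible in the vendor D figures (30% match rate, 228 m).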
Conclusion: Geocoding error depends on measures used to evaluate it, address characteristics and vendor. Vendor selection presents a trade-off between potential for missing data and error in estimating spatially defined attributes. Informed selection is needed to control the trade-off and adjust analyses for its effects.
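The bias toward the null reported for non-differential misclassification can be sketched with a 2x2 table. This is not the authors' analysis: it assumes a binary exposure (for example, living within some cut-off distance of a highway), and the counts, sensitivity, and specificity below are hypothetical illustration values.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_noncases, unexposed_noncases):
    """Odds ratio from the four cells of a 2x2 table."""
    return (exposed_cases * unexposed_noncases) / (unexposed_cases * exposed_noncases)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts after exposure misclassification."""
    obs_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    obs_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return obs_exposed, obs_unexposed


# Hypothetical true counts: (exposed, unexposed) among cases and non-cases.
true_cases = (200, 800)
true_noncases = (100, 900)

# Non-differential: the same sensitivity and specificity apply to cases and non-cases.
sensitivity, specificity = 0.8, 0.8
obs_cases = misclassify(*true_cases, sensitivity, specificity)
obs_noncases = misclassify(*true_noncases, sensitivity, specificity)

true_or = odds_ratio(*true_cases, *true_noncases)
observed_or = odds_ratio(*obs_cases, *obs_noncases)
print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

Because the error rates are identical in both outcome groups, the observed odds ratio lies between 1 and the true value. A vendor with larger positional error would correspond to lower sensitivity and specificity, and therefore stronger attenuation.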