Ron Goeken, Lap Huynh, Thomas Lenius, Rebecca Vick
New Methods of Census Record Linking
Historical Methods, 44(1): 7-14 (2011). DOI: 10.1080/01615440.2010.517152
Citations: 62
Abstract
The Minnesota Population Center (MPC) has released linked datasets through its NAPP and IPUMS projects, making them readily accessible to researchers. Before complete-count census microdata became available from the MPC, researchers applied various forms of record-linking software. This essay describes the techniques used in the MPC's linking program and briefly compares them with those used by other researchers. The key feature of the MPC linking method is the construction of cumulative name similarity scores, based on approximately 2.5 billion record comparisons; we also use support vector machines to classify potential links. This article explains the modifications made for the final linked datasets and discusses the role of weighting variables when using linked data.
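The abstract does not say which string comparator or SVM configuration the MPC used, so the following is only a minimal sketch under stated assumptions of the general idea: score candidate record pairs on name similarity, then let a support vector machine decide which pairs count as links. It swaps in the Jaro-Winkler similarity, a comparator commonly used in historical record linkage, and trains a linear SVM by full-batch subgradient descent on the regularized hinge loss. It does not reproduce the MPC's cumulative name similarity scores. The feature set (first-name similarity, surname similarity, and a 0/1 flag for an age consistent with the ten-year census gap), the toy training data, and the helper names `train_linear_svm` and `decide` are all hypothetical.

```python
"""Illustrative sketch only: Jaro-Winkler name similarity plus a linear SVM
trained by subgradient descent. NOT the MPC's actual linking code; features,
training data, and helper names are hypothetical."""


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (1.0 = identical, 0.0 = no overlap)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters "match" if equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2.0
    m = float(matches)
    return (m / len1 + m / len2 + (m - t) / m) / 3.0


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 chars."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1.0 - j)


def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=2000):
    """Minimize lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b))) by full-batch
    subgradient descent. Labels must be +1 (link) or -1 (non-link)."""
    d, n = len(xs[0]), len(xs)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for x, y in zip(xs, ys):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1.0:  # hinge loss is active for this pair
                for j in range(d):
                    gw[j] -= y * x[j] / n
                gb -= y / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def decide(w, b, x):
    """Linear SVM decision rule: +1 = predicted link, -1 = not a link."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


# Hypothetical hand-labelled pairs: [first-name JW, surname JW, age consistent (1/0)].
X = [[0.97, 0.96, 1.0], [0.92, 1.00, 1.0], [1.00, 0.93, 1.0],
     [0.45, 0.52, 0.0], [0.60, 0.40, 0.0], [0.55, 0.95, 0.0]]
Y = [1, 1, 1, -1, -1, -1]

if __name__ == "__main__":
    w, b = train_linear_svm(X, Y)
    # Candidate pair with a transposed first name and a variant surname spelling.
    cand = [jaro_winkler("MARTHA", "MARHTA"), jaro_winkler("JOHNSON", "JONSON"), 1.0]
    print("link" if decide(w, b, cand) == 1 else "no link")
```

A linear kernel keeps the decision rule readable as a weighted sum of similarity features; a production linker would train on far more labelled pairs and would likely use an established SVM library rather than a hand-rolled optimizer.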
About the journal:
Historical Methods reaches an international audience of social scientists concerned with historical problems. It explores interdisciplinary approaches to new data sources, new approaches to older questions and material, and practical discussions of computer and statistical methodology, data collection, and sampling procedures. The journal includes the following features: “Evidence Matters” emphasizes how to find, decipher, and analyze evidence, whether or not that evidence is meant to be quantified. “Database Developments” announces major new public databases or large alterations to older ones, discusses innovative ways to organize them, and explains new ways of categorizing information.