Synthetic population data for small area estimation in the United States
Yue Lin
Environment and Planning B: Urban Analytics and City Science, published 2023-11-16
DOI: 10.1177/23998083231215825 (https://doi.org/10.1177/23998083231215825)
Citations: 0
Abstract
Small area estimation is critical for a wide range of applications, including urban planning, funding distribution, and policy formulation. Individual-level population data, which typically include each individual’s socio-demographic characteristics and small area location, are a rich source of information for small area estimation. However, such data are often not made public due to confidentiality concerns. This paper describes the development of a public-use synthetic individual-level population dataset for the United States that can support small area estimation. The dataset records housing type, age, sex, race, and Hispanic or Latino origin for all 308,745,538 individuals in the United States at the census block group level, and is based on publicly available aggregated data from the 2010 Census. Experimental comparisons with other data sources support the validity of the synthetic data, and we show examples of how the dataset can be used in small area estimation.
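To illustrate how individual-level records tagged with a census block group could feed a small area estimate, here is a minimal sketch in Python. It is not the paper's method: the record layout, the block group identifiers, and the helper `share_by_area` are hypothetical, and the five toy records are invented for illustration rather than drawn from the dataset. The sketch computes a simple direct estimate, the share of residents aged 65 or older in each block group.

```python
from collections import Counter

# Hypothetical toy records: (block_group_id, age, sex, hispanic_or_latino).
# The layout, IDs, and values are illustrative only, not taken from the dataset.
records = [
    ("BG-A", 34, "F", False),
    ("BG-A", 71, "M", False),
    ("BG-A", 8, "F", True),
    ("BG-B", 67, "F", True),
    ("BG-B", 45, "M", False),
]


def share_by_area(records, predicate):
    """Return, per block group, the fraction of individuals satisfying predicate."""
    totals = Counter()  # number of individuals per block group
    hits = Counter()    # number of individuals matching the predicate
    for rec in records:
        area = rec[0]
        totals[area] += 1
        if predicate(rec):
            hits[area] += 1
    return {area: hits[area] / totals[area] for area in totals}


# Direct estimate: share of residents aged 65 or older in each block group.
share_65_plus = share_by_area(records, lambda rec: rec[1] >= 65)
```

Because every record carries both its attributes and its area, the same helper can be reused with a different predicate (for example, Hispanic or Latino origin) without changing the aggregation logic.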