Uncovering Representation Bias in Large-scale Cellular Phone-based Data: A Case Study in North Carolina
Authors: Hanna V. Jardel, Paul L. Delamater
Journal: Geographical Analysis, vol. 56, no. 4, pp. 723-745 (published 2024-04-08)
DOI: 10.1111/gean.12399
URL: https://onlinelibrary.wiley.com/doi/10.1111/gean.12399
Large cellular phone-based mobility datasets are an important new data source for research on human movement. We investigate and illustrate representation bias in one such dataset at the census block group, tract, and county levels. We paired American Community Survey (ACS) 2019 data with SafeGraph (SG) cell phone mobility data and compared the ACS estimated population against the number of devices in the SG data, stratifying by key sociodemographic variables such as income, percent Black population, percent of population over 55 years, percent of population 18–65 years, percent of people living in crowded conditions, and urbanization level. We evaluated whether the bias varied over time by examining a 10-month period. The bias varies with key demographic characteristics and over time. Specifically, areas with the highest percentage of Black population are underrepresented at all aggregation levels, as are areas with the highest percentage of working-age residents and areas with the lowest median incomes. Researchers should be cautious when using mobility datasets because the bias differs across key sociodemographic factors and collection times.
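The comparison described in the abstract (estimated population against device counts, stratified by a sociodemographic variable) can be illustrated with a minimal Python sketch. The abstract does not state the exact bias metric, so everything here is an assumption for illustration: the function names, the field names `population`, `devices`, and `median_income`, the rank-based quantile grouping, and the ratio itself (a stratum's devices-per-person rate divided by the overall rate, where a value below 1 suggests underrepresentation). The sample numbers are made up and are not results from the study.

```python
from collections import defaultdict


def assign_quantile_groups(values, n_groups=4):
    """Assign each value a group index by rank (0 = lowest values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = min(rank * n_groups // len(values), n_groups - 1)
    return groups


def representation_ratios(areas, variable, n_groups=4):
    """Hypothetical metric: devices-per-person in each stratum,
    divided by devices-per-person over all areas combined."""
    total_pop = sum(a["population"] for a in areas)
    total_dev = sum(a["devices"] for a in areas)
    overall_rate = total_dev / total_pop

    groups = assign_quantile_groups([a[variable] for a in areas], n_groups)
    pop = defaultdict(float)
    dev = defaultdict(float)
    for area, g in zip(areas, groups):
        pop[g] += area["population"]
        dev[g] += area["devices"]

    # Ratio < 1: the stratum has fewer devices per person than average.
    return {g: (dev[g] / pop[g]) / overall_rate for g in sorted(pop)}


# Made-up areas (e.g. census tracts); not data from the paper.
example_areas = [
    {"population": 1000, "devices": 50, "median_income": 30000},
    {"population": 1000, "devices": 50, "median_income": 40000},
    {"population": 1000, "devices": 150, "median_income": 80000},
    {"population": 1000, "devices": 150, "median_income": 90000},
]

if __name__ == "__main__":
    ratios = representation_ratios(example_areas, "median_income", n_groups=2)
    for group, ratio in ratios.items():
        print(f"income group {group}: representation ratio {ratio:.2f}")
```

Running the same function on areas defined at the block group, tract, and county levels, and once per month of device counts, would give the kind of cross-level and over-time comparison the abstract describes, under the assumptions above.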
About the journal:
First in its specialty area and one of the most frequently cited publications in geography, Geographical Analysis has, since 1969, presented significant advances in geographical theory, model building, and quantitative methods to geographers and scholars in a wide spectrum of related fields. Traditionally, mathematical and nonmathematical articulations of geographical theory, and statements and discussions of the analytic paradigm are published in the journal. Spatial data analyses and spatial econometrics and statistics are strongly represented.