Uncovering Representation Bias in Large-scale Cellular Phone-based Data: A Case Study in North Carolina

IF 3.3 · Zone 3 (Earth Sciences) · JCR Q1 (Geography) · Geographical Analysis · Published: 2024-04-08 · DOI: 10.1111/gean.12399
Hanna V. Jardel, Paul L. Delamater
Geographical Analysis, 56(4), 723–745.
Citations: 0

Abstract

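Large cellular phone-based mobility datasets are an important new data source for research on human movement. We investigate and illustrate representation bias in a large mobility dataset at the census block group, tract, and county levels. We paired American Community Survey (ACS) 2019 data with SafeGraph (SG) cell phone mobility data to elucidate potential bias in the SG data, comparing ACS estimated population with the number of devices in the SG data and stratifying by key sociodemographic variables such as income, percent Black population, percent of population over 55 years, percent of population aged 18–65 years, percent of people living in crowded conditions, and urbanization level. We evaluated whether the bias varied over time by examining a 10-month period. The bias varies with key demographic characteristics and over time. Specifically, we see underrepresentation in areas with the highest percentage of Black population at all aggregation levels. We also see underrepresentation at all levels in areas with the highest percentage of working-age residents and in areas with the lowest median incomes. Researchers should be cautious when using mobility datasets because the bias differs across key sociodemographic factors and with collection time.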
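The comparison described above, ACS estimated population against SafeGraph device counts, stratified by a sociodemographic variable, can be illustrated with a simple representation index. This is a minimal sketch under stated assumptions and not the authors' method: the `Area` record, its field names, the quantile grouping, and the normalization (each group's device-to-population ratio divided by the overall ratio, so 1.0 means proportional representation and values below 1.0 mean underrepresentation) are hypothetical choices made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Area:
    """Hypothetical record for one areal unit (block group, tract, or county)."""
    geoid: str            # assumed identifier field
    acs_population: int   # ACS 2019 estimated population
    sg_devices: int       # SafeGraph device count for one month (assumed field)
    stratifier: float     # e.g. percent Black population or median income


def representation_by_quantile(areas: list[Area], n_groups: int = 5) -> dict[int, float]:
    """Return {group: index}, where group 0 holds the lowest stratifier values.

    index = (devices / population within the group) / (devices / population overall).
    Values below 1.0 indicate the group is underrepresented in the device data.
    """
    overall = sum(a.sg_devices for a in areas) / sum(a.acs_population for a in areas)
    # n_groups - 1 cut points splitting the stratifier into quantile groups
    cuts = quantiles([a.stratifier for a in areas], n=n_groups)
    devices: dict[int, int] = defaultdict(int)
    pop: dict[int, int] = defaultdict(int)
    for a in areas:
        # Count how many cut points the value exceeds; ties go to the lower group
        g = sum(a.stratifier > c for c in cuts)
        devices[g] += a.sg_devices
        pop[g] += a.acs_population
    # Skip groups with zero population to avoid division by zero
    return {g: (devices[g] / pop[g]) / overall for g in sorted(pop) if pop[g] > 0}
```

To examine change over time, one could call `representation_by_quantile` once per month with that month's device counts and compare the resulting indices across the 10-month window; repeating it on block-group, tract, and county inputs gives the per-aggregation-level view.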
Source journal
CiteScore: 8.70
Self-citation rate: 5.60%
Articles published: 40
About the journal: First in its specialty area and one of the most frequently cited publications in geography, Geographical Analysis has, since 1969, presented significant advances in geographical theory, model building, and quantitative methods to geographers and scholars in a wide spectrum of related fields. The journal traditionally publishes mathematical and nonmathematical articulations of geographical theory, as well as statements and discussions of the analytic paradigm. Spatial data analysis, spatial econometrics, and spatial statistics are strongly represented.