Spatial Disaggregation of Population Subgroups Leveraging Self-Trained Multi-Output Gradient Boosting Regression Trees

Marina Georgati, J. Monteiro, Bruno Martins, C. Kessler
Journal: AGILE: GIScience Series
Published: 2022-06-10
DOI: 10.5194/agile-giss-3-5-2022 (https://doi.org/10.5194/agile-giss-3-5-2022)
Citations: 1

Abstract

Accurate and consistent estimates of the present and future population distribution, at fine spatial resolution, are fundamental to supporting a variety of activities. However, the sampling regime, sample size, and methods used to collect census data are heterogeneous across time periods and/or geographic regions. Moreover, the data are usually made available only in aggregated form, to ensure privacy. To address these issues, several previous initiatives have used spatial disaggregation methods to produce high-resolution gridded datasets describing the human population distribution, although these projects have usually not considered specific population subgroups. This paper describes a spatial disaggregation method based on self-training regression models. It innovates over previous studies by simultaneously predicting disaggregated counts for multiple inter-related variables, leveraging multi-output models based on gradient tree boosting. We report on experiments for two case studies, using high-resolution data (i.e., counts for different subgroups available at a resolution of 100 meters) for the municipality of Amsterdam and the region of Greater Copenhagen. Results show that the proposed approach can capture spatial heterogeneity and the dependency on local factors, outperforming alternatives (e.g., seminal disaggregation algorithms, or approaches leveraging individual regression models for each variable) in terms of averaged error metrics, and also upon visual inspection of spatial variation in the resulting maps.
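The general idea of a self-training disaggregation loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a single ancillary feature per grid cell and a through-origin linear fit per subgroup stand in for the multi-output gradient boosting model, and all names (`self_train`, `rescale_to_zone_totals`, `fit_slopes`) and the toy numbers are hypothetical. The loop starts from a uniform split of each zone's aggregated counts, fits the model on its own current estimates, and rescales the predictions so that each zone still sums to its known totals.

```python
from collections import defaultdict


def rescale_to_zone_totals(preds, zones, totals):
    """Scale per-cell predictions so every zone sums to its known subgroup totals."""
    k = len(next(iter(totals.values())))
    sums = defaultdict(lambda: [0.0] * k)
    counts = defaultdict(int)
    for p, z in zip(preds, zones):
        counts[z] += 1
        for j in range(k):
            sums[z][j] += p[j]
    out = []
    for p, z in zip(preds, zones):
        row = []
        for j in range(k):
            if sums[z][j] > 0:
                row.append(p[j] * totals[z][j] / sums[z][j])
            else:
                # No signal in this zone: fall back to a uniform split.
                row.append(totals[z][j] / counts[z])
        out.append(row)
    return out


def fit_slopes(x, targets):
    """Through-origin least-squares slope per subgroup (stand-in for the real model)."""
    sxx = sum(v * v for v in x)
    k = len(targets[0])
    return [sum(v * t[j] for v, t in zip(x, targets)) / sxx for j in range(k)]


def self_train(x, zones, totals, n_iter=3):
    """Alternate between fitting on current estimates and mass-preserving rescaling."""
    k = len(next(iter(totals.values())))
    # Initial estimate: spread each zone total evenly over its cells.
    est = rescale_to_zone_totals([[1.0] * k for _ in x], zones, totals)
    for _ in range(n_iter):
        slopes = fit_slopes(x, est)
        # Predict all subgroups for every cell, clipping negative counts to zero.
        preds = [[max(0.0, s * v) for s in slopes] for v in x]
        est = rescale_to_zone_totals(preds, zones, totals)
    return est


# Toy data (hypothetical): four cells in two zones, two subgroups per zone.
feature = [1.0, 3.0, 2.0, 2.0]            # e.g. a built-up volume proxy per cell
zones = ["A", "A", "B", "B"]
totals = {"A": [40.0, 8.0], "B": [10.0, 30.0]}
estimates = self_train(feature, zones, totals)
```

With a single through-origin feature, the loop reduces to weighting each zone's counts by that feature. A tree-based multi-output model trained on several ancillary layers would let the subgroups share structure and capture non-linear effects, which is the setting the abstract describes.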