Learning to generate synthetic human mobility data: A physics-regularized Gaussian process approach based on multiple kernel learning

Transportation Research Part B-Methodological, Volume 189, Article 103064
Pub Date: 2024-11-01 | DOI: 10.1016/j.trb.2024.103064
IF 5.8 | Zone 1 (Engineering & Technology) | JCR Q1 (Economics)
Ekin Uğurel, Shuai Huang, Cynthia Chen
Source: https://www.sciencedirect.com/science/article/pii/S0191261524001887

Abstract

Passively generated mobile data has grown increasingly popular in the travel behavior (or human mobility) literature. A relatively untapped use of such data is synthetic population generation, which is the basis for any large-scale simulation, for purposes ranging from state monitoring to policy evaluation and digital twins. Yet this potential may be hindered by the growing sparsity, or rate of missingness, in the data, which stems from heightened privacy concerns among both data vendors and consumers (users of the service platforms that generate individual mobile data). To fulfill this potential while addressing sparsity, there is a need for a flexible and scalable model that can capture individual heterogeneity and adapt to changes in mobility patterns. We propose a conditional-generative Gaussian process framework that learns kernel structures characterizing individual mobile data and can provably replicate observed patterns. Our approach integrates physical knowledge to regularize the framework so that the generated data obeys constraints imposed by the built and natural environments (such as those on velocity and bearing). To capture travel behavior heterogeneity at the individual level, we propose a data-driven multiple kernel learning approach that determines the optimal composite kernel for every user. Our experiments demonstrate that: (1) the impact of kernel choice on mobility metrics derived from synthetic data is non-negligible; (2) physics regularization not only reduces model bias but also improves the uncertainty estimates associated with predicted locations; and (3) the proposed method is robust and generalizes well across individuals and modes of travel.
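To make the idea of choosing a composite kernel per user more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional location coordinate observed at sparse times, a small hand-picked set of candidate kernels with fixed hyperparameters, and selection by Gaussian process log marginal likelihood; the paper's actual kernel search, hyperparameter learning, and physics regularizer are not described in the abstract and are not reproduced here. All function names (`select_kernel`, `posterior`, `speed_violation_rate`) and parameter values are hypothetical. The speed check at the end is only a post-hoc diagnostic standing in for the kind of velocity constraint the abstract mentions.

```python
import numpy as np

def rbf(t1, t2, length=2.0):
    # Squared-exponential kernel: smooth, short-range correlation in time.
    d = t1[:, None] - t2[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def periodic(t1, t2, period=24.0, length=1.0):
    # Periodic kernel: captures daily (24 h) regularity. Values are assumptions.
    d = np.abs(t1[:, None] - t2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length ** 2)

# Hypothetical candidate set: base kernels plus their sum and product.
CANDIDATES = {
    "rbf": rbf,
    "periodic": periodic,
    "rbf+periodic": lambda a, b: rbf(a, b) + periodic(a, b),
    "rbf*periodic": lambda a, b: rbf(a, b) * periodic(a, b),
}

def log_marginal_likelihood(kernel, t, y, noise=1e-2):
    # Standard zero-mean GP evidence, computed via a Cholesky factorization.
    K = kernel(t, t) + noise * np.eye(len(t))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(t) * np.log(2 * np.pi))

def select_kernel(t, y):
    # Score every candidate and return the name with the highest evidence.
    scores = {name: log_marginal_likelihood(k, t, y)
              for name, k in CANDIDATES.items()}
    return max(scores, key=scores.get), scores

def posterior(kernel, t_obs, y_obs, t_new, noise=1e-2):
    # Conditional (posterior) mean and covariance at unobserved times.
    K = kernel(t_obs, t_obs) + noise * np.eye(len(t_obs))
    Ks = kernel(t_new, t_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)
    cov = kernel(t_new, t_new) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

def speed_violation_rate(path, t, v_max):
    # Fraction of steps whose implied speed exceeds v_max (diagnostic only).
    speed = np.abs(np.diff(path) / np.diff(t))
    return float(np.mean(speed > v_max))

# Synthetic stand-in for one user's sparse observations over three days.
rng = np.random.default_rng(0)
t_obs = np.sort(rng.uniform(0.0, 72.0, size=40))           # hours
y_obs = np.sin(2 * np.pi * t_obs / 24.0) + 0.05 * rng.standard_normal(40)

best, scores = select_kernel(t_obs, y_obs)
t_new = np.linspace(0.0, 72.0, 145)                         # every 30 minutes
mean, cov = posterior(CANDIDATES[best], t_obs, y_obs, t_new)
rate = speed_violation_rate(mean, t_new, v_max=1.0)
print(best, rate)
```

In this sketch the kernel is chosen independently for each user's series, which mirrors the per-user selection described above only in spirit. A fuller treatment would optimize kernel hyperparameters and enforce the physical constraints during fitting rather than checking them afterwards.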
Source journal
Transportation Research Part B-Methodological (Engineering & Technology - Engineering: Civil)
CiteScore: 12.40
Self-citation rate: 8.80%
Articles published: 143
Review time: 14.1 weeks
About the journal: Transportation Research: Part B publishes papers on all methodological aspects of the subject, particularly those that require mathematical analysis. The general theme of the journal is the development and solution of problems that are adequately motivated to deal with important aspects of the design and/or analysis of transportation systems. Areas covered include: traffic flow; design and analysis of transportation networks; control and scheduling; optimization; queuing theory; logistics; supply chains; development and application of statistical, econometric and mathematical models to address transportation problems; cost models; pricing and/or investment; traveler or shipper behavior; cost-benefit methodologies.