Learning from user behavior: A survey-assist algorithm for longitudinal mobility data collection

IF 5.1 | Zone 2, Engineering and Technology | Q1 TRANSPORTATION | Travel Behaviour and Society | Pub Date: 2024-03-22 | DOI: 10.1016/j.tbs.2024.100761

Hannah Lu, Katie Rischpater, K. Shankari



Abstract

GPS-based travel surveys are widely used in mobility studies to gather crucial qualitative data, such as trip purpose, transportation mode, and replaced mode. However, responding to surveys still burdens users, especially in long-term mobility studies, and leads to response fatigue. We explore a survey-assist strategy that eases this burden through a novel, user-level modeling approach: it leverages each user's past responses to predict responses for new trips, without relying on external data sources such as GIS data.

We investigate three main algorithms for predicting responses: (i) clustering trips and extrapolating responses for similar trips, (ii) using random forest classification, and (iii) clustering that uses a hybrid algorithm to determine spatial structure, which is then fed as input to a classic random forest classifier. The clustering approach can flexibly predict responses even for complex qualitative survey questions; it achieved F-scores of 65%. The random forest pipeline uses an architecture that restricts it to predicting three predetermined survey questions (trip purpose, mode, and replaced mode), but it achieved higher F-scores of 78%.
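The idea behind approach (i), extrapolating a user's past responses to similar new trips, can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in simple grid binning of start and end coordinates for the clustering step, and the function names, the trip dictionary layout, and the 0.005-degree cell size are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def cell(lat, lon, size=0.005):
    """Snap a coordinate to a grid cell (cell size in degrees is an assumed value)."""
    return (round(lat / size), round(lon / size))


def trip_key(trip, size=0.005):
    """Treat two trips as 'similar' if start and end fall in the same grid cells.

    Grid binning is a stand-in for a real clustering step, not the paper's method.
    """
    return cell(*trip["start"], size) + cell(*trip["end"], size)


def fit_user_model(labeled_trips, size=0.005):
    """Count the labels one user gave to past trips, grouped by trip key."""
    model = defaultdict(Counter)
    for trip in labeled_trips:
        model[trip_key(trip, size)][trip["label"]] += 1
    return model


def predict(model, trip, size=0.005):
    """Return (label, confidence), or (None, 0.0) when no similar past trip exists."""
    counts = model.get(trip_key(trip, size))
    if not counts:
        return None, 0.0
    label, n = counts.most_common(1)[0]
    # Confidence here is simply the share of past trips in the group with this label.
    return label, n / sum(counts.values())
```

Because the model is built per user and only from that user's own labeled trips, it needs no external GIS data. The `label` field can hold any hashable value, for example a tuple of answers to several survey questions, which loosely mirrors the flexibility the abstract attributes to the clustering approach.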

While the survey-assist approach has been implemented by several proprietary systems, to our knowledge, this is the first exploration in the academic literature. It follows that this is also the first rigorous evaluation of multiple algorithms that can implement the approach. The evaluation uses a large-scale, publicly available, longitudinal dataset consisting of approximately 92k trips from 235 users over a period of roughly one and a half years.

With this approach, travel surveys can be pre-filled with the predicted responses for each trip, thus streamlining the survey process for users. When combined with an active learning system that requests user input on low-confidence predictions, the models can be updated and improved over time to better support the long-term collection of longitudinal qualitative data.
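The split between pre-filled answers and requests for user input can be sketched as a simple confidence threshold. The abstract does not describe how the active learning system decides, so the `triage` function, the shape of the `predictions` mapping, and the 0.8 threshold below are hypothetical choices for illustration only.

```python
def triage(predictions, threshold=0.8):
    """Split trips into pre-filled answers and trips that need user input.

    `predictions` maps a trip id to a (label, confidence) pair.
    The 0.8 threshold is an assumed value, not one taken from the paper.
    """
    prefilled, ask_user = {}, []
    for trip_id, (label, confidence) in predictions.items():
        if label is not None and confidence >= threshold:
            # Confident enough: pre-fill the survey with the predicted response.
            prefilled[trip_id] = label
        else:
            # Low confidence or no prediction: ask the user to respond.
            ask_user.append(trip_id)
    return prefilled, ask_user
```

In such a setup, the answers collected for the `ask_user` trips would be added to that user's labeled history before the model is refit, which is one way the models could improve over time.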




Source journal: Travel Behaviour and Society (TRANSPORTATION)

CiteScore: 9.80

Self-citation rate: 7.70%

Articles published: 109

About the journal: Travel Behaviour and Society is an interdisciplinary journal that publishes high-quality original papers reporting leading-edge research in theories, methodologies, and applications concerning transportation issues and challenges that involve social and spatial dimensions. In particular, it provides a discussion forum for major research in travel behaviour, transportation infrastructure, transportation and environmental issues, mobility and social sustainability, transportation geographic information systems (TGIS), transportation and quality of life, and transportation data collection and analysis.