Comparing hundreds of machine learning and discrete choice models for travel demand modeling: An empirical benchmark
Shenhao Wang, Baichuan Mo, Yunhan Zheng, Stephane Hess, Jinhua Zhao
Transportation Research Part B: Methodological, Volume 190, Article 103061 (published 2024-09-12)
DOI: 10.1016/j.trb.2024.103061
Abstract
Numerous studies have compared machine learning (ML) and discrete choice models (DCMs) in predicting travel demand. However, these studies often lack generalizability as they compare models deterministically without considering contextual variations. To address this limitation, our study develops an empirical benchmark by designing a tournament model to learn the intrinsic predictive values of ML and DCMs. This novel approach enables us to efficiently summarize a large number of experiments, quantify the randomness in model comparisons, and use formal statistical tests to differentiate between the model and contextual effects. This benchmark study compares two large-scale data sources: a database compiled from a literature review summarizing 136 experiments from 35 studies, and our own experiment data, encompassing a total of 6970 experiments from 105 models and 12 model families, tested repeatedly across three datasets, sample sizes, and choice categories. This benchmark study yields two key findings. First, many ML models, particularly ensemble methods and deep learning, statistically outperform the DCM family and its individual variants (i.e., multinomial, nested, and mixed logit), thus corroborating previous research. However, this study also highlights the crucial role of contextual factors (i.e., data sources, inputs, and choice categories), which can explain models’ predictive performance more effectively than the differences in model types alone. Model performance varies significantly with data sources, improving with larger sample sizes and lower-dimensional alternative sets. After controlling for all the model and contextual factors, significant randomness still remains, implying inherent uncertainty in such model comparisons. Overall, we suggest that future researchers shift more focus from context-specific and deterministic model comparisons towards examining model transferability across contexts and characterizing the inherent uncertainty in ML, thus creating more robust and generalizable next-generation travel demand models.
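The abstract describes the benchmarking idea only at a high level. The sketch below illustrates one way such an experiment grid could be set up in Python; it is not the paper's code. The synthetic mode-choice generator, the choice of scikit-learn models (LogisticRegression as a stand-in for a multinomial logit DCM, random forest and gradient boosting as ML representatives), the grid of sample sizes and choice-set sizes, and the final least-squares regression that separates model effects from contextual effects (a crude proxy for the paper's tournament model and formal statistical tests) are all assumptions made for illustration.

```python
# Illustrative sketch only -- NOT the paper's code, data, or model specification.
# It mimics the general benchmarking idea: repeatedly compare a DCM-style model
# with ML classifiers on a toy mode-choice task, record accuracy per experiment,
# then regress accuracy on model-family and context indicators.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_mode_choice_data(n, n_alternatives):
    """Generate a toy mode-choice dataset with a hypothetical utility structure."""
    X = rng.normal(size=(n, 6))                  # e.g. travel time, cost, income, ...
    beta = rng.normal(size=(6, n_alternatives))  # random taste coefficients
    utility = X @ beta + rng.gumbel(size=(n, n_alternatives))
    y = utility.argmax(axis=1)                   # chosen alternative = max utility
    return X, y

models = {
    "mnl": LogisticRegression(max_iter=1000),    # stand-in for a multinomial logit DCM
    "rf": RandomForestClassifier(n_estimators=200, random_state=0),
    "gbdt": GradientBoostingClassifier(random_state=0),
}

# One row per experiment: (model family, sample size, number of alternatives, accuracy)
records = []
for sample_size in (1000, 5000):
    for n_alt in (3, 6):
        for rep in range(5):                     # repeated splits quantify randomness
            X, y = make_mode_choice_data(sample_size, n_alt)
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, test_size=0.3, random_state=rep)
            for name, model in models.items():
                acc = model.fit(X_tr, y_tr).score(X_te, y_te)
                records.append((name, sample_size, n_alt, acc))

# Separate model effects from contextual effects with a simple linear regression
# on accuracy (a crude stand-in for the paper's formal statistical tests).
names = np.array([r[0] for r in records])
design = np.column_stack([
    np.ones(len(records)),
    (names == "rf").astype(float),               # model effect: RF vs. MNL
    (names == "gbdt").astype(float),             # model effect: GBDT vs. MNL
    np.log([r[1] for r in records]),             # contextual effect: sample size
    [r[2] for r in records],                     # contextual effect: choice-set size
])
acc = np.array([r[3] for r in records])
coef, *_ = np.linalg.lstsq(design, acc, rcond=None)
print(dict(zip(["intercept", "rf_vs_mnl", "gbdt_vs_mnl", "log_n", "n_alt"],
               coef.round(3))))
```

In this toy setup, the coefficients on `rf_vs_mnl` and `gbdt_vs_mnl` play the role of model effects, the coefficients on `log_n` and `n_alt` play the role of contextual effects, and the spread of accuracies across the repeated splits corresponds to the residual randomness that the paper argues remains even after controlling for both.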
About the journal:
Transportation Research Part B publishes papers on all methodological aspects of transportation research, particularly those that require mathematical analysis. The general theme of the journal is the development and solution of well-motivated problems concerning the design and/or analysis of transportation systems. Areas covered include: traffic flow; design and analysis of transportation networks; control and scheduling; optimization; queuing theory; logistics; supply chains; development and application of statistical, econometric and mathematical models to address transportation problems; cost models; pricing and/or investment; traveler or shipper behavior; cost-benefit methodologies.