Revealing and reducing bias when modelling choice behaviour on imbalanced panel datasets

IF 2.4 | Zone 3, Economics | Q1 ECONOMICS | Journal of Choice Modelling | Pub Date: 2024-03-01 | Epub Date: 2024-01-18 | DOI: 10.1016/j.jocm.2024.100471

Mirosława Łukawska, Laurent Cazor, Mads Paulsen, Thomas Kjær Rasmussen, Otto Anker Nielsen

Journal of Choice Modelling, Volume 50, Article 100471. https://www.sciencedirect.com/science/article/pii/S1755534524000046

Citations: 0

Abstract

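The emergence of modern tools and technologies gives a unique opportunity to collect large amounts of data for understanding behaviour. However, the generated datasets are often imbalanced, as individuals might contribute to the datasets at different frequencies and over different periods. Models based on these datasets are challenging to estimate, and the results are not straightforward to interpret without considering the sample structure. This study investigates how to handle imbalanced panel datasets when modelling individual behaviour. It first conducts a simulation experiment to study the degree to which mixed logit models with and without a panel specification reproduce the population preferences when using imbalanced data. It then investigates how bias reduction strategies, such as subsampling and likelihood weighting, influence model results, and finds that combining these techniques helps strike an optimal trade-off between the bias and the variance of the estimates. Building on the conclusions of the simulation study, a large-scale case study estimates bicycle route choice models with different correction strategies. These strategies are compared in terms of efficiency, weighted fit measures, and computational burden to provide recommendations that fit the modelling purpose. We find that the weighted panel mixed multinomial logit model, estimated on the entire dataset, performs best in terms of minimising the bias-efficiency trade-off in the estimates. Finally, we propose a strategy that ensures each individual contributes equally to the estimation results, regardless of their representation in the sample, while reducing the computational burden of estimating models on large datasets.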
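The likelihood weighting and subsampling strategies described in the abstract can be illustrated with a minimal Python sketch. It assumes a panel mixed logit with independent normally distributed coefficients (beta = mu + sigma * xi, drawn once per individual and held fixed across that individual's choices) and, as an illustrative choice rather than the authors' exact specification, a per-individual weight w_n = 1/T_n, where T_n is the number of choices individual n contributes. The simulated log-likelihood is then LL(mu, sigma) = sum_n w_n * ln[(1/R) * sum_r prod_t P_nt(beta_r)]. The function names (mnl_logprob, subsample_panel, weighted_panel_loglik) and the data layout (a dict mapping each individual to a list of (alternatives, chosen_index) pairs) are hypothetical.

```python
import math
import random


def mnl_logprob(beta, alternatives, chosen):
    """Log of the logit probability that alternative `chosen` is picked.

    `alternatives` is a list of attribute vectors, one per alternative.
    """
    utils = [sum(b * x for b, x in zip(beta, attrs)) for attrs in alternatives]
    m = max(utils)  # log-sum-exp shift for numerical stability
    return utils[chosen] - m - math.log(sum(math.exp(u - m) for u in utils))


def subsample_panel(panel, max_obs, seed=0):
    """Keep at most `max_obs` randomly chosen observations per individual."""
    rng = random.Random(seed)
    return {
        n: (list(obs) if len(obs) <= max_obs else rng.sample(obs, max_obs))
        for n, obs in panel.items()
    }


def weighted_panel_loglik(mu, sigma, panel, n_draws=200, seed=0):
    """Simulated, weighted log-likelihood of a panel mixed logit.

    Illustrative assumptions (not the paper's exact specification):
    - independent normal coefficients beta_k = mu_k + sigma_k * xi_k,
      drawn once per individual and held fixed across that individual's
      choices (the panel structure);
    - weight w_n = 1 / T_n, so frequent contributors do not dominate the sum.
    """
    rng = random.Random(seed)
    total = 0.0
    for obs in panel.values():
        t_n = len(obs)
        # Log of the product of this individual's choice probabilities, per draw.
        log_prods = []
        for _ in range(n_draws):
            beta = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
            log_prods.append(sum(mnl_logprob(beta, alts, c) for alts, c in obs))
        # Log of the simulated average over draws, computed via log-sum-exp.
        top = max(log_prods)
        log_sim = top + math.log(sum(math.exp(lp - top) for lp in log_prods) / n_draws)
        total += log_sim / t_n  # apply the weight w_n = 1 / T_n
    return total


# Usage: individual "a" made 10 choices, individual "b" only one.
choice = ([[1.0], [1.0]], 0)  # two identical alternatives, the first one chosen
panel = {"a": [choice] * 10, "b": [choice]}
print(weighted_panel_loglik([1.0], [0.0], panel, n_draws=5))
```

With sigma = 0 the model collapses to a multinomial logit, and in this sketch each individual then contributes the average log-probability of their own choices, whether they recorded one trip or hundreds. Calling subsample_panel before estimation trims the cost of heavily represented individuals; because the weight is recomputed from the retained T_n, their contribution stays on the same scale as everyone else's.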




Source journal

Journal of Choice Modelling (ECONOMICS)

CiteScore: 4.10

Self-citation rate: 12.50%

Articles published: