In this paper, we address the challenge of sampling in scenarios where limited resources prevent exhaustive measurement across all subjects. We consider a setting where samples are drawn from multiple groups, each following a distribution with unknown mean and variance parameters. We introduce a novel sampling strategy, motivated simply by the Cauchy-Schwarz inequality, which minimizes the variance of the population mean estimator by allocating samples proportionally to both the group size and the standard deviation. This approach improves the efficiency of sampling by focusing resources on groups with greater variability, thereby enhancing the precision of the overall estimate. Additionally, we extend our method to a two-stage Bayesian sampling procedure, named BayesSRW, in which a preliminary stage is used to estimate the variance, which then informs the optimal allocation of the remaining sampling budget. Through simulation examples, we demonstrate the effectiveness of our approach in reducing estimation uncertainty and providing more reliable insights in applications ranging from user experience surveys to high-dimensional peptide array studies.
{"title":"BayesSRW: Bayesian Sampling and Re-weighting approach for variance reduction","authors":"Carol Liu","doi":"arxiv-2408.15454","DOIUrl":"https://doi.org/arxiv-2408.15454","url":null,"abstract":"In this paper, we address the challenge of sampling in scenarios where\u0000limited resources prevent exhaustive measurement across all subjects. We\u0000consider a setting where samples are drawn from multiple groups, each following\u0000a distribution with unknown mean and variance parameters. We introduce a novel\u0000sampling strategy, motivated simply by Cauchy-Schwarz inequality, which\u0000minimizes the variance of the population mean estimator by allocating samples\u0000proportionally to both the group size and the standard deviation. This approach\u0000improves the efficiency of sampling by focusing resources on groups with\u0000greater variability, thereby enhancing the precision of the overall estimate.\u0000Additionally, we extend our method to a two-stage sampling procedure in a Bayes\u0000approach, named BayesSRW, where a preliminary stage is used to estimate the\u0000variance, which then informs the optimal allocation of the remaining sampling\u0000budget. Through simulation examples, we demonstrate the effectiveness of our\u0000approach in reducing estimation uncertainty and providing more reliable\u0000insights in applications ranging from user experience surveys to\u0000high-dimensional peptide array studies.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"67 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the context of financial credit risk evaluation, the fairness of machine learning models has become a critical concern, especially given the potential for biased predictions that disproportionately affect certain demographic groups. This study investigates the impact of data preprocessing, with a specific focus on Truncated Singular Value Decomposition (SVD), on the fairness and performance of probability of default models. Various preprocessing techniques, including Truncated SVD, were applied to a comprehensive dataset sourced from Kaggle to assess their effect on model accuracy, discriminatory power, and fairness.
{"title":"The effects of data preprocessing on probability of default model fairness","authors":"Di Wu","doi":"arxiv-2408.15452","DOIUrl":"https://doi.org/arxiv-2408.15452","url":null,"abstract":"In the context of financial credit risk evaluation, the fairness of machine\u0000learning models has become a critical concern, especially given the potential\u0000for biased predictions that disproportionately affect certain demographic\u0000groups. This study investigates the impact of data preprocessing, with a\u0000specific focus on Truncated Singular Value Decomposition (SVD), on the fairness\u0000and performance of probability of default models. Using a comprehensive dataset\u0000sourced from Kaggle, various preprocessing techniques, including SVD, were\u0000applied to assess their effect on model accuracy, discriminatory power, and\u0000fairness.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We address the estimation of endogenous treatment models with social interactions in both the treatment and outcome equations. We model the interactions between individuals in an internally consistent manner via a game theoretic approach based on discrete Bayesian games. This introduces a substantial computational burden in estimation which we address through a sequential version of the nested fixed point algorithm. We also provide some relevant treatment effects, and procedures for their estimation, which capture the impact on both the individual and the total sample. Our empirical application examines the impact of an individual's exercise frequency on her level of self-esteem. We find that an individual's exercise frequency is influenced by her expectation of her friends' exercise frequency. We also find that an individual's level of self-esteem is affected by her level of exercise and, at relatively lower levels of self-esteem, by the expectation of her friends' self-esteem.
{"title":"Endogenous Treatment Models with Social Interactions: An Application to the Impact of Exercise on Self-Esteem","authors":"Zhongjian Lin, Francis Vella","doi":"arxiv-2408.13971","DOIUrl":"https://doi.org/arxiv-2408.13971","url":null,"abstract":"We address the estimation of endogenous treatment models with social\u0000interactions in both the treatment and outcome equations. We model the\u0000interactions between individuals in an internally consistent manner via a game\u0000theoretic approach based on discrete Bayesian games. This introduces a\u0000substantial computational burden in estimation which we address through a\u0000sequential version of the nested fixed point algorithm. We also provide some\u0000relevant treatment effects, and procedures for their estimation, which capture\u0000the impact on both the individual and the total sample. Our empirical\u0000application examines the impact of an individual's exercise frequency on her\u0000level of self-esteem. We find that an individual's exercise frequency is\u0000influenced by her expectation of her friends'. We also find that an\u0000individual's level of self-esteem is affected by her level of exercise and, at\u0000relatively lower levels of self-esteem, by the expectation of her friends'\u0000self-esteem.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes how a time-varying Markov model was used to forecast housing development at a master-planned community during a transition from high to low growth. Our approach draws on detailed historical data to model the dynamics of the market participants, producing results that are entirely data-driven and free of bias. While traditional time series forecasting methods often struggle to account for nonlinear regime changes in growth, our approach successfully captures the onset of buildout as well as external economic shocks, such as the 1990 and 2008-2011 recessions and the 2021 post-pandemic boom. This research serves as a valuable tool for urban planners, homeowner associations, and property stakeholders aiming to navigate the complexities of growth at master-planned communities during periods of both system stability and instability.
{"title":"Modeling the Dynamics of Growth in Master-Planned Communities","authors":"Christopher K. Allsup, Irene S. Gabashvili","doi":"arxiv-2408.14214","DOIUrl":"https://doi.org/arxiv-2408.14214","url":null,"abstract":"This paper describes how a time-varying Markov model was used to forecast\u0000housing development at a master-planned community during a transition from high\u0000to low growth. Our approach draws on detailed historical data to model the\u0000dynamics of the market participants, producing results that are entirely\u0000data-driven and free of bias. While traditional time series forecasting methods\u0000often struggle to account for nonlinear regime changes in growth, our approach\u0000successfully captures the onset of buildout as well as external economic\u0000shocks, such as the 1990 and 2008-2011 recessions and the 2021 post-pandemic\u0000boom. This research serves as a valuable tool for urban planners, homeowner\u0000associations, and property stakeholders aiming to navigate the complexities of\u0000growth at master-planned communities during periods of both system stability\u0000and instability.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We develop an estimator for treatment effects in high-dimensional settings with additive measurement error, a prevalent challenge in modern econometrics. We introduce the Double/Debiased Convex Conditioned LASSO (Double/Debiased CoCoLASSO), which extends the double/debiased machine learning framework to accommodate mismeasured covariates. Our principal contributions are threefold. (1) We construct a Neyman-orthogonal score function that remains valid under measurement error, incorporating a bias correction term to account for error-induced correlations. (2) We propose a method of moments estimator for the measurement error variance, enabling implementation without prior knowledge of the error covariance structure. (3) We establish the $\sqrt{N}$-consistency and asymptotic normality of our estimator under general conditions, allowing for both the number of covariates and the magnitude of measurement error to increase with the sample size. Our theoretical results demonstrate the estimator's efficiency within the class of regularized high-dimensional estimators accounting for measurement error. Monte Carlo simulations corroborate our asymptotic theory and illustrate the estimator's robust performance across various levels of measurement error. Notably, our covariance-oblivious approach nearly matches the efficiency of methods that assume known error variance.
{"title":"Double/Debiased CoCoLASSO of Treatment Effects with Mismeasured High-Dimensional Control Variables","authors":"Geonwoo Kim, Suyong Song","doi":"arxiv-2408.14671","DOIUrl":"https://doi.org/arxiv-2408.14671","url":null,"abstract":"We develop an estimator for treatment effects in high-dimensional settings\u0000with additive measurement error, a prevalent challenge in modern econometrics.\u0000We introduce the Double/Debiased Convex Conditioned LASSO (Double/Debiased\u0000CoCoLASSO), which extends the double/debiased machine learning framework to\u0000accommodate mismeasured covariates. Our principal contributions are threefold.\u0000(1) We construct a Neyman-orthogonal score function that remains valid under\u0000measurement error, incorporating a bias correction term to account for\u0000error-induced correlations. (2) We propose a method of moments estimator for\u0000the measurement error variance, enabling implementation without prior knowledge\u0000of the error covariance structure. (3) We establish the $sqrt{N}$-consistency\u0000and asymptotic normality of our estimator under general conditions, allowing\u0000for both the number of covariates and the magnitude of measurement error to\u0000increase with the sample size. Our theoretical results demonstrate the\u0000estimator's efficiency within the class of regularized high-dimensional\u0000estimators accounting for measurement error. Monte Carlo simulations\u0000corroborate our asymptotic theory and illustrate the estimator's robust\u0000performance across various levels of measurement error. Notably, our\u0000covariance-oblivious approach nearly matches the efficiency of methods that\u0000assume known error variance.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"67 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Instead of testing for unanimous agreement, I propose learning how broad a consensus favors one distribution over another (of earnings, productivity, asset returns, test scores, etc.). Specifically, given a sample from each of two distributions, I propose statistical inference methods to learn about the set of utility functions for which the first distribution has higher expected utility than the second distribution. With high probability, an "inner" confidence set is contained within this true set, while an "outer" confidence set contains the true set. Such confidence sets can be formed by inverting a proposed multiple testing procedure that controls the familywise error rate. Theoretical justification comes from empirical process results, given that very large classes of utility functions are generally Donsker (subject to finite moments). The theory additionally justifies a uniform (over utility functions) confidence band of expected utility differences, as well as tests with a utility-based "restricted stochastic dominance" as either the null or alternative hypothesis. Simulated and empirical examples illustrate the methodology.
{"title":"Inference on Consensus Ranking of Distributions","authors":"David M. Kaplan","doi":"arxiv-2408.13949","DOIUrl":"https://doi.org/arxiv-2408.13949","url":null,"abstract":"Instead of testing for unanimous agreement, I propose learning how broad of a\u0000consensus favors one distribution over another (of earnings, productivity,\u0000asset returns, test scores, etc.). Specifically, given a sample from each of\u0000two distributions, I propose statistical inference methods to learn about the\u0000set of utility functions for which the first distribution has higher expected\u0000utility than the second distribution. With high probability, an \"inner\"\u0000confidence set is contained within this true set, while an \"outer\" confidence\u0000set contains the true set. Such confidence sets can be formed by inverting a\u0000proposed multiple testing procedure that controls the familywise error rate.\u0000Theoretical justification comes from empirical process results, given that very\u0000large classes of utility functions are generally Donsker (subject to finite\u0000moments). The theory additionally justifies a uniform (over utility functions)\u0000confidence band of expected utility differences, as well as tests with a\u0000utility-based \"restricted stochastic dominance\" as either the null or\u0000alternative hypothesis. Simulated and empirical examples illustrate the\u0000methodology.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper introduces an econometric framework for analyzing cross-sectional dependence in the idiosyncratic volatilities of assets using high frequency data. We first consider the estimation of standard measures of dependence in the idiosyncratic volatilities such as covariances and correlations. Naive estimators of these measures are biased due to the use of the error-laden estimates of idiosyncratic volatilities. We provide bias-corrected estimators and the relevant asymptotic theory. Next, we introduce an idiosyncratic volatility factor model, in which we decompose the variation in idiosyncratic volatilities into two parts: the variation related to the systematic factors such as the market volatility, and the residual variation. Again, naive estimators of the decomposition are biased, and we provide bias-corrected estimators. We also provide the asymptotic theory that allows us to test whether the residual (non-systematic) components of the idiosyncratic volatilities exhibit cross-sectional dependence. We apply our methodology to the S&P 100 index constituents, and document strong cross-sectional dependence in their idiosyncratic volatilities. We consider two different sets of idiosyncratic volatility factors, and find that neither can fully account for the cross-sectional dependence in idiosyncratic volatilities. For each model, we map out the network of dependencies in residual (non-systematic) idiosyncratic volatilities across all stocks.
{"title":"Cross-sectional Dependence in Idiosyncratic Volatility","authors":"Ilze Kalnina, Kokouvi Tewou","doi":"arxiv-2408.13437","DOIUrl":"https://doi.org/arxiv-2408.13437","url":null,"abstract":"This paper introduces an econometric framework for analyzing cross-sectional\u0000dependence in the idiosyncratic volatilities of assets using high frequency\u0000data. We first consider the estimation of standard measures of dependence in\u0000the idiosyncratic volatilities such as covariances and correlations. Naive\u0000estimators of these measures are biased due to the use of the error-laden\u0000estimates of idiosyncratic volatilities. We provide bias-corrected estimators\u0000and the relevant asymptotic theory. Next, we introduce an idiosyncratic\u0000volatility factor model, in which we decompose the variation in idiosyncratic\u0000volatilities into two parts: the variation related to the systematic factors\u0000such as the market volatility, and the residual variation. Again, naive\u0000estimators of the decomposition are biased, and we provide bias-corrected\u0000estimators. We also provide the asymptotic theory that allows us to test\u0000whether the residual (non-systematic) components of the idiosyncratic\u0000volatilities exhibit cross-sectional dependence. We apply our methodology to\u0000the S&P 100 index constituents, and document strong cross-sectional dependence\u0000in their idiosyncratic volatilities. We consider two different sets of\u0000idiosyncratic volatility factors, and find that neither can fully account for\u0000the cross-sectional dependence in idiosyncratic volatilities. For each model,\u0000we map out the network of dependencies in residual (non-systematic)\u0000idiosyncratic volatilities across all stocks.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We explore tree-based macroeconomic regime-switching in the context of the dynamic Nelson-Siegel (DNS) yield-curve model. In particular, we customize the tree-growing algorithm to partition macroeconomic variables based on the DNS model's marginal likelihood, thereby identifying regime-shifting patterns in the yield curve. Compared to traditional Markov-switching models, our model offers clear economic interpretation via macroeconomic linkages and ensures computational simplicity. In an empirical application to U.S. Treasury bond yields, we find (1) important yield curve regime switching, and (2) evidence that macroeconomic variables have predictive power for the yield curve when the short rate is high, but not in other regimes, thereby refining the notion of yield curve "macro-spanning".
{"title":"Machine Learning and the Yield Curve: Tree-Based Macroeconomic Regime Switching","authors":"Siyu Bie, Francis X. Diebold, Jingyu He, Junye Li","doi":"arxiv-2408.12863","DOIUrl":"https://doi.org/arxiv-2408.12863","url":null,"abstract":"We explore tree-based macroeconomic regime-switching in the context of the\u0000dynamic Nelson-Siegel (DNS) yield-curve model. In particular, we customize the\u0000tree-growing algorithm to partition macroeconomic variables based on the DNS\u0000model's marginal likelihood, thereby identifying regime-shifting patterns in\u0000the yield curve. Compared to traditional Markov-switching models, our model\u0000offers clear economic interpretation via macroeconomic linkages and ensures\u0000computational simplicity. In an empirical application to U.S. Treasury bond\u0000yields, we find (1) important yield curve regime switching, and (2) evidence\u0000that macroeconomic variables have predictive power for the yield curve when the\u0000short rate is high, but not in other regimes, thereby refining the notion of\u0000yield curve ``macro-spanning\".","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pooled panel analyses tend to mask heterogeneity in unit-specific treatment effects. For example, existing studies on the impact of democracy on economic growth do not reach a consensus as empirical findings are substantially heterogeneous in the country composition of the panel. In contrast to pooled panel analyses, this paper proposes a Difference-in-Differences (DiD) estimator that exploits the temporal dimension in the data and estimates unit-specific average treatment effects on the treated (ATT) with as few as two cross-sectional units. Under weak identification and temporal dependence conditions, the DiD estimator is asymptotically normal. The estimator is further complemented with a test of identification, provided there are at least two candidate control units. Empirical results using the DiD estimator suggest Benin's economy would have been 6.3% smaller on average over the 1993-2018 period had it not democratised.
{"title":"Difference-in-differences with as few as two cross-sectional units -- A new perspective to the democracy-growth debate","authors":"Gilles Koumou, Emmanuel Selorm Tsyawo","doi":"arxiv-2408.13047","DOIUrl":"https://doi.org/arxiv-2408.13047","url":null,"abstract":"Pooled panel analyses tend to mask heterogeneity in unit-specific treatment\u0000effects. For example, existing studies on the impact of democracy on economic\u0000growth do not reach a consensus as empirical findings are substantially\u0000heterogeneous in the country composition of the panel. In contrast to pooled\u0000panel analyses, this paper proposes a Difference-in-Differences (DiD) estimator\u0000that exploits the temporal dimension in the data and estimates unit-specific\u0000average treatment effects on the treated (ATT) with as few as two\u0000cross-sectional units. Under weak identification and temporal dependence\u0000conditions, the DiD estimator is asymptotically normal. The estimator is\u0000further complemented with a test of identification granted at least two\u0000candidate control units. Empirical results using the DiD estimator suggest\u0000Benin's economy would have been 6.3% smaller on average over the 1993-2018\u0000period had she not democratised.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As an IT-enabled multi-passenger mobility service, microtransit has the potential to improve accessibility, reduce congestion, and enhance flexibility in transportation options. However, due to its heterogeneous impacts on different communities and population segments, there is a need for better tools in microtransit forecasting and revenue management, especially when actual usage data are limited. We propose a novel framework based on an agent-based mixed logit model estimated with microtransit usage data and synthetic trip data. The framework involves estimating a lower-branch mode choice model with synthetic trip data, combining lower-branch parameters with microtransit data to estimate an upper-branch ride pass subscription model, and applying the nested model to evaluate microtransit pricing and subsidy policies. The framework enables further decision-support analysis to consider diverse travel patterns and heterogeneous tastes of the total population. We test the framework in a case study with synthetic trip data from Replica Inc. and microtransit data from Arlington Via. The lower-branch model results in a rho-square value of 0.603 on weekdays and 0.576 on weekends. Predictions made by the upper-branch model closely match the marginal subscription data. In a ride pass pricing policy scenario, we show that discounting the weekly pass (from $25 to $18.90) and the monthly pass (from $80 to $71.50) would, perhaps surprisingly, increase total revenue by $102/day. In an event- or place-based subsidy policy scenario, we show that a 100% fare discount would reduce peak-hour car trips at AT&T Stadium by 80, requiring a subsidy of $32,068/year.
{"title":"Integrating an agent-based behavioral model in microtransit forecasting and revenue management","authors":"Xiyuan Ren, Joseph Y. J. Chow, Venktesh Pandey, Linfei Yuan","doi":"arxiv-2408.12577","DOIUrl":"https://doi.org/arxiv-2408.12577","url":null,"abstract":"As an IT-enabled multi-passenger mobility service, microtransit has the\u0000potential to improve accessibility, reduce congestion, and enhance flexibility\u0000in transportation options. However, due to its heterogeneous impacts on\u0000different communities and population segments, there is a need for better tools\u0000in microtransit forecast and revenue management, especially when actual usage\u0000data are limited. We propose a novel framework based on an agent-based mixed\u0000logit model estimated with microtransit usage data and synthetic trip data. The\u0000framework involves estimating a lower-branch mode choice model with synthetic\u0000trip data, combining lower-branch parameters with microtransit data to estimate\u0000an upper-branch ride pass subscription model, and applying the nested model to\u0000evaluate microtransit pricing and subsidy policies. The framework enables\u0000further decision-support analysis to consider diverse travel patterns and\u0000heterogeneous tastes of the total population. We test the framework in a case\u0000study with synthetic trip data from Replica Inc. and microtransit data from\u0000Arlington Via. The lower-branch model result in a rho-square value of 0.603 on\u0000weekdays and 0.576 on weekends. Predictions made by the upper-branch model\u0000closely match the marginal subscription data. In a ride pass pricing policy\u0000scenario, we show that a discount in weekly pass (from $25 to $18.9) and\u0000monthly pass (from $80 to $71.5) would surprisingly increase total revenue by\u0000$102/day. In an event- or place-based subsidy policy scenario, we show that a\u0000100% fare discount would reduce 80 car trips during peak hours at AT&T Stadium,\u0000requiring a subsidy of $32,068/year.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}