We present recent progress in the design and development of DEPLOYERS, an agent-based macroeconomic modeling (ABM) framework capable of deploying and simulating a full economic system (individual workers, goods and services firms, government, central and private banks, a financial market, and external sectors) whose structure and simulated activity reproduce the desired calibration data, which can be, for example, a Social Accounting Matrix (SAM), a Supply-Use Table (SUT), or an Input-Output Table (IOT). Here we extend our previous work to a multi-country version and show an example using data from a 46-country, 64-sector FIGARO Inter-Country IOT. The simulation of each country runs on a separate thread or CPU core: it simulates the activity of one step (month, week, or day), then interacts with that country's foreign partners (updating imports, exports, and transfers), and proceeds to the next step. This interaction can be chosen to be aggregated (a single row-and-column IO account) or disaggregated (64 rows and columns) for each partner. A typical run simulates thousands of individuals and firms engaged in their monthly activity and records the results, much like a survey of the country's economic system. These data can then be subjected to, for example, an Input-Output analysis to trace the sources of observed stylized effects over time in the detailed and realistic modeling environment that an ABM framework makes easy to implement.
{"title":"DEPLOYERS: An agent based modeling tool for multi country real world data","authors":"Martin Jaraiz, Ruth Pinacho","doi":"arxiv-2409.04876","DOIUrl":"https://doi.org/arxiv-2409.04876","url":null,"abstract":"We present recent progress in the design and development of DEPLOYERS, an\u0000agent-based macroeconomics modeling (ABM) framework, capable to deploy and\u0000simulate a full economic system (individual workers, goods and services firms,\u0000government, central and private banks, financial market, external sectors)\u0000whose structure and activity analysis reproduce the desired calibration data,\u0000that can be, for example a Social Accounting Matrix (SAM) or a Supply-Use Table\u0000(SUT) or an Input-Output Table (IOT).Here we extend our previous work to a\u0000multi-country version and show an example using data from a 46-countries\u000064-sectors FIGARO Inter-Country IOT. The simulation of each country runs on a\u0000separate thread or CPU core to simulate the activity of one step (month, week,\u0000or day) and then interacts (updates imports, exports, transfer) with that\u0000country's foreign partners, and proceeds to the next step. This interaction can\u0000be chosen to be aggregated (a single row and column IO account) or\u0000disaggregated (64 rows and columns) with each partner. A typical run simulates\u0000thousands of individuals and firms engaged in their monthly activity and then\u0000records the results, much like a survey of the country's economic system. This\u0000data can then be subjected to, for example, an Input-Output analysis to find\u0000out the sources of observed stylized effects as a function of time in the\u0000detailed and realistic modeling environment that can be easily implemented in\u0000an ABM framework.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"178 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning techniques are widely used for estimating causal effects. Double/debiased machine learning (DML) (Chernozhukov et al., 2018) uses a double-robust score function that relies on the prediction of nuisance functions, such as the propensity score, which is the probability of treatment assignment conditional on covariates. Estimators relying on double-robust score functions are highly sensitive to errors in propensity score predictions. Machine learners increase the severity of this problem as they tend to over- or underestimate these probabilities. Several calibration approaches have been proposed to improve the probabilistic forecasts of machine learners. This paper investigates the use of probability calibration approaches within the DML framework. Simulation results demonstrate that calibrating propensity scores may significantly reduce the root mean squared error of DML estimates of the average treatment effect in finite samples. We showcase this in an empirical example and provide conditions under which calibration does not alter the asymptotic properties of the DML estimator.
{"title":"Improving the Finite Sample Performance of Double/Debiased Machine Learning with Propensity Score Calibration","authors":"Daniele Ballinari, Nora Bearth","doi":"arxiv-2409.04874","DOIUrl":"https://doi.org/arxiv-2409.04874","url":null,"abstract":"Machine learning techniques are widely used for estimating causal effects.\u0000Double/debiased machine learning (DML) (Chernozhukov et al., 2018) uses a\u0000double-robust score function that relies on the prediction of nuisance\u0000functions, such as the propensity score, which is the probability of treatment\u0000assignment conditional on covariates. Estimators relying on double-robust score\u0000functions are highly sensitive to errors in propensity score predictions.\u0000Machine learners increase the severity of this problem as they tend to over- or\u0000underestimate these probabilities. Several calibration approaches have been\u0000proposed to improve probabilistic forecasts of machine learners. This paper\u0000investigates the use of probability calibration approaches within the DML\u0000framework. Simulation results demonstrate that calibrating propensity scores\u0000may significantly reduces the root mean squared error of DML estimates of the\u0000average treatment effect in finite samples. We showcase it in an empirical\u0000example and provide conditions under which calibration does not alter the\u0000asymptotic properties of the DML estimator.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper investigates the causal effect of job training on wage rates in the presence of firm heterogeneity. When training affects worker sorting to firms, sample selection is no longer binary but is "multilayered". This paper extends the canonical Heckman (1979) sample selection model, which assumes selection is binary, to a setting where it is multilayered, and shows that in this setting Lee bounds set identify a total effect that combines a weighted average of the causal effect of job training on wage rates across firms with a weighted average of the contrast in wages between different firms for a fixed level of training. Thus, Lee bounds set identify a policy-relevant estimand only when firms pay homogeneous wages and/or when job training does not affect worker sorting across firms. We derive sharp closed-form bounds for the causal effect of job training on wage rates at each firm which leverage information on firm-specific wages. We illustrate our partial identification approach with an empirical application to the Job Corps Study. Results show that while conventional Lee bounds are strictly positive, our within-firm bounds include 0, showing that canonical Lee bounds may be capturing a pure sorting effect of job training.
{"title":"Lee Bounds with Multilayered Sample Selection","authors":"Kory Kroft, Ismael Mourifié, Atom Vayalinkal","doi":"arxiv-2409.04589","DOIUrl":"https://doi.org/arxiv-2409.04589","url":null,"abstract":"This paper investigates the causal effect of job training on wage rates in\u0000the presence of firm heterogeneity. When training affects worker sorting to\u0000firms, sample selection is no longer binary but is \"multilayered\". This paper\u0000extends the canonical Heckman (1979) sample selection model - which assumes\u0000selection is binary - to a setting where it is multilayered, and shows that in\u0000this setting Lee bounds set identifies a total effect that combines a\u0000weighted-average of the causal effect of job training on wage rates across\u0000firms with a weighted-average of the contrast in wages between different firms\u0000for a fixed level of training. Thus, Lee bounds set identifies a\u0000policy-relevant estimand only when firms pay homogeneous wages and/or when job\u0000training does not affect worker sorting across firms. We derive sharp\u0000closed-form bounds for the causal effect of job training on wage rates at each\u0000firm which leverage information on firm-specific wages. We illustrate our\u0000partial identification approach with an empirical application to the Job Corps\u0000Study. Results show that while conventional Lee bounds are strictly positive,\u0000our within-firm bounds include 0 showing that canonical Lee bounds may be\u0000capturing a pure sorting effect of job training.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"67 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a constrained maximum likelihood estimator for sequential search models, using the MPEC (Mathematical Programming with Equilibrium Constraints) approach. This method enhances numerical accuracy while avoiding ad hoc components and errors related to equilibrium conditions. Monte Carlo simulations show that the estimator performs better in small samples, with lower bias and root-mean-squared error, though less effectively in large samples. Despite these mixed results, the MPEC approach remains valuable for identifying candidate parameters comparable to the benchmark, without relying on ad hoc look-up tables, as it generates the table through solved equilibrium constraints.
{"title":"An MPEC Estimator for the Sequential Search Model","authors":"Shinji Koiso, Suguru Otani","doi":"arxiv-2409.04378","DOIUrl":"https://doi.org/arxiv-2409.04378","url":null,"abstract":"This paper proposes a constrained maximum likelihood estimator for sequential\u0000search models, using the MPEC (Mathematical Programming with Equilibrium\u0000Constraints) approach. This method enhances numerical accuracy while avoiding\u0000ad hoc components and errors related to equilibrium conditions. Monte Carlo\u0000simulations show that the estimator performs better in small samples, with\u0000lower bias and root-mean-squared error, though less effectively in large\u0000samples. Despite these mixed results, the MPEC approach remains valuable for\u0000identifying candidate parameters comparable to the benchmark, without relying\u0000on ad hoc look-up tables, as it generates the table through solved equilibrium\u0000constraints.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce a novel method for estimating and conducting inference about extreme quantile treatment effects (QTEs) in the presence of endogeneity. Our approach is applicable to a broad range of empirical research designs, including instrumental variables design and regression discontinuity design, among others. By leveraging regular variation and subsampling, the method ensures robust performance even in extreme tails, where data may be sparse or entirely absent. Simulation studies confirm the theoretical robustness of our approach. Applying our method to assess the impact of job training provided by the Job Training Partnership Act (JTPA), we find significantly negative QTEs for the lowest quantiles (i.e., the most disadvantaged individuals), contrasting with previous literature that emphasizes positive QTEs for intermediate quantiles.
{"title":"Extreme Quantile Treatment Effects under Endogeneity: Evaluating Policy Effects for the Most Vulnerable Individuals","authors":"Yuya Sasaki, Yulong Wang","doi":"arxiv-2409.03979","DOIUrl":"https://doi.org/arxiv-2409.03979","url":null,"abstract":"We introduce a novel method for estimating and conducting inference about\u0000extreme quantile treatment effects (QTEs) in the presence of endogeneity. Our\u0000approach is applicable to a broad range of empirical research designs,\u0000including instrumental variables design and regression discontinuity design,\u0000among others. By leveraging regular variation and subsampling, the method\u0000ensures robust performance even in extreme tails, where data may be sparse or\u0000entirely absent. Simulation studies confirm the theoretical robustness of our\u0000approach. Applying our method to assess the impact of job training provided by\u0000the Job Training Partnership Act (JTPA), we find significantly negative QTEs\u0000for the lowest quantiles (i.e., the most disadvantaged individuals),\u0000contrasting with previous literature that emphasizes positive QTEs for\u0000intermediate quantiles.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"395 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper establishes bounds on the predictive performance of empirical risk minimization for principal component regression. Our analysis is nonparametric, in the sense that the relation between the prediction target and the predictors is not specified. In particular, we do not rely on the assumption that the prediction target is generated by a factor model. In our analysis we consider the cases in which the largest eigenvalues of the covariance matrix of the predictors grow linearly in the number of predictors (strong signal regime) or sublinearly (weak signal regime). The main result of this paper shows that empirical risk minimization for principal component regression is consistent for prediction and, under appropriate conditions, it achieves optimal performance (up to a logarithmic factor) in both the strong and weak signal regimes.
{"title":"Performance of Empirical Risk Minimization For Principal Component Regression","authors":"Christian Brownlees, Guðmundur Stefán Guðmundsson, Yaping Wang","doi":"arxiv-2409.03606","DOIUrl":"https://doi.org/arxiv-2409.03606","url":null,"abstract":"This paper establishes bounds on the predictive performance of empirical risk\u0000minimization for principal component regression. Our analysis is nonparametric,\u0000in the sense that the relation between the prediction target and the predictors\u0000is not specified. In particular, we do not rely on the assumption that the\u0000prediction target is generated by a factor model. In our analysis we consider\u0000the cases in which the largest eigenvalues of the covariance matrix of the\u0000predictors grow linearly in the number of predictors (strong signal regime) or\u0000sublinearly (weak signal regime). The main result of this paper shows that\u0000empirical risk minimization for principal component regression is consistent\u0000for prediction and, under appropriate conditions, it achieves optimal\u0000performance (up to a logarithmic factor) in both the strong and weak signal\u0000regimes.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the field of fresh produce retail, vegetables generally have a relatively limited shelf life, and their quality deteriorates with time. Most vegetable varieties, if not sold on the day of delivery, become difficult to sell the following day. Therefore, retailers usually perform daily quantitative replenishment based on historical sales data and demand conditions. Vegetable pricing typically uses a "cost-plus pricing" method, with retailers often discounting products affected by transportation loss and quality decline. In this context, reliable market demand analysis is crucial, as it directly impacts replenishment and pricing decisions. Given the limited retail space, a rational sales mix becomes essential. This paper first uses data analysis and visualization techniques to examine the distribution patterns and interrelationships of vegetable sales quantities by category and individual item, based on provided data on vegetable types, sales records, wholesale prices, and recent loss rates. Next, it constructs a functional relationship between total sales volume and cost-plus pricing for vegetable categories, forecasts future wholesale prices using the ARIMA model, and establishes a sales profit function and constraints. A nonlinear programming model is then developed and solved to provide daily replenishment quantities and pricing strategies for each vegetable category for the upcoming week. Further, we optimize the profit function and constraints based on the actual sales conditions and requirements, providing replenishment quantities and pricing strategies for individual items on July 1 to maximize retail profit. Finally, to better formulate replenishment and pricing decisions for vegetable products, we discuss what data retailers need to collect and analyze how the collected data can be applied to the above issues.
{"title":"Automatic Pricing and Replenishment Strategies for Vegetable Products Based on Data Analysis and Nonlinear Programming","authors":"Mingpu Ma","doi":"arxiv-2409.09065","DOIUrl":"https://doi.org/arxiv-2409.09065","url":null,"abstract":"In the field of fresh produce retail, vegetables generally have a relatively\u0000limited shelf life, and their quality deteriorates with time. Most vegetable\u0000varieties, if not sold on the day of delivery, become difficult to sell the\u0000following day. Therefore, retailers usually perform daily quantitative\u0000replenishment based on historical sales data and demand conditions. Vegetable\u0000pricing typically uses a \"cost-plus pricing\" method, with retailers often\u0000discounting products affected by transportation loss and quality decline. In\u0000this context, reliable market demand analysis is crucial as it directly impacts\u0000replenishment and pricing decisions. Given the limited retail space, a rational\u0000sales mix becomes essential. This paper first uses data analysis and\u0000visualization techniques to examine the distribution patterns and\u0000interrelationships of vegetable sales quantities by category and individual\u0000item, based on provided data on vegetable types, sales records, wholesale\u0000prices, and recent loss rates. Next, it constructs a functional relationship\u0000between total sales volume and cost-plus pricing for vegetable categories,\u0000forecasts future wholesale prices using the ARIMA model, and establishes a\u0000sales profit function and constraints. A nonlinear programming model is then\u0000developed and solved to provide daily replenishment quantities and pricing\u0000strategies for each vegetable category for the upcoming week. Further, we\u0000optimize the profit function and constraints based on the actual sales\u0000conditions and requirements, providing replenishment quantities and pricing\u0000strategies for individual items on July 1 to maximize retail profit. Finally,\u0000to better formulate replenishment and pricing decisions for vegetable products,\u0000we discuss and forecast the data that retailers need to collect and analyses\u0000how the collected data can be applied to the above issues.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper explores the concept of "momentum" in sports competitions through the use of the TOPSIS model and 0-1 logistic regression model. First, the TOPSIS model is employed to evaluate the performance of two tennis players, with visualizations used to analyze the situation's evolution at every moment in the match, explaining how "momentum" manifests in sports. Then, the 0-1 logistic regression model is utilized to verify the impact of "momentum" on match outcomes, demonstrating that fluctuations in player performance and the successive occurrence of successes are not random. Additionally, this paper examines the indicators that influence the reversal of game situations by analyzing key match data and testing the accuracy of the models with match data. The findings show that the model accurately explains the conditions during matches and can be generalized to other sports competitions. Finally, the strengths, weaknesses, and potential future improvements of the model are discussed.
{"title":"Momentum Dynamics in Competitive Sports: A Multi-Model Analysis Using TOPSIS and Logistic Regression","authors":"Mingpu Ma","doi":"arxiv-2409.02872","DOIUrl":"https://doi.org/arxiv-2409.02872","url":null,"abstract":"This paper explores the concept of \"momentum\" in sports competitions through\u0000the use of the TOPSIS model and 0-1 logistic regression model. First, the\u0000TOPSIS model is employed to evaluate the performance of two tennis players,\u0000with visualizations used to analyze the situation's evolution at every moment\u0000in the match, explaining how \"momentum\" manifests in sports. Then, the 0-1\u0000logistic regression model is utilized to verify the impact of \"momentum\" on\u0000match outcomes, demonstrating that fluctuations in player performance and the\u0000successive occurrence of successes are not random. Additionally, this paper\u0000examines the indicators that influence the reversal of game situations by\u0000analyzing key match data and testing the accuracy of the models with match\u0000data. The findings show that the model accurately explains the conditions\u0000during matches and can be generalized to other sports competitions. Finally,\u0000the strengths, weaknesses, and potential future improvements of the model are\u0000discussed.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The urban-rural consumption gap, as one of the important indicators of social development, directly reflects the imbalance in urban and rural economic and social development. Data elements, as an important component of New Quality Productivity, are of great importance for promoting economic development and improving people's living standards in the information age. This study, through the analysis of fixed-effects regression models, system GMM regression models, and an intermediate (mediation) effect model, finds that the development level of data elements to some extent promotes the narrowing of the urban-rural consumption gap. At the same time, the intermediate variable of the urban-rural income gap plays an important role between data elements and the consumption gap, with a significant intermediate effect. The results indicate that the advancement of data elements can promote the balance of urban and rural residents' consumption levels by reducing the urban-rural income gap, providing theoretical support and policy recommendations for achieving common prosperity and promoting coordinated urban-rural development. Building on this, the paper emphasizes the complex correlation between the development of data elements and the urban-rural consumption gap, and puts forward policy suggestions such as promoting the development of the data element market, strengthening the construction of the digital economy and e-commerce, and promoting integrated urban-rural development. Overall, the development of data elements is not only an important path to reducing the urban-rural consumption gap but also one of the key drivers of the balanced development of China's economy and society. This study has theoretical and practical significance for understanding the mechanism of the urban-rural consumption gap and improving policies for urban-rural economic development.
{"title":"The Impact of Data Elements on Narrowing the Urban-Rural Consumption Gap in China: Mechanisms and Policy Analysis","authors":"Mingpu Ma","doi":"arxiv-2409.02662","DOIUrl":"https://doi.org/arxiv-2409.02662","url":null,"abstract":"The urban-rural consumption gap, as one of the important indicators in social\u0000development, directly reflects the imbalance in urban and rural economic and\u0000social development. Data elements, as an important component of New Quality\u0000Productivity, are of significant importance in promoting economic development\u0000and improving people's living standards in the information age. This study,\u0000through the analysis of fixed-effects regression models, system GMM regression\u0000models, and the intermediate effect model, found that the development level of\u0000data elements to some extent promotes the narrowing of the urban-rural\u0000consumption gap. At the same time, the intermediate variable of urban-rural\u0000income gap plays an important role between data elements and consumption gap,\u0000with a significant intermediate effect. The results of the study indicate that\u0000the advancement of data elements can promote the balance of urban and rural\u0000residents' consumption levels by reducing the urban-rural income gap, providing\u0000theoretical support and policy recommendations for achieving common prosperity\u0000and promoting coordinated urban-rural development. Building upon this, this\u0000paper emphasizes the complex correlation between the development of data\u0000elements and the urban-rural consumption gap, and puts forward policy\u0000suggestions such as promoting the development of the data element market,\u0000strengthening the construction of the digital economy and e-commerce, and\u0000promoting integrated urban-rural development. Overall, the development of data\u0000elements is not only an important path to reducing the urban-rural consumption\u0000gap but also one of the key drivers for promoting the balanced development of\u0000China's economic and social development. This study has a certain theoretical\u0000and practical significance for understanding the mechanism of the urban-rural\u0000consumption gap and improving policies for urban-rural economic development.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider the problem of fitting a relationship (e.g. a potential scientific law) to data involving multiple variables. Ordinary (least squares) regression is not suitable for this because the estimated relationship will differ according to which variable is chosen as being dependent, and the dependent variable is unrealistically assumed to be the only variable which has any measurement error (noise). We present a very general method for estimating a linear functional relationship between multiple noisy variables, which are treated impartially, i.e. no distinction between dependent and independent variables. The data are not assumed to follow any distribution, but all variables are treated as being equally reliable. Our approach extends the geometric mean functional relationship to multiple dimensions. This is especially useful with variables measured in different units, as it is naturally scale-invariant, whereas orthogonal regression is not. This is because our approach is not based on minimizing distances, but on the symmetric concept of correlation. The estimated coefficients are easily obtained from the covariances or correlations, and correspond to geometric means of associated least squares coefficients. The ease of calculation will hopefully allow widespread application of impartial fitting to estimate relationships in a neutral way.
{"title":"Fitting an Equation to Data Impartially","authors":"Chris Tofallis","doi":"arxiv-2409.02573","DOIUrl":"https://doi.org/arxiv-2409.02573","url":null,"abstract":"We consider the problem of fitting a relationship (e.g. a potential\u0000scientific law) to data involving multiple variables. Ordinary (least squares)\u0000regression is not suitable for this because the estimated relationship will\u0000differ according to which variable is chosen as being dependent, and the\u0000dependent variable is unrealistically assumed to be the only variable which has\u0000any measurement error (noise). We present a very general method for estimating\u0000a linear functional relationship between multiple noisy variables, which are\u0000treated impartially, i.e. no distinction between dependent and independent\u0000variables. The data are not assumed to follow any distribution, but all\u0000variables are treated as being equally reliable. Our approach extends the\u0000geometric mean functional relationship to multiple dimensions. This is\u0000especially useful with variables measured in different units, as it is\u0000naturally scale-invariant, whereas orthogonal regression is not. This is\u0000because our approach is not based on minimizing distances, but on the symmetric\u0000concept of correlation. The estimated coefficients are easily obtained from the\u0000covariances or correlations, and correspond to geometric means of associated\u0000least squares coefficients. The ease of calculation will hopefully allow\u0000widespread application of impartial fitting to estimate relationships in a\u0000neutral way.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"36 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}