Open Government Data: A Focus on Key Economic and Organizational Drivers
Pub Date : 2012-11-15 DOI: 10.2139/ssrn.2262943
R. Iemma
Grounding the analysis in the multidisciplinary literature on the topic, existing EU legislation, and relevant examples, this working paper aims to highlight some key economic and organizational aspects of the "Open Government Data" paradigm and its drivers and implications within and outside Public Administrations. The discussion adopts an "Internet Science" perspective, treating the digital environment itself, as well as specific models and tools, as enabling factors. More "traditional" and mature markets grounded in Public Sector Information are also considered, in order to indirectly identify the main differences with respect to the aforementioned paradigm.
Asymptotic and Non Asymptotic Approximations for Option Valuation
Pub Date : 2012-07-25 DOI: 10.1142/9789814436434_0004
Romain Bompis, E. Gobet
We give a broad overview of approximation methods for deriving analytical formulas that evaluate option prices quickly and accurately. We compare different approaches, both theoretically, in terms of the tools they require, and numerically, in terms of their performance. In the case of local volatility models with general time dependency, we derive new formulas using the local volatility function at the mid-point between strike and spot: in general, our approximations outperform previous ones by Hagan and Henry-Labordere. We also provide approximations of the option delta.
Data Accessibility is Not Sufficient for Making Replication Studies a Matter of Course
Pub Date : 2012-04-12 DOI: 10.2139/ssrn.2038836
Denis Huschka, Gert G. Wagner
Which code of behavior should form the basis of science and research? Replicability is certainly among its core values: it is a pivotal feature of good scientific practice, and only replicable results are truly scientific results. Studies that cannot be replicated are, strictly speaking, not scientific but – if well written – a kind of feuilleton. Still, for most researchers – and this might seem surprising – facilitating, and particularly conducting, a replication study is anything but a matter of course.
De-Biased Random Forest Variable Selection
Pub Date : 2011-12-22 DOI: 10.2139/ssrn.1975801
Dhruv Sharma
This paper proposes a new way to de-bias random forest variable selection using a clean random forest algorithm. Strobl et al. (2007) have shown random forests to be biased towards variables with many levels, categories, or scales, and towards correlated variables, which can inflate some variable importance measures. The proposed algorithm builds a random forest without each variable in turn and keeps a variable when dropping it degrades the overall random forest performance. The algorithm is simple and straightforward, and its complexity and speed are a function of the number of salient variables. It runs more efficiently than the permutation test algorithm and is an alternative method for addressing the known biases. The paper concludes with some normative guidance on how to use random forest variable importance.
Avoid Filling Swiss Cheese with Whipped Cream: Imputation Techniques and Evaluation Procedures for Cross-Country Time Series
Pub Date : 2011-06-01 DOI: 10.5089/9781455270507.001
M. Denk, Michael Weber
International organizations collect data from national authorities to create multivariate cross-sectional time series for their analyses. As data from countries whose statistical systems are not yet well established may be incomplete, bridging data gaps is a crucial challenge. This paper investigates data structures and missing-data patterns in the cross-sectional time series framework, reviews missing-value imputation techniques used for micro data in official statistics, and discusses their applicability to cross-sectional time series. It presents statistical methods and quality indicators that enable the (comparative) evaluation of imputation processes and completed datasets.
A Regression Model for the Copula Graphic Estimator
Pub Date : 2011-04-30 DOI: 10.2139/ssrn.1858645
Simon M. S. Lo, R. Wilke
We consider a dependent competing risks model with many risks and many covariates. We show identifiability of the marginal distributions of latent variables for a given dependence structure. Instead of directly estimating these distributions, we suggest a plug-in regression framework for the Copula-Graphic estimator which utilizes a consistent estimator for the cumulative incidence curves. Our model is an attractive empirical approach as it does not require knowledge of the marginal distributions, which are typically unknown in applications. We illustrate the applicability of our approach with the help of a parametric unemployment duration model with an unknown dependence structure. We construct identification bounds for the marginal distributions and partial effects in response to covariate changes. The bounds for the partial effects are surprisingly tight and often reveal the direction of the covariate effect.
Forecasting Commodity Prices with Mixed-Frequency Data: An OLS-Based Generalized ADL Approach
Pub Date : 2011-04-29 DOI: 10.2139/ssrn.1782214
Yu‐chin Chen, Wen-Jen Tsay
This paper presents a generalized autoregressive distributed lag (GADL) model for conducting regression estimations that involve mixed-frequency data. As an example, we show that daily asset market information - currency and equity market movements - can produce forecasts of quarterly commodity price changes that are superior to those in previous research. Following the traditional ADL literature, our estimation strategy relies on a Vandermonde matrix to parameterize the weighting functions for higher-frequency observations. Accordingly, inferences can be obtained using ordinary least squares principles without Kalman filtering, non-linear optimizations, or additional restrictions on the parameters. Our findings provide an easy-to-use method for conducting mixed data-sampling analysis as well as for forecasting world commodity price movements.
Addressing Onsite Sampling in Recreation Site Choice Models
Pub Date : 2011-04-27 DOI: 10.2139/ssrn.1824390
Paul R. Hindsley, C. Landry, B. Gentner
Independent experts and politicians have criticized statistical analyses of recreation behavior that rely upon onsite samples, due to their potential for biased inference. The use of onsite sampling usually reflects data or budgetary constraints, but it can lead to two primary forms of bias in site choice models. First, the strategy entails sampling site choices rather than sampling individuals--a form of bias called endogenous stratification. Under these conditions, sampled choices may not reflect the site choices of the true population. Second, exogenous attributes of the individuals sampled onsite may differ from the attributes of individuals in the population--the most common form in recreation demand is avidity bias. We propose addressing these biases by combining two existing methods: Weighted Exogenous Stratification Maximum Likelihood (WESML) estimation and propensity score estimation. We use the National Marine Fisheries Service's Marine Recreational Fishing Statistics Survey to illustrate these bias-reduction methods in both simulated and empirical applications. We find that propensity-score-based weights can significantly reduce bias in estimation. Our results indicate that failure to account for these biases can overstate anglers' willingness to pay for improvements in fishing catch, but the weighted models exhibit higher variance in parameter estimates and willingness to pay.
Optimizing Spatial Databases
Pub Date : 2011-04-01 DOI: 10.2139/ssrn.1800758
Anda Belciu, Stefan Olaru
This paper describes the best way to improve the optimization of spatial databases: through spatial indexes. The most common and widely used spatial indexes, R-tree and Quadtree, are presented, analyzed, and compared in this paper. A few examples of queries that run in Oracle Spatial and are supported by an R-tree spatial index are also given. Spatial databases offer special features that can be very helpful when such data needs to be represented, but in terms of storage and time costs, spatial data can require a lot of resources. This is why optimizing the database is one of the most important aspects of working with large volumes of data.
Quality of Match for Statistical Matches Used in the 1995 and 2005 LIMEW Estimates for Great Britain
Pub Date : 2011-03-31 DOI: 10.2139/ssrn.1800227
Thomas Masterson
The quality of match of four statistical matches used in the LIMEW estimates for Great Britain for 1995 and 2005 is described. The first match combines the fifth (1995) wave of the British Household Panel Survey (BHPS) with the 1995–96 Family Resources Survey (FRS). The second match combines the 1995 time-use module of the Office of Population Censuses and Surveys Omnibus Survey with the 1995–96 FRS. The third match combines the 15th wave (2005) of the BHPS with the 2005 FRS. The fourth match combines the 2000 United Kingdom Time Use Survey with the 2005 FRS. In each case, the alignment of the two datasets is examined, after which various aspects of the match quality are described. In each case, the matches are of high quality, given the nature of the source datasets.