
Latest publications in ERN: Other Econometrics: Data Collection & Data Estimation Methodology (Topic)

Open Government Data: A Focus on Key Economic and Organizational Drivers
R. Iemma
Grounding the analysis on multidisciplinary literature on the topic, the existing EU legislation and relevant examples, this working paper aims at highlighting some key economic and organizational aspects of the "Open Government Data" paradigm and its drivers and implications within and outside Public Administrations. The discussion intends to adopt an "Internet Science" perspective, taking into account as enabling factors the digital environment itself, as well as specific models and tools. More "traditional" and mature markets grounded on Public Sector Information are also considered, in order to indirectly detect the main differences with respect to the aforementioned paradigm.
Citations: 6
Asymptotic and Non Asymptotic Approximations for Option Valuation
Pub Date: 2012-07-25 | DOI: 10.1142/9789814436434_0004
Romain Bompis, E. Gobet
We give a broad overview of approximation methods to derive analytical formulas for accurate and quick evaluation of option prices. We compare different approaches, from the theoretical point of view regarding the tools they require, and also from the numerical point of view regarding their performances. In the case of local volatility models with general time-dependency, we derive new formulas using the local volatility function at the mid-point between strike and spot: in general, our approximations outperform previous ones by Hagan and Henry-Labordere. We also provide approximations of the option delta.
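As a toy numerical illustration of the mid-point idea (a sketch only, not the paper's expansion formulas): freeze a local volatility function at the arithmetic midpoint of spot and strike and plug the resulting constant volatility into the standard Black-Scholes formula. The local volatility function and all parameter values below are hypothetical.

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    # Plain Black-Scholes call price for a constant volatility sigma.
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def midpoint_call_approx(S, K, T, r, local_vol):
    # Proxy in the spirit of the paper: evaluate the local volatility
    # at the midpoint between spot and strike, then reuse Black-Scholes.
    sigma_mid = local_vol(0.5 * (S + K))
    return bs_call(S, K, T, r, sigma_mid)

# Hypothetical CEV-like local volatility function and market parameters.
local_vol = lambda x: 0.3 * (x / 100.0) ** (-0.5)
price = midpoint_call_approx(S=100.0, K=105.0, T=1.0, r=0.01, local_vol=local_vol)
```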
Citations: 20
Data Accessibility is Not Sufficient for Making Replication Studies a Matter of Course
Denis Huschka, Gert G. Wagner
Which code of behavior should form the basis of science and research? Replicability is definitely among these values. It is a pivotal feature of good scientific practice. Only replicable results are indeed scientific results. Studies that cannot be replicated are, strictly speaking, not scientific, but – given they are good – a type of feuilleton. Still, to most researchers – and this might seem surprising – facilitating and particularly conducting a replication study is anything but a matter of course.
Citations: 5
De-Biased Random Forest Variable Selection
Dhruv Sharma
This paper proposes a new way to de-bias random forest variable selection using a clean random forest algorithm. Strobl et al. (2007) have shown random forests to be biased towards variables with many levels, categories, or scales, and towards correlated variables, which can inflate some variable importance measures. The proposed algorithm builds random forests with each variable left out in turn and keeps a variable when dropping it degrades overall random forest performance. The algorithm is simple and straightforward, and its complexity and speed are a function of the number of salient variables. It runs more efficiently than the permutation test algorithm and is an alternative method for addressing known biases. The paper concludes with some normative guidance on how to use random forest variable importance.
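A minimal sketch of the drop-a-variable idea on synthetic data (scikit-learn assumed available; the dataset and thresholds are illustrative, not the paper's implementation): retrain the forest with each feature left out and keep the features whose removal lowers cross-validated accuracy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 5))
# The outcome depends only on features 0 and 1; features 2-4 are pure noise.
y = (X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=n) > 0).astype(int)

def drop_column_importance(X, y, cv=5, seed=0):
    """Cross-validated score drop when each feature is left out of the forest."""
    rf = lambda: RandomForestClassifier(n_estimators=200, random_state=seed)
    base = cross_val_score(rf(), X, y, cv=cv).mean()
    drops = []
    for j in range(X.shape[1]):
        score = cross_val_score(rf(), np.delete(X, j, axis=1), y, cv=cv).mean()
        drops.append(base - score)  # positive drop: the variable carries signal
    return np.array(drops)

drops = drop_column_importance(X, y)
keep = [j for j, d in enumerate(drops) if d > 0]  # variables worth retaining
```

The informative features produce a clearly positive drop, while leaving out a noise feature barely moves the score.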
Citations: 1
Avoid Filling Swiss Cheese with Whipped Cream: Imputation Techniques and Evaluation Procedures for Cross-Country Time Series
Pub Date: 2011-06-01 | DOI: 10.5089/9781455270507.001
M. Denk, Michael Weber
International organizations collect data from national authorities to create multivariate cross-sectional time series for their analyses. As data from countries with not yet well-established statistical systems may be incomplete, the bridging of data gaps is a crucial challenge. This paper investigates data structures and missing data patterns in the cross-sectional time series framework, reviews missing value imputation techniques used for micro data in official statistics, and discusses their applicability to cross-sectional time series. It presents statistical methods and quality indicators that enable the (comparative) evaluation of imputation processes and completed datasets.
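As a minimal baseline in this spirit (pandas assumed available; the panel values are invented), gaps in a country-year dataset can be filled by within-country interpolation over time, against which richer donor-based imputation methods can then be benchmarked:

```python
import numpy as np
import pandas as pd

# Toy cross-country annual panel with gaps (hypothetical values).
years = list(range(2000, 2006))
df = pd.DataFrame({
    "country": ["A"] * 6 + ["B"] * 6,
    "year": years * 2,
    "gdp": [1.0, np.nan, 1.2, np.nan, 1.4, 1.5,
            2.0, 2.1, np.nan, 2.4, np.nan, 2.6],
})

# Within-country linear interpolation over time: one simple imputation
# baseline for the kind of evaluation the paper discusses.
df["gdp_filled"] = (df.sort_values(["country", "year"])
                      .groupby("country")["gdp"]
                      .transform(lambda s: s.interpolate(limit_direction="both")))

# A simple quality indicator would mask one observed point, re-impute it,
# and compare the imputed value to the truth.
```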
Citations: 14
A Regression Model for the Copula Graphic Estimator
Simon M. S. Lo, R. Wilke
We consider a dependent competing risks model with many risks and many covariates. We show identifiability of the marginal distributions of latent variables for a given dependence structure. Instead of directly estimating these distributions, we suggest a plug-in regression framework for the Copula-Graphic estimator which utilizes a consistent estimator for the cumulative incidence curves. Our model is an attractive empirical approach as it does not require knowledge of the marginal distributions which are typically unknown in applications. We illustrate the applicability of our approach with the help of a parametric unemployment duration model with an unknown dependence structure. We construct identification bounds for the marginal distributions and partial effects in response to covariate changes. The bounds for the partial effects are surprisingly tight and often reveal the direction of the covariate effect.
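One small ingredient of such a plug-in approach is a consistent estimator of the cumulative incidence curves. A bare-bones empirical version for uncensored simulated data (a sketch of that ingredient only, not the paper's Copula-Graphic machinery) might look like:

```python
import numpy as np

# Two competing risks: we observe only the minimum of the latent durations
# and which risk caused it. All distributions here are invented toy choices.
rng = np.random.default_rng(1)
t1 = rng.exponential(1.0, 1000)   # latent time to risk 1
t2 = rng.exponential(2.0, 1000)   # latent time to risk 2
t = np.minimum(t1, t2)            # observed duration
cause = np.where(t1 <= t2, 1, 2)  # observed failure cause

def cum_incidence(t, cause, k, grid):
    # F_k(s) = P(T <= s, cause = k), estimated by the empirical fraction.
    return np.array([np.mean((t <= s) & (cause == k)) for s in grid])

grid = np.linspace(0.0, 3.0, 50)
F1 = cum_incidence(t, cause, 1, grid)
F2 = cum_incidence(t, cause, 2, grid)
```

The curves are sub-distribution functions: each is nondecreasing and their sum stays below one, with risk 1 (the higher hazard here) accumulating the larger share.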
Citations: 1
Forecasting Commodity Prices with Mixed-Frequency Data: An OLS-Based Generalized ADL Approach
Yu‐chin Chen, Wen-Jen Tsay
This paper presents a generalized autoregressive distributed lag (GADL) model for conducting regression estimations that involve mixed-frequency data. As an example, we show that daily asset market information - currency and equity market movements - can produce forecasts of quarterly commodity price changes that are superior to those in the previous research. Following the traditional ADL literature, our estimation strategy relies on a Vandermonde matrix to parameterize the weighting functions for higher-frequency observations. Accordingly, inferences can be obtained using ordinary least squares principles without Kalman filtering, non-linear optimizations, or additional restrictions on the parameters. Our findings provide an easy-to-use method for conducting mixed data-sampling analysis as well as for forecasting world commodity price movements.
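A schematic numpy version of that strategy on simulated data (dimensions, coefficients, and noise level are invented): the m within-period observations are collapsed through a Vandermonde polynomial basis, so the high-frequency weight function is estimated by nothing more than OLS.

```python
import numpy as np

rng = np.random.default_rng(2)
n_q, m, deg = 120, 60, 3           # quarters, days per quarter, polynomial degree
daily = rng.normal(size=(n_q, m))  # daily predictor, one row per quarter

# Vandermonde basis over the within-quarter day index.
tau = np.arange(m) / m
V = np.vander(tau, deg + 1, increasing=True)   # shape (m, deg + 1)
Z = daily @ V                                  # aggregated low-frequency regressors

# Simulate a quarterly outcome whose daily weights lie in the basis.
true_w = V @ np.array([1.0, -0.5, 0.25, 0.0])
y = daily @ true_w + 0.1 * rng.normal(size=n_q)

# Plain OLS: no Kalman filtering or non-linear optimization needed.
X = np.column_stack([np.ones(n_q), Z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
w_hat = V @ beta[1:]                           # recovered daily weight function
```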
Citations: 9
Addressing Onsite Sampling in Recreation Site Choice Models
Paul R. Hindsley, C. Landry, B. Gentner
Independent experts and politicians have criticized statistical analyses of recreation behavior that rely upon onsite samples, due to their potential for biased inference. The use of onsite sampling usually reflects data or budgetary constraints, but it can lead to two primary forms of bias in site choice models. First, the strategy entails sampling site choices rather than sampling individuals--a form of bias called endogenous stratification. Under these conditions, sample choices may not reflect the site choices of the true population. Second, exogenous attributes of the individuals sampled onsite may differ from the attributes of individuals in the population--the most common form in recreation demand is avidity bias. We propose addressing these biases by combining two existing methods: Weighted Exogenous Stratification Maximum Likelihood estimation and propensity score estimation. We use the National Marine Fisheries Service's Marine Recreational Fishing Statistics Survey to illustrate methods of bias reduction, employing both simulated and empirical applications. We find that propensity score based weights can significantly reduce bias in estimation. Our results indicate that failure to account for these biases can overstate anglers' willingness to pay for improvements in fishing catch, but weighted models exhibit higher variance of parameter estimates and willingness to pay.
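A stylized numerical sketch of the propensity-score part (scikit-learn assumed available; the avidity data are simulated, not the survey data): fit the probability of appearing in the onsite sample against a reference sample, then reweight onsite observations by the inverse odds to pull estimates back towards the population.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical setup: the onsite sample over-represents avid anglers.
rng = np.random.default_rng(3)
avidity_pop = rng.normal(0.0, 1.0, 2000)     # population reference sample
avidity_onsite = rng.normal(1.0, 1.0, 1000)  # avidity-biased onsite sample

x = np.concatenate([avidity_pop, avidity_onsite]).reshape(-1, 1)
s = np.concatenate([np.zeros(2000), np.ones(1000)])  # 1 = onsite observation

# Propensity of being sampled onsite, then inverse-odds weights.
ps = LogisticRegression().fit(x, s).predict_proba(x)[:, 1]
w = (1 - ps[s == 1]) / ps[s == 1]

raw_mean = avidity_onsite.mean()                       # biased upward
weighted_mean = np.average(avidity_onsite, weights=w)  # near the population mean
```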
Citations: 60
Optimizing Spatial Databases
Anda Belciu, Stefan Olaru
This paper describes the best way to improve the optimization of spatial databases: through spatial indexes. The most common and widely used spatial indexes, the R-tree and the Quadtree, are presented, analyzed, and compared. A few examples are also given of queries that run in Oracle Spatial and are supported by an R-tree spatial index. Spatial databases offer special features that can be very helpful when representing such data, but in terms of storage and time costs, spatial data can require substantial resources. This is why optimizing the database is one of the most important aspects of working with large volumes of data.
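To make the pruning idea concrete, here is a toy pure-Python sketch of the R-tree principle: points are packed into leaf pages, each page keeps its minimum bounding rectangle (MBR), and a range query touches only pages whose MBR intersects the query window. This is a teaching sketch, nothing like a balanced, paged production index such as Oracle Spatial's.

```python
def build_leaves(points, leaf_size=4):
    # Crude packing: sort by coordinates and cut into fixed-size pages,
    # recording each page's minimum bounding rectangle.
    points = sorted(points)
    leaves = []
    for i in range(0, len(points), leaf_size):
        page = points[i:i + leaf_size]
        xs = [p[0] for p in page]
        ys = [p[1] for p in page]
        leaves.append(((min(xs), min(ys), max(xs), max(ys)), page))
    return leaves

def intersects(a, b):
    # Rectangles as (xmin, ymin, xmax, ymax).
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def range_query(leaves, window):
    hits = []
    for mbr, page in leaves:
        if intersects(mbr, window):  # prune whole pages by their MBR
            hits.extend(p for p in page
                        if window[0] <= p[0] <= window[2]
                        and window[1] <= p[1] <= window[3])
    return hits

pts = [(x, y) for x in range(10) for y in range(10)]
leaves = build_leaves(pts)
found = range_query(leaves, (2, 2, 4, 4))  # the 3x3 block of grid points
```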
Citations: 7
Quality of Match for Statistical Matches Used in the 1995 and 2005 LIMEW Estimates for Great Britain
Thomas Masterson
The quality of match of four statistical matches used in the LIMEW estimates for Great Britain for 1995 and 2005 is described. The first match combines the fifth (1995) wave of the British Household Panel Survey (BHPS) with the 1995–96 Family Resources Survey (FRS). The second match combines the 1995 time-use module of the Office of Population Censuses and Surveys Omnibus Survey with the 1995-96 FRS. The third match combines the 15th wave (2005) of the BHPS with the 2005 FRS. The fourth match combines the 2000 United Kingdom Time Use Survey with the 2005 FRS. In each case, the alignment of the two datasets is examined, after which various aspects of the match quality are described. In each case, the matches are of high quality, given the nature of the source datasets.
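The core mechanic behind such statistical matches, distance-based donor imputation, can be sketched in a few lines of numpy. The data here are entirely synthetic stand-ins, not the BHPS/FRS files: a variable observed only in the donor survey is copied to each recipient record from the donor closest on shared covariates.

```python
import numpy as np

rng = np.random.default_rng(4)
# Donor survey: shared covariates (say, standardized age and income) plus a
# variable the recipient survey lacks (say, weekly hours). All hypothetical.
donors = rng.normal(size=(500, 2))
donor_hours = 10 + 3 * donors[:, 0] + rng.normal(scale=0.5, size=500)

# Recipient survey: only the shared covariates are observed.
recipients = rng.normal(size=(200, 2))

# Nearest-neighbour match on the shared covariates, then donate the value.
d2 = ((recipients[:, None, :] - donors[None, :, :]) ** 2).sum(axis=2)
nearest = d2.argmin(axis=1)
imputed_hours = donor_hours[nearest]
```

Match quality can then be judged by how well the donated variable preserves its relationship to the shared covariates, which is the kind of alignment check the paper describes.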
Citations: 0