Journal of Official Statistics: Latest Articles (Page 8)

Creative and Exhaustive, but Less Practical – a Comment on the Article by Diewert and Fox

IF 1.1 | Zone 4 (Mathematics) | Q4 | SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Official Statistics

Pub Date : 2022-03-01 DOI: 10.2478/jos-2022-0014

B. Goldhammer

Abstract The article by Diewert and Fox provides a comprehensive overview of challenges that NSOs face in producing the CPI in pandemic times by touching on many different fields. A focus is on the treatment of missing prices, where they propose different methods depending on the resources available to the NSO. However, some of the procedures proposed can be seen as being less practical like the use of reservation prices (which is also debatable from a theoretical point of view) and of alternative data sources for weights whose implementation supposedly takes longer than the pandemic itself. Overall, the article provides an important contribution for making CPI production more robust for similar crises in the future.

Citations: 0

Estimating Weights for Web-Scraped Data in Consumer Price Indices

IF 1.1 | Zone 4 (Mathematics) | Q4 | SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Official Statistics

Pub Date : 2022-03-01 DOI: 10.2478/jos-2022-0002

D. Ayoubkhani, Heledd Thomas

Abstract In recent years, there has been much interest among national statistical agencies in using web-scraped data in consumer price indices, potentially supplementing or replacing manually collected price quotes. Yet one challenge that has received very little attention to date is the estimation of expenditure weights in the absence of quantity information, which would enable the construction of weighted item-level price indices. In this article we propose the novel approach of predicting sales quantities from their ranks (for example, when products are sorted ‘by popularity’ on consumer websites) via appropriate statistical distributions. Using historical transactional data supplied by a UK retailer for two consumer items, we assessed the out-of-sample accuracy of the Pareto, log-normal and truncated log-normal distributions, finding that the last of these resulted in an index series that most closely approximated an expenditure-weighted benchmark. Our results demonstrate the value of supplementing web-scraped price quotes with a simple set of retailer-supplied summary statistics relating to quantities, allowing statistical agencies to realise the benefits of freely available internet data whilst placing minimal burden on retailers. However, further research would need to be undertaken before the approach could be implemented in the compilation of official price indices.
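As a rough illustration of the rank-to-quantity idea described in this abstract, the Python sketch below inverts a Pareto survival function to turn popularity ranks into predicted quantities and then forms expenditure weights. The function names, the mid-point plotting position, and the parameter values (`alpha`, `x_min`) are assumptions made for illustration; the abstract does not give the authors' estimation procedure, and it reports that the truncated log-normal, not the Pareto, performed best.

```python
import numpy as np

def pareto_quantities_from_ranks(ranks, n_products, alpha, x_min):
    """Predict sales quantities from popularity ranks (1 = best seller).

    Assumption: quantities follow a Pareto distribution with hypothetical
    shape `alpha` and scale `x_min`, e.g. taken from retailer summary statistics.
    """
    ranks = np.asarray(ranks, dtype=float)
    # Share of products selling at least as much as the product at this rank;
    # the mid-point position keeps the share strictly between 0 and 1.
    survival = (ranks - 0.5) / n_products
    # Invert the Pareto survival function S(q) = (x_min / q) ** alpha.
    return x_min * survival ** (-1.0 / alpha)

def expenditure_weights(prices, quantities):
    """Expenditure shares: price times predicted quantity, normalised to sum to 1."""
    expenditure = np.asarray(prices, dtype=float) * np.asarray(quantities, dtype=float)
    return expenditure / expenditure.sum()

# Hypothetical web-scraped prices for four products sorted "by popularity".
prices = np.array([2.50, 3.00, 1.80, 4.20])
ranks = np.array([1, 2, 3, 4])
quantities = pareto_quantities_from_ranks(ranks, n_products=4, alpha=1.2, x_min=10.0)
weights = expenditure_weights(prices, quantities)
print(quantities)
print(weights)
```

The resulting weights could then feed any weighted item-level index formula; the choice of distribution is the part the article evaluates out of sample.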

Vol. 38, pp. 5-21

Citations: 1

Unit Value Indexes for Exports – New Developments Using Administrative Trade Data

IF 1.1 | Zone 4 (Mathematics) | Q4 | SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Official Statistics

Pub Date : 2022-03-01 DOI: 10.2478/jos-2022-0005

Don Fast, S. Fleck, Dominic A Smith

Abstract U.S. import and export price indexes replaced unit value indexes forty years ago, given quality concerns of mismeasurement due to unit value bias. The administrative trade data underlying the unit values have greatly improved since that time. The transaction records are now more detailed, available electronically, and compiled monthly with little delay. The data are used by academic researchers to calculate price measures, and unit value indexes based on trade data are used by other national statistical offices (NSOs). The U.S. Bureau of Labor Statistics is now evaluating whether replacing price indexes with unit value indexes for homogeneous products calculated from administrative trade data could expand the number of published official import and export price indexes. Using export transactions, the research calculates detailed unit value indexes from 200 + million trade records from 2012–2017 for 123 export product categories. Results show that 27 of the 123 unit value indexes are homogeneous and closely comparable to published official price indexes. This article presents the concepts and methods considered to calculate and evaluate the unit value indexes and to select the product categories that are homogeneous. Compared to official price indexes, export unit value indexes for the 27 5-digit BEA (U.S. Bureau of Economic Analysis) end-use product categories would deflate real exports of these goods by 13 percentage points less over the period. Incorporating these 27 indexes into the top-level XPI would increase the value of real exports of all merchandise goods by 2.6 percentage points at the end of 2017.
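For readers unfamiliar with the term, a unit value is total transaction value divided by total quantity within a product category, and a unit value index compares that ratio across periods. The Python sketch below shows only this basic arithmetic on made-up numbers; the function names and data are hypothetical, and the article's actual methods for grouping records and testing homogeneity are not reproduced here.

```python
def unit_value(values, quantities):
    """Unit value of a product category: total value over total quantity."""
    return sum(values) / sum(quantities)

def unit_value_index(base, current):
    """Unit value index (base period = 100).

    `base` and `current` are (values, quantities) pairs of transaction records
    for the same product category in two periods.
    """
    return 100.0 * unit_value(*current) / unit_value(*base)

# Hypothetical transaction records: (values in dollars, quantities in units).
base_period = ([1000.0, 1500.0], [100.0, 150.0])     # unit value 10.0
current_period = ([1320.0, 1100.0], [120.0, 100.0])  # unit value 11.0

index = unit_value_index(base_period, current_period)
print(index)
```

Unit value bias arises when the mix of items inside the category changes between periods, which is why the article restricts attention to homogeneous product categories.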

Vol. 38, pp. 83-106

Citations: 3

Using Scanner Data for Computing Consumer Spatial Price Indexes at Regional Level: An Empirical Application for Grocery Products in Italy

IF 1.1 | Zone 4 (Mathematics) | Q4 | SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Official Statistics

Pub Date : 2022-03-01 DOI: 10.2478/jos-2022-0003

T. Laureti, F. Polidoro

Abstract The importance of constructing sub-national spatial price indexes (SPIs) has been acknowledged in the literature for over two decades. However, systematic attempts to compile sub-national SPIs on a regular basis have been hampered by the labour-intensive analyses required for processing traditional price data. In the case of household consumption expenditures, the increasing availability of big data may change the current approach for estimating sub-national SPIs by considering the use of weighted index formulae. The aim of this paper is twofold: firstly, to review previous literature on sub-national SPIs and secondly to estimate Italian consumer SPIs. To this aim we use scanner data referring to grocery products sold in a random sample of approximately 1,800 Italian outlets belonging to the most important retail chains and including information on prices, quantities and quality characteristics of products at barcode level. Various weighted index formulas are used for calculating consumer SPIs at detailed territorial level and at the lowest aggregation level. Our results show an interesting territorial variability of consumer prices of products sold in large-scale retail outlets across the Italian regions. Overall, the Southern regions appear to have price levels below the national average both for food and non-food products with some interesting exceptions.

Vol. 38, pp. 23-56

Citations: 2

Rentals for Housing: A Property Fixed-Effects Estimator of Inflation from Administrative Data

IF 1.1 | Zone 4 (Mathematics) | Q4 | SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Official Statistics

Pub Date : 2022-03-01 DOI: 10.2478/jos-2022-0009

A. Bentley

Abstract Official rentals for housing (rent) price inflation statistics are of considerable public interest. Matched-sample estimators, such as that used for nearly two-decades in New Zealand (2000–2019), require an unrealistic assumption of a static universe of rental properties. This article investigates (1) a property fixed-effects estimator that better reflects the dynamic universe of rental properties by implicitly imputing for price change associated with new and disappearing rental properties; (2) length-alignment simulations and property life-cycle metrics to inform the choice of data window length (eight years) and preferred splice methodology (mean-splice); and (3) stock-imputation to convert administrative data from a ‘flow’ (new tenancy price) to ‘stock’ (currently paid rent) concept. The derived window-length sensitivity findings have important implications for inflation measurement. It was found that the longer the data window used to fit the model, the greater the estimated rate of inflation. Using administrative data, a range of estimates from 55% (window length: three-quarters) to 127% (window of 90-quarters) were found for total inflation, over the 25-years to 2017 Q4.
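A property fixed-effects estimator of this kind is commonly written as a regression of log rent on property dummies and period dummies, with the exponentiated period coefficients read as the index. The Python sketch below is a minimal version under that assumption; the function name, the ordinary least squares fit, and the toy data are illustrative only, and the windowing, splicing, and stock-imputation steps discussed in the abstract are not covered.

```python
import numpy as np

def fixed_effects_index(property_ids, periods, rents):
    """Rent index from log(rent) = property effect + period effect + error.

    Returns {period: index level}, with the first period normalised to 1.0.
    Assumes every period is linked to the others through at least one
    property observed more than once.
    """
    props = sorted(set(property_ids))
    times = sorted(set(periods))
    p_pos = {p: i for i, p in enumerate(props)}
    t_pos = {t: i for i, t in enumerate(times)}

    # One dummy per property, one per period except the first (the base).
    X = np.zeros((len(rents), len(props) + len(times) - 1))
    for row, (p, t) in enumerate(zip(property_ids, periods)):
        X[row, p_pos[p]] = 1.0
        if t_pos[t] > 0:
            X[row, len(props) + t_pos[t] - 1] = 1.0

    coef, *_ = np.linalg.lstsq(X, np.log(np.asarray(rents, dtype=float)), rcond=None)
    time_effects = np.concatenate([[0.0], coef[len(props):]])
    return dict(zip(times, np.exp(time_effects)))

# Hypothetical unbalanced panel: property A leaves after period 1, B enters in period 1.
observations = [
    ("A", 0, 400.0), ("A", 1, 420.0),
    ("B", 1, 630.0), ("B", 2, 660.0),
    ("C", 0, 500.0), ("C", 1, 525.0), ("C", 2, 550.0),
]
ids, quarters, rents = zip(*observations)
index = fixed_effects_index(ids, quarters, rents)
print(index)
```

Because entering and exiting properties still contribute through their own fixed effects, the estimator does not need the static universe assumed by a matched-sample approach.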

Vol. 38, pp. 187-211

Citations: 0

Substitution Bias in the Measurement of Import and Export Price Indices: Causes and Correction

IF 1.1 | Zone 4 (Mathematics) | Q4 | SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Official Statistics

Pub Date : 2022-03-01 DOI: 10.2478/jos-2022-0006

Ludwig von Auer, Alena Shumskikh

Abstract The import and export price indices of an economy are usually compiled by some Laspeyres type index. It is well known that such an index formula is prone to substitution bias. Therefore, also the terms of trade (ratio of export and import price index) are likely to be distorted. The underlying substitution bias accumulates over time. The present article introduces a simple and transparent retroactive correction approach that addresses the source of the substitution bias and produces meaningful long-run time series of import and export price levels and, therefore, of the terms of trade. Furthermore, an empirical case study is conducted that demonstrates the efficacy and versatility of the correction approach.
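The substitution bias of a Laspeyres index can be seen in a two-good example: when buyers shift toward the good whose price did not rise, a base-weighted index overstates the price change relative to Paasche or Fisher. The Python sketch below uses the standard textbook formulas on invented numbers; it illustrates the bias only and is not the retroactive correction approach the article proposes.

```python
import math

def laspeyres(p0, p1, q0):
    """Laspeyres price index: current prices valued at base-period quantities."""
    return sum(a * q for a, q in zip(p1, q0)) / sum(b * q for b, q in zip(p0, q0))

def paasche(p0, p1, q1):
    """Paasche price index: prices valued at current-period quantities."""
    return sum(a * q for a, q in zip(p1, q1)) / sum(b * q for b, q in zip(p0, q1))

def fisher(p0, p1, q0, q1):
    """Fisher index: geometric mean of Laspeyres and Paasche."""
    return math.sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))

# Hypothetical data: good 1 doubles in price and buyers substitute toward good 2.
p0, q0 = [1.0, 1.0], [10.0, 10.0]
p1, q1 = [2.0, 1.0], [5.0, 15.0]

L = laspeyres(p0, p1, q0)   # 30 / 20
P = paasche(p0, p1, q1)     # 25 / 20
F = fisher(p0, p1, q0, q1)
print(L, P, F)
```

In this example the gap between the Laspeyres and Fisher values is the per-period bias; chaining such an index over many periods is how the distortion accumulates.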

Citations: 3

Response Burden and Data Quality in Business Surveys

IF 1.1 | Zone 4 (Mathematics) | Q4 | SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Official Statistics

Pub Date : 2021-12-01 DOI: 10.2478/jos-2021-0036

Marco Bottone, Lucia Modugno, A. Neri

Abstract Response burden has long been a concern for data producers. In this article, we investigate the relationship between some measures of actual and perceived burden and we provide empirical evidence of their association with data quality. We draw on two business surveys conducted by Banca d’Italia since 1970, which provide a very rich and unique source of information. We find evidence that the perceived burden is affected by actual burden but the latter is not the only driver. Our results also show a clear link between a respondent’s perceived effort and the probability of not answering some important questions (such as those relating to expectations of future investments and turnover) or of dropping out of the survey. On the contrary, we do not find significant effects on the quality of answers to quantitative questions such as business turnover and investments. Overall, these findings have implications for data producers that should target the perceived burden, besides the actual burden, to increase data quality.

Citations: 2

Freedom of Information and Personal Confidentiality in Spatial COVID-19 Data

IF 1.1 | Zone 4 (Mathematics) | Q4 | SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Official Statistics

Pub Date : 2021-12-01 DOI: 10.2478/jos-2021-0035

M. Beenstock, D. Felsenstein

Abstract We draw attention to how, in the name of protecting the confidentiality of personal data, national statistical agencies have limited public access to spatial data on COVID-19. We also draw attention to large disparities in the way that access has been limited. In doing so, we distinguish between absolute confidentiality in which the probability of detection is 1, relative confidentiality where this probability is less than 1, and collective confidentiality, which refers to the probability of detection of at least one person. In spatial data, the probability of personal detection is less than 1, and the probability of collective detection varies directly with this probability and COVID-19 morbidity. Statistical agencies have been concerned with relative and collective confidentiality, which they implement using the techniques of truncation, where spatial data are not made public for zones with small populations, and censoring, where exact data are not made public for zones where morbidity is small. Granular spatial data are essential for epidemiological research into COVID-19. We argue that in their reluctance to make these data available to the public, data security officers (DSO) have unreasonably prioritized data protection over freedom of information. We also argue that by attaching importance to relative and collective confidentiality, they have over-indulged in data truncation and censoring. We highlight the need for legislation concerning relative and collective confidentiality, and regulation of DSO practices regarding data truncation and censoring.
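One simple way to see why collective detection rises with both the individual detection probability and morbidity is to treat detections as independent events, so that the chance of identifying at least one of n cases in a zone is 1 - (1 - p)^n. The Python sketch below encodes that formula; the independence assumption, the function name, and the numbers are illustrative and are not taken from the article.

```python
def collective_detection_probability(p_individual, n_cases):
    """Probability that at least one of `n_cases` persons is detected.

    Assumption (not from the article): each person is detected independently
    with the same probability `p_individual`.
    """
    if not 0.0 <= p_individual <= 1.0:
        raise ValueError("p_individual must lie in [0, 1]")
    if n_cases < 0:
        raise ValueError("n_cases must be non-negative")
    return 1.0 - (1.0 - p_individual) ** n_cases

# Hypothetical zones with the same individual risk but different morbidity.
zone_low_morbidity = collective_detection_probability(0.05, 3)
zone_high_morbidity = collective_detection_probability(0.05, 30)
print(zone_low_morbidity, zone_high_morbidity)
```

Under this reading, a zone with many cases can have a high collective probability even when each individual's risk of detection is small.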

Vol. 37, pp. 791-809

Citations: 3

Investigating an Alternative for Estimation from a Nonprobability Sample: Matching plus Calibration

IF 1.1 | Zone 4 (Mathematics) | Q4 | SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Official Statistics

Pub Date : 2021-12-01 DOI: 10.2478/jos-2023-0003

Zhanxu Liu, R. Valliant

Abstract Matching a nonprobability sample to a probability sample is one strategy both for selecting the nonprobability units and for weighting them. This approach has been employed in the past to select subsamples of persons from a large panel of volunteers. One method of weighting, introduced here, is to assign a unit in the nonprobability sample the weight from its matched case in the probability sample. The properties of resulting estimators depend on whether the probability sample weights are inverses of selection probabilities or are calibrated. In addition, imperfect matching can cause estimates from the matched sample to be biased so that its weights need to be adjusted, especially when the size of the volunteer panel is small. Calibration weighting combined with matching is one approach to correct bias and reduce variances. We explore the theoretical properties of the matched and matched, calibrated estimators with respect to a quasirandomization distribution that is assumed to describe how units in the nonprobability sample are observed, a superpopulation model for analysis variables collected in the nonprobability sample, and the randomization distribution for the probability sample. Numerical studies using simulated and real data from the 2015 US Behavioral Risk Factor Surveillance Survey are conducted to examine the performance of the alternative estimators.
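The two steps named in the title can be sketched in a few lines of Python: each probability-sample case is matched to its nearest unused nonprobability unit, which inherits the case's weight, and the inherited weights are then rescaled so that a weighted total reproduces a known population total. Nearest-neighbour matching on a single covariate, the ratio form of calibration, and all names and numbers below are assumptions chosen for brevity; the article's estimators may differ.

```python
import numpy as np

def match_and_calibrate(x_prob, w_prob, x_nonprob, pop_total_x):
    """Match on covariate x without replacement, then ratio-calibrate.

    Requires at least as many nonprobability units as probability-sample cases.
    Returns (indices of matched nonprobability units, calibrated weights).
    """
    x_prob = np.asarray(x_prob, dtype=float)
    w_prob = np.asarray(w_prob, dtype=float)
    x_nonprob = np.asarray(x_nonprob, dtype=float)

    available = np.ones(len(x_nonprob), dtype=bool)
    matched = []
    for xp in x_prob:
        dist = np.abs(x_nonprob - xp)
        dist[~available] = np.inf      # each volunteer can be matched only once
        j = int(np.argmin(dist))
        available[j] = False
        matched.append(j)
    matched = np.array(matched)

    # Matched unit inherits the probability-sample weight; a single ratio
    # adjustment then forces the weighted x total to equal pop_total_x.
    factor = pop_total_x / np.sum(w_prob * x_nonprob[matched])
    return matched, w_prob * factor

# Hypothetical data: ages in a small probability sample and a volunteer panel.
matched_idx, w_cal = match_and_calibrate(
    x_prob=[20, 40, 60],
    w_prob=[100, 150, 250],
    x_nonprob=[22, 35, 41, 58, 75],
    pop_total_x=25000.0,
)
x_matched = np.array([22, 35, 41, 58, 75], dtype=float)[matched_idx]
print(matched_idx, w_cal)
```

With a small volunteer panel the nearest available match can be far from the probability-sample case, which is the imperfect-matching situation where the abstract says weight adjustment matters most.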

Vol. 39, pp. 45-78

Citations: 1

Estimation of Domain Means from Business Surveys in the Presence of Stratum Jumpers and Nonresponse

IF 1.1 | Zone 4 (Mathematics) | Q4 | SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Official Statistics

Pub Date : 2021-12-01 DOI: 10.2478/jos-2021-0045

Mengxuan Xu, V. Landsman, B. Graubard

Abstract Misclassified frame records (also called stratum jumpers) and low response rates are characteristic for business surveys. In the context of estimation of the domain parameters, jumpers may contribute to extreme variation in sample weights and skewed sampling distributions of the estimators, especially for domains with a small number of observations. There is limited literature about the extent to which these problems may affect the performance of the ratio estimators with nonresponse-adjusted weights. To address this gap, we designed a simulation study to explore the properties of the Horvitz-Thompson type ratio estimators, with and without smoothing of the weights, under different scenarios. The ratio estimator with propensity-adjusted weights showed satisfactory performance in all scenarios with a high response rate. For scenarios with a low response rate, the performance of this estimator improved with an increase in the proportion of jumpers in the domain. The smoothed estimators that we studied performed well in scenarios with non-informative weights, but can become markedly biased when the weights are informative, irrespective of response rate. We also studied the performance of the ’doubled half’ bootstrap method for variance estimation. We illustrated an application of the methods in a real business survey.
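A Horvitz-Thompson type ratio estimator of a domain mean divides the weighted domain total by the weighted domain size, and a propensity adjustment divides each respondent's design weight by an estimated response probability. The Python sketch below shows that calculation on invented data; the function name and values are hypothetical, and neither the weight smoothing nor the bootstrap variance estimation studied in the article is included.

```python
import numpy as np

def domain_mean(y, design_weights, response_propensity, in_domain):
    """Ratio estimator of a domain mean with propensity-adjusted weights.

    Each respondent's design weight is divided by its (assumed known or
    estimated) response propensity before forming the weighted ratio.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(design_weights, dtype=float) / np.asarray(response_propensity, dtype=float)
    d = np.asarray(in_domain, dtype=float)   # 1.0 inside the domain, 0.0 outside
    return np.sum(w * d * y) / np.sum(w * d)

# Hypothetical respondents; the third unit carries a much larger design weight,
# as a stratum jumper sampled from a low-rate stratum might.
y = [10.0, 20.0, 30.0, 40.0]
design_weights = [5.0, 5.0, 50.0, 5.0]
response_propensity = [0.5, 0.5, 1.0, 0.5]
in_domain = [True, True, True, False]

estimate = domain_mean(y, design_weights, response_propensity, in_domain)
print(estimate)
```

Here the single heavily weighted unit pulls the estimate toward its own value, which illustrates how jumpers can dominate estimates in domains with few observations.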

Vol. 37, pp. 1059-1078

Citations: 0