Abstract The article by Diewert and Fox provides a comprehensive overview of the challenges that national statistical offices (NSOs) face in producing the consumer price index (CPI) during a pandemic, touching on many different fields. One focus is the treatment of missing prices, for which the authors propose different methods depending on the resources available to the NSO. However, some of the proposed procedures appear less practical, such as the use of reservation prices (which is also debatable from a theoretical point of view) and of alternative data sources for weights, whose implementation would presumably take longer than the pandemic itself. Overall, the article makes an important contribution to making CPI production more robust to similar crises in the future.
{"title":"Creative and Exhaustive, but Less Practical – a Comment on the Article by Diewert and Fox","authors":"B. Goldhammer","doi":"10.2478/jos-2022-0014","DOIUrl":"https://doi.org/10.2478/jos-2022-0014","url":null,"abstract":"Abstract The article by Diewert and Fox provides a comprehensive overview of challenges that NSOs face in producing the CPI in pandemic times by touching on many different fields. A focus is on the treatment of missing prices, where they propose different methods depending on the resources available to the NSO. However, some of the procedures proposed can be seen as being less practical like the use of reservation prices (which is also debatable from a theoretical point of view) and of alternative data sources for weights whose implementation supposedly takes longer than the pandemic itself. Overall, the article provides an important contribution for making CPI production more robust for similar crises in the future.","PeriodicalId":51092,"journal":{"name":"Journal of Official Statistics","volume":"38 1","pages":"291 - 293"},"PeriodicalIF":1.1,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48496485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract In recent years, there has been much interest among national statistical agencies in using web-scraped data in consumer price indices, potentially supplementing or replacing manually collected price quotes. Yet one challenge that has received very little attention to date is the estimation of expenditure weights in the absence of quantity information, which would enable the construction of weighted item-level price indices. In this article we propose the novel approach of predicting sales quantities from their ranks (for example, when products are sorted ‘by popularity’ on consumer websites) via appropriate statistical distributions. Using historical transactional data supplied by a UK retailer for two consumer items, we assessed the out-of-sample accuracy of the Pareto, log-normal and truncated log-normal distributions, finding that the last of these resulted in an index series that most closely approximated an expenditure-weighted benchmark. Our results demonstrate the value of supplementing web-scraped price quotes with a simple set of retailer-supplied summary statistics relating to quantities, allowing statistical agencies to realise the benefits of freely available internet data whilst placing minimal burden on retailers. However, further research would need to be undertaken before the approach could be implemented in the compilation of official price indices.
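The rank-to-quantity idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: it uses the Pareto distribution because its survival function inverts in closed form (the abstract reports that the truncated log-normal performed best), and the shape parameter `alpha`, the scale `q_min`, the plotting position `r / (n_products + 1)` and all prices are hypothetical values standing in for retailer-supplied summary statistics.

```python
def pareto_quantities_from_ranks(ranks, n_products, alpha, q_min=1.0):
    """Approximate the sales quantity implied by each popularity rank.

    Assumption: rank r out of n_products is read as the survival
    probability r / (n_products + 1). Inverting the Pareto survival
    function S(q) = (q_min / q) ** alpha then gives the quantity.
    """
    return [q_min * ((n_products + 1) / r) ** (1.0 / alpha) for r in ranks]


def expenditure_weights(prices, quantities):
    """Normalise price * quantity into shares that sum to one."""
    spend = [p * q for p, q in zip(prices, quantities)]
    total = sum(spend)
    return [s / total for s in spend]


ranks = [1, 2, 3, 4]            # 1 = most popular product (hypothetical)
prices = [2.0, 3.5, 1.2, 4.0]   # hypothetical web-scraped prices
quantities = pareto_quantities_from_ranks(ranks, n_products=4, alpha=1.2)
weights = expenditure_weights(prices, quantities)
```

The resulting `weights` could then feed any weighted item-level index formula. In practice the distribution parameters would have to be estimated from the retailer's summary statistics rather than fixed by hand as they are here.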
{"title":"Estimating Weights for Web-Scraped Data in Consumer Price Indices","authors":"D. Ayoubkhani, Heledd Thomas","doi":"10.2478/jos-2022-0002","DOIUrl":"https://doi.org/10.2478/jos-2022-0002","url":null,"abstract":"Abstract In recent years, there has been much interest among national statistical agencies in using web-scraped data in consumer price indices, potentially supplementing or replacing manually collected price quotes. Yet one challenge that has received very little attention to date is the estimation of expenditure weights in the absence of quantity information, which would enable the construction of weighted item-level price indices. In this article we propose the novel approach of predicting sales quantities from their ranks (for example, when products are sorted ‘by popularity’ on consumer websites) via appropriate statistical distributions. Using historical transactional data supplied by a UK retailer for two consumer items, we assessed the out-of-sample accuracy of the Pareto, log-normal and truncated log-normal distributions, finding that the last of these resulted in an index series that most closely approximated an expenditure-weighted benchmark. Our results demonstrate the value of supplementing web-scraped price quotes with a simple set of retailer-supplied summary statistics relating to quantities, allowing statistical agencies to realise the benefits of freely available internet data whilst placing minimal burden on retailers. 
However, further research would need to be undertaken before the approach could be implemented in the compilation of official price indices.","PeriodicalId":51092,"journal":{"name":"Journal of Official Statistics","volume":"38 1","pages":"5 - 21"},"PeriodicalIF":1.1,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48879756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract U.S. import and export price indexes replaced unit value indexes forty years ago because of concerns about mismeasurement due to unit value bias. The administrative trade data underlying the unit values have greatly improved since that time. The transaction records are now more detailed, available electronically, and compiled monthly with little delay. The data are used by academic researchers to calculate price measures, and unit value indexes based on trade data are used by other national statistical offices (NSOs). The U.S. Bureau of Labor Statistics is now evaluating whether replacing price indexes with unit value indexes for homogeneous products calculated from administrative trade data could expand the number of published official import and export price indexes. Using export transactions, the research calculates detailed unit value indexes from more than 200 million trade records for 2012–2017, covering 123 export product categories. Results show that 27 of the 123 unit value indexes are homogeneous and closely comparable to published official price indexes. This article presents the concepts and methods considered to calculate and evaluate the unit value indexes and to select the product categories that are homogeneous. Compared to official price indexes, export unit value indexes for the 27 5-digit BEA (U.S. Bureau of Economic Analysis) end-use product categories would deflate real exports of these goods by 13 percentage points less over the period. Incorporating these 27 indexes into the top-level XPI would increase the value of real exports of all merchandise goods by 2.6 percentage points at the end of 2017.
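A minimal sketch may help show what a unit value index is and why homogeneity matters. It uses the textbook definition (total value divided by total quantity, compared across periods), not the specific methods evaluated in the article. All records below are hypothetical.

```python
def unit_value(records):
    """Unit value for one period: total trade value / total quantity."""
    total_value = sum(value for value, _ in records)
    total_quantity = sum(quantity for _, quantity in records)
    return total_value / total_quantity


def unit_value_index(base_records, current_records):
    """Unit value of the current period relative to the base period (base = 100)."""
    return 100.0 * unit_value(current_records) / unit_value(base_records)


# Each record is (value, quantity); hypothetical shipments of one homogeneous product.
base = [(1000.0, 100.0), (2100.0, 200.0)]
current = [(1200.0, 100.0), (2200.0, 200.0)]
index = unit_value_index(base, current)

# Unit value bias: two varieties priced at 10 and 20 in BOTH periods,
# but buyers shift towards the cheaper variety in the current period.
mixed_base = [(1000.0, 100.0), (4000.0, 200.0)]
mixed_current = [(2000.0, 200.0), (2000.0, 100.0)]
biased_index = unit_value_index(mixed_base, mixed_current)
```

In the second case no price changes, yet the unit value index falls because the product mix shifts. This is the kind of distortion that restricting unit value indexes to homogeneous categories is meant to limit.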
{"title":"Unit Value Indexes for Exports – New Developments Using Administrative Trade Data","authors":"Don Fast, S. Fleck, Dominic A Smith","doi":"10.2478/jos-2022-0005","DOIUrl":"https://doi.org/10.2478/jos-2022-0005","url":null,"abstract":"Abstract U.S. import and export price indexes replaced unit value indexes forty years ago, given quality concerns of mismeasurement due to unit value bias. The administrative trade data underlying the unit values have greatly improved since that time. The transaction records are now more detailed, available electronically, and compiled monthly with little delay. The data are used by academic researchers to calculate price measures, and unit value indexes based on trade data are used by other national statistical offices (NSOs). The U.S. Bureau of Labor Statistics is now evaluating whether replacing price indexes with unit value indexes for homogeneous products calculated from administrative trade data could expand the number of published official import and export price indexes. Using export transactions, the research calculates detailed unit value indexes from 200 + million trade records from 2012–2017 for 123 export product categories. Results show that 27 of the 123 unit value indexes are homogeneous and closely comparable to published official price indexes. This article presents the concepts and methods considered to calculate and evaluate the unit value indexes and to select the product categories that are homogeneous. Compared to official price indexes, export unit value indexes for the 27 5-digit BEA (U.S. Bureau of Economic Analysis) end-use product categories would deflate real exports of these goods by 13 percentage points less over the period. 
Incorporating these 27 indexes into the top-level XPI would increase the value of real exports of all merchandise goods by 2.6 percentage points at the end of 2017.","PeriodicalId":51092,"journal":{"name":"Journal of Official Statistics","volume":"38 1","pages":"83 - 106"},"PeriodicalIF":1.1,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48907225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract The importance of constructing sub-national spatial price indexes (SPIs) has been acknowledged in the literature for over two decades. However, systematic attempts to compile sub-national SPIs on a regular basis have been hampered by the labour-intensive analyses required for processing traditional price data. In the case of household consumption expenditures, the increasing availability of big data may change the current approach to estimating sub-national SPIs by allowing the use of weighted index formulae. The aim of this paper is twofold: first, to review previous literature on sub-national SPIs and, second, to estimate Italian consumer SPIs. To this end, we use scanner data on grocery products sold in a random sample of approximately 1,800 Italian outlets belonging to the most important retail chains, including information on prices, quantities and quality characteristics of products at barcode level. Various weighted index formulas are used for calculating consumer SPIs at detailed territorial level and at the lowest aggregation level. Our results show considerable territorial variability in the consumer prices of products sold in large-scale retail outlets across the Italian regions. Overall, the Southern regions appear to have price levels below the national average for both food and non-food products, with some interesting exceptions.
{"title":"Using Scanner Data for Computing Consumer Spatial Price Indexes at Regional Level: An Empirical Application for Grocery Products in Italy","authors":"T. Laureti, F. Polidoro","doi":"10.2478/jos-2022-0003","DOIUrl":"https://doi.org/10.2478/jos-2022-0003","url":null,"abstract":"Abstract The importance of constructing sub-national spatial price indexes (SPIs) has been acknowledged in the literature for over two decades. However, systematic attempts to compile sub-national SPIs on a regular basis have been hampered by the labour-intensive analyses required for processing traditional price data. In the case of household consumption expenditures, the increasing availability of big data may change the current approach for estimating sub-national SPIs by considering the use of weighted index formulae. The aim of this paper is twofold: firstly, to review previous literature on sub-national SPIs and secondly to estimate Italian consumer SPIs. To this aim we use scanner data referring to grocery products sold in a random sample of approximately 1,800 Italian outlets belonging to the most important retail chains and including information on prices, quantities and quality characteristics of products at barcode level. Various weighted index formulas are used for calculating consumer SPIs at detailed territorial level and at the lowest aggregation level. Our results show an interesting territorial variability of consumer prices of products sold in large-scale retail outlets across the Italian regions. 
Overall, the Southern regions appear to have price levels below the national average both for food and non-food products with some interesting exceptions.","PeriodicalId":51092,"journal":{"name":"Journal of Official Statistics","volume":"38 1","pages":"23 - 56"},"PeriodicalIF":1.1,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47638542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Official rentals for housing (rent) price inflation statistics are of considerable public interest. Matched-sample estimators, such as that used for nearly two decades in New Zealand (2000–2019), require an unrealistic assumption of a static universe of rental properties. This article investigates (1) a property fixed-effects estimator that better reflects the dynamic universe of rental properties by implicitly imputing for price change associated with new and disappearing rental properties; (2) length-alignment simulations and property life-cycle metrics to inform the choice of data window length (eight years) and preferred splice methodology (mean-splice); and (3) stock-imputation to convert administrative data from a ‘flow’ (new tenancy price) to a ‘stock’ (currently paid rent) concept. The derived window-length sensitivity findings have important implications for inflation measurement. It was found that the longer the data window used to fit the model, the greater the estimated rate of inflation. Using administrative data, estimates of total inflation over the 25 years to 2017 Q4 ranged from 55% (window length of three quarters) to 127% (window length of 90 quarters).
{"title":"Rentals for Housing: A Property Fixed-Effects Estimator of Inflation from Administrative Data","authors":"A. Bentley","doi":"10.2478/jos-2022-0009","DOIUrl":"https://doi.org/10.2478/jos-2022-0009","url":null,"abstract":"Abstract Official rentals for housing (rent) price inflation statistics are of considerable public interest. Matched-sample estimators, such as that used for nearly two-decades in New Zealand (2000–2019), require an unrealistic assumption of a static universe of rental properties. This article investigates (1) a property fixed-effects estimator that better reflects the dynamic universe of rental properties by implicitly imputing for price change associated with new and disappearing rental properties; (2) length-alignment simulations and property life-cycle metrics to inform the choice of data window length (eight years) and preferred splice methodology (mean-splice); and (3) stock-imputation to convert administrative data from a ‘flow’ (new tenancy price) to ‘stock’ (currently paid rent) concept. The derived window-length sensitivity findings have important implications for inflation measurement. It was found that the longer the data window used to fit the model, the greater the estimated rate of inflation. 
Using administrative data, a range of estimates from 55% (window length: three-quarters) to 127% (window of 90-quarters) were found for total inflation, over the 25-years to 2017 Q4.","PeriodicalId":51092,"journal":{"name":"Journal of Official Statistics","volume":"38 1","pages":"187 - 211"},"PeriodicalIF":1.1,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43762649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract The import and export price indices of an economy are usually compiled using a Laspeyres-type index. It is well known that such an index formula is prone to substitution bias. Therefore, the terms of trade (the ratio of the export price index to the import price index) are also likely to be distorted. The underlying substitution bias accumulates over time. The present article introduces a simple and transparent retroactive correction approach that addresses the source of the substitution bias and produces meaningful long-run time series of import and export price levels and, therefore, of the terms of trade. Furthermore, an empirical case study is conducted that demonstrates the efficacy and versatility of the correction approach.
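The substitution bias mentioned above can be illustrated with the standard Laspeyres, Paasche and Fisher formulas. This sketch only demonstrates the bias itself. The abstract does not describe the article's retroactive correction approach, so nothing here reproduces it, and the two-good prices and quantities are hypothetical.

```python
from math import sqrt


def laspeyres(p0, p1, q0):
    """Price index using base-period quantities as weights."""
    return sum(a * b for a, b in zip(p1, q0)) / sum(a * b for a, b in zip(p0, q0))


def paasche(p0, p1, q1):
    """Price index using current-period quantities as weights."""
    return sum(a * b for a, b in zip(p1, q1)) / sum(a * b for a, b in zip(p0, q1))


def fisher(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche indexes."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


# Hypothetical data: the price of good 1 doubles and buyers substitute towards good 2.
p0, q0 = [1.0, 1.0], [10.0, 10.0]
p1, q1 = [2.0, 1.0], [5.0, 15.0]

L = laspeyres(p0, p1, q0)   # 30 / 20
P = paasche(p0, p1, q1)     # 25 / 20
F = fisher(p0, p1, q0, q1)
```

Because the Laspeyres index holds base-period quantities fixed, it ignores the shift away from the good that became more expensive and so lies above the Paasche and Fisher values in this example.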
{"title":"Substitution Bias in the Measurement of Import and Export Price Indices: Causes and Correction","authors":"Ludwig von Auer, Alena Shumskikh","doi":"10.2478/jos-2022-0006","DOIUrl":"https://doi.org/10.2478/jos-2022-0006","url":null,"abstract":"Abstract The import and export price indices of an economy are usually compiled by some Laspeyres type index. It is well known that such an index formula is prone to substitution bias. Therefore, also the terms of trade (ratio of export and import price index) are likely to be distorted. The underlying substitution bias accumulates over time. The present article introduces a simple and transparent retroactive correction approach that addresses the source of the substitution bias and produces meaningful long-run time series of import and export price levels and, therefore, of the terms of trade. Furthermore, an empirical case study is conducted that demonstrates the efficacy and versatility of the correction approach.","PeriodicalId":51092,"journal":{"name":"Journal of Official Statistics","volume":"38 1","pages":"107 - 126"},"PeriodicalIF":1.1,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46496467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Response burden has long been a concern for data producers. In this article, we investigate the relationship between some measures of actual and perceived burden and we provide empirical evidence of their association with data quality. We draw on two business surveys conducted by Banca d’Italia since 1970, which provide a very rich and unique source of information. We find evidence that perceived burden is affected by actual burden, but that the latter is not the only driver. Our results also show a clear link between a respondent’s perceived effort and the probability of not answering some important questions (such as those relating to expectations of future investments and turnover) or of dropping out of the survey. By contrast, we do not find significant effects on the quality of answers to quantitative questions such as those on business turnover and investments. Overall, these findings imply that data producers should target perceived burden, in addition to actual burden, to increase data quality.
{"title":"Response Burden and Data Quality in Business Surveys","authors":"Marco Bottone, Lucia Modugno, A. Neri","doi":"10.2478/jos-2021-0036","DOIUrl":"https://doi.org/10.2478/jos-2021-0036","url":null,"abstract":"Abstract Response burden has long been a concern for data producers. In this article, we investigate the relationship between some measures of actual and perceived burden and we provide empirical evidence of their association with data quality. We draw on two business surveys conducted by Banca d’Italia since 1970, which provide a very rich and unique source of information. We find evidence that the perceived burden is affected by actual burden but the latter is not the only driver. Our results also show a clear link between a respondent’s perceived effort and the probability of not answering some important questions (such as those relating to expectations of future investments and turnover) or of dropping out of the survey. On the contrary, we do not find significant effects on the quality of answers to quantitative questions such as business turnover and investments. Overall, these findings have implications for data producers that should target the perceived burden, besides the actual burden, to increase data quality.","PeriodicalId":51092,"journal":{"name":"Journal of Official Statistics","volume":"37 1","pages":"811 - 836"},"PeriodicalIF":1.1,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48371412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract We draw attention to how, in the name of protecting the confidentiality of personal data, national statistical agencies have limited public access to spatial data on COVID-19. We also draw attention to large disparities in the way that access has been limited. In doing so, we distinguish between absolute confidentiality, in which the probability of detection is 1; relative confidentiality, in which this probability is less than 1; and collective confidentiality, which refers to the probability of detection of at least one person. In spatial data, the probability of personal detection is less than 1, and the probability of collective detection varies directly with this probability and with COVID-19 morbidity. Statistical agencies have been concerned with relative and collective confidentiality, which they implement using the techniques of truncation, where spatial data are not made public for zones with small populations, and censoring, where exact data are not made public for zones where morbidity is small. Granular spatial data are essential for epidemiological research into COVID-19. We argue that in their reluctance to make these data available to the public, data security officers (DSOs) have unreasonably prioritized data protection over freedom of information. We also argue that by attaching importance to relative and collective confidentiality, they have over-indulged in data truncation and censoring. We highlight the need for legislation concerning relative and collective confidentiality, and regulation of DSO practices regarding data truncation and censoring.
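The statement that collective detection rises with both the individual detection probability and morbidity can be made concrete with a small sketch. The formula below is an assumption introduced for illustration, not one taken from the article: it treats detection of each case in a zone as an independent event with the same probability.

```python
def collective_detection_probability(p_individual, n_cases):
    """Probability that at least one of n_cases persons is detected.

    Assumption (not from the source): each person is detected
    independently with the same probability p_individual.
    """
    return 1.0 - (1.0 - p_individual) ** n_cases


# Hypothetical zone-level values.
one_case = collective_detection_probability(0.01, 1)
hundred_cases = collective_detection_probability(0.01, 100)
higher_p = collective_detection_probability(0.05, 100)
```

Under this assumption the collective probability equals the individual probability when there is a single case and grows as either the number of cases or the individual probability increases, which matches the direction of the relationship described in the abstract.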
{"title":"Freedom of Information and Personal Confidentiality in Spatial COVID-19 Data","authors":"M. Beenstock, D. Felsenstein","doi":"10.2478/jos-2021-0035","DOIUrl":"https://doi.org/10.2478/jos-2021-0035","url":null,"abstract":"Abstract We draw attention to how, in the name of protecting the confidentiality of personal data, national statistical agencies have limited public access to spatial data on COVID-19. We also draw attention to large disparities in the way that access has been limited. In doing so, we distinguish between absolute confidentiality in which the probability of detection is 1, relative confidentiality where this probability is less than 1, and collective confidentiality, which refers to the probability of detection of at least one person. In spatial data, the probability of personal detection is less than 1, and the probability of collective detection varies directly with this probability and COVID-19 morbidity. Statistical agencies have been concerned with relative and collective confidentiality, which they implement using the techniques of truncation, where spatial data are not made public for zones with small populations, and censoring, where exact data are not made public for zones where morbidity is small. Granular spatial data are essential for epidemiological research into COVID-19. We argue that in their reluctance to make these data available to the public, data security officers (DSO) have unreasonably prioritized data protection over freedom of information. We also argue that by attaching importance to relative and collective confidentiality, they have over-indulged in data truncation and censoring. 
We highlight the need for legislation concerning relative and collective confidentiality, and regulation of DSO practices regarding data truncation and censoring.","PeriodicalId":51092,"journal":{"name":"Journal of Official Statistics","volume":"37 1","pages":"791 - 809"},"PeriodicalIF":1.1,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41691355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Matching a nonprobability sample to a probability sample is one strategy both for selecting the nonprobability units and for weighting them. This approach has been employed in the past to select subsamples of persons from a large panel of volunteers. One method of weighting, introduced here, is to assign a unit in the nonprobability sample the weight from its matched case in the probability sample. The properties of resulting estimators depend on whether the probability sample weights are inverses of selection probabilities or are calibrated. In addition, imperfect matching can cause estimates from the matched sample to be biased so that its weights need to be adjusted, especially when the size of the volunteer panel is small. Calibration weighting combined with matching is one approach to correct bias and reduce variances. We explore the theoretical properties of the matched and matched, calibrated estimators with respect to a quasirandomization distribution that is assumed to describe how units in the nonprobability sample are observed, a superpopulation model for analysis variables collected in the nonprobability sample, and the randomization distribution for the probability sample. Numerical studies using simulated and real data from the 2015 US Behavioral Risk Factor Surveillance Survey are conducted to examine the performance of the alternative estimators.
{"title":"Investigating an Alternative for Estimation from a Nonprobability Sample: Matching plus Calibration","authors":"Zhanxu Liu, R. Valliant","doi":"10.2478/jos-2023-0003","DOIUrl":"https://doi.org/10.2478/jos-2023-0003","url":null,"abstract":"Abstract Matching a nonprobability sample to a probability sample is one strategy both for selecting the nonprobability units and for weighting them. This approach has been employed in the past to select subsamples of persons from a large panel of volunteers. One method of weighting, introduced here, is to assign a unit in the nonprobability sample the weight from its matched case in the probability sample. The properties of resulting estimators depend on whether the probability sample weights are inverses of selection probabilities or are calibrated. In addition, imperfect matching can cause estimates from the matched sample to be biased so that its weights need to be adjusted, especially when the size of the volunteer panel is small. Calibration weighting combined with matching is one approach to correct bias and reduce variances. We explore the theoretical properties of the matched and matched, calibrated estimators with respect to a quasirandomization distribution that is assumed to describe how units in the nonprobability sample are observed, a superpopulation model for analysis variables collected in the nonprobability sample, and the randomization distribution for the probability sample. 
Numerical studies using simulated and real data from the 2015 US Behavioral Risk Factor Surveillance Survey are conducted to examine the performance of the alternative estimators.","PeriodicalId":51092,"journal":{"name":"Journal of Official Statistics","volume":"39 1","pages":"45 - 78"},"PeriodicalIF":1.1,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48839103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Misclassified frame records (also called stratum jumpers) and low response rates are characteristic of business surveys. In the estimation of domain parameters, jumpers may contribute to extreme variation in sample weights and skewed sampling distributions of the estimators, especially for domains with a small number of observations. There is limited literature about the extent to which these problems may affect the performance of ratio estimators with nonresponse-adjusted weights. To address this gap, we designed a simulation study to explore the properties of Horvitz-Thompson type ratio estimators, with and without smoothing of the weights, under different scenarios. The ratio estimator with propensity-adjusted weights showed satisfactory performance in all scenarios with a high response rate. For scenarios with a low response rate, the performance of this estimator improved with an increase in the proportion of jumpers in the domain. The smoothed estimators that we studied performed well in scenarios with non-informative weights, but can become markedly biased when the weights are informative, irrespective of response rate. We also studied the performance of the ‘doubled half’ bootstrap method for variance estimation. We illustrated an application of the methods in a real business survey.
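As a rough illustration of the estimator family discussed above, the sketch below computes a weighted ratio estimate of a domain mean with propensity-adjusted weights. It is a generic textbook form, not the authors' simulation setup. The function names, design weights, response propensities and turnover values are hypothetical, and weight smoothing is left out because the abstract does not specify it.

```python
def propensity_adjusted_weights(design_weights, response_propensities):
    """Inflate each design weight by the inverse of the response propensity."""
    return [w / p for w, p in zip(design_weights, response_propensities)]


def domain_ratio_mean(y, weights, in_domain):
    """Weighted domain total divided by the weighted domain size."""
    numerator = sum(w * v for v, w, d in zip(y, weights, in_domain) if d)
    denominator = sum(w for w, d in zip(weights, in_domain) if d)
    return numerator / denominator


# Hypothetical respondents; the last unit is a stratum jumper: a large
# business that was sampled from a stratum with a large design weight.
turnover = [10.0, 12.0, 11.0, 500.0]
design_weights = [5.0, 5.0, 5.0, 200.0]
propensities = [0.5, 0.5, 0.5, 0.8]
in_domain = [True, True, True, True]

weights = propensity_adjusted_weights(design_weights, propensities)
estimate = domain_ratio_mean(turnover, weights, in_domain)
unweighted = sum(turnover) / len(turnover)
```

With these values the single jumper carries most of the total weight, so the weighted estimate sits far above the unweighted mean. This is the kind of extreme weight variation that the abstract says can skew estimates in small domains.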
{"title":"Estimation of Domain Means from Business Surveys in the Presence of Stratum Jumpers and Nonresponse","authors":"Mengxuan Xu, V. Landsman, B. Graubard","doi":"10.2478/jos-2021-0045","DOIUrl":"https://doi.org/10.2478/jos-2021-0045","url":null,"abstract":"Abstract Misclassified frame records (also called stratum jumpers) and low response rates are characteristic for business surveys. In the context of estimation of the domain parameters, jumpers may contribute to extreme variation in sample weights and skewed sampling distributions of the estimators, especially for domains with a small number of observations. There is limited literature about the extent to which these problems may affect the performance of the ratio estimators with nonresponse-adjusted weights. To address this gap, we designed a simulation study to explore the properties of the Horvitz-Thompson type ratio estimators, with and without smoothing of the weights, under different scenarios. The ratio estimator with propensity-adjusted weights showed satisfactory performance in all scenarios with a high response rate. For scenarios with a low response rate, the performance of this estimator improved with an increase in the proportion of jumpers in the domain. The smoothed estimators that we studied performed well in scenarios with non-informative weights, but can become markedly biased when the weights are informative, irrespective of response rate. We also studied the performance of the ’doubled half’ bootstrap method for variance estimation. 
We illustrated an application of the methods in a real business survey.","PeriodicalId":51092,"journal":{"name":"Journal of Official Statistics","volume":"37 1","pages":"1059 - 1078"},"PeriodicalIF":1.1,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44958616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}