Journal of Survey Statistics and Methodology: Latest Articles


OUP accepted manuscript

IF 2.1 | Zone 4, Mathematics | Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Survey Statistics and Methodology

Pub Date : 2022-01-01 DOI: 10.1093/jssam/smac021

Citations: 0

OUP accepted manuscript


Pub Date : 2022-01-01 DOI: 10.1093/jssam/smac018

Citations: 4

OUP accepted manuscript


Pub Date : 2022-01-01 DOI: 10.1093/jssam/smac007

Citations: 1

Multiple Imputation with Massive Data: An Application to the Panel Study of Income Dynamics


Pub Date : 2021-10-19 eCollection Date: 2023-02-01 DOI: 10.1093/jssam/smab038

Yajuan Si, Steve Heeringa, David Johnson, Roderick J A Little, Wenshuo Liu, Fabian Pfeffer, Trivellore Raghunathan

Multiple imputation (MI) is a popular and well-established method for handling missing data in multivariate data sets, but its practicality for use in massive and complex data sets has been questioned. One such data set is the Panel Study of Income Dynamics (PSID), a longstanding and extensive survey of household income and wealth in the United States. Missing data for this survey are currently handled using traditional hot deck methods because of the simple implementation; however, the univariate hot deck results in large random wealth fluctuations. MI is effective but faced with operational challenges. We use a sequential regression/chained-equation approach, using the software IVEware, to multiply impute cross-sectional wealth data in the 2013 PSID, and compare analyses of the resulting imputed data with those from the current hot deck approach. Practical difficulties, such as non-normally distributed variables, skip patterns, categorical variables with many levels, and multicollinearity, are described together with our approaches to overcoming them. We evaluate the imputation quality and validity with internal diagnostics and external benchmarking data. MI produces improvements over the existing hot deck approach by helping preserve correlation structures, such as the associations between PSID wealth components and the relationships between the household net worth and sociodemographic factors, and facilitates completed data analyses with general purposes. MI incorporates highly predictive covariates into imputation models and increases efficiency. We recommend the practical implementation of MI and expect greater gains when the fraction of missing information is large.
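As a rough illustration of how analyses of multiply imputed data are usually combined, the sketch below applies Rubin's combining rules to a point estimate and its variance from each of m completed data sets. It is a generic textbook calculation, not the IVEware workflow described in the abstract; the function name `pool_rubin` and the input numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation results with Rubin's rules.

    estimates: point estimates, one per completed data set
    variances: squared standard errors, one per completed data set
    Returns (pooled estimate, total variance, share of variance due to missingness).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching inputs from at least two imputations")
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance
    # Share of total variance attributable to missing data; a simple
    # approximation to the fraction of missing information.
    lam = (1 + 1 / m) * b / t
    return q_bar, t, lam


# Hypothetical results from m = 3 imputed data sets.
pooled, total_var, lam = pool_rubin([10.0, 12.0, 11.0], [4.0, 5.0, 3.0])
print(pooled, total_var, lam)
```

The last returned value grows with the between-imputation variance, which is one way to read the abstract's remark that gains from MI are expected to be larger when the fraction of missing information is large.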


Journal of Survey Statistics and Methodology, vol. 11, no. 1, pp. 260-283. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9874997/pdf/smab038.pdf

Citations: 5

Underreporting of Purchases in the US Consumer Expenditure Survey


Pub Date : 2021-08-30 DOI: 10.1093/jssam/smab024

S. Eckman

Motivated misreporting occurs when respondents give incorrect responses to survey questions to shorten the interview; studies have detected this behavior across many modes, topics, and countries. This paper tests whether motivated misreporting affects responses in a large survey of household purchases, the US Consumer Expenditure Interview Survey. The data from this survey inform the calculation of the official measure of inflation, among other uses. Using a parallel web survey and multiple imputation, this article estimates the size of the misreporting effect without experimentally manipulating questions in the survey itself. Results suggest that household purchases are underreported by approximately five percentage points in three sections of the first wave of the survey. The approach used here, involving a web survey built to mimic the expenditure survey, could be applied in other large surveys where budget or logistical constraints prevent experimentation.


Citations: 3

Sequential and Concurrent Internet-Telephone Mixed-Mode Designs in Sexual Health Behavior Research


Pub Date : 2021-08-30 DOI: 10.1093/jssam/smab026

S. Legleye, Géraldine Charrance

The 2013 FECOND (Fertility, Contraception, and Sexual Dysfunction) probability telephone survey aims to monitor sexual health behaviors among fifteen to forty-nine year olds in France. We conducted a random experiment to compare a classic telephone survey (group T, n = 3,846 respondents) with two Internet-telephone mixed-mode protocols: a sequential Internet-telephone protocol (group S, n = 762, among which there were 462 Internet questionnaires), and a concurrent protocol (group C, n = 1,165, among which there were 208 Internet questionnaires). We compare telephone (T), sequential (S), and concurrent (C) samples on cooperation rates, break-off, and item nonresponse rates, sociodemographic characteristics, health behaviors, and seven sexual health behaviors and personal opinions questions. Reports on the most sensitive behaviors were expected to be more truthful and more prevalent on the Internet—and thus in the mixed-mode samples—than in the telephone sample. The cooperation rate (i.e., the response rate among the possible respondents selected during the initial telephone call) was higher in the classic telephone survey than in the sequential and concurrent mixed-mode protocols (88 percent for T versus 77 percent for S and 55 percent for C), where break-off and item nonresponse rates were also higher. Despite these lower response rates, mixed-mode samples showed better representativeness: their marginal distribution of sociodemographic characteristics was closer to that of the 2013 census, and they had higher R-indicators. A causal estimation of the measurement effect resulting from Internet administration found higher prevalence of three out of the seven sexual health behaviors and personal opinions in the sequential protocol compared to the classic telephone group; a similar pattern was found in the concurrent protocol. In addition, the variance of the weights of the mixed-mode protocols is lower, especially for the sequential design. 
Sequential telephone-Internet mixed-mode protocols nested in a probability telephone survey may be a good way to improve survey research on sensitive behaviors.
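The R-indicators mentioned in this abstract are commonly defined as R = 1 - 2·S(ρ), where S(ρ) is the standard deviation of the estimated response propensities; R = 1 means every sample member is equally likely to respond. The sketch below is a minimal, unweighted version of that definition. The abstract does not say how the authors estimated propensities or applied design weights, so the function name `r_indicator` and the propensity values are hypothetical.

```python
from statistics import stdev


def r_indicator(propensities):
    """Unweighted R-indicator: 1 - 2 * SD of estimated response propensities."""
    if len(propensities) < 2:
        raise ValueError("need at least two propensities")
    if any(p < 0.0 or p > 1.0 for p in propensities):
        raise ValueError("propensities must lie in [0, 1]")
    return 1.0 - 2.0 * stdev(propensities)


# Hypothetical propensities, e.g. fitted values from a response model.
even_response = [0.5, 0.5, 0.5, 0.5]     # no variation in propensities
uneven_response = [0.2, 0.8, 0.3, 0.7]   # some groups respond far more often
print(r_indicator(even_response))
print(r_indicator(uneven_response))
```

A sample with a lower response rate can still score higher on this measure if its propensities vary less, which matches the pattern the abstract reports for the mixed-mode samples.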


Citations: 0

Lack of Replication or Generalization? Cultural Values Explain a Question Wording Effect


Pub Date : 2021-08-20 DOI: 10.1093/jssam/smab007

Henning Silber, E. Tvinnereim, T. Stark, A. Blom, J. Krosnick, M. Bošnjak, S. Clement, Anne Cornilleau, Anne-Sophie Cousteaux, M. John, G. Jónsdóttir, K. Lawson, Peter Lynn, Johan Martinsson, Ditte Shamshiri-Petersen, Su-Hao Tu

In the context of the current “replication crisis” across the sciences, failures to reproduce a finding are often viewed as discrediting it. This paper shows how such a conclusion can be incorrect. In 1981, Schuman and Presser showed that including the word “freedom” in a survey question significantly increased approval of allowing a speech against religion in the USA. New experiments in probability sample surveys (n = 23,370) in the USA and 10 other countries showed that the wording effect replicated in the USA and appeared in four other countries (Canada, Germany, Taiwan, and the Netherlands) but not in the remaining countries. The effect appeared only in countries in which the value of freedom is especially salient and endorsed. Thus, public support for a proposition was enhanced by portraying it as embodying a salient principle of a nation’s culture. Instead of questioning initial findings, inconsistent results across countries signal limits on generalizability and identify an important moderator.


Citations: 2

An Experimental Comparison of Three Strategies for Converting Mail Respondents in a Probability-Based Mixed-Mode Panel to Internet Respondents


Pub Date : 2021-08-17 DOI: 10.1093/jssam/smab002

David Bretschi, Ines Schaurer, D. Dillman

In recent years, web-push strategies have been developed in cross-sectional mixed-mode surveys to improve response rates and reduce the costs of data collection. However, pushing respondents into the more cost-efficient web mode has rarely been examined in the context of panel surveys. This study evaluates how a web-push intervention affects the willingness of panel members to switch survey modes from mail to web. We tested three web-push strategies in a German probability-based mixed-mode panel by randomly assigning 1,895 panelists of the mail mode to one of three conditions: (1) the web option was offered to panelists concurrently with the paper questionnaire including a promised €10 incentive for completing the survey on the web, (2) the web option was presented sequentially two weeks before sending the paper questionnaire and respondents were also promised an incentive of €10, or (3) same sequential web-first approach as for condition 2, but with a prepaid €10 incentive instead of a promised incentive. The study found that a sequential presentation of the web option significantly increases the web response in a single survey but may not motivate more panelists to switch to the web mode permanently. Contrary to our expectation, offering prepaid incentives neither improves the web response nor the proportion of mode switchers. Overall, all three web-push strategies show the potential to effectively reduce survey costs without causing differences in panel attrition after five consecutive waves. Condition 2, the sequential web-first design combined with a promised incentive was most effective in pushing respondents to switch to the web mode and in reducing costs.


Citations: 7

Capture–Recapture Estimation of Characteristics of U.S. Local Food Farms Using a Web-Scraped List Frame


Pub Date : 2021-08-16 DOI: 10.1093/jssam/smab008

Michael Hyman, L. Sartore, L. Young

The emerging sectors of agriculture, such as organics, urban, and local food, tend to be dominated by farms that are smaller, more transient, more diverse, and more dispersed than the traditional farms in the rural areas of the United States. As a consequence, a list frame of all farms within one of these sectors is difficult to construct and, even with the best of efforts, is incomplete. The United States Department of Agriculture’s (USDA’s) National Agricultural Statistics Service (NASS) maintains a list frame of all known and potential U.S. farms and uses this list frame as the sampling frame for most of its surveys. Traditionally, NASS has used its area frame to assess undercoverage. However, getting a good measure of the incompleteness of the NASS list frame using an area frame is cost prohibitive for farms in these emerging sectors that tend to be located within and near urban areas. In 2016, NASS conducted the Local Food Marketing Practices (LFMP) survey. Independent samples were drawn from (1) the NASS list frame and (2) a web-scraped list of local food farms. Using these two samples and capture–recapture methods, the total number and sales of local food operations at the United States, regional, and state levels were estimated. To our knowledge, the LFMP survey is the first survey in which a web-scraped list frame has been used to assess undercoverage in a capture–recapture setting to produce official statistics. In this article, the methods are presented, and the challenges encountered are reviewed. Best practices and open research questions for conducting surveys using web-scraped list frames and capture–recapture methods are discussed.
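The capture-recapture idea in this abstract can be illustrated with the classic two-list estimators. If one list contains n1 units, a second independent list contains n2 units, and m units appear on both, the Lincoln-Petersen estimate of the population size is n1·n2/m, and Chapman's variant reduces small-sample bias. This is a textbook simplification that assumes independent lists and perfect matching; it is not the survey-weighted estimator NASS used, and the function names and counts are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """Two-list population size estimate: n1 * n2 / m."""
    if m <= 0:
        raise ValueError("need at least one unit found on both lists")
    if m > min(n1, n2):
        raise ValueError("overlap cannot exceed either list size")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's bias-reduced variant; also defined when m == 0."""
    if m < 0 or m > min(n1, n2):
        raise ValueError("overlap must be between 0 and the smaller list size")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts: 200 farms on list A, 150 on list B, 50 on both.
print(lincoln_petersen(200, 150, 50))
print(chapman(200, 150, 50))
```

The smaller the overlap m relative to the list sizes, the larger the estimated number of units missing from both lists, which is how a second, web-scraped list can inform an assessment of undercoverage.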


Citations: 2

Bootstrap Estimation of the Conditional Bias for Measuring Influence in Complex Surveys


Pub Date : 2021-08-16 DOI: 10.1093/jssam/smab029

J. Beaumont, Cynthia Bocci, Michel St-Louis

In sample surveys that collect information on skewed variables, it is often desirable to assess the influence of sample units on the sampling error of survey-weighted estimators of finite population parameters. The conditional bias is an attractive measure of influence that accounts for the sampling design and the estimation method. It is defined as the design expectation of the sampling error conditional on a given unit being selected in the sample. The estimation of the conditional bias is relatively straightforward for simple sampling designs and estimators. However, for complex designs or complex estimators, it may be tedious to derive an explicit expression for the conditional bias. In those complex surveys, variance estimation is often achieved through replication methods such as the bootstrap. Bootstrap methods of variance estimation are typically implemented by producing a set of bootstrap weights that is provided to users along with the survey data. In this article, we show how to use these bootstrap weights to obtain an estimator of the conditional bias. Our bootstrap estimator is evaluated in a simulation study and illustrated using data from the Canadian Survey of Household Spending.
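To make the definition concrete for one of the simple cases the abstract alludes to: under Poisson sampling with inclusion probabilities π_i, the Horvitz-Thompson estimator of a total has conditional bias (1/π_i - 1)·y_i for a sampled unit i, because E(estimate | i in sample) = y_i/π_i + Σ_{k≠i} y_k. The sketch below evaluates that closed form. It is not the bootstrap-weight estimator proposed in the article, and the function name and data are hypothetical.

```python
def conditional_bias_poisson(y, pi):
    """Conditional bias of the Horvitz-Thompson total for each sampled unit,
    assuming Poisson sampling: (1 / pi_i - 1) * y_i."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return [(1.0 / p - 1.0) * yi for yi, p in zip(y, pi)]


# Hypothetical sampled values with their inclusion probabilities.
y = [100.0, 10.0, 2500.0]
pi = [0.5, 0.1, 1.0]
print(conditional_bias_poisson(y, pi))
```

A unit selected with certainty (π_i = 1) has zero influence by this measure however large its value, while a moderate value with a small inclusion probability can be influential.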


Citations: 0
