Analysis of household samples: the 1901 census of Canada.

IF 1.6 · Zone 2 (History) · Q1 HISTORY · Historical Methods · Pub Date: 2000-01-01 · DOI: 10.1080/01615440009598960
M Ornstein
Citations: 16

Abstract

The sample of the nominal census for 1901 prepared by the Canadian Families Project is a sample of households or dwellings, and the sampling point is the count of dwellings entered by the enumerator in column 1 of Schedule 1. Five percent of all dwellings on each microfilm reel were selected randomly; thus, the sample is stratified by microfilm reel. All individuals in each sampled dwelling were entered into the data set. Household samples for which information is gathered for every household member actually involve two levels of sampling and analysis. Usually, a simple random sample or stratified sample of households is selected. The resulting sample of individuals, however, is a cluster sample; it is a stratified cluster sample if the household sample is stratified. The selection probabilities are the same for individuals and for households. Thus, if the household sample is self-weighting, or epsem (equal probability of selection method), which means that no weights are required to obtain unbiased estimates of population characteristics, then so is the individual sample. The analysis of household characteristics is straightforward. For example, regional comparisons of household size require only the household sample. However, a cluster sample generally provides less information than a simple random sample of the same size, in this case because members of the same household differ from one another less than do the members of a simple random sample of individuals, who for the most part come from different households. In other words, the characteristics of one member of a cluster usually go some way toward predicting the characteristics of the other members. With household samples, that is often true for religion, for example. Usually, the religion of one household member is a good (but not perfect) predictor of the religion of all the other household members. The degree of within-household similarity is different for each variable.
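The loss of information from within-household similarity can be illustrated with Kish's approximate design effect, deff = 1 + (m - 1) * rho, where m is the average cluster size and rho is the intraclass correlation of the variable. The sketch below is not taken from the article; the function names and the numbers in the demonstration are hypothetical, and the formula assumes clusters of roughly equal size.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Kish's approximate design effect for clusters of roughly equal size.

    icc is the intraclass (within-household) correlation of one variable.
    """
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n_individuals: int, mean_cluster_size: float, icc: float) -> float:
    """Size of the simple random sample that would give about the same precision."""
    return n_individuals / design_effect(mean_cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration; they are NOT
    # estimates from the 1901 census sample.
    examples = [("high similarity, e.g. religion", 0.9),
                ("little similarity, e.g. sex", 0.0)]
    for label, icc in examples:
        deff = design_effect(5, icc)
        n_eff = effective_sample_size(10_000, 5, icc)
        print(f"{label}: deff = {deff:.2f}, effective n = {n_eff:.0f}")
```

Because rho differs from variable to variable, so does the design effect, which is why no single correction applies to every variable in the same sample.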
Because parents and children, and women and men, live together, households are not particularly homogeneous in age or sex composition. The consequence of within-cluster similarity is that estimates of statistical parameters generally have less precision than the estimates that would be obtained from a simple random sample of the same size. When cluster samples are analyzed with computer programs, such as SPSS and SAS, that cannot, or are not “instructed” to, take account of clustering and instead assume a simple random sample, erroneous standard errors, confidence intervals, and significance tests are computed. Almost always, standard errors are underestimated, the confidence intervals are too narrow, and statistical significance is overestimated. One can compute the degree of misestimation exactly by measuring the within-cluster homogeneity, but the degree of misestimation cannot be predicted beforehand and is different for every variable. For that reason, the commonsense “fix” of decreasing the weight for each observation by some multiplier to take account of the loss in precision results in overestimates of some confidence intervals and underestimates of others. Some software, such as the STATA package, used in examples cited later, provides correct confidence intervals and significance levels, but STATA is not in general use by sociologists and historians. Clustering is always a problem in principle, but whether the considerable effort required to deal with the statistical issues is justified depends on the particular analysis. The key issue is whether one is working with a “small” or “large” sample. With a large sample, effects that just reach statistical significance are usually too small to be of substantive interest, so incorrect significance tests are not a worry. Still, it is hard to argue in favor of using statistical procedures and programs that are known to give wrong answers.
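The underestimation of standard errors can be seen in a small simulation. The sketch below compares the standard error of a proportion computed as if individuals were a simple random sample with a textbook linearization estimator that treats households as clusters. It is only an illustration: the simulated data, the function names, and the choice of estimator are assumptions, not the procedure used in the article, and stratification and finite-population corrections are ignored.

```python
import math
import random


def naive_se(households):
    """SE of the mean, wrongly treating individuals as a simple random sample."""
    values = [y for hh in households for y in hh]
    n = len(values)
    mean = sum(values) / n
    var = sum((y - mean) ** 2 for y in values) / (n - 1)
    return math.sqrt(var / n)


def cluster_se(households):
    """Linearization SE of the mean with households as the sampled clusters."""
    n_individuals = sum(len(hh) for hh in households)
    mean = sum(y for hh in households for y in hh) / n_individuals
    # Per-household totals of deviations from the overall mean.
    z = [sum(y - mean for y in hh) for hh in households]
    k = len(households)
    var = (k / (k - 1)) * sum(zi ** 2 for zi in z) / n_individuals ** 2
    return math.sqrt(var)


def simulate_households(k, size, p_shared, rng):
    """Hypothetical 0/1 trait; with probability p_shared all members share it."""
    households = []
    for _ in range(k):
        if rng.random() < p_shared:
            trait = 1 if rng.random() < 0.5 else 0
            households.append([trait] * size)
        else:
            households.append([1 if rng.random() < 0.5 else 0 for _ in range(size)])
    return households


if __name__ == "__main__":
    rng = random.Random(1901)  # fixed seed so the run is repeatable
    for p_shared in (0.0, 0.9):
        data = simulate_households(k=500, size=5, p_shared=p_shared, rng=rng)
        print(f"p_shared={p_shared}: naive SE={naive_se(data):.4f}, "
              f"cluster SE={cluster_se(data):.4f}")
```

When members of a household rarely share the trait, the two standard errors are close; when they usually share it, the naive figure is noticeably too small. The size of the gap depends on the variable, which is consistent with the point that a single multiplier cannot correct every estimate.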
With small samples, the problem is much more severe: not accounting for clustering may dramatically increase the risk of obtaining “findings” that are not actual-