Matthew Jannetti, A. Carroll-Scott, Erikka Gilliam, Irene E. Headen, Maggie Beverly, F. Lê-Scherban
Improving Sampling Probability Definitions with Predictive Algorithms
Field Methods (journal article), published 2022-09-15. DOI: 10.1177/1525822X221113181
Citations: 1
Abstract
Place-based initiatives often use resident surveys to inform and evaluate interventions. Sampling based on well-defined sampling frames is important but challenging for initiatives that target subpopulations. Databases that enumerate total population counts can produce overinclusive sampling frames, resulting in costly outreach to ineligible participants. Quantifying eligibility with machine-learning algorithms before sampling can improve efficiency and reduce costs. We developed a model to improve sampling for the West Philly Promise Neighborhood’s biennial population-representative survey of households with children within a geographic footprint. This study proposes a method to estimate the probability of study eligibility by building a well-calibrated predictive model from existing administrative data sources. Six machine-learning models were evaluated; logistic regression provided the best balance of accuracy and understandable probabilities. This approach can serve as a blueprint for other population-based studies whose sampling frames cannot be well defined using traditional sources.
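The abstract describes estimating each address's probability of eligibility with a calibrated predictive model built from administrative data. The following is a minimal sketch of that idea under loudly stated assumptions: the two binary indicators (`school`, `benefits`), the simulated address frames, and all function names are hypothetical stand-ins, not the study's data sources or code. Logistic regression is fitted here by plain batch gradient descent in pure Python rather than with any particular library, and calibration is checked with a simple binned comparison of predicted versus observed eligibility on a held-out simulated frame.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_frame(n: int, seed: int):
    """Simulated address frame with two HYPOTHETICAL binary administrative indicators."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        school = 1 if rng.random() < 0.3 else 0    # hypothetical: address linked to a school enrollment record
        benefits = 1 if rng.random() < 0.4 else 0  # hypothetical: address appears in a benefits database
        true_p = sigmoid(-2.0 + 2.5 * school + 1.0 * benefits)  # assumed data-generating model
        X.append([school, benefits])
        y.append(1 if rng.random() < true_p else 0)  # 1 = household with children (eligible)
    return X, y


def linear_score(w, xi):
    """Intercept plus weighted sum of features."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))


def fit_logistic(X, y, lr: float = 1.0, epochs: int = 1000):
    """Logistic regression fitted by batch gradient descent on the mean log-loss.
    Returns [intercept, coef_1, ..., coef_k]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w, X):
    """Predicted eligibility probability for each address."""
    return [sigmoid(linear_score(w, xi)) for xi in X]


def calibration_table(p, y, n_bins: int = 5):
    """(mean predicted probability, observed eligibility rate, count) per equal-width bin."""
    bins = [[] for _ in range(n_bins)]
    for pi, yi in zip(p, y):
        bins[min(int(pi * n_bins), n_bins - 1)].append((pi, yi))
    return [
        (sum(pi for pi, _ in b) / len(b), sum(yi for _, yi in b) / len(b), len(b))
        for b in bins
        if b
    ]


X_train, y_train = simulate_frame(1000, seed=0)
X_valid, y_valid = simulate_frame(1000, seed=1)
w = fit_logistic(X_train, y_train)
p_valid = predict(w, X_valid)

# Calibration on held-out addresses: predicted vs observed eligibility per bin.
for mean_pred, obs_rate, count in calibration_table(p_valid, y_valid):
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  n={count}")

# Summing probabilities gives the expected number of eligible addresses in the frame.
print("expected eligible:", round(sum(p_valid)), "observed:", sum(y_valid))

# Rank addresses by predicted probability (stable sort: ties keep frame order).
order = sorted(range(len(p_valid)), key=lambda i: p_valid[i], reverse=True)
top_half = order[: len(order) // 2]
top_rate = sum(y_valid[i] for i in top_half) / len(top_half)
overall_rate = sum(y_valid) / len(y_valid)
print(f"eligibility rate: top half {top_rate:.2f}, whole frame {overall_rate:.2f}")
```

Summing the predicted probabilities gives an expected eligible count for the frame, and ranking addresses by probability shows how a calibrated model could concentrate outreach on likely-eligible households. How the study actually folded these probabilities into its sampling design is not described in the abstract, so this ranking step is only one illustrative use.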
Journal description:
Field Methods (formerly Cultural Anthropology Methods) is devoted to articles about the methods used by fieldworkers in the social and behavioral sciences and humanities for the collection, management, and analysis of data about human thought and/or human behavior in the natural world. Articles should focus on innovations and issues in the methods used, rather than on the reporting of research or on theoretical/epistemological questions about research. The journal welcomes high-quality articles, using qualitative or quantitative methods from scientific or interpretative traditions, that deal with data collection and analysis in applied and scholarly research by writers in the social sciences, humanities, and related professions.