Improving Sampling Probability Definitions with Predictive Algorithms

IF 1.1 | Zone 3, Sociology | JCR Q2, ANTHROPOLOGY | Field Methods | Pub Date: 2022-09-15 | DOI: 10.1177/1525822X221113181

Matthew Jannetti, A. Carroll-Scott, Erikka Gilliam, Irene E. Headen, Maggie Beverly, F. Lê-Scherban

Field Methods, vol. 35, no. 1, pp. 137–152

Citations: 1

Abstract




Place-based initiatives often use resident surveys to inform and evaluate interventions. Sampling based on well-defined sampling frames is important but challenging for initiatives that target subpopulations. Databases that enumerate total population counts can produce overinclusive sampling frames, resulting in costly outreach to ineligible participants. Quantifying eligibility before sampling using machine learning algorithms can improve efficiency and reduce costs. We developed a model to improve sampling for the West Philly Promise Neighborhood’s biennial population-representative survey of households with children within a geographic footprint. This study proposes a method to estimate probability of study eligibility by building a well-calibrated predictive model using existing administrative data sources. Six machine-learning models were evaluated; logistic regression provided the best balance of accuracy and understandable probabilities. This approach can be a blueprint for other population-based studies whose sampling frames cannot be well defined using traditional sources.
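The abstract does not specify the features, software, or calibration procedure, so the following is only a minimal Python sketch of the general idea, under stated assumptions. It fits a logistic regression by plain gradient descent (standard library only) to a synthetic, hypothetical address frame. It then checks calibration by comparing mean predicted probability with the observed eligibility rate in probability bins, and shows how predicted eligibility probabilities could quantify the outreach trade-off of trimming a frame. The feature definitions, the 0.3 threshold, the synthetic data, and every helper (`fit_logistic`, `calibration_table`, `threshold_tradeoff`, `brier_score`) are illustrative assumptions, not the authors' model, data, or results.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=1.0, epochs=1000):
    """Logistic regression fitted by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    """Predicted eligibility probability for each address."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def brier_score(p, y):
    """Mean squared error of the probabilities; lower is better."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)


def calibration_table(p, y, bins=5):
    """Per probability bin: (low, high, mean predicted, observed rate, count)."""
    rows = []
    for k in range(bins):
        lo, hi = k / bins, (k + 1) / bins
        idx = [i for i, pi in enumerate(p)
               if lo <= pi < hi or (k == bins - 1 and pi == hi)]
        if idx:
            mean_p = sum(p[i] for i in idx) / len(idx)
            obs = sum(y[i] for i in idx) / len(idx)
            rows.append((lo, hi, mean_p, obs, len(idx)))
    return rows


def threshold_tradeoff(p, t):
    """Hypothetical trim rule: share of addresses kept and share of expected
    eligible households kept if addresses with probability below t are dropped."""
    kept = [pi for pi in p if pi >= t]
    return len(kept) / len(p), sum(kept) / sum(p)


# Synthetic frame with hypothetical features (NOT the study's data):
# x1 = address matched to a child-linked administrative record (0/1),
# x2 = a standardized housing characteristic.
rng = random.Random(0)
X, y = [], []
for _ in range(500):
    x1 = 1.0 if rng.random() < 0.4 else 0.0
    x2 = rng.uniform(-1.0, 1.0)
    X.append([x1, x2])
    y.append(1 if rng.random() < sigmoid(-1.5 + 2.0 * x1 + 1.0 * x2) else 0)

w, b = fit_logistic(X, y)
p = predict(w, b, X)
print("coefficients:", [round(v, 2) for v in w], "intercept:", round(b, 2))
print("Brier score:", round(brier_score(p, y), 3))
for lo, hi, mean_p, obs, count in calibration_table(p, y):
    print(f"[{lo:.1f}, {hi:.1f}) predicted {mean_p:.2f} observed {obs:.2f} n={count}")
print("expected eligible households in frame:", round(sum(p), 1))
print("kept share (addresses, eligibles) at t=0.3:", threshold_tradeoff(p, 0.3))
```

For brevity this sketch checks calibration on the training data. In practice, calibration would be assessed on held-out data, and the competing models would be compared on the same held-out set. Dropping low-probability addresses reduces outreach but also removes some eligible households, which can bias a population-representative sample; the threshold helper only quantifies that trade-off and is not a recommended sampling rule.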


Source journal: Field Methods

CiteScore: 2.70

Self-citation rate: 5.90%

Journal description: Field Methods (formerly Cultural Anthropology Methods) is devoted to articles about the methods used by field workers in the social and behavioral sciences and humanities for the collection, management, and analysis of data about human thought and/or human behavior in the natural world. Articles should focus on innovations and issues in the methods used, rather than on the reporting of research or theoretical/epistemological questions about research. High-quality articles using qualitative and quantitative methods (from scientific or interpretative traditions) that deal with data collection and analysis in applied and scholarly research, from writers in the social sciences, humanities, and related professions, are all welcome in the pages of the journal.