Designing and Evaluating a Hierarchical Framework for Matching Food Outlets across Multi-sourced Geospatial Datasets: a Case Study of San Diego County

Yanjia Cao, Jiue-An Yang, Atsushi Nara, Marta M. Jankowska
Journal of Urban Health. Published 2024-01-02. DOI: 10.1007/s11524-023-00817-9 (https://doi.org/10.1007/s11524-023-00817-9)

Abstract

Research on the retail food environment (RFE) relies on data availability and accuracy. However, discrepancies among RFE datasets may lead to imprecision when measuring associations with health outcomes. In this research, we present a two-tier hierarchical point of interest (POI) matching framework to compare and triangulate food outlets across multiple geospatial data sources. Two matching parameters were used: the geodesic distance between businesses and the similarity of business names, measured with Levenshtein distance (LD) and Double Metaphone (DM). Sensitivity analysis was conducted to determine the thresholds of the matching parameters. Tier 1 matching used stricter parameters to generate high-confidence matched POIs, whereas Tier 2 used relaxed matching parameters and applied a weighted multi-attribute model to the previously unmatched records. Our case study in San Diego County, California used government, commercial, and crowdsourced data and returned 20.2% matched records from Tier 1 and 18.6% from Tier 2. Manual validation shows a 100% matching rate for Tier 1 and up to 30.6% for Tier 2. Matched and unmatched records from Tier 1 were further analyzed for spatial patterns and categorical differences. The hierarchical POI matching framework generated high-confidence food POIs by conflating datasets and identified some food POIs that are unique to specific data sources. Triangulating RFE data can reduce uncertain and invalid POI listings when representing the food environment with multiple data sources. Studies investigating associations between the food environment and health outcomes may benefit from improved RFE data quality.
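
The two-tier logic described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not report the thresholds or weights, so every constant below (`TIER1_MAX_M`, `TIER1_MIN_SIM`, `TIER2_MAX_M`, `TIER2_MIN_SCORE`, `W_NAME`, `W_DIST`) and the function `match_tier` are hypothetical. The sketch also substitutes the haversine great-circle distance for a true geodesic distance and omits Double Metaphone, which is not available in the Python standard library.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (a spherical stand-in for geodesic distance)."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def levenshtein(a, b):
    """Levenshtein edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def name_similarity(a, b):
    """Edit distance normalized to [0, 1]; 1.0 means identical names."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical thresholds and weights, chosen only for illustration.
TIER1_MAX_M = 50.0      # strict distance cutoff (meters)
TIER1_MIN_SIM = 0.9     # strict name-similarity cutoff
TIER2_MAX_M = 200.0     # relaxed distance cutoff (meters)
TIER2_MIN_SCORE = 0.6   # minimum weighted score for a Tier 2 match
W_NAME, W_DIST = 0.7, 0.3


def match_tier(poi_a, poi_b):
    """Return 1 for a strict match, 2 for a relaxed weighted match, 0 otherwise."""
    dist = haversine_m(poi_a["lat"], poi_a["lon"], poi_b["lat"], poi_b["lon"])
    sim = name_similarity(poi_a["name"], poi_b["name"])
    if dist <= TIER1_MAX_M and sim >= TIER1_MIN_SIM:
        return 1
    if dist <= TIER2_MAX_M:
        # Weighted combination of name similarity and distance closeness.
        score = W_NAME * sim + W_DIST * (1.0 - dist / TIER2_MAX_M)
        if score >= TIER2_MIN_SCORE:
            return 2
    return 0
```

Under these assumed settings, two records with the same name and location fall into Tier 1, while a nearby record with an abbreviated name (for example "Sunrise Market" versus "Sunrise Mkt") fails the strict name cutoff and is picked up only by the relaxed weighted score in Tier 2.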
