Unveiling the impact of machine learning algorithms on the quality of online geocoding services: a case study using COVID-19 data

Journal of Geographical Systems (IF 2.8; Zone 3, Earth Sciences; JCR Q1, Geography). Pub Date: 2024-01-25. DOI: 10.1007/s10109-023-00435-8
Batuhan Kilic, Onur Can Bayrak, Fatih Gülgen, Mert Gurturk, Perihan Abay
Citations: 0

Unveiling the impact of machine learning algorithms on the quality of online geocoding services: a case study using COVID-19 data

In today's era, the address plays a crucial role as one of the key components that enable mobility in daily life. Address data are used by global map platforms and location-based services to pinpoint a geographically referenced location. Geocoding provided by online platforms is useful in the spatial tracking of reported cases and controls in the spatial analysis of infectious illnesses such as COVID-19. The first and most critical phase in the geocoding process is address matching. However, due to typographical errors, variations in abbreviations used, and incomplete or malformed addresses, the matching can seldom be performed with 100% accuracy. The purpose of this research is to examine the capabilities of machine learning classifiers that can be used to measure the consistency of address matching results produced by online geocoding services and to identify the best performing classifier. The performance of the seven machine learning classifiers was compared using several text similarity measures, which assess the match scores between the input address data and the services' output. The data utilized in the testing came from four distinct online geocoding services applied to 925 addresses in Türkiye. The findings from this study revealed that the Random Forest machine learning classifier was the most accurate in the address matching procedure. While the results of this study hold true for similar datasets in Türkiye, additional research is required to determine whether they apply to data in other countries.
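
The abstract does not say which text similarity measures were used, so the following is only a minimal sketch of how match scores between an input address and a geocoding service's returned address could be computed before being passed to a classifier. The three measures (a difflib sequence ratio, token-level Jaccard similarity, and a normalized Levenshtein similarity), the function names, and the sample addresses are all assumptions made for illustration, not the authors' method.

```python
from difflib import SequenceMatcher


def normalize(address: str) -> str:
    """Lowercase the address and turn punctuation into single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in address.lower())
    return " ".join(cleaned.split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def similarity_features(input_address: str, returned_address: str) -> dict:
    """Return hypothetical match scores in [0, 1] for one address pair."""
    a, b = normalize(input_address), normalize(returned_address)
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    longest = max(len(a), len(b))
    return {
        "sequence_ratio": SequenceMatcher(None, a, b).ratio(),
        "jaccard_tokens": len(tokens_a & tokens_b) / len(union) if union else 1.0,
        "levenshtein_sim": 1.0 - levenshtein(a, b) / longest if longest else 1.0,
    }


if __name__ == "__main__":
    # Made-up address pair: an abbreviated input and an expanded service output.
    scores = similarity_features(
        "Ataturk Cad. No:12 Besiktas Istanbul",
        "Ataturk Caddesi 12, Besiktas, Istanbul",
    )
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

Scores like these, computed for each address and each service, could serve as the feature vector for a classifier such as Random Forest, with a manually checked match/no-match label as the target. Note that this simple lowercasing is not locale-aware, so real Turkish addresses (for example, those containing dotted capital İ) would need more careful normalization.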

Source journal
CiteScore: 5.40
Self-citation rate: 6.90%
Articles published: 33
Journal description: The Journal of Geographical Systems (JGS) is an interdisciplinary peer-reviewed academic journal that aims to encourage and promote high-quality scholarship on new theoretical or empirical results, models and methods in the social sciences. It solicits original papers with a spatial dimension that can be of interest to social scientists. Coverage includes regional science, economic geography, spatial economics, regional and urban economics, GIScience and GeoComputation, big data and machine learning. Spatial analysis, spatial econometrics and statistics are strongly represented. One of the distinctive features of the journal is its concern for the interface between modeling, statistical techniques and spatial issues in a wide spectrum of related fields. An important goal of the journal is to encourage a spatial perspective in the social sciences that emphasizes geographical space as a relevant dimension to our understanding of socio-economic phenomena. Contributions should be of high-quality, be technically well-crafted, make a substantial contribution to the subject and contain a spatial dimension. The journal also aims to publish, review and survey articles that make recent theoretical and methodological developments more readily accessible to the audience of the journal. All papers of this journal have undergone rigorous double-blind peer-review, based on initial editor screening and with at least two peer reviewers. Officially cited as J Geogr Syst
Latest articles in this journal
Point cluster analysis using weighted random labeling
Implications for spatial non-stationarity and the neighborhood effect averaging problem (NEAP) in green inequality research: evidence from three states in the USA
Integrating big data with KNIME as an alternative without programming code: an application to the PATSTAT patent database
Mobility deviation index: incorporating geographical context into analysis of human mobility
Speeding up estimation of spatially varying coefficients models