Unveiling the impact of machine learning algorithms on the quality of online geocoding services: a case study using COVID-19 data

IF 2.8 | Zone 3, Earth Sciences | Q1 GEOGRAPHY | Journal of Geographical Systems | Pub Date: 2024-01-25 | DOI: 10.1007/s10109-023-00435-8

Batuhan Kilic, Onur Can Bayrak, Fatih Gülgen, Mert Gurturk, Perihan Abay


Citations: 0

Abstract




In today's era, the address plays a crucial role as one of the key components that enable mobility in daily life. Address data are used by global map platforms and location-based services to pinpoint a geographically referenced location. Geocoding provided by online platforms is useful in the spatial tracking of reported cases and controls in the spatial analysis of infectious illnesses such as COVID-19. The first and most critical phase in the geocoding process is address matching. However, due to typographical errors, variations in abbreviations used, and incomplete or malformed addresses, the matching can seldom be performed with 100% accuracy. The purpose of this research is to examine the capabilities of machine learning classifiers that can be used to measure the consistency of address matching results produced by online geocoding services and to identify the best performing classifier. The performance of the seven machine learning classifiers was compared using several text similarity measures, which assess the match scores between the input address data and the services' output. The data utilized in the testing came from four distinct online geocoding services applied to 925 addresses in Türkiye. The findings from this study revealed that the Random Forest machine learning classifier was the most accurate in the address matching procedure. While the results of this study hold true for similar datasets in Türkiye, additional research is required to determine whether they apply to data in other countries.
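The abstract says the classifiers were compared using text similarity measures between each input address and the address returned by a geocoding service, but it does not name the measures. The sketch below illustrates the general idea only, assuming three common measures (normalized Levenshtein similarity, token-level Jaccard overlap, and the `difflib` sequence ratio). The function names, the normalization step, and the sample addresses are hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher


def normalize(address: str) -> str:
    """Lowercase, drop commas and periods, collapse whitespace.

    Assumption: a real pipeline would also expand abbreviations.
    """
    cleaned = address.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity_features(input_addr: str, output_addr: str) -> dict[str, float]:
    """Return similarity scores in [0, 1] for one input/output address pair."""
    a, b = normalize(input_addr), normalize(output_addr)
    longest = max(len(a), len(b)) or 1  # avoid division by zero
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return {
        "levenshtein_sim": 1.0 - levenshtein(a, b) / longest,
        "jaccard_tokens": len(tokens_a & tokens_b) / len(union) if union else 1.0,
        "sequence_ratio": SequenceMatcher(None, a, b).ratio(),
    }


if __name__ == "__main__":
    # Hypothetical address pair, for illustration only.
    submitted = "Ataturk Cad. No 12 Kadikoy Istanbul"
    returned = "Ataturk Caddesi 12, Kadikoy, Istanbul"
    for name, value in similarity_features(submitted, returned).items():
        print(f"{name}: {value:.3f}")
```

In a setup like the one the abstract describes, a feature vector of this kind, computed per address and per service and paired with a manually assigned match/no-match label, could serve as input to a classifier such as scikit-learn's `RandomForestClassifier`. Which features and labels the authors actually used is not stated here.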


Source journal

Journal of Geographical Systems (GEOGRAPHY)

CiteScore: 5.40

Self-citation rate: 6.90%

About the journal: The Journal of Geographical Systems (JGS) is an interdisciplinary peer-reviewed academic journal that aims to encourage and promote high-quality scholarship on new theoretical or empirical results, models and methods in the social sciences. It solicits original papers with a spatial dimension that can be of interest to social scientists. Coverage includes regional science, economic geography, spatial economics, regional and urban economics, GIScience and GeoComputation, big data and machine learning. Spatial analysis, spatial econometrics and statistics are strongly represented. One of the distinctive features of the journal is its concern for the interface between modeling, statistical techniques and spatial issues in a wide spectrum of related fields. An important goal of the journal is to encourage a spatial perspective in the social sciences that emphasizes geographical space as a relevant dimension of our understanding of socio-economic phenomena. Contributions should be of high quality, be technically well crafted, make a substantial contribution to the subject and contain a spatial dimension. The journal also aims to publish review and survey articles that make recent theoretical and methodological developments more readily accessible to its audience. All papers in this journal have undergone rigorous double-blind peer review, based on initial editor screening and with at least two peer reviewers. Officially cited as J Geogr Syst.