Suela Isaj, Vassilis Kaffes, T. Pedersen, G. Giannopoulos
{"title":"基于监督天际线的空间实体关联算法","authors":"Suela Isaj, Vassilis Kaffes, T. Pedersen, G. Giannopoulos","doi":"10.48786/edbt.2022.11","DOIUrl":null,"url":null,"abstract":"The ease of publishing data on the web has contributed to larger and more diverse types of data. Entities that refer to a physical place and are characterized by a location and different attributes are named spatial entities. Even though the amount of spatial entity data from multiple sources keeps increasing, facilitating the development of richer, more accurate and more comprehensive geospatial applications and services, there is unavoidable redundancy and ambiguity. We address the problem of spatial entity linkage with SkylineExplore-Trained (SkyEx-T ), a skyline-based algorithm that can label an entity pair as being the same physical entity or not. We introduce LinkGeoML-eXtended (LGM-X ), a meta-similarity function that computes similarity features specifically tailored to the specificities of spatial entities. The skylines of SkyEx-T are created using a preference function, which ranks the pairs based on the likelihood of referring to the same entity. We propose deriving the preference function using a tiny training set (down to 0.05% of the dataset). Additionally, we provide a theoretical guarantee for the cut-off that can best separate the classes, and we show experimentally that it results in a nearoptimal F-measure (on average only 2% loss). SkyEx-T yields an F-measure of 0.71-0.74 and beats the existing non-skyline-based baselines with a margin of 0.11-0.39 in F-measure. When compared to machine learning techniques, SkyEx-T is able to achieve a similar accuracy (sometimes slightly better one in very small training sets) and more importantly, having no-parameters to tune and a model that is already explainable (no need for further actions to achieve explainability).","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"1 1","pages":"2:220-2:233"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Supervised Skyline-Based Algorithm for Spatial Entity Linkage\",\"authors\":\"Suela Isaj, Vassilis Kaffes, T. Pedersen, G. Giannopoulos\",\"doi\":\"10.48786/edbt.2022.11\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The ease of publishing data on the web has contributed to larger and more diverse types of data. Entities that refer to a physical place and are characterized by a location and different attributes are named spatial entities. Even though the amount of spatial entity data from multiple sources keeps increasing, facilitating the development of richer, more accurate and more comprehensive geospatial applications and services, there is unavoidable redundancy and ambiguity. We address the problem of spatial entity linkage with SkylineExplore-Trained (SkyEx-T ), a skyline-based algorithm that can label an entity pair as being the same physical entity or not. We introduce LinkGeoML-eXtended (LGM-X ), a meta-similarity function that computes similarity features specifically tailored to the specificities of spatial entities. The skylines of SkyEx-T are created using a preference function, which ranks the pairs based on the likelihood of referring to the same entity. We propose deriving the preference function using a tiny training set (down to 0.05% of the dataset). Additionally, we provide a theoretical guarantee for the cut-off that can best separate the classes, and we show experimentally that it results in a nearoptimal F-measure (on average only 2% loss). SkyEx-T yields an F-measure of 0.71-0.74 and beats the existing non-skyline-based baselines with a margin of 0.11-0.39 in F-measure. When compared to machine learning techniques, SkyEx-T is able to achieve a similar accuracy (sometimes slightly better one in very small training sets) and more importantly, having no-parameters to tune and a model that is already explainable (no need for further actions to achieve explainability).\",\"PeriodicalId\":88813,\"journal\":{\"name\":\"Advances in database technology : proceedings. International Conference on Extending Database Technology\",\"volume\":\"1 1\",\"pages\":\"2:220-2:233\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Advances in database technology : proceedings. International Conference on Extending Database Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48786/edbt.2022.11\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in database technology : proceedings. International Conference on Extending Database Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48786/edbt.2022.11","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Supervised Skyline-Based Algorithm for Spatial Entity Linkage
The ease of publishing data on the web has contributed to larger and more diverse types of data. Entities that refer to a physical place and are characterized by a location and different attributes are named spatial entities. Even though the amount of spatial entity data from multiple sources keeps increasing, facilitating the development of richer, more accurate and more comprehensive geospatial applications and services, there is unavoidable redundancy and ambiguity. We address the problem of spatial entity linkage with SkylineExplore-Trained (SkyEx-T ), a skyline-based algorithm that can label an entity pair as being the same physical entity or not. We introduce LinkGeoML-eXtended (LGM-X ), a meta-similarity function that computes similarity features specifically tailored to the specificities of spatial entities. The skylines of SkyEx-T are created using a preference function, which ranks the pairs based on the likelihood of referring to the same entity. We propose deriving the preference function using a tiny training set (down to 0.05% of the dataset). Additionally, we provide a theoretical guarantee for the cut-off that can best separate the classes, and we show experimentally that it results in a nearoptimal F-measure (on average only 2% loss). SkyEx-T yields an F-measure of 0.71-0.74 and beats the existing non-skyline-based baselines with a margin of 0.11-0.39 in F-measure. When compared to machine learning techniques, SkyEx-T is able to achieve a similar accuracy (sometimes slightly better one in very small training sets) and more importantly, having no-parameters to tune and a model that is already explainable (no need for further actions to achieve explainability).