An experimental study of existing tools for outlier detection and cleaning in trajectories
Mariana M Garcez Duarte, Mahmoud Sakr
Geoinformatica, published 2024-05-18
DOI: 10.1007/s10707-024-00522-y
Citations: 0
Abstract
Outlier detection and cleaning are essential steps in data preprocessing to ensure the integrity and validity of data analyses. This paper focuses on outlier points within individual trajectories, i.e., points that deviate significantly from the rest of the trajectory they belong to. We experiment with ten open-source libraries to comprehensively evaluate the available tools, comparing their efficiency and accuracy in identifying and cleaning outliers. The experiment considers the libraries as they are offered to end users, reflecting real-world applicability. We compare existing outlier detection libraries, introduce a method for establishing ground truth, and aim to guide users in choosing the most appropriate tool for their specific outlier detection needs. Furthermore, we survey the state-of-the-art algorithms for outlier detection and classify them into five types: statistics-based methods, sliding-window algorithms, clustering-based methods, graph-based methods, and heuristic-based methods. Our research provides insights into these libraries’ performance and contributes to the development of data preprocessing and outlier detection methodologies.
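To make the notion of an outlier point within a single trajectory concrete, here is a minimal sketch of one heuristic-based approach: a maximum-speed threshold. It is an illustration only, not the paper's method or the API of any of the evaluated libraries. The point layout `(timestamp_s, lat, lon)`, the function names `speed_outliers` and `clean`, and the `max_speed_kmh` parameter are all hypothetical choices made for this example.

```python
import math
from typing import List, Tuple

# Hypothetical point layout: (timestamp in seconds, latitude in degrees, longitude in degrees).
Point = Tuple[float, float, float]

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_outliers(traj: List[Point], max_speed_kmh: float) -> List[int]:
    """Return indices of points whose implied speed exceeds max_speed_kmh.

    Speed is measured from the last *accepted* point, so a single spike is
    flagged without also flagging the valid point that follows it.
    Assumption: the first point of the trajectory is valid.
    """
    if not traj:
        return []
    outliers: List[int] = []
    ref_t, ref_lat, ref_lon = traj[0]
    for i in range(1, len(traj)):
        t, lat, lon = traj[i]
        dt_h = (t - ref_t) / 3600.0
        dist_km = haversine_km(ref_lat, ref_lon, lat, lon)
        # Non-increasing timestamps are also treated as outliers here.
        if dt_h <= 0 or dist_km / dt_h > max_speed_kmh:
            outliers.append(i)
        else:
            ref_t, ref_lat, ref_lon = t, lat, lon  # accept point as new reference
    return outliers


def clean(traj: List[Point], max_speed_kmh: float) -> List[Point]:
    """Drop the points flagged by speed_outliers, keeping the original order."""
    drop = set(speed_outliers(traj, max_speed_kmh))
    return [p for i, p in enumerate(traj) if i not in drop]


if __name__ == "__main__":
    # One fix per minute, with a ~50 km jump at index 2 (a GPS spike).
    traj = [
        (0, 50.8500, 4.3500),
        (60, 50.8505, 4.3510),
        (120, 51.3000, 4.3500),
        (180, 50.8515, 4.3530),
        (240, 50.8520, 4.3540),
    ]
    print(speed_outliers(traj, max_speed_kmh=150.0))  # expected: [2]
    print(len(clean(traj, max_speed_kmh=150.0)))      # expected: 4
```

Measuring speed against the last accepted point, rather than the immediately preceding point, is a design choice: with a naive pairwise check, the valid point right after a spike would also appear to move impossibly fast and would be dropped too. Real libraries may use different rules, thresholds, or units.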
About the journal:
GeoInformatica is located at the confluence of two rapidly advancing domains: Computer Science and Geographic Information Science. Earth studies increasingly rely on sophisticated computing theory and tools, and the computer processing of Earth observations through Geographic Information Systems (GIS) attracts a great deal of attention from government, industry, and research.
The journal aims to promote the most innovative research results in computer science applied to geographic information systems. GeoInformatica thus provides an effective forum for disseminating original and fundamental research and experience in the rapidly advancing area of computer science for spatial studies.