Automated curation of spatial metadata in environmental monitoring data
Authors: İlhan Mutlu, Jörg Hackermüller, Jana Schor
Journal: Ecological Informatics, volume 86, Article 103038 (journal article)
Publication date: 2025-01-27
DOI: 10.1016/j.ecoinf.2025.103038
URL: https://www.sciencedirect.com/science/article/pii/S1574954125000470
Journal impact factor: 5.8 (JCR Q1, Ecology)
Citation count: 0
Abstract
Spatial data accuracy in environmental monitoring is crucial for practical large-scale data analytics and the development of AI models. In this context, spatial data is metadata and faces the same challenges as any other metadata, like missing values, false or contradicting information, formatting problems of textual data and numbers, the usage of different languages, and more. These issues severely limit the usability of the data.
With this study, we provide an automatic approach, CleanGeoStreamR, to resolve as many of these issues as possible for the spatially annotated environmental monitoring database. We substantially increased the quality of the spatial metadata and, therefore, the quantity of data points that can be used in large-scale data analytics and AI applications.
Further, our goal is to raise awareness about the issues related to spatial metadata and promote the implementation of our concepts in other environmental monitoring data sources. Advanced understanding and the availability of automatic approaches like the presented method will substantially contribute to making environmental monitoring data FAIR and enhance its usability in the era of Big Data and AI.
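The kinds of problems named in the abstract (missing values, number formatting issues, implausible entries) can be illustrated with a minimal sketch. It is not the CleanGeoStreamR implementation, which is not described here; the function names `parse_coordinate` and `clean_point`, the list of missing-value tokens, and the example records are all hypothetical and chosen only for illustration.

```python
import re
from typing import Optional, Tuple

# Assumption: tokens that commonly stand in for a missing value.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "-"}


def parse_coordinate(raw: Optional[str]) -> Optional[float]:
    """Parse a free-text decimal-degree value; return None if unusable."""
    if raw is None:
        return None
    text = raw.strip()
    if text.lower() in MISSING_TOKENS:
        return None
    # Treat a decimal comma ("51,34") as a decimal point.
    text = text.replace(",", ".")
    # Drop a trailing degree sign, if present.
    text = text.rstrip("°").strip()
    # Accept only a plain signed decimal number; anything else is rejected.
    if not re.fullmatch(r"[+-]?\d+(\.\d+)?", text):
        return None
    return float(text)


def clean_point(raw_lat: Optional[str],
                raw_lon: Optional[str]) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) in decimal degrees, or None if the pair is invalid."""
    lat = parse_coordinate(raw_lat)
    lon = parse_coordinate(raw_lon)
    if lat is None or lon is None:
        return None
    # Reject values outside the valid latitude/longitude ranges.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return (lat, lon)


# Hypothetical raw records: (latitude, longitude) as free text.
records = [("51,34", " 12.37 "), ("NA", "12.37"), ("95.0", "12.37")]
cleaned = [clean_point(lat, lon) for lat, lon in records]
```

Records that cannot be repaired come back as `None` rather than being silently guessed, so a downstream analysis can count how many data points were recovered and how many remain unusable.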
About the journal:
The journal Ecological Informatics is devoted to the publication of high quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data as well as the critical need for informing sustainable management in view of global environmental and climate change.
The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.