{"title":"Local Evaluation of Large-scale Remote Sensing Machine Learning-generated Building and Road Dataset: The Case of Rwanda","authors":"Emmanuel Nyandwi, Markus Gerke, Pedro Achanccaray","doi":"10.1007/s41064-024-00297-9","DOIUrl":null,"url":null,"abstract":"<p>Accurate and up-to-date building and road data are crucial for informed spatial planning. In developing regions in particular, major challenges arise due to the limited availability of these data, primarily as a result of the inherent inefficiency of traditional field-based surveys and manual data generation methods. Importantly, this limitation has prompted the exploration of alternative solutions, including the use of remote sensing machine learning-generated (RSML) datasets. Within the field of RSML datasets, a plethora of models have been proposed. However, these methods, evaluated in a research setting, may not translate perfectly to massive real-world applications, attributable to potential inaccuracies in unknown geographic spaces. The scepticism surrounding the usefulness of datasets generated by global models, owing to unguaranteed local accuracy, appears to be particularly concerning. As a consequence, rigorous evaluations of these datasets in local scenarios are essential for gaining insights into their usability. To address this concern, this study investigates the local accuracy of large RSML datasets. For this evaluation, we employed a dataset generated using models pre-trained on a variety of samples drawn from across the world and accessible from public repositories of open benchmark datasets. Subsequently, these models were fine-tuned with a limited set of local samples specific to Rwanda. In addition, the evaluation included Microsoft’s and Google’s global datasets. Using ResNet and Mask R‑CNN, we explored the performance variations of different building detection approaches: bottom-up, end-to-end, and their combination. For road extraction, we explored the approach of training multiple models on subsets representing different road types. Our testing dataset was carefully designed to be diverse, incorporating both easy and challenging scenes. It includes areas purposefully chosen for their high level of clutter, making it difficult to detect structures like buildings. This inclusion of complex scenarios alongside simpler ones allows us to thoroughly assess the robustness of DL-based detection models for handling diverse real-world conditions. In addition, buildings were evaluated using a polygon-wise comparison, while roads were assessed using network length-derived metrics.</p><p>Our results showed a precision (P) of around 75% and a recall (R) of around 60% for the locally fine-tuned building model. This performance was achieved in three out of six testing sites and is considered the lowest limit needed for practical utility of RSML datasets, according to the literature. In contrast, comparable results were obtained in only one out of six sites for the Google and Microsoft datasets. Our locally fine-tuned road model achieved moderate success, meeting the minimum usability threshold in four out of six sites. In contrast, the Microsoft dataset performed well on all sites. In summary, our findings suggest improved performance in road extraction, relative to building extraction tasks. Moreover, we observed that a pipeline relying on a combination of bottom-up and top-down segmentation, while leveraging open global benchmark annotation dataset as well as a small number of samples for fine-tuning, can offer more accurate RSML datasets compared to an open global dataset. Our findings suggest that relying solely on aggregated accuracy metrics can be misleading. According to our evaluation, even city-level derived measures may not capture significant variations in performance within a city, such as lower accuracy in specific neighbourhoods. Overcoming the challenges of complex areas might benefit from exploring alternative approaches, including the integration of LiDAR data, UAV images, aerial images or using other network architectures.</p>","PeriodicalId":56035,"journal":{"name":"PFG-Journal of Photogrammetry Remote Sensing and Geoinformation Science","volume":"9 1","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PFG-Journal of Photogrammetry Remote Sensing and Geoinformation Science","FirstCategoryId":"89","ListUrlMain":"https://doi.org/10.1007/s41064-024-00297-9","RegionNum":4,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Accurate and up-to-date building and road data are crucial for informed spatial planning. In developing regions in particular, major challenges arise due to the limited availability of these data, primarily as a result of the inherent inefficiency of traditional field-based surveys and manual data generation methods. Importantly, this limitation has prompted the exploration of alternative solutions, including the use of remote sensing machine learning-generated (RSML) datasets. Within the field of RSML datasets, a plethora of models have been proposed. However, these methods, evaluated in a research setting, may not translate perfectly to massive real-world applications, attributable to potential inaccuracies in unknown geographic spaces. The scepticism surrounding the usefulness of datasets generated by global models, owing to unguaranteed local accuracy, appears to be particularly concerning. As a consequence, rigorous evaluations of these datasets in local scenarios are essential for gaining insights into their usability. To address this concern, this study investigates the local accuracy of large RSML datasets. For this evaluation, we employed a dataset generated using models pre-trained on a variety of samples drawn from across the world and accessible from public repositories of open benchmark datasets. Subsequently, these models were fine-tuned with a limited set of local samples specific to Rwanda. In addition, the evaluation included Microsoft’s and Google’s global datasets. Using ResNet and Mask R‑CNN, we explored the performance variations of different building detection approaches: bottom-up, end-to-end, and their combination. For road extraction, we explored the approach of training multiple models on subsets representing different road types. Our testing dataset was carefully designed to be diverse, incorporating both easy and challenging scenes. It includes areas purposefully chosen for their high level of clutter, making it difficult to detect structures like buildings. This inclusion of complex scenarios alongside simpler ones allows us to thoroughly assess the robustness of DL-based detection models for handling diverse real-world conditions. In addition, buildings were evaluated using a polygon-wise comparison, while roads were assessed using network length-derived metrics.
Our results showed a precision (P) of around 75% and a recall (R) of around 60% for the locally fine-tuned building model. This performance was achieved in three out of six testing sites and is considered the lowest limit needed for practical utility of RSML datasets, according to the literature. In contrast, comparable results were obtained in only one out of six sites for the Google and Microsoft datasets. Our locally fine-tuned road model achieved moderate success, meeting the minimum usability threshold in four out of six sites. In contrast, the Microsoft dataset performed well on all sites. In summary, our findings suggest improved performance in road extraction, relative to building extraction tasks. Moreover, we observed that a pipeline relying on a combination of bottom-up and top-down segmentation, while leveraging open global benchmark annotation dataset as well as a small number of samples for fine-tuning, can offer more accurate RSML datasets compared to an open global dataset. Our findings suggest that relying solely on aggregated accuracy metrics can be misleading. According to our evaluation, even city-level derived measures may not capture significant variations in performance within a city, such as lower accuracy in specific neighbourhoods. Overcoming the challenges of complex areas might benefit from exploring alternative approaches, including the integration of LiDAR data, UAV images, aerial images or using other network architectures.
期刊介绍:
PFG is an international scholarly journal covering the progress and application of photogrammetric methods, remote sensing technology and the interconnected field of geoinformation science. It places special editorial emphasis on the communication of new methodologies in data acquisition and new approaches to optimized processing and interpretation of all types of data which were acquired by photogrammetric methods, remote sensing, image processing and the computer-aided interpretation of such data in general. The journal hence addresses both researchers and students of these disciplines at academic institutions and universities as well as the downstream users in both the private sector and public administration.
Founded in 1926 under the former name Bildmessung und Luftbildwesen, PFG is worldwide the oldest journal on photogrammetry. It is the official journal of the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF).