Wenwen Li, Chia-Yu Hsu, Sizhe Wang, Yezhou Yang, Hyunho Lee, Anna K. Liljedahl, C. Witharana, Yili Yang, Brendan M. Rogers, S. Arundel, Matthew B. Jones, Kenton McHenry, Patricia Solis
Title: Segment Anything Model Can Not Segment Anything: Assessing AI Foundation Model's Generalizability in Permafrost Mapping
Journal: Remote. Sens., volume 35 5, pages 797
Publication date: 2024-01-16
DOI: 10.48550/arXiv.2401.08787
URL: https://doi.org/10.48550/arXiv.2401.08787
Citations: 0
Abstract
This paper assesses trending AI foundation models, especially emerging computer vision foundation models, and their performance in natural landscape feature segmentation. While the term foundation model has quickly garnered interest from the geospatial domain, its definition remains vague. Hence, this paper first introduces AI foundation models and their defining characteristics. Building on the tremendous success achieved by Large Language Models (LLMs) as the foundation models for language tasks, this paper discusses the challenges of building foundation models for geospatial artificial intelligence (GeoAI) vision tasks. To evaluate the performance of large AI vision models, especially Meta’s Segment Anything Model (SAM), we implemented different instance segmentation pipelines that minimize the changes to SAM in order to leverage its power as a foundation model. A series of prompt strategies was developed to test SAM’s performance regarding its theoretical upper bound of predictive accuracy, zero-shot performance, and domain adaptability through fine-tuning. The analysis used two permafrost feature datasets, ice-wedge polygons and retrogressive thaw slumps, because (1) these landform features are more challenging to segment than man-made features due to their complicated formation mechanisms, diverse forms, and vague boundaries; and (2) their presence and changes are important indicators of Arctic warming and climate change. The results show that, although promising, SAM still has room for improvement to support AI-augmented terrain mapping. The spatial and domain generalizability of this finding is further validated using a more general dataset, EuroCrops, for agricultural field mapping. Finally, we discuss future research directions that strengthen SAM’s applicability in challenging geospatial domains.
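The abstract does not say how the prompts were constructed or which accuracy metric was used. As a minimal sketch under stated assumptions, the snippet below shows one common way to set up an upper-bound style test for a promptable segmentation model: derive a tight box prompt from a ground-truth mask, then score a predicted mask against the ground truth with intersection over union (IoU). The function names `box_prompt_from_mask` and `mask_iou`, the toy masks, and the choice of IoU are all illustrative assumptions and are not taken from the paper. No SAM call is made; `pred` stands in for whatever mask a model would return.

```python
from typing import List, Tuple

Mask = List[List[int]]  # rows of 0/1 values; 1 marks a feature pixel


def box_prompt_from_mask(mask: Mask) -> Tuple[int, int, int, int]:
    """Return the tight bounding box (x_min, y_min, x_max, y_max) of a mask.

    Hypothetical helper: a ground-truth-derived box like this is one way
    to prompt a segmentation model when probing its best-case accuracy.
    """
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask contains no foreground pixels")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 0.0


# Toy ground-truth mask (a 2x2 feature) and a stand-in prediction
# that spills one column to the right of the true feature.
truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

box = box_prompt_from_mask(truth)  # box prompt in pixel coordinates
score = mask_iou(pred, truth)      # overlap between prediction and truth
```

For features with vague boundaries, such as the permafrost landforms described above, a tight box from the ground truth gives the model the most favorable localization hint, so the resulting score can be read as an optimistic estimate rather than as zero-shot performance.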