Chenhao Wang, Jingbo Chen, Yu Meng, Yupeng Deng, Kai Li, Yunlong Kong
SAMPolyBuild: Adapting the Segment Anything Model for polygonal building extraction
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 218, pp. 707-720. Journal article, published 2024-10-10.
DOI: 10.1016/j.isprsjprs.2024.09.018 (https://www.sciencedirect.com/science/article/pii/S0924271624003563)
Abstract
Extracting polygonal buildings from high-resolution remote sensing images is a critical task for large-scale mapping, 3D city modeling, and various geographic information system applications. Traditional methods often struggle to delineate boundaries accurately and generalize poorly, which limits their real-world applicability. The Segment Anything Model (SAM), a promptable segmentation model trained on an unprecedentedly large dataset, generalizes remarkably well across diverse scenarios. We therefore present SAMPolyBuild, a framework that adapts SAM to polygonal building extraction and supports both automatic and prompt-based extraction. Because SAM requires object location prompts, we developed the Auto Bbox Prompter, which is trained to detect building bounding boxes directly from SAM's image encoder features. The boundaries of SAM's masks are not precise enough for vector polygon extraction, especially around blurry edges and tree occlusions. We therefore extended the SAM decoder with additional parameters for multitask learning, so that it predicts masks together with Gaussian vertex and boundary maps. Finally, we developed a mask-guided vertex connection algorithm to generate the final polygon. Extensive evaluations on the WHU-Mix (vector) and SpaceNet datasets demonstrate that our method achieves new state-of-the-art accuracy and generalizability. It significantly improves average precision (AP), average recall (AR), intersection over union (IoU), boundary F1, and vertex F1. Moreover, we found that combining the automatic and prompt modes of our framework yields a strong result on out-of-domain data: 91.2% of the building polygons predicted by SAMPolyBuild closely match the quality of manually delineated polygons. The source code is available at https://github.com/wchh-2000/SAMPolyBuild.
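The abstract does not specify how the Gaussian vertex maps are built or how vertex F1 is computed. The pure-Python sketch below is a rough illustration of a common heatmap formulation and is not the authors' implementation. Please consult the linked repository for that.

Each vertex is rendered as a 2D Gaussian peak, and vertices are decoded back as local maxima. Predictions are then scored against ground truth with a pixel tolerance.

The following are all illustrative assumptions, not values from the paper:

- the helper names `gaussian_vertex_map`, `extract_peaks` and `vertex_f1`;
- `sigma = 2.0`;
- the 0.5 peak threshold and the 3×3 local-maximum rule;
- greedy matching within `tol = 2.0` pixels.

```python
import math
from typing import List, Tuple

Point = Tuple[int, int]  # (row, col) pixel coordinates

# Hypothetical helpers: an illustrative sketch, not SAMPolyBuild's code.


def gaussian_vertex_map(vertices: List[Point], height: int, width: int,
                        sigma: float = 2.0) -> List[List[float]]:
    """Heatmap target: each pixel holds the max over vertices of
    exp(-d^2 / (2 * sigma^2)), so every vertex sits on a peak of 1.0."""
    heat = [[0.0] * width for _ in range(height)]
    for vr, vc in vertices:
        for r in range(height):
            for c in range(width):
                d2 = (r - vr) ** 2 + (c - vc) ** 2
                heat[r][c] = max(heat[r][c], math.exp(-d2 / (2.0 * sigma ** 2)))
    return heat


def extract_peaks(heat: List[List[float]], threshold: float = 0.5) -> List[Point]:
    """Decode vertices as pixels >= threshold that are >= all 8 neighbours."""
    h, w = len(heat), len(heat[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = heat[r][c]
            if v < threshold:
                continue
            neighbours = [heat[rr][cc]
                          for rr in range(max(r - 1, 0), min(r + 2, h))
                          for cc in range(max(c - 1, 0), min(c + 2, w))
                          if (rr, cc) != (r, c)]
            if all(v >= n for n in neighbours):
                peaks.append((r, c))
    return peaks


def vertex_f1(pred: List[Point], gt: List[Point],
              tol: float = 2.0) -> Tuple[float, float, float]:
    """Greedy one-to-one matching: a predicted vertex is a true positive if
    an unmatched ground-truth vertex lies within `tol` pixels of it."""
    unmatched = list(gt)
    tp = 0
    for p in pred:
        best, best_d = None, tol
        for g in unmatched:
            d = math.dist(p, g)
            if d <= best_d:
                best, best_d = g, d
        if best is not None:
            unmatched.remove(best)
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    square = [(5, 5), (5, 14), (14, 14), (14, 5)]
    heat = gaussian_vertex_map(square, 20, 20)
    peaks = extract_peaks(heat)
    print(sorted(peaks))              # [(5, 5), (5, 14), (14, 5), (14, 14)]
    print(vertex_f1(peaks, square))   # (1.0, 1.0, 1.0)
```

Why a soft heatmap rather than a hard one-pixel label? A Gaussian target tolerates small localization errors during training. The tolerance-based F1 likewise avoids penalizing vertices that are off by a pixel or two. The sketch does not attempt the mask-guided vertex connection step, which orders decoded vertices into a polygon.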
About the journal:
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive.
P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.