Peng Chen, Peixian Li, Bing Wang, Sihai Zhao, Yongliang Zhang, Tao Zhang, Xingcheng Ding

ISPRS Journal of Photogrammetry and Remote Sensing, Volume 218, Pages 408-429. Published 2024-11-14. DOI: 10.1016/j.isprsjprs.2024.10.021. Article: https://www.sciencedirect.com/science/article/pii/S0924271624003988
B3-CDG: A pseudo-sample diffusion generator for bi-temporal building binary change detection
Building change detection (CD) plays a crucial role in urban planning, land resource management, and disaster monitoring. Deep learning has become a key approach to building CD, but challenges persist: obtaining large-scale, accurately registered bi-temporal images is difficult, and annotation is time-consuming. We therefore propose B3-CDG, a bi-temporal building binary CD pseudo-sample generator based on the principle of latent diffusion. The generator treats building change processes as local semantic state transformations. It uses textual instructions and mask prompts to generate specific class changes in designated regions of single-temporal images, creating different temporal images with clear semantic transitions. B3-CDG is driven by large-scale pretrained models and uses external adapters to guide the model in learning remote sensing image distributions. To generate seamless building boundaries, B3-CDG adopts a simple and effective approach, dilation masks, which compels the model to learn boundary details. In addition, B3-CDG incorporates diffusion guidance and data augmentation to enhance image realism. In the generation experiments, B3-CDG achieved the best performance, with the lowest FID (26.40) and the highest IS (4.60), compared with previous baseline methods such as Inpaint and IAug. The method effectively addresses challenges such as boundary continuity, shadow generation, and vegetation occlusion while ensuring that the generated building roof structures and colors are realistic and diverse. In the application experiments, B3-CDG improved the IoU of the validation model (SFFNet) by 6.34 % and 7.10 % on the LEVIR and WHUCD datasets, respectively. When real data are extremely limited (using only 5 % of the original data), the improvements reach 33.68 % and 32.40 %. Moreover, B3-CDG can enhance the baseline performance of advanced CD models, such as SNUNet and ChangeFormer.
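The dilation-mask idea can be sketched minimally. The following is an illustrative reconstruction, not the authors' released code: the binary building mask supplied as the generation prompt is expanded outward by a few pixels (here via `scipy.ndimage.binary_dilation`), so the region the diffusion model must synthesize includes the building boundary rather than only the interior.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def dilate_prompt_mask(mask: np.ndarray, iterations: int = 3) -> np.ndarray:
    """Expand a binary building mask outward so the generation region
    covers boundary pixels, pushing the model to learn edge details.
    The number of iterations (pixels of growth) is a free parameter."""
    return binary_dilation(mask, iterations=iterations)

# Toy 8x8 mask with a 2x2 building footprint.
mask = np.zeros((8, 8), dtype=bool)
mask[3:5, 3:5] = True
dilated = dilate_prompt_mask(mask, iterations=1)
print(int(mask.sum()), int(dilated.sum()))  # 4 12
```

With the default cross-shaped structuring element, one iteration adds the eight 4-connected neighbors of the 2x2 footprint, growing the masked region from 4 to 12 pixels.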
Ablation studies further confirm the effectiveness of the B3-CDG design. This study introduces a novel research paradigm for building CD, potentially advancing the field. Source code and datasets will be available at https://github.com/ABCnutter/B3-CDG.
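For context on the reported gains, IoU (intersection over union) for binary change detection counts pixels predicted as changed against pixels labeled as changed. A minimal sketch, illustrative rather than the paper's evaluation code:

```python
import numpy as np

def binary_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """IoU between two binary change masks (1 = changed pixel)."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:  # both masks empty: define IoU as 1
        return 1.0
    inter = np.logical_and(pred, target).sum()
    return float(inter) / float(union)

pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(binary_iou(pred, target))  # 0.5
```

Here one of the two predicted change pixels overlaps the single labeled change pixel, so the intersection is 1 and the union is 2, giving IoU = 0.5.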
About the journal:
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive.
P&RS publishes high-quality, peer-reviewed research papers, preferably original work that has not been published elsewhere. Papers may address scientific/research, technological-development, or application/practical aspects. The journal also welcomes papers based on presentations at ISPRS meetings, provided they constitute significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.