Building Segmentation from Remote Sensing Image via DWT Attention Networks

Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems. Pub Date: 2023-07-28. DOI: 10.1145/3609703.3609704

Yin-hua Wu, Mingquan Zhou, Shenglin Geng, Dan Zhang

Citations: 0

Abstract

The attention mechanism is widely used and achieves good results in many vision tasks, but computing attention consumes a large amount of memory and time, which is its most obvious drawback. To alleviate this problem, we use the discrete wavelet transform (DWT) to reduce the complexity of the attention calculation. A DWT splits an N-dimensional vector into two vectors: a low-frequency component of dimension N/2 and a high-frequency component, also of dimension N/2. We use only the low-frequency component to compute the attention matrices, which reduces the cost of the matrix multiplications and thereby significantly reduces the time and memory consumption of the network. We also find that building segmentation in remote sensing images differs from general scene segmentation: in general scene images, the target classes differ markedly in size and number. Even so, our method remains applicable to targets that are large and numerous in general scene images, but not to targets that are small and few, a view that our experiments on different datasets confirm. We apply the method to three typical networks (DANet, Swin and Segmenter) and carry out comprehensive experiments on the Cityscapes dataset and three building segmentation datasets (Inria Aerial Dataset, Massachusetts Buildings Dataset and Chinese Style Architecture Dataset). The experiments show that the method is better suited to building segmentation, where it reduces the computational complexity of the model without a clear drop in mean IoU; in some cases mean IoU even improves.
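The idea of computing attention from the low-frequency half of a DWT can be illustrated with a small sketch. The abstract does not say which wavelet is used or along which axis the transform is applied, so the code below makes two assumptions that are not taken from the paper: a single-level orthonormal Haar wavelet, and a transform along the token axis of the keys and values. The function names `haar_dwt` and `low_freq_attention` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def haar_dwt(x, axis=0):
    """Single-level orthonormal Haar DWT along `axis` (length must be even).

    Returns (low, high), each half as long as the input along `axis`.
    The Haar wavelet is an assumption; the abstract names no wavelet.
    """
    x = np.moveaxis(np.asarray(x, dtype=float), axis, 0)
    if x.shape[0] % 2 != 0:
        raise ValueError("length along the transformed axis must be even")
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # pairwise averages: low-frequency part
    high = (even - odd) / np.sqrt(2.0)  # pairwise differences: high-frequency part
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def low_freq_attention(q, k, v):
    """Scaled dot-product attention with keys/values halved by the DWT.

    q, k, v have shape (N, d). Assumption: the DWT runs along the token
    axis, so the score matrix is (N, N/2) instead of (N, N).
    """
    k_low, _ = haar_dwt(k, axis=0)  # (N/2, d); high-frequency part is discarded
    v_low, _ = haar_dwt(v, axis=0)  # (N/2, d)
    scores = q @ k_low.T / np.sqrt(q.shape[-1])  # (N, N/2)
    weights = softmax(scores, axis=-1)
    return weights @ v_low, weights


rng = np.random.default_rng(0)
N, d = 8, 3
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))
out, weights = low_freq_attention(q, k, v)
print(out.shape, weights.shape)  # (8, 3) (8, 4)
```

Under these assumptions the score matrix holds N × N/2 entries rather than N × N, so both the score computation and the weighted sum over values take about half the multiplications of full attention for the same N and d.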
