Yin-hua Wu, Mingquan Zhou, Shenglin Geng, Dan Zhang
Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems, published 2023-07-28. DOI: 10.1145/3609703.3609704 (https://doi.org/10.1145/3609703.3609704)
Building Segmentation from Remote Sensing Image via DWT Attention Networks
The attention mechanism is widely used and achieves good results in many vision tasks. However, computing attention in vision tasks consumes a large amount of memory and time, which is the obvious drawback of the method. To alleviate this problem, we use the Discrete Wavelet Transform (DWT) to reduce the complexity of the attention computation. DWT transforms an N-dimensional vector into two vectors: a low-frequency component and a high-frequency component, each of dimension N/2. We use only the low-frequency component to compute the attention matrices, which reduces the cost of the matrix multiplications and therefore significantly reduces the time and memory consumption of the network. We also find that building segmentation in remote sensing images differs from general scene segmentation, where the target classes differ markedly in size and number. Even so, our method remains applicable to targets that are large and numerous in general scene images, but not to targets that are small and few; this is verified by experiments on different datasets. We apply our method to three typical networks (DANet, Swin and Segmenter) and carry out comprehensive experiments on the Cityscapes dataset and three building segmentation datasets (Inria Aerial Dataset, Massachusetts Buildings Dataset and Chinese Style Architecture Dataset). The experiments show that our method is better suited to building segmentation: it reduces the computational complexity of the model, while the mean IoU of the segmentation results does not drop noticeably and in some cases even improves.
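The idea of computing attention from the low-frequency DWT component only can be illustrated with a small sketch. The abstract does not state which wavelet is used or along which axis the transform is applied, so the following is an assumption rather than the authors' implementation: it uses a single-level orthonormal Haar wavelet and applies it along the token axis of the keys and values, so that the score matrix has N x N/2 entries instead of N x N. The function names `haar_dwt`, `low_freq_rows` and `dwt_attention` are hypothetical.

```python
import math


def haar_dwt(x):
    """Single-level orthonormal Haar DWT of an even-length sequence.

    Returns (low, high); each has length len(x) // 2.
    The Haar wavelet is an assumption; the source does not name a wavelet.
    """
    if len(x) % 2 != 0:
        raise ValueError("sequence length must be even")
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high


def low_freq_rows(matrix):
    """Apply the Haar DWT down the token axis of an (N x d) matrix
    and keep only the low-frequency half, giving an (N/2 x d) matrix."""
    columns = [list(col) for col in zip(*matrix)]      # d columns of length N
    low_columns = [haar_dwt(col)[0] for col in columns]
    return [list(row) for row in zip(*low_columns)]    # back to rows


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dwt_attention(q, k, v):
    """Scaled dot-product attention with keys and values reduced to
    their low-frequency components (assumed placement of the DWT).

    q, k, v are (N x d) lists of lists. Returns (output, weights) where
    weights is (N x N/2) instead of the usual (N x N).
    """
    k_low = low_freq_rows(k)
    v_low = low_freq_rows(v)
    scale = 1.0 / math.sqrt(len(q[0]))
    weights = []
    output = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row))
                  for k_row in k_low]
        w = softmax(scores)
        weights.append(w)
        output.append([sum(w[j] * v_low[j][c] for j in range(len(v_low)))
                       for c in range(len(v_low[0]))])
    return output, weights
```

Calling `dwt_attention(q, k, v)` with N tokens returns one output row per query, while each row of attention weights has only N/2 entries; under the stated assumption this halves the size of the score matrix and the cost of the two matrix products that involve it.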