Traditional subspace-class algorithms have low, or even no, estimation accuracy for DOA estimation under a small number of snapshots, low SNR, or coherent sources. This paper therefore studies the application of compressed sensing theory to DOA estimation. To address the poor estimation accuracy of the sparsity adaptive matching pursuit (SAMP) algorithm in noisy environments and its need to approach the true sparsity gradually from zero, a sparsity pre-estimation adaptive matching pursuit (SPAMP) algorithm is proposed. The algorithm optimizes the iteration termination condition using the variation of the iterative residual, and at the same time approaches the source sparsity quickly and accurately by pre-estimating the initial sparsity. Simulation results show that the proposed algorithm offers high estimation accuracy, fast operation, and better noise immunity, promoting the further integration of compressed sensing and DOA estimation in practical settings.
{"title":"DOA Estimation Based on a Sparsity Pre-estimation Adaptive Matching Pursuit Algorithm","authors":"Huijing Dou, Dongxu Xie, W. Guo","doi":"10.1145/3581807.3581863","DOIUrl":"https://doi.org/10.1145/3581807.3581863","url":null,"abstract":"The traditional subspace class algorithm has low or even no estimation accuracy for DOA estimation under the conditions of less number of snapshots, low SNR and source coherence. Therefore, the application of compressed sensing theory in DOA estimation is studied in this paper. To address the problems of poor estimation accuracy of sparsity adaptive matching Pursuit(SAMP) algorithm in noisy environment and the need to gradually approximate the true sparsity from zero, a sparsity pre-estimation adaptive matching Pursuit(SPAMP) algorithm is proposed in this paper . The algorithm in this paper optimizes the iterative termination conditions of the algorithm by using the changing rules of iterative residuals, and at the same time approximates the source sparsity quickly and accurately by pre-estimating the initial sparsity.. The simulation results show that the algorithm in this paper has the advantages of high estimation accuracy, fast operation speed and better noise immunity, which promotes further integration of compressed sensing and DOA estimation in practical situations.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"358 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133133553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extracting vehicle appearance features is important for vehicle re-identification. The appearance variation of the same vehicle across viewpoints and the appearance similarity between vehicles of different classes make it challenging to capture descriptive features. To address this, we propose a multi-scale attention feature fusion network (MSAF) for vehicle re-identification. It uses ResNet50 as the backbone and introduces a scalable channel attention module for each feature channel. A multi-scale fusion module is then designed to output the final extracted vehicle features. Experimental results on the VERI-Wild dataset indicate that the proposed MSAF achieves a high Rank-1 accuracy of 91.20% with an mAP of 80.20%.
{"title":"Vehicle Re-identification Based on Multi-Scale Attention Feature Fusion","authors":"Geyan Su, Zhonghua Sun, Kebin Jia, Jinchao Feng","doi":"10.1145/3581807.3581813","DOIUrl":"https://doi.org/10.1145/3581807.3581813","url":null,"abstract":"It is important to extract vehicle appearance features for vehicle re-identification. The appearance variation of the same vehicle from different viewpoints and the appearance similarity between vehicles from different classes bring challenges for capturing the descriptive features. Considering these, we propose a multi-scale attention feature fusion network (MSAF) for vehicle re-identification. It uses ResNet50 as the backbone, and introduces a scalable channel attention module for each feature channel. Then a multi-scale fusion module is designed to output the final extracted vehicle features. Experimental results on the VERI-Wild dataset indicate that the proposed MSAF achieves high Rank-1 index of 91.20% with mAP of 80.20%.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114188218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
College students live away from their parents and cope with academic, personal, family, and other pressures on their own, which leaves some students isolated and socially withdrawn. If this situation is not detected and addressed in time, it may have serious consequences. This paper uses students' consumption data to analyze their friendships. It first examines campus all-in-one-card consumption records and characterizes student behavior along three dimensions: consumption time, location, and frequency. It is found that the more two students' consumption-location trajectories overlap, the more likely they are to be friends. On this basis, this paper proposes a student friend discovery model that further explores social relationships between students across multiple colleges and can identify both friendships and lonely students. The experimental results show that the discovered social relationships align with the actual situation.
{"title":"Friend Relation Recognization Algorithm Based on The Campus Card Consumption","authors":"Haopeng Zhang, Jinbo Yu, Mengyu Li, Yuchen Zhang, Yulong Ling, Xiao Zhang","doi":"10.1145/3581807.3581883","DOIUrl":"https://doi.org/10.1145/3581807.3581883","url":null,"abstract":"College students live alone without their parents and bear the influence of academics, life, personality, family, and other factors alone, which leads to the phenomenon of isolation and autism in some students. If this situation is not detected and resolved in time, it may cause serious consequences. This paper uses the consumption data of students to analyze the students' friendship situation. First, it examines the consumption data of the students' campus all-in-one cards and observes the consumption behaviors of the students from the three aspects of consumption time, consumption location, and consumption frequency. It is found that the more overlapping the trajectories of the consumption locations among students, the more likely there is a friendship between students. On this basis, this paper proposes a student friend discovery model, which further explores the social relationship between students from the perspective of multiple colleges, and can find both friend relationships and lonely students. The experimental results show that the excavated social relationships align with the actual situation.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125129053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, human action recognition has been a focus of computer vision, with good prospects in many fields such as security monitoring, behavior analysis, and network video and image restoration. This paper studies an attention-based human action recognition method. To improve model accuracy and efficiency, the ViT network structure is used as the feature extraction framework; because video data has both temporal and spatial characteristics, spatio-temporal attention is chosen instead of traditional convolutional networks for feature extraction. In addition, L2 weight decay regularization is introduced during training to prevent the model from overfitting the training data. Tests on the human action dataset UCF101 show that the proposed model effectively improves recognition accuracy compared with other models.
{"title":"Human Action Recognition Based on Vision Transformer and L2 Regularization","authors":"Qiliang Chen, Hasiqidalatu Tang, Jia-xin Cai","doi":"10.1145/3581807.3581840","DOIUrl":"https://doi.org/10.1145/3581807.3581840","url":null,"abstract":"In recent years, the field of human action recognition has been the focus of computer vision, and human action recognition has a good prospect in many fields, such as security state monitoring, behavior characteristics analysis and network video image restoration. In this paper, based on attention mechanism of human action recognition method is studied, in order to improve the model accuracy and efficiency in VIT network structure as the framework of feature extraction, because video data includes characteristics of time and space, so choose the space and time attention mechanism instead of the traditional convolution network for feature extraction, In addition, L2 weight attenuation regularization is introduced in model training to prevent the model from overfitting the training data. Through the test on the human action related dataset UCF101, it is found that the proposed model can effectively improve the recognition accuracy compared with other models.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116897905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xichao Yue, Chaoqun Wang, Yong Wang, Le Chen, Weifei Wang, Yuhang Lei
Anomaly detection for gas flow meter data is an important means of improving the reliability of fair trade in natural gas transmission and distribution. However, the industrial field environment of natural gas involves complex anomaly categories, some of which are difficult to distinguish. At the same time, traditional anomaly detection methods struggle to accurately analyze an abnormal state over a period of time and are easily disturbed by many factors. For example, although DBSCAN (density-based spatial clustering of applications with noise) can cluster dense data sets of arbitrary shape, its classification performance degrades greatly on data sets with uneven density, and noise points also interfere to a certain extent, weakening the algorithm's ability to distinguish anomalies. The LOF (local outlier factor) algorithm detects outliers by computing the local density deviation of a given data point relative to its neighborhood. To address these problems, a more accurate anomaly detection strategy is proposed: the local outlier factor algorithm is first used to eliminate outliers with overly large LOF values, reducing as much as possible the degradation of DBSCAN clustering caused by uneven density. Experiments show that the clustering performance of this strategy is significantly better than that of traditional detection methods.
{"title":"Gas flow meter anomaly data detection based on fused LOF-DBSCAN algorithm","authors":"Xichao Yue, Chaoqun Wang, Yong Wang, Le Chen, Weifei Wang, Yuhang Lei","doi":"10.1145/3581807.3581881","DOIUrl":"https://doi.org/10.1145/3581807.3581881","url":null,"abstract":"Anomaly detection for gas flowmeter data is one of the important means to improve the reliability of fair trade of natural gas transmission and distribution. However, the field environment of natural gas in the industrial scene has the characteristics of complex anomaly categories and difficult to distinguish some anomalies. At the same time, the traditional anomaly detection methods are difficult to accurately analyze the abnormal state for a period of time, and are easy to be disturbed by many factors. For example, although DBSCAN (density based spatial clustering of applications with noise) can cluster dense data sets of arbitrary shape, it will greatly affect the classification effect of data sets with uneven density, and the noise points will also interfere to a certain extent, resulting in the weakening of the ability of the algorithm to distinguish anomalies. LOF(local outliers factor) algorithm realizes outlier detection by calculating the local density deviation of a given data point relative to its neighborhood. In view of the above problems. A more accurate anomaly detection strategy is proposed. Firstly, the local anomaly factor algorithm is used to eliminate outliers with too large LOF value, so as to reduce the clustering effect of DBSCAN due to uneven density as much as possible. Experiments show that the clustering effect of this strategy is significantly improved compared with the traditional detection methods.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114954624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Medical image segmentation is clinically important, as it enables superior lesion detection and helps physicians in diagnosis and treatment. The Vision Transformer (ViT) has achieved remarkable results in computer vision and has been used for image segmentation tasks, but its potential in medical image segmentation remains largely unexplored given the special characteristics of medical images. Moreover, ViT based on multi-head self-attention (MSA) converts the image into a one-dimensional sequence, which destroys the two-dimensional structure of the image. We therefore propose VA-TransUNet, which combines the advantages of Transformers and convolutional neural networks (CNNs) to capture global and local contextual information while also considering features along the channel dimension. A Transformer based on visual attention is adopted as the encoder, a CNN is used as the decoder, and the image is fed directly into the Transformer. The key to visual attention is large kernel attention (LKA), which decomposes a large convolution into a series of convolutions implemented with depth-wise separable operations. Experiments on the Synapse abdominal multi-organ (Synapse) and Automated Cardiac Diagnosis Challenge (ACDC) datasets demonstrate that the proposed VA-TransUNet outperforms current state-of-the-art networks. The code and trained models will be publicly available at https://github.com/BeautySilly/VA-TransUNet.
{"title":"VA-TransUNet: A U-shaped Medical Image Segmentation Network with Visual Attention","authors":"Ting Jiang, Tao Xu, Xiaoning Li","doi":"10.1145/3581807.3581826","DOIUrl":"https://doi.org/10.1145/3581807.3581826","url":null,"abstract":"Abstract: Medical image segmentation is clinically important in medical diagnosis as it permits superior lesion detection in medical diagnosis to help physicians assist in treatment. Vision Transformer (ViT) has achieved remarkable results in computer vision and has been used for image segmentation tasks, but the potential in medical image segmentation remains largely unexplored with the special characteristics of medical images. Moreover, ViT based on multi-head self-attention (MSA) converts the image into a one-dimensional sequence, which destroys the two-dimensional structure of the image. Therefore, we propose VA-TransUNet, which combines the advantages of Transformer and Convolutional Neural Networks (CNN) to capture global and local contextual information and consider the features of channel dimensionality. Transformer based on visual attention is adopted, it is taken as the encoder, CNN is used as the decoder, and the image is directly fed into the Transformer. The key of visual attention is the large kernel attention (LKA), which is a depth-wise separable convolution that decomposes a large convolution into various convolutions. Experiment on Synapse of abdominal multi-organ (Synapse) and Automated Cardiac Diagnosis Challenge (ACDC) datasets demonstrate that we proposed VA-TransUNet outperforms the current the-state-of-art networks. The codes and trained models will be publicly and available at https://github.com/BeautySilly/VA-TransUNet.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114283841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Songxin Ye, Nanying Li, Jiaqi Xue, Yaqian Long, S. Jia
Traditional cell viability assessment methods are invasive and damage the cells. Moreover, even under a microscope, it is difficult to distinguish live cells from dead cells by the naked eye alone. With the development of optical imaging technology, hyperspectral imaging is increasingly widely used across many fields. Hyperspectral imaging is a non-contact optical technique that provides both spectral and spatial information in a single measurement, making it a fast, non-invasive option for differentiating live and dead cells. In recent years, the rapid development of deep learning has provided a better way to distinguish living from dead cells using large amounts of data. However, training such models usually requires large amounts of labeled data acquired at great cost, which is especially difficult to obtain for medical hyperspectral images. Therefore, this paper proposes a new model called HSI-DETR, based on the detection transformer (DETR), to address object detection of live and dead cells. The HSI-DETR model is adapted to hyperspectral images (HSI) with minimal modification; some parameters of a DETR trained on RGB images are then transferred to the HSI-DETR trained on hyperspectral images. Compared with the general method, this approach trains a better model with a small number of labeled samples, and compared with DETR-R50, the AP50 of HSI-DETR-R50 increases by 5.15%.
{"title":"HSI-DETR: A DETR-based Transfer Learning from RGB to Hyperspectral Images for Object Detection of Live and Dead Cells: To achieve better results, convert models with the fewest changes from RGB to HSI.","authors":"Songxin Ye, Nanying Li, Jiaqi Xue, Yaqian Long, S. Jia","doi":"10.1145/3581807.3581822","DOIUrl":"https://doi.org/10.1145/3581807.3581822","url":null,"abstract":"Traditional cell viability judgment methods are invasive and damaging to cells. Moreover, even under a microscope, it is difficult to distinguish live cells from dead cells by the naked eye alone. With the development of optical imaging technology, hyperspectral imaging is more and more widely used in various fields. Hyperspectral imaging is a non-contact optical technique that provides both spectral and spatial information in a single measurement. It becomes a fast, non-invasive option to differentiate between live and dead cells. In recent years, the rapid development of deep learning has provided a better way to distinguish the difference between living and dead cells through a large amount of data. However, it is often necessary to acquire large amounts of labeled data at an expensive cost to train models. This is more difficult to achieve on medical hyperspectral images. Therefore, in this paper, a new model called HSI-DETR is proposed to solve the above problem on the target detection task of live and dead cells, which is based on the detection transformer (DETR) model. The HSI-DETR model suitable for hyperspectral images (HSI) is proposed with minimal modification. Then, some parameters of DETR trained on RGB images are transferred to HSI-DETR trained on hyperspectral images. Compared to the general method, this method can train a better model with a small number of labeled samples. And compared to the DETR-R50, the AP50 of HSI-DETR-R50 has increased by 5.15%.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131386339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yan Ouyang, Xinqing Wang, Honghui Xu, Ruizhe Hu, Faming Shao, Dong Wang
Few-shot object detection (FSOD) aims to retain detector performance when only scarce annotated instances are given. We reckon that its difficulty lies in the fact that scarce positive samples restrict accurate construction of the eigenspace of the involved categories. In this paper, we propose a novel FSOD detector based on refining the eigenspace, implemented through a pure positive augmentation, a full feature mining module, and a modified loss function. The pure positive augmentation expands the quantity and enriches the scale distribution of positive samples while inhibiting the expansion of negative samples. The full feature mining module enables the model to mine more information about objects. The modified loss function drives prediction results closer to the ground truths. We apply these improvements to YOLOv4, a representative one-stage detector, yielding YOLOv4-FS. On the PASCAL VOC and MS COCO datasets, our YOLOv4-FS achieves competitive performance compared with existing progressive detectors.
{"title":"Few-shot Object Detection via Refining Eigenspace","authors":"Yan Ouyang, Xinqing Wang, Honghui Xu, Ruizhe Hu, Faming Shao, Dong Wang","doi":"10.1145/3581807.3581820","DOIUrl":"https://doi.org/10.1145/3581807.3581820","url":null,"abstract":"Few-shot object detection (FSOD) aims to retain the performance of detector when only given scarce annotated instances. We reckon that its difficulty lies in the fact that the scare positive samples restrict the accurate construction of the eigenspace of involved categories. In this paper, we proposed a novel FSOD detector based on refining the eigenspace, which is implemented through a pure positive augmentation, a full feature mining module and a modified loss function. The pure positive augmentation expands the quantity and enriches the scale distribution of positive samples, inhibiting the expansion of negative samples. The full feature mining module enables the model to mining more information about objects. The modified loss function drives prediction results closer to ground truths. We apply these two improvements to YOLOv4, the representative of one-stage detector, which is termed YOLOv4-FS. On PASCAL VOC and MS COCO datasets, our YOLOv4-FS achieves competitive performance compared with existing progressive detectors.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130862672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic segmentation of remote sensing images usually faces unbalanced foreground and background, large variation in object scales, and significant similarity between different classes. The FCN-based fully convolutional encoder-decoder architecture has become the de facto standard for semantic segmentation and is also prevalent for remote sensing images. However, because of the limitations of CNNs, such an encoder cannot obtain global contextual information, which is extremely important for the semantic segmentation of remote sensing images. In this paper, the CNN-based encoder is therefore replaced by a Swin Transformer to obtain rich global contextual information. In addition, for the CNN-based decoder, we propose a multi-level connection module (MLCM) that fuses high-level and low-level semantic information so that feature maps carry more semantics, and a multi-scale upsample module (MSUM) that joins the upsampling process to better recover image resolution and produce better segmentation results. Experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of the proposed method.
{"title":"Swin Transformer with Multi-Scale Residual Attention for Semantic Segmentation of Remote Sensing Images","authors":"Yuanyang Lin, Da-han Wang, Yun Wu, Shunzhi Zhu","doi":"10.1145/3581807.3581827","DOIUrl":"https://doi.org/10.1145/3581807.3581827","url":null,"abstract":"Semantic segmentation of remote sensing images usually faces the problems of unbalanced foreground-background, large variation of object scales, and significant similarity of different classes. The FCN-based fully convolutional encoder-decoder architecture seems to have become the standard for semantic segmentation, and this architecture is also prevalent in remote sensing images. However, because of the limitations of CNN, the encoder cannot obtain global contextual information, which is extraordinarily important to the semantic segmentation of remote sensing images. By contrast, in this paper, the CNN-based encoder is replaced by Swin Transformer to obtain rich global contextual information. Besides, for the CNN-based decoder, we propose a multi-level connection module (MLCM) to fuse high-level and low-level semantic information to help feature maps obtain more semantic information and use a multi-scale upsample module (MSUM) to join the upsampling process to recover the resolution of images better to get segmentation results preferably. The experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of our proposed method.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134176559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chinese address resolution (CAR) is a key step in geocoding, and the resolution results directly affect the service quality of address-based applications. Deep learning models have been widely used for the CAR task, but they require abundant annotated address data to achieve satisfactory performance. In this paper, an active transfer learning method combining uncertainty with diversity is proposed for CAR; its main goals are to reduce the annotation requirement for unlabeled addresses in the target region and to improve the utilization of labeled data in the source region. Considering the correlation among Chinese addresses, we propose clustering unlabeled addresses on the basis of feature words, mined from address data with an LDA model, to reflect the distribution of the addresses. A comprehensive sample strategy combining uncertainty with diversity (CSSCUD) is constructed to select training samples from the target region; it obtains highly valuable samples by jointly considering informativeness and distribution in the feature-word space within each batch. Experiments on address datasets from two different regions show that the proposed active transfer learning method achieves higher resolution accuracy than various baselines using the same number of labeled training samples, which illustrates that the proposed method is effective and practical for CAR.
{"title":"An Active Transfer Learning Method Combining Uncertainty with Diversity for Chinese Address Resolution","authors":"Yuwei Hu, Xueyuan Zheng, Ping Zong","doi":"10.1145/3581807.3581902","DOIUrl":"https://doi.org/10.1145/3581807.3581902","url":null,"abstract":"Chinese address resolution (CAR) is a key step in geocoding technology, and the resolution results directly affect the service quality of address-based applications. Deep learning models have been widely used in CAR task but they require abundant annotated address data to obtain satisfied performance. In this paper, an active transfer learning method combining uncertainty with diversity for CAR is proposed, for which the main goal is to mitigate the annotation requirement for unlabeled address in the target region and to Improve the utilization of labeled data in the source region. Considering the correlation among Chinese addresses, we propose a clustering method of unlabeled address on the basis of feature words, mined from address data based on LDA model, to reflect the distribution of the address. A metric of comprehensive sample strategy combing uncertainty with diversity (CSSCUD) is constructed to select training samples from the target region, which can obtain high valuable samples by considering informativeness and distribution in feature words space jointly in each batch. Experiments on the address dataset from two different regions show that the comprehensive active transfer learning method achieves a higher resolution accuracy than various baselines by using the same number of labeled training samples, which illustrates that the proposed method is effective and practical for CAR.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134511443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}