Traditional subspace-class algorithms have low, or even no, estimation accuracy for DOA estimation under a small number of snapshots, low SNR, or coherent sources. This paper therefore studies the application of compressed sensing theory to DOA estimation. To address the poor estimation accuracy of the sparsity adaptive matching pursuit (SAMP) algorithm in noisy environments and its need to approach the true sparsity gradually from zero, a sparsity pre-estimation adaptive matching pursuit (SPAMP) algorithm is proposed. The algorithm optimizes the iteration termination condition using the variation of the iterative residual, and at the same time approaches the source sparsity quickly and accurately by pre-estimating the initial sparsity. Simulation results show that the proposed algorithm offers high estimation accuracy, fast operation, and better noise immunity, promoting the further integration of compressed sensing and DOA estimation in practical settings.
{"title":"DOA Estimation Based on a Sparsity Pre-estimation Adaptive Matching Pursuit Algorithm","authors":"Huijing Dou, Dongxu Xie, W. Guo","doi":"10.1145/3581807.3581863","DOIUrl":"https://doi.org/10.1145/3581807.3581863","url":null,"abstract":"The traditional subspace class algorithm has low or even no estimation accuracy for DOA estimation under the conditions of less number of snapshots, low SNR and source coherence. Therefore, the application of compressed sensing theory in DOA estimation is studied in this paper. To address the problems of poor estimation accuracy of sparsity adaptive matching Pursuit(SAMP) algorithm in noisy environment and the need to gradually approximate the true sparsity from zero, a sparsity pre-estimation adaptive matching Pursuit(SPAMP) algorithm is proposed in this paper . The algorithm in this paper optimizes the iterative termination conditions of the algorithm by using the changing rules of iterative residuals, and at the same time approximates the source sparsity quickly and accurately by pre-estimating the initial sparsity.. The simulation results show that the algorithm in this paper has the advantages of high estimation accuracy, fast operation speed and better noise immunity, which promotes further integration of compressed sensing and DOA estimation in practical situations.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"358 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133133553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extracting vehicle appearance features is important for vehicle re-identification. The appearance variation of the same vehicle across viewpoints and the appearance similarity between vehicles of different classes make it challenging to capture descriptive features. To address this, we propose a multi-scale attention feature fusion network (MSAF) for vehicle re-identification. It uses ResNet50 as the backbone and introduces a scalable channel attention module for each feature channel. A multi-scale fusion module is then designed to output the final extracted vehicle features. Experimental results on the VERI-Wild dataset indicate that the proposed MSAF achieves a high Rank-1 accuracy of 91.20% with an mAP of 80.20%.
{"title":"Vehicle Re-identification Based on Multi-Scale Attention Feature Fusion","authors":"Geyan Su, Zhonghua Sun, Kebin Jia, Jinchao Feng","doi":"10.1145/3581807.3581813","DOIUrl":"https://doi.org/10.1145/3581807.3581813","url":null,"abstract":"It is important to extract vehicle appearance features for vehicle re-identification. The appearance variation of the same vehicle from different viewpoints and the appearance similarity between vehicles from different classes bring challenges for capturing the descriptive features. Considering these, we propose a multi-scale attention feature fusion network (MSAF) for vehicle re-identification. It uses ResNet50 as the backbone, and introduces a scalable channel attention module for each feature channel. Then a multi-scale fusion module is designed to output the final extracted vehicle features. Experimental results on the VERI-Wild dataset indicate that the proposed MSAF achieves high Rank-1 index of 91.20% with mAP of 80.20%.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114188218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
College students live away from their parents and cope with academic, personal, family, and other pressures on their own, which leaves some students isolated and socially withdrawn. If this situation is not detected and addressed in time, it may have serious consequences. This paper uses students' consumption data to analyze their friendships. It first examines campus all-in-one-card consumption records and characterizes student behavior along three dimensions: consumption time, location, and frequency. It is found that the more two students' consumption-location trajectories overlap, the more likely they are to be friends. On this basis, this paper proposes a student friend discovery model that further explores social relationships between students across multiple colleges and can identify both friendships and lonely students. The experimental results show that the discovered social relationships align with the actual situation.
{"title":"Friend Relation Recognization Algorithm Based on The Campus Card Consumption","authors":"Haopeng Zhang, Jinbo Yu, Mengyu Li, Yuchen Zhang, Yulong Ling, Xiao Zhang","doi":"10.1145/3581807.3581883","DOIUrl":"https://doi.org/10.1145/3581807.3581883","url":null,"abstract":"College students live alone without their parents and bear the influence of academics, life, personality, family, and other factors alone, which leads to the phenomenon of isolation and autism in some students. If this situation is not detected and resolved in time, it may cause serious consequences. This paper uses the consumption data of students to analyze the students' friendship situation. First, it examines the consumption data of the students' campus all-in-one cards and observes the consumption behaviors of the students from the three aspects of consumption time, consumption location, and consumption frequency. It is found that the more overlapping the trajectories of the consumption locations among students, the more likely there is a friendship between students. On this basis, this paper proposes a student friend discovery model, which further explores the social relationship between students from the perspective of multiple colleges, and can find both friend relationships and lonely students. The experimental results show that the excavated social relationships align with the actual situation.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125129053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, human action recognition has been a focus of computer vision, with good prospects in many fields such as security monitoring, behavior analysis, and network video and image restoration. This paper studies an attention-based human action recognition method. To improve model accuracy and efficiency, the ViT network structure is used as the feature extraction framework; because video data has both temporal and spatial characteristics, spatio-temporal attention is chosen instead of traditional convolutional networks for feature extraction. In addition, L2 weight decay regularization is introduced during training to prevent the model from overfitting the training data. Tests on the human action dataset UCF101 show that the proposed model effectively improves recognition accuracy compared with other models.
{"title":"Human Action Recognition Based on Vision Transformer and L2 Regularization","authors":"Qiliang Chen, Hasiqidalatu Tang, Jia-xin Cai","doi":"10.1145/3581807.3581840","DOIUrl":"https://doi.org/10.1145/3581807.3581840","url":null,"abstract":"In recent years, the field of human action recognition has been the focus of computer vision, and human action recognition has a good prospect in many fields, such as security state monitoring, behavior characteristics analysis and network video image restoration. In this paper, based on attention mechanism of human action recognition method is studied, in order to improve the model accuracy and efficiency in VIT network structure as the framework of feature extraction, because video data includes characteristics of time and space, so choose the space and time attention mechanism instead of the traditional convolution network for feature extraction, In addition, L2 weight attenuation regularization is introduced in model training to prevent the model from overfitting the training data. Through the test on the human action related dataset UCF101, it is found that the proposed model can effectively improve the recognition accuracy compared with other models.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116897905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xichao Yue, Chaoqun Wang, Yong Wang, Le Chen, Weifei Wang, Yuhang Lei
Anomaly detection for gas flow meter data is an important means of improving the reliability of fair trade in natural gas transmission and distribution. However, the industrial field environment of natural gas involves complex anomaly categories, some of which are difficult to distinguish. At the same time, traditional anomaly detection methods struggle to accurately analyze an abnormal state over a period of time and are easily disturbed by many factors. For example, although DBSCAN (density-based spatial clustering of applications with noise) can cluster dense data sets of arbitrary shape, its classification performance degrades greatly on data sets with uneven density, and noise points also interfere to a certain extent, weakening the algorithm's ability to distinguish anomalies. The LOF (local outlier factor) algorithm detects outliers by computing the local density deviation of a given data point relative to its neighborhood. To address these problems, a more accurate anomaly detection strategy is proposed: the local outlier factor algorithm is first used to eliminate outliers with overly large LOF values, reducing as much as possible the degradation of DBSCAN clustering caused by uneven density. Experiments show that the clustering performance of this strategy is significantly better than that of traditional detection methods.
{"title":"Gas flow meter anomaly data detection based on fused LOF-DBSCAN algorithm","authors":"Xichao Yue, Chaoqun Wang, Yong Wang, Le Chen, Weifei Wang, Yuhang Lei","doi":"10.1145/3581807.3581881","DOIUrl":"https://doi.org/10.1145/3581807.3581881","url":null,"abstract":"Anomaly detection for gas flowmeter data is one of the important means to improve the reliability of fair trade of natural gas transmission and distribution. However, the field environment of natural gas in the industrial scene has the characteristics of complex anomaly categories and difficult to distinguish some anomalies. At the same time, the traditional anomaly detection methods are difficult to accurately analyze the abnormal state for a period of time, and are easy to be disturbed by many factors. For example, although DBSCAN (density based spatial clustering of applications with noise) can cluster dense data sets of arbitrary shape, it will greatly affect the classification effect of data sets with uneven density, and the noise points will also interfere to a certain extent, resulting in the weakening of the ability of the algorithm to distinguish anomalies. LOF(local outliers factor) algorithm realizes outlier detection by calculating the local density deviation of a given data point relative to its neighborhood. In view of the above problems. A more accurate anomaly detection strategy is proposed. Firstly, the local anomaly factor algorithm is used to eliminate outliers with too large LOF value, so as to reduce the clustering effect of DBSCAN due to uneven density as much as possible. Experiments show that the clustering effect of this strategy is significantly improved compared with the traditional detection methods.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114954624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Medical image segmentation is clinically important, as it enables superior lesion detection and helps physicians in diagnosis and treatment. The Vision Transformer (ViT) has achieved remarkable results in computer vision and has been used for image segmentation tasks, but its potential in medical image segmentation remains largely unexplored given the special characteristics of medical images. Moreover, ViT based on multi-head self-attention (MSA) converts the image into a one-dimensional sequence, which destroys the two-dimensional structure of the image. We therefore propose VA-TransUNet, which combines the advantages of Transformers and convolutional neural networks (CNNs) to capture global and local contextual information while also considering features along the channel dimension. A Transformer based on visual attention is adopted as the encoder, a CNN is used as the decoder, and the image is fed directly into the Transformer. The key to visual attention is large kernel attention (LKA), which decomposes a large convolution into a series of convolutions implemented with depth-wise separable operations. Experiments on the Synapse abdominal multi-organ (Synapse) and Automated Cardiac Diagnosis Challenge (ACDC) datasets demonstrate that the proposed VA-TransUNet outperforms current state-of-the-art networks. The code and trained models will be publicly available at https://github.com/BeautySilly/VA-TransUNet.
{"title":"VA-TransUNet: A U-shaped Medical Image Segmentation Network with Visual Attention","authors":"Ting Jiang, Tao Xu, Xiaoning Li","doi":"10.1145/3581807.3581826","DOIUrl":"https://doi.org/10.1145/3581807.3581826","url":null,"abstract":"Abstract: Medical image segmentation is clinically important in medical diagnosis as it permits superior lesion detection in medical diagnosis to help physicians assist in treatment. Vision Transformer (ViT) has achieved remarkable results in computer vision and has been used for image segmentation tasks, but the potential in medical image segmentation remains largely unexplored with the special characteristics of medical images. Moreover, ViT based on multi-head self-attention (MSA) converts the image into a one-dimensional sequence, which destroys the two-dimensional structure of the image. Therefore, we propose VA-TransUNet, which combines the advantages of Transformer and Convolutional Neural Networks (CNN) to capture global and local contextual information and consider the features of channel dimensionality. Transformer based on visual attention is adopted, it is taken as the encoder, CNN is used as the decoder, and the image is directly fed into the Transformer. The key of visual attention is the large kernel attention (LKA), which is a depth-wise separable convolution that decomposes a large convolution into various convolutions. Experiment on Synapse of abdominal multi-organ (Synapse) and Automated Cardiac Diagnosis Challenge (ACDC) datasets demonstrate that we proposed VA-TransUNet outperforms the current the-state-of-art networks. The codes and trained models will be publicly and available at https://github.com/BeautySilly/VA-TransUNet.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114283841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Songxin Ye, Nanying Li, Jiaqi Xue, Yaqian Long, S. Jia
Traditional cell viability assessment methods are invasive and damage the cells. Moreover, even under a microscope, it is difficult to distinguish live cells from dead cells by the naked eye alone. With the development of optical imaging technology, hyperspectral imaging is increasingly widely used across many fields. Hyperspectral imaging is a non-contact optical technique that provides both spectral and spatial information in a single measurement, making it a fast, non-invasive option for differentiating live and dead cells. In recent years, the rapid development of deep learning has provided a better way to distinguish living from dead cells using large amounts of data. However, training such models usually requires large amounts of labeled data acquired at great cost, which is especially difficult to obtain for medical hyperspectral images. Therefore, this paper proposes a new model called HSI-DETR, based on the detection transformer (DETR), to address object detection of live and dead cells. The HSI-DETR model is adapted to hyperspectral images (HSI) with minimal modification; some parameters of a DETR trained on RGB images are then transferred to the HSI-DETR trained on hyperspectral images. Compared with the general method, this approach trains a better model with a small number of labeled samples, and compared with DETR-R50, the AP50 of HSI-DETR-R50 increases by 5.15%.
{"title":"HSI-DETR: A DETR-based Transfer Learning from RGB to Hyperspectral Images for Object Detection of Live and Dead Cells: To achieve better results, convert models with the fewest changes from RGB to HSI.","authors":"Songxin Ye, Nanying Li, Jiaqi Xue, Yaqian Long, S. Jia","doi":"10.1145/3581807.3581822","DOIUrl":"https://doi.org/10.1145/3581807.3581822","url":null,"abstract":"Traditional cell viability judgment methods are invasive and damaging to cells. Moreover, even under a microscope, it is difficult to distinguish live cells from dead cells by the naked eye alone. With the development of optical imaging technology, hyperspectral imaging is more and more widely used in various fields. Hyperspectral imaging is a non-contact optical technique that provides both spectral and spatial information in a single measurement. It becomes a fast, non-invasive option to differentiate between live and dead cells. In recent years, the rapid development of deep learning has provided a better way to distinguish the difference between living and dead cells through a large amount of data. However, it is often necessary to acquire large amounts of labeled data at an expensive cost to train models. This is more difficult to achieve on medical hyperspectral images. Therefore, in this paper, a new model called HSI-DETR is proposed to solve the above problem on the target detection task of live and dead cells, which is based on the detection transformer (DETR) model. The HSI-DETR model suitable for hyperspectral images (HSI) is proposed with minimal modification. Then, some parameters of DETR trained on RGB images are transferred to HSI-DETR trained on hyperspectral images. Compared to the general method, this method can train a better model with a small number of labeled samples. And compared to the DETR-R50, the AP50 of HSI-DETR-R50 has increased by 5.15%.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131386339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yan Ouyang, Xinqing Wang, Honghui Xu, Ruizhe Hu, Faming Shao, Dong Wang
Few-shot object detection (FSOD) aims to retain detector performance when only scarce annotated instances are given. We reckon that its difficulty lies in the fact that scarce positive samples restrict accurate construction of the eigenspace of the involved categories. In this paper, we propose a novel FSOD detector based on refining the eigenspace, implemented through a pure positive augmentation, a full feature mining module, and a modified loss function. The pure positive augmentation expands the quantity and enriches the scale distribution of positive samples while inhibiting the expansion of negative samples. The full feature mining module enables the model to mine more information about objects. The modified loss function drives prediction results closer to the ground truths. We apply these improvements to YOLOv4, a representative one-stage detector, yielding YOLOv4-FS. On the PASCAL VOC and MS COCO datasets, our YOLOv4-FS achieves competitive performance compared with existing progressive detectors.
{"title":"Few-shot Object Detection via Refining Eigenspace","authors":"Yan Ouyang, Xinqing Wang, Honghui Xu, Ruizhe Hu, Faming Shao, Dong Wang","doi":"10.1145/3581807.3581820","DOIUrl":"https://doi.org/10.1145/3581807.3581820","url":null,"abstract":"Few-shot object detection (FSOD) aims to retain the performance of detector when only given scarce annotated instances. We reckon that its difficulty lies in the fact that the scare positive samples restrict the accurate construction of the eigenspace of involved categories. In this paper, we proposed a novel FSOD detector based on refining the eigenspace, which is implemented through a pure positive augmentation, a full feature mining module and a modified loss function. The pure positive augmentation expands the quantity and enriches the scale distribution of positive samples, inhibiting the expansion of negative samples. The full feature mining module enables the model to mining more information about objects. The modified loss function drives prediction results closer to ground truths. We apply these two improvements to YOLOv4, the representative of one-stage detector, which is termed YOLOv4-FS. On PASCAL VOC and MS COCO datasets, our YOLOv4-FS achieves competitive performance compared with existing progressive detectors.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130862672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic segmentation of remote sensing images usually faces unbalanced foreground and background, large variation in object scales, and significant similarity between different classes. The FCN-based fully convolutional encoder-decoder architecture has become the de facto standard for semantic segmentation and is also prevalent for remote sensing images. However, because of the limitations of CNNs, such an encoder cannot obtain global contextual information, which is extremely important for the semantic segmentation of remote sensing images. In this paper, the CNN-based encoder is therefore replaced by a Swin Transformer to obtain rich global contextual information. In addition, for the CNN-based decoder, we propose a multi-level connection module (MLCM) that fuses high-level and low-level semantic information so that feature maps carry more semantics, and a multi-scale upsample module (MSUM) that joins the upsampling process to better recover image resolution and produce better segmentation results. Experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of the proposed method.
{"title":"Swin Transformer with Multi-Scale Residual Attention for Semantic Segmentation of Remote Sensing Images","authors":"Yuanyang Lin, Da-han Wang, Yun Wu, Shunzhi Zhu","doi":"10.1145/3581807.3581827","DOIUrl":"https://doi.org/10.1145/3581807.3581827","url":null,"abstract":"Semantic segmentation of remote sensing images usually faces the problems of unbalanced foreground-background, large variation of object scales, and significant similarity of different classes. The FCN-based fully convolutional encoder-decoder architecture seems to have become the standard for semantic segmentation, and this architecture is also prevalent in remote sensing images. However, because of the limitations of CNN, the encoder cannot obtain global contextual information, which is extraordinarily important to the semantic segmentation of remote sensing images. By contrast, in this paper, the CNN-based encoder is replaced by Swin Transformer to obtain rich global contextual information. Besides, for the CNN-based decoder, we propose a multi-level connection module (MLCM) to fuse high-level and low-level semantic information to help feature maps obtain more semantic information and use a multi-scale upsample module (MSUM) to join the upsampling process to recover the resolution of images better to get segmentation results preferably. The experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of our proposed method.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134176559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chinese address resolution (CAR) is a key step in geocoding, and the resolution results directly affect the service quality of address-based applications. Deep learning models have been widely used for the CAR task, but they require abundant annotated address data to achieve satisfactory performance. In this paper, an active transfer learning method combining uncertainty with diversity is proposed for CAR; its main goals are to reduce the annotation requirement for unlabeled addresses in the target region and to improve the utilization of labeled data in the source region. Considering the correlation among Chinese addresses, we propose clustering unlabeled addresses on the basis of feature words, mined from address data with an LDA model, to reflect the distribution of the addresses. A comprehensive sample strategy combining uncertainty with diversity (CSSCUD) is constructed to select training samples from the target region; it obtains highly valuable samples by jointly considering informativeness and distribution in the feature-word space within each batch. Experiments on address datasets from two different regions show that the proposed active transfer learning method achieves higher resolution accuracy than various baselines using the same number of labeled training samples, which illustrates that the proposed method is effective and practical for CAR.
{"title":"An Active Transfer Learning Method Combining Uncertainty with Diversity for Chinese Address Resolution","authors":"Yuwei Hu, Xueyuan Zheng, Ping Zong","doi":"10.1145/3581807.3581902","DOIUrl":"https://doi.org/10.1145/3581807.3581902","url":null,"abstract":"Chinese address resolution (CAR) is a key step in geocoding technology, and the resolution results directly affect the service quality of address-based applications. Deep learning models have been widely used in CAR task but they require abundant annotated address data to obtain satisfied performance. In this paper, an active transfer learning method combining uncertainty with diversity for CAR is proposed, for which the main goal is to mitigate the annotation requirement for unlabeled address in the target region and to Improve the utilization of labeled data in the source region. Considering the correlation among Chinese addresses, we propose a clustering method of unlabeled address on the basis of feature words, mined from address data based on LDA model, to reflect the distribution of the address. A metric of comprehensive sample strategy combing uncertainty with diversity (CSSCUD) is constructed to select training samples from the target region, which can obtain high valuable samples by considering informativeness and distribution in feature words space jointly in each batch. Experiments on the address dataset from two different regions show that the comprehensive active transfer learning method achieves a higher resolution accuracy than various baselines by using the same number of labeled training samples, which illustrates that the proposed method is effective and practical for CAR.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134511443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}