
Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition: Latest Publications

Improving Pedestrian Attribute Recognition with Dual Adaptive Fusion Attention
Wenbiao Xie, Chen Zou, Chengui Fu, Xiaomei Xie, Qiuming Liu, He Xiao
As one of the important fields of computer vision research, pedestrian attribute recognition has received increasing attention from researchers at home and abroad. However, obtaining information about distant pedestrians in real-world scenes suffers from problems such as missing information, incomplete feature extraction, and low attribute recognition accuracy. To address these issues, we propose a Dual Adaptive Fusion Attention and Criss-Cross Attention Module (DAFCC). The module contains two sub-modules. First, the dual adaptive fusion attention module automatically adjusts the weights of attributes at different scales and then fuses the multi-scale features, making attribute extraction more complete. Second, we employ criss-cross attention to extract rich contextual information, which benefits visual understanding. Trained on the public PA-100K, RAP and PETA datasets, the method achieves mean accuracies of 81.09%, 81.44% and 85.94%, respectively. Extensive experimental results show that the method is strongly competitive with many current classical algorithms.
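The adaptive fusion step described above can be sketched in a few lines. The abstract does not specify the module's internals, so everything below is illustrative: fixed equal scores stand in for the learned per-scale attention logits, and the feature maps are assumed to be already resized to a common resolution.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fusion(features, scores):
    """Fuse same-shape multi-scale feature maps with softmax weights.

    features: list of (C, H, W) arrays (already resized to a common scale).
    scores:   per-scale scalar logits (fixed here; learned in practice).
    """
    w = softmax(np.asarray(scores, dtype=float))
    return sum(wi * fi for wi, fi in zip(w, features))

rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(8, 4, 4)), rng.normal(size=(8, 4, 4))
fused = adaptive_fusion([f1, f2], scores=[0.0, 0.0])  # equal logits -> equal weights
```

With equal logits the fusion reduces to a plain average; training would move the logits so that the scale carrying more attribute evidence dominates.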
DOI: https://doi.org/10.1145/3581807.3581814 (published 2022-11-17)
Citations: 0
Instance-level Weighted Graph Learning for Incomplete Multi-view Clustering
J. Zhang, Lunke Fei, Yun Li, Fangqi Nie, Qiaoxian Jiang, Libing Liang, Pengcheng Yan
Incomplete multi-view clustering has attracted broad attention because some views of real-world objects are frequently absent. Existing incomplete multi-view clustering methods usually assign different weights to different views to learn a consensus graph across views, but this cannot properly preserve the non-noise information in the views with lower weights. In this paper, unlike existing view-level weighted graph learning, we propose a simple yet effective instance-level weighted graph learning method for incomplete multi-view clustering. Specifically, we first use the similarity information of the available views to estimate and recover the missing views, so that the harmful impact of the missing views is reduced. Then, we adaptively assign weights to the similarities between different views so that the negative effects of noise are reduced. Finally, by combining graph fusion and rank constraints, we learn a new consensus representation of the multi-view data for incomplete multi-view analysis. Experimental results on five widely used incomplete multi-view datasets clearly demonstrate the effectiveness of our proposed method.
DOI: https://doi.org/10.1145/3581807.3581832 (published 2022-11-17)
Citations: 0
Rice Disease Recognition and Feature Visualization Using a Convolutional Neural Network
Yan Wei, Zhibin Wang, Xiao-Jun Qiao
To achieve fast and accurate identification of rice diseases in the field, we propose an automatic rice disease classifier in which the process of characterizing rice diseases is visualized and analyzed by a deconvolutional neural network. An AlexNet model, pretrained on ImageNet, is constructed and trained on rice disease images to classify them. After training is completed, the signal is repositioned to the corresponding position of the input image by a deconvolutional network mirroring the AlexNet structure. The set of pixels that contribute most to the convolutional network's prediction is identified from the deconvolution visualization map. The experimental results demonstrate the effectiveness of the proposed method. The classifier achieved an accuracy of 90.03% on the rice disease dataset, which is 8.39% and 16.78% higher than the accuracies achieved by the LeNet and BP neural networks, respectively. The features of the middle layers of the convolutional neural network undergo a hierarchical transformation from low-level information, such as color, to high-level information, such as the contours and edges of disease spots. This transformation process matches the criteria for the actual identification of rice diseases. The proposed method lays the foundation for the accurate identification of crop diseases and the design and adjustment of deep convolutional neural network structures.
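The "repositioning" step relies on the fact that a deconvolution (transposed convolution) is the adjoint of the forward convolution: it projects an activation back onto the input pixels that produced it. A minimal single-channel, stride-1 NumPy sketch (not the paper's AlexNet-scale network):

```python
import numpy as np

def conv2d(img, k):
    """Valid 2-D correlation of img with kernel k."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * k).sum()
    return out

def deconv2d(act, k, shape):
    """Transposed convolution: spread each activation back over the
    input window it was computed from (the adjoint of conv2d)."""
    out = np.zeros(shape)
    kh, kw = k.shape
    for i in range(act.shape[0]):
        for j in range(act.shape[1]):
            out[i:i + kh, j:j + kw] += act[i, j] * k
    return out

rng = np.random.default_rng(4)
img = rng.normal(size=(6, 6))
k = rng.normal(size=(3, 3))
act = conv2d(img, k)
back = deconv2d(act, k, img.shape)   # pixel-space contribution map
```

Thresholding `back` then yields the set of input pixels that contribute most to a given activation, which is the idea behind the paper's visualization maps.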
DOI: https://doi.org/10.1145/3581807.3581811 (published 2022-11-17)
Citations: 0
Principal component self-attention mechanism for melanoma hyperspectral image recognition
Hongbo Liang, Nanying Li, Jiaqi Xue, Yaqian Long, S. Jia
Early detection of melanoma and prompt treatment are key to reducing melanoma-related deaths. To improve the early detection of melanoma, this paper introduces a set of hyperspectral image (HSI) data captured by dermoscopy using hyperspectral technology and, based on these data, proposes a principal component self-attention mechanism (PCSAM) method for the classification of dysplastic nevi and melanoma. The proposed method uses principal component analysis to amplify the differences in the spectral features of the lesions and to extract new features that are convenient for classification. In addition, the attention mechanism ensures that the spectral features of melanoma receive full attention, while the contextual spatial information between HSI blocks is also exploited. Finally, a comparison experiment is carried out using RGB images and HSIs. Experimental results demonstrate that the spectral features of melanoma can significantly improve classification accuracy, and that hyperspectral technology can effectively improve the recognition accuracy of dysplastic nevi and melanoma, reflecting the advantages of HSIs over traditional images.
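The PCA-then-attention pipeline can be sketched as follows. PCSAM's actual architecture is not described in the abstract, so this is a generic stand-in: PCA via SVD compresses the spectral bands, and a plain scaled dot-product self-attention (with identity query/key/value projections, an assumption) mixes pixel features using the contextual similarities between them.

```python
import numpy as np

def pca(X, k):
    """Project rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def self_attention(X):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = X.shape[-1]
    A = X @ X.T / np.sqrt(d)
    A = np.exp(A - A.max(axis=-1, keepdims=True))
    A = A / A.sum(axis=-1, keepdims=True)   # rows sum to 1
    return A @ X

rng = np.random.default_rng(1)
spectra = rng.normal(size=(16, 30))        # 16 pixels x 30 spectral bands
Z = self_attention(pca(spectra, k=5))      # compressed, context-mixed features
```

In the real method the projections would be learned and the attention applied per HSI block rather than over random vectors.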
DOI: https://doi.org/10.1145/3581807.3581843 (published 2022-11-17)
Citations: 0
Key Points Positioning: A Two-Stage Algorithm For Single-view Point Cloud of Human Back Based on Point-wise Network
Nan Dong, Xinfeng Zhang, Xiaomin Liu, Weifeng Guo, Fei Wang
Point cloud data is a collection of massive numbers of points recording the spatial position of each point on the target surface, and it contains abundant spatial information. At present, it is also applied to the digital modeling of the human surface in medical imaging, as the data basis for subsequent human body measurement, morphology estimation and data analysis. Key points are defined as landmark positions for surface morphology analysis; they provide reference positions for the analysis and, to a certain extent, also reflect the body's symmetry and morphology. Aiming at back shape analysis in clinical diagnosis, this paper proposes a two-stage key-point positioning scheme of coarse segmentation followed by fine positioning. We design and build a point-wise artificial neural network to roughly locate the body part; within it, we propose a maximum-pooling module based on spatial location coding to express local features more strongly. Further, we propose a gray-distance- and curvature-based operator to match the positions of the key points. Experiments show that our method can effectively enhance the distinctiveness of features while reducing the influence of the background.
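The paper's gray-distance-and-curvature operator is its own contribution; as a hedged illustration of the curvature half, below is a standard per-point curvature proxy for point clouds (surface variation: the smallest eigenvalue fraction of the local neighborhood covariance), which is near zero on flat regions and larger at ridges such as the spine line.

```python
import numpy as np

def surface_variation(points, k=5):
    """Per-point curvature proxy: lambda_min / sum(lambda) of the
    covariance of each point's k nearest neighbours."""
    P = np.asarray(points, dtype=float)
    d = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
    out = np.zeros(len(P))
    for i in range(len(P)):
        nb = P[np.argsort(d[i])[:k]]        # k nearest neighbours (incl. self)
        C = np.cov((nb - nb.mean(axis=0)).T)
        lam = np.sort(np.linalg.eigvalsh(C))
        out[i] = lam[0] / max(lam.sum(), 1e-12)
    return out

# a flat 3x3 patch of points: curvature proxy should be ~0 everywhere
xs, ys = np.meshgrid(np.arange(3.0), np.arange(3.0))
plane = np.stack([xs.ravel(), ys.ravel(), np.zeros(9)], axis=1)
sv = surface_variation(plane, k=5)
```

A fine-positioning stage could rank candidate points by such a score within the coarsely segmented region.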
DOI: https://doi.org/10.1145/3581807.3581846 (published 2022-11-17)
Citations: 0
Modeling and Analyzing the Multi-Information Network Propagation Dynamics on Hot Events
Yuwei She, Xinyi Jiang, Changyi Wu, Fulian Yin
As the largest online social platform in China, Weibo enables users to freely access and share information and plays an important role in the dissemination of public opinion. Hot topics on Weibo involve multiple pieces of information whose dissemination is not an isolated process; the pieces affect one another. Considering that multi-information propagation rules are unclear and that the factors influencing public opinion in real networks are insufficiently analyzed, this paper analyzes the multi-information delayed-transmission scenario in complex network environments and constructs the Multiple-Information Delay-transmission Susceptible-Forwarding-Immune (MD-SFIFI) model, which accounts for the time interval between the first message of a hot event and the release of subsequent messages. Data fitting is conducted to validate the model. The paper studies the law of multi-information dissemination by analyzing the correlation between model parameters and dissemination indicators, and summarizes this law, aiming to provide theoretical and data support for decision making and research on government public opinion response and governance.
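The MD-SFIFI model itself includes delay terms and coupling between messages that the abstract does not spell out; below is only a generic single-message susceptible-forwarding-immune sketch, integrated with Euler steps, with an assumed 50/50 split of exposed users into forwarders and immune users.

```python
import numpy as np

def sfi_step(S, F, I, beta, alpha, dt=0.1):
    """One Euler step of a generic susceptible-forwarding-immune model:
    susceptibles are exposed at rate beta*S*F and split evenly between
    forwarding and immunity (an assumption); forwarders become immune
    at rate alpha."""
    exposed = beta * S * F
    dS = -exposed
    dF = 0.5 * exposed - alpha * F
    dI = 0.5 * exposed + alpha * F
    return S + dS * dt, F + dF * dt, I + dI * dt

S, F, I = 0.99, 0.01, 0.0        # fractions of the population
for _ in range(2000):            # integrate to t = 200
    S, F, I = sfi_step(S, F, I, beta=0.8, alpha=0.1)
```

The forwarding fraction F rises, peaks, and decays to near zero as the susceptible pool drains; a delayed second message would add another such compartment whose source term switches on after the release interval.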
DOI: https://doi.org/10.1145/3581807.3581897 (published 2022-11-17)
Citations: 0
Video Forgery Detection Using Spatio-Temporal Dual Transformer
Chenyu Liu, Jia Li, Junxian Duan, Huaibo Huang
Fake videos generated by deep generation technology pose a potential threat to social stability, which makes detecting them critical. Although previous detection methods achieve high accuracy, they do not generalize well to different datasets or to realistic scenes. We find several novel temporal and spatial clues. In the frequency domain, the inter-frame differences between real and fake videos are significantly more obvious than the intra-frame differences. In the shallow texture of the CbCr color channels, the forged areas of fake videos appear noticeably more blurred than in real videos. Moreover, the optical flow of a real video changes gradually, while that of a fake video changes drastically. This paper proposes a spatio-temporal dual-Transformer network for video forgery detection that integrates spatio-temporal clues with the temporal consistency of consecutive frames to improve generalization. Specifically, an EfficientNet is first used to extract spatial artifacts from shallow textures and high-frequency information. We add a new loss function to EfficientNet to extract more robust face features and introduce an attention mechanism to enhance the extracted features. Next, a Swin Transformer captures the subtle temporal artifacts in the inter-frame spectrum difference and the optical flow. A feature interaction module is added to fuse local features and global representations. Finally, another Swin Transformer classifies the videos according to the extracted spatio-temporal features. We evaluate our method on datasets such as FaceForensics++, Celeb-DF (v2) and DFDC. Extensive experiments show that the proposed framework has high accuracy and generalization, outperforming the current state-of-the-art methods.
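The inter-frame spectrum-difference clue is easy to illustrate. Under the assumption that real footage changes smoothly between frames while forged frames are temporally inconsistent, consecutive-frame differences of log-magnitude FFT spectra separate the two regimes (synthetic toy frames here, not the paper's feature extractor):

```python
import numpy as np

def frame_spectrum(frame):
    """Log-magnitude 2-D spectrum of a grayscale frame."""
    return np.log1p(np.abs(np.fft.fft2(frame)))

def interframe_spectrum_diff(frames):
    """Mean absolute spectrum difference between consecutive frames."""
    specs = [frame_spectrum(f) for f in frames]
    return np.array([np.abs(b - a).mean() for a, b in zip(specs, specs[1:])])

rng = np.random.default_rng(2)
base = rng.normal(size=(32, 32))
# "real"-like clip: tiny changes between frames
smooth = [base + 0.01 * rng.normal(size=(32, 32)) for _ in range(4)]
# "fake"-like clip: temporally inconsistent frames
jumpy = [rng.normal(size=(32, 32)) for _ in range(4)]
d_smooth = interframe_spectrum_diff(smooth)
d_jumpy = interframe_spectrum_diff(jumpy)
```

In the paper this difference signal is fed to a Swin Transformer rather than thresholded directly.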
DOI: https://doi.org/10.1145/3581807.3581847 (published 2022-11-17)
Citations: 0
Chinese Electronic Medical Record Named Entity Recognition Based on Bi-RNN-LSTM-RNN-CRF
Chenquan Dai, Xiaobin Zhuang, Jiaxin Cai
Based on the mainstream deep learning model BiLSTM-CRF, we build an electronic-medical-record named-entity-recognition model, Bi-RNN-LSTM-RNN-CRF. We first collect an electronic medical record dataset, convert the characters into vectors with a word-vector tool, feed them into the bidirectional RNN-LSTM-RNN layers for training, and then pass the results to the CRF layer, where the loss function is computed to obtain the predictions; the time taken by this process is recorded. Finally, the above steps are repeated with the traditional BiLSTM-CRF model to compare the two models. Experimental results show that the F1 value of the Bi-RNN-LSTM-RNN-CRF model can reach 97.80%, while its recognition effect is slightly inferior to that of BiLSTM-CRF.
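The CRF layer that both models share decodes the best tag sequence with the Viterbi algorithm over the encoder's emission scores plus learned tag-transition scores. A minimal NumPy decoder (the emission/transition values below are toy numbers, not trained weights):

```python
import numpy as np

def viterbi(emissions, transitions):
    """CRF decoding: best tag sequence given emission and transition scores.

    emissions:   (T, K) per-step tag scores (e.g. from the RNN-LSTM-RNN stack).
    transitions: (K, K) score of moving from tag i to tag j.
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)       # best previous tag for each tag
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):           # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]

emissions = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
transitions = np.zeros((2, 2))
best = viterbi(emissions, transitions)      # follows the emission peaks
```

Training additionally needs the forward algorithm for the partition function; decoding alone is what turns per-character scores into a consistent entity labeling.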
DOI: https://doi.org/10.1145/3581807.3581892 (published 2022-11-17)
Citations: 0
Multi-Scale Channel Attention for Chinese Scene Text Recognition
Haiqing Liao, X. Du, Yun Wu, Da-Han Wang
Scene text recognition has proven highly effective in solving various computer vision tasks. Recently, numerous recognition algorithms based on the encoder-decoder framework have been proposed to handle scene text with perspective distortion and curved shapes. Nevertheless, most of these methods consider only single-scale features and do not take multi-scale features into account. Meanwhile, existing text recognition methods are mainly designed for English text, ignoring the pivotal role of Chinese text. In this paper, we propose an end-to-end method that integrates multi-scale features for Chinese scene text recognition (CSTR). Specifically, we adopt and customize Dense Atrous Spatial Pyramid Pooling (DenseASPP) in our backbone network to capture multi-scale features of the input image while extending the receptive fields. Moreover, we add Squeeze-and-Excitation (SE) networks to capture attentional features with global information, further improving CSTR performance. Experimental results on Chinese scene text datasets demonstrate that the proposed method efficiently mitigates the loss of contextual information caused by varying text scales and outperforms state-of-the-art approaches.
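The SE block mentioned above is a standard squeeze-excite-scale pattern: global-average-pool each channel, pass through a small bottleneck to produce per-channel gates in (0, 1), and reweight the feature map. A NumPy sketch with random (untrained) bottleneck weights:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excite(x, W1, W2):
    """Squeeze-and-Excitation channel attention on a (C, H, W) feature map.

    Squeeze: global average pool per channel.
    Excite:  two-layer bottleneck (ReLU then sigmoid) -> per-channel gates.
    Scale:   reweight each channel by its gate."""
    z = x.mean(axis=(1, 2))                       # (C,) squeeze
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))     # (C,) gates in (0, 1)
    return x * s[:, None, None]

rng = np.random.default_rng(3)
x = rng.normal(size=(8, 5, 5))
W1 = rng.normal(size=(2, 8))   # bottleneck: reduction ratio 4
W2 = rng.normal(size=(8, 2))
y = squeeze_excite(x, W1, W2)
```

Because every gate is strictly between 0 and 1, the block can only suppress channels, never amplify them; the recalibration comes from suppressing uninformative channels relative to informative ones.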
{"title":"Multi-Scale Channel Attention for Chinese Scene Text Recognition","authors":"Haiqing Liao, X. Du, Yun Wu, Da-Han Wang","doi":"10.1145/3581807.3581808","DOIUrl":"https://doi.org/10.1145/3581807.3581808","url":null,"abstract":"Scene text recognition have proven to be highly effective in solving various computer vision tasks. Recently, numerous recognition algorithms based on the encoder-decoder framework have been proposed for handling scene texts with perspective distortion and curve shape. Nevertheless, most of these methods only consider single-scale features while not taking multi-scale features into account. Meanwhile, the existing text recognition methods are mainly used for English texts, whereas ignoring Chinese texts' pivotal role. In this paper, we proposed an end-to-end method to integrate multi-scale features for Chinese scene text recognition (CSTR). Specifically, we adopted and customized the Dense Atrous Spatial Pyramid Pooling (DenseASPP) to our backbone network to capture multi-scale features of the input image while simultaneously extending the receptive fields. Moreover, we added Squeeze-and-Excitation Networks (SE) to capture attentional features with global information to improve the performance of CSTR further. 
The experimental results of the Chinese scene text datasets demonstrate that the proposed method can efficiently mitigate the impacts of the loss of contextual information caused by the text scale varying and outperforms the state-of-the-art approaches.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"13 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114029209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
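The Squeeze-and-Excitation step mentioned in the abstract gates each feature channel by a weight computed from globally pooled statistics. A minimal NumPy sketch of the idea, where the fully connected weights `w1` and `w2` are hypothetical stand-ins for the learned parameters (in the paper this would sit on top of the DenseASPP features; here it is shown standalone):

```python
import numpy as np

def squeeze_excite(feature_map, w1, w2):
    """SE channel attention on a (C, H, W) feature map.

    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights.
    Both are illustrative stand-ins for the learned FC layers.
    """
    squeezed = feature_map.mean(axis=(1, 2))           # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)            # excitation: FC + ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # FC + sigmoid gate per channel
    return feature_map * weights[:, None, None]        # rescale each channel
```

The design point is that the gate depends only on channel-wise global averages, so it injects global context at negligible spatial cost.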
Research on AIS Data Aided Ship Classification in Spaceborne SAR Images
Zhenguo Yan, Xin Song, Lei Yang
The continuous development of spaceborne synthetic aperture radar (SAR) technology has advanced research on ship classification, which plays an important role in maritime surveillance. At present, mainstream deep-learning-based ship classification in SAR images achieves state-of-the-art performance, but it depends heavily on a large number of labeled samples. Compared with SAR images, the automatic identification system (AIS) provides a large amount of data that is relatively easy to obtain and contains rich ship information. Therefore, to solve the problem of ship classification in SAR images with limited samples, this paper proposes an AIS-data-aided ship classification method. Specifically, we first train the ship classification model SMOTEBoost on AIS data and then transfer the trained model to SAR images for ship type prediction. Experimental results show that the proposed method achieves classification accuracy as high as 93%, which proves that AIS data transfer can effectively solve the problem of ship classification in SAR images with limited samples.
{"title":"Research on AIS Data Aided Ship Classification in Spaceborne SAR Images","authors":"Zhenguo Yan, Xin Song, Lei Yang","doi":"10.1145/3581807.3581833","DOIUrl":"https://doi.org/10.1145/3581807.3581833","url":null,"abstract":"The continuous development of spaceborne synthetic aperture radar (SAR) technology promotes the research of ship classification and plays an important role in maritime surveillance. At present, the mainstream ship classification based on the deep learning method in SAR images has achieved a state-of-the-art performance, but it heavily depends on plenty of labeled samples. Compared with SAR images, the automatic identification system (AIS) can provide a large amount of data that is relatively easy to obtain and contains rich ship information. Therefore, in order to solve the problem of ship classification in SAR images with limited samples, a ship object classification method by AIS data aided is proposed in this paper. Specifically, we first train the ship classification model SMOTEBoost on AIS data, and then transfer the trained model to SAR images for ship type prediction. Experimental results show that the proposed method achieves classification accuracy as high as 93%, which proves that AIS data transfer can effectively solve the problem of ship classification in SAR images with limited samples.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"20 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120903530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
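SMOTEBoost, which the abstract trains on AIS data, combines boosting with SMOTE oversampling of minority ship types. As an illustrative sketch of the SMOTE interpolation step alone (function name and parameters are hypothetical; this is not the paper's implementation), assuming NumPy:

```python
import numpy as np

def smote_oversample(minority, n_new, k=3, rng=None):
    """Generate n_new synthetic minority-class samples by interpolating
    between a random sample and one of its k nearest neighbours, the core
    idea behind the SMOTE step in SMOTEBoost.

    minority must contain more than k samples.
    """
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # k nearest neighbours by Euclidean distance (index 0 is x itself)
        d = np.linalg.norm(minority - x, axis=1)
        nn = np.argsort(d)[1:k + 1]
        neighbour = minority[rng.choice(nn)]
        gap = rng.random()                       # random point on the segment
        synthetic.append(x + gap * (neighbour - x))
    return np.array(synthetic)
```

Because each synthetic sample is a convex combination of two real samples, the new points stay inside the coordinate-wise range of the minority class rather than being arbitrary noise.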