
Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition: Latest Publications

An Augmented Reality Tracking Registration Method Based on Deep Learning
Xingya Yan, Guangrui Bai, Chaobao Tang
Augmented reality is a three-dimensional visualization technology that supports human-computer interaction: virtual information is placed in designated regions of the real world to enrich real-world information. Building on the existing augmented reality pipeline, this paper proposes a deep-learning-based augmented reality method that addresses the inaccurate positioning and model drift that markerless methods suffer under complex backgrounds, lighting changes, and partial occlusion. The proposed method uses a lightweight SSD model for target detection, the SURF algorithm to extract feature points, and the FLANN algorithm for feature matching. Experimental results show that the method effectively mitigates inaccurate positioning and model drift under these conditions while preserving the operational efficiency of the augmented reality system.
{"title":"An Augmented Reality Tracking Registration Method Based on Deep Learning","authors":"Xingya Yan, Guangrui Bai, Chaobao Tang","doi":"10.1145/3573942.3574034","DOIUrl":"https://doi.org/10.1145/3573942.3574034","url":null,"abstract":"Augmented reality is a three-dimensional visualization technology that can carry out human-computer interaction. Virtual information is placed in the designated area of the real world to enhance real-world information. Based on the existing implementation process of augmented reality, this paper proposes an augmented reality method based on deep learning, aiming at the inaccurate positioning and model drift of the augmented reality method without markers in complex backgrounds, light changes, and partial occlusion. The proposed method uses the lightweight SSD model for target detection, the SURF algorithm to extract feature points and the FLANN algorithm for feature matching. Experimental results show that this method can effectively solve the problems of inaccurate positioning and model drift under particular circumstances while ensuring the operational efficiency of the augmented reality system.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122299187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
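To make the registration stage concrete, below is a minimal OpenCV sketch of the SURF-plus-FLANN matching step on a detector-supplied region. This is a sketch under assumptions, not the paper's implementation: the SSD detection is stubbed out as a given bounding box, the file paths are placeholders, and SURF requires an OpenCV build with the nonfree contrib modules enabled.

```python
# Minimal markerless-registration sketch: SURF features + FLANN matching
# inside an SSD-detected region, then a RANSAC homography for the overlay.
import cv2
import numpy as np

def register(reference_path: str, frame_path: str, roi):
    """Estimate the homography mapping the reference template into the
    region of the frame returned by the (stubbed) SSD detector."""
    ref = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)
    frame = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)
    x, y, w, h = roi                          # bounding box from the SSD stage
    target = frame[y:y + h, x:x + w]

    # SURF lives in opencv-contrib and needs a nonfree-enabled build.
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kp_ref, des_ref = surf.detectAndCompute(ref, None)
    kp_tgt, des_tgt = surf.detectAndCompute(target, None)

    # FLANN with KD-tree indexing (algorithm=1), as in the matching stage.
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    matches = flann.knnMatch(des_ref, des_tgt, k=2)

    # Lowe's ratio test discards ambiguous correspondences.
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]
    if len(good) < 4:
        return None                           # too little support for a homography

    src = np.float32([kp_ref[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_tgt[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    dst += np.float32([x, y])                 # back to full-frame coordinates
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H                                  # pose prior for the virtual overlay
```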
Visual Correlation Filter Tracking for UAV Based on Temporal and Spatial Regularization with Boolean Maps
Na Li, Jiale Gao, Y. Liu, Yansheng Zhu, Wenhan Jiang
Object tracking is now widely used in sports broadcasting, security surveillance, and human-computer interaction. Tracking on unmanned aerial vehicle (UAV) datasets is challenging due to factors such as illumination change, appearance variation, occlusion, and motion blur. To address this, a visual correlation filter tracking algorithm based on temporal and spatial regularization is proposed. It employs Boolean maps to obtain visual attention, and fuses features such as color names (CN), histogram of oriented gradients (HOG), and gray features to enhance the visual representation. A new object-occlusion judgment method and a model update strategy are put forward to make the tracker more robust. The proposed algorithm is compared with six other trackers in terms of distance precision and success rate on UAV123, and the experimental results show that it achieves more stable and robust tracking performance.
{"title":"Visual Correlation Filter Tracking for UAV Based on Temporal and Spatial Regularization with Boolean Maps","authors":"Na Li, Jiale Gao, Y. Liu, Yansheng Zhu, Wenhan Jiang","doi":"10.1145/3573942.3574036","DOIUrl":"https://doi.org/10.1145/3573942.3574036","url":null,"abstract":"Object tracking is now widely used in sports event broadcasting, security surveillance, and human-computer interaction. It is a challenging task for tracking on unmanned aerial vehicle (UAV) datasets due to many factors such as illumination change, appearance modification, occlusion, motion blur and so on. To solve the problem, a visual correlation filter tracking algorithm based on temporal and spatial regularization is proposed. It employs boolean maps to obtain visual attention, and fuses different features such as color names (CN), histogram of oriented gradient (HOG) and Gray features to enhance the visual representation. New object occlusion judgment method and model update strategy are put forward to make the tracker more robust. The proposed algorithm is compared with other six trackers in terms of distant precision and success rate on UAV123. And the experimental results show that it achieves more stable and robust tracking performance.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129410519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
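To illustrate the correlation-filter machinery this tracker builds on, here is a toy multi-channel filter trained in closed form in the Fourier domain. The two hand-rolled feature channels merely stand in for the paper's CN/HOG/Gray fusion, and this MOSSE-style form omits the temporal-spatial regularization and Boolean-map attention that are the paper's contributions.

```python
# Toy multi-channel correlation filter: per-channel ridge regression in
# the Fourier domain; the response peak gives the new target position.
import numpy as np

def features(patch: np.ndarray) -> np.ndarray:
    """Stack simple channels: raw intensity plus gradient magnitude."""
    gy, gx = np.gradient(patch.astype(np.float64))
    return np.stack([patch, np.hypot(gx, gy)], axis=0)

def train_filter(patch, y, lam=1e-2):
    """Closed-form per-channel solution H = conj(X)·Y / (|X|^2 + lam),
    where y is a Gaussian label map centered on the target."""
    X = np.fft.fft2(features(patch))          # fft2 acts on the last two axes
    Y = np.fft.fft2(y)
    return np.conj(X) * Y / (np.abs(X) ** 2 + lam)

def respond(filt, patch):
    """Correlation response summed over channels."""
    X = np.fft.fft2(features(patch))
    return np.real(np.fft.ifft2(filt * X)).sum(axis=0)

# Usage: train on the first frame, then per frame locate the target at
# np.unravel_index(np.argmax(respond(filt, search_patch)), patch.shape).
```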
Effects of PM2.5 on the Detection Performance of Quantum Interference Radar
Lihao Tian, Min Nie, Guang Yang
To study the influence of PM2.5 particles on the detection performance of quantum interference radar, this article analyzes the relationship between PM2.5 particle concentration and the extinction coefficient for different particle sizes, based on the spectral distribution function of PM2.5 particles and Mie scattering theory. It then establishes a model of the influence of PM2.5 particles on the detection distance and maximum detection error probability of quantum interference radar. The simulation results show that as PM2.5 particle concentration increases, the extinction coefficient gradually increases; the energy of the detected photons is attenuated, reducing the photon transmission distance; when the energy of the emitted photons is held fixed, the maximum detection error probability of quantum interference radar increases with PM2.5 particle concentration; and when the PM2.5 particle concentration is held fixed, the maximum detection error probability decreases gradually as the emitted photon energy increases. Therefore, the average number of emitted photons should be adjusted according to PM2.5 pollution to reduce its impact on the detection performance of quantum interference radar.
{"title":"Effects of PM2.5 on the Detection Performance of Quantum Interference Radar","authors":"Lihao Tian, Min Nie, Guang Yang","doi":"10.1145/3573942.3574117","DOIUrl":"https://doi.org/10.1145/3573942.3574117","url":null,"abstract":"In order to study the influence of PM2.5 particles on the detection performance of quantum interference radar, this article analyzes the relationship between the concentration of PM2.5 particles and the extinction coefficient under different particle sizes based on the spectral distribution function of PM2.5 particles and the Mie scattering theory. Then establish the influence model of PM2.5 particles on the detection distance and maximum detection error probability of quantum interference radar. The simulation results show that as the concentration of PM2.5 particles increases, the extinction coefficient of PM2.5 particles shows a gradually increasing trend; the energy of the detected photons is attenuated, resulting in a decrease in the transmission distance of the photons; when the energy of the emitted photons remains unchanged, The maximum detection error probability of quantum interference radar increases with the increase of PM2.5 particle concentration; when the PM2.5 particle concentration remains unchanged, the maximum detection error probability decreases gradually with the increase of the emitted photon energy. Therefore, the average number of emitted photons should be appropriately adjusted according to PM2.5 pollution in order to reduce the impact of PM2.5 atmospheric pollution on the detection performance of quantum interference radar.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129346892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
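The attenuation mechanism at the heart of this model is Beer-Lambert extinction of the photon flux along the path. The sketch below shows the relationship; the extinction coefficients are illustrative placeholders, not the Mie-theory values computed in the paper.

```python
# Back-of-the-envelope Beer-Lambert attenuation: N = N0 * exp(-beta * L).
# Larger PM2.5 concentration -> larger extinction coefficient beta ->
# fewer surviving photons at a given range.
import numpy as np

def received_photons(n_emitted: float, beta: float, distance_km: float) -> float:
    """Mean photon number surviving a one-way path (beta in km^-1)."""
    return n_emitted * np.exp(-beta * distance_km)

for beta in (0.1, 0.5, 1.0):   # illustrative values for rising PM2.5 levels
    n = received_photons(n_emitted=100.0, beta=beta, distance_km=5.0)
    print(f"beta={beta:.1f} /km -> {n:.2f} photons survive 5 km")
```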
Single Image Dehazing Via Enhanced CycleGAN
Sheping Zhai, Yuanbiao Liu, Dabao Cheng
Due to atmospheric light scattering, images acquired by outdoor imaging devices in haze suffer from low definition, reduced contrast, overexposure, and other visible quality degradation, which makes the relevant computer vision tasks difficult to handle. Image dehazing has therefore become an important research area in computer vision. However, existing dehazing methods generally require paired datasets containing both hazy images and corresponding ground-truth images, and the recovered images are prone to color distortion and detail loss. In this study, an end-to-end image dehazing method based on Cycle-consistent Generative Adversarial Networks (CycleGAN) is proposed. To effectively learn the mapping between hazy and clear images, we refine the transformation module of the generator with weighted optimization, which improves the network's adaptability to scale. To further improve the quality of the generated images, an enhanced perceptual loss and a low-frequency loss built from image feature attributes are added to the network's overall optimization objective. The experimental results show that our dehazing algorithm effectively recovers texture information while correcting the color distortion of the original CycleGAN; the recovered images are clear and more natural, reducing the influence of haze on imaging quality.
{"title":"Single Image Dehazing Via Enhanced CycleGAN","authors":"Sheping Zhai, Yuanbiao Liu, Dabao Cheng","doi":"10.1145/3573942.3574097","DOIUrl":"https://doi.org/10.1145/3573942.3574097","url":null,"abstract":"Due to the influence of atmospheric light scattering, the images acquired by outdoor imaging device in haze scene will appear low definition, contrast reduction, overexposure and other visible quality degradation, which makes it difficult to handle the relevant computer vision tasks. Therefore, image dehazing has become an important research area of computer vision. However, existing dehazing methods generally require paired image datasets that include both hazy images and corresponding ground truth images, while the recovered images are easy to occur color distortion and detail loss. In this study, an end-to-end image dehazing method based on Cycle-consistent Generative Adversarial Networks (CycleGAN) is proposed. For effectively learning the mapping relationship between hazy images and clear images, we refine the transformation module of the generator by weighting optimization, which can promote the network adaptability to scale. Then in order to further improve the quality of generated images, the enhanced perceptual loss and low-frequency loss combined with image feature attributes are constructed in the overall optimization objective of the network. The experimental results show that our dehazing algorithm effectively recovers the texture information while correcting the color distortion of original CycleGAN, and the recovery effect is clear and more natural, which better reduces the influence of haze on the imaging quality.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129490217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
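One way such a composite generator objective could be assembled is sketched below in PyTorch (torchvision >= 0.13). The relu3_3 feature level, the pooling kernel, and the loss weights are assumptions for illustration, not the paper's values; in the unpaired setting the perceptual and low-frequency terms are computed against the generator's own hazy input.

```python
# Sketch of a CycleGAN generator objective augmented with a perceptual
# term (frozen VGG16 features) and a low-frequency term (downsampled L1).
import torch
import torch.nn.functional as F
import torchvision

vgg_feat = torchvision.models.vgg16(
    weights=torchvision.models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg_feat.parameters():
    p.requires_grad_(False)                   # feature extractor stays fixed

def perceptual_loss(dehazed, hazy):
    """Content preservation: L1 between VGG feature maps of the output
    and its (unpaired) hazy input."""
    return F.l1_loss(vgg_feat(dehazed), vgg_feat(hazy))

def low_frequency_loss(dehazed, hazy, kernel=8):
    """Compare heavily downsampled images: constrains global color and
    illumination, where unpaired GAN outputs tend to drift."""
    return F.l1_loss(F.avg_pool2d(dehazed, kernel), F.avg_pool2d(hazy, kernel))

def generator_objective(d_fake, dehazed, cycled, hazy):
    adv = F.mse_loss(d_fake, torch.ones_like(d_fake))  # LSGAN adversarial term
    cyc = F.l1_loss(cycled, hazy)                      # cycle consistency
    return (adv + 10.0 * cyc                           # assumed weights
            + perceptual_loss(dehazed, hazy)
            + low_frequency_loss(dehazed, hazy))
```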
Hyperspectral Anomaly Detection based on Autoencoder using Superpixel Manifold Constraint
Yuquan Gan, Wenqiang Li, Y. Liu, Jinglu He, Ji Zhang
In hyperspectral anomaly detection, autoencoders (AEs) have become a hot research topic due to their unsupervised nature and powerful feature extraction capability. However, autoencoders do not preserve the spatial structure of the original data well during training and are affected by anomalies, resulting in poor detection performance. To address these problems, a hyperspectral anomaly detection method based on autoencoders with superpixel manifold constraints is proposed. First, a superpixel segmentation technique is used to obtain the superpixels of the hyperspectral image, and a manifold learning method is used to learn an embedded manifold based on those superpixels. Second, the learned manifold constraints are embedded in the autoencoder to learn a latent representation that maintains the consistency of the local spatial and geometric structure of the hyperspectral image (HSI). Finally, anomalies are detected by computing the autoencoder's reconstruction errors. Extensive experiments on three datasets show that the proposed method outperforms other hyperspectral anomaly detectors.
{"title":"Hyperspectral Anomaly Detection based on Autoencoder using Superpixel Manifold Constraint","authors":"Yuquan Gan, Wenqiang Li, Y. Liu, Jinglu He, Ji Zhang","doi":"10.1145/3573942.3574108","DOIUrl":"https://doi.org/10.1145/3573942.3574108","url":null,"abstract":"In the field of hyperspectral anomaly detection, autoencoder (AE) have become a hot research topic due to their unsupervised characteristics and powerful feature extraction capability. However, autoencoders do not keep the spatial structure information of the original data well during the training process, and is affected by anomalies, resulting in poor detection performance. To address these problems, a hyperspectral anomaly detection method based on autoencoders with superpixel manifold constraints is proposed. Firstly, superpixel segmentation technique is used to obtain the superpixels of the hyperspectral image, and then the manifold learning method is used to learn the embedded manifold that based on the superpixels. Secondly, the learned manifold constraints are embedded in the autoencoder to learn the potential representation, which can maintain the consistency of the local spatial and geometric structure of the hyperspectral images (HSI). Finally, anomalies are detected by computing reconstruction errors of the autoencoder. Extensive experiments are conducted on three datasets, and the experimental results show that the proposed method has better detection performance than other hyperspectral anomaly detectors.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123665287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
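The detection step reduces to scoring each pixel's spectrum by its reconstruction error. The sketch below shows that step plus a graph-Laplacian form of the manifold penalty; the layer sizes, the penalty weight, and the construction of the Laplacian over superpixel neighbors are assumptions, not the paper's exact design.

```python
# Skeletal spectral autoencoder: anomalies score high reconstruction error.
import torch
import torch.nn as nn

class SpectralAE(nn.Module):
    def __init__(self, bands: int, hidden: int = 32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(bands, 128), nn.ReLU(),
                                 nn.Linear(128, hidden))
        self.dec = nn.Sequential(nn.Linear(hidden, 128), nn.ReLU(),
                                 nn.Linear(128, bands))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

def manifold_penalty(z: torch.Tensor, laplacian: torch.Tensor) -> torch.Tensor:
    """Graph-Laplacian smoothness tr(Z^T L Z): pulls embeddings of pixels
    that are manifold neighbors (e.g. in the same superpixel) together."""
    return torch.trace(z.T @ laplacian @ z)

def anomaly_scores(model: SpectralAE, pixels: torch.Tensor) -> torch.Tensor:
    """pixels: (N, bands) spectra -> (N,) per-pixel reconstruction errors."""
    with torch.no_grad():
        recon, _ = model(pixels)
    return ((recon - pixels) ** 2).mean(dim=1)

# Training would minimize MSE(recon, x) + mu * manifold_penalty(z, L),
# with mu an assumed trade-off weight.
```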
Multimodal Dialogue Generation Based on Transformer and Collaborative Attention
Wei Guan, Zhen Zhang, Li Ma
Current multimodal dialogue generation models are based on a single image for question-and-answer dialogue generation, so image information cannot be deeply integrated into the sentences, and the models fail to generate semantically coherent, informative, visually grounded contextual responses, which limits their application in real scenarios. This paper proposes a Deep Collaborative Attention Model (DCAN) for multimodal dialogue generation. First, the method globally encodes the dialogue context and its corresponding visual context. Second, to guide the joint learning of interactions between image and text representations, the visual context features are fused with the dialogue context features through a collaborative attention mechanism, and the Hadamard product is then used to fully fuse the multimodal features again to improve network performance. Finally, the fused features are fed into a Transformer-based decoder to generate coherent, informative responses. To address continuous dialogue in the multimodal setting, experiments are conducted on the OpenVidial2.0 dataset. The results show that the responses generated by this model have higher relevance and diversity than existing comparison models and that it effectively integrates visual context information.
{"title":"Multimodal Dialogue Generation Based on Transformer and Collaborative Attention","authors":"Wei Guan, Zhen Zhang, Li Ma","doi":"10.1145/3573942.3574091","DOIUrl":"https://doi.org/10.1145/3573942.3574091","url":null,"abstract":"In view of the fact that the current multimodal dialogue generation models are based on a single image for question-and-answer dialogue generation, the image information cannot be deeply integrated into the sentences, resulting in the inability to generate semantically coherent, informative visual contextual dialogue responses, which further limits the application of multimodal dialogue generation models in real scenarios. This paper proposes a Deep Collaborative Attention Model (DCAN) method for multimodal dialogue generation tasks. First, the method globally encode the dialogue context and its corresponding visual context information respectively; second, to guide the simultaneous learning of interactions between image and text multimodal representations, after the visual context features are fused with the dialogue context features through the collaborative attention mechanism, the hadamard product is used to fully fuse the multimodal features again to improve the network performance; finally, the fused features are fed into a transformer-based decoder to generate coherent, informative responses. in order to solve the problem of continuous dialogue in multimodal dialogue, the method of this paper uses the OpenVidial2.0 data set to conduct experiments. The results show that the responses generated by this model have higher correlation and diversity than existing comparison models, and it can effectively integrate visual context information.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114528444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
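The collaborative-attention-plus-Hadamard fusion can be pictured in a few lines of PyTorch. The dimensions, head count, and tanh projection below are assumptions for illustration, not the paper's exact DCAN layer.

```python
# Sketch: text queries attend over image features (collaborative attention),
# then an elementwise (Hadamard) product re-fuses the two modalities.
import torch
import torch.nn as nn

class CoAttentionFusion(nn.Module):
    def __init__(self, d: int = 512):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
        self.proj = nn.Linear(d, d)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        """text: (B, Lt, d) dialogue-context features;
        image: (B, Li, d) visual-context features."""
        attended, _ = self.attn(query=text, key=image, value=image)
        fused = torch.tanh(self.proj(text * attended))  # Hadamard re-fusion
        return fused                                    # input to the decoder

# Usage with dummy shapes: 10 text tokens, 49 image regions.
out = CoAttentionFusion()(torch.randn(2, 10, 512), torch.randn(2, 49, 512))
```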
Voicifier-LN: An Novel Approach to Elevate the Speaker Similarity for General Zero-shot Multi-Speaker TTS
Dengfeng Ke, Liangjie Huang, Wenhan Yao, Ruixin Hu, Xueyin Zu, Yanlu Xie, Jinsong Zhang
Speech generated by neural network-based text-to-speech (TTS) systems has become increasingly natural and intelligible. However, performance still drops noticeably when synthesizing multi-speaker speech in a zero-shot manner, especially for speakers from different countries with different accents. To bridge this gap, we propose a novel method called Voicifier. It first operates on the high-frequency mel-spectrogram bins to approximately remove content and rhythm. Voicifier then uses two strategies, from shallow to deep mixing, to further destroy content and rhythm while retaining timbre. Furthermore, for better zero-shot performance, we propose Voice-Pin Layer Normalization (VPLN), which pins down the timbre according to the text feature. During inference, the model can synthesize high-quality, high-similarity speech from only about one second of target speech audio. Experiments and ablation studies show that the methods retain more of the target timbre while discarding much more of the content- and rhythm-related information. To the best of our knowledge, the methods are universal: they can be applied to most existing TTS systems to enhance cross-speaker synthesis.
{"title":"Voicifier-LN: An Novel Approach to Elevate the Speaker Similarity for General Zero-shot Multi-Speaker TTS","authors":"Dengfeng Ke, Liangjie Huang, Wenhan Yao, Ruixin Hu, Xueyin Zu, Yanlu Xie, Jinsong Zhang","doi":"10.1145/3573942.3574120","DOIUrl":"https://doi.org/10.1145/3573942.3574120","url":null,"abstract":"Speeches generated from neural network-based Text-to-Speech (TTS) have been becoming more natural and intelligible. However, the evident dropping performance still exists when synthesizing multi-speaker speeches in zero-shot manner, especially for those from different countries with different accents. To bridge this gap, we propose a novel method, called Voicifier. It firstly operates on high frequency mel-spectrogram bins to approximately remove the content and rhythm. Then Voicifier uses two strategies, from the shallow to the deep mixing, to further destroy the content and rhythm but retain the timbre. Furthermore, for better zero-shot performance, we propose Voice-Pin Layer Normalization (VPLN) which pins down the timbre according with the text feature. During inference, the model is allowed to synthesize high quality and similarity speeches with just around 1 sec target speech audio. Experiments and ablation studies prove that the methods are able to retain more target timbre while abandoning much more of the content and rhythm-related information. To our best knowledge, the methods are found to be universal that is to say it can be applied to most of the existing TTS systems to enhance the ability of cross-speaker synthesis.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114620721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
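The high-frequency bin operation can be pictured with a short NumPy sketch. This is only one plausible reading of the mechanism for illustration: the 80-bin layout, the bin cutoff, the damping factor, and the frame shuffling are all assumptions, not the paper's shallow-to-deep mixing strategies.

```python
# Illustrative timbre-preserving scramble of a mel-spectrogram: damp the
# upper bins (which carry relatively more content/rhythm cues) and break
# temporal order, while per-bin energy statistics that cue timbre survive.
import numpy as np

rng = np.random.default_rng(0)

def scramble_for_timbre(mel: np.ndarray, cut: int = 60) -> np.ndarray:
    """mel: (n_mels, frames), e.g. n_mels=80. Returns a scrambled copy."""
    out = mel.copy()
    out[cut:, :] *= 0.1                           # attenuate high-frequency bins
    out = out[:, rng.permutation(out.shape[1])]   # destroy rhythm/content order
    return out
```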
Incremental Encoding Transformer Incorporating Common-sense Awareness for Conversational Sentiment Recognition
Xiao Yang, Xiaopeng Cao, Hao Liang
Conversational sentiment recognition is widely used in people's lives and work. However, machines do not understand emotions through common-sense cognition. We propose an Incremental Encoding Transformer Incorporating Common-sense Awareness (IETCA) model, which helps machines use common-sense knowledge to better understand emotions in conversation. The model uses a context-aware graph attention mechanism to obtain knowledge-rich utterance representations and an incremental encoding Transformer to obtain rich contextual representations. Experiments on five datasets show that the model yields improvements in conversational sentiment recognition.
{"title":"Incremental Encoding Transformer Incorporating Common-sense Awareness for Conversational Sentiment Recognition","authors":"Xiao Yang, Xiaopeng Cao, Hao Liang","doi":"10.1145/3573942.3573965","DOIUrl":"https://doi.org/10.1145/3573942.3573965","url":null,"abstract":"Conversational sentiment recognition has been widely used in people's lives and work. However, machines do not understand emotions through common-sense cognition. We propose an Incremental Encoding Transformer Incorporating Common-sense Awareness (IETCA) model. The model helps the machines use common-sense knowledge to better understand emotions in conversation. The model uses a context-aware graph attention mechanism to obtain knowledge-rich utterance representations and uses an incremental encoding Transformer to get rich contextual representations. We do some experiments on five datasets. The results show that the model has some improvement in conversational sentiment recognition.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"169 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113987211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
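As a rough illustration of how retrieved common-sense concepts can be folded into an utterance representation via attention, consider the sketch below. The retrieval of concept embeddings (e.g. from a resource like ConceptNet) is stubbed, and the dot-product scoring with a residual sum is an assumption, not the paper's exact context-aware graph attention.

```python
# Sketch: attend over retrieved common-sense neighbors of an utterance,
# weighted by the current context vector, and enrich the representation.
import torch
import torch.nn.functional as F

def knowledge_attention(context: torch.Tensor, neighbors: torch.Tensor):
    """context: (d,) utterance/context vector; neighbors: (K, d) embeddings
    of retrieved concepts. Returns a knowledge-enriched (d,) vector."""
    scores = neighbors @ context                  # (K,) relevance logits
    weights = F.softmax(scores, dim=0)            # attention over concepts
    knowledge = weights @ neighbors               # (d,) attended concept mix
    return context + knowledge                    # residual enrichment

enriched = knowledge_attention(torch.randn(256), torch.randn(12, 256))
```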
A Dual-Task Deep Neural Network for Scene and Action Recognition Based on 3D SENet and 3D SEResNet
Zhouzhou Wei, Yuelei Xiao
To address the problem that scene information becomes noise and interferes with the feature extraction stage of action recognition, a dual-task deep neural network model for scene and action recognition is proposed. The model first uses a convolutional layer and a max-pooling layer as shared layers to extract low-dimensional features, then uses a 3D SEResNet for action recognition and a 3D SENet for scene recognition, and finally outputs their respective results. In addition, because existing public datasets do not associate actions with scenes, we build our own scene and action dataset (SAAD) for recognition. Experimental results show that our method outperforms other methods on the SAAD dataset.
{"title":"A Dual-Task Deep Neural Network for Scene and Action Recognition Based on 3D SENet and 3D SEResNet","authors":"Zhouzhou Wei, Yuelei Xiao","doi":"10.1145/3573942.3574077","DOIUrl":"https://doi.org/10.1145/3573942.3574077","url":null,"abstract":"Aiming at the problem that scene information will become noise and cause interference in the feature extraction stage of action recognition, a dual-task deep neural network model for scene and action recognition is proposed. The model first uses a convolutional layer and max pooling layer as shared layers to extract low-dimensional features, then uses 3D SEResNet for action recognition and 3D SENet for scene recognition, and finally outputs their respective results. In addition, to solve the problem that the existing public dataset is not associated with the scene, a scene and action dataset (SAAD) for recognition is built by ourselves. Experimental results show that our method performs better than other methods on SAAD dataset.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127736322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
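The building block named in the title is the squeeze-and-excitation unit extended to 3D. Below is a minimal PyTorch sketch of such a block; the reduction ratio of 16 follows the original SENet paper, and everything else is an assumption rather than this model's exact configuration.

```python
# Sketch of a 3D squeeze-and-excitation block: global-average "squeeze"
# over (T, H, W), then a bottleneck MLP produces per-channel gates.
import torch
import torch.nn as nn

class SEBlock3D(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)      # squeeze: global 3D context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, T, H, W)
        b, c = x.shape[:2]
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1, 1)
        return x * w                             # excitation: reweight channels

# Usage on a dummy clip feature map: 8 frames of 28x28 with 64 channels.
y = SEBlock3D(64)(torch.randn(2, 64, 8, 28, 28))
```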
Neural Network Prediction Model Based on Differential Localization
Yuanhua Liu, Ruini Li, Xinliang Niu
Global Navigation Satellite System Reflectometry (GNSS-R) signals are affected by buildings, trees, and other obstacles during transmission, which introduces large errors. The traditional approach uses differential processing to eliminate most of these errors and improve positioning accuracy. This paper proposes a neural network prediction model based on differential results: the differential results X, Y, and Z are used as the inputs of the neural network to predict the satellite position, and the prediction is finally compared with the true value. Artificial Neural Networks (ANN), Recurrent Neural Networks (RNN), and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are used to establish training models and make predictions. The results show that, compared with the ANN model, the Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) of the RNN model are reduced by 1.54% and 3.59%, respectively; compared with the RNN model, the MAPE and RMSE of the LSTM-RNN model are reduced by 21.16% and 14.81%, respectively, indicating that the LSTM-RNN achieves better training accuracy and fit.
{"title":"Neural Network Prediction Model Based on Differential Localization","authors":"Yuanhua Liu, Ruini Li, Xinliang Niu","doi":"10.1145/3573942.3573960","DOIUrl":"https://doi.org/10.1145/3573942.3573960","url":null,"abstract":"The Global Navigation Satellite System-Reflectometry (GNSS-R) is affected by buildings, trees, etc. during the transmission process, which generates large errors. The traditional method is to use differential to eliminate most of the errors to improve positioning accuracy. In this paper, a neural network prediction model based on differential results is proposed, which uses the differential results X, Y and Z as the inputs of the neural network to predict the satellite position, and finally compare it with the real value. The paper uses Artificial Neural Network (ANN), Recurrent Neural Network (RNN) and Long Short Term Memory-Recurrent Neural Network (LSTM-RNN) are used to establish training models and make predictions. The results show that compared with the ANN model, the Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) of the RNN model are reduced by 1.54% and 3.59%, respectively; compared with the RNN model, the MAPE and RMSE of the LSTM-RNN model are reduced by 21.16% and 14.81%, respectively, which proves that the training accuracy and fit of the LSTM-RNN are better.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126429399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
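A minimal PyTorch sketch of the LSTM-RNN variant and the two reported metrics follows. The window length, hidden size, and single-layer layout are assumptions; only the input/output signature (differential X, Y, Z in, next-epoch position out) and the MAPE/RMSE definitions follow the abstract.

```python
# Sketch: a window of differential solutions (X, Y, Z) predicts the next
# epoch's position; MAPE and RMSE are the comparison metrics used above.
import torch
import torch.nn as nn

class PosLSTM(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        """seq: (B, window, 3) differential X/Y/Z history."""
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])      # (B, 3) next-epoch X, Y, Z

def mape(pred: torch.Tensor, true: torch.Tensor) -> torch.Tensor:
    return 100.0 * ((pred - true).abs() / true.abs()).mean()

def rmse(pred: torch.Tensor, true: torch.Tensor) -> torch.Tensor:
    return ((pred - true) ** 2).mean().sqrt()

# Usage with dummy data: batch of 4 windows of 10 epochs each.
pred = PosLSTM()(torch.randn(4, 10, 3))
```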