
Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition: Latest Publications

Real-Time Calibration Method of Air Quality Data Based on AdaBoost Training Model
Xuejing Jiang, Xun Sun, Qiuming Liu
At present, many cities face the predicament of being "besieged by garbage", and existing waste-disposal systems can no longer cope with increasingly complex demands. With the development of a new generation of Internet of Things (IoT) technology, which integrates knowledge and techniques from related disciplines such as networking and geographic information, it has become possible to build a real-time platform for monitoring seepage and odor in landfills. This paper reviews the relevant IoT technologies and proposes a design for an online odor and seepage monitoring system for the Maiyuan landfill in Nanchang City. Five monitoring items were selected and data collection was completed; the collected data were then analyzed and used for prediction with Matlab and Python. Finally, the paper discusses the main factors behind the environmental pollution and possible treatment measures, providing theoretical guidance to help managers improve overall decision-making for urban domestic landfills and drawing several practical conclusions.
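The abstract does not describe the AdaBoost calibration named in the title, so as background only, here is a minimal sketch of the sample-reweighting step of the standard AdaBoost.R2 algorithm for regression (e.g. calibrating sensor readings against reference values); this is textbook AdaBoost.R2, not the authors' implementation:

```python
def adaboost_r2_reweight(weights, errors):
    """One AdaBoost.R2 round: upweight the samples the current weak
    calibrator fits badly, so the next round focuses on them.

    weights: current sample weights (sum to 1)
    errors:  absolute residuals |reference - prediction| per sample
    Returns (new_weights, beta); the weak learner votes with log(1/beta).
    """
    max_err = max(errors)
    losses = [e / max_err for e in errors]                  # linear loss in [0, 1]
    avg_loss = sum(w * l for w, l in zip(weights, losses))  # weighted average loss
    beta = avg_loss / (1.0 - avg_loss)                      # confidence measure, < 1 if avg_loss < 0.5
    raw = [w * beta ** (1.0 - l) for w, l in zip(weights, losses)]
    total = sum(raw)
    return [w / total for w in raw], beta

# toy example: sample 2 has the largest residual, so its relative weight grows
w, beta = adaboost_r2_reweight([0.25] * 4, [0.1, 0.2, 0.8, 0.4])
```

Because `beta < 1`, samples with small loss are multiplied by `beta` raised to a larger exponent and shrink, while the worst-fit sample keeps its weight, which is the boosting effect.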
DOI: 10.1145/3581807.3581882 | Published: 2022-11-17 | Citations: 0
Design and Simulation of A RF Front-end Circuit of Dual Channel Navigation Receiver
Yu Zhang, Qiang Wu, Jie Liu
RF front-end design is one of the most important steps in receiver design: its noise performance strongly affects the noise characteristics of the received signal, the baseband processing performance, and ultimately the positioning accuracy. The theoretical IF of the received satellite signal and the frequency search range of the acquisition algorithm both depend on the frequency plan of the RF front-end. This paper presents the RF front-end design of a dual-channel multi-mode multi-frequency receiver, targeting the B1I and L1C frequency points. The circuit design of the receiver's RF section is described in detail, together with solutions for link attenuation, signal radiation, electromagnetic interference, and other impairments of the RF link. The link is simulated in ADS to verify the reliability of the design. The final product was tested with the corresponding hardware and software and achieved the expected performance.
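The theoretical IF mentioned above is simply the difference between the received carrier and the local oscillator. As a sketch (the LO value below is an assumption for illustration, not the authors' frequency plan; the carrier frequencies are the standard published values):

```python
# Standard L-band carrier frequencies
F_B1I = 1561.098e6   # BeiDou B1I carrier, Hz
F_L1C = 1575.420e6   # GPS L1C carrier, Hz

def intermediate_freq(f_rf, f_lo):
    """Theoretical IF after downconversion: |f_RF - f_LO|.
    The acquisition algorithm searches Doppler bins around this value."""
    return abs(f_rf - f_lo)

F_LO = 1552.0e6      # hypothetical shared local oscillator
if_b1i = intermediate_freq(F_B1I, F_LO)   # 9.098 MHz
if_l1c = intermediate_freq(F_L1C, F_LO)   # 23.42 MHz
```

A single shared LO placed below both carriers, as assumed here, is one common way to serve two frequency points with one synthesizer; the resulting IFs then differ per channel.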
DOI: 10.1145/3581807.3581905 | Published: 2022-11-17 | Citations: 0
Simulation and Design of a 56Gbps Cross-backplane Transmission Channel
Kai Yao, Qiang Wu, Jinling Cui
Over the past decade, new technologies and applications such as artificial intelligence, cloud services, and big data have led to an exponential increase in Internet-connected devices and data traffic, placing higher demands on the bandwidth and stability of data transmission. To achieve device integration and miniaturization, a backplane is commonly used to interconnect board- and card-level systems and serves as the basis for data exchange. However, long-distance transmission across the backplane causes severe losses and a variety of signal-integrity problems. In recent years, with the development of serial transmission, 56Gbps PAM4 modulation has gradually surpassed the transmission efficiency of 28Gbps NRZ. The change of modulation brings an inherent loss of 9.5dB, and PAM4 places more stringent requirements on signal integrity. In this paper, ADS is used to model and simulate the cross-backplane long-distance transmission channel, and a high-speed transmission channel design scheme based on the OIF CEI-56G-LR specification is established, covering board material, laminate stackup, and via design.
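The 9.5dB "inherent loss" cited above follows directly from the modulation change: PAM4 stacks three eyes into the same voltage swing, so each eye has one third of the NRZ eye amplitude. A one-line check:

```python
import math

# PAM4 splits the signal swing into three stacked eyes, so each eye opening
# is 1/3 of the NRZ eye amplitude. In dB terms this is the inherent SNR
# penalty the abstract cites:
pam4_penalty_db = 20 * math.log10(3)   # about 9.54 dB
```

This penalty is why a PAM4 link budget at 56Gbps must be roughly 9.5dB tighter than an NRZ budget over the same channel, before equalization is considered.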
DOI: 10.1145/3581807.3581867 | Published: 2022-11-17 | Citations: 0
GANExplainer: Explainability Method for Graph Neural Network with Generative Adversarial Nets
Xinrui Kang, Dong Liang, Qinfeng Li
In recent years, graph neural networks (GNNs) have achieved encouraging performance on graph data generated in non-Euclidean spaces. GNNs learn node features by aggregating and combining neighbor information and have been applied to many graph tasks. However, these complex deep-learning structures are still regarded as black boxes, making it difficult for them to earn full human trust, and this lack of interpretability greatly limits the application of GNNs. We therefore propose an interpretability method, called GANExplainer, to explain GNNs at the model level. Our method implicitly generates the characteristic subgraph of the graph, without relying on specific input examples, as the model's explanation of the data. GANExplainer uses the generative-adversarial framework to train a generator and a discriminator simultaneously. More importantly, when constructing the discriminator, corresponding graph rules are added to ensure the validity of the generated characteristic subgraph. We carried out experiments on synthetic and chemical-molecule datasets and verified the effectiveness of our method as a model-level interpreter in terms of accuracy, fidelity, and sparsity.
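The abstract does not say which graph rules the discriminator enforces; a plausible example of such a validity check (an assumption for illustration, not the paper's rule set) is requiring a generated characteristic subgraph to be a single connected component:

```python
from collections import deque

def is_connected(adj):
    """One graph rule a discriminator could apply to a generated subgraph:
    a valid explanation subgraph should form one connected component.
    adj: dict mapping node -> iterable of neighbour nodes (undirected).
    """
    if not adj:
        return False
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:                       # breadth-first traversal
        node = queue.popleft()
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(adj)       # reached every node?

valid   = {0: [1], 1: [0, 2], 2: [1]}   # path graph: connected
invalid = {0: [1], 1: [0], 2: []}       # node 2 is isolated
```

A failed rule check can then be fed back as an extra penalty term in the discriminator's loss, steering the generator toward structurally valid subgraphs.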
DOI: 10.1145/3581807.3581850 | Published: 2022-11-17 | Citations: 0
MMOT: Motion-Aware Multi-Object Tracking with Optical Flow
Haodong Liu, Tianyang Xu, Xiaojun Wu
Modern multi-object tracking (MOT) has benefited from recent advances in deep neural networks and large video datasets. However, several challenges still impede further improvement of tracking performance, including complex backgrounds, fast motion, and occlusion. In this paper, we propose a new framework that employs motion information from optical flow, enabling the tracker to directly distinguish foreground from background regions. The proposed end-to-end network consists of two branches that separately model spatial feature representations and optical-flow motion patterns. We propose different fusion mechanisms for combining the motion cues with appearance information. Results on the MOT17 dataset show that our method models temporal-spatial information effectively.
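The core intuition of using optical flow to separate foreground from background can be sketched very simply (a toy thresholding scheme, assumed for illustration; the paper's network learns this separation rather than thresholding):

```python
import math

def foreground_mask(flow, threshold=1.0):
    """Mark a pixel as foreground when its optical-flow magnitude exceeds
    a threshold: moving objects produce large flow vectors, while static
    background produces small ones. `flow` is a 2D grid of (dx, dy) vectors.
    """
    return [[math.hypot(dx, dy) > threshold for dx, dy in row] for row in flow]

flow = [[(0.1, 0.0), (2.0, 1.0)],
        [(0.0, 0.2), (0.3, 3.0)]]
mask = foreground_mask(flow)   # only the fast-moving pixels count as foreground
```

In a learned two-branch network, the flow branch plays the role of this mask, letting the fusion stage weight appearance features by how likely each region is to be a moving object.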
DOI: 10.1145/3581807.3581824 | Published: 2022-11-17 | Citations: 0
Improved Fusion of Visual and Semantic Representations by Gated Co-Attention for Scene Text Recognition
Junwei Zhou, Xi Wang, Jiao Dai, Jizhong Han
Recognizing the varied forms in which text occurs in scene photos remains difficult. In recent years, the performance of attention-based text recognition models has vastly improved; however, these models typically focus only on salient image regions or visual attention. In this paper, we present a new paradigm for scene text recognition named gated co-attention, with which visual and semantic attention can be jointly reasoned. Given visual features extracted by a convolutional network and semantic features extracted by a language model, the first step combines the two sets of features. Second, the gated co-attention stage eliminates irrelevant visual characteristics and incorrect semantic data before fusing the knowledge of the two modalities. We analyze the performance of our model on several benchmarks; the experimental results demonstrate outstanding performance on all seven datasets, with the best results reached on four of them.
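A gating unit of the kind described above can be sketched per dimension: a sigmoid gate decides how much of the visual versus the semantic feature survives before fusion. The scalar weights below are made up for illustration; in the paper they would be learned parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(visual, semantic, w_v, w_s, bias):
    """Minimal gated-fusion sketch: per dimension, compute a gate value
    g in (0, 1) from both modalities, then blend them. g near 1 trusts
    the visual feature; g near 0 trusts the semantic feature.
    """
    fused = []
    for v, s, wv, ws, b in zip(visual, semantic, w_v, w_s, bias):
        g = sigmoid(wv * v + ws * s + b)
        fused.append(g * v + (1.0 - g) * s)
    return fused

out = gated_fusion([0.9, 0.1], [0.2, 0.8], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0])
```

Because the output is a convex combination of the two inputs, each fused value always lies between the visual and semantic values, which is what lets the gate suppress an irrelevant modality without discarding it entirely.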
DOI: 10.1145/3581807.3581837 | Published: 2022-11-17 | Citations: 0
Semantic Maximum Relevance and Modal Alignment for Cross-Modal Retrieval
Pingping Sun, Baohua Qiang, Zhiguang Liu, Xianyi Yang, Guangyong Xi, Weigang Liu, Ruidong Chen, S. Zhang
With the increasing abundance of multimedia data resources, research on mining the relationships between modalities to achieve refined cross-modal retrieval is gradually emerging. In this paper, we propose a novel Semantic Maximum Relevance and Modal Alignment (SMR-MA) method for cross-modal retrieval. It uses a pre-trained model rich in image-text information to extract features for each image and text, and further promotes information interaction between modalities within the same semantic category through a modal alignment module and a weight-sharing multi-layer perceptron. In addition, the multi-modal embeddings are distributed on a normalized hypersphere, and an angular margin penalty is applied between feature embeddings and class weights in angular space to maximize the classification boundary, increasing both intra-class similarity and inter-class difference. Comprehensive experiments on three benchmark datasets demonstrate that the proposed method performs strongly on cross-modal retrieval tasks and significantly outperforms state-of-the-art cross-modal retrieval methods.
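The angular margin penalty on a normalized hypersphere can be sketched in the ArcFace style (an assumption about the exact form; the paper's formulation may differ in details such as margin and scale values):

```python
import math

def angular_margin_logit(embedding, class_weight, margin=0.5, scale=30.0):
    """Additive angular margin sketch: with both vectors L2-normalized,
    their dot product is cos(theta). Adding margin m to the angle of the
    ground-truth class shrinks its logit, forcing the network to pull
    same-class embeddings closer to widen the decision boundary.
    """
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    e, w = normalize(embedding), normalize(class_weight)
    cos_theta = sum(a * b for a, b in zip(e, w))
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp for safety
    return scale * math.cos(theta + margin)

plain   = angular_margin_logit([1.0, 0.2], [1.0, 0.0], margin=0.0)
penalty = angular_margin_logit([1.0, 0.2], [1.0, 0.0], margin=0.5)
```

The penalized logit is always smaller than the plain one for the true class, which is exactly what pushes intra-class similarity up and inter-class similarity down during training.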
DOI: 10.1145/3581807.3581857 | Published: 2022-11-17 | Citations: 0
Research on Phoneme Recognition using Attention-based Methods
Yupei Zhang
A phoneme is the smallest sound unit of a language, and every language has its own set of phonemes. Phoneme recognition can be used in speech-based applications such as automatic speech recognition and lip sync. This paper proposes an end-to-end deep learning model combining Connectionist Temporal Classification (CTC) with an attention-based seq2seq network, consisting of one bi-GRU layer in the encoder and one GRU layer in the decoder, for recognizing phonemes in speech. Experiments on the TIMIT dataset demonstrate its advantages over some other seq2seq networks, with over 50% improvement after applying the attention mechanism.
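The CTC side of such a model maps per-frame label paths to phoneme sequences with a fixed collapsing rule: merge repeated symbols, then drop blanks. This is the standard CTC many-to-one mapping, not the authors' full decoder:

```python
def ctc_collapse(path, blank="-"):
    """CTC decoding rule: a symbol is emitted only when it differs from the
    previous frame's symbol and is not the blank. Repeats separated by a
    blank are kept as two emissions.
    """
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return out

# per-frame network outputs for the word "sit" in TIMIT-style phone labels
phones = ctc_collapse(["-", "s", "s", "-", "ih", "ih", "t", "-"])
```

The blank symbol is what lets CTC represent genuinely repeated phonemes: `["a", "-", "a"]` collapses to two `a`s, while `["a", "a"]` collapses to one.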
DOI: 10.1145/3581807.3581866 | Published: 2022-11-17 | Citations: 0
Tongue Image Retrieval Based On Reinforcement Learning
A. Farooq, Xinfeng Zhang
In Chinese medicine, the patient's body constitution plays a crucial role in determining the course of treatment because it is intrinsically linked to the patient's physiological and pathological processes. Traditional Chinese medicine practitioners use tongue diagnosis to determine a person's constitutional type during an examination. Before a tongue-image constitution recognition system can be deployed on a non-invasive mobile device for fast, efficient, and accurate recognition, an effective solution is needed to overcome the complexity of this setting. We propose a new method for image retrieval systems based on Deep Deterministic Policy Gradient (DDPG), aiming to boost the precision of database searches for query images. Our strategy uses the complexity of individual instances to split the dataset into two subsets for independent classification with deep reinforcement learning. Experiments are performed on tongue datasets, containing images of tongues affected by a wide range of disorders and classified into five distinct categories, to gauge the efficacy of the suggested approach. The experimental results suggest that the new way of computing the main colour histogram outperforms the prior one; although the difference is statistically small, the improved retrieval quality is clear to the human eye.
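The abstract does not detail its DDPG setup, so as general background only, here is the soft target update that is characteristic of DDPG (parameters reduced to plain lists of floats for the sketch):

```python
def soft_update(target, online, tau=0.005):
    """DDPG keeps slow-moving target copies of the actor and critic; after
    each learning step the target parameters track the online networks via
    theta_target <- tau * theta_online + (1 - tau) * theta_target,
    which stabilizes the bootstrapped critic targets.
    """
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]

target = [0.0, 1.0]
online = [1.0, 0.0]
target = soft_update(target, online, tau=0.1)   # moves 10% toward online
```

With the usual small `tau`, the target network changes only slightly per step, so the retrieval policy is trained against a slowly varying value estimate.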
{"title":"Tongue Image Retrieval Based On Reinforcement Learning","authors":"A. Farooq, Xinfeng Zhang","doi":"10.1145/3581807.3581848","DOIUrl":"https://doi.org/10.1145/3581807.3581848","url":null,"abstract":"In Chinese medicine, the patient's body constitution plays a crucial role in determining the course of treatment because it is so intrinsically linked to the patient's physiological and pathological processes. Traditional Chinese medicine practitioners use tongue diagnosis to determine a person's constitutional type during an examination. An effective solution is needed to overcome the complexity of this setting before the tongue image constitution recognition system can be deployed on a non-invasive mobile device for fast, efficient, and accurate constitution recognition. We will use deep deterministic policy gradients to implement tongue retrieval techniques. We suggested a new method for image retrieval systems based on Deep Deterministic Policy Gradients (DDPG) in an effort to boost the precision of database searches for query images. We present a strategy for enhancing image retrieval accuracy that uses the complexity of individual instances to split the dataset into two subsets for independent classification using Deep reinforcement learning. Experiments on tongue datasets are performed to gauge the efficacy of our suggested approach; in these experiments, deep reinforcement learning techniques are applied to develop a retrieval system for pictures of tongues affected by various disorders. Using our proposed strategy, it may be possible to enhance image retrieval accuracy through enhanced recognition of tongue diseases. Databases containing pictures of tongues affected by a wide range of disorders will be used as examples. The experimental results suggest that the new approach to computing the main colour histogram outperforms the prior one. Though the difference is tiny statistically, the enhanced retrieval impact is clear to the human eye. 
The tongue is similarly brought to the fore to emphasise the importance of the required verbal statement. Both investigations used tongue images classified into five distinct categories.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121905344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
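The abstract above credits its retrieval gains to a better way of computing the "main colour histogram" compared across tongue images. The paper's DDPG training loop is not reproduced here, but the histogram-matching retrieval step such a system builds on can be sketched as follows — all function names and the toy pixel data are illustrative, not taken from the paper:

```python
from collections import Counter

def colour_histogram(pixels, bins=4):
    """Quantise RGB pixels into bins**3 buckets and return a
    normalised histogram (a simple 'main colour' descriptor)."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical distributions."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))

def retrieve(query, database):
    """Return database keys ranked by colour similarity to the query image."""
    qh = colour_histogram(query)
    scored = [(histogram_similarity(qh, colour_histogram(img)), name)
              for name, img in database.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Toy 'images' represented as flat lists of RGB pixels
reddish = [(200, 40, 40)] * 50 + [(180, 60, 60)] * 50
pale = [(230, 200, 200)] * 100
query = [(210, 50, 50)] * 100

db = {"reddish_tongue": reddish, "pale_tongue": pale}
print(retrieve(query, db))  # reddish_tongue ranks first
```

In the paper's setting, the DDPG agent would sit on top of a descriptor like this, learning how to split and search the database rather than relying on a fixed similarity alone.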
ResAsapp: An Effective Convolution to Distinguish Adjacent Pixels For Scene Text Detection
Kangming Weng, X. Du, Kunze Chen, Dahan Wang, Shunzhi Zhu
The segmentation-based approach is an important direction in scene text detection: it can detect text of arbitrary shape, including curved text, and has attracted increasing attention from researchers. However, extensive research has shown that segmentation-based methods are disturbed by adjoining pixels and cannot effectively identify text boundaries. To tackle this problem, we propose ResAsapp Conv, built on the PSE algorithm. This convolution structure provides visual fields of different scales for an object, allowing the network to recognize text boundaries effectively. The method's effectiveness is validated on three benchmark datasets: CTW1500, Total-Text, and ICDAR2015. In particular, on CTW1500, a dataset full of long curved text in all kinds of scenes that is hard to distinguish, our network achieves an F-measure of 81.2%.
{"title":"ResAsapp: An Effective Convolution to Distinguish Adjacent Pixels For Scene Text Detection","authors":"Kangming Weng, X. Du, Kunze Chen, Dahan Wang, Shunzhi Zhu","doi":"10.1145/3581807.3581854","DOIUrl":"https://doi.org/10.1145/3581807.3581854","url":null,"abstract":"The segmentation-based approach is an essential direction of scene text detection, and it can detect arbitrary or curved text, which has attracted the increasing attention of many researchers. However, extensive research has shown that the segmentation-based method will be disturbed by adjoining pixels and cannot effectively identify the text boundaries. To tackle this problem, we proposed a ResAsapp Conv based on the PSE algorithm. This convolution structure can provide different scale visual fields about the object and make it effectively recognize the boundary of texts. The method's effectiveness is validated on three benchmark datasets, CTW1500, Total-Text, and ICDAR2015 datasets. In particular, on the CTW1500 dataset, a dataset full of long curve text in all kinds of scenes, which is hard to distinguish, our network achieves an F-measure of 81.2%.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125191899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
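The ResAsapp abstract attributes its boundary recognition to "different scale visual fields" from one convolution structure. A standard way to obtain that effect — and a plausible reading of the design, though not the paper's actual layer — is to run the same small kernel at several dilation rates in parallel, so a 3-tap kernel covers a receptive field of (len(kernel)−1)·d + 1 samples at dilation d. A minimal 1-D sketch (all names here are illustrative):

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D convolution whose kernel taps are spaced
    `dilation` apart; receptive field = (len(kernel)-1)*dilation + 1."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]

def multi_scale(signal, kernel, rates=(1, 2, 4)):
    """ASPP-style parallel branches: one kernel applied at several
    dilation rates, yielding features with different receptive fields."""
    return {d: dilated_conv1d(signal, kernel, d) for d in rates}

signal = [0, 0, 1, 1, 1, 0, 0, 1, 0]   # toy binary "text mask" row
edge = [-1, 0, 1]                      # simple edge-detecting kernel
for d, out in multi_scale(signal, edge).items():
    print(f"dilation={d}, receptive field={2 * d + 1}: {out}")
```

Larger dilation rates see wider context with no extra parameters, which is how such a branch can separate adjoining text instances that a single-scale convolution would merge.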
Journal
Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition