At present, many cities are "besieged by garbage", and existing waste-disposal systems can no longer cope with increasingly complex conditions. With the development of a new generation of Internet of Things (IoT) technology, and by integrating knowledge and techniques from related disciplines such as networking and geographic information systems, it is possible to build a real-time platform that monitors seepage and odor at landfills and also performs gas monitoring. This paper reviews the relevant IoT technologies and proposes a design for an online odor and seepage monitoring system at the Maiyuan garbage dump in Nanchang City. Five monitoring parameters were selected and data were collected, and the collected data were then used for simulation analysis and prediction in Matlab and Python. Finally, the paper discusses the main factors behind the environmental pollution and the corresponding treatment measures, draws several practical conclusions, and provides theoretical guidance to help managers improve decision-making for urban domestic garbage dumps.
Xuejing Jiang, Xun Sun, Qiuming Liu. "Real-Time Calibration Method of Air Quality Data Based on AdaBoost Training Model." Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581882
RF front-end design is one of the most important steps in receiver design: the front end's noise performance significantly affects the noise characteristics of the received signal, baseband processing performance, final positioning accuracy and other metrics. The theoretical intermediate-frequency (IF) value of the received satellite signal, and the frequency search range of the signal acquisition algorithm, both depend on the frequency plan of the RF front end. This paper presents the RF front-end design of a dual-channel, multi-mode, multi-frequency receiver targeting mainly the B1I and L1C frequency points. The circuit design of the receiver's RF section is described in detail, together with solutions for link attenuation, signal radiation, electromagnetic interference and other issues in the RF link. Link simulations carried out in ADS verify the reliability of the design, and hardware and software tests of the final product show that the expected performance is achieved.
Yu Zhang, Qiang Wu, Jie Liu. "Design and Simulation of A RF Front-end Circuit of Dual Channel Navigation Receiver." Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581905
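The abstract notes that the theoretical IF and the acquisition search range both follow from the front end's frequency plan. The Python sketch below illustrates that dependence for a single mixing stage, where IF = |f_RF - f_LO| and the acquisition algorithm searches the nominal IF plus or minus a Doppler uncertainty. The carrier values are the published BeiDou B1I (1561.098 MHz) and GPS L1 (1575.42 MHz, shared by L1C) frequencies; the LO frequencies and the 5 kHz Doppler bound are hypothetical values chosen for illustration, not the paper's design.

```python
from typing import Dict, Tuple

# Published carrier frequencies (MHz).
CARRIERS_MHZ: Dict[str, float] = {
    "BDS B1I": 1561.098,  # BeiDou B1I
    "GPS L1C": 1575.42,   # GPS L1 band, shared by L1C
}

def intermediate_frequency(f_rf_mhz: float, f_lo_mhz: float) -> float:
    """IF produced by one mixing stage: |f_RF - f_LO| (low- or high-side LO)."""
    return abs(f_rf_mhz - f_lo_mhz)

def search_range_khz(if_mhz: float, doppler_khz: float) -> Tuple[float, float]:
    """Acquisition search window: nominal IF plus/minus the Doppler uncertainty, in kHz."""
    if_khz = if_mhz * 1000.0
    return (if_khz - doppler_khz, if_khz + doppler_khz)

if __name__ == "__main__":
    # Hypothetical LO frequency for each channel (not taken from the paper).
    lo_mhz = {"BDS B1I": 1557.0, "GPS L1C": 1571.0}
    for name, f_rf in CARRIERS_MHZ.items():
        f_if = intermediate_frequency(f_rf, lo_mhz[name])
        low, high = search_range_khz(f_if, doppler_khz=5.0)  # assumed Doppler bound
        print(f"{name}: IF = {f_if:.3f} MHz, search {low:.1f} to {high:.1f} kHz")
```

A real front end may use several mixing stages, and sampling can alias the IF to a lower frequency, so this only captures the first-order relationship the abstract refers to.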
Over the past decade, technologies and applications such as artificial intelligence, cloud services and big data have driven an exponential increase in Internet-connected devices and data traffic, placing higher demands on the bandwidth and stability of data transmission. To achieve device integration and miniaturization, a backplane is often used to interconnect boards and cards, serving as the basis for data exchange. However, long-distance transmission across the backplane causes severe losses and various signal-integrity problems. In recent years, with the development of serial transmission, 56 Gbps PAM4 transmission has gradually shown higher transmission efficiency than 28 Gbps NRZ. The change of modulation format, however, brings an inherent penalty of about 9.5 dB, so PAM4 places more stringent requirements on signal integrity. In this paper, ADS is used to model and simulate the long-distance cross-backplane transmission channel, and a high-speed channel design scheme based on the OIF CEI-56G-LR specification is established in terms of board material, laminates and vias.
Kai Yao, Qiang Wu, Jinling Cui. "Simulation and Design of a 56Gbps Cross-backplane Transmission Channel." Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581867
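The 9.5 dB figure in the abstract can be checked directly. At the same peak-to-peak swing, a PAM4 signal stacks three eyes where NRZ has one, so each eye is one third as tall, a penalty of 20·log10(3) ≈ 9.54 dB. Because PAM4 carries two bits per symbol, 56 Gbps PAM4 runs at 28 GBd and has the same 14 GHz Nyquist frequency as 28 Gbps NRZ, which is how it doubles the data rate without raising the channel's Nyquist frequency. The short sketch below reproduces both numbers; the helper names are illustrative.

```python
import math

def pam_eye_penalty_db(levels: int) -> float:
    """Eye-height penalty of M-level PAM versus NRZ at equal peak-to-peak swing.

    An M-level signal splits the swing into (M - 1) eyes, so each eye is
    1/(M - 1) the height of the single NRZ eye.
    """
    return 20.0 * math.log10(levels - 1)

def symbol_rate_gbaud(bit_rate_gbps: float, levels: int) -> float:
    """Symbol rate: bit rate divided by bits per symbol, log2(M)."""
    return bit_rate_gbps / math.log2(levels)

def nyquist_ghz(bit_rate_gbps: float, levels: int) -> float:
    """Nyquist frequency is half the symbol rate."""
    return symbol_rate_gbaud(bit_rate_gbps, levels) / 2.0

if __name__ == "__main__":
    print(f"PAM4 eye penalty: {pam_eye_penalty_db(4):.2f} dB")  # 9.54 dB
    print(f"56G PAM4 Nyquist: {nyquist_ghz(56, 4):.1f} GHz")     # 14.0 GHz
    print(f"28G NRZ Nyquist:  {nyquist_ghz(28, 2):.1f} GHz")     # 14.0 GHz
```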
In recent years, graph neural networks (GNNs) have achieved encouraging performance in processing graph data, which lies in non-Euclidean space. GNNs learn node features by aggregating and combining neighbor information, and they have been applied to many graph tasks. However, their complex deep-learning structure is still regarded as a black box, which makes it hard for humans to trust them fully, and this lack of interpretability greatly limits the application of GNNs. We therefore propose an interpretability method, called GANExplainer, that explains GNNs at the model level. Rather than relying on specific input examples, our method implicitly generates a characteristic subgraph that serves as the model's interpretation of the data. GANExplainer uses a generative adversarial framework in which the generator and discriminator are trained simultaneously. More importantly, graph rules are incorporated into the discriminator to ensure that the generated characteristic subgraphs are valid. We conducted experiments on a synthetic dataset and a chemical-molecule dataset and evaluated our model-level explainer in terms of accuracy, fidelity and sparsity.
Xinrui Kang, Dong Liang, Qinfeng Li. "GANExplainer: Explainability Method for Graph Neural Network with Generative Adversarial Nets." Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581850
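The abstract summarizes how GNNs learn node features by aggregating and combining neighbor information. As a hedged illustration of that generic message-passing step (not of GANExplainer itself), the pure-Python sketch below performs one layer with mean aggregation. The fixed convex combination controlled by the hypothetical parameter `alpha` stands in for the learned weights and non-linearity of a real GNN layer.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]       # adjacency list: node -> neighbors
Features = Dict[int, List[float]]  # node -> feature vector

def aggregate_mean(graph: Graph, feats: Features, node: int) -> List[float]:
    """Element-wise mean of the neighbors' features (zeros for an isolated node)."""
    neighbors = graph[node]
    dim = len(feats[node])
    if not neighbors:
        return [0.0] * dim
    return [sum(feats[n][d] for n in neighbors) / len(neighbors) for d in range(dim)]

def message_passing_step(graph: Graph, feats: Features, alpha: float = 0.5) -> Features:
    """One layer: combine each node's own features with its aggregated neighborhood.

    `alpha` is a hypothetical fixed mixing weight; a trained GNN would instead
    apply learned weight matrices and a non-linearity here.
    """
    out: Features = {}
    for node in graph:
        agg = aggregate_mean(graph, feats, node)
        out[node] = [alpha * s + (1.0 - alpha) * a for s, a in zip(feats[node], agg)]
    return out

if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 with one-dimensional features.
    g = {0: [1], 1: [0, 2], 2: [1]}
    x = {0: [1.0], 1: [0.0], 2: [3.0]}
    print(message_passing_step(g, x))  # {0: [0.5], 1: [1.0], 2: [1.5]}
```

Stacking several such steps lets information propagate over multi-hop neighborhoods, which is what makes the learned representations hard to interpret directly.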
Modern multi-object tracking (MOT) has benefited from recent advances in deep neural networks and large video datasets. However, challenges such as complex backgrounds, fast motion and occlusion still impede further improvement in tracking performance. In this paper, we propose a new framework that uses optical-flow motion information to distinguish foreground from background regions directly. The proposed end-to-end network consists of two branches that separately model spatial feature representations and optical-flow motion patterns, and we propose different fusion mechanisms for combining the motion cues with appearance information. Results on the MOT17 dataset show that our method models temporal-spatial information effectively.
Haodong Liu, Tianyang Xu, Xiaojun Wu. "MMOT: Motion-Aware Multi-Object Tracking with Optical Flow." Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581824
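The MMOT abstract relies on optical-flow motion to separate foreground from background. A much simpler, non-learned illustration of the same intuition is to threshold the per-pixel flow magnitude, marking pixels that move more than a threshold as foreground. The sketch below assumes a dense flow field is already available as a grid of (dx, dy) displacements and uses a hypothetical threshold; it is not the paper's two-branch network, and it ignores camera motion, which makes raw thresholding unreliable on moving-camera sequences.

```python
import math
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # H x W grid of (dx, dy) displacements in pixels

def motion_mask(flow: Flow, threshold: float) -> List[List[int]]:
    """1 where the flow magnitude strictly exceeds `threshold` (moving foreground), else 0."""
    return [[1 if math.hypot(dx, dy) > threshold else 0 for dx, dy in row] for row in flow]

if __name__ == "__main__":
    flow = [
        [(0.0, 0.0), (0.1, 0.0), (3.0, 4.0)],
        [(0.0, 0.2), (2.0, 0.0), (0.0, 0.0)],
    ]
    print(motion_mask(flow, threshold=1.0))  # [[0, 0, 1], [0, 1, 0]]
```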
Shengcui Cheng, Xiaoling Chen, T. Zhang, Ziyi Wang, Guangzhi He, Y. Tong, P. Xie
Objective: Understanding cortical activation patterns can play an important role in exploring motor control mechanisms in elderly subjects. This study investigates hemodynamic responses in elderly subjects during upper-limb movements using functional near-infrared spectroscopy (fNIRS). Methods: Multi-channel fNIRS signals were continuously recorded from the bilateral prefrontal cortex (PFC) and motor cortex (MC) of eight healthy elderly subjects during the resting state (RS) and during right and left upper-limb movements (RM and LM). We applied the generalized linear model (GLM) implemented in the NIRS-SPM software to compute changes in hemoglobin concentration and describe brain activation during the motor tasks. Results: Changes in oxyhemoglobin concentration were concentrated mainly in the left motor cortex during the RM task, and in the right hemisphere, including the prefrontal and motor cortices, during the LM task. Further analysis showed a significant difference between the two hemispheres in the RM and LM tasks, but not in the RS task. Conclusions: These findings suggest that fNIRS signals can reliably quantify neuronal activity during limb movements. This study may provide new insight into the motor mechanisms of upper-limb movements and is relevant to monitoring brain function.
Shengcui Cheng, Xiaoling Chen, T. Zhang, Ziyi Wang, Guangzhi He, Y. Tong, P. Xie. "Activation During Upper Limb Movements Measured with Functional Near-Infrared Spectroscopy in Healthy Elderly Subjects." Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581877
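The fNIRS abstract estimates activation with a generalized linear model, y = Xβ + ε, in which the task regressors in X are fitted to each channel's hemoglobin signal y by least squares. The sketch below fits a single boxcar task regressor plus an intercept to a synthetic oxyhemoglobin trace. It is a simplification under stated assumptions: the block onsets, durations and signal values are made up, and a full analysis such as one in NIRS-SPM would typically convolve the regressor with a hemodynamic response function and model noise, both omitted here.

```python
from typing import List, Tuple

def boxcar(n_samples: int, onsets: List[int], duration: int) -> List[float]:
    """Task regressor: 1.0 during task blocks, 0.0 at rest (no HRF convolution)."""
    x = [0.0] * n_samples
    for onset in onsets:
        for t in range(onset, min(onset + duration, n_samples)):
            x[t] = 1.0
    return x

def fit_glm(y: List[float], x: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = b0 + b1 * x + error; returns (b0, b1)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

if __name__ == "__main__":
    # Synthetic HbO trace (hypothetical): baseline 0.1 plus a 0.5 response in two task blocks.
    x = boxcar(20, onsets=[2, 12], duration=5)
    y = [0.1 + 0.5 * xi for xi in x]
    b0, b1 = fit_glm(y, x)
    print(f"baseline = {b0:.2f}, task beta = {b1:.2f}")
```

The fitted task coefficient b1 plays the role of the per-channel activation estimate that is then compared across channels and conditions.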
Lei Deng, Junli Zhang, K. Cao, Miwei Shang, F. Han
Abstract: Epimedium, a traditional Chinese medicine, is widely used to treat neurodegenerative diseases such as Alzheimer's disease (AD). However, the conventional proteomics- and genomics-based experimental methods used in previous research struggle to describe comprehensively how Epimedium acts in the treatment of AD. In this study, we used computational tools, combining the GEO database with network pharmacology, to construct the relevant pharmacological networks and core target networks and to analyze them visually. We then performed GO and KEGG enrichment analyses to give a relatively comprehensive account of the mechanism of Epimedium in treating AD and to screen the key mechanisms and targets. The results indicate that Epimedium may act on key targets such as PIK3CB and BCL-2 and participate in regulating the PI3K-Akt and calcium signaling pathways in the treatment of AD. This study provides a theoretical basis for in-depth analysis of Epimedium and lays a foundation for developing related new drugs.
Lei Deng, Junli Zhang, K. Cao, Miwei Shang, F. Han. "Combining GEO Database and the Method of Network Pharmacology to Explore the Molecular Mechanism of Epimedium in the Treatment of Alzheimer's Disease." Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581884
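GO and KEGG enrichment analyses commonly ask whether a gene list overlaps an annotated term more than chance would predict, using a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below computes that p-value with the standard library. It is the generic over-representation test, shown on the assumption that the study's enrichment tools used something similar; the gene counts in the example are hypothetical.

```python
from math import comb

def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background set
    K: background genes annotated to the GO term or KEGG pathway
    n: genes in the query list (e.g. predicted drug targets)
    k: query genes annotated to the term
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of observing k or more annotated genes in the query list.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

if __name__ == "__main__":
    # Hypothetical counts: 20 background genes, 5 in the pathway,
    # 4 query genes, 3 of which fall in the pathway.
    print(f"p = {enrichment_p_value(20, 5, 4, 3):.4f}")
```

In practice the p-values for all tested terms are then corrected for multiple testing, for example with the Benjamini-Hochberg procedure, before terms are reported as enriched.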
Recognizing the many variations of text in scene images remains difficult. In recent years, the performance of attention-based text recognition models has improved greatly. However, these models typically focus only on salient image regions, that is, on visual attention. In this paper, we present a novel scene text recognition framework named gated co-attention, in which visual and semantic attention are reasoned about jointly. Given visual features extracted by a convolutional network and semantic features extracted by a language model, the first step combines the two sets of features. In the second step, the gated co-attention stage filters out irrelevant visual features and incorrect semantic information before fusing the knowledge of the two modalities. We evaluate our model on seven datasets; it performs strongly on all of them and achieves the best results on four.
Junwei Zhou, Xi Wang, Jiao Dai, Jizhong Han. "Improved Fusion of Visual and Semantic Representations by Gated Co-Attention for Scene Text Recognition." Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581837
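The gated co-attention stage is described as filtering out irrelevant visual features and incorrect semantic information before fusing the two modalities. One common way to express such gating, assumed here purely for illustration, is an element-wise sigmoid gate g = σ(w_v·v + w_s·s + b) that mixes the two feature vectors as g·v + (1 - g)·s. The sketch below implements that generic gated fusion in plain Python; the per-dimension parameters are hypothetical stand-ins for learned weights, and the co-attention computation itself is not shown.

```python
import math
from typing import List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def gated_fusion(visual: List[float], semantic: List[float],
                 w_v: List[float], w_s: List[float], bias: List[float]) -> List[float]:
    """Element-wise gated fusion of two equal-length feature vectors.

    gate = sigmoid(w_v * v + w_s * s + b); fused = gate * v + (1 - gate) * s.
    A gate near 1 keeps the visual feature; a gate near 0 keeps the semantic one.
    """
    fused = []
    for v, s, wv, ws, b in zip(visual, semantic, w_v, w_s, bias):
        g = sigmoid(wv * v + ws * s + b)
        fused.append(g * v + (1.0 - g) * s)
    return fused

if __name__ == "__main__":
    v = [2.0, 0.0]  # toy visual features
    s = [0.0, 4.0]  # toy semantic features
    # Hypothetical all-zero gate parameters give g = 0.5, i.e. an even mix.
    print(gated_fusion(v, s, w_v=[0.0, 0.0], w_s=[0.0, 0.0], bias=[0.0, 0.0]))  # [1.0, 2.0]
```

With trained parameters the gate can suppress one modality per dimension, which is the filtering behavior the abstract attributes to the gated stage.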
With the increasing abundance of multimedia data, research on mining the relationships between modalities to achieve fine-grained cross-modal retrieval is emerging. In this paper, we propose Semantic Maximum Relevance and Modal Alignment (SMR-MA), a novel method for cross-modal retrieval. It uses a model pre-trained on abundant image-text data to extract image and text features, and it further promotes interaction between modalities within the same semantic category through a modal alignment module and a weight-shared multi-layer perceptron. In addition, the multi-modal embeddings are mapped onto a normalized hypersphere, and an angular margin penalty is applied between the feature embeddings and the class weights in angular space to maximize the classification margin, increasing both intra-class similarity and inter-class difference. Comprehensive experiments on three benchmark datasets show that the proposed method performs well on cross-modal retrieval tasks and significantly outperforms state-of-the-art cross-modal retrieval methods.
Pingping Sun, Baohua Qiang, Zhiguang Liu, Xianyi Yang, Guangyong Xi, Weigang Liu, Ruidong Chen, S. Zhang. "Semantic Maximum Relevance and Modal Alignment for Cross-Modal Retrieval." Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581857
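The SMR-MA abstract describes normalizing embeddings onto a hypersphere and adding an angular penalty between feature embeddings and class weights. Interpreting that penalty as the additive angular margin used in ArcFace-style losses (an assumption; the paper's exact formulation may differ), the target-class logit becomes s·cos(θ_y + m) while the other classes keep s·cos θ_j. The sketch below computes those logits in plain Python with hypothetical values s = 30 and m = 0.5; it omits the subsequent softmax cross-entropy and the numerical refinements found in practical implementations, and it assumes non-zero vectors.

```python
import math
from typing import List

def l2_normalize(v: List[float]) -> List[float]:
    """Project a non-zero vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def angular_margin_logits(feature: List[float], class_weights: List[List[float]],
                          target: int, scale: float = 30.0,
                          margin: float = 0.5) -> List[float]:
    """Logits with an additive angular margin applied to the target class only.

    cos(theta_j) is the dot product of the normalized feature and class weight;
    the target logit is scale * cos(theta_y + margin), others scale * cos(theta_j).
    """
    f = l2_normalize(feature)
    logits = []
    for j, w in enumerate(class_weights):
        cos_t = sum(a * b for a, b in zip(f, l2_normalize(w)))
        cos_t = max(-1.0, min(1.0, cos_t))  # keep acos in its domain despite rounding
        theta = math.acos(cos_t)
        if j == target:
            theta += margin  # penalize the target angle to enforce a larger margin
        logits.append(scale * math.cos(theta))
    return logits

if __name__ == "__main__":
    print(angular_margin_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], target=0))
```

Because the margin lowers the target logit unless the feature sits well inside its class region, training pushes same-class embeddings together and different classes apart on the hypersphere.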
A phoneme is the smallest unit of sound that distinguishes meaning in a language, and every language has its own set of phonemes. Phoneme recognition can be used in speech-based applications such as automatic speech recognition and lip sync. This paper proposes an end-to-end deep learning model for recognizing phonemes in speech that combines Connectionist Temporal Classification (CTC) with an attention-based seq2seq network, using one bi-GRU layer in the encoder and one GRU layer in the decoder. Experiments on the TIMIT dataset demonstrate its advantages over some other seq2seq networks, with an improvement of over 50% after the attention mechanism is applied.
Yupei Zhang. "Research on Phoneme Recognition using Attention-based Methods." Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, 2022-11-17. https://doi.org/10.1145/3581807.3581866
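The proposed model combines CTC with an attention-based decoder. On the CTC side, the simplest way to turn per-frame label probabilities into a phoneme sequence is best-path (greedy) decoding: take the most likely label in each frame, merge consecutive repeats, then drop the blank symbol. The sketch below shows that generic procedure; the blank index 0 and the toy probabilities are assumptions, and the attention-based decoder is not shown.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # index of the CTC blank symbol (assumed convention)

def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    collapsed = [label for label, _ in groupby(best_path)]  # merge consecutive repeats
    return [label for label in collapsed if label != blank]

if __name__ == "__main__":
    # Toy output over 3 labels (0 = blank, 1 and 2 = hypothetical phoneme IDs).
    probs = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.1, 0.8],
        [0.2, 0.1, 0.7],
        [0.6, 0.2, 0.2],
        [0.1, 0.1, 0.8],
    ]
    print(ctc_greedy_decode(probs))  # [1, 2, 2]
```

The best path here is 1 1 0 2 2 0 2: the two leading 1s merge into one phoneme, while the blank before the final frame keeps the last 2 separate from the preceding pair.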