Many cities now face a "garbage siege", and existing garbage disposal systems can no longer cope with increasingly complex conditions. New-generation Internet of Things (IoT) technology, combined with knowledge from related disciplines such as networking and geographic information, makes it possible to build a real-time platform that monitors seepage and odor gases in landfills. This paper reviews the relevant IoT technologies and proposes a design for an online odor and seepage monitoring system for the Maiyuan garbage dump in Nanchang City. Five monitoring items were selected and data were collected; the collected data were then used for simulation analysis and prediction in Matlab and Python. Finally, the paper discusses the main factors behind the environmental pollution and the corresponding treatment measures, offers theoretical guidance to help managers improve overall decision-making for urban domestic garbage dumps, and draws some practical conclusions.
{"title":"Real-Time Calibration Method of Air Quality Data Based on AdaBoost Training Model","authors":"Xuejing Jiang, Xun Sun, Qiuming Liu","doi":"10.1145/3581807.3581882","DOIUrl":"https://doi.org/10.1145/3581807.3581882","url":null,"PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126848474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RF front-end design is one of the most important steps in receiver design: its noise performance significantly affects the noise characteristics of the received signal, baseband processing performance, and the final positioning accuracy. The theoretical IF of the received satellite signal and the frequency search range of the signal acquisition algorithm both depend on the frequency plan of the RF front-end. This paper presents the RF front-end design of a dual-channel multi-mode multi-frequency receiver, mainly for the B1I and L1C frequency points. The circuit design of the RF section is described in detail, and solutions are given for link attenuation, signal radiation, electromagnetic interference, and other issues in the RF link. The link is simulated in ADS to ensure the reliability of the design. The final product was tested in hardware and software and achieved the expected results.
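The link between the frequency plan, the IF, and the acquisition search range can be illustrated with a short calculation. This is a minimal sketch, not the paper's actual frequency plan: the carriers are the nominal B1I (1561.098 MHz) and L1C (1575.42 MHz) values, while the local-oscillator frequency `LO_HZ` and the ±5 kHz Doppler span `DOPPLER_HZ` are hypothetical values chosen only for illustration.

```python
# Nominal carrier frequencies in Hz.
B1I_HZ = 1_561_098_000
L1C_HZ = 1_575_420_000

# Hypothetical values, NOT taken from the paper.
LO_HZ = 1_568_000_000      # assumed shared local-oscillator frequency
DOPPLER_HZ = 5_000         # assumed one-sided Doppler uncertainty


def intermediate_frequency(carrier_hz: int, lo_hz: int) -> int:
    """Return the magnitude of the IF produced by mixing carrier with LO."""
    return abs(carrier_hz - lo_hz)


def search_range(carrier_hz: int, lo_hz: int, doppler_hz: int) -> tuple:
    """Return the (low, high) frequency window the acquisition must scan."""
    f_if = intermediate_frequency(carrier_hz, lo_hz)
    return (f_if - doppler_hz, f_if + doppler_hz)


if __name__ == "__main__":
    for name, carrier in (("B1I", B1I_HZ), ("L1C", L1C_HZ)):
        low, high = search_range(carrier, LO_HZ, DOPPLER_HZ)
        print(f"{name}: IF = {intermediate_frequency(carrier, LO_HZ)} Hz, "
              f"search window = [{low}, {high}] Hz")
```

Changing `LO_HZ` moves both the theoretical IF and the centre of the search window, which is the dependence the abstract describes.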
{"title":"Design and Simulation of A RF Front-end Circuit of Dual Channel Navigation Receiver","authors":"Yu Zhang, Qiang Wu, Jie Liu","doi":"10.1145/3581807.3581905","DOIUrl":"https://doi.org/10.1145/3581807.3581905","url":null,"PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123792351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Over the past decade, new technologies and applications such as artificial intelligence, cloud services, and big data have led to an exponential increase in Internet-connected devices and data traffic, which places higher demands on the bandwidth and stability of data transmission. To achieve device integration and miniaturization, a backplane structure is often used to interconnect boards and cards, with the backplane serving as the basis for data exchange. However, long-distance transmission across the backplane causes serious losses and various signal integrity problems. In recent years, with the development of serial transmission, 56Gbps PAM4 modulation has gradually shown transmission efficiency beyond that of 28Gbps NRZ. The change of modulation brings an inherent loss of about 9.5 dB, and PAM4 places more stringent requirements on signal integrity. In this paper, ADS is used to model and simulate the long-distance cross-backplane transmission channel, and a high-speed channel design scheme based on the OIF CEI-56G-LR specification is established with respect to board material, laminates, and via holes.
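The 9.5 dB figure follows from eye geometry: at the same peak-to-peak swing, PAM4 stacks three eyes, each one third the height of the single NRZ eye. A minimal sketch of that arithmetic follows; the function name is illustrative and does not come from the paper.

```python
import math


def pam_eye_penalty_db(levels: int) -> float:
    """Eye-height penalty of PAM-N relative to NRZ (PAM-2), in dB.

    With equal peak-to-peak swing, PAM-N has (N - 1) stacked eyes, so each
    eye is 1 / (N - 1) of the NRZ eye height.
    """
    if levels < 2:
        raise ValueError("a PAM signal needs at least two levels")
    return 20.0 * math.log10(levels - 1)


if __name__ == "__main__":
    print(f"NRZ : {pam_eye_penalty_db(2):.2f} dB")
    print(f"PAM4: {pam_eye_penalty_db(4):.2f} dB")
```

For PAM4 this evaluates to 20·log10(3), roughly 9.54 dB, which is the inherent loss quoted in the abstract.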
{"title":"Simulation and Design of a 56Gbps Cross-backplane Transmission Channel","authors":"Kai Yao, Qiang Wu, Jinling Cui","doi":"10.1145/3581807.3581867","DOIUrl":"https://doi.org/10.1145/3581807.3581867","url":null,"PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121522834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, graph neural networks (GNNs) have achieved encouraging performance in processing graph data generated in non-Euclidean space. GNNs learn node features by aggregating and combining neighbor information, an approach that has been applied to many graph tasks. However, the complex deep learning structure is still regarded as a black box, which makes it difficult to earn full human trust, and this lack of interpretability greatly limits the application of GNNs. We therefore propose an interpretability method, called GANExplainer, to explain GNNs at the model level. Our method implicitly generates characteristic subgraphs, without relying on specific input examples, as an interpretation of how the model treats the data. GANExplainer uses a generative adversarial framework to train the generator and the discriminator at the same time. More importantly, graph rules are added when constructing the discriminator to ensure the validity of the generated characteristic subgraphs. We carried out experiments on a synthetic dataset and a chemical molecule dataset and evaluated our model-level explainer from three aspects: accuracy, fidelity, and sparsity.
{"title":"GANExplainer: Explainability Method for Graph Neural Network with Generative Adversarial Nets","authors":"Xinrui Kang, Dong Liang, Qinfeng Li","doi":"10.1145/3581807.3581850","DOIUrl":"https://doi.org/10.1145/3581807.3581850","url":null,"PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133542357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modern multi-object tracking (MOT) has benefited from recent advances in deep neural networks and large video datasets. However, some challenges still impede further improvement of tracking performance, including complex backgrounds, fast motion, and occlusion. In this paper, we propose a new framework that uses motion information from optical flow, enabling foreground and background regions to be distinguished directly. The proposed end-to-end network consists of two branches that separately model spatial feature representations and optical-flow motion patterns. We propose different fusion mechanisms that combine the motion cues and appearance information. The results on the MOT17 dataset show that our method is an effective mechanism for modeling temporal-spatial information.
{"title":"MMOT: Motion-Aware Multi-Object Tracking with Optical Flow","authors":"Haodong Liu, Tianyang Xu, Xiaojun Wu","doi":"10.1145/3581807.3581824","DOIUrl":"https://doi.org/10.1145/3581807.3581824","url":null,"PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133795845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recognizing the many variations of text that occur in scene photos remains difficult. In recent years, the performance of text recognition models based on the attention mechanism has increased greatly. However, these models typically focus only on visually significant image regions, that is, on visual attention. In this paper, we present a novel paradigm for scene text recognition named gated co-attention, with which visual and semantic attention can be reasoned about jointly. Given the visual features extracted by a convolutional network and the semantic features extracted by a language model, the first step combines the two sets of features. Second, the gated co-attention stage eliminates irrelevant visual features and incorrect semantic information before fusing the knowledge of the two modalities. We evaluate our model on several datasets, and the experimental results show that our method performs strongly on all seven datasets, with the best results on four of them.
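The idea of gating two modalities before fusing them can be sketched generically. The code below is not the paper's gated co-attention module: it is a plain element-wise gated fusion, and the per-dimension parameters `w_visual`, `w_semantic`, and `bias` are hypothetical stand-ins for learned weights.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, semantic, w_visual, w_semantic, bias):
    """Fuse two equal-length feature vectors with a per-dimension gate.

    gate  = sigmoid(w_visual * v + w_semantic * s + bias)
    fused = gate * v + (1 - gate) * s
    A gate near 1 keeps the visual feature; a gate near 0 keeps the
    semantic feature.
    """
    fused = []
    for v, s, wv, ws, b in zip(visual, semantic, w_visual, w_semantic, bias):
        gate = sigmoid(wv * v + ws * s + b)
        fused.append(gate * v + (1.0 - gate) * s)
    return fused


if __name__ == "__main__":
    # Zero weights give a gate of 0.5, i.e. a plain average of both inputs.
    print(gated_fusion([1.0, 3.0], [3.0, 1.0], [0, 0], [0, 0], [0, 0]))
```

Because the gate lies strictly between 0 and 1, each fused value stays between the corresponding visual and semantic values, so a feature can be suppressed smoothly rather than dropped outright.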
{"title":"Improved Fusion of Visual and Semantic Representations by Gated Co-Attention for Scene Text Recognition","authors":"Junwei Zhou, Xi Wang, Jiao Dai, Jizhong Han","doi":"10.1145/3581807.3581837","DOIUrl":"https://doi.org/10.1145/3581807.3581837","url":null,"PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114758044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the increasing abundance of multimedia data, research on mining the relationships between different modalities to achieve refined cross-modal retrieval is gradually emerging. In this paper, we propose a novel Semantic Maximum Relevance and Modal Alignment (SMR-MA) method for cross-modal retrieval. It uses a pre-trained model with rich image-text information to extract the features of each image and text, and further promotes information interaction between modalities within the same semantic category through a modal alignment module and a multi-layer perceptron with shared weights. In addition, the multi-modal embeddings are distributed on a normalized hypersphere, and an angular margin penalty is applied between the feature embedding and the weight in angular space to maximize the classification boundary, thus increasing both intra-class similarity and inter-class difference. Comprehensive experiments on three benchmark datasets demonstrate that the proposed method performs strongly in cross-modal retrieval tasks and significantly outperforms state-of-the-art cross-modal retrieval methods.
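An angular penalty on a normalized hypersphere can be sketched as follows. This is an illustrative additive angular margin in the style commonly used for margin-based softmax losses, not the SMR-MA implementation; the `margin` and `scale` defaults are hypothetical.

```python
import math


def l2_normalize(vec):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def angular_margin_logits(embedding, class_weights, target,
                          margin=0.5, scale=30.0):
    """Return scaled cosine logits with a margin added to the target angle.

    Both the embedding and every class weight are L2-normalized, so their
    dot product is cos(theta). For the target class the angle is enlarged
    to theta + margin, which lowers that logit and forces the embedding to
    sit closer to its class weight during training.
    """
    e = l2_normalize(embedding)
    logits = []
    for idx, weight in enumerate(class_weights):
        w = l2_normalize(weight)
        cos_theta = sum(a * b for a, b in zip(e, w))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
        if idx == target:
            theta = math.acos(cos_theta)
            # Cap at pi so the penalized cosine stays monotonic in theta.
            cos_theta = math.cos(min(theta + margin, math.pi))
        logits.append(scale * cos_theta)
    return logits


if __name__ == "__main__":
    print(angular_margin_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                                target=0, margin=0.5, scale=1.0))
```

Feeding these logits to a standard softmax cross-entropy loss penalizes the target class in angle space, which is what tightens classes internally and pushes different classes apart.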
{"title":"Semantic Maximum Relevance and Modal Alignment for Cross-Modal Retrieval","authors":"Pingping Sun, Baohua Qiang, Zhiguang Liu, Xianyi Yang, Guangyong Xi, Weigang Liu, Ruidong Chen, S. Zhang","doi":"10.1145/3581807.3581857","DOIUrl":"https://doi.org/10.1145/3581807.3581857","url":null,"PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116152215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A phoneme is the smallest sound unit of a language, and every language has its own set of phonemes. Phoneme recognition can be used in speech-based applications such as automatic speech recognition and lip sync. This paper proposes an end-to-end deep learning model for recognizing the phonemes in speech: a Connectionist Temporal Classification (CTC) and attention-based seq2seq network with one bi-GRU layer in the encoder and one GRU layer in the decoder. Experiments on the TIMIT dataset demonstrate its advantages over some other seq2seq networks, with improvements of over 50% after applying the attention mechanism.
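The CTC part of such a model emits one label per audio frame, including a special blank, and the frame sequence is then collapsed into a phoneme sequence. The sketch below shows only the standard CTC collapsing rule applied to per-frame best labels (greedy decoding); it is not the paper's network, and the blank symbol `"_"` and the example labels are assumptions for illustration.

```python
def ctc_greedy_decode(frame_labels, blank="_"):
    """Collapse per-frame labels into an output sequence using the CTC rule.

    1. Merge consecutive repeats of the same label.
    2. Remove blanks.
    A blank between two identical labels keeps both of them, which is how
    CTC represents genuinely repeated phonemes.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    frames = ["_", "sh", "sh", "_", "iy", "iy", "iy", "_"]
    print(ctc_greedy_decode(frames))
```

In a full system the per-frame labels would come from the argmax of the network's output distribution at each time step.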
{"title":"Research on Phoneme Recognition using Attention-based Methods","authors":"Yupei Zhang","doi":"10.1145/3581807.3581866","DOIUrl":"https://doi.org/10.1145/3581807.3581866","url":null,"PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124721257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In Chinese medicine, a patient's body constitution plays a crucial role in determining the course of treatment because it is intrinsically linked to the patient's physiological and pathological processes. Traditional Chinese medicine practitioners use tongue diagnosis to determine a person's constitutional type during an examination. An effective solution is needed to overcome the complexity of this setting before a tongue image constitution recognition system can be deployed on a non-invasive mobile device for fast, efficient, and accurate constitution recognition. We use Deep Deterministic Policy Gradients (DDPG) to implement tongue image retrieval and propose a new DDPG-based method for image retrieval systems that aims to boost the precision of database searches for query images. Our strategy for enhancing retrieval accuracy uses the complexity of individual instances to split the dataset into two subsets, which are classified independently using deep reinforcement learning. Experiments on tongue datasets gauge the efficacy of the approach; in these experiments, deep reinforcement learning techniques are applied to build a retrieval system for images of tongues affected by various disorders. With the proposed strategy, it may be possible to improve image retrieval accuracy through better recognition of tongue diseases. Databases containing images of tongues affected by a wide range of disorders are used as examples. The experimental results suggest that the new approach to computing the main colour histogram outperforms the prior one: though the statistical difference is small, the improvement in retrieval is clear to the human eye. The tongue is similarly brought to the fore to emphasise the importance of the required verbal statement. Both investigations used tongue images classified into five distinct categories.
{"title":"Tongue Image Retrieval Based On Reinforcement Learning","authors":"A. Farooq, Xinfeng Zhang","doi":"10.1145/3581807.3581848","DOIUrl":"https://doi.org/10.1145/3581807.3581848","url":null,"PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121905344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kangming Weng, X. Du, Kunze Chen, Dahan Wang, Shunzhi Zhu
The segmentation-based approach is an essential direction in scene text detection. Because it can detect arbitrarily shaped or curved text, it has attracted increasing attention from researchers. However, extensive research has shown that segmentation-based methods are disturbed by adjoining pixels and cannot effectively identify text boundaries. To tackle this problem, we propose ResAsapp Conv, a convolution structure based on the PSE algorithm. It provides receptive fields at different scales for an object, which allows the network to recognize text boundaries effectively. The method's effectiveness is validated on three benchmark datasets: CTW1500, Total-Text, and ICDAR2015. In particular, on CTW1500, a dataset full of long curved text in all kinds of scenes that is hard to distinguish, our network achieves an F-measure of 81.2%.
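For readers unfamiliar with the metric, the F-measure reported above is the harmonic mean of detection precision and recall. The sketch below shows the standard formula only; the precision and recall values in the usage line are hypothetical and are not results from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score).

    Returns 0.0 when both inputs are zero to avoid dividing by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical detector: 85% precision, 75% recall.
    print(f"F-measure = {f_measure(0.85, 0.75):.3f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a detector cannot reach a high F-measure by maximizing precision or recall alone.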
{"title":"ResAsapp: An Effective Convolution to Distinguish Adjacent Pixels For Scene Text Detection","authors":"Kangming Weng, X. Du, Kunze Chen, Dahan Wang, Shunzhi Zhu","doi":"10.1145/3581807.3581854","DOIUrl":"https://doi.org/10.1145/3581807.3581854","url":null,"PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125191899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}