DA-STD: Deformable Attention-Based Scene Text Detection in Arbitrary Shape
Xing Wu, Yangyang Qi, Bin Tang, Hairan Liu
2021 IEEE International Conference on Progress in Informatics and Computing (PIC), published 2021-12-17
DOI: 10.1109/PIC53636.2021.9687065 (https://doi.org/10.1109/PIC53636.2021.9687065)
Citations: 0
Abstract
Scene text detection (STD) underpins many widely used technologies, such as security and autonomous driving. However, existing text detection models assume uniform text shapes and simple backgrounds, which does not match the characteristics of text in natural scenes. To detect arbitrarily shaped text against complex backgrounds, we propose DA-STD, a method based on a deformable attention mechanism. First, a feature enhancement module named FPEM is applied to strengthen the feature representations learned from the image. In addition, unlike the attention in the vanilla Transformer, our method adopts a deformable attention module that models relations by attending to the pixels around a set of sampling points rather than to global features. Experiments show that this approach both improves the model's performance and greatly reduces the computational cost.
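The idea of attending only to pixels around sampling points can be illustrated with a minimal single-head, single-scale NumPy sketch. It is not the DA-STD implementation: the abstract does not specify the module's details, so the sketch assumes the common formulation in which each query predicts K offsets and K attention weights through linear projections and then bilinearly samples the value map. All names (`deformable_attention`, `W_off`, `W_attn`, `W_val`, `ref_points`) are hypothetical.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly sample feat (H, W, C) at continuous pixel coords (x, y).
    Neighbours that fall outside the map contribute zero."""
    H, W, C = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    out = np.zeros(C)
    for yi, xi in ((y0, x0), (y0, x0 + 1), (y0 + 1, x0), (y0 + 1, x0 + 1)):
        if 0 <= yi < H and 0 <= xi < W:
            w = (1 - abs(x - xi)) * (1 - abs(y - yi))
            out += w * feat[yi, xi]
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def deformable_attention(feat, queries, ref_points, W_off, W_attn, W_val):
    """Assumed formulation, for illustration only.
    feat:       (H, W, C) feature map
    queries:    (N, C) query vectors
    ref_points: (N, 2) reference points as (x, y) pixel coordinates
    W_off:      (C, 2K) projection predicting K (dx, dy) offsets per query
    W_attn:     (C, K) projection predicting K attention logits per query
    W_val:      (C, C) value projection
    """
    N, C = queries.shape
    K = W_attn.shape[1]
    values = feat @ W_val                      # project values once: (H, W, C)
    out = np.zeros((N, C))
    for i in range(N):
        offsets = (queries[i] @ W_off).reshape(K, 2)
        weights = softmax(queries[i] @ W_attn)  # weights over the K samples
        for k in range(K):
            x, y = ref_points[i] + offsets[k]   # sampling point near the reference
            out[i] += weights[k] * bilinear_sample(values, x, y)
    return out
```

Under these assumptions each query reads only K sampled locations instead of all H×W positions, so the attention cost scales with N·K rather than N·H·W, which is one plausible reading of the computational saving the abstract reports.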