Text Recognition on Images from Social Media
M. Akopyan, O.V. Belyaeva, T.P. Plechov, D. Turdakov
2019 Ivannikov Memorial Workshop (IVMEM). DOI: 10.1109/IVMEM.2019.00006
The text recognition problem has been studied for many years. Several OCR engines exist that successfully solve the problem for many languages. However, these engines work well only on high-quality scanned images. Social networks today contain a large number of images whose text needs to be analyzed and recognized, but these images vary widely in quality: text mixed with graphics, poor-quality photos taken with smartphone cameras, and so on. In this paper, we present a pipeline for extracting text from images of varying quality collected from social media. Input images are first classified into several categories, and class-specific preprocessing (illumination improvement, text localization, etc.) is then applied to them. An OCR engine is then used to recognize the text. We present the results of our experiments on a dataset collected from social media.
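The classify, preprocess, then recognize flow described in the abstract can be sketched as a dispatch table keyed by image class. This is a minimal, hypothetical sketch in pure Python, not the authors' method. The class names, the single contrast-threshold classifier, min-max contrast stretching (standing in for illumination improvement), and a dark-pixel bounding-box crop (standing in for text localization) are all illustrative assumptions. The OCR engine is passed in as a callable, so any real engine could be plugged in.

```python
from typing import Callable, Dict, List

# Grayscale image as rows of 0..255 intensities (0 = black).
Image = List[List[int]]


def contrast(img: Image) -> int:
    """Spread between the brightest and the darkest pixel."""
    flat = [p for row in img for p in row]
    return max(flat) - min(flat)


def classify(img: Image) -> str:
    """Hypothetical classifier: one contrast threshold stands in for
    the paper's image categories."""
    return "low_contrast_photo" if contrast(img) < 128 else "clean_scan"


def stretch_contrast(img: Image) -> Image:
    """Illumination-improvement stand-in: min-max stretch to 0..255."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def localize_text(img: Image, threshold: int = 128) -> Image:
    """Text-localization stand-in: crop to the bounding box of dark pixels."""
    coords = [(r, c) for r, row in enumerate(img)
              for c, p in enumerate(row) if p < threshold]
    if not coords:  # no dark pixels: keep the whole image
        return img
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return [row[min(cols):max(cols) + 1]
            for row in img[min(rows):max(rows) + 1]]


# Class-specific preprocessing chains (hypothetical class names).
PREPROCESSORS: Dict[str, List[Callable[[Image], Image]]] = {
    "low_contrast_photo": [stretch_contrast, localize_text],
    "clean_scan": [],
}


def extract_text(img: Image, ocr: Callable[[Image], str]) -> str:
    """Classify the image, apply its class's preprocessing, then run OCR."""
    for step in PREPROCESSORS[classify(img)]:
        img = step(img)
    return ocr(img)


# Usage: a dummy "OCR engine" that just reports the size of what it receives.
photo = [[150, 150, 150, 150],
         [150, 100, 100, 150],
         [150, 150, 150, 150]]
print(extract_text(photo, lambda im: f"{len(im)}x{len(im[0])}"))  # prints 1x2
```

Keeping the preprocessing chains in a dictionary keyed by image class means a new category only needs a new entry, while the OCR step stays unchanged. Note that the order within a chain matters: stretching contrast before localization lets a fixed darkness threshold work on a washed-out photo.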