Research on born-digital image text extraction based on conditional random field
Zhang Jian, Cheng Ren-hong, Wang Kai, Zhao Hong
International Journal of High Performance Systems Architecture, 2014-03-01
DOI: https://doi.org/10.1504/IJHPSA.2014.059873
With the number of digital videos and images in e-mails and web pages growing rapidly, text extraction from images has become more important than ever. Born-digital images are generated directly on a computer, and the text they contain is important for understanding their semantics. Although many methods have been proposed over the past years for text extraction from natural scene images, text detection and extraction from born-digital images remains a challenge. This paper proposes a novel method to segment the text connected components (CCs) from a born-digital image. First, wavelet-based binarisation is applied to the given image to obtain all candidate text CCs. Second, the extracted CCs are classified with a conditional random field (CRF), a probabilistic graphical model widely used in natural language processing, to label the text CCs. Experimental results show that the proposed method can effectively extract text from born-digital images.
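The two-stage pipeline described above (binarise, collect candidate connected components, then label them jointly) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the wavelet binarisation, the CRF graph structure, the features, or the weights. The sketch therefore assumes a plain fixed threshold as a stand-in for the wavelet step, a linear chain over components in scan order as a stand-in for the CRF graph, and hand-set scores in place of learned potentials. All function names and parameters (`binarise`, `connected_components`, `bounding_box`, `viterbi_labels`, `same_label_bonus`) are hypothetical.

```python
from collections import deque


def binarise(gray, threshold=128):
    """Assumed stand-in for wavelet-based binarisation: dark pixels become 1."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary):
    """Collect 4-connected foreground regions as lists of (row, col) pixels."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) of one component."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return (min(ys), min(xs), max(ys), max(xs))


def viterbi_labels(unary, same_label_bonus=1.0):
    """MAP labelling for a two-label linear-chain CRF (0 = non-text, 1 = text).

    unary[i][k] is the score of giving component i label k (hand-set here,
    learned in a real system). Neighbouring components earn same_label_bonus
    when their labels agree, which is the pairwise potential.
    """
    if not unary:
        return []
    score = [list(unary[0])]
    back = []
    for i in range(1, len(unary)):
        row, ptr = [], []
        for k in (0, 1):
            cands = [score[-1][j] + (same_label_bonus if j == k else 0.0) for j in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            row.append(cands[best] + unary[i][k])
            ptr.append(best)
        score.append(row)
        back.append(ptr)
    # Trace back from the best final label.
    label = 0 if score[-1][0] >= score[-1][1] else 1
    labels = [label]
    for ptr in reversed(back):
        label = ptr[label]
        labels.append(label)
    labels.reverse()
    return labels


if __name__ == "__main__":
    gray = [
        [255, 255, 255, 255, 255, 255, 255],
        [255, 0, 255, 0, 0, 255, 0],
        [255, 0, 255, 0, 0, 255, 255],
        [255, 255, 255, 255, 255, 255, 255],
    ]
    comps = connected_components(binarise(gray))
    print([bounding_box(c) for c in comps])
    # Hypothetical unary scores: the middle component looks weakly non-text alone.
    print(viterbi_labels([[0.0, 2.0], [0.6, 0.0], [0.0, 2.0]]))
```

The pairwise term is what separates this from classifying each component in isolation: a component whose own score weakly favours non-text can still be labelled text when both neighbours are confidently text. Setting `same_label_bonus=0.0` reduces the decoder to independent per-component decisions.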