Research on born-digital image text extraction based on conditional random field

International Journal of High Performance Systems Architecture, Vol. 5, No. 1, pp. 39-49 (JCR Q4, Computer Science). Pub Date: 2014-03-01. DOI: 10.1504/IJHPSA.2014.059873

Zhang Jian, Cheng Ren-hong, Wang Kai, Zhao Hong

Citations: 1

Abstract

As the number of digital videos and images in e-mails and web pages grows rapidly, text extraction from images is more important than ever. Born-digital images are generated directly on a computer, and the text they contain is important for understanding their semantics. Although many methods have been proposed in recent years for extracting text from natural scene images, detecting and extracting text from born-digital images remains a challenge. This paper proposes a novel method for segmenting text connected components (CCs) from a born-digital image. First, the image is binarised, based on wavelet theory, to obtain all candidate text CCs. Second, the extracted CCs are classified to label the text CCs using a conditional random field (CRF), a probabilistic graphical model widely used in natural language processing. Experimental results show that the proposed method can effectively extract text from born-digital images.
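The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the wavelet-based binarisation, the CC features, or the CRF graph structure, so the sketch makes three assumptions: the image is already binarised, the CCs form a simple left-to-right chain, and the potentials in `unary_score` and `pairwise_score` are hand-set, hypothetical values rather than learned parameters.

```python
from collections import deque


def connected_components(binary):
    """Return 4-connected foreground components as lists of (row, col) pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a component."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


def unary_score(pixels, label):
    """Hypothetical unary potential: CCs of moderate height look like text."""
    top, _, bottom, _ = bounding_box(pixels)
    text_like = 1.0 if 2 <= bottom - top + 1 <= 5 else -1.0
    return text_like if label == "text" else -text_like


def pairwise_score(prev_label, label):
    """Hypothetical pairwise potential: neighbouring CCs tend to share a label."""
    return 0.5 if prev_label == label else -0.5


def viterbi_labels(components):
    """MAP labelling of a left-to-right chain of CCs under the toy potentials."""
    labels = ("text", "non-text")
    if not components:
        return []
    ordered = sorted(components, key=lambda p: bounding_box(p)[1])
    best = {lab: unary_score(ordered[0], lab) for lab in labels}
    back = []
    for comp in ordered[1:]:
        new_best, pointers = {}, {}
        for lab in labels:
            prev = max(labels, key=lambda p: best[p] + pairwise_score(p, lab))
            new_best[lab] = (best[prev] + pairwise_score(prev, lab)
                             + unary_score(comp, lab))
            pointers[lab] = prev
        best = new_best
        back.append(pointers)
    path = [max(labels, key=lambda lab: best[lab])]
    for pointers in reversed(back):  # follow back-pointers to recover the path
        path.append(pointers[path[-1]])
    path.reverse()
    return [(bounding_box(c), lab) for c, lab in zip(ordered, path)]
```

Calling `viterbi_labels(connected_components(image))` on a binary grid returns one `(bounding_box, label)` pair per component, ordered left to right. A faithful implementation would replace the chain with a neighbourhood graph over CCs and learn the potentials from labelled data.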

Source journal

International Journal of High Performance Systems Architecture (Computer Science: Hardware and Architecture)

CiteScore

2.00

Self-citation rate

0.00%

Number of articles published