Text - image separation in Devanagari documents
Swapnil Khedekar, V. Ramanaprasad, S. Setlur, V. Govindaraju
Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. Published 2003-08-03.
DOI: 10.1109/ICDAR.2003.1227861 (https://doi.org/10.1109/ICDAR.2003.1227861)
Citations: 54
Abstract
In this paper we present a top-down, projection-profile-based algorithm to separate text blocks from image blocks in a Devanagari document. We use a distinctive feature of Devanagari text, called Shirorekha (Header Line), to analyze the pattern produced by Devanagari text in the horizontal profile. The horizontal profile corresponding to a text block possesses a certain regularity in frequency and orientation, and shows spatial cohesion. The algorithm uses these features to identify text blocks in a document image containing both text and graphics.
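The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a binarized block (ink = 1), computes its horizontal projection profile, treats rows with a very high ink count as candidate Shirorekha rows, and calls the block text-like when those rows recur at a roughly constant spacing. The function names and the parameters `rel_threshold`, `min_lines`, and `max_spacing_cv` are hypothetical choices made for illustration.

```python
import numpy as np


def horizontal_profile(binary):
    """Count ink pixels (value 1) in each row of a 2-D binary image."""
    return binary.sum(axis=1)


def header_line_rows(profile, rel_threshold=0.6):
    """Return the first row of each run of rows whose ink count is at least
    rel_threshold times the maximum row count (candidate header lines)."""
    peak = profile.max()
    if peak == 0:
        return np.array([], dtype=int)
    strong = profile >= rel_threshold * peak
    # A run starts where a strong row follows a non-strong row (or the top edge).
    previous = np.concatenate(([False], strong[:-1]))
    return np.flatnonzero(strong & ~previous)


def looks_like_text(binary, rel_threshold=0.6, min_lines=3, max_spacing_cv=0.2):
    """Heuristic (an assumption, not taken from the paper): a block is
    text-like if it has at least min_lines candidate header lines and the
    spacing between them varies little (coefficient of variation)."""
    rows = header_line_rows(horizontal_profile(binary), rel_threshold)
    if len(rows) < min_lines:
        return False
    gaps = np.diff(rows)
    return bool(gaps.std() / gaps.mean() <= max_spacing_cv)


# Synthetic example: four "text lines", each a full-width header line
# followed by sparse vertical strokes standing in for character bodies.
text_block = np.zeros((60, 40), dtype=np.uint8)
for top in (5, 20, 35, 50):
    text_block[top, :] = 1
    text_block[top + 1:top + 8, ::4] = 1

# A solid rectangle stands in for a graphic region: every row is equally
# dense, so there is no periodic pattern of header lines.
solid_block = np.ones((60, 40), dtype=np.uint8)

print(looks_like_text(text_block), looks_like_text(solid_block))
```

A real page would also need skew correction, noise handling, and a way to split the page into candidate blocks before this test is applied; none of that is covered by the sketch.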