An efficient method for page segmentation
Authors: Xingyuan Li, W. Oh, S. Ji, K. Moon, Hyeon-Jin Kim
DOI: 10.1109/ICICS.1997.652121 (https://doi.org/10.1109/ICICS.1997.652121)
Journal (as listed): 信息通信技术, volume 43 1, pages 957-961 vol.2
Publication date: 1997-09-09
Citations: 6
Abstract
Page segmentation is necessary for optical character recognition (OCR) and is also useful for many other document image manipulations. We describe a bottom-up method for page segmentation. Connected components are extracted and clustered into a tree description according to their spatial relations. A new iterative split-and-merge process is then performed to refine the text blocks. We also propose a new criterion for clustering the connected components, along with some new techniques for dealing with noise and reducing computation time. The experiment shows that the method is efficient.
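
The bottom-up idea in the abstract (extract connected components, then group them by spatial relations) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not give their clustering criterion, tree description, or split-and-merge step, so the sketch substitutes a plain distance threshold and a flat grouping. The names `connected_components`, `box_gap`, `cluster_boxes`, and the parameter `max_gap` are hypothetical, chosen only for illustration.

```python
from collections import deque


def connected_components(img):
    """Return bounding boxes (top, left, bottom, right), inclusive, of the
    8-connected foreground regions of a binary image (list of rows of 0/1)."""
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def box_gap(a, b):
    """Number of blank pixels separating two boxes, taken as the larger of
    the horizontal and vertical gaps (0 if the boxes touch or overlap)."""
    dy = max(a[0] - b[2] - 1, b[0] - a[2] - 1, 0)
    dx = max(a[1] - b[3] - 1, b[1] - a[3] - 1, 0)
    return max(dx, dy)


def cluster_boxes(boxes, max_gap):
    """Group boxes whose gap is <= max_gap (an assumed stand-in criterion)
    using union-find, and return one merged bounding box per group."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Naive all-pairs comparison; no attempt is made to reduce computation.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if box_gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    merged = {}
    for i, (t, l, b, r) in enumerate(boxes):
        root = find(i)
        if root in merged:
            mt, ml, mb, mr = merged[root]
            merged[root] = (min(mt, t), min(ml, l), max(mb, b), max(mr, r))
        else:
            merged[root] = (t, l, b, r)
    return sorted(merged.values())
```

Calling `cluster_boxes(connected_components(page), max_gap)` on a binarized page yields candidate blocks. A small `max_gap` tends to group characters into words or lines, while a larger value groups lines into blocks; a real system would refine these groups further, as the abstract's split-and-merge step suggests.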