{"title":"An Improved Formula Extraction Method of Printed Chinese Layouts Based on Connected Component Run-Length Feature","authors":"Fang Yang, Chunning Hou, Xue-dong Tian","doi":"10.1109/ICVISP.2017.28","journal":{"name":"2017 International Conference on Vision, Image and Signal Processing (ICVISP)"},"publicationDate":"2017-09-01","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ICVISP.2017.28"}
An Improved Formula Extraction Method of Printed Chinese Layouts Based on Connected Component Run-Length Feature
Mathematical formula extraction is a prerequisite for formula structure analysis, recognition, and retrieval. This paper studies formula extraction for printed Chinese scientific and technical document images, proposes a criterion based on the connected component run-length feature to determine whether a text line contains formulae, and on that basis improves a rule-based formula location method. First, the change regularity of the connected component run-length is analyzed for all symbols in a text line. Then a change-rate threshold is set to decide whether the line contains a formula. Finally, the improved formula extraction method is given. Experimental results on samples collected from printed Chinese scientific and technical documents show that the proposed method is effective at detecting embedded formulae and improves the accuracy of formula location.
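
The abstract does not define the run-length feature or the change-rate threshold, so the following Python sketch is only one plausible reading of the idea, not the authors' method. It assumes that each symbol is a binary connected component, that its feature is the number of horizontal foreground runs, and that a line is flagged when the relative change between adjacent symbols exceeds a threshold. The function names, the normalization by the larger neighbor, and the default threshold of 0.5 are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def count_runs(component: Sequence[Sequence[int]]) -> int:
    """Count horizontal runs of foreground (1) pixels in a binary component.

    Assumption: this run count stands in for the paper's "connected
    component run-length feature", which the abstract does not define.
    """
    runs = 0
    for row in component:
        previous = 0
        for pixel in row:
            # A run starts wherever a foreground pixel follows background.
            if pixel == 1 and previous == 0:
                runs += 1
            previous = pixel
    return runs


def change_rates(features: Sequence[float]) -> List[float]:
    """Relative change between each pair of adjacent symbol features.

    Assumption: the change is normalized by the larger of the two values,
    so every rate lies in [0, 1].
    """
    rates = []
    for left, right in zip(features, features[1:]):
        larger = max(left, right)
        rates.append(abs(right - left) / larger if larger > 0 else 0.0)
    return rates


def line_may_contain_formula(features: Sequence[float],
                             threshold: float = 0.5) -> bool:
    """Flag a text line when any adjacent change rate exceeds the threshold.

    The default threshold is a placeholder, not a value from the paper.
    """
    return any(rate > threshold for rate in change_rates(features))


if __name__ == "__main__":
    # Hypothetical per-symbol run counts for two text lines.
    uniform_line = [10, 11, 10, 12]
    mixed_line = [10, 11, 3, 10]
    print(line_may_contain_formula(uniform_line))
    print(line_may_contain_formula(mixed_line))
```

The intuition behind this reading is that a line of uniformly dense characters gives similar run counts from symbol to symbol, while an embedded formula symbol with far fewer strokes produces a sharp jump. In practice the threshold would have to be tuned on labeled text lines.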