Text Detection of Clinical Medical Documents Based on SWT Algorithm
Jingyi Wang, Zhao Liu
Proceedings of the 4th International Conference on Computer Science and Application Engineering, 2020-10-20
DOI: 10.1145/3424978.3425119 (https://doi.org/10.1145/3424978.3425119)
Citation count: 0
Abstract
Clinical medical document images contain a large amount of text, and detecting text regions is the basis for subsequent text analysis. However, existing text detection algorithms mainly target a single language, and their results on mixed Chinese and English text are not ideal. To address this, this paper proposes a mixed Chinese and English text detection algorithm based on the stroke width transform (SWT). The algorithm first preprocesses the image, then determines the connected domains (connected components) and filters candidate text regions using morphological rules on those components, then groups pixels into Chinese and English characters according to their stroke characteristics, and finally outputs the text regions of the image. Simulation experiments show that the algorithm detects mixed Chinese and English text regions in clinical medical document images better than the traditional text detection algorithm.
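The connected-domain and rule-based filtering steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no rules or thresholds, so the function names, the `min_pixels` and `max_aspect` parameters, and the 8-connectivity choice are all assumptions made for illustration, and the stroke width transform itself is not reproduced here. The input is assumed to be an already binarized image given as a list of rows of 0/1 values.

```python
from collections import deque


def connected_components(binary):
    """Collect 8-connected foreground regions of a 2D list of 0/1 values."""
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def filter_text_candidates(components, min_pixels=3, max_aspect=5.0):
    """Keep bounding boxes of components that pass two hypothetical rules.

    min_pixels drops speckle noise; max_aspect drops long thin shapes such
    as ruled lines. Both thresholds are illustrative assumptions.
    """
    kept = []
    for pixels in components:
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        top, left, bottom, right = min(ys), min(xs), max(ys), max(xs)
        height = bottom - top + 1
        width = right - left + 1
        aspect = max(height, width) / min(height, width)
        if len(pixels) >= min_pixels and aspect <= max_aspect:
            kept.append((top, left, bottom, right))
    return kept


# Toy input: a 2x2 blob, one isolated noise pixel, and a ruled line.
image = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
boxes = filter_text_candidates(connected_components(image))
```

In this toy input only the 2x2 blob survives: the single pixel fails the size rule and the full-width line fails the aspect-ratio rule. A real pipeline would apply rules like these to components derived from stroke widths rather than from a plain binary mask.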