{"title":"(化学)DeT/sub E/X从位图图像自动生成(化学)文档的标记语言描述","authors":"A. Simon, Jean-Christophe Pret, A.P. Johnson","doi":"10.1109/ICDAR.1995.599035","DOIUrl":null,"url":null,"abstract":"This paper presents a novel view of document processing, as being the reverse process to T/sub E/X. This concept simplifies the analysis of the physical structure of documents, and also suggests the use of a style file for layout recognition. An algorithm is given for both phases, layout analysis and layout recognition. The bottom-up layout analysis method employed is based on the Kruskal's algorithm and uses the distances between the components to construct the physical page structure. The algorithm is linear with respect to the number of the connected components. For layout recognition, a document style description language (DSDL) is introduced. This helps a fault-tolerant, recursive parsing algorithm to label the blocks of the document. The presented methods were designed to be used for scientific publications (papers, reports, books), but could be applied to a broader range of documents.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"(Chem)DeT/sub E/X automatic generation of a markup language description of (chemical) documents from bitmap images\",\"authors\":\"A. Simon, Jean-Christophe Pret, A.P. Johnson\",\"doi\":\"10.1109/ICDAR.1995.599035\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a novel view of document processing, as being the reverse process to T/sub E/X. This concept simplifies the analysis of the physical structure of documents, and also suggests the use of a style file for layout recognition. An algorithm is given for both phases, layout analysis and layout recognition. The bottom-up layout analysis method employed is based on the Kruskal's algorithm and uses the distances between the components to construct the physical page structure. The algorithm is linear with respect to the number of the connected components. For layout recognition, a document style description language (DSDL) is introduced. This helps a fault-tolerant, recursive parsing algorithm to label the blocks of the document. The presented methods were designed to be used for scientific publications (papers, reports, books), but could be applied to a broader range of documents.\",\"PeriodicalId\":273519,\"journal\":{\"name\":\"Proceedings of 3rd International Conference on Document Analysis and Recognition\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1995-08-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of 3rd International Conference on Document Analysis and Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDAR.1995.599035\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 3rd International Conference on Document Analysis and Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.1995.599035","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
(Chem)DeT/sub E/X automatic generation of a markup language description of (chemical) documents from bitmap images
This paper presents a novel view of document processing, as being the reverse process to T/sub E/X. This concept simplifies the analysis of the physical structure of documents, and also suggests the use of a style file for layout recognition. An algorithm is given for both phases, layout analysis and layout recognition. The bottom-up layout analysis method employed is based on the Kruskal's algorithm and uses the distances between the components to construct the physical page structure. The algorithm is linear with respect to the number of the connected components. For layout recognition, a document style description language (DSDL) is introduced. This helps a fault-tolerant, recursive parsing algorithm to label the blocks of the document. The presented methods were designed to be used for scientific publications (papers, reports, books), but could be applied to a broader range of documents.