Rustem Damirovich Saitgareev, B. R. Giniatullin, Vladislav Yurievich Toporov, Artur Aleksandrovich Atnagulov, Farid Radikovich Aglyamov
Russian Digital Libraries Journal. Published 2021-09-12. DOI: 10.26907/1562-5419-2021-24-4-667-688 (https://doi.org/10.26907/1562-5419-2021-24-4-667-688)
Data Extraction from Similarly Structured Scanned Documents
Most data transmitted and stored today is unstructured, and its volume grows rapidly every year, even though such data is hard to search, cannot be queried, and is not processed automatically. At the same time, electronic document management systems are becoming more widespread. This paper proposes a solution for extracting data from photos of paper documents that takes their structure and layout into account. We examine several approaches, including neural networks and plain algorithmic methods, then present and discuss their results.