{"title":"利用 OpenCV 对传统格式财务报告进行文本提取的框架","authors":"Jiaxin Wei, Jin Yang, Xinyang Liu","doi":"10.3233/jifs-234170","DOIUrl":null,"url":null,"abstract":"Due to intensified off-balance sheet disclosure by regulatory authorities, financial reports now contain a substantial amount of information beyond the financial statements. Consequently, the length of footnotes in financial reports exceeds that of the financial statements. This poses a novel challenge for regulators and users of financial reports in efficiently managing this information. Financial reports, with their clear structure, encompass abundant structured information applicable to information extraction, automatic summarization, and information retrieval. Extracting headings and paragraph content from financial reports enables the acquisition of the annual report text’s framework. This paper focuses on extracting the structural framework of annual report texts and introduces an OpenCV-based method for text framework extraction using computer vision. The proposed method employs morphological image dilation to distinguish headings from the main body of the text. Moreover, this paper combines the proposed method with a traditional, rule-based extraction method that exploits the characteristic features of numbers and symbols at the beginning of headings. This combination results in an optimized framework extraction method, producing a more concise text framework.","PeriodicalId":54795,"journal":{"name":"Journal of Intelligent & Fuzzy Systems","volume":"29 1","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2024-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A text extraction framework of financial report in traditional format with OpenCV\",\"authors\":\"Jiaxin Wei, Jin Yang, Xinyang Liu\",\"doi\":\"10.3233/jifs-234170\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to intensified off-balance sheet disclosure by regulatory authorities, financial reports now contain a substantial amount of information beyond the financial statements. Consequently, the length of footnotes in financial reports exceeds that of the financial statements. This poses a novel challenge for regulators and users of financial reports in efficiently managing this information. Financial reports, with their clear structure, encompass abundant structured information applicable to information extraction, automatic summarization, and information retrieval. Extracting headings and paragraph content from financial reports enables the acquisition of the annual report text’s framework. This paper focuses on extracting the structural framework of annual report texts and introduces an OpenCV-based method for text framework extraction using computer vision. The proposed method employs morphological image dilation to distinguish headings from the main body of the text. Moreover, this paper combines the proposed method with a traditional, rule-based extraction method that exploits the characteristic features of numbers and symbols at the beginning of headings. This combination results in an optimized framework extraction method, producing a more concise text framework.\",\"PeriodicalId\":54795,\"journal\":{\"name\":\"Journal of Intelligent & Fuzzy Systems\",\"volume\":\"29 1\",\"pages\":\"\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2024-02-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Intelligent & Fuzzy Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.3233/jifs-234170\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Intelligent & Fuzzy Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.3233/jifs-234170","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
A text extraction framework of financial report in traditional format with OpenCV
Due to intensified off-balance sheet disclosure by regulatory authorities, financial reports now contain a substantial amount of information beyond the financial statements. Consequently, the length of footnotes in financial reports exceeds that of the financial statements. This poses a novel challenge for regulators and users of financial reports in efficiently managing this information. Financial reports, with their clear structure, encompass abundant structured information applicable to information extraction, automatic summarization, and information retrieval. Extracting headings and paragraph content from financial reports enables the acquisition of the annual report text’s framework. This paper focuses on extracting the structural framework of annual report texts and introduces an OpenCV-based method for text framework extraction using computer vision. The proposed method employs morphological image dilation to distinguish headings from the main body of the text. Moreover, this paper combines the proposed method with a traditional, rule-based extraction method that exploits the characteristic features of numbers and symbols at the beginning of headings. This combination results in an optimized framework extraction method, producing a more concise text framework.
期刊介绍:
The purpose of the Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology is to foster advancements of knowledge and help disseminate results concerning recent applications and case studies in the areas of fuzzy logic, intelligent systems, and web-based applications among working professionals and professionals in education and research, covering a broad cross-section of technical disciplines.