A situation-oriented approach to software extraction of data from Word documents

Journal of Applied Mathematics & Informatics (IF 0.4; Q4, MATHEMATICS, APPLIED). Pub Date: 2021-12-24. DOI: 10.37791/2687-0649-2021-16-6-66-83
V. Mironov, A. Gusarenko, N. Yusupova
Citations: 0

Abstract

Software extract data from word-based documents situationally-oriented approach
This article discusses a situation-oriented approach to the programmatic processing of Word documents. The documents in question are prepared by users in Microsoft Word or a comparable word processor and are later used as data sources. Because Office Open XML and Open Document Format are open formats, the concept of virtual documents mapped to ZIP archives can be applied to give programmatic access to the XML components of such documents in a situation-oriented environment. The article substantiates the importance of agreeing in advance on where information is placed in a document, for example by using pre-prepared templates, so that it can later be found and retrieved. For the DOCX and ODT formats, it considers key phrases, bookmarks, content controls, and custom XML components as ways to organize the extraction of entered data. For each option, tree-like models of access to the extracted data are built, together with the corresponding XPath expressions. The choice among these options depends on the functionality and limitations of the word processor, and the options differ in how complex it is to develop the blank template, to enter the data as a user, and to program the extraction. The applied solution is based on entering metadata into the article through content controls that are placed in a stub template and bound to elements of a custom XML component. The hierarchical situational model (HSM) developed for this purpose extracts the XML component, loads it into a DOM object, and applies XSLT transformations to obtain the resulting data: an error report and JavaScript code for subsequent use of the extracted metadata.
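The abstract's central idea, treating a DOCX file as a ZIP archive whose XML components can be queried with path expressions, can be illustrated with a minimal Python sketch. This is not the authors' HSM implementation: it uses only the standard library, it builds a tiny DOCX-like archive in memory rather than a complete Word file, and the custom XML namespace `urn:example:article-meta`, the field names `title` and `author`, and all function names are hypothetical. The XSLT step is omitted because the Python standard library has no XSLT processor; a simple check for empty fields stands in for the error report.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the DOCX body markup.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# A content control (w:sdt) tagged "title"; its text is split across two runs.
DOCUMENT_XML = f"""<w:document xmlns:w="{W}"><w:body>
  <w:sdt>
    <w:sdtPr><w:tag w:val="title"/></w:sdtPr>
    <w:sdtContent><w:p>
      <w:r><w:t>Sample </w:t></w:r><w:r><w:t>title</w:t></w:r>
    </w:p></w:sdtContent>
  </w:sdt>
</w:body></w:document>"""

# Hypothetical custom XML component; the author field is left empty on purpose.
CUSTOM_XML = """<meta xmlns="urn:example:article-meta">
  <title>Sample title</title><author></author>
</meta>"""


def build_sample_docx() -> bytes:
    """Create a tiny DOCX-like ZIP in memory (not a complete Word file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", DOCUMENT_XML)
        zf.writestr("customXml/item1.xml", CUSTOM_XML)
    return buf.getvalue()


def read_part(docx: bytes, name: str) -> ET.Element:
    """Open the archive and parse one XML component into an element tree."""
    with zipfile.ZipFile(io.BytesIO(docx)) as zf:
        return ET.fromstring(zf.read(name))


def content_controls(docx: bytes) -> dict:
    """Map each content control's tag value to the text it contains."""
    root = read_part(docx, "word/document.xml")
    found = {}
    for sdt in root.iterfind(".//w:sdt", NS):
        tag = sdt.find("w:sdtPr/w:tag", NS)
        if tag is None:
            continue  # untagged controls cannot be addressed by name
        runs = sdt.iterfind("w:sdtContent//w:t", NS)
        found[tag.get(f"{{{W}}}val")] = "".join(t.text or "" for t in runs)
    return found


def custom_metadata(docx: bytes, required=("title", "author")):
    """Read fields from the custom XML part and list empty required ones."""
    root = read_part(docx, "customXml/item1.xml")
    data = {child.tag.split("}")[1]: (child.text or "").strip() for child in root}
    errors = [f"missing value: {name}" for name in required if not data.get(name)]
    return data, errors


if __name__ == "__main__":
    docx = build_sample_docx()
    print(content_controls(docx))
    print(custom_metadata(docx))
```

Reading the bound custom XML component is usually simpler than walking `word/document.xml`, since the data sits in a small tree with its own element names instead of being spread across runs of text, which is consistent with the trade-off in extraction complexity that the abstract describes.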