Semantic Processing for the Conversion of Unstructured Documents into Structured Information in the Enterprise Context

Adam Bartusiak, Jörg Lässig
Proceedings of the 12th International Conference on Semantic Systems, published 2016-09-12
DOI: 10.1145/2993318.2993341 (https://doi.org/10.1145/2993318.2993341)

Abstract

We present an ongoing research project addressing the problem of the massive amounts of unstructured data that are generated on a daily basis in most business organisations, regardless of size. Our motivation is to support small and medium-sized enterprises in particular in gaining a competitive advantage in the market. The goal is to improve their processes for extracting valuable business information from such disorganised data. To achieve this, we introduce a flexible and scalable data analysis framework capable of transforming various types of documents into semantically annotated structures. These include emails, text files in various formats, slide presentations, blog entries, etc. Additionally, the solution provides a semantic search engine for structured retrieval of the analyzed information and a graphical layer to dynamically visualize the search results as an interactive graph. Throughout the paper, we describe the architecture of the two main engines, which are responsible for data and text analysis and for semantic search. We conclude that semantic processing of unstructured sources significantly improves data management and data integration within enterprises.