An Approach to Document Warehousing System Lifecycle from Textual ETL to Multidimensional Queries: A Proof-of-Concept Prototype

A. Cembalo, F. M. Pisano, G. Romano
Published in: 2012 Sixth International Conference on Complex, Intelligent, and Software Intensive Systems
Publication date: 2012-07-04
DOI: 10.1109/CISIS.2012.185 (https://doi.org/10.1109/CISIS.2012.185)
Citation count: 3

Abstract

For years, businesses have used ad hoc technologies to analyze huge amounts of data related to their domain of interest, aiming to extract relevant information for elaborating successful company strategies. Such technologies focused essentially on structured data. In particular, Data Warehousing systems are the decision support systems on which academia and industry have focused their attention. It is believed that "about 80% of the information of any organization is contained in unstructured and semi-structured documents" [1], so limiting the analysis to structured data alone, as has been done so far, is likely to lose a high percentage of potentially useful knowledge. Since text is the primary means of disseminating information and knowledge, it is necessary to introduce concepts related to text-oriented Business Intelligence and Document Warehousing systems, which could have many useful applications in industry or other large domains. In this paper we present a prototype application of a Document Warehousing system, highlighting challenges and solutions for each phase of its lifecycle. The prototype targets the Security and Prevention domain and is built with a set of open-source tools whose features and limitations are highlighted. To our knowledge, the organization and setup of the fundamental elements of a Document Warehouse system lifecycle have not yet been studied in depth. Furthermore, we have not yet found a Document Warehousing application implemented by integrating the open-source tools that we use for our prototype.