
Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development: Latest Articles

Simultaneous Feature Selection and Tuple Selection for Efficient Classification
Pub Date : 1900-01-01 DOI: 10.4018/978-1-60566-748-5.CH012
M. Dash, V. Gopalkrishnan
It is no longer news that data volumes are growing rapidly day by day. With the Internet now prevalent everywhere, the sources of data have become numerous. Data are growing in both directions: in dimensions (features) and in instances (examples, or tuples). Not all of these data are relevant, though. When gathering data on any particular subject, one tends to collect as much information as might be required for various tasks, often without any particular task, such as classification, explicitly in mind. It therefore behooves a data mining expert to remove noisy, irrelevant and redundant data before proceeding with classification, because many traditional algorithms fail in the presence of such data (Blum and Langley 1997). As an example, consider microarray gene expression data, where there are thousands of features (genes) and only tens of tuples (sample tests). The Leukemia cancer data (Alon, Barkai et al. 1999), for instance, has 7129 genes and 72 sample tests. It has been shown that even with very few genes one can achieve the same or even better prediction accuracy ...
Citations: 1
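The idea in the abstract above, shrinking a dataset along both axes before classification, can be illustrated with a minimal sketch. It is not the authors' algorithm: the filter score (difference of class means divided by the summed standard deviations), the stratified random sampling of tuples, and all function names (`rank_features`, `sample_tuples`, `reduce_dataset`) are assumptions chosen only for illustration, and the toy data is invented.

```python
import random
from statistics import mean, pstdev


def rank_features(rows, labels, k):
    """Return the indices of the k features that best separate two classes."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    scored = []
    for j in range(len(rows[0])):
        a = [r[j] for r, y in zip(rows, labels) if y == classes[0]]
        b = [r[j] for r, y in zip(rows, labels) if y == classes[1]]
        spread = (pstdev(a) + pstdev(b)) or 1e-12  # avoid division by zero
        scored.append((abs(mean(a) - mean(b)) / spread, j))
    best = sorted(scored, reverse=True)[:k]
    return sorted(j for _, j in best)


def sample_tuples(labels, fraction, seed=0):
    """Return indices of a stratified random sample (at least one tuple per class)."""
    rng = random.Random(seed)
    kept = []
    for c in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == c]
        n = max(1, round(fraction * len(members)))
        kept.extend(rng.sample(members, n))
    return sorted(kept)


def reduce_dataset(rows, labels, k, fraction, seed=0):
    """Apply feature selection and tuple selection together."""
    features = rank_features(rows, labels, k)
    kept = sample_tuples(labels, fraction, seed)
    small = [[rows[i][j] for j in features] for i in kept]
    return small, [labels[i] for i in kept], features


# Toy data: feature 0 separates the classes, feature 1 is irrelevant.
rows = [
    [1.0, 5.0, 0.2], [1.1, 5.1, 0.9], [0.9, 4.9, 0.5], [1.0, 5.0, 0.4],
    [3.0, 5.0, 0.3], [3.1, 4.9, 0.8], [2.9, 5.1, 0.6], [3.0, 5.0, 0.5],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
small_rows, small_labels, features = reduce_dataset(rows, labels, k=1, fraction=0.5)
print(features, len(small_rows))
```

A classifier would then be trained on `small_rows` and `small_labels` instead of the full table. Real microarray data would call for a more careful score and a tuple-selection criterion that looks at the data rather than sampling blindly.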
Decisional Annotations
Pub Date : 1900-01-01 DOI: 10.4018/978-1-60566-748-5.CH004
G. Cabanac, M. Chevalier, F. Ravat, O. Teste
This chapter deals with an annotation-based decisional system. The decisional system we present is based on multidimensional databases, which are composed of facts and dimensions. The expertise of decision-makers is modelled, shared and stored through annotations. These annotations allow decision-makers to carry out active analysis and to collaborate with other decision-makers on a common analysis.
Citations: 0
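To make the notions of facts, dimensions and shared annotations in the abstract above concrete, here is a small hypothetical data model. The class names (`Cube`, `Annotation`), the choice to attach annotations to individual cells, and the sample values are all assumptions for illustration; the chapter's actual model may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    author: str
    text: str


@dataclass
class Cube:
    dimensions: tuple                                 # e.g. ("city", "year")
    facts: dict = field(default_factory=dict)         # coordinates -> measure value
    annotations: dict = field(default_factory=dict)   # coordinates -> list of Annotation

    def annotate(self, coords, author, text):
        """Attach a decision-maker's remark to an existing cell."""
        if coords not in self.facts:
            raise KeyError(f"no fact stored at {coords}")
        self.annotations.setdefault(coords, []).append(Annotation(author, text))

    def notes(self, coords):
        """Return every annotation shared on a cell, oldest first."""
        return list(self.annotations.get(coords, []))


cube = Cube(dimensions=("city", "year"))
cube.facts[("Toulouse", 2008)] = 1250.0
cube.annotate(("Toulouse", 2008), "analyst_a", "Unusually high; check the source data.")
cube.annotate(("Toulouse", 2008), "analyst_b", "Confirmed, a one-off bulk order.")
print([n.author for n in cube.notes(("Toulouse", 2008))])
```

Because annotations are stored next to the facts rather than in each user's private workspace, a second analyst who opens the same cell sees the first analyst's remark, which is the collaborative aspect the abstract describes.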
Conceptual Data Warehouse Design Methodology for Business Process Intelligence
Pub Date : 1900-01-01 DOI: 10.4018/978-1-60566-748-5.CH007
Svetlana Mansmann, T. Neumuth, O. Burgert, Matthias Röger
The emerging area of business process intelligence aims at enhancing the analysis power of business process management systems by employing performance-oriented technologies of data warehousing and mining. However, the differences in the assumptions and objectives of the underlying models, namely the business process model and the multidimensional data model, hinder a straightforward and meaningful convergence of the two concepts. The authors present an approach to designing a data warehouse that enables the multidimensional analysis of business processes and their execution. The aims of such analysis are manifold, from quantitative and qualitative assessment to process discovery, pattern recognition and mining. The authors demonstrate that business processes and workflows represent a non-conventional application scenario for the data warehousing approach and that multiple challenges arise at various design stages. They describe deficiencies of conventional OLAP technology with respect to business process modeling and formulate the requirements for an adequate multidimensional presentation of process descriptions. Modeling extensions proposed at the conceptual level are verified by implementing them in a relational OLAP system, accessible via state-of-the-art visual frontend tools. The authors demonstrate the benefits of the proposed modeling framework by presenting relevant analysis tasks from the domain of medical engineering and showing the type of decision support provided by their solution.
Citations: 3
Dynamic Workload for Schema Evolution in Data Warehouses
Pub Date : 1900-01-01 DOI: 10.4018/978-1-60566-748-5.CH002
F. Bentayeb, Cécile Favre, Omar Boussaïd
A data warehouse allows the integration of heterogeneous data sources for identified analysis purposes. The data warehouse schema is designed according to the available data sources and the users' analysis requirements. To answer new individual analysis needs, we proposed in recent work a solution for on-line analysis personalization. That solution rests on a user-driven approach to data warehouse schema evolution, which consists in creating new hierarchy levels in OLAP (On-Line Analytical Processing) dimensions. One of the main objectives of OLAP, as the acronym suggests, is performance during the analysis process. Since data warehouses contain a large volume of data, answering decision queries efficiently requires particular access methods. The usual approach is to use redundant optimization structures such as views and indices. This requires selecting an appropriate set of materialized views and indices that minimizes total query response time within a limited storage space. A judicious selection must be cost-driven and based on a workload that represents a set of users' queries on the data warehouse. In this chapter, we address the issues related to the workload's evolution and maintenance in data warehouse systems in response to new requirements arising from users' personalized analysis needs. The main goal is to avoid regenerating the workload from scratch. Hence, we propose a workload management system that helps the administrator maintain and adapt the workload dynamically according to changes in the data warehouse schema. To achieve this maintenance, we propose two types of workload updates: (1) keeping existing queries consistent with the new data warehouse schema and (2) creating new queries based on the new dimension hierarchy levels. Our system helps the administrator adopt a proactive approach to managing data warehouse performance. To validate our workload management system, we address the implementation issues of our proposed prototype, which has been developed within a client/server architecture with a web client interfaced with the Oracle 10g Database Management System.
Citations: 2
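The two kinds of workload update named in the abstract above can be sketched as follows. This is a hypothetical simplification, not the authors' system: a query is reduced to a measure plus the hierarchy levels it groups by, and the names `Query`, `rename_level` and `add_level` are invented for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Query:
    measure: str
    group_by: tuple  # names of the dimension hierarchy levels used for grouping


def rename_level(workload, old, new):
    """Update type 1: keep existing queries consistent after a level is renamed."""
    return [
        replace(q, group_by=tuple(new if g == old else g for g in q.group_by))
        for q in workload
    ]


def add_level(workload, below, new_level):
    """Update type 2: for each query grouping by `below`, add a variant that
    groups by the new level created above it in the hierarchy."""
    result = list(workload)
    seen = set(workload)
    for q in workload:
        if below in q.group_by:
            variant = replace(
                q, group_by=tuple(new_level if g == below else g for g in q.group_by)
            )
            if variant not in seen:  # avoid duplicate queries in the workload
                seen.add(variant)
                result.append(variant)
    return result


workload = [Query("sales", ("city", "year")), Query("sales", ("product",))]
workload = rename_level(workload, "city", "town")
workload = add_level(workload, below="town", new_level="region")
for q in workload:
    print(q)
```

The existing queries survive the schema change and one new query on the `region` level is appended, so the workload does not have to be regenerated from scratch before views and indices are re-selected.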
Data Warehouse Facilitating Evidence-Based Medicine
Pub Date : 1900-01-01 DOI: 10.4018/978-1-60566-748-5.CH008
N. Stolba, Tho Manh Nguyen, A. Tjoa
In the past, much of the effort in healthcare decision support systems was focused on data acquisition and storage, so that the data could be used at some later point in time. Medical data was used in a static manner, for analytical purposes, in order to verify decisions already taken. Due to the immense volumes of medical data, the architecture of future healthcare decision support systems focuses more on interoperability than on integration. With the rising need for a unified knowledge base, the federated approach to distributed data warehouses (DWH) is receiving increasing attention. The exploitation of evidence-based guidelines becomes a priority concern as awareness of the importance of knowledge management rises. Consequently, interoperability between medical information systems is becoming a necessity in modern health care. Under strong security measures, health care organizations are striving to unite and share their (in part highly sensitive) data assets in order to achieve a wider knowledge base and to provide a mature decision support service for decision makers. Ontological integration of the very complex and heterogeneous medical data structures is a challenging task. The authors' objective is to point out the advantages of deploying a federated data warehouse approach for integrating a wide range of different medical data sources and for distributing evidence-based clinical knowledge, in order to support clinical decision makers, primarily clinicians at the point of care.
Citations: 3