
International Journal of Data Warehousing and Mining: Latest Publications

Scalable Biclustering Algorithm Considers the Presence or Absence of Properties
IF 1.2 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010103
Abdélilah Balamane
Most existing biclustering algorithms take into account only the properties that hold for a set of objects. However, in several application domains such as organized crime, genetics, or digital marketing, it can be beneficial to identify homogeneous groups of similar objects in terms of both the presence and the absence of attributes. In this paper, the author proposes a scalable and efficient biclustering algorithm that exploits a binary matrix to produce at least three types of biclusters: those whose columns are (1) filled with 1s, (2) filled with 0s, or (3) a mix of columns filled with 1s and columns filled with 0s. The procedure is scalable and runs without having to consider the complement of the initial binary context. The implementation and validation of the method on data sets illustrate its potential for discovering relevant patterns.
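A minimal sketch of the presence/absence idea on a binary matrix: for a candidate set of rows, the columns that are constantly 1 (presence), constantly 0 (absence), and their union give the three bicluster types, and the all-0 columns are found directly, without building the complement of the context. The function name and toy matrix are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def biclusters_for_rows(B, rows):
    """Classify the columns of B restricted to a candidate row set."""
    sub = B[rows, :].astype(bool)
    ones = np.where(sub.all(axis=0))[0]      # columns filled with 1s (presence)
    zeros = np.where((~sub).all(axis=0))[0]  # columns filled with 0s (absence), no complement matrix needed
    mixed = np.sort(np.concatenate([ones, zeros]))  # type 3: constant columns of either value
    return ones, zeros, mixed

B = np.array([[1, 0, 1, 0],
              [1, 0, 1, 1],
              [1, 0, 0, 1]])
print(biclusters_for_rows(B, [0, 1]))  # cols {0, 2} all 1s, col {1} all 0s
```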
Citations: 2
Development of a Framework for Preserving the Disease-Evidence-Information to Support Efficient Disease Diagnosis
IF 1.2 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021040104
V. Rajinikanth, S. Kadry
In the medical domain, the detection of acute diseases from medical data plays a vital role in identifying the nature, cause, and severity of a disease with suitable accuracy; this information supports the doctor during decision making and treatment planning. This research aims to develop a framework for preserving disease-evidence information (DEvI) to support an automated disease detection process. The phases of DEvI include (1) data collection, (2) data pre- and post-processing, (3) disease-information mining, and (4) implementation of a deep neural network (DNN) architecture to detect the disease. To demonstrate the proposed framework, an assessment of lung nodules (LN) is presented, and the attained results confirm that the framework helps achieve better segmentation as well as classification results. The technique is clinically significant and helps reduce the doctor's diagnostic burden during malignant LN detection.
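As a rough illustration of how the four phases could fit together, the skeleton below wires collection, preprocessing, mining, and detection into one pipeline; the function bodies (min-max scaling, a thresholding stand-in for the DNN) are placeholder assumptions, not the authors' implementation.

```python
def collect(sources):
    """Phase 1: data collection, flattening records from every source."""
    return [record for source in sources for record in source]

def preprocess(records):
    """Phase 2: pre-/post-processing; here, simple min-max scaling."""
    lo, hi = min(records), max(records)
    span = (hi - lo) or 1.0
    return [(r - lo) / span for r in records]

def mine(features):
    """Phase 3: disease-information mining; here, a trivial summary statistic."""
    return {"mean": sum(features) / len(features)}

def detect(features, evidence, threshold=0.5):
    """Phase 4: classification; a trained DNN would replace this threshold."""
    return ["suspicious" if f > threshold else "clear" for f in features]

records = collect([[0.2, 0.9], [0.7]])
features = preprocess(records)
print(detect(features, mine(features)))  # ['clear', 'suspicious', 'suspicious']
```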
Citations: 18
Enhancing Data Quality at ETL Stage of Data Warehousing
IF 1.2 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010105
Neha Gupta, Sakshi Jolly
Data usually comes into data warehouses from multiple sources with different formats and is typically categorized into three groups (i.e., structured, semi-structured, and unstructured). Various data mining technologies are used to collect, refine, and analyze the data, which in turn raises the problem of data quality management. Data purgation occurs when the data is subjected to an ETL methodology in order to maintain and improve data quality. The data may contain unnecessary information and inappropriate symbols, which can be classified as dummy values, cryptic values, or missing values. The present work improves the expectation-maximization algorithm with a dot product to handle cryptic data, the DBSCAN method with the Gower metric to detect dummy values, Ward's algorithm with the Minkowski distance to improve the results for contradictory data, and the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics have improved data quality and also helped provide consistent data to be loaded into the data warehouse.
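As an illustration of the missing-value step only, the sketch below imputes incomplete rows from their nearest K-means centroid under the Euclidean metric; the EM, DBSCAN, and Ward's steps for cryptic, dummy, and contradictory values are omitted, and the column-mean initialization is an assumption.

```python
import numpy as np

def kmeans_impute(X, k=2, iters=20, seed=0):
    """Impute NaNs by iterating K-means (Euclidean) and re-imputing
    each missing entry from its row's current centroid."""
    rng = np.random.default_rng(seed)
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)   # start from column means
    centroids = filled[rng.choice(len(filled), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(filled[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)                       # Euclidean assignment
        centroids = np.array([filled[labels == j].mean(axis=0) for j in range(k)])
        filled[mask] = centroids[labels][mask]          # refresh imputed cells
    return filled

X = np.array([[1.0, 2.0], [1.1, np.nan], [8.0, 9.0], [np.nan, 8.8]])
print(kmeans_impute(X))
```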
Citations: 2
Volunteer Data Warehouse: State of the Art
IF 1.2 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021070101
Amir Sakka, S. Bimonte, F. Pinet, Lucile Sautot
With the maturity of crowdsourcing systems, new analysis possibilities appear in which volunteers play a crucial role by bringing the implicit knowledge gained from practical, daily experience. At the same time, data warehouse (DW) and OLAP systems are first-class citizens of decision-support systems: they allow analyzing a huge volume of data according to the multidimensional model. The more the multidimensional model reflects the decision-makers' analysis needs, the more successful the DW project is. However, when volunteers are involved in the design of DWs, existing DW design methodologies present some limitations. In this work, the authors present the main features of volunteer data warehouse (VDW) design, and they study the main existing DW design methodologies to find out how they can contribute to fulfilling the features needed by this particular DW approach. To provide a formal framework for classifying existing work, they study the differences between classical DW users and volunteers. The paper also presents a set of open issues for VDW.
Citations: 1
Ranking News Feed Updates on Social Media: A Review and Expertise-Aware Approach
IF 1.2 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010102
S. Belkacem, K. Boukhalfa
Social media are used by hundreds of millions of users worldwide. On these platforms, any user can post and share updates with individuals from their social network. Because of the large amount of data, users are overwhelmed by the updates displayed chronologically in their newsfeeds, most of which are irrelevant. Ranking newsfeed updates in order of relevance has been proposed to help users quickly catch up with the relevant ones. In this work, the authors first study the approaches proposed in this area according to four main criteria: the features that may influence relevance, relevance prediction models, training and evaluation methods, and evaluation platforms. The authors then propose an approach that leverages another type of feature: the expertise of an update's author on the corresponding topics. Experimental results on Twitter highlight that judging expertise, which has not been considered in the academic and industrial communities, is crucial for maximizing the relevance of updates in newsfeeds.
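A minimal sketch of an expertise-aware relevance score: a conventional linear combination of social and popularity features is extended with an author-topic expertise term. The feature set, weights, and expertise table are invented for illustration and are not the authors' trained model.

```python
from dataclasses import dataclass

@dataclass
class Update:
    author: str
    topic: str
    likes: int
    from_friend: bool

# Hypothetical author-topic expertise scores, e.g. learned offline.
EXPERTISE = {("alice", "ml"): 0.9, ("bob", "ml"): 0.2}

def relevance(u: Update) -> float:
    social = 1.0 if u.from_friend else 0.0
    popularity = min(u.likes / 100.0, 1.0)
    expertise = EXPERTISE.get((u.author, u.topic), 0.0)  # the added feature
    return 0.3 * social + 0.3 * popularity + 0.4 * expertise

feed = [Update("bob", "ml", 80, True), Update("alice", "ml", 10, False)]
for u in sorted(feed, key=relevance, reverse=True):
    print(u.author, round(relevance(u), 2))  # bob 0.62, then alice 0.39
```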
Citations: 0
Efficient Computation of Top-K Skyline Objects in Data Set With Uncertain Preferences
IF 1.2 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021070104
Nitesh Sukhwani, Venkateswara Rao Kagita, Vikas Kumar, S. K. Panda
Skyline recommendation with uncertain preferences has drawn AI researchers' attention in recent years due to its wide range of applications. The naive approach to skyline recommendation computes the skyline probability of all objects and ranks them accordingly. However, in many applications the interest lies in determining the top-k objects rather than a full ranking. The most efficient algorithm for determining an object's skyline probability employs the concepts of a zero-contributing set and prefix-based k-level absorption. The authors show that the performance of these methods depends heavily on the arrangement of objects in the database. In this paper, the authors propose a method for determining the top-k skyline objects without computing the skyline probability of all objects. They also propose and analyze different methods of ordering the objects in the database. Finally, they empirically show the efficacy of the proposed approaches on several synthetic and real-world data sets.
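For the quantity being ranked, a naive sketch: an object's skyline probability is the probability that no other object dominates it, computed here under an assumed independence of pairwise dominance events (which the actual preference model need not satisfy). The paper's point is precisely to obtain the top-k without evaluating all of these products.

```python
def skyline_probabilities(dom):
    """dom[q][o] = probability that object q dominates object o."""
    objs = list(dom)
    probs = {}
    for o in objs:
        p = 1.0
        for q in objs:
            if q != o:
                p *= 1.0 - dom[q].get(o, 0.0)  # survive domination by q
        probs[o] = p
    return probs

dom = {"a": {"b": 0.7}, "b": {}, "c": {"a": 0.1, "b": 0.5}}
probs = skyline_probabilities(dom)
print(probs)                                           # a: 0.9, b: 0.15, c: 1.0
print(sorted(probs, key=probs.get, reverse=True)[:2])  # top-2: ['c', 'a']
```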
Citations: 1
Filter-Wrapper Incremental Algorithms for Finding Reduct in Incomplete Decision Systems When Adding and Deleting an Attribute Set
IF 1.2 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021040103
Long Giang Nguyen, Le Hoang Son, N. Tuan, T. Ngan, Nguyen Nhu Son, N. Thang
The tolerance rough set model is an effective tool for solving the attribute reduction problem directly on incomplete decision systems, without pre-processing missing values. In practical applications, incomplete decision systems are often changed and updated, especially when attributes are added or removed. To solve the problem of finding a reduct on dynamic incomplete decision systems, researchers have proposed many incremental algorithms to decrease execution time. However, the proposed incremental algorithms are mainly based on the filter approach, in which classification accuracy is calculated only after the reduct has been obtained. As a result, these filter algorithms do not achieve the best results in terms of the number of attributes in the reduct and classification accuracy. This paper proposes two distance-based filter-wrapper incremental algorithms: IFWA_AA for the case of adding attributes and IFWA_DA for the case of deleting attributes. Experimental results show that the proposed filter-wrapper incremental algorithm IFWA_AA significantly decreases the number of attributes in the reduct and improves classification accuracy compared to filter incremental algorithms such as UARA and IDRA.
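A minimal sketch of the tolerance relation the model builds on: two objects are similar on an attribute set if their values agree or either value is missing. The incremental IFWA_AA/IFWA_DA bookkeeping and the distance measure itself are not reproduced; the '*' missing marker and the toy table are assumptions.

```python
MISSING = "*"

def tolerant(x, y, attrs):
    """Objects are similar on attrs if values agree or either is missing."""
    return all(x[a] == y[a] or MISSING in (x[a], y[a]) for a in attrs)

def tolerance_classes(objects, attrs):
    """Tolerance class of each object: everything it is similar to."""
    return {i: {j for j, y in enumerate(objects) if tolerant(x, y, attrs)}
            for i, x in enumerate(objects)}

# Toy incomplete decision table; 'd' is the decision attribute.
objects = [
    {"a": 1, "b": MISSING, "d": "yes"},
    {"a": 1, "b": 2,       "d": "yes"},
    {"a": 0, "b": 2,       "d": "no"},
]
print(tolerance_classes(objects, ["a", "b"]))  # {0: {0, 1}, 1: {0, 1}, 2: {2}}
```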
Citations: 3
Image Retrieval Using Intensity Gradients and Texture Chromatic Pattern: Satellite Images Retrieval
IF 1.2 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010104
I. Jacob, P. Betty, P. Darney, Hoang Viet Long, T. Tuan, Y. H. Robinson, S. Vimal, E. G. Julie
Image retrieval methods fetch images from a database by using their features: colour, shape, and texture. These features are used to measure the similarity between the query image and the images in the database, and the images are then sorted by this similarity. The article uses intra- and inter-texture chrominance and its intensity. The inter-chromatic texture feature is extracted by the local oppugnant colored texture pattern (LOCTP), while the local binary pattern (LBP) gives the intra-texture information. The histogram of oriented gradients (HoG) is used to obtain shape information from the satellite images. Performance analysis on a land-cover remote sensing database, the NWPU-VHR-10 dataset, and a satellite optical land-cover database gives better results than previous works.
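As a small illustration of the intra-texture feature, the sketch below computes the standard 8-neighbour local binary pattern over a grayscale image; LOCTP and HoG extraction are not shown, and the thresholding convention (neighbour >= centre) is one common choice.

```python
import numpy as np

def lbp_image(img):
    """8-neighbour local binary pattern code for each interior pixel."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),   # neighbours, clockwise
            (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    out = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offs):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neighbour >= center).astype(np.uint8) << bit
    return out

img = np.array([[10, 20, 30],
                [40, 50, 60],
                [70, 80, 90]], dtype=np.uint8)
print(lbp_image(img))  # [[120]] for the single interior pixel
```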
Citations: 2
OCL Constraints Checking on NoSQL Systems Through an MDA-Based Approach
IF 1.2 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010101
F. Abdelhédi, A. A. Brahim, G. Zurfluh
Big data has received a great deal of attention in recent years. Not only is the amount of data on a completely different level than before, but the data also comes in different types, varying in format, structure, and source. This has changed the tools needed to handle big data, giving rise to NoSQL systems. While NoSQL systems have proven their efficiency in handling big data, how the automatic storage of big data in NoSQL systems could be done is still an unsolved problem. This paper proposes an automatic approach for implementing UML conceptual models in NoSQL systems, including the mapping of the associated OCL constraints to the code required for checking them. To demonstrate the practical applicability of the work, the approach has been realized in a tool supporting four fundamental kinds of OCL expressions: iterate-based expressions, OCL predefined operations, the If expression, and the Let expression.
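As a hypothetical illustration of the mapping idea, an OCL invariant such as `context Customer inv: self.orders->forAll(o | o.total >= 0)` can be lowered to a checker run against stored documents; the sample documents and the hand-written checker below are invented stand-ins for the code the paper's tool would generate.

```python
# Documents as they might sit in a document store (invented sample data).
customers = [
    {"name": "ada", "orders": [{"total": 12.0}, {"total": 3.5}]},
    {"name": "bob", "orders": [{"total": -1.0}]},
]

def inv_orders_non_negative(doc):
    """OCL forAll over the embedded collection becomes Python's all()."""
    return all(order["total"] >= 0 for order in doc["orders"])

violations = [c["name"] for c in customers if not inv_orders_non_negative(c)]
print(violations)  # ['bob']
```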
Citations: 1
An Approach for Retrieving Faster Query Results From Data Warehouse Using Synonymous Materialized Queries
IF 1.2 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021040105
S. Chakraborty, Jyotika Doshi
The enterprise data warehouse stores an enormous amount of data collected from multiple sources for analytical processing and strategic decision making. The analytical processing is done with online analytical processing (OLAP) queries, where performance in terms of result retrieval time is an important factor. The major existing approaches for retrieving results from a data warehouse are multidimensional data cubes and materialized views, which incur higher storage, processing, and maintenance costs. The present study strives to achieve a simpler and faster query-result retrieval approach for data warehouses, with reduced storage space and minimal maintenance cost. In the present approach, the execution time of frequent queries is saved by storing their results for reuse when the query is fired the next time. The executed OLAP queries are stored, along with the query results and the necessary metadata, in a relational database referred to as the materialized query database (MQDB). The tables, fields, functions, relational operators, and criteria used in the input query are matched against those of a stored query; if they are found to be the same, the input query and the stored query are considered synonymous. The stored query is then checked for incremental updates: if no incremental updates are required, the existing stored results are fetched from MQDB; otherwise, only the incremental results are processed from the data marts. The performance of the MQDB model is evaluated, and it is observed that, using MQDB, a significant reduction in query processing time is achieved compared to the major existing approaches. The developed model will be useful for organizations keeping their historical records in a data warehouse.
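A minimal sketch of the matching-and-reuse loop: an incoming query is normalised into a signature of tables, fields, functions, and criteria, looked up among stored queries, and served from the cache on a synonymous hit. The signature layout and refresh flag are simplifications of the matching described above, not the paper's implementation.

```python
def signature(tables, fields, functions, criteria):
    """Normalise a parsed query into an order-insensitive signature."""
    return (frozenset(tables), frozenset(fields),
            frozenset(functions), frozenset(criteria))

mqdb = {}  # signature -> materialised result set

def run_query(sig, execute, needs_refresh=False):
    if sig in mqdb and not needs_refresh:
        return mqdb[sig]        # synonymous query: serve the stored result
    mqdb[sig] = execute()       # otherwise execute and materialise
    return mqdb[sig]

sig = signature(["sales"], ["region", "SUM(amount)"], ["SUM"], ["year = 2020"])
print(run_query(sig, lambda: {"EU": 10}))  # executes and stores {'EU': 10}
print(run_query(sig, lambda: {"EU": 99}))  # cache hit: still {'EU': 10}
```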
Citations: 0