Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010103
Abdélilah Balamane
Most existing biclustering algorithms take into account only the properties that hold for a set of objects. However, in several application domains such as organized crime, genetics, or digital marketing, it can be beneficial to identify homogeneous groups of similar objects in terms of both the presence and the absence of attributes. In this paper, the author proposes a scalable and efficient biclustering algorithm that exploits a binary matrix to produce at least three types of biclusters: (1) biclusters whose columns are filled with 1's, (2) biclusters whose columns are filled with 0's, and (3) biclusters in which some columns are filled with 1's and others with 0's. The procedure is scalable and runs without having to consider the complement of the initial binary context. The implementation and validation of the method on data sets illustrate its potential in the discovery of relevant patterns.
{"title":"Scalable Biclustering Algorithm Considers the Presence or Absence of Properties","authors":"Abdélilah Balamane","doi":"10.4018/IJDWM.2021010103","DOIUrl":"https://doi.org/10.4018/IJDWM.2021010103","url":null,"abstract":"Most existing biclustering algorithms take into account the properties that hold for a set of objects. However, it could be beneficial in several application domains such as organized crimes, genetics, or digital marketing to identify homogeneous groups of similar objects in terms of both the presence and the absence of attributes. In this paper, the author proposes a scalable and efficient algorithm of biclustering that exploits a binary matrix to produce at least three types of biclusters where the cell's column (1) are filled with 1's, (2) are filled with 0's, and (3) some columns filled with 1's and/or with 0's. This procedure is scalable and it's executed without having to consider the complementary of the initial binary context. The implementation and validation of the method on data sets illustrates its potential in the discovery of relevant patterns.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"76 1","pages":"39-56"},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75477856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021040104
V. Rajinikanth, S. Kadry
In the medical domain, the detection of acute diseases from medical data plays a vital role in identifying the nature, cause, and severity of a disease with suitable accuracy; this information supports the doctor during decision making and treatment planning. The research aims to develop a framework for preserving disease-evidence information (DEvI) to support the automated disease detection process. The phases of DEvI include (1) data collection, (2) data pre- and post-processing, (3) disease-information mining, and (4) implementation of a deep neural network (DNN) architecture to detect the disease. To demonstrate the proposed framework, an assessment of lung nodules (LN) is presented, and the results confirm that the framework helps to attain better segmentation and classification results. The technique is clinically significant and helps to reduce the doctor's diagnostic burden during malignant LN detection.
{"title":"Development of a Framework for Preserving the Disease-Evidence-Information to Support Efficient Disease Diagnosis","authors":"V. Rajinikanth, S. Kadry","doi":"10.4018/IJDWM.2021040104","DOIUrl":"https://doi.org/10.4018/IJDWM.2021040104","url":null,"abstract":"In medical domain, the detection of the acute diseases based on the medical data plays a vital role in identifying the nature, cause, and the severity of the disease with suitable accuracy; this information supports the doctor during the decision making and treatment planning procedures. The research aims to develop a framework for preserving the disease-evidence-information (DEvI) to support the automated disease detection process. Various phases of DEvI include (1) data collection, (2) data pre- and post-processing, (3) disease information mining, and (4) implementation of a deep-neural-network (DNN) architecture to detect the disease. To demonstrate the proposed framework, assessment of lung nodule (LN) is presented, and the attained result confirms that this framework helps to attain better segmentation as well as classification result. This technique is clinically significant and helps to reduce the diagnostic burden of the doctor during the malignant LN detection.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"40 1","pages":"63-84"},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81417632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010105
Neha Gupta, Sakshi Jolly
Data usually comes into data warehouses from multiple sources in different formats and is typically categorized into three groups (i.e., structured, semi-structured, and unstructured). Various data mining technologies are used to collect, refine, and analyze the data, which leads to the problem of data quality management. Data purgation occurs when the data is subjected to the ETL methodology in order to maintain and improve data quality. The data may contain unnecessary information and inappropriate symbols, which can be classified as dummy values, cryptic values, or missing values. The present work improves the expectation-maximization algorithm with the dot product to handle cryptic data, the DBSCAN method with the Gower metric to handle dummy values, Ward's algorithm with the Minkowski distance to improve the results for contradicting data, and the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics have improved the data quality and also helped in providing consistent data to be loaded into the data warehouse.
{"title":"Enhancing Data Quality at ETL Stage of Data Warehousing","authors":"Neha Gupta, Sakshi Jolly","doi":"10.4018/IJDWM.2021010105","DOIUrl":"https://doi.org/10.4018/IJDWM.2021010105","url":null,"abstract":"Data usually comes into data warehouses from multiple sources having different formats and are specifically categorized into three groups (i.e., structured, semi-structured, and unstructured). Various data mining technologies are used to collect, refine, and analyze the data which further leads to the problem of data quality management. Data purgation occurs when the data is subject to ETL methodology in order to maintain and improve the data quality. The data may contain unnecessary information and may have inappropriate symbols which can be defined as dummy values, cryptic values, or missing values. The present work has improved the expectation-maximization algorithm with dot product to handle cryptic data, DBSCAN method with Gower metrics to ensure dummy values, Wards algorithm with Minkowski distance to improve the results of contradicting data and K-means algorithm along with Euclidean distance metrics to handle missing values in a dataset. These distance metrics have improved the data quality and also helped in providing consistent data to be loaded into a data warehouse.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"5 1","pages":"74-91"},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73515708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021070101
Amir Sakka, S. Bimonte, F. Pinet, Lucile Sautot
With the maturity of crowdsourcing systems, new analysis possibilities appear in which volunteers play a crucial role by bringing implicit knowledge drawn from practical, daily experience. At the same time, data warehouse (DW) and OLAP systems are a first-class citizen of decision-support systems: they allow huge volumes of data to be analyzed according to the multidimensional model. The more the multidimensional model reflects the decision-makers' analysis needs, the more successful the DW project is. However, when volunteers are involved in the design of DWs, existing DW design methodologies present some limitations. In this work, the authors present the main features of volunteer data warehouse (VDW) design and study the main existing DW design methodologies to find out how they can contribute to fulfilling the features needed by this particular DW approach. To provide a formal framework for classifying existing work, they also study the differences between classical DW users and volunteers. The paper also presents a set of open issues for VDW.
{"title":"Volunteer Data Warehouse: State of the Art","authors":"Amir Sakka, S. Bimonte, F. Pinet, Lucile Sautot","doi":"10.4018/IJDWM.2021070101","DOIUrl":"https://doi.org/10.4018/IJDWM.2021070101","url":null,"abstract":"With the maturity of crowdsourcing systems, new analysis possibilities appear where volunteers play a crucial role by bringing the implicit knowledge issued from practical and daily experience. At the same time, data warehouse and OLAP systems represent the first citizen of decision-support systems. They allow analyzing a huge volume of data according to the multidimensional model. The more the multidimensional model reflects the decision-makers' analysis needs, the more the DW project is successful. However, when volunteers are involved in the design of DWs, existing DW design methodologies present some limitations. In this work, the authors present the main features of volunteer data warehouse (VDW) design, and they study the main existing DW design methodology to find out how they can contribute to fulfil the features needed by this particular DW approach. To provide a formal framework to classify existing work, they provide a study of differences between classical DW users and volunteers. The paper also presents a set of open issues for VDW.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"9 1","pages":"1-21"},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87022433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010102
S. Belkacem, K. Boukhalfa
Social media are used by hundreds of millions of users worldwide. On these platforms, any user can post and share updates with individuals from their social network. Due to the large amount of data, users are overwhelmed by updates displayed chronologically in their newsfeeds, and most of these updates are irrelevant. Ranking newsfeed updates in order of relevance has been proposed to help users quickly catch up with the relevant ones. In this work, the authors first study the approaches proposed in this area according to four main criteria: features that may influence relevance, relevance prediction models, training and evaluation methods, and evaluation platforms. The authors then propose an approach that leverages an additional type of feature: the expertise of an update's author on the corresponding topics. Experimental results on Twitter highlight that judging expertise, which has not previously been considered by the academic and industrial communities, is crucial for maximizing the relevance of updates in newsfeeds.
{"title":"Ranking News Feed Updates on Social Media: A Review and Expertise-Aware Approach","authors":"S. Belkacem, K. Boukhalfa","doi":"10.4018/IJDWM.2021010102","DOIUrl":"https://doi.org/10.4018/IJDWM.2021010102","url":null,"abstract":"Social media are used by hundreds of millions of users worldwide. On these platforms, any user can post and share updates with individuals from his social network. Due to the large amount of data, users are overwhelmed by updates displayed chronologically in their newsfeed. Moreover, most of them are irrelevant. Ranking newsfeed updates in order of relevance is proposed to help users quickly catch up with the relevant updates. In this work, the authors first study approaches proposed in this area according to four main criteria: features that may influence relevance, relevance prediction models, training and evaluation methods, and evaluation platforms. Then the authors propose an approach that leverages another type of feature which is the expertise of the update's author for the corresponding topics. Experimental results on Twitter highlight that judging expertise, which has not been considered in the academic and the industrial communities, is crucial for maximizing the relevance of updates in newsfeeds.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"1 1","pages":"15-38"},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82725512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021070104
Nitesh Sukhwani, Venkateswara Rao Kagita, Vikas Kumar, S. K. Panda
Skyline recommendation with uncertain preferences has drawn AI researchers' attention in recent years due to its wide range of applications. The naive approach to skyline recommendation computes the skyline probability of all objects and ranks them accordingly. However, in many applications the interest is in determining the top-k objects rather than their full ranking. The most efficient algorithm for determining an object's skyline probability employs the concepts of a zero-contributing set and prefix-based k-level absorption. The authors show that the performance of these methods depends heavily on the arrangement of objects in the database. In this paper, the authors propose a method for determining the top-k skyline objects without computing the skyline probability of all the objects. They also propose and analyze different methods of ordering the objects in the database. Finally, they empirically show the efficacy of the proposed approaches on several synthetic and real-world data sets.
{"title":"Efficient Computation of Top-K Skyline Objects in Data Set With Uncertain Preferences","authors":"Nitesh Sukhwani, Venkateswara Rao Kagita, Vikas Kumar, S. K. Panda","doi":"10.4018/IJDWM.2021070104","DOIUrl":"https://doi.org/10.4018/IJDWM.2021070104","url":null,"abstract":"Skyline recommendation with uncertain preferences has drawn AI researchers' attention in recent years due to its wide range of applications. The naive approach of skyline recommendation computes the skyline probability of all objects and ranks them accordingly. However, in many applications, the interest is in determining top-k objects rather than their ranking. The most efficient algorithm to determine an object's skyline probability employs the concepts of zero-contributing set and prefix-based k-level absorption. The authors show that the performance of these methods highly depends on the arrangement of objects in the database. In this paper, the authors propose a method for determining top-k skyline objects without computing the skyline probability of all the objects. They also propose and analyze different methods of ordering the objects in the database. Finally, they empirically show the efficacy of the proposed approaches on several synthetic and real-world data sets.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"19 1","pages":"68-80"},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90233739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021040103
Long Giang Nguyen, Le Hoang Son, N. Tuan, T. Ngan, Nguyen Nhu Son, N. Thang
The tolerance rough set model is an effective tool for solving the attribute reduction problem directly on incomplete decision systems without pre-processing missing values. In practical applications, incomplete decision systems are often changed and updated, especially when attributes are added or removed. To solve the problem of finding a reduct on dynamic incomplete decision systems, researchers have proposed many incremental algorithms that decrease execution time. However, these incremental algorithms are mainly based on the filter approach, in which classification accuracy is calculated only after the reduct has been obtained. As a result, such filter algorithms do not achieve the best results in terms of the number of attributes in the reduct and classification accuracy. This paper proposes two distance-based filter-wrapper incremental algorithms: IFWA_AA for the case of adding attributes and IFWA_DA for the case of deleting attributes. Experimental results show that the proposed filter-wrapper incremental algorithm IFWA_AA significantly decreases the number of attributes in the reduct and improves classification accuracy compared to filter incremental algorithms such as UARA and IDRA.
{"title":"Filter-Wrapper Incremental Algorithms for Finding Reduct in Incomplete Decision Systems When Adding and Deleting an Attribute Set","authors":"Long Giang Nguyen, Le Hoang Son, N. Tuan, T. Ngan, Nguyen Nhu Son, N. Thang","doi":"10.4018/IJDWM.2021040103","DOIUrl":"https://doi.org/10.4018/IJDWM.2021040103","url":null,"abstract":"The tolerance rough set model is an effective tool to solve attribute reduction problem directly on incomplete decision systems without pre-processing missing values. In practical applications, incomplete decision systems are often changed and updated, especially in the case of adding or removing attributes. To solve the problem of finding reduct on dynamic incomplete decision systems, researchers have proposed many incremental algorithms to decrease execution time. However, the proposed incremental algorithms are mainly based on filter approach in which classification accuracy was calculated after the reduct has been obtained. As the results, these filter algorithms do not get the best result in term of the number of attributes in reduct and classification accuracy. This paper proposes two distance based filter-wrapper incremental algorithms: the algorithm IFWA_AA in case of adding attributes and the algorithm IFWA_DA in case of deleting attributes. Experimental results show that proposed filter-wrapper incremental algorithm IFWA_AA decreases significantly the number of attributes in reduct and improves classification accuracy compared to filter incremental algorithms such as UARA, IDRA.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"1 1","pages":"39-62"},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75470700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010104
I. Jacob, P. Betty, P. Darney, Hoang Viet Long, T. Tuan, Y. H. Robinson, S. Vimal, E. G. Julie
Image retrieval methods retrieve images from a database by using image features such as colour, shape, and texture. These features are used to measure the similarity between a query image and the images in the database, and the images are then sorted in order of this similarity. The article uses intra- and inter-texture chrominance and its intensity. The inter-chromatic texture feature is extracted by the local oppugnant colored texture pattern (LOCTP), the local binary pattern (LBP) provides the intra-texture information, and the histogram of oriented gradients (HoG) is used to capture shape information from the satellite images. Performance analysis on a land-cover remote sensing database, the NWPU-VHR-10 dataset, and a satellite optical land-cover database shows that the approach gives better results than previous works.
{"title":"Image Retrieval Using Intensity Gradients and Texture Chromatic Pattern: Satellite Images Retrieval","authors":"I. Jacob, P. Betty, P. Darney, Hoang Viet Long, T. Tuan, Y. H. Robinson, S. Vimal, E. G. Julie","doi":"10.4018/IJDWM.2021010104","DOIUrl":"https://doi.org/10.4018/IJDWM.2021010104","url":null,"abstract":"Methods to retrieve images involve retrieving images from the database by using features of it. They are colour, shape, and texture. These features are used to find the similarity for the query image with that of images in the database. The images are sorted in the order with this similarity. The article uses intra- and inter-texture chrominance and its intensity. Here inter-chromatic texture feature is extracted by LOCTP (local oppugnant colored texture pattern). Local binary pattern (LBP) gives the intra-texture information. Histogram of oriented gradient (HoG) is used to get the shape information from the satellite images. The performance analysis is land-cover remote sensing database, NWPU-VHR-10 dataset, and satellite optical land cover database gives better results than the previous works.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"1 1","pages":"57-73"},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83343927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010101
F. Abdelhédi, A. A. Brahim, G. Zurfluh
Big data have received a great deal of attention in recent years. Not only is the amount of data on a completely different level than before, but the data also come in different types, varying in format, structure, and source. This has definitely changed the tools needed to handle big data, giving rise to NoSQL systems. While NoSQL systems have proven their efficiency in handling big data, how to automatically store big data in NoSQL systems remains an unsolved problem. This paper proposes an automatic approach for implementing UML conceptual models in NoSQL systems, including the mapping of the associated OCL constraints to the code required for checking them. In order to demonstrate the practical applicability of the work, the approach has been realized in a tool supporting four fundamental OCL expressions: iterate-based expressions, OCL predefined operations, the If expression, and the Let expression.
{"title":"OCL Constraints Checking on NoSQL Systems Through an MDA-Based Approach","authors":"F. Abdelhédi, A. A. Brahim, G. Zurfluh","doi":"10.4018/IJDWM.2021010101","DOIUrl":"https://doi.org/10.4018/IJDWM.2021010101","url":null,"abstract":"Big data have received a great deal of attention in recent years. Not only is the amount of data on a completely different level than before, but also the authors have different type of data including factors such as format, structure, and sources. This has definitely changed the tools one needs to handle big data, giving rise to NoSQL systems. While NoSQL systems have proven their efficiency to handle big data, it's still an unsolved problem how the automatic storage of big data in NoSQL systems could be done. This paper proposes an automatic approach for implementing UML conceptual models in NoSQL systems, including the mapping of the associated OCL constraints to the code required for checking them. In order to demonstrate the practical applicability of the work, this paper has realized it in a tool supporting four fundamental OCL expressions: iterate-based expressions, OCL predefined operations, If expression, and Let expression.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"118 1","pages":"1-14"},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77388253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021040105
S. Chakraborty, Jyotika Doshi
The enterprise data warehouse stores an enormous amount of data collected from multiple sources for analytical processing and strategic decision making. The analytical processing is done using online analytical processing (OLAP) queries, where performance in terms of result retrieval time is an important factor. The major existing approaches for retrieving results from a data warehouse are multidimensional data cubes and materialized views, which incur higher storage, processing, and maintenance costs. The present study strives to achieve a simpler and faster query result retrieval approach for the data warehouse with reduced storage space and minimal maintenance cost. The execution time of frequent queries is saved in the present approach by storing their results for reuse when the query is fired the next time. The executed OLAP queries are stored, along with their results and the necessary metadata, in a relational database referred to as the materialized query database (MQDB). The tables, fields, functions, relational operators, and criteria used in an input query are matched with those of a stored query, and if they are found to be the same, the input query and the stored query are considered synonymous queries. Further, the stored query is checked for incremental updates; if no incremental updates are required, the existing stored results are fetched from MQDB. On the other hand, if the stored query requires an incremental update of its results, only the incremental result is processed from the data marts. The performance of the developed MQDB model is evaluated, and it is observed that, using MQDB, a significant reduction in query processing time is achieved compared to the major existing approaches. The developed model will be useful for organizations keeping their historical records in a data warehouse.
{"title":"An Approach for Retrieving Faster Query Results From Data Warehouse Using Synonymous Materialized Queries","authors":"S. Chakraborty, Jyotika Doshi","doi":"10.4018/IJDWM.2021040105","DOIUrl":"https://doi.org/10.4018/IJDWM.2021040105","url":null,"abstract":"The enterprise data warehouse stores an enormous amount of data collected from multiple sources for analytical processing and strategic decision making. The analytical processing is done using online analytical processing (OLAP) queries where the performance in terms of result retrieval time is an important factor. The major existing approaches for retrieving results from a data warehouse are multidimensional data cubes and materialized views that incur more storage, processing, and maintenance costs. The present study strives to achieve a simpler and faster query result retrieval approach from data warehouse with reduced storage space and minimal maintenance cost. The execution time of frequent queries is saved in the present approach by storing their results for reuse when the query is fired next time. The executed OLAP queries are stored along with the query results and necessary metadata information in a relational database is referred as materialized query database (MQDB). The tables, fields, functions, relational operators, and criteria used in the input query are matched with those of stored query, and if they are found to be same, then the input query and the stored query are considered as a synonymous query. Further, the stored query is checked for incremental updates, and if no incremental updates are required, then the existing stored results are fetched from MQDB. On the other hand, if the stored query requires an incremental update of results, then the processing of only incremental result is considered from data marts. The performance of MQDB model is evaluated by comparing with the developed novel approach, and it is observed that, using MQDB, a significant reduction in query processing time is achieved as compared to the major existing approaches. The developed model will be useful for the organizations keeping their historical records in the data warehouse.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"3 1","pages":"85-105"},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73248408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}