An Engineering Domain Knowledge-Based Framework for Modelling Highly Incomplete Industrial Data
Pub Date: 2021-10-01 | DOI: 10.4018/ijdwm.2021100103
Han Li, Zhao Liu, P. Zhu
Missing values in industrial data restrict its applications. Although such incomplete data contains enough information for engineers to support subsequent development, it holds too many missing values for algorithms to establish precise models, because engineering domain knowledge is not considered and valuable information is not fully captured. This article therefore proposes an engineering domain knowledge-based framework for modelling highly incomplete industrial data. The raw datasets are partitioned and processed at different scales. First, hierarchical features are combined to decrease the missing ratio. To fill the missing values in the special data used to classify the samples, samples with only part of their features present are fully utilized, rather than removed, to establish local imputation models. The samples are then divided into groups to transfer information between them. A series of industrial datasets is analyzed to verify the feasibility of the proposed method.
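The local-imputation idea — fitting a model for an incomplete feature using only the samples that actually carry it, rather than discarding partially observed rows — can be sketched as follows. This is a minimal illustration under assumed column names and an assumed linear regressor, not the authors' implementation.

```python
# Minimal sketch of local imputation: fit a model per incomplete
# feature using only the samples that have that feature observed,
# instead of dropping partially observed rows. Illustrative only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def local_impute(df: pd.DataFrame, target: str, predictors: list[str]) -> pd.Series:
    """Fill missing values of `target` using rows where it is observed."""
    observed = df[df[target].notna()].dropna(subset=predictors)
    missing = df[df[target].isna()].dropna(subset=predictors)
    model = LinearRegression().fit(observed[predictors], observed[target])
    filled = df[target].copy()
    filled.loc[missing.index] = model.predict(missing[predictors])
    return filled

# Hypothetical industrial dataset with a partially observed feature.
df = pd.DataFrame({
    "thickness": [1.2, 1.5, np.nan, 1.1, np.nan],
    "width":     [30.0, 35.0, 32.0, 29.0, 33.0],
    "strength":  [400, 450, 420, 390, 430],
})
df["thickness"] = local_impute(df, "thickness", ["width", "strength"])
```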
{"title":"An Engineering Domain Knowledge-Based Framework for Modelling Highly Incomplete Industrial Data","authors":"Han Li, Zhao Liu, P. Zhu","doi":"10.4018/ijdwm.2021100103","DOIUrl":"https://doi.org/10.4018/ijdwm.2021100103","url":null,"abstract":"The missing values in industrial data restrict the applications. Although this incomplete data contains enough information for engineers to support subsequent development, there are still too many missing values for algorithms to establish precise models. This is because the engineering domain knowledge is not considered, and valuable information is not fully captured. Therefore, this article proposes an engineering domain knowledge-based framework for modelling incomplete industrial data. The raw datasets are partitioned and processed at different scales. Firstly, the hierarchical features are combined to decrease the missing ratio. In order to fill the missing values in special data, which is identified for classifying the samples, samples with only part of the features presented are fully utilized instead of being removed to establish local imputation model. Then samples are divided into different groups to transfer the information. A series of industrial data is analyzed for verifying the feasibility of the proposed method.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81853124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P2P-COVID-GAN: Classification and Segmentation of COVID-19 Lung Infections From CT Images Using GAN
Pub Date: 2021-10-01 | DOI: 10.4018/ijdwm.2021100105
R. Abirami, M. DuraiRajVincentP., S. Kadry
Early and automatic segmentation of lung infections from computed tomography (CT) images of COVID-19 patients is crucial for timely quarantine and effective treatment. However, automating the segmentation of lung infections from CT slices is challenging due to the lack of contrast between normal and infected tissues. In this work, the authors propose a CNN- and GAN-based framework named P2P-COVID-SEG that first classifies CT images as COVID-19 or normal and then segments COVID-19 lung infections from the CT images using a GAN. The proposed model outperformed existing classification models with an accuracy of 98.10%. The segmentation results also surpassed existing methods, delineating infections with accurate boundaries and achieving a Dice coefficient of 81.11%, which demonstrates state-of-the-art performance.
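The Dice coefficient quoted above is a standard overlap metric between a predicted segmentation mask and the ground truth, Dice = 2|A ∩ B| / (|A| + |B|). A minimal NumPy version, independent of the authors' code, looks like this:

```python
# Dice coefficient between two binary segmentation masks.
# Illustrative helper, not the paper's implementation.
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection) / (pred.sum() + truth.sum() + eps)

# Toy 4x4 masks: 2 of the 3 labeled pixels in each mask overlap.
pred  = np.array([[0,1,1,0],[0,1,0,0],[0,0,0,0],[0,0,0,0]])
truth = np.array([[0,1,1,0],[0,0,0,0],[0,1,0,0],[0,0,0,0]])
print(round(dice_coefficient(pred, truth), 3))  # 0.667
```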
{"title":"P2P-COVID-GAN: Classification and Segmentation of COVID-19 Lung Infections From CT Images Using GAN","authors":"R. Abirami, M. DuraiRajVincentP., S. Kadry","doi":"10.4018/ijdwm.2021100105","DOIUrl":"https://doi.org/10.4018/ijdwm.2021100105","url":null,"abstract":"Early and automatic segmentation of lung infections from computed tomography images of COVID-19 patients is crucial for timely quarantine and effective treatment. However, automating the segmentation of lung infection from CT slices is challenging due to a lack of contrast between the normal and infected tissues. A CNN and GAN-based framework are presented to classify and then segment the lung infections automatically from COVID-19 lung CT slices. In this work, the authors propose a novel method named P2P-COVID-SEG to automatically classify COVID-19 and normal CT images and then segment COVID-19 lung infections from CT images using GAN. The proposed model outperformed the existing classification models with an accuracy of 98.10%. The segmentation results outperformed existing methods and achieved infection segmentation with accurate boundaries. The Dice coefficient achieved using GAN segmentation is 81.11%. The segmentation results demonstrate that the proposed model outperforms the existing models and achieves state-of-the-art performance.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79083588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Novel Filter-Wrapper Algorithm on Intuitionistic Fuzzy Set for Attribute Reduction From Decision Tables
Pub Date: 2021-10-01 | DOI: 10.4018/ijdwm.2021100104
Thang Truong Nguyen, Long Giang Nguyen, D. T. Tran, T. T. Nguyen, Huy Quang Nguyen, Anh Viet Pham, T. D. Vu
Attribute reduction from decision tables is one of the crucial topics in data mining. The problem is NP-hard, and many approximation algorithms based on filter or filter-wrapper approaches have been designed to find reducts. The intuitionistic fuzzy set (IFS) has been regarded as an effective tool for this problem because it attaches two degrees, membership and non-membership, to each data element. Viewing attributes through these two counterparts increases classification quality and yields smaller reducts. Motivated by this, the paper proposes a new filter-wrapper algorithm based on IFS for attribute reduction from decision tables. The contributions include a new intuitionistic fuzzy distance between partitions, accompanied by theoretical analysis, and a filter-wrapper algorithm built on that distance with a new stopping condition based on the concept of delta-equality. Experiments are conducted on benchmark datasets from the UCI machine learning repository.
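The abstract does not give the paper's partition distance, but the flavor of an intuitionistic fuzzy distance can be shown with the textbook normalized Hamming distance between two IFSs, where each element carries a membership mu, a non-membership nu (mu + nu ≤ 1), and a hesitancy 1 − mu − nu. A sketch for illustration only:

```python
# Normalized Hamming distance between two intuitionistic fuzzy sets.
# This is a standard textbook IFS distance shown for illustration,
# not the partition distance introduced in the paper.
def ifs_hamming_distance(a, b):
    """a, b: lists of (mu, nu) pairs over the same universe."""
    assert len(a) == len(b)
    total = 0.0
    for (mu1, nu1), (mu2, nu2) in zip(a, b):
        pi1, pi2 = 1 - mu1 - nu1, 1 - mu2 - nu2   # hesitancy degrees
        total += abs(mu1 - mu2) + abs(nu1 - nu2) + abs(pi1 - pi2)
    return total / (2 * len(a))

A = [(0.8, 0.1), (0.4, 0.5), (0.6, 0.2)]
B = [(0.7, 0.2), (0.5, 0.4), (0.6, 0.3)]
print(ifs_hamming_distance(A, B))  # ≈ 0.1
```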
{"title":"A Novel Filter-Wrapper Algorithm on Intuitionistic Fuzzy Set for Attribute Reduction From Decision Tables","authors":"Thang Truong Nguyen, Long Giang Nguyen, D. T. Tran, T. T. Nguyen, Huy Quang Nguyen, Anh Viet Pham, T. D. Vu","doi":"10.4018/ijdwm.2021100104","DOIUrl":"https://doi.org/10.4018/ijdwm.2021100104","url":null,"abstract":"Attribute reduction from decision tables is one of the crucial topics in data mining. This problem belongs to NP-hard and many approximation algorithms based on the filter or the filter-wrapper approaches have been designed to find the reducts. Intuitionistic fuzzy set (IFS) has been regarded as the effective tool to deal with such the problem by adding two degrees, namely the membership and non-membership for each data element. The separation of attributes in the view of two counterparts as in the IFS set would increase the quality of classification and reduce the reducts. From this motivation, this paper proposes a new filter-wrapper algorithm based on the IFS for attribute reduction from decision tables. The contributions include a new instituitionistics fuzzy distance between partitions accompanied with theoretical analysis. The filter-wrapper algorithm is designed based on that distance with the new stopping condition based on the concept of delta-equality. Experiments are conducted on the benchmark UCI machine learning repository datasets.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84367662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ETL Logs Under a Pattern-Oriented Approach
Pub Date: 2021-10-01 | DOI: 10.4018/ijdwm.2021100102
Bruno Oliveira, Óscar Oliveira, O. Belo
Considering extract-transform-load (ETL) as a complex and evolving process, development teams must conscientiously and rigorously design logging strategies that extract the most value from the information gathered from events occurring throughout the ETL workflow. Efficient logging strategies must be structured so that metrics, logs, and alerts can, beyond their troubleshooting capabilities, provide insights about the system. This paper presents a configurable and flexible ETL component for creating logging mechanisms in ETL workflows. A pattern-oriented approach is followed to abstract ETL activities and enable their mapping to physical primitives that commercial ETL tools can interpret.
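One way to picture such a component is a reusable wrapper that emits structured start/finish/failure events around each abstract ETL activity. The decorator, event schema, and activity names below are assumptions for illustration, not the component described in the paper.

```python
# Minimal sketch of a reusable logging wrapper for abstract ETL
# activities, emitting structured JSON events. Illustrative only.
import json, logging, time
from functools import wraps

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("etl")

def logged_activity(pattern: str):
    """Wrap an ETL activity so start/finish/failure events are logged as JSON."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            event = {"activity": func.__name__, "pattern": pattern}
            log.info(json.dumps({**event, "status": "started"}))
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                log.error(json.dumps({**event, "status": "failed", "error": str(exc)}))
                raise
            elapsed = round(time.perf_counter() - start, 4)
            log.info(json.dumps({**event, "status": "finished", "seconds": elapsed}))
            return result
        return wrapper
    return decorator

@logged_activity(pattern="surrogate-key-lookup")
def load_dimension(rows):
    return [dict(r, sk=i) for i, r in enumerate(rows)]

load_dimension([{"name": "a"}, {"name": "b"}])
```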
{"title":"ETL Logs Under a Pattern-Oriented Approach","authors":"Bruno Oliveira, Óscar Oliveira, O. Belo","doi":"10.4018/ijdwm.2021100102","DOIUrl":"https://doi.org/10.4018/ijdwm.2021100102","url":null,"abstract":"Considering extract-transform-load (ETL) as a complex and evolutionary process, development teams must conscientiously and rigorously create log strategies for retrieving the most value of the information that can be gathered from the events that occur through the ETL workflow. Efficient logging strategies must be structured so that metrics, logs, and alerts can, beyond their troubleshooting capabilities, provide insights about the system. This paper presents a configurable and flexible ETL component for creating logging mechanisms in ETL workflows. A pattern-oriented approach is followed as a way to abstract ETL activities and enable its mapping to physical primitives that can be interpreted by ETL commercial tools.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85506565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Novel Approach Using Non-Synonymous Materialized Queries for Data Warehousing
Pub Date: 2021-07-01 | DOI: 10.4018/IJDWM.2021070102
S. Chakraborty
Data from multiple sources are loaded into an organization's data warehouse for analysis. Since some OLAP queries are fired quite frequently on the warehouse data, their execution time is reduced by storing the queries and their results in a relational database, referred to as the materialized query database (MQDB). If the tables, fields, and functions of an input query and a stored query are the same but the criteria values specified in their WHERE or HAVING clauses do not match, the queries are considered non-synonymous to each other. In the present research, the results of non-synonymous queries are generated by reusing the existing stored results after applying UNION or MINUS operations on them, which reduces the execution time of non-synonymous queries. For superset criteria values of the input query, a UNION operation is applied; for subset values, a MINUS operation is applied. Incremental processing of existing stored results, if required, is performed using data marts.
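The UNION/MINUS reuse rule can be made concrete with a hypothetical stored query: a superset of the stored criteria values UNIONs in the missing slice, a subset MINUSes out the surplus slice. Table, column, and range names below are invented; MQDB lookup and data-mart-based incremental processing are out of scope.

```python
# Illustration of the UNION/MINUS reuse rule on a hypothetical stored
# (non-aggregate) query. All names are invented for this sketch.
BASE = "SELECT order_id, amount FROM fact_sales WHERE year IN ({})"
STORED_YEARS = {2019, 2020}                      # criteria of the stored query

def rewrite(input_years: set) -> str:
    stored_sql = BASE.format(", ".join(map(str, sorted(STORED_YEARS))))
    if input_years == STORED_YEARS:              # synonymous: reuse as-is
        return stored_sql
    if input_years >= STORED_YEARS:              # superset: UNION in the missing slice
        extra = sorted(input_years - STORED_YEARS)
        return stored_sql + " UNION " + BASE.format(", ".join(map(str, extra)))
    if input_years <= STORED_YEARS:              # subset: MINUS out the surplus slice
        surplus = sorted(STORED_YEARS - input_years)
        return stored_sql + " MINUS " + BASE.format(", ".join(map(str, surplus)))
    return "-- no reuse: criteria overlap only partially"

print(rewrite({2019, 2020, 2021}))   # stored result UNION the 2021 rows
print(rewrite({2019}))               # stored result MINUS the 2020 rows
```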
{"title":"A Novel Approach Using Non-Synonymous Materialized Queries for Data Warehousing","authors":"S. Chakraborty","doi":"10.4018/IJDWM.2021070102","DOIUrl":"https://doi.org/10.4018/IJDWM.2021070102","url":null,"abstract":"Data from multiple sources are loaded into the organization data warehouse for analysis. Since some OLAP queries are quite frequently fired on the warehouse data, their execution time is reduced by storing the queries and results in a relational database, referred as materialized query database (MQDB). If the tables, fields, functions, and criteria of input query and stored query are the same but the query criteria specified in WHERE or HAVING clause do not match, then they are considered non-synonymous to each other. In the present research, the results of non-synonymous queries are generated by reusing the existing stored results after applying UNION or MINUS operations on them. This will reduce the execution time of non-synonymous queries. For superset criteria values of input query, UNION operation is applied, and for subset values, MINUS operation is applied. Incremental result processing of existing stored results, if required, is performed using Data Marts.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74450217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable Biclustering Algorithm Considers the Presence or Absence of Properties
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010103
Abdélilah Balamane
Most existing biclustering algorithms take into account only the properties that hold for a set of objects. However, in several application domains, such as organized crime, genetics, or digital marketing, it can be beneficial to identify homogeneous groups of similar objects in terms of both the presence and the absence of attributes. In this paper, the author proposes a scalable and efficient biclustering algorithm that exploits a binary matrix to produce at least three types of biclusters, in which the columns are (1) filled with 1's, (2) filled with 0's, or (3) a mixture of columns filled with 1's and columns filled with 0's. The procedure is scalable and runs without having to consider the complement of the initial binary context. The implementation and validation of the method on datasets illustrate its potential for discovering relevant patterns.
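A small check makes the three bicluster types concrete: within a candidate block of a binary matrix, every column must be constant (all 1's or all 0's), and the mix of constants determines the type. This sketch is illustrative only, not the scalable algorithm from the paper.

```python
# Toy classification of a candidate (rows, cols) block of a binary
# matrix into the three bicluster types. Illustrative only.
import numpy as np

def bicluster_type(M: np.ndarray, rows: list, cols: list) -> str:
    block = M[np.ix_(rows, cols)]
    col_all_ones = (block == 1).all(axis=0)
    col_all_zeros = (block == 0).all(axis=0)
    if not (col_all_ones | col_all_zeros).all():
        return "not a bicluster"
    if col_all_ones.all():
        return "type 1: all columns of 1's (shared presence)"
    if col_all_zeros.all():
        return "type 2: all columns of 0's (shared absence)"
    return "type 3: mixed columns of 1's and 0's"

M = np.array([[1, 0, 1],
              [1, 0, 0],
              [1, 0, 1]])
print(bicluster_type(M, rows=[0, 1, 2], cols=[0, 1]))  # type 3
```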
{"title":"Scalable Biclustering Algorithm Considers the Presence or Absence of Properties","authors":"Abdélilah Balamane","doi":"10.4018/IJDWM.2021010103","DOIUrl":"https://doi.org/10.4018/IJDWM.2021010103","url":null,"abstract":"Most existing biclustering algorithms take into account the properties that hold for a set of objects. However, it could be beneficial in several application domains such as organized crimes, genetics, or digital marketing to identify homogeneous groups of similar objects in terms of both the presence and the absence of attributes. In this paper, the author proposes a scalable and efficient algorithm of biclustering that exploits a binary matrix to produce at least three types of biclusters where the cell's column (1) are filled with 1's, (2) are filled with 0's, and (3) some columns filled with 1's and/or with 0's. This procedure is scalable and it's executed without having to consider the complementary of the initial binary context. The implementation and validation of the method on data sets illustrates its potential in the discovery of relevant patterns.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75477856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing Data Quality at ETL Stage of Data Warehousing
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010105
Neha Gupta, Sakshi Jolly
Data usually come into data warehouses from multiple sources in different formats and are commonly categorized into three groups: structured, semi-structured, and unstructured. Various data mining technologies are used to collect, refine, and analyze the data, which raises the problem of data quality management. Data purgation occurs when the data are subjected to the ETL methodology in order to maintain and improve data quality. The data may contain unnecessary information and inappropriate symbols, which can manifest as dummy values, cryptic values, or missing values. The present work improves the expectation-maximization algorithm with the dot product to handle cryptic data, the DBSCAN method with Gower metrics to handle dummy values, Ward's algorithm with the Minkowski distance to improve the results on contradictory data, and the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics have improved data quality and helped provide consistent data to be loaded into the data warehouse.
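As a hedged sketch of the K-means-based handling of missing values: cluster the records on their complete columns, then fill each gap with the mean of the same column among cluster-mates. This is a simple stand-in for the idea, not the paper's exact procedure.

```python
# Sketch of K-means-based missing-value handling: cluster on complete
# columns, then fill each missing entry with its cluster's column mean.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0],
              [1.2, np.nan],
              [8.0, 9.0],
              [8.3, 8.8]])

complete_cols = ~np.isnan(X).any(axis=0)           # columns with no NaNs
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[:, complete_cols])

for j in np.where(~complete_cols)[0]:              # each column with gaps
    for i in np.where(np.isnan(X[:, j]))[0]:       # each missing cell
        same_cluster = (labels == labels[i]) & ~np.isnan(X[:, j])
        X[i, j] = X[same_cluster, j].mean()

print(X)  # the NaN at row 1 becomes 2.0, the column mean within its cluster
```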
{"title":"Enhancing Data Quality at ETL Stage of Data Warehousing","authors":"Neha Gupta, Sakshi Jolly","doi":"10.4018/IJDWM.2021010105","DOIUrl":"https://doi.org/10.4018/IJDWM.2021010105","url":null,"abstract":"Data usually comes into data warehouses from multiple sources having different formats and are specifically categorized into three groups (i.e., structured, semi-structured, and unstructured). Various data mining technologies are used to collect, refine, and analyze the data which further leads to the problem of data quality management. Data purgation occurs when the data is subject to ETL methodology in order to maintain and improve the data quality. The data may contain unnecessary information and may have inappropriate symbols which can be defined as dummy values, cryptic values, or missing values. The present work has improved the expectation-maximization algorithm with dot product to handle cryptic data, DBSCAN method with Gower metrics to ensure dummy values, Wards algorithm with Minkowski distance to improve the results of contradicting data and K-means algorithm along with Euclidean distance metrics to handle missing values in a dataset. These distance metrics have improved the data quality and also helped in providing consistent data to be loaded into a data warehouse.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73515708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development of a Framework for Preserving the Disease-Evidence-Information to Support Efficient Disease Diagnosis
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021040104
V. Rajinikanth, S. Kadry
In the medical domain, detecting acute diseases from medical data plays a vital role in identifying the nature, cause, and severity of a disease with suitable accuracy; this information supports the doctor during decision making and treatment planning. This research aims to develop a framework for preserving disease-evidence information (DEvI) to support the automated disease detection process. The phases of DEvI include (1) data collection, (2) data pre- and post-processing, (3) disease information mining, and (4) implementation of a deep neural network (DNN) architecture to detect the disease. To demonstrate the proposed framework, an assessment of lung nodules (LN) is presented, and the attained results confirm that the framework helps achieve better segmentation as well as classification results. The technique is clinically significant and helps reduce the doctor's diagnostic burden during malignant LN detection.
{"title":"Development of a Framework for Preserving the Disease-Evidence-Information to Support Efficient Disease Diagnosis","authors":"V. Rajinikanth, S. Kadry","doi":"10.4018/IJDWM.2021040104","DOIUrl":"https://doi.org/10.4018/IJDWM.2021040104","url":null,"abstract":"In medical domain, the detection of the acute diseases based on the medical data plays a vital role in identifying the nature, cause, and the severity of the disease with suitable accuracy; this information supports the doctor during the decision making and treatment planning procedures. The research aims to develop a framework for preserving the disease-evidence-information (DEvI) to support the automated disease detection process. Various phases of DEvI include (1) data collection, (2) data pre- and post-processing, (3) disease information mining, and (4) implementation of a deep-neural-network (DNN) architecture to detect the disease. To demonstrate the proposed framework, assessment of lung nodule (LN) is presented, and the attained result confirms that this framework helps to attain better segmentation as well as classification result. This technique is clinically significant and helps to reduce the diagnostic burden of the doctor during the malignant LN detection.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81417632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Volunteer Data Warehouse: State of the Art
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021070101
Amir Sakka, S. Bimonte, F. Pinet, Lucile Sautot
With the maturity of crowdsourcing systems, new analysis possibilities appear in which volunteers play a crucial role by bringing implicit knowledge drawn from practical, daily experience. At the same time, data warehouse (DW) and OLAP systems are first-class citizens among decision-support systems: they allow analyzing huge volumes of data according to the multidimensional model, and the more the multidimensional model reflects the decision-makers' analysis needs, the more successful the DW project is. However, when volunteers are involved in the design of DWs, existing DW design methodologies present some limitations. In this work, the authors present the main features of volunteer data warehouse (VDW) design and study the main existing DW design methodologies to find out how they can contribute to fulfilling the features needed by this particular DW approach. To provide a formal framework for classifying existing work, they study the differences between classical DW users and volunteers. The paper also presents a set of open issues for VDW.
{"title":"Volunteer Data Warehouse: State of the Art","authors":"Amir Sakka, S. Bimonte, F. Pinet, Lucile Sautot","doi":"10.4018/IJDWM.2021070101","DOIUrl":"https://doi.org/10.4018/IJDWM.2021070101","url":null,"abstract":"With the maturity of crowdsourcing systems, new analysis possibilities appear where volunteers play a crucial role by bringing the implicit knowledge issued from practical and daily experience. At the same time, data warehouse and OLAP systems represent the first citizen of decision-support systems. They allow analyzing a huge volume of data according to the multidimensional model. The more the multidimensional model reflects the decision-makers' analysis needs, the more the DW project is successful. However, when volunteers are involved in the design of DWs, existing DW design methodologies present some limitations. In this work, the authors present the main features of volunteer data warehouse (VDW) design, and they study the main existing DW design methodology to find out how they can contribute to fulfil the features needed by this particular DW approach. To provide a formal framework to classify existing work, they provide a study of differences between classical DW users and volunteers. The paper also presents a set of open issues for VDW.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87022433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ranking News Feed Updates on Social Media: A Review and Expertise-Aware Approach
Pub Date: 2021-01-01 | DOI: 10.4018/IJDWM.2021010102
S. Belkacem, K. Boukhalfa
Social media platforms are used by hundreds of millions of users worldwide. On these platforms, any user can post and share updates with individuals from their social network. Due to the large amount of data, users are overwhelmed by the updates displayed chronologically in their newsfeeds, most of which are irrelevant. Ranking newsfeed updates in order of relevance has been proposed to help users quickly catch up with the relevant ones. In this work, the authors first survey the approaches proposed in this area according to four main criteria: features that may influence relevance, relevance prediction models, training and evaluation methods, and evaluation platforms. They then propose an approach that leverages an additional type of feature: the expertise of an update's author on the corresponding topics. Experimental results on Twitter highlight that judging expertise, which had not been considered in the academic or industrial communities, is crucial for maximizing the relevance of updates in newsfeeds.
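To make the expertise-aware idea concrete, here is a toy linear relevance score over a few classic newsfeed features plus an author-expertise term. The features and weights are invented for illustration; the paper's actual prediction model is not specified in the abstract.

```python
# Toy expertise-aware relevance score: a linear model over invented
# newsfeed features plus an author-expertise term. Illustrative only.
from dataclasses import dataclass

@dataclass
class Update:
    tie_strength: float        # proxy for the social tie with the author
    recency: float             # 1.0 = just posted, decays toward 0
    engagement: float          # normalized likes/retweets
    author_expertise: float    # author's expertise on the update's topics

WEIGHTS = {"tie_strength": 0.2, "recency": 0.3,
           "engagement": 0.2, "author_expertise": 0.3}

def relevance(u: Update) -> float:
    return sum(w * getattr(u, name) for name, w in WEIGHTS.items())

feed = [Update(0.1, 0.9, 0.2, 0.1), Update(0.3, 0.5, 0.4, 0.9)]
for u in sorted(feed, key=relevance, reverse=True):
    print(round(relevance(u), 2), u)   # the expert's update ranks first
```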
{"title":"Ranking News Feed Updates on Social Media: A Review and Expertise-Aware Approach","authors":"S. Belkacem, K. Boukhalfa","doi":"10.4018/IJDWM.2021010102","DOIUrl":"https://doi.org/10.4018/IJDWM.2021010102","url":null,"abstract":"Social media are used by hundreds of millions of users worldwide. On these platforms, any user can post and share updates with individuals from his social network. Due to the large amount of data, users are overwhelmed by updates displayed chronologically in their newsfeed. Moreover, most of them are irrelevant. Ranking newsfeed updates in order of relevance is proposed to help users quickly catch up with the relevant updates. In this work, the authors first study approaches proposed in this area according to four main criteria: features that may influence relevance, relevance prediction models, training and evaluation methods, and evaluation platforms. Then the authors propose an approach that leverages another type of feature which is the expertise of the update's author for the corresponding topics. Experimental results on Twitter highlight that judging expertise, which has not been considered in the academic and the industrial communities, is crucial for maximizing the relevance of updates in newsfeeds.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82725512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}