Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00056
Ying Liu, Wei Wang, Tianlin Zhang, Zhenyu Cui
Learning effective feature interactions behind user behavior is challenging in credit scoring. Existing machine learning methods tend to be strongly biased towards either low-order or high-order interactions, or require expert feature engineering. In this paper, we present AttentionFM, a novel neural network approach that combines Factorization Machines with an attention mechanism for credit scoring. The proposed model focuses more on critical features and emphasizes both low- and high-order feature interactions, with no need for manual feature engineering on the raw data representation. Experimental results on two public datasets demonstrate that our proposed model significantly outperforms the baselines.
AttentionFM: Incorporating Attention Mechanism and Factorization Machine for Credit Scoring. Ying Liu, Wei Wang, Tianlin Zhang, Zhenyu Cui. 2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01. https://doi.org/10.1109/ICDMW51313.2020.00056
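The abstract above does not spell out the AttentionFM architecture, so the following is only a minimal sketch of the standard second-order Factorization Machine term that such models build on, not the authors' network. The weights `w0`, `w`, `V` and the input `x` are made-up illustration values. The sketch checks that the usual linear-time reformulation of the pairwise term agrees with the naive double sum.

```python
from itertools import combinations


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def fm_pairwise_naive(x, V):
    """Sum over i < j of <V[i], V[j]> * x[i] * x[j] (quadratic in len(x))."""
    return sum(dot(V[i], V[j]) * x[i] * x[j]
               for i, j in combinations(range(len(x)), 2))


def fm_pairwise_fast(x, V):
    """Same quantity in O(n * k): 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)."""
    total = 0.0
    for f in range(len(V[0])):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        total += sum(terms) ** 2 - sum(t * t for t in terms)
    return 0.5 * total


def fm_predict(x, w0, w, V):
    """Bias + first-order (linear) term + second-order interaction term."""
    return w0 + dot(w, x) + fm_pairwise_fast(x, V)


# Hypothetical toy parameters: 3 features, 2 latent factors per feature.
x = [1.0, 0.0, 2.0]
w0, w = 0.1, [0.2, -0.3, 0.5]
V = [[1.0, 0.5], [0.3, -0.2], [0.5, 1.0]]

score = fm_predict(x, w0, w, V)
```

The latent vectors in `V` are what let the model score feature pairs that rarely co-occur in the training data, which is the "low-order interaction" side of the abstract; the attention and deep components are not shown here.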
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00119
Pierre-Antoine Laharotte, Romain Billot, Nour-Eddin El Faouzi
Can we expose the relationship between the physical dynamics of a network and its predictability? To contribute to this question, we propose a dimensionality reduction method for network state prediction based on spatiotemporal data. The method is intended for large-scale networks, where only a subset of critical links may be relevant for accurate multidimensional (MIMO) prediction performance. The algorithm is based on Latent Dirichlet Allocation (LDA) to highlight relevant topics in terms of network dynamics. The feature selection trick relies on the assumption that the most representative links of the most dominant topics are critical links for short-term prediction. The method is applied to an original application field: short-term road traffic prediction on large-scale urban networks based on GPS data. Results highlight significant reductions in dimensionality and execution time, a global improvement in prediction performance, and better resilience to non-recurrent traffic flow conditions.
Detecting Dynamic Critical Links within Large Scale Network for Traffic State Prediction. Pierre-Antoine Laharotte, Romain Billot, Nour-Eddin El Faouzi. 2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01. https://doi.org/10.1109/ICDMW51313.2020.00119
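The selection rule stated in the abstract above (keep the most representative links of the most dominant topics) can be sketched as follows. This assumes an LDA model has already been fitted elsewhere; the inputs `topic_weight` and `topic_link_prob`, the function name, and the cut-offs `n_topics` and `n_links` are all hypothetical, and the paper may rank topics and links differently.

```python
def select_critical_links(topic_weight, topic_link_prob, n_topics, n_links):
    """Return sorted indices of links that rank in the top `n_links`
    of any of the `n_topics` most dominant topics.

    topic_weight[t]       -- assumed overall weight of topic t
    topic_link_prob[t][l] -- assumed probability of link l under topic t
    """
    # Rank topics by weight and keep the most dominant ones.
    dominant = sorted(range(len(topic_weight)),
                      key=lambda t: topic_weight[t], reverse=True)[:n_topics]
    critical = set()
    for t in dominant:
        # Rank links by their probability under this topic.
        ranked = sorted(range(len(topic_link_prob[t])),
                        key=lambda l: topic_link_prob[t][l], reverse=True)
        critical.update(ranked[:n_links])
    return sorted(critical)


# Toy example: 3 topics over 4 links (values invented for illustration).
topic_weight = [0.5, 0.1, 0.4]
topic_link_prob = [
    [0.60, 0.30, 0.05, 0.05],
    [0.10, 0.10, 0.10, 0.70],
    [0.05, 0.45, 0.40, 0.10],
]
critical = select_critical_links(topic_weight, topic_link_prob, n_topics=2, n_links=2)
```

Only the selected links would then be fed to the predictor, which is where the reduction in dimensionality and execution time would come from.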
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00087
A. T. Adebisi, V. Gonuguntla, Ho-Won Lee, K. Veluvolu
Dementia-associated disorders such as vascular dementia, frontotemporal dementia and Alzheimer's dementia lead to cognitive impairment. Discriminating between dementia-associated disorders has remained a challenging task, as they have overlapping, complex underlying structures and display similar clinical features. In this work, we explore an EEG-based frequent subgraph search technique to characterize stages of brain functional networks of mild cognitive impairment (MCI), Alzheimer's disease (AD) and vascular dementia (VD) subjects in comparison with healthy control (HC) subjects. To identify the frequent subgraphs related to dementia, we first formulate the brain functional network based on the phase information of EEG, with mutual information as the measure. The whole network is then divided into sub-regions and a frequent subgraph search is performed. The identified frequent subgraphs are used to discriminate the dementia-associated disorders in data recorded from 10 healthy and 32 dementia subjects in various stages. Results show that the proposed method has the potential to quantify disease progression using brain functional connectivity, and that the identified networks can aid in the diagnosis of dementia-associated disorders.
Classification of Dementia Associated Disorders Using EEG based Frequent Subgraph Technique. A. T. Adebisi, V. Gonuguntla, Ho-Won Lee, K. Veluvolu. 2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01. https://doi.org/10.1109/ICDMW51313.2020.00087
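The abstract above builds the functional network from EEG phase with mutual information as the connectivity measure. A minimal sketch of that edge weight is given below, assuming a simple histogram (plug-in) estimator over equal-width phase bins; the binning, the estimator, and the function names are assumptions and the paper may use a different procedure.

```python
import math
from collections import Counter


def discretize_phase(phases, n_bins):
    """Map phases in [-pi, pi] to integer bin indices 0 .. n_bins - 1."""
    width = 2 * math.pi / n_bins
    return [min(int((p + math.pi) / width), n_bins - 1) for p in phases]


def mutual_information(a, b):
    """Plug-in estimate of mutual information (in bits) between two
    equal-length sequences of discrete symbols."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())


# Two channels with identical binned phase share 1 bit here;
# two channels whose bins are independent share 0 bits.
same = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Computing this value for every channel pair and thresholding it would give an adjacency matrix on which a frequent subgraph search could run.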
Advanced Metering Infrastructures (AMIs) facilitate individual load forecasting. Individual load forecasting not only improves the accuracy of aggregated load forecasting but is also a fundamental component of various power applications. With the prominence of deep learning (DL) in individual load forecasting, a serving platform specialized in deep learning is required to forecast with AMI stream data. However, existing serving platforms for DL models do not consider stream data as an input; they usually support image or text data through a RESTful API. To solve this problem, we propose StreamDL, a serving framework that provides deep learning inference on AMI stream data. It leverages Apache Kafka to support stream data and Kubernetes to support the cloud environment. StreamDL addresses the specific requirements of stream data: it supports stream parsing to fit any DL model, especially recurrent networks, and continual training to alleviate the accuracy degradation caused by changes in the stream distribution. In this paper, we introduce the details of the StreamDL platform and its use cases with real AMI data.
StreamDL: Deep Learning Serving Platform for AMI Stream Forecasting. Eunju Yang, Changha Lee, Ji-Hwan Kim, Tuan Manh Tao, Chan-Hyun Youn. 2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01. https://doi.org/10.1109/ICDMW51313.2020.00104
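The "stream parsing to fit a recurrent network" step mentioned in the abstract above can be illustrated with a sliding-window generator. This is a sketch under stated assumptions and not StreamDL's actual parser: the function name, `window_size`, and `stride` are hypothetical, and a plain Python iterable stands in for a Kafka topic.

```python
from collections import deque


def sliding_windows(stream, window_size, stride=1):
    """Yield fixed-length windows over an unbounded iterable of readings.

    Each yielded list has `window_size` consecutive readings, a shape a
    recurrent model could consume; a new window starts every `stride` readings.
    """
    buf = deque(maxlen=window_size)   # keeps only the most recent readings
    since_last = 0                    # readings seen since the last emitted window
    for reading in stream:
        buf.append(reading)
        if len(buf) < window_size:
            continue                  # still filling the first window
        if since_last == 0:
            yield list(buf)
        since_last = (since_last + 1) % stride


# Toy meter readings in place of a real AMI stream.
windows = list(sliding_windows([1, 2, 3, 4, 5], window_size=3))
strided = list(sliding_windows([1, 2, 3, 4, 5], window_size=3, stride=2))
```

Because the generator holds only `window_size` readings at a time, memory use stays constant however long the stream runs.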
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00046
J. V. D. Hoogen, Stefan Bloemheuvel, M. Atzmüller
Deep Learning (DL) provides considerable opportunities for increased efficiency and performance in fault diagnosis. The ability of DL methods to extract features automatically can reduce the need for time-intensive feature construction and prior knowledge of complex signal processing. In this paper, we propose two models built on the Wide-Kernel Deep Convolutional Neural Network (WDCNN) framework to improve the classification of fault conditions from multivariate time series data, including when training data are limited and/or noisy. In our experiments, we use the renowned benchmark dataset from the Case Western Reserve University (CWRU) bearing experiment [1] to assess our models' performance, and to investigate their usability for large-scale applications by simulating noisy industrial environments. Here, the proposed models show exceptionally good performance without any preprocessing or data augmentation and considerably outperform traditional Machine Learning applications as well as state-of-the-art DL models, even in such complex multi-class classification tasks. We show that both models also adapt well to noisy input data, which makes them suitable for condition-based maintenance contexts. Furthermore, we investigate and demonstrate the explainability and transparency of the models, which are particularly important in large-scale industrial applications.
An Improved Wide-Kernel CNN for Classifying Multivariate Signals in Fault Diagnosis. J. V. D. Hoogen, Stefan Bloemheuvel, M. Atzmüller. 2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01. https://doi.org/10.1109/ICDMW51313.2020.00046
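The abstract above mentions simulating noisy industrial environments but does not say how. One common way to do this, shown here purely as an assumption, is to add Gaussian noise scaled to a target signal-to-noise ratio (SNR) in decibels. The function names and the sine-wave stand-in for a vibration signal are hypothetical.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return `signal` plus Gaussian noise rescaled so that the realized
    SNR, 10 * log10(P_signal / P_noise), equals `snr_db`."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    target_p_noise = p_signal / (10 ** (snr_db / 10))
    scale = math.sqrt(target_p_noise / p_noise)
    return [s + scale * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to the known `clean` signal."""
    p_signal = sum(s * s for s in clean)
    p_noise = sum((y - s) ** 2 for s, y in zip(clean, noisy))
    return 10 * math.log10(p_signal / p_noise)


# A sine wave stands in for one channel of a bearing vibration signal.
clean = [math.sin(0.1 * i) for i in range(1000)]
noisy = add_noise_at_snr(clean, snr_db=6.0)
realized = measured_snr_db(clean, noisy)
```

Lower `snr_db` values mean heavier noise; sweeping it over a range gives a simple robustness curve for a trained classifier.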
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00128
Gejun Le, Qifeng Gu, Qingshan Jiang, Weiyi Lin
A supply chain involves mutually independent, mutually distrusting stakeholders and large volumes of sensitive order data. Sharing data among stakeholders is essential because it improves the efficiency of the various workflows among them. This paper proposes TrustedChain, a blockchain-based data sharing scheme for supply chains, which has two advantages: (a) trusted: we present a Trusted Environment (TE), based on blockchain, that allows mutually distrusting stakeholders to manage data collaboratively; (b) secure: we provide a secure design that first stores order forms in a Distributed Database (DDB) and then records the URI in a Contract Account (CA) of the TE. In addition, Supply-Business Contract Management (SCM) manages all CAs and Node Communication (NC) allows communication over the network. The security analysis and evaluation demonstrate the effectiveness of TrustedChain.
TrustedChain: A Blockchain-based Data Sharing Scheme for Supply Chain. Gejun Le, Qifeng Gu, Qingshan Jiang, Weiyi Lin. 2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01. https://doi.org/10.1109/ICDMW51313.2020.00128
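The store-then-record flow described in the abstract above (order form goes to the DDB, its URI goes to a Contract Account) might look roughly like the sketch below. Everything here is hypothetical: in-memory classes stand in for the database and the blockchain account, the `ddb://` URI scheme is invented, and recording a SHA-256 digest next to the URI is an added assumption for tamper detection that the abstract does not mention.

```python
import hashlib
import json


class DistributedDatabase:
    """In-memory stand-in for the off-chain Distributed Database (DDB)."""

    def __init__(self):
        self._store = {}

    def put(self, order_form):
        """Serialize and store an order form; return its URI and digest."""
        payload = json.dumps(order_form, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        uri = f"ddb://orders/{digest[:16]}"   # invented URI scheme
        self._store[uri] = payload
        return uri, digest

    def get(self, uri):
        return self._store[uri]


class ContractAccount:
    """Stand-in for an on-chain Contract Account holding only references."""

    def __init__(self):
        self.records = []

    def record(self, uri, digest):
        self.records.append({"uri": uri, "sha256": digest})


def verify(ddb, record):
    """Check that the off-chain payload still matches the recorded digest."""
    return hashlib.sha256(ddb.get(record["uri"])).hexdigest() == record["sha256"]


ddb, ca = DistributedDatabase(), ContractAccount()
uri, digest = ddb.put({"order_id": 42, "item": "bearing", "qty": 100})
ca.record(uri, digest)
```

Keeping only the reference on chain keeps sensitive order data out of the replicated ledger while still letting any stakeholder detect modification of the stored form.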
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00109
Sue Hyang Lim, S. Kim, Hyeong Min Lee, Sijun Kim, Y. Shin
Rapid charging of Li-ion batteries is vital for the commercialization of electric propulsion systems. During the fast-charging process, however, the reduction in battery capacity and the increase in temperature must be considered in real time. Most Li-ion battery chargers follow an open-loop charging profile that has been determined from prior knowledge. Such a system does not reflect the temperature change of the battery or its degree of aging. Therefore, in this study, we propose a neural network-based charging profile model that applies a closed-loop system to reflect the various states of batteries; we also show two battery-state characteristics in addition to temperature. Consequently, we show battery characteristics other than those shown in the past, such as the battery voltage and temperature trends. In addition to the design of the charging current, an improvement of approximately 22% to 50% in mean absolute error (MAE) is achieved. When the various characteristics are considered, the long short-term memory (LSTM) network performs better than the feed-forward neural network, with a 35% improvement in MAE.
Design of Neural Network-based Boost Charging for Reducing the Charging Time of Li-ion Battery. Sue Hyang Lim, S. Kim, Hyeong Min Lee, Sijun Kim, Y. Shin. 2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01. https://doi.org/10.1109/ICDMW51313.2020.00109
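The percentages in the abstract above are relative reductions in mean absolute error (MAE). The arithmetic can be sketched as follows; the target and prediction values are invented toy numbers chosen only to make the calculation easy to follow, and they are not results from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |target - prediction| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(mae_baseline, mae_model):
    """Fraction by which the model's MAE is lower than the baseline's."""
    return (mae_baseline - mae_model) / mae_baseline


# Invented charging-current targets and two sets of predictions.
target = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [1.4, 1.6, 3.4, 3.6]   # hypothetical feed-forward output
model_pred = [1.3, 1.7, 3.3, 3.7]      # hypothetical LSTM output

mae_baseline = mean_absolute_error(target, baseline_pred)
mae_model = mean_absolute_error(target, model_pred)
improvement = relative_improvement(mae_baseline, mae_model)
```

With these toy values the baseline MAE is 0.4 and the model MAE is 0.3, so the relative improvement is 25%.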
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00097
J. Wu, Qian Teng, Gautam Srivastava, Matin Pirouz, Chun-Wei Lin
In this paper, we propose a new pattern called the skyline quantity-utility pattern (SQUP), which provides better estimations in the decision-making process by considering quantity and utility together. Two algorithms, SQUM-1 and SQUM-2, are presented to mine the set of SQUPs efficiently. Two new efficient utility-max structures, one used by each algorithm, are also introduced to reduce the number of candidate itemsets. Our in-depth experimental results show that the proposed algorithms achieve good performance in terms of runtime and memory usage.
Efficient Mining of Non-Dominated High Quantity-Utility Patterns. J. Wu, Qian Teng, Gautam Srivastava, Matin Pirouz, Chun-Wei Lin. 2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01. https://doi.org/10.1109/ICDMW51313.2020.00097
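The notion of a non-dominated (skyline) pattern over quantity and utility, as named in the title and abstract above, can be illustrated with a brute-force filter. This sketch assumes the usual skyline definition (a pattern is dominated if another is at least as good in both measures and strictly better in one); it is not SQUM-1 or SQUM-2, and it omits the utility-max pruning structures. The pattern names and values are invented.

```python
def dominates(a, b):
    """True if (quantity, utility) pair `a` dominates pair `b`."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def skyline(patterns):
    """Keep patterns not dominated by any other pattern.

    patterns -- dict mapping a pattern name to its (quantity, utility) pair
    """
    return {name: qu for name, qu in patterns.items()
            if not any(dominates(other, qu) for other in patterns.values())}


# Invented itemsets with (quantity, utility) values.
patterns = {
    "A": (5, 10),
    "B": (3, 20),
    "C": (2, 8),    # dominated by A: lower quantity and lower utility
    "D": (4, 15),
}
result = skyline(patterns)
```

This filter compares every pair, which is quadratic in the number of candidates; reducing the candidate set first is what makes dedicated mining algorithms worthwhile.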
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00030
Zhen Liu, J. Tian, Lingxi Zhao, Yanling Zhang
Recommendation systems have been widely developed for numerous applications. Existing systems may still suffer from negative transfer or cold starts. These drawbacks are essentially due to overlooking domain-specific users' personal preferences or cross-domain user-item interactions. To address these problems, we propose a cross-domain recommendation algorithm built on a mapping-based attentive feature transfer (MAFT) model. Our MAFT model utilizes matrix factorization and an attention mechanism for fine-grained modeling of user preferences. Then, overlapping cross-domain user features are combined through feature fusion. Moreover, a multilayer perceptron (MLP) is built to map the obtained user features to target-domain user features. Finally, the user-item ratings can be predicted in the target domain. We carried out experiments on the large-scale MovieLens dataset as well as the real Douban Book and Douban Movie datasets. The results show that the precision of the MAFT-based method is clearly higher than those of other cross-domain recommendation methods, especially for cold-start users with few item interactions.
Attentive-Feature Transfer based on Mapping for Cross-domain Recommendation. Zhen Liu, J. Tian, Lingxi Zhao, Yanling Zhang. 2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01. https://doi.org/10.1109/ICDMW51313.2020.00030
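The mapping step in the abstract above (an MLP that maps a user's features into the target domain, followed by rating prediction) can be sketched with a one-hidden-layer network in plain Python. The layer sizes, the weights, the ReLU activation, and the dot-product rating are all assumptions for illustration; the attention and feature-fusion stages of MAFT are not shown.

```python
def affine(W, b, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


def map_user(x, W1, b1, W2, b2):
    """One-hidden-layer MLP: source-domain user vector -> target-domain vector."""
    return affine(W2, b2, relu(affine(W1, b1, x)))


def predict_rating(user_vec, item_vec):
    """Matrix-factorization style score: inner product of the latent vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Invented parameters: 2-d source features, 3 hidden units, 2-d target features.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.5]
W2 = [[1.0, 2.0, 0.0], [0.0, 1.0, 2.0]]
b2 = [0.0, 0.0]

source_user = [1.0, -1.0]
target_user = map_user(source_user, W1, b1, W2, b2)
rating = predict_rating(target_user, [2.0, 1.5])
```

In training, the MLP weights would typically be fitted on users present in both domains, so that a cold-start user in the target domain can be given a latent vector from source-domain behavior alone.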
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00064
Christian Schreckenberger, Tim Glockner, H. Stuckenschmidt, Christian Bartelt
Trapezoidal Data Streams are an emerging topic, where not only the data volume increases, but also the data dimension, i.e. new features emerge. In this paper, we address the challenges that arise from this problem by providing a novel approach to restructure and prune Hoeffding trees. We evaluate our approach on synthetic datasets, where we can show that the approach significantly improves the performance compared to the baseline of an adjusted Hoeffding tree algorithm without restructuring and pruning.
Restructuring of Hoeffding Trees for Trapezoidal Data Streams. Christian Schreckenberger, Tim Glockner, H. Stuckenschmidt, Christian Bartelt. 2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01. https://doi.org/10.1109/ICDMW51313.2020.00064
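For background on the trees named in the abstract above, a standard Hoeffding tree decides when to split a leaf using the Hoeffding bound. The sketch below shows only that classic split test; the restructuring and pruning approach proposed in the paper is not described in the abstract and is not reproduced here. The function names and the example gains are hypothetical.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1 / delta) / (2 * n)).

    With probability 1 - delta, the true mean of a variable with range
    `value_range` lies within epsilon of its mean over n observations.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


eps_1k = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
eps_4k = hoeffding_bound(value_range=1.0, delta=1e-7, n=4000)
clear_winner = should_split(0.30, 0.15, 1.0, 1e-7, 1000)
too_close = should_split(0.30, 0.25, 1.0, 1e-7, 1000)
```

The bound shrinks with the square root of the number of observations, so quadrupling `n` halves epsilon. In a trapezoidal stream, a newly emerged feature starts with few observations of its own, which is one reason trees grown earlier may need revisiting.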