Pub Date: 2020-11-01, DOI: 10.1109/ICDMW51313.2020.00095
M. Nouioua, Ying Wang, Philippe Fournier-Viger, Jerry Chun‐wei Lin, J. Wu
High utility itemset mining is a well-studied data mining task for analyzing customer transactions. The goal is to find all high utility itemsets, that is, sets of items purchased together that generate a profit equal to or greater than a user-defined minimum utility threshold. However, a limitation of traditional high utility itemset mining algorithms is that item categories (e.g., drinks, dairy products) are ignored. Recently, two algorithms were designed to find multi-level and cross-level high utility itemsets to reveal relationships between items and/or categories of items. This is achieved by considering a product taxonomy, where items are organized into a hierarchy. Though these algorithms can reveal interesting patterns, setting the minimum utility threshold is not intuitive, and the threshold greatly influences the number of patterns found and the algorithms' performance. If the user sets the threshold too low, a huge number of patterns is found and runtimes can be very long, while if the threshold is set too high, few patterns are found. Hence, a user often has to run an algorithm numerous times to find a threshold value that yields just enough patterns. This paper addresses this issue by presenting a novel algorithm called TKC (Top-K Cross-level high utility itemset miner), which lets the user directly set the number of patterns $k$ to be discovered. TKC performs a depth-first search and includes search space pruning techniques and an optimization to enhance its performance. Experiments were done on retail data with taxonomy information. Results indicate that the algorithm is efficient and that the optimization improves its performance.
Title: TKC: Mining Top-K Cross-Level High Utility Itemsets
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
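The top-k idea in the TKC abstract can be illustrated with a small sketch. This is not the TKC algorithm: it replaces the depth-first search and pruning with an exhaustive scan, and the transactions, profits, and taxonomy are hypothetical toy data. It only shows how a min-heap of the k best patterns lets an internal minimum utility threshold rise automatically, assuming that the utility of a category in a transaction is the summed profit of its purchased descendants.

```python
import heapq
from itertools import combinations

# Hypothetical toy data: each transaction maps a purchased item to its profit.
TRANSACTIONS = [
    {"cola": 3, "milk": 2},
    {"cola": 3, "cheese": 5},
    {"juice": 4, "milk": 2, "cheese": 5},
]
# Hypothetical taxonomy: item -> parent category.
PARENT = {"cola": "drinks", "juice": "drinks", "milk": "dairy", "cheese": "dairy"}


def ancestors(item):
    """Return the item together with all of its ancestors in the taxonomy."""
    result = {item}
    while item in PARENT:
        item = PARENT[item]
        result.add(item)
    return result


def utility(itemset, transaction):
    """Utility of a (possibly generalized) itemset in one transaction, 0 if absent."""
    total = 0
    for element in itemset:
        # Profits of purchased items that equal the element or descend from it.
        covered = [u for item, u in transaction.items() if element in ancestors(item)]
        if not covered:
            return 0  # the element does not occur in this transaction
        total += sum(covered)
    return total


def top_k_itemsets(k, max_size=2):
    """Exhaustively score itemsets, keeping the k best in a min-heap."""
    symbols = sorted(set(PARENT) | set(PARENT.values()))
    heap = []      # min-heap of (utility, itemset); heap[0] is the weakest kept pattern
    min_util = 0   # internal threshold, raised once k patterns are held
    for size in range(1, max_size + 1):
        for itemset in combinations(symbols, size):
            # Skip itemsets that mix an item with one of its own ancestors.
            if any(a != b and a in ancestors(b) for a in itemset for b in itemset):
                continue
            total = sum(utility(itemset, t) for t in TRANSACTIONS)
            if len(heap) < k:
                heapq.heappush(heap, (total, itemset))
            elif total > min_util:
                heapq.heapreplace(heap, (total, itemset))
            if len(heap) == k:
                min_util = heap[0][0]
    return sorted(heap, reverse=True), min_util


top, min_util = top_k_itemsets(3)
```

With k = 3 the user never supplies a threshold; the threshold ends at the utility of the third-best pattern. A real miner would use that rising value to prune the search space rather than score every candidate.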
Pub Date: 2020-11-01, DOI: 10.1109/ICDMW51313.2020.00089
Krzysztof Mnich, A. Polewko-Klim, A. Golinska, W. Lesiński, W. Rudnicki
The super learner algorithm was created to combine the results of multiple base learners using cross validation. However, in many cases it does not significantly outperform a simple average of the base results. We propose applying multiple repeats of cross validation to improve the performance of super learning. Two approaches to applying repeated cross validation were tested on artificial data sets and on real-life biomedical data sets. One of the approaches, the MEAN OUTPUT strategy, significantly improved the results. To reduce the computational complexity of the algorithm, we suggest using 3-fold rather than the previously recommended 10-fold cross validation. The tests showed that this simplification does not affect the super learning results.
Title: Super Learning with Repeated Cross Validation
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
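The abstract does not define the MEAN OUTPUT strategy, so the following is only one plausible reading, offered as a sketch: each base learner's out-of-fold predictions are averaged over several repeats of 3-fold cross validation before the combining weights are fitted. The two base learners, the grid search over a single convex weight, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean


def repeated_oof_predictions(xs, ys, learners, folds=3, repeats=5, seed=0):
    """Average each learner's out-of-fold predictions over several CV repeats."""
    rng = random.Random(seed)
    n = len(xs)
    sums = [[0.0] * n for _ in learners]
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)  # a fresh random partition for every repeat
        for f in range(folds):
            test_idx = order[f::folds]
            held_out = set(test_idx)
            train = [(xs[i], ys[i]) for i in range(n) if i not in held_out]
            for j, fit in enumerate(learners):
                predict = fit(train)
                for i in test_idx:
                    sums[j][i] += predict(xs[i])
    return [[s / repeats for s in row] for row in sums]


def fit_mean(train):
    """Hypothetical base learner 1: always predict the training mean."""
    m = mean(y for _, y in train)
    return lambda x: m


def fit_nearest(train):
    """Hypothetical base learner 2: copy the target of the nearest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def best_weight(oof, ys, steps=100):
    """Grid-search the convex weight w minimising squared error of w*a + (1-w)*b."""
    a, b = oof

    def loss(w):
        return sum((w * p + (1 - w) * q - y) ** 2 for p, q, y in zip(a, b, ys))

    return min((i / steps for i in range(steps + 1)), key=loss)


# Toy regression problem (assumed data): y = 2x on twelve points.
xs = [float(i) for i in range(12)]
ys = [2.0 * x for x in xs]
oof = repeated_oof_predictions(xs, ys, [fit_mean, fit_nearest])
w = best_weight(oof, ys)  # weight given to the mean predictor
```

Averaging over repeats reduces the dependence of the out-of-fold predictions on any single random partition, which is the motivation the abstract gives for repeating cross validation.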
Pub Date: 2020-11-01, DOI: 10.1109/ICDMW51313.2020.00079
Yefan Zhou, Zhao Lv, Chaoqun Wang, Shengli Zhang
The number of deaths caused by traffic accidents is increasing every year, and improper driving behaviors account for a large proportion of these accidents. To alert drivers to such behaviors, we design a light and fast neural network (LFNN). On this basis, we construct a convolutional two-stream interactive network framework. One stream acquires the spatial information of hand appearance; the other obtains the temporal information of hand movement. The features generated by the two streams are fused and classified through a short, interactive connection network. Our network has been tested on the CVRR-HANDS 3D data set. Its accuracy reaches up to 96.5%, an obvious improvement over the state of the art.
Title: A Two-Stream Network For Driving Hand Gesture Recognition
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
Pub Date: 2020-11-01, DOI: 10.1109/ICDMW51313.2020.00063
Ricardo Pereira, Bruno Casal Laraña, Nádia Soares, M. Araujo
When working with real-world temporal data, it is common to encounter features whose distribution changes over time. Naively applying Machine Learning models to such unstable data might lead to rapidly degrading performance, especially if the new distribution differs greatly from what was seen during training. To cope with this problem, it is critical to automatically identify features that change over time. With these features detected, data scientists and other practitioners can mitigate the issue (for instance, by applying data transformations) and deploy more robust models that retain high performance for longer periods of time. In this paper, we describe which temporal changes a feature should not suffer from, and propose TEDD, a technique to a) identify when a dataset might lead to an unstable Machine Learning model and b) automatically detect which features cause such lack of robustness. To achieve this, we leverage a regression model to highlight which features contribute to a good prediction of an instance's timestamp. We compare our approach to other methods on real and synthetic data, testing their detection capability on all simple change patterns. We show that our method: detects all types of basic changes, for both numerical and categorical features; can detect multivariate drifts; returns a comparable value measuring the amount of change of each feature; requires no parameter tuning; and scales with both the number of features and the number of instances in the dataset.
Title: TEDD: Robust Detection of Unstable Temporal Features
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
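The core idea in the TEDD abstract, predicting an instance's timestamp from its features and treating predictive features as unstable, can be sketched as follows. The abstract does not say which regression model TEDD uses; this sketch substitutes a one-variable least-squares fit per feature and uses its R² as the score, which is an assumption and only captures linear drifts of the mean. The feature names and values are hypothetical.

```python
from statistics import mean


def r_squared(feature, timestamps):
    """R^2 of a one-variable least-squares fit predicting timestamp from feature."""
    fx, ft = mean(feature), mean(timestamps)
    sxx = sum((x - fx) ** 2 for x in feature)
    stt = sum((t - ft) ** 2 for t in timestamps)
    if sxx == 0 or stt == 0:
        return 0.0  # a constant column carries no information about time
    sxt = sum((x - fx) * (t - ft) for x, t in zip(feature, timestamps))
    return sxt * sxt / (sxx * stt)


def rank_unstable_features(columns, timestamps):
    """Sort features by how well each one alone predicts the timestamp."""
    scores = {name: r_squared(values, timestamps) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: one feature oscillates around 0, the other also drifts upward.
timestamps = list(range(10))
noise = [1 if t % 2 == 0 else -1 for t in timestamps]
columns = {
    "stable": noise,
    "drifting": [0.5 * t + e for t, e in zip(timestamps, noise)],
}
ranking = rank_unstable_features(columns, timestamps)
```

A feature that says nothing about when a row was recorded scores near zero, while a drifting feature scores high, giving a comparable per-feature value in the spirit of what the abstract describes.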
Pub Date: 2020-11-01, DOI: 10.1109/ICDMW51313.2020.00039
Rui Ye, Qing Zhang, Hengliang Luo
Recent advancements in Graph Neural Networks (GNN) have achieved promising results for session-based recommendation, which aims to predict a user's actions based on anonymous sessions. However, existing graph-structured recommendation methods focus only on the internals of a session and neglect cross-session effects, which contain valuable complementary information for learning the user's taste in the current session more accurately. Meanwhile, the graph structure lacks sequential position information, so different sequential sessions can be constructed as the same graph, which inevitably limits its capacity to obtain an accurate session representation vector. To address these limitations, we propose the Cross-session Aware Temporal Convolutional Network (CA-TCN) model. For the cross-session aware aspect, CA-TCN builds a global-item graph and a session-context graph to model cross-session influence on both items and sessions. The global-item graph explores the global cross-session influence on items by building relevant item connections among all sessions. The session-context graph explores the complex cross-session influence on sessions by connecting the current session to other sessions with similar user intents and behavioral patterns. We connect items and sessions with a hierarchical item-level and session-level attention mechanism. In addition, compared with the GNN, the TCN can perform convolution over multi-hop items and maintain sequence information during convolution. Extensive experiments on two real-world datasets show that our method consistently outperforms state-of-the-art methods.
Title: Cross-Session Aware Temporal Convolutional Network for Session-based Recommendation
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
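The limitation noted in the CA-TCN abstract, that different sequential sessions can be constructed as the same graph, is easy to reproduce. The sketch below assumes the simplest construction, an unweighted directed graph with an edge from each item to the item clicked next; the session contents are hypothetical.

```python
def session_graph(session):
    """Directed edge set linking each item to the item clicked immediately after it."""
    return {(a, b) for a, b in zip(session, session[1:])}


# Two different click sequences (hypothetical item ids).
first = ["a", "b", "a", "c"]
second = ["b", "a", "b", "a", "c"]

# Both sessions yield the edges a->b, b->a and a->c, so the graphs are identical.
same_graph = session_graph(first) == session_graph(second)
```

Here `same_graph` is true even though the sessions differ in length and starting item, so a model that sees only this graph cannot tell them apart.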
Pub Date: 2020-11-01, DOI: 10.1109/ICDMW51313.2020.00093
Minh-Son Dao, Ngoc-Thanh Nguyen, R. U. Kiran, K. Zettsu
Monitoring traffic congestion in smart cities is a challenging problem of great importance in Intelligent Transportation Systems (ITS). Most previous works focused on developing machine learning models that can predict traffic congestion on a meshcode (i.e., a portion of the earth's surface) at a particular time instance. The key limitation of these studies is that they fail to provide holistic information regarding the sets of meshcodes in which regular congestion may happen in the forecasted data. This paper proposes a novel framework to address this problem. The proposed framework employs a 3DCNN multi-source deep learning model (hereafter called Fusion-3DCNN) to predict traffic congestion on a particular meshcode at a particular time instance. The predicted traffic congestion data is then transformed into a temporal database and fed to a periodic-frequent pattern mining algorithm to identify the sets of meshcodes in which regular congestion may happen in the predicted data. Experimental results on real-world traffic congestion data demonstrate that the proposed framework is efficient.
Title: Insights From Urban Sensing Data: From Chaos to Predicted Congestion Patterns
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
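The second stage of the framework, mining sets of meshcodes that are congested regularly, can be sketched with the usual periodic-frequent criteria: a pattern qualifies if it occurs at least `min_sup` times and the largest gap between consecutive occurrences (including the gaps to the start and end of the database) is at most `max_per`. The abstract does not name the specific algorithm, so this is a brute-force illustration; the meshcode ids, the thresholds, and the assumption that timestamps start at 1 are all hypothetical.

```python
from itertools import combinations

# Hypothetical temporal database: (timestamp, meshcodes predicted as congested).
DATABASE = [
    (1, {"m1", "m2"}),
    (2, {"m3"}),
    (3, {"m1", "m2"}),
    (4, {"m1"}),
    (5, {"m1", "m2", "m3"}),
    (6, {"m2"}),
]


def periodic_frequent(database, min_sup, max_per, max_size=2):
    """Return {pattern: (support, periodicity)} for patterns meeting both limits."""
    last_ts = max(ts for ts, _ in database)
    items = sorted(set().union(*(codes for _, codes in database)))
    found = {}
    for size in range(1, max_size + 1):
        for pattern in combinations(items, size):
            hits = sorted(ts for ts, codes in database if set(pattern) <= codes)
            if len(hits) < min_sup:
                continue  # not frequent enough
            # Gaps between consecutive occurrences, measured from time 0
            # to the first hit and from the last hit to the final timestamp.
            points = [0] + hits + [last_ts]
            period = max(b - a for a, b in zip(points, points[1:]))
            if period <= max_per:
                found[pattern] = (len(hits), period)
    return found


patterns = periodic_frequent(DATABASE, min_sup=3, max_per=2)
```

In this toy database `m1` and `m2` are congested together at regular intervals, while `m3` appears too rarely to qualify.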
Pub Date: 2020-11-01, DOI: 10.1109/ICDMW51313.2020.00062
Shalini Pandey, Andrew S. Lan, G. Karypis, J. Srivastava
In recent years, Massive Open Online Courses (MOOCs) have witnessed immense growth in popularity. Now, due to the recent COVID-19 pandemic, it is important to push the limits of online education. Discussion forums are the primary means of interaction among learners and instructors. However, with growing class sizes, students face the challenge of finding useful and informative discussion forums. This problem can be solved by matching the interests of students with thread contents. The fundamental challenge is that student interests drift as they progress through the course, and forum contents evolve as students or instructors update them. In this paper, we propose to predict the future interest trajectories of students. Our model consists of two key operations: 1) an update operation and 2) a projection operation. The update operation models the inter-dependency between the evolution of the student and the thread using coupled Recurrent Neural Networks when the student posts on the thread. The projection operation learns to estimate the future embeddings of students and threads. For students, the projection operation learns the drift in their interests caused by the change in the course topic they study. The projection operation for threads exploits how different posts induce varying interest levels in a student according to the thread structure. Extensive experimentation on three real-world MOOC datasets shows that our model significantly outperforms other baselines for thread recommendation.
Title: Learning Student Interest Trajectory for MOOC Thread Recommendation
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
Pub Date: 2020-11-01, DOI: 10.1109/ICDMW51313.2020.00108
Nyoungwoo Lee, Jehyun Nam, Ho‐Jin Choi
Power supply enterprises need to accurately detect abnormal power consumption cases in order to predict power demand. Since actual abnormal power consumption patterns are irregular, a flexible model should be designed to address this situation. Thus, we inspect abnormal power consumption data and predict potential abnormal patterns. Based on these insights, the goal of this work is to generate data following the identified abnormal patterns and to design a flexible model that can detect the generated abnormal data. The final anomaly detection model achieved 74% and 72% accuracy on the original abnormal and normal data, respectively, and on randomly generated abnormal data it achieved 95.07% accuracy for the growth type and 89.69% accuracy for the reduction type. We suggest a set of ways to identify potential abnormal data and to design flexible models that address them.
Title: Anomaly Detection and Visualization for Electricity Consumption Data
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
Pub Date: 2020-11-01, DOI: 10.1109/ICDMW51313.2020.00103
Changha Lee, Seonghwan Kim, Chan-Hyun Youn
With the development of the smart grid and its Advanced Metering Infrastructure (AMI), which consists of network infrastructure, smart meters, and a data management system, the smart grid system can analyze energy data to efficiently control energy generation and distribution. With recent advances in neural-network-based analysis, some deep neural networks have proven to perform better than conventional analytical techniques. However, the basic learning process faces challenges in analyzing time-series data from smart meters with deep learning in real time. Although strategies for gradually training a deep neural network through continual learning have been proposed, they are only effective when the data features do not change significantly; performance improvements are therefore still needed in environments where the data distribution fluctuates with different power consumption habits. We therefore propose scheduled continual deep learning on an edge-cloud system to improve and accelerate learning on multi-client power consumption data, whose biased data features vary dramatically. Using the cosine similarity of electric load patterns, the scheduling algorithm manages and controls the gradients from the optimization process. Experimental evaluation shows the validity of the proposed scheme compared to the base method.
Title: An Accelerated Continual Learning with Demand Prediction based Scheduling in Edge-Cloud Computing
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
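The cosine similarity of two load patterns, which the abstract names as the signal driving the scheduler, is straightforward to compute. How the scheduler then uses it is not described in the abstract, so `scale_gradient` below is a purely hypothetical rule (damping updates from clients whose load pattern differs from a reference) and the load vectors are made-up examples.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long load-pattern vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero pattern has no direction
    return dot / (norm_a * norm_b)


def scale_gradient(gradient, similarity):
    """Hypothetical scheduling rule: weight a client's gradient by its similarity."""
    weight = max(0.0, similarity)
    return [weight * g for g in gradient]


# Made-up load patterns: same shape at twice the scale, and a single early peak.
reference = [1.0, 2.0, 3.0, 2.0]
sim_similar = cosine_similarity(reference, [2.0, 4.0, 6.0, 4.0])
sim_different = cosine_similarity(reference, [3.0, 0.0, 0.0, 0.0])
```

Because cosine similarity ignores magnitude, two households with the same daily shape but different total consumption count as similar, which suits comparing consumption habits rather than consumption volume.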
Pub Date: 2020-11-01, DOI: 10.1109/ICDMW51313.2020.00126
Giacomo Ziffer, Alessio Bernardo, Emanuele Della Valle, A. Bifet
In recent years, an increasing number of applications, IoT sensors, and websites have produced endless streams of data. These data streams are not only unbounded, but their characteristics also change dynamically over time, a phenomenon called concept drift. Standard machine learning models do not work properly in this context, and new techniques have been developed to tackle these challenges. In this paper, we present KalmanNB, a new Naïve Bayes algorithm that exploits the Kalman filter to manage concept drift automatically. Furthermore, we investigate when this new approach, which directly follows the values of the data's attributes, is better than the standard strategy, which monitors the performance of the model in order to detect a drift. Extensive experiments on both artificial and real datasets with concept drifts reveal that KalmanNB is a valid alternative to state-of-the-art algorithms, outperforming them especially in the case of recurring concept drifts.
Title: Fast Incremental Naïve Bayes with Kalman Filtering
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
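The idea of following an attribute's values directly, rather than monitoring model error, can be illustrated with a scalar Kalman filter that tracks the mean of one attribute. This is a generic textbook filter, not the KalmanNB implementation; the noise parameters, the initial state, and the synthetic stream with an abrupt drift are assumptions chosen for the example.

```python
class ScalarKalman:
    """One-dimensional Kalman filter tracking a slowly varying attribute mean."""

    def __init__(self, estimate=0.0, variance=1.0,
                 process_noise=0.1, measurement_noise=1.0):
        self.estimate = estimate                    # current estimate of the mean
        self.variance = variance                    # uncertainty of that estimate
        self.process_noise = process_noise          # assumed drift per step
        self.measurement_noise = measurement_noise  # assumed observation noise

    def update(self, observation):
        # Predict: the mean may have drifted, so uncertainty grows.
        self.variance += self.process_noise
        # Correct: move the estimate toward the observation by the Kalman gain.
        gain = self.variance / (self.variance + self.measurement_noise)
        self.estimate += gain * (observation - self.estimate)
        self.variance *= 1.0 - gain
        return self.estimate


# Synthetic stream (assumed): the attribute jumps from 0 to 10 halfway through.
tracker = ScalarKalman()
stream = [0.0] * 50 + [10.0] * 50
estimates = [tracker.update(value) for value in stream]
```

The estimate follows the new level within a few dozen observations without any explicit drift detector. In a Naïve Bayes setting, one such tracker per class and attribute could supply the Gaussian mean used in the likelihood, though that wiring is not shown here.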