2020 International Conference on Data Mining Workshops (ICDMW): Latest Papers

TKC: Mining Top-K Cross-Level High Utility Itemsets
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00095
M. Nouioua, Ying Wang, Philippe Fournier-Viger, Jerry Chun‐wei Lin, J. Wu
High utility itemset mining is a well-studied data mining task for analyzing customer transactions. The goal is to find all high utility itemsets, that is, sets of items purchased together that generate a profit equal to or greater than a user-defined minimum utility threshold. However, a limitation of traditional high utility itemset mining algorithms is that they ignore item categories (e.g., drinks, dairy products). Recently, two algorithms were designed to find multi-level and cross-level high utility itemsets that reveal relationships between items and/or categories of items. This is achieved by considering a product taxonomy, in which items are organized into a hierarchy. Though these algorithms can reveal interesting patterns, setting the minimum utility threshold is not intuitive, and the threshold greatly influences both the number of patterns found and the algorithms' performance. If the user sets the threshold too low, a huge number of patterns is found and runtimes can be very long; if it is set too high, few patterns are found. Hence, a user often has to run an algorithm numerous times to find a threshold value that yields just enough patterns. This paper addresses this issue by presenting a novel algorithm called TKC (Top-K Cross-level high utility itemset miner), which lets the user directly set the number of patterns $k$ to be discovered. TKC performs a depth-first search and includes search space pruning techniques and an optimization to enhance its performance. Experiments were conducted on retail data with taxonomy information. Results indicate that the algorithm is efficient and that the optimization improves its performance.
Citations: 5
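To make the top-k idea concrete, here is a minimal sketch, not the authors' TKC algorithm. It assumes a tree-shaped taxonomy (`parent` maps each item or category to its parent), defines the utility of a category in a transaction as the summed utility of its purchased descendant items, and only allows itemsets in which no element is an ancestor of another. The search is depth-first; an internal threshold starts at 0 and is raised to the k-th best utility as soon as k patterns are held, and transaction-weighted utility (TWU) is used as an upper bound to prune branches. The names `top_k_cross_level` and `ancestors` are hypothetical.

```python
import heapq


def ancestors(node, parent):
    """Return the set of strict ancestors of `node` in the taxonomy."""
    result = set()
    while node in parent:
        node = parent[node]
        result.add(node)
    return result


def top_k_cross_level(transactions, parent, k):
    """Return the k highest-utility cross-level itemsets as (utility, itemset), best first.

    transactions: list of dicts mapping purchased item -> utility (e.g. profit).
    """
    # Generalize each transaction: every ancestor category of a purchased item
    # receives the summed utility of its purchased descendants.
    gen = []
    for t in transactions:
        g = dict(t)
        for item, u in t.items():
            for a in ancestors(item, parent):
                g[a] = g.get(a, 0) + u
        gen.append((g, sum(t.values())))  # (generalized utilities, transaction utility)

    nodes = sorted({n for g, _ in gen for n in g})
    anc = {n: ancestors(n, parent) for n in nodes}
    heap = []       # min-heap holding the current top-k (utility, itemset) pairs
    threshold = 0   # internal minimum utility, raised once k patterns are held

    def search(prefix, start, tids):
        nonlocal threshold
        for i in range(start, len(nodes)):
            n = nodes[i]
            # Cross-level rule: never combine a node with its ancestor or descendant.
            if any(n in anc[p] or p in anc[n] for p in prefix):
                continue
            new_tids = [j for j in tids if n in gen[j][0]]
            twu = sum(gen[j][1] for j in new_tids)
            # TWU bounds the utility of this itemset and all its extensions,
            # so the whole branch can be skipped if it cannot beat the threshold.
            if twu <= threshold:
                continue
            itemset = prefix + (n,)
            util = sum(sum(gen[j][0][x] for x in itemset) for j in new_tids)
            if len(heap) < k:
                heapq.heappush(heap, (util, itemset))
            elif util > heap[0][0]:
                heapq.heapreplace(heap, (util, itemset))
            if len(heap) == k:
                threshold = heap[0][0]
            search(itemset, i + 1, new_tids)

    search((), 0, list(range(len(gen))))
    return sorted(heap, reverse=True)
```

For example, with `parent = {"milk": "dairy", "cheese": "dairy", "cola": "drinks", "juice": "drinks"}` and three small baskets, calling `top_k_cross_level(baskets, parent, 3)` returns patterns that mix categories and items, such as `("cheese", "drinks")`, without the user ever choosing a minimum utility value.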
Super Learning with Repeated Cross Validation
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00089
Krzysztof Mnich, A. Polewko-Klim, A. Golinska, W. Lesiński, W. Rudnicki
The super learner algorithm was created to combine the results of multiple base learners using cross-validation. However, in many cases it does not significantly outperform a simple average of the base learners' results. We propose applying multiple repeats of cross-validation to improve the performance of super learning. Two approaches to applying repeated cross-validation were tested on artificial data sets and on real-life biomedical data sets. One of them, the MEAN OUTPUT strategy, proved to significantly improve the results. To reduce the computational complexity of the algorithm, we suggest using 3-fold rather than the previously recommended 10-fold cross-validation. The tests showed that this simplification does not affect the super learning results.
Citations: 2
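A minimal sketch of super learning with repeated cross-validation, under stated assumptions. The abstract does not define the MEAN OUTPUT strategy; here it is assumed to mean averaging each base learner's out-of-fold predictions across the CV repeats before fitting the meta-learner. The two base learners (a constant mean and a 1-D least-squares line) and the meta-learner (a grid search over a convex weight between them, standing in for the usual non-negative least squares) are illustrative choices, and all function names are hypothetical. The default of 3 folds follows the abstract's recommendation.

```python
import random
from statistics import fmean


def fit_mean(xs, ys):
    """Base learner 1: always predict the training mean."""
    m = fmean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Base learner 2: ordinary least-squares line y = a + b*x."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


BASE_LEARNERS = [fit_mean, fit_linear]


def oof_predictions(xs, ys, folds, rng):
    """Out-of-fold predictions of every base learner for one random k-fold split."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    preds = [[0.0] * len(xs) for _ in BASE_LEARNERS]
    for f in range(folds):
        test = idx[f::folds]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        for b, fit in enumerate(BASE_LEARNERS):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in test:
                preds[b][i] = model(xs[i])
    return preds


def super_learner_weight(xs, ys, folds=3, repeats=5, seed=0):
    """Weight w of the ensemble w*learner0 + (1-w)*learner1, fitted on averaged OOF output."""
    rng = random.Random(seed)
    runs = [oof_predictions(xs, ys, folds, rng) for _ in range(repeats)]
    # Assumed MEAN OUTPUT step: average each learner's OOF prediction over the repeats.
    avg = [[fmean(run[b][i] for run in runs) for i in range(len(xs))]
           for b in range(len(BASE_LEARNERS))]

    def mse(w):
        return fmean((w * p0 + (1 - w) * p1 - y) ** 2
                     for p0, p1, y in zip(avg[0], avg[1], ys))

    return min((w / 100 for w in range(101)), key=mse)
```

On perfectly linear data the meta-learner should put all weight on the linear learner (w = 0.0); averaging over repeats mainly reduces the variance that a single random fold assignment introduces into the meta-learner's training data.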
A Two-Stream Network For Driving Hand Gesture Recognition
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00079
Yefan Zhou, Zhao Lv, Chaoqun Wang, Shengli Zhang
The number of traffic accident deaths caused by driving is increasing every year, and improper driving behaviors account for a large proportion of traffic accidents. To detect such behaviors and alert the driver, we design a light and fast neural network (LFNN). On this basis, we construct a convolutional two-stream interactive network framework. One stream acquires spatial information about hand appearance; the other obtains temporal information about hand movement. The features generated by the two streams are fused and classified through a short interactive connection network. Our network structure was tested on the CVRR-HANDS 3D data set, where its accuracy reaches 96.5%, a clear improvement over the state of the art.
Citations: 0
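The two-stream structure can be illustrated with a toy forward pass. This is not the authors' LFNN: the "spatial" stream here is just the mean intensity of each row of the last frame, the "temporal" stream is the mean absolute frame difference per row (a cheap stand-in for optical flow), and fusion is plain concatenation followed by a hand-supplied linear layer and softmax. Frames are assumed to be 2-D lists of grayscale values, and all names and weights are hypothetical.

```python
import math


def spatial_features(frame):
    """Appearance stream stand-in: mean intensity of each row of a single frame."""
    return [sum(row) / len(row) for row in frame]


def temporal_features(frames):
    """Motion stream stand-in: mean absolute frame-to-frame difference per row."""
    feats = [0.0] * len(frames[0])
    for prev, cur in zip(frames, frames[1:]):
        for r, (pr, cr) in enumerate(zip(prev, cur)):
            feats[r] += sum(abs(c - p) for p, c in zip(pr, cr)) / len(cr)
    n = max(len(frames) - 1, 1)
    return [f / n for f in feats]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(frames, weights, biases):
    """Fuse both streams by concatenation, then apply a linear layer and softmax."""
    fused = spatial_features(frames[-1]) + temporal_features(frames)
    logits = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)
```

With weights that respond only to the motion features, a clip whose frames change is pushed toward the "moving" class, while identical frames leave the two classes tied; in a real network both streams would be learned convolutional feature extractors.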
TEDD: Robust Detection of Unstable Temporal Features
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00063
Ricardo Pereira, Bruno Casal Laraña, Nádia Soares, M. Araujo
When working with real-world temporal data, it is common to encounter features whose distribution changes over time. Naively applying Machine Learning models to such unstable data might lead to rapidly degrading performance, especially if the new distribution differs greatly from what was seen during training. To cope with this problem, it is critical to automatically identify features that change over time. Once these features are detected, data scientists and other practitioners can mitigate the issue (for instance, by applying data transformations) and deploy more robust models that retain high performance for longer periods of time. In this paper, we describe which temporal changes a feature should not suffer from, and propose TEDD, a technique to a) identify when a dataset might lead to an unstable Machine Learning model and b) automatically detect which features cause this lack of robustness. To achieve this, we leverage a regression model to highlight which features contribute to a good prediction of an instance's timestamp. We compare our approach to other methods on real and synthetic data, testing their detection capability on all simple change patterns. We show that our method: detects all types of basic changes, for both numerical and categorical features; can detect multivariate drifts; returns a comparable value measuring the amount of change of each feature; requires no parameter tuning; and scales with both the number of features and the number of instances in the dataset.
Citations: 0
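The core intuition, that a feature is unstable if it helps predict an instance's timestamp, can be sketched with a deliberately simplified proxy. TEDD itself uses a regression model over the features; this sketch instead fits one univariate least-squares line per numeric feature and uses its R² as the instability score, so it only catches monotonic drifts and ignores categorical and multivariate effects. The function names are hypothetical.

```python
from statistics import fmean


def r2_timestamp_from_feature(values, timestamps):
    """R^2 of a least-squares line predicting the timestamp from one feature.

    For a univariate linear fit this equals the squared Pearson correlation.
    """
    mx, mt = fmean(values), fmean(timestamps)
    sxx = sum((x - mx) ** 2 for x in values)
    stt = sum((t - mt) ** 2 for t in timestamps)
    if sxx == 0 or stt == 0:
        return 0.0  # a constant feature (or constant time) carries no time signal
    sxt = sum((x - mx) * (t - mt) for x, t in zip(values, timestamps))
    return sxt * sxt / (sxx * stt)


def rank_unstable_features(columns, timestamps):
    """Return (feature, score) pairs sorted from most to least time-predictive."""
    scores = {name: r2_timestamp_from_feature(vals, timestamps)
              for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A feature that drifts linearly with time scores close to 1, while a feature that merely oscillates around a fixed mean scores close to 0; scores are on the same 0 to 1 scale for every feature, which makes them comparable across features.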
Cross-Session Aware Temporal Convolutional Network for Session-based Recommendation
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00039
Rui Ye, Qing Zhang, Hengliang Luo
Recent advances in Graph Neural Networks (GNN) have achieved promising results for session-based recommendation, which aims to predict a user's actions based on anonymous sessions. However, existing graph-structured recommendation methods focus only on the internals of a session and neglect the cross-session effect, which contains valuable complementary information for more accurately learning the user's taste in the current session. Meanwhile, the graph structure lacks sequential position information, so different sequential sessions can be constructed as the same graph, inevitably limiting its capacity to obtain an accurate session representation vector. To address these limitations, we propose the Cross-session Aware Temporal Convolutional Network (CA-TCN) model. For the cross-session aware aspect, CA-TCN builds a global-item graph and a session-context graph to model cross-session influence on both items and sessions. The global-item graph explores the global cross-session influence on items by building relevant item connections among all sessions. The session-context graph explores the complex cross-session influence on sessions by connecting the current session to other sessions with similar user intents and behavioral patterns. We also connect items and sessions with hierarchical item-level and session-level attention mechanisms. In addition, compared with a GNN, a TCN can perform convolution over multi-hop items and preserve sequence information during convolution. Extensive experiments on two real-world datasets show that our method consistently outperforms state-of-the-art methods.
Citations: 15
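The claim that a TCN keeps sequence information while reaching multi-hop items rests on dilated causal convolution. The sketch below shows only that building block on a scalar sequence (one embedding dimension); a real TCN, and CA-TCN in particular, adds channels, nonlinearities, residual connections, and the attention layers described above. `kernel[0]` is assumed to weight the current position, and the function names are hypothetical.

```python
def causal_dilated_conv(seq, kernel, dilation):
    """1-D causal convolution: output[t] uses only seq[t], seq[t-d], seq[t-2d], ...

    Positions before the start are treated as zero (left padding), so the output
    has the same length as the input and never looks at later items.
    """
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(kernel):
            src = t - j * dilation
            if src >= 0:
                acc += w * seq[src]
        out.append(acc)
    return out


def tcn_stack(seq, kernel, levels):
    """Stack layers with dilations 1, 2, 4, ... so the receptive field grows exponentially."""
    for level in range(levels):
        seq = causal_dilated_conv(seq, kernel, 2 ** level)
    return seq
```

With kernel size 2 and three levels, the receptive field is 1 + (2 - 1)(1 + 2 + 4) = 8 positions: an impulse at the first item of an 8-item session reaches every later output, yet no output ever depends on a later item, which is how item order is preserved.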
Insights From Urban Sensing Data: From Chaos to Predicted Congestion Patterns
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00093
Minh-Son Dao, Ngoc-Thanh Nguyen, R. U. Kiran, K. Zettsu
Monitoring traffic congestion in smart cities is a challenging problem of great importance in Intelligent Transportation Systems (ITS). Most previous works focused on developing machine learning models that predict traffic congestion on a meshcode (i.e., a portion of the earth's surface) at a particular time instance. The key limitation of these studies is that they fail to provide holistic information about the sets of meshcodes in which regular congestion may occur in the forecast data. This paper proposes a novel framework to address this problem. The proposed framework employs a 3DCNN multi-source deep learning model (hereafter called Fusion-3DCNN) to predict traffic congestion on a particular meshcode at a particular time instance. The predicted traffic congestion data is then transformed into a temporal database and fed to a periodic-frequent pattern mining algorithm to identify the sets of meshcodes in which regular congestion may occur in the predicted data. Experimental results on real-world traffic congestion data demonstrate that the proposed framework is efficient.
Citations: 2
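The second stage of the framework, mining periodic-frequent meshcode sets from a temporal database, can be sketched by brute force. This assumes the common definition of periodic-frequent patterns: support is the number of timestamps at which the itemset occurs, and periodicity is the largest gap between consecutive occurrences, counting the gap from time 0 to the first occurrence and from the last occurrence to the database's final timestamp. The paper's actual mining algorithm and parameters are not given in the abstract; the function names and thresholds here are hypothetical, and real miners avoid enumerating all combinations.

```python
from itertools import combinations


def support_and_periodicity(itemset, db):
    """db: list of (timestamp, set_of_meshcodes), sorted by timestamp."""
    ts = [t for t, items in db if itemset <= items]
    if not ts:
        return 0, float("inf")
    boundaries = [0] + ts + [db[-1][0]]
    periods = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return len(ts), max(periods)


def periodic_frequent(db, min_sup, max_per, max_len=2):
    """Brute-force search for meshcode sets with support >= min_sup and periodicity <= max_per."""
    items = sorted({i for _, s in db for i in s})
    result = {}
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            sup, per = support_and_periodicity(set(combo), db)
            if sup >= min_sup and per <= max_per:
                result[combo] = (sup, per)
    return result
```

In this toy setting each transaction would list the meshcodes predicted as congested at one time step, so a returned set such as `("A", "B")` reads as "these two cells tend to be congested together, and never go long without being so".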
Learning Student Interest Trajectory for MOOC Thread Recommendation
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00062
Shalini Pandey, Andrew S. Lan, G. Karypis, J. Srivastava
In recent years, Massive Open Online Courses (MOOCs) have grown immensely in popularity, and the recent COVID-19 pandemic has made it important to push the limits of online education. Discussion forums are the primary means of interaction among learners and instructors. However, as class sizes grow, students face the challenge of finding useful and informative discussion forums. This problem can be solved by matching students' interests with thread contents. The fundamental challenge is that student interests drift as students progress through the course, and forum contents evolve as students or instructors update them. In this paper, we propose to predict students' future interest trajectories. Our model consists of two key operations: 1) an update operation and 2) a projection operation. When a student posts on a thread, the update operation models the interdependency between the evolution of the student and the thread using coupled Recurrent Neural Networks. The projection operation learns to estimate the future embeddings of students and threads. For students, it learns the drift in their interests caused by changes in the course topics they study. For threads, it exploits how different posts induce varying interest levels in a student according to the thread structure. Extensive experiments on three real-world MOOC datasets show that our model significantly outperforms other baselines for thread recommendation.
Citations: 3
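The two operations can be sketched with plain vectors, under assumptions the abstract does not confirm. Here each coupled RNN is a single tanh cell, so the student's new embedding depends on its own previous state and the thread's previous state (and vice versa), and the projection scales an embedding elementwise by (1 + w·Δt) for an elapsed time Δt. The parameter names (`Ws`, `Us`, `Wt`, `Ut`, `w_proj`) are hypothetical, and in practice all of them would be learned.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_cell(w_self, w_other, h, x):
    """One plain RNN step: h' = tanh(W_self h + W_other x)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_self, h), matvec(w_other, x))]


def update(student, thread, params):
    """Coupled update when `student` posts on `thread`.

    Both new embeddings are computed from the *previous* student and thread
    states, so neither update sees the other's freshly updated value.
    """
    new_student = rnn_cell(params["Ws"], params["Us"], student, thread)
    new_thread = rnn_cell(params["Wt"], params["Ut"], thread, student)
    return new_student, new_thread


def project(embedding, delta_t, w_proj):
    """Estimate the embedding delta_t time units later: e * (1 + w * delta_t), elementwise."""
    return [e * (1 + w * delta_t) for e, w in zip(embedding, w_proj)]
```

Recommendation would then score threads against the student's projected embedding at the query time, so that the ranking reflects where the student's interest is estimated to have drifted rather than where it was at the last post.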
Anomaly Detection and Visualization for Electricity Consumption Data
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00108
Nyoungwoo Lee, Jehyun Nam, Ho‐Jin Choi
Power supply enterprises need to accurately detect abnormal power consumption cases to predict power demand. Since actual abnormal power consumption patterns are irregular, a flexible model is needed to handle them. We therefore inspect abnormal power consumption data and predict potential abnormal patterns. Based on these insights, the goal of this work is to generate data following the identified abnormal patterns and to design a flexible model that can detect the generated abnormal data. The final model achieved anomaly-detection accuracies of 74% on the original abnormal data and 72% on the original normal data, and of 95.07% and 89.69% on randomly generated abnormal data of the growth and reduction types, respectively. We suggest a set of ways to identify potential abnormal data and to design flexible models that address it.
Citations: 0
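The abstract does not say how the "growth" and "reduction" abnormal data were generated. One plausible reading, sketched here purely as an assumption, is a ramp: from a random start index onward, consumption is scaled by a factor that grows linearly to (1 + rate) or shrinks linearly to (1 - rate) by the end of the series. The function name and the `rate` parameter are hypothetical.

```python
import random


def inject_anomaly(series, kind, rate=0.5, rng=None):
    """Return (anomalous copy of `series`, start index) with a ramp anomaly.

    kind="growth":    values ramp up to (1 + rate) times normal by the last point.
    kind="reduction": values ramp down to (1 - rate) times normal by the last point.
    Requires len(series) >= 3 so that a start index strictly inside the series exists.
    """
    if kind not in ("growth", "reduction"):
        raise ValueError(f"unknown anomaly kind: {kind}")
    rng = rng or random.Random()
    start = rng.randrange(1, len(series) - 1)
    sign = 1 if kind == "growth" else -1
    out = list(series)
    span = len(series) - start
    for k, i in enumerate(range(start, len(series)), start=1):
        out[i] = series[i] * (1 + sign * rate * k / span)
    return out, start
```

Labeled pairs produced this way (the original series as normal, the modified copy as abnormal) could then serve as training or test data for a detector, which is the role the randomly generated abnormal data plays in the reported accuracies.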
An Accelerated Continual Learning with Demand Prediction based Scheduling in Edge-Cloud Computing
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00103
Changha Lee, Seonghwan Kim, Chan-Hyun Youn
With the development of smart grids equipped with Advanced Metering Infrastructure (AMI), which consists of network infrastructure, smart meters, and a data management system, smart grid systems can analyze energy data to efficiently control energy generation and distribution. Recent advances in neural-network-based analysis have shown that some deep neural networks perform better than conventional analytical techniques. However, the basic learning process faces challenges when analyzing time-series data from smart meters with deep learning in real time. Although strategies for gradually training a deep neural network through continual learning have been proposed, they are effective only when the data features do not change significantly; performance improvements are therefore still needed in environments where the data distribution fluctuates with different power consumption habits. We therefore propose scheduled continual deep learning on an edge-cloud system to improve and accelerate learning on multi-client power consumption data, whose biased data features vary dramatically. Using the cosine similarity of electric load patterns, the scheduling algorithm manages and controls the gradients from the optimization process. Experimental evaluation shows the validity of the proposed scheme compared to the baseline method.
Citations: 2
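The abstract above states only that the scheduler uses the cosine similarity of electric load patterns to manage gradients from the optimization process. The Python sketch below illustrates just the similarity step under a hypothetical policy: each client's recent load profile is compared with a reference profile, and clients whose similarity falls below an assumed `threshold` (0.9 here) are queued for a continual-learning update, most dissimilar first. The names `cosine_similarity` and `schedule_updates`, the threshold, and the queueing rule are illustrative assumptions, not the authors' scheduler.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length load profiles."""
    if len(a) != len(b):
        raise ValueError("load profiles must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero profile as unrelated to any pattern
    return dot / (norm_a * norm_b)


def schedule_updates(
    reference: Sequence[float],
    clients: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # hypothetical drift threshold
) -> List[str]:
    """Hypothetical policy: clients whose load pattern has drifted away from the
    reference (similarity below `threshold`) are scheduled for an update,
    most dissimilar first."""
    scores = {cid: cosine_similarity(reference, p) for cid, p in clients.items()}
    drifted = [cid for cid, s in scores.items() if s < threshold]
    return sorted(drifted, key=lambda cid: scores[cid])
```

Because cosine similarity ignores scale, a household that uses twice as much power with the same daily shape (e.g. `[2, 4, 6, 4]` against a reference of `[1, 2, 3, 2]`) is not flagged; only a change in the shape of the load curve triggers an update under this assumed policy.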
Fast Incremental Naïve Bayes with Kalman Filtering
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00126
Giacomo Ziffer, Alessio Bernardo, Emanuele Della Valle, A. Bifet
In recent years, an increasing number of applications, IoT sensors, and websites have produced endless streams of data. These data streams are not only unbounded; their characteristics also change dynamically over time, a phenomenon called concept drift. Standard machine learning models do not work properly in this context, and new techniques have been developed to tackle these challenges. In this paper, we present a new Naïve Bayes algorithm, KalmanNB, that exploits the Kalman filter to manage concept drift automatically. Furthermore, we investigate when this new approach, which directly tracks the values of the data's attributes, is better than the standard strategy, which monitors the model's performance in order to detect drift. Extensive experiments on both artificial and real datasets with concept drift reveal that KalmanNB is a valid alternative to state-of-the-art algorithms, outperforming them especially in the case of recurring concept drift.
Citations: 1
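The abstract above contrasts following attribute values directly with monitoring model performance, but it does not give KalmanNB's equations. One plausible reading, sketched below as an assumption rather than the authors' algorithm, is a categorical Naïve Bayes whose class priors and per-class attribute-value probabilities are each estimated by a scalar Kalman filter over 0/1 indicator observations, instead of cumulative counts. The class names `ScalarKalman` and `KalmanCategoricalNB`, the random-walk state model, and the noise values `q` and `r` are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence, Set, Tuple


class ScalarKalman:
    """1-D Kalman filter with a random-walk state model x_t = x_{t-1} + w_t."""

    def __init__(self, x0: float = 0.5, p0: float = 1.0, q: float = 1e-3, r: float = 0.25) -> None:
        self.x = x0  # current estimate of the tracked probability
        self.p = p0  # variance of that estimate
        self.q = q   # process noise: how fast the true value may drift (assumed)
        self.r = r   # measurement noise of one 0/1 observation (assumed)

    def update(self, z: float) -> float:
        self.p += self.q                # predict: uncertainty grows by the process noise
        k = self.p / (self.p + self.r)  # Kalman gain, always in (0, 1)
        self.x += k * (z - self.x)      # move the estimate toward the observation
        self.p *= 1.0 - k               # uncertainty shrinks after the correction
        return self.x


class KalmanCategoricalNB:
    """Hypothetical categorical Naive Bayes whose probabilities are
    Kalman-filtered indicator streams rather than cumulative counts."""

    def __init__(self, q: float = 1e-3, r: float = 0.25, eps: float = 1e-6) -> None:
        self.q, self.r, self.eps = q, r, eps
        self.classes: Set[Hashable] = set()
        self.prior: Dict[Hashable, ScalarKalman] = {}
        self.cond: Dict[Tuple[int, Hashable, Hashable], ScalarKalman] = {}
        self.values: Dict[int, Set[Hashable]] = defaultdict(set)  # attribute -> values seen

    def _filter(self, table: dict, key: Hashable) -> ScalarKalman:
        if key not in table:
            table[key] = ScalarKalman(q=self.q, r=self.r)
        return table[key]

    def learn_one(self, x: Sequence[Hashable], y: Hashable) -> None:
        self.classes.add(y)
        for i, v in enumerate(x):
            self.values[i].add(v)
        # Prior P(c): filter the indicator "this sample belongs to class c".
        for c in self.classes:
            self._filter(self.prior, c).update(1.0 if c == y else 0.0)
        # Likelihood P(x_i = v | y): filter the indicator "x_i == v" on samples of class y.
        for i, seen in self.values.items():
            for v in seen:
                self._filter(self.cond, (i, y, v)).update(1.0 if x[i] == v else 0.0)

    def predict_one(self, x: Sequence[Hashable]) -> Optional[Hashable]:
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(max(self.prior[c].x, self.eps))
            for i, v in enumerate(x):
                f = self.cond.get((i, c, v))
                score += math.log(max(f.x if f is not None else 0.0, self.eps))
            if score > best_score:
                best, best_score = c, score
        return best
```

The design choice that matters for drift is `q > 0`: the process noise keeps the Kalman gain from decaying to zero, so each estimate keeps forgetting old samples and re-converges after the concept changes. Larger `q / r` ratios adapt faster but are noisier.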