
2020 International Conference on Data Mining Workshops (ICDMW): Latest Papers

A Short-Term Cryptocurrency Price Movement Prediction Using Centrality Measures
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00058
Kin-Hon Ho, Wai-Han Chiu, Chin Li
We conduct a network analysis with centrality measures, using historical daily close prices of the top 120 cryptocurrencies between 2013 and 2020, to study the dynamic evolution and characteristics of the cryptocurrency market. Our study has three primary findings: (1) the overall cross-return correlation among the cryptocurrencies weakened from 2013 to 2016 and strengthened thereafter; (2) cryptocurrencies primarily used for transaction payment, notably BTC, dominated the market until mid-2016, followed between mid-2016 and mid-2017 by those developed for applications that use blockchain as the underlying technology, particularly data storage and recording, such as MAID and FCT. Since then, ETH and its strongly correlated cryptocurrencies have replaced BTC as the benchmark cryptocurrencies. Furthermore, during the outbreak of COVID-19, QTUM and BNB intermittently replaced ETH in the leading positions owing to their active community engagement during the pandemic; (3) centrality measures are useful features for improving the prediction accuracy of short-term cryptocurrency price movement.
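The idea of deriving centrality features from cross-return correlations can be illustrated with a minimal sketch. It is not the authors' pipeline: the correlation threshold of 0.5, the use of degree centrality, the function names, and the toy prices are all assumptions chosen for illustration.

```python
from math import sqrt

def daily_returns(prices):
    """Simple day-over-day returns from a list of daily close prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def degree_centrality(prices_by_coin, threshold=0.5):
    """Link two coins when their return correlation reaches `threshold`
    (an assumed cut-off), then return each coin's normalized degree."""
    returns = {c: daily_returns(p) for c, p in prices_by_coin.items()}
    coins = sorted(returns)
    degree = {c: 0 for c in coins}
    for i, a in enumerate(coins):
        for b in coins[i + 1:]:
            if pearson(returns[a], returns[b]) >= threshold:
                degree[a] += 1
                degree[b] += 1
    n = len(coins)
    return {c: d / (n - 1) for c, d in degree.items()}

# Hypothetical toy prices: BBB moves in lockstep with AAA, CCC does not.
toy_prices = {
    "AAA": [100, 102, 101, 105, 107],
    "BBB": [50, 51, 50.5, 52.5, 53.5],
    "CCC": [10, 9, 11, 10, 12],
}
centrality = degree_centrality(toy_prices)
print(centrality)
```

Computed over a rolling window, values like these could be fed to a classifier as additional features alongside price history; which centrality measures and window lengths work best is a question the paper itself addresses.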
Citations: 2
Integration of Fuzzy and Deep Learning in Three-Way Decisions
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00019
L. Subhashini, Yuefeng Li, Jinglan Zhang, Ajantha S Atukorale
Uncertainty is a challenging problem in opinion mining models. Existing models that use machine learning algorithms are unable to identify uncertainty within online customer reviews because of broad uncertain boundaries. Many researchers have developed fuzzy models to solve this problem. However, the problem of large uncertain boundaries remains with fuzzy models. The common challenge is that there is a large uncertain boundary between the positive and negative classes, as user reviews (or opinions) include many uncertainties. Dealing with these uncertainties is problematic because many frequently used words may be irrelevant. This paper proposes a three-way decision framework that integrates fuzzy concepts and deep learning to solve the problem of uncertainty. Many experiments were conducted using movie review and ebook review datasets. The experimental results show that the proposed three-way framework is useful for dealing with uncertainties in opinions, and that it achieves a significant F-measure on two benchmark datasets.
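The general three-way decision rule that this framework builds on can be sketched as follows. This is only the standard threshold rule, not the paper's fuzzy or deep learning components; the thresholds `alpha` and `beta` and the function name are illustrative assumptions.

```python
def three_way_decision(p_positive, alpha=0.7, beta=0.3):
    """Assign a review to the positive, negative, or boundary region.

    p_positive is an estimated probability that the review is positive.
    alpha and beta are assumed thresholds with beta < alpha; reviews
    falling between them are deferred rather than forced into a class.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if p_positive >= alpha:
        return "positive"
    if p_positive <= beta:
        return "negative"
    return "boundary"

labels = [three_way_decision(p) for p in (0.9, 0.5, 0.1)]
print(labels)  # ['positive', 'boundary', 'negative']
```

The boundary region is where a second-stage model would be applied; narrowing it without hurting accuracy is the problem the abstract describes.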
Citations: 6
Temporally-Reweighted Dirichlet Process Mixture Anomaly Detector
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00045
JunYong Tong, Nick Torenvliet
This paper proposes a streaming anomaly detection algorithm using variational Bayesian non-parametric methods. We extend the use of Dirichlet process mixture models to anomaly detection for online streaming data through the use of a streaming variational Bayes method and a cohesion function. Using our algorithm, we were able to update model parameters sequentially in near real-time, using a fixed amount of computational resources. The algorithm was able to capture the temporal dynamics of the data and enabled good online anomaly detection. We demonstrate the performance of the algorithm, and discuss the results, on an industrial dataset with anomalies provided by a local utility.
Citations: 0
A federated learning based approach for loan defaults prediction
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00057
Geet Shingi
The number of bank loan defaults has been increasing in recent years. However, the process of sanctioning loans is still done manually in many banking organizations. Dependency on human intervention and delays in results have been the biggest obstacles in this system. When implementing machine learning models for banking applications, the security of sensitive customer banking data has always been a crucial concern, and with strong legislative rules in place, sharing data with other organizations is not possible. In addition, the loan dataset is highly imbalanced: there are very few samples of defaults compared to repaid loans. These problems make it difficult for a default prediction system to learn the patterns of defaults and thus to predict them. Previous machine learning-based approaches to automating the process have trained models on a single organization's data, but classifying loan applications using only the data within one organization is no longer a sufficient or feasible solution. In this paper, we propose a federated learning-based approach for predicting loan applications that are less likely to be repaid, which helps resolve the above-mentioned issues by sharing model weights that are aggregated at a central server. The federated system is coupled with the Synthetic Minority Over-sampling Technique (SMOTE) to address the problem of imbalanced training data. Further, the federated system uses a weighted aggregation based on the number of samples and the performance of each worker on its own dataset to further improve performance. The improved performance of our model on publicly available real-world data validates the approach. Flexible, aggregated models can prove crucial in keeping defaulters out of loan applications.
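A weighted aggregation of this kind can be sketched in a few lines. The abstract states that weights depend on each worker's sample count and performance but does not give the formula, so the product `n_samples * score` used below is an assumption, as are the function name and the flat list representation of model weights.

```python
def aggregate(worker_updates):
    """Combine worker model weights at the central server.

    worker_updates is a list of (weights, n_samples, score) tuples, where
    weights is a flat list of floats and score is a local validation metric.
    Each worker's contribution is proportional to n_samples * score
    (an assumed weighting; the paper's exact formula may differ).
    """
    coeffs = [n * s for _, n, s in worker_updates]
    total = sum(coeffs)
    if total <= 0:
        raise ValueError("at least one worker needs a positive weight")
    dim = len(worker_updates[0][0])
    return [
        sum(c * w[i] for c, (w, _, _) in zip(coeffs, worker_updates)) / total
        for i in range(dim)
    ]

# Two hypothetical banks with equal data and equal scores: plain average.
global_weights = aggregate([([1.0, 2.0], 100, 0.5), ([3.0, 4.0], 100, 0.5)])
print(global_weights)  # [2.0, 3.0]
```

Only the weight vectors and two scalars leave each organization, which is what lets the raw customer records stay local.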
Citations: 20
Detecting Dynamic Critical Links within Large Scale Network for Traffic State Prediction
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00119
Pierre-Antoine Laharotte, Romain Billot, Nour-Eddin El Faouzi
Can we expose the relationship between the physical dynamics of a network and its predictability? To contribute to this question, we propose a dimensionality reduction method for network state prediction based on spatiotemporal data. The method is intended for large-scale networks, where only a subset of critical links may be relevant for accurate multidimensional (MIMO) prediction performance. The algorithm uses Latent Dirichlet Allocation (LDA) to highlight relevant topics in terms of network dynamics. The feature selection trick relies on the assumption that the most representative links of the most dominant topics are critical links for short-term prediction. The method is applied to an original application field: short-term road traffic prediction on large-scale urban networks based on GPS data. Results highlight significant reductions in dimensionality and execution time, a global improvement in prediction performance, and better resilience to non-recurrent traffic flow conditions.
Citations: 0
An Experimental Evaluation of Data Classification Models for Credibility Based Fake News Detection
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00022
A. Ramkissoon, Shareeda Mohammed
The existence of fake news is a problem challenging today's social media enabled world. Fake news can be classified using varying methods. Predicting and detecting fake news has proven to be challenging even for machine learning algorithms. This research attempts to investigate nine such machine learning algorithms to understand their performance with Credibility Based Fake News Detection. This study uses a standard dataset with features relating to the credibility of news publishers. These features are analysed using each of these algorithms. The results of these experiments are analysed using four evaluation methodologies. The analysis reveals varying performance with the use of each of the nine methods. Based upon our selected dataset, one of these methods has proven to be most appropriate for the purpose of Credibility Based Fake News Detection.
Citations: 4
Blockchain Applications to combat the global trade of falsified drugs
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00127
Y. Kostyuchenko, Qingshan Jiang
The globalization of the pharmaceutical supply chain has led to new challenges, foremost among them the fight against falsified and substandard pharmaceutical products. Such products cause ineffective or harmful therapies all over the world. Traditional centralized technical tools can hardly satisfy the requirements of the changing industry. In this paper, we research the application of Blockchain solutions to modernize the drug supply chain and minimize the amount of poor-quality medication.
Citations: 0
StreamDL: Deep Learning Serving Platform for AMI Stream Forecasting
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00104
Eunju Yang, Changha Lee, Ji-Hwan Kim, Tuan Manh Tao, Chan-Hyun Youn
Advanced Metering Infrastructures (AMIs) facilitate individual load forecasting. Individual load forecasting not only improves the accuracy of aggregated load forecasting but is also a fundamental component of various power applications. With the prominence of deep learning (DL) in individual load forecasting, a serving platform specialized in deep learning is required to forecast with AMI stream data. However, existing serving platforms for DL models do not consider stream data as an input; they usually support image or text data through a RESTful API. To solve this problem, we propose StreamDL, a serving framework that provides deep learning inference with AMI stream data. It leverages Apache Kafka to support stream data and Kubernetes to support the cloud environment. StreamDL addresses the specific requirements of stream data: it supports stream parsing to fit any DL model, especially recurrent networks, and continual training to alleviate accuracy degradation caused by changes in the stream distribution. In this paper, we describe the details of the StreamDL platform and its use cases with real AMI data.
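Stream parsing for a recurrent model typically means turning an unbounded sequence of meter readings into fixed-length input windows. The sketch below shows one common way to do that with a bounded buffer; it is not StreamDL's actual parser, it leaves out Kafka entirely, and the function name, `window`, and `step` parameters are hypothetical.

```python
from collections import deque

def sliding_windows(stream, window, step=1):
    """Yield fixed-length windows from an iterable of readings.

    The first window is emitted once `window` readings have arrived;
    after that, a new window is emitted every `step` readings.
    Memory use is bounded by `window`, regardless of stream length.
    """
    buf = deque(maxlen=window)
    seen = 0
    for reading in stream:
        buf.append(reading)
        seen += 1
        if seen >= window and (seen - window) % step == 0:
            yield list(buf)

# Hypothetical meter readings; each window would be one model input.
readings = [1, 2, 3, 4, 5]
windows = list(sliding_windows(readings, window=3))
print(windows)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

Because the generator consumes any iterable, the same function could wrap a message consumer loop in place of the list used here.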
Citations: 1
An Improved Wide-Kernel CNN for Classifying Multivariate Signals in Fault Diagnosis
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00046
J. V. D. Hoogen, Stefan Bloemheuvel, M. Atzmüller
Deep Learning (DL) provides considerable opportunities for increased efficiency and performance in fault diagnosis. The ability of DL methods to extract features automatically can reduce the need for time-intensive feature construction and prior knowledge of complex signal processing. In this paper, we propose two models built on the Wide-Kernel Deep Convolutional Neural Network (WDCNN) framework to improve the performance of classifying fault conditions using multivariate time series data, including with limited and/or noisy training data. In our experiments, we use the renowned benchmark dataset from the Case Western Reserve University (CWRU) bearing experiment [1] to assess our models' performance, and we investigate their usability for large-scale applications by simulating noisy industrial environments. Here, the proposed models show exceptionally good performance without any preprocessing or data augmentation and considerably outperform traditional Machine Learning applications as well as state-of-the-art DL models, even in such complex multi-class classification tasks. We show that both models also adapt well to noisy input data, which makes them suitable for condition-based maintenance contexts. Furthermore, we investigate and demonstrate the explainability and transparency of the models, which are particularly important in large-scale industrial applications.
Citations: 9
Revenue Maximization using Multitask Learning for Promotion Recommendation
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00029
Venkataramana B. Kini, A. Manjunatha
This paper proposes and evaluates a multitask transfer learning approach to collectively optimize customer loyalty, retail revenue, and promotional revenue. A multitask neural network is employed to predict a customer's propensity to purchase within fine-grained categories. The network is then fine-tuned using transfer learning for a specific promotional campaign. Lastly, retail revenue and promotional revenue are jointly optimized, conditioned on customer loyalty. Experiments on a large retail dataset show the efficacy of the proposed method compared to baselines used in the industry. A large retailer is currently adopting the proposed methodology in its promotional campaigns owing to significant overall revenue and loyalty gains.
Citations: 1
Journal
2020 International Conference on Data Mining Workshops (ICDMW)