
Latest publications: 2020 International Conference on Data Mining Workshops (ICDMW)

DQN-based Join Order Optimization by Learning Experiences of Running Queries on Spark SQL
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00107
Kyeong-Min Lee, InA Kim, Kyu-Chul Lee
In a smart grid, various types of queries, such as ad-hoc queries and analytic queries, are issued against the data. Query evaluation on single-node database engines has its limits, because smart-grid queries run over large-scale data. In this paper, to improve the performance of retrieving large-scale data in a smart grid environment, we propose a DQN-based join order optimization model on Spark SQL. The model learns from the actual processing times of queries evaluated on Spark SQL, not from estimated costs. By learning optimal join orders from previous experiences, we obtain join orders with performance similar to Spark SQL's, without collecting and computing statistics of the input data set.
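The learning loop can be sketched with tabular Q-learning standing in for the DQN; the toy cost model, table names, and hyperparameters below are invented for illustration and only mimic "learning from the runtime of executed queries":

```python
import random

# Toy stand-in for observed Spark SQL latency: hypothetical table sizes,
# not the paper's workload.
SIZES = {"A": 10, "B": 1000, "C": 100}

def run_latency(order):
    """Simulated runtime of a left-deep join plan: cost grows with the
    size of each intermediate result."""
    cost, interm = 0.0, SIZES[order[0]]
    for t in order[1:]:
        cost += interm * SIZES[t] * 0.001
        interm = min(interm, SIZES[t])  # crude join-selectivity stand-in
    return cost

def train(episodes=2000, eps=0.2, alpha=0.5):
    """Tabular Q-learning over join orders: state = tables joined so far,
    action = next table, feedback = observed latency (lower is better)."""
    q = {}
    tables = list(SIZES)
    for _ in range(episodes):
        order, remaining = [], set(tables)
        while remaining:
            state = tuple(order)
            if random.random() < eps:
                a = random.choice(sorted(remaining))
            else:
                a = min(sorted(remaining), key=lambda t: q.get((state, t), 0.0))
            order.append(a)
            remaining.discard(a)
        r = run_latency(order)  # the "experience of running the query"
        for i in range(len(order)):
            key = (tuple(order[:i]), order[i])
            q[key] = q.get(key, 0.0) + alpha * (r - q.get(key, 0.0))
    # Greedy rollout of the learned policy.
    order, remaining = [], set(tables)
    while remaining:
        a = min(sorted(remaining), key=lambda t: q.get((tuple(order), t), 0.0))
        order.append(a)
        remaining.discard(a)
    return order
```

The appeal of the approach is visible even in this toy: no statistics of the tables are collected up front, only runtimes of executed plans.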
{"title":"DQN-based Join Order Optimization by Learning Experiences of Running Queries on Spark SQL","authors":"Kyeong-Min Lee, InA Kim, Kyu-Chul Lee","doi":"10.1109/ICDMW51313.2020.00107","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00107","url":null,"abstract":"In a smart grid, various types of queries such as ad-hoc queries and analytic queries are requested for data. There is a limit to query evaluation based on a single node database engines because queries are requested for a large scale of data in the smart grid. In this paper, to improve the performance of retrieving a large scale of data in the smart grid environment, we propose a DQN-based join order optimization model on Spark SQL. The model learns the actual processing time of queries that are evaluated on Spark SQL, not the estimated costs. By learning the optimal join orders from previous experiences, we optimize the join orders with similar performance to Spark SQL without collecting and computing the statistics of an input data set.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125880840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Anomaly Detection of Periodic Multivariate Time Series under High Acquisition Frequency Scene in IoT
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00078
Shuo Zhang, Xiaofei Chen, Jiayuan Chen, Qiao Jiang, Hejiao Huang
Anomaly detection of multivariate time series is an intensive research topic in data mining, especially with the rise of Industry 4.0. However, few existing approaches address the high-acquisition-frequency scene, and only a minority of them take the periodicity of time series into consideration. In this paper, we propose a novel network, Dual-window RNN-CNN, to detect periodic time-series anomalies in high-acquisition-frequency IoT scenes. We first apply the dual window to segment the time series according to the periodicity of the data and solve the time-alignment problem. Then we utilize a multi-head GRU to compress the data volume and extract temporal features sensor by sensor, which not only solves the problems caused by high acquisition rates but also gives our network more flexible transfer ability. To improve the robustness of our network in different periodic IoT scenes, three different GRU modes are put forward. Finally, we use a CNN-based autoencoder to locate anomalies according to both temporal and spatial dependencies. It should also be noted that the multi-head GRU broadens the receptive field of the CNN-based autoencoder. The experiment was carried out in two parts to verify the validity of Dual-window RNN-CNN. The first part is conducted on the UCR/UEA benchmark to discuss the performance of Dual-window RNN-CNN under different structures and hyperparameters, since the datasets in the UCR/UEA benchmark contain enough timestamps to reflect the high acquisition rates and periodicity in IoT. The second part is conducted on the Yahoo Webscope benchmark and NAB to compare our network with other classic time-series anomaly detection approaches. Experimental results confirm that our Dual-window RNN-CNN outperforms other approaches in anomaly detection of periodic multivariate time series, demonstrating the advantages of our network in high-acquisition scenes.
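A minimal sketch of the dual-window idea as described above: an outer window aligned to the period segments the series, and an inner sliding window cuts each segment into fixed-size frames that a per-sensor GRU could consume. The function name and the period-length outer window are assumptions, not the paper's exact mechanism.

```python
def dual_window_segment(series, period, stride):
    """Hypothetical dual-window segmentation: outer window = one period
    (solves time alignment), inner window = fixed-size frames per segment."""
    segments = [series[i:i + period]
                for i in range(0, len(series) - period + 1, period)]
    frames = [[seg[j:j + stride] for j in range(0, period - stride + 1, stride)]
              for seg in segments]
    return segments, frames
```

For example, a 12-step series with period 4 and stride 2 yields three period-aligned segments of two frames each.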
{"title":"Anomaly Detection of Periodic Multivariate Time Series under High Acquisition Frequency Scene in IoT","authors":"Shuo Zhang, Xiaofei Chen, Jiayuan Chen, Qiao Jiang, Hejiao Huang","doi":"10.1109/ICDMW51313.2020.00078","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00078","url":null,"abstract":"Anomaly Detection of Multivariate Time Series is an intensive research topic in data mining, especially with the rise of Industry 4.0. However, few existing approaches are taken under high acquisition scene, and only a minority of them took periodicity of time series into consideration. In this paper, we propose a novel network Dual-window RNN-CNN to detect periodic time series anomalies of high acquisition frequency scene in IoT. We first apply Dual-window to segment time series according to the periodicity of data and solve the time alignment problem. Then we utilize Multi-head GRU to compress the data volume and extract temporal features sensor by sensor, which not only solves the problems caused by high acquisition but also adds more flexible transfer ability to our network. In order to improve the robustness of our network in different periodic scenes of IoT, three different kinds of GRU mode are put forward. Finally we use CNN-based Autoencoder to locate anomalies according to both temporal and spatial dependencies. It should also be note that Multi-head GRU broadens the receptive field of CNN-based Autoencoder. Two parts of experiment were carried to verify the validity of Dual-Window RNN-CNN. The first part is conducted on UCR/UEA benchmark to discuss the performance of Dual-Window RNN-CNN under different structures and hyper parameters, for datasets in UCR/UAE benchmark contain enough timestamps to monitor the high acquisition and periodicity in IoT. The second part is conducted on Yahoo Webscope benchmark and NAB to compare our network with other classic time series anomaly detection approaches. 
Experiment results confirm that our Dual-Window RNN-CNN outperforms other approaches in anomaly detection of periodic multivariate time series, demonstrating the advantages of our network in high acquisition scene.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124852819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Temporally-Reweighted Dirichlet Process Mixture Anomaly Detector
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00045
JunYong Tong, Nick Torenvliet
This paper proposes a streaming anomaly detection algorithm using variational Bayesian non-parametric methods. We extend Dirichlet process mixture models to anomaly detection on online streaming data through the streaming variational Bayes method and a cohesion function. Using our algorithm, we were able to update model parameters sequentially in near real time, using a fixed amount of computational resources. The algorithm captures the temporal dynamics of the data and enables good online anomaly detection. We demonstrate the performance of the algorithm, and discuss the results, on an industrial dataset with anomalies provided by a local utility.
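One way to picture the cohesion function is as a temporal decay over cluster membership. The CRP-style sketch below (all names and thresholds hypothetical, with a simple distance test in place of the paper's variational machinery) joins each streaming point to the cluster with the highest recently-weighted cohesion, or opens a new cluster; sparse clusters can then flag anomalies.

```python
import math

def cohesion(times, t_now, decay=0.1):
    """Temporal cohesion: recently-seen members count more than old ones."""
    return sum(math.exp(-decay * (t_now - t)) for t in times)

def stream_assign(points, alpha=0.5, radius=2.0, decay=0.1):
    """CRP-style streaming clustering sketch: a point joins a nearby
    cluster if its temporal cohesion beats the new-cluster weight alpha,
    otherwise it opens a new cluster."""
    clusters, labels = [], []  # cluster: {"center": float, "times": [...]}
    for t, x in enumerate(points):
        best, best_w = None, alpha
        for k, c in enumerate(clusters):
            if abs(x - c["center"]) <= radius:
                w = cohesion(c["times"], t, decay)
                if w > best_w:
                    best, best_w = k, w
        if best is None:
            clusters.append({"center": x, "times": [t]})
            labels.append(len(clusters) - 1)
        else:
            c = clusters[best]
            n = len(c["times"])
            c["center"] = (c["center"] * n + x) / (n + 1)  # running mean
            c["times"].append(t)
            labels.append(best)
    return labels, clusters
```

Because cohesion decays with time, a cluster that has not absorbed points recently competes less strongly, which is the temporal reweighting the title refers to.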
{"title":"Temporally-Reweighted Dirichlet Process Mixture Anomaly Detector","authors":"JunYong Tong, Nick Torenvliet","doi":"10.1109/ICDMW51313.2020.00045","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00045","url":null,"abstract":"This paper proposes a streaming anomaly detection algorithm using variational Bayesian non-parametric methods. We extend the use of Dirichlet process mixture models to anomaly detection for online streaming data through the use of streaming variational bayes method and a cohesion function. Using our algorithm, we were able to update model parameters sequentially near real-time, using a fixed amount of computational resources. The algorithm was able to capture the temporal dynamics of the data and enabled good online anomaly detection. We demonstrate the performance, and discuss results, of the algorithm on an industrial datasets with anomalies provided by a local utility.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130792155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Building knowledge graphs of homicide investigation chronologies
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00115
Ritika Pandey, P. Brantingham, Craig D. Uchida, G. Mohler
Homicide investigations generate large and diverse data in the form of witness interview transcripts, physical evidence, photographs, DNA, etc. Homicide case chronologies are summaries of these data created by investigators, consisting of short text-based entries documenting specific steps taken in the investigation. A chronology tracks the evolution of an investigation, including when and how persons involved and items of evidence became part of a case. In this article we discuss a framework for creating knowledge graphs of case chronologies that may aid investigators in analyzing homicide case data and also allow for post-hoc analysis of the key features that determine whether a homicide is ultimately solved. Our method consists of 1) performing named entity recognition to identify witnesses, suspects, and detectives in chronology entries, 2) using keyword expansion to identify documentary, physical, and forensic evidence in each entry, and 3) linking entities and evidence to construct a homicide investigation knowledge graph. We compare the performance of several candidate methodologies for these sub-tasks using homicide investigation chronologies from Los Angeles, California. We then analyze the association between network statistics of the knowledge graphs and homicide solvability.
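Steps 2) and 3) can be sketched as keyword matching plus co-occurrence linking. The lexicon and entry format below are invented (the paper's keyword expansion is learned, not hand-written), and step 1)'s named entity recognition is assumed to have already produced the person lists.

```python
import re

# Hypothetical keyword lexicon standing in for learned keyword expansion.
EVIDENCE_KEYWORDS = {
    "physical": {"casing", "weapon", "knife"},
    "documentary": {"receipt", "report", "footage"},
    "forensic": {"dna", "fingerprint", "ballistics"},
}

def build_graph(chronology):
    """Link every person and evidence keyword that co-occur in one
    chronology entry; nodes are entities, edges mean 'mentioned together'."""
    graph = {}
    def link(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    for entry in chronology:
        persons = entry["persons"]  # assumed output of the NER step
        tokens = set(re.findall(r"[a-z]+", entry["text"].lower()))
        evidence = [f"{etype}:{kw}"
                    for etype, kws in EVIDENCE_KEYWORDS.items()
                    for kw in kws & tokens]
        nodes = persons + evidence
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                link(a, b)
    return graph
```

Network statistics (degree, connectivity, and so on) of the resulting graph are then the kind of features one could relate to solvability.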
{"title":"Building knowledge graphs of homicide investigation chronologies","authors":"Ritika Pandey, P. Brantingham, Craig D. Uchida, G. Mohler","doi":"10.1109/ICDMW51313.2020.00115","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00115","url":null,"abstract":"Homicide investigations generate large and diverse data in the form of witness interview transcripts, physical evidence, photographs, DNA, etc. Homicide case chronologies are summaries of these data created by investigators that consist of short text-based entries documenting specific steps taken in the investigation. A chronology tracks the evolution of an investigation, including when and how persons involved and items of evidence became part of a case. In this article we discuss a framework for creating knowledge graphs of case chronologies that may aid investigators in analyzing homicide case data and also allow for post hoc analysis of the key features that determine whether a homicide is ultimately solved. Our method consists of 1) performing named entity recognition to determine witnesses, suspects, and detectives from chronology entries 2) using keyword expansion to identify documentary, physical, and forensic evidence in each entry and 3) linking entities and evidence to construct a homicide investigation knowledge graph. We compare the performance of several choices of methodologies for these sub-tasks using homicide investigation chronologies from Los Angeles, California. 
We then analyze the association between network statistics of the knowledge graphs and homicide solvability.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123822929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A federated learning based approach for loan defaults prediction
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00057
Geet Shingi
The number of defaults on bank loans has been increasing in recent years. However, the process of sanctioning a loan is still done manually in many banking organizations; dependency on human intervention and delays in results have been the biggest obstacles in this system. When implementing machine learning models for banking applications, the security of sensitive customer banking data has always been a crucial concern, and with strong legislative rules in place, sharing data with other organizations is not possible. In addition, loan datasets are highly imbalanced: there are very few samples of defaults compared to repaid loans. These problems make it difficult for a default prediction system to learn the patterns of defaults and thus to predict them. Previous machine learning approaches to automating the process train models on a single organization's data, but today, classifying loan applications using only the data within one organization is no longer a sufficient or feasible solution. In this paper, we propose a federated learning-based approach for predicting loan applications that are unlikely to be repaid, which resolves the above issues by sharing only model weights, aggregated at a central server. The federated system is coupled with the Synthetic Minority Over-sampling Technique (SMOTE) to address the imbalanced training data, and with an aggregation scheme weighted by the number of samples and the performance of each worker on its own dataset to further improve performance. The improved performance of our model on publicly available real-world data further validates the approach.
{"title":"A federated learning based approach for loan defaults prediction","authors":"Geet Shingi","doi":"10.1109/ICDMW51313.2020.00057","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00057","url":null,"abstract":"The number of defaults in bank loans have recently been increasing in the past years. However, the process of sanctioning the loan has still been done manually in many of the banking organizations. Dependency on human intervention and delay in results have been the biggest obstacles in this system. While implementing machine learning models for banking applications, the security of sensitive customer banking data has always been a crucial concern and with strong legislative rules in place, sharing of data with other organizations is not possible. Along with this, the loan dataset is highly imbalanced, there are very few samples of defaults as compared to repaid loans. Hence, these problems make the default prediction system difficult to learn the patterns of defaults and thus difficult to predict them. Previous machine learning-based approaches to automate the process have been training models on the same organization's data but in today's world, classifying the loan application on the data within the organizations is no longer sufficient and a feasible solution. In this paper, we propose a federated learning-based approach for the prediction of loan applications that are less likely to be repaid which helps in resolving the above mentioned issues by sharing the weight of the model which are aggregated at the central server. The federated system is coupled with Synthetic Minority Over-sampling Technique(SMOTE) to solve the problem of imbalanced training data. Further, The federated system is coupled with a weighted aggregation based on the number of samples and performance of a worker on his dataset to further augment the performance. The improved performance by our model on publicly available real-world data further validates the same. 
Flexible, aggregated models can prove to be crucial in keeping out the defaulters in loan applications.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123874255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
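The two ingredients, SMOTE and size-and-performance-weighted aggregation, can be sketched as follows; flat weight vectors and the exact weighting formula (sample count times local score) are assumptions for illustration, not the paper's definition.

```python
import random

def smote(minority, n_new, k=2, rng=random):
    """SMOTE sketch: synthesize points by interpolating each minority
    sample toward one of its k nearest minority neighbours."""
    out = []
    for _ in range(n_new):
        a = rng.choice(minority)
        neighbours = sorted((x for x in minority if x is not a),
                            key=lambda x: sum((u - v) ** 2
                                              for u, v in zip(a, x)))[:k]
        b = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment a -> b
        out.append(tuple(u + gap * (v - u) for u, v in zip(a, b)))
    return out

def fed_avg(client_weights, client_sizes, client_scores):
    """Aggregation sketch: average client weight vectors, weighting each
    client by (sample count x local validation score)."""
    coef = [n * s for n, s in zip(client_sizes, client_scores)]
    total = sum(coef)
    dims = len(client_weights[0])
    return [sum(c * w[d] for c, w in zip(coef, client_weights)) / total
            for d in range(dims)]
```

Only the weight vectors cross organizational boundaries; the raw loan records never leave each client, which is the privacy property the paper relies on.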
Citations: 20
Learning Disentangled Representation of Residential Power Demand Peak via Convolutional-Recurrent Triplet Network
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00110
Hyung-Jun Moon, Seok-Jun Bu, Sung-Bae Cho
In time-series models for predicting residential energy consumption, the energy properties collected through multiple sensors usually include irregular and seasonal factors. The irregular pattern resulting from them is called peak demand, and it is a major cause of performance degradation. To enhance performance, we propose a convolutional-recurrent triplet network to learn and detect demand peaks. The proposed model generates a latent space for demand peaks from the data, which is transferred into a convolutional neural network-long short-term memory (CNN-LSTM) model to predict future power demand. Experiments on the UCI household power consumption dataset, comprising 2,075,259 time-series records, show that the proposed model reduces the error by 23.63% and outperforms state-of-the-art deep learning models including the CNN-LSTM. In particular, the proposed model improves prediction performance by modeling the distribution of demand peaks in Euclidean space.
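The objective that shapes such a Euclidean latent space can be sketched as the standard triplet margin loss; the paper's convolutional-recurrent embedding network is not reproduced here, and plain tuples stand in for learned embeddings.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: pull the anchor toward a same-class (e.g.
    peak) sample and push it at least `margin` beyond a different-class
    (non-peak) sample."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
```

The loss is zero once the negative is `margin` farther from the anchor than the positive, which is what makes peak and non-peak windows separable in the latent space.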
{"title":"Learning Disentangled Representation of Residential Power Demand Peak via Convolutional-Recurrent Triplet Network","authors":"Hyung-Jun Moon, Seok-Jun Bu, Sung-Bae Cho","doi":"10.1109/ICDMW51313.2020.00110","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00110","url":null,"abstract":"In the time-series models for predicting residential energy consumption, the energy properties collected through multiple sensors usually include irregular and seasonal factors. The irregular pattern resulting from them is called peak demand, which is a major cause of performance degradation. In order to enhance the performance, we propose a convolutional-recurrent triplet network to learn and detect the demand peaks. The proposed model generates the latent space for demand peaks from data, which is transferred into convolutional neural network-long short-term memory (CNN-LSTM) to finally predict the future power demand. Experiments with the dataset of UCI household power consumption composed of a total of 2,075,259 time-series data show that the proposed model reduces the error by 23.63% and outperforms the state-of-the-art deep learning models including the CNN-LSTM. Especially, the proposed model improves the prediction performance by modeling the distribution of demand peaks in Euclidean space.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122762712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Revenue Maximization using Multitask Learning for Promotion Recommendation
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00029
Venkataramana B. Kini, A. Manjunatha
This paper proposes and evaluates a multitask transfer learning approach that jointly optimizes customer loyalty, retail revenue, and promotional revenue. A multitask neural network is employed to predict a customer's propensity to purchase within fine-grained categories. The network is then fine-tuned, using transfer learning, for a specific promotional campaign. Lastly, retail revenue and promotional revenue are jointly optimized conditioned on customer loyalty. Experiments conducted on a large retail dataset show the efficacy of the proposed method compared to baselines used in the industry. A large retailer is currently adopting the proposed methodology in promotional campaigns owing to significant overall revenue and loyalty gains.
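The fine-tuning step can be pictured with a logistic model whose pretrained weights are frozen while a campaign-specific weight is tuned on campaign data. The tiny gradient-descent fitter below is illustrative only, not the paper's network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, w=None, frozen=0, lr=0.5, steps=500):
    """Logistic regression by batch gradient descent; the first `frozen`
    weights stay fixed, mimicking transfer learning where shared layers
    are frozen and only the campaign-specific head is tuned."""
    w = list(w) if w else [0.0] * len(xs[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(frozen, len(w)):
                grad[j] += err * x[j]
        for j in range(frozen, len(w)):
            w[j] -= lr * grad[j] / len(xs)
    return w
```

Pretraining fits all weights on general purchase data; fine-tuning with `frozen=1` then adapts only the campaign feature's weight while the shared weight is untouched.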
{"title":"Revenue Maximization using Multitask Learning for Promotion Recommendation","authors":"Venkataramana B. Kini, A. Manjunatha","doi":"10.1109/ICDMW51313.2020.00029","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00029","url":null,"abstract":"This paper proposes and evaluates a multitask transfer learning approach to collectively optimize customer loyalty, retail revenue, and promotional revenue. Multitask neural network is employed to predict a customer's propensity to purchase within fine-grained categories. The network is then fine-tuned using transfer learning for a specific promotional campaign. Lastly, retail revenue and promotional revenue are jointly optimized conditioned on customer loyalty. Experiments are conducted using a large retail dataset that shows the efficacy of the proposed method compared to baselines used in the industry. A large retailer is currently adopting the proposed methodology in promotional campaigning owing to significant overall revenue and loyalty gains.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128935401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
An Experimental Evaluation of Data Classification Models for Credibility Based Fake News Detection
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00022
A. Ramkissoon, Shareeda Mohammed
The existence of fake news is a problem challenging today's social-media-enabled world. Fake news can be classified using various methods, yet predicting and detecting it has proven challenging even for machine learning algorithms. This research investigates nine such machine learning algorithms to understand their performance on credibility-based fake news detection. The study uses a standard dataset with features relating to the credibility of news publishers, and these features are analysed with each of the algorithms. The results of these experiments are assessed using four evaluation methodologies. The analysis reveals varying performance across the nine methods; based on our selected dataset, one of them proves most appropriate for credibility-based fake news detection.
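Four common evaluation methodologies, accuracy, precision, recall, and F1, can be computed from a confusion matrix as below; which four methodologies the paper actually used is not stated in the abstract, so these are representative choices.

```python
def confusion(y_true, y_pred):
    """Binary confusion counts: (tp, fp, fn, tn)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def evaluate(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for one classifier's output."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}
```

Running `evaluate` on each of the nine classifiers' predictions gives the kind of side-by-side comparison the study reports.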
{"title":"An Experimental Evaluation of Data Classification Models for Credibility Based Fake News Detection","authors":"A. Ramkissoon, Shareeda Mohammed","doi":"10.1109/ICDMW51313.2020.00022","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00022","url":null,"abstract":"The existence of fake news is a problem challenging today's social media enabled world. Fake news can be classified using varying methods. Predicting and detecting fake news has proven to be challenging even for machine learning algorithms. This research attempts to investigate nine such machine learning algorithms to understand their performance with Credibility Based Fake News Detection. This study uses a standard dataset with features relating to the credibility of news publishers. These features are analysed using each of these algorithms. The results of these experiments are analysed using four evaluation methodologies. The analysis reveals varying performance with the use of each of the nine methods. Based upon our selected dataset, one of these methods has proven to be most appropriate for the purpose of Credibility Based Fake News Detection.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121053428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Electric Energy Demand Forecasting with Explainable Time-series Modeling
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00101
Jin-Young Kim, Sung-Bae Cho
Recently, deep learning models have been utilized to predict energy consumption. However, for constructing smart grid systems, conventional methods either have limited explanatory power or require manual analysis. To overcome this, we present a novel deep learning model that can explain its predictions by calculating the correlation between the latent variables and the output, while forecasting future consumption with high performance. The proposed model is composed of 1) a main encoder that models past energy demand, 2) a sub-encoder that models electric information other than global active power as a two-dimensional latent variable, 3) a predictor that maps future demand from the concatenation of the latent variables extracted from each encoder, and 4) an explainer that identifies the most significant electric information.
{"title":"Electric Energy Demand Forecasting with Explainable Time-series Modeling","authors":"Jin-Young Kim, Sung-Bae Cho","doi":"10.1109/ICDMW51313.2020.00101","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00101","abstract":"Recently, deep learning models have been utilized to predict energy consumption. However, for constructing smart grid systems, the conventional methods either lack explanatory power or require manual analysis. To overcome this, in this paper we present a novel deep learning model that can interpret the predicted results by calculating the correlation between the latent variables and the output, as well as forecast future consumption with high performance. The proposed model is composed of 1) a main encoder that models the past energy demand, 2) a sub encoder that models the electric information other than global active power as a two-dimensional latent variable, 3) a predictor that maps the concatenation of the latent variables extracted from each encoder to the future demand, and 4) an explainer that provides the most significant electric information. Several experiments on a household electric energy demand dataset show that the proposed model not only performs better than the conventional models, but also can explain its results by analyzing the correlations among the inputs, the latent variables, and the energy demand predicted in the form of a time series.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"349 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122041042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Blockchain Applications to combat the global trade of falsified drugs
Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00127
Y. Kostyuchenko, Qingshan Jiang
The globalization of the pharmaceutical supply chain has led to new challenges, foremost among them the fight against falsified and substandard pharmaceutical products. Such products cause ineffective or harmful therapies all over the world. Traditional centralized technical tools can hardly satisfy the requirements of this changing industry. In this paper, we research the application of Blockchain solutions to modernize the drug supply chain and minimize the amount of poor-quality medications.
{"title":"Blockchain Applications to combat the global trade of falsified drugs","authors":"Y. Kostyuchenko, Qingshan Jiang","doi":"10.1109/ICDMW51313.2020.00127","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00127","abstract":"The globalization of the pharmaceutical supply chain has led to new challenges, foremost among them the fight against falsified and substandard pharmaceutical products. Such products cause ineffective or harmful therapies all over the world. Traditional centralized technical tools can hardly satisfy the requirements of this changing industry. In this paper, we research the application of Blockchain solutions to modernize the drug supply chain and minimize the amount of poor-quality medications.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121258094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
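The core property blockchains contribute to anti-counterfeiting is a tamper-evident, append-only record of a drug batch's custody. The minimal sketch below illustrates that property with a plain hash chain; the event fields and actors are hypothetical examples, not from the paper, and a real deployment would add distributed consensus and signatures.

```python
import hashlib
import json

def block_hash(block):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_event(chain, event):
    """Append a supply-chain event, linking it to the previous block."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"event": event, "prev": prev}
    block["hash"] = block_hash({"event": event, "prev": prev})
    chain.append(block)

def verify(chain):
    """Recompute every hash and link; any falsified record breaks the chain."""
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev:
            return False
        if block_hash({"event": block["event"], "prev": block["prev"]}) != block["hash"]:
            return False
        prev = block["hash"]
    return True

chain = []
append_event(chain, {"batch": "A123", "actor": "manufacturer", "action": "produced"})
append_event(chain, {"batch": "A123", "actor": "distributor", "action": "shipped"})
append_event(chain, {"batch": "A123", "actor": "pharmacy", "action": "received"})
assert verify(chain)

# A falsifier rewriting the batch's history is detected, because the stored
# hashes no longer match the recomputed ones.
chain[1]["event"]["actor"] = "unknown-reseller"
assert not verify(chain)
```

Each block commits to its predecessor's hash, so altering any past custody record invalidates every later link — the basic guarantee that makes a shared ledger useful against falsified batches.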