2022 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Publications

Domain Adaptation Through Cluster Integration and Correlation

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00025

Vishnu Manasa Devagiri, V. Boeva, Shahrooz Abghari

Domain shift is a common problem in many real-world applications of machine learning models. Most existing solutions are based on supervised and deep learning models. This paper proposes a novel clustering algorithm that can produce an adapted and/or integrated clustering model for the considered domains. The source and target domains are each represented by a clustering model in which every cluster models a specific scenario of the studied phenomenon by defining a range of allowable values for each attribute of a data vector. The proposed domain integration algorithm works in two steps: (i) cross-labeling and (ii) integration. First, each clustering model is applied to label the cluster representatives of the other model. These labels are used to determine the correlations between the two models and thereby identify the clusters common to both domains, which are then integrated in the second step. Different features of the proposed algorithm are studied and evaluated on a publicly available human activity recognition (HAR) data set and on real-world data from a smart logistics use case provided by an industrial partner. The experiments on the HAR data set aim to showcase the algorithm's potential for automatic data labeling, while the experiments on the smart logistics use case evaluate and compare the performance of the integrated model and the two adapted models in different domains.
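
The two-step procedure above can be illustrated with a minimal Python sketch. It assumes that each cluster is stored as per-attribute lower and upper bounds plus a representative vector, that a representative is cross-labeled with the first cluster whose ranges cover it, and that two clusters count as common when their cross-labels agree in both directions. These choices, and the names `Cluster`, `cross_label` and `integrate`, are illustrative assumptions rather than the authors' definitions of correlation and integration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical cluster: per-attribute [low, high] ranges plus a representative vector."""
    low: list[float]
    high: list[float]
    rep: list[float]

    def covers(self, x: list[float]) -> bool:
        # x is allowed by this cluster if every attribute lies inside its range.
        return all(lo <= v <= hi for lo, v, hi in zip(self.low, x, self.high))


def cross_label(model: list[Cluster], other: list[Cluster]) -> list[int | None]:
    """Label each representative of `other` with the index of the first cluster of
    `model` that covers it, or None when no cluster of `model` covers it."""
    return [next((i for i, c in enumerate(model) if c.covers(o.rep)), None) for o in other]


def integrate(source: list[Cluster], target: list[Cluster]) -> list[Cluster]:
    # Step (i): cross-labeling in both directions.
    tgt_by_src = cross_label(source, target)  # [j] -> source cluster covering target rep j
    src_by_tgt = cross_label(target, source)  # [i] -> target cluster covering source rep i
    # Assumption: clusters i and j are "common" when the labels agree in both directions.
    common = sorted((i, j) for j, i in enumerate(tgt_by_src)
                    if i is not None and src_by_tgt[i] == j)
    # Step (ii): merge each common pair by taking the union of its attribute ranges.
    merged = [Cluster(low=[min(a, b) for a, b in zip(source[i].low, target[j].low)],
                      high=[max(a, b) for a, b in zip(source[i].high, target[j].high)],
                      rep=[(a + b) / 2 for a, b in zip(source[i].rep, target[j].rep)])
              for i, j in common]
    used_src = {i for i, _ in common}
    used_tgt = {j for _, j in common}
    # Domain-specific clusters are carried over unchanged.
    return (merged
            + [c for i, c in enumerate(source) if i not in used_src]
            + [c for j, c in enumerate(target) if j not in used_tgt])
```

Merging by the union of ranges keeps every value that was allowed in either domain; the paper may integrate common clusters differently.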

Citations: 0

Improving Graph Neural Network with Learnable Permutation Pooling

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00094

Yu Jin, J. JáJá

Graph neural networks (GNNs) have achieved great success in various graph-related applications. Most existing GNN models follow the message-passing neural network (MPNN) paradigm, in which the graph pooling function is a critical component that directly determines the model's effectiveness. In this paper, we propose PermPool, a new graph pooling function that provably improves GNN expressiveness. The method is based on the insight that the distribution of node permutations, when properly defined, forms a characteristic encoding of a graph. We propose to express graph representations as the expectation, over node permutations, of a general pooling function. We show that this graph representation is invariant to node reordering and has stronger expressive power than MPNN models. In addition, we propose novel permutation modeling and sampling techniques that integrate PermPool into differentiable neural network models. Empirical results show that our method outperforms other pooling methods on benchmark graph classification tasks.
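
The core idea, a graph representation defined as an expectation over node orderings, can be sketched in plain Python. Here `order_sensitive_readout` is a hypothetical, non-learnable stand-in for the paper's general pooling function, `perm_pool_exact` enumerates all n! orderings (feasible only for tiny graphs), and `perm_pool_sampled` replaces the expectation with a Monte Carlo average over uniformly shuffled orderings. This is not the authors' permutation model or sampling technique.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Readout = Callable[[Sequence[Vector]], float]


def order_sensitive_readout(seq: Sequence[Vector]) -> float:
    """Hypothetical permutation-sensitive readout: a position-weighted sum of dot
    products between consecutive node features, squashed by tanh."""
    total = 0.0
    for k in range(len(seq) - 1):
        total += (k + 1) * sum(a * b for a, b in zip(seq[k], seq[k + 1]))
    return math.tanh(total)


def perm_pool_exact(nodes: Sequence[Vector], f: Readout) -> float:
    """Average f over all n! node orderings; the result cannot depend on input order."""
    perms = list(itertools.permutations(nodes))
    return sum(f(p) for p in perms) / len(perms)


def perm_pool_sampled(nodes: Sequence[Vector], f: Readout,
                      num_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the same expectation, using uniformly shuffled orderings."""
    total = 0.0
    for _ in range(num_samples):
        order = list(nodes)
        rng.shuffle(order)
        total += f(order)
    return total / num_samples
```

Even though the readout changes when nodes are reordered, the exact expectation does not, which is the invariance property stated in the abstract; the sampled estimate is invariant only in expectation.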

Citations: 0

Simplifying Process Navigations - Divide and Rule way

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00020

Sai Charan Emmadi, Satya Samudrala, Parag Agrawal, M. Natu

Enterprises rely heavily on their batch processes to ensure smooth business operations. These processes contain thousands of jobs and millions of inter-dependencies, which makes it very difficult to track failures and delays, assess impact, and take timely corrective action. Hence, it is very important to create logically independent groups of processes, so that large, complex processes are easy to navigate, visualize, and analyze, and the areas that need attention can be highlighted. We present a greedy approach to find the logical groups that best meet the objective function and constraints related to batch systems. The proposed approach has been implemented and is used by various customers, and we have validated it with real-world customers.

Citations: 0

Cascaded Multi-Class Network Intrusion Detection With Decision Tree and Self-attentive Model

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00081

Yuchen Lan, Tram Truong-Huu, Ji-Yan Wu, S. Teo

Network intrusion has become a leading threat to the security of Internet applications. With the reemergence of artificial intelligence, deep neural networks (DNNs) have been widely used for network intrusion detection. However, one main problem with DNN models is their dependence on sufficient high-quality labeled data to train the model to decent accuracy. DNN models may produce many false predictions on imbalanced intrusion datasets, especially for the minority classes. While we continue to advocate machine learning and deep learning for network intrusion detection, we aim to address this drawback of existing DNN models by effectively integrating a decision tree and a feature tokenizer (FT)-transformer. First, the decision tree algorithm performs binary classification of regular (normal) versus malicious traffic. Second, the FT-transformer performs multi-class classification on the malicious traffic to identify the type of attack. We evaluate performance on three publicly available datasets: CIC-IDS 2017, UNSW-NB15, and Kitsune. Experimental results show that, among the three datasets, the proposed technique achieves its best performance on CIC-IDS 2017, with macro precision, recall, and F1-score of 84.6%, 83.6%, and 93.2%, respectively.
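
The cascade can be expressed as a small routing function. In this sketch the two stages are passed in as plain callables; `toy_tree` and `toy_attack_classifier`, along with their features (`syn_rate`, `failed_logins`) and thresholds, are hypothetical stand-ins for the trained decision tree and FT-transformer, not models from the paper.

```python
from typing import Callable, Dict, List, Sequence

Flow = Dict[str, float]


def cascade_predict(flows: Sequence[Flow],
                    is_malicious: Callable[[Flow], bool],
                    attack_type: Callable[[Flow], str]) -> List[str]:
    """Two-stage cascade: every flow goes through the binary stage, and only flows
    flagged as malicious are passed to the multi-class stage."""
    labels = []
    for flow in flows:
        if is_malicious(flow):
            labels.append(attack_type(flow))   # stage 2: which kind of attack?
        else:
            labels.append("normal")            # stage 1 decided: benign traffic
    return labels


# Hypothetical stand-ins with made-up features and thresholds. In the paper, stage 1
# is a trained decision tree and stage 2 a trained FT-transformer.
def toy_tree(flow: Flow) -> bool:
    return flow["syn_rate"] > 100 or flow["failed_logins"] > 5


def toy_attack_classifier(flow: Flow) -> str:
    return "dos" if flow["syn_rate"] > 100 else "brute_force"
```

Because only traffic flagged by stage 1 reaches stage 2, the multi-class model does not have to separate attacks from the much larger normal class, which is presumably how the cascade reduces the effect of class imbalance on minority attack types.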

Citations: 1

Solving Non-linear Optimization Problem in Engineering by Model-Informed Generative Adversarial Network (MI-GAN)

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00035

Yuxuan Li, Chaoyue Zhao, Chenang Liu

Optimization models have been widely used in many engineering systems to solve problems related to system operation and management. For instance, in power systems, the optimal power flow (OPF) problem, a critical component of power system operations, can be formulated with optimization models. In particular, alternating current OPF (AC-OPF) problems are challenging because some of their constraints are non-linear and non-convex. Moreover, because power systems can be highly variable, the coefficients of the optimization model may change, which makes the OPF problem even harder to solve. Although conventional optimization tools and deep learning approaches have been investigated, the feasibility and optimality of their solutions may still be unsatisfactory. Hence, in this paper, we propose a version of the recently developed model-informed generative adversarial network (MI-GAN) framework tailored to solving the non-linear AC-OPF problem under uncertainty. The contributions of this work can be summarized in two main aspects: (1) to ensure the feasibility and improve the optimality of the generated solutions, two important layers, namely the feasibility-filter layer and the optimality-filter layer, are designed; and (2) an efficient model-informed selector, which incorporates these two new layers to inform the generator, is designed and integrated into the GAN architecture. Experiments on IEEE test systems demonstrate the efficacy and potential of the proposed method for solving non-linear AC-OPF problems.
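
The roles of the two filter layers can be illustrated as post-hoc selection over candidate solutions. This is a simplified sketch under stated assumptions: constraints are written as functions that must satisfy g(x) <= 0, the optimality filter keeps the lowest-cost feasible candidates, and the two-generator dispatch problem (`cost`, `limits`) is a hypothetical toy rather than AC-OPF. In MI-GAN the filters are layers that inform the generator, which this sketch does not model.

```python
from typing import Callable, List, Sequence

Solution = List[float]
Constraint = Callable[[Solution], float]   # convention: feasible when g(x) <= 0


def feasibility_filter(candidates: Sequence[Solution], constraints: Sequence[Constraint],
                       tol: float = 1e-6) -> List[Solution]:
    """Keep only candidates that satisfy every constraint g(x) <= 0 up to `tol`."""
    return [x for x in candidates if all(g(x) <= tol for g in constraints)]


def optimality_filter(candidates: Sequence[Solution],
                      objective: Callable[[Solution], float], keep: int) -> List[Solution]:
    """Keep the `keep` candidates with the lowest objective value (minimization)."""
    return sorted(candidates, key=objective)[:keep]


def select(candidates: Sequence[Solution], constraints: Sequence[Constraint],
           objective: Callable[[Solution], float], keep: int = 1) -> List[Solution]:
    """Feasibility first, then optimality among the survivors."""
    return optimality_filter(feasibility_filter(candidates, constraints), objective, keep)


# Hypothetical two-generator dispatch toy (not AC-OPF): a solution is p = [p1, p2].
def cost(p: Solution) -> float:
    return p[0] ** 2 + 2 * p[1] ** 2


limits: List[Constraint] = [
    lambda p: 1.0 - (p[0] + p[1]),   # meet a demand of 1.0
    lambda p: p[0] - 0.8,            # capacity of generator 1
    lambda p: -p[1],                 # generator 2 output is non-negative
]
```

Applying the feasibility filter before ranking means an infeasible candidate can never be selected, however low its cost.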

Citations: 0

Next POI Recommender System: Multi-view Representation Learning for Outstanding Performance in Various Context

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00150

Yeonghwan Jeon, Junhyung Kim

Location-based Social Networks (LBSNs) are software services that enable users to find knowledge and socialize with other users by showing them other users' content (e.g., reviews and photos). LBSNs have many sub-fields, but Point-of-Interest (POI) recommendation is the most important, because it supports the growth of Small and Medium Enterprises (SMEs) by increasing visitation rates. In general, POI recommendation should be able to respond to users' various contexts. These contexts are diverse and complex; we define three main contexts based on user behavior in the local domain. However, because each context is defined by different user behavior, models perform differently across the evaluation criteria. In other words, no model is outstanding in all contexts. Therefore, this paper describes how to define each context, how to build POI embeddings for recommendation with an empirical multi-view representation learning technique, and how to produce optimized POI embeddings that perform outstandingly in all contexts of POI recommendation for various downstream tasks.

Citations: 1

Edit distance with Quasi Real Penalties: a hybrid distance for network-constrained trajectories

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00136

Noudéhouénou Lionel Jaderne Houssou, Jean-Loup Guillaume, A. Prigent

In this paper, we propose a new distance for network-constrained trajectories named Edit distance with Quasi Real Penalties (EQRP). Depending on the case, it can compare trajectories either as non-ordered sets or as sequences, whereas other distances compare trajectories only as non-ordered sets or only as sequences. Moreover, it is parameter-free, handles local time shifting, and satisfies the triangle inequality; to the best of our knowledge, no other distance simultaneously satisfies these three properties expected of a trajectory distance. To demonstrate the relevance of our idea, we benchmark our distance against several state-of-the-art distances for network-constrained trajectories. Specifically, for each distance, we determine its ability to precisely identify similar trajectories, as well as its performance for trajectory clustering. Our results show that EQRP outperforms existing edit distances and, in some cases, evaluates the dissimilarity between network-constrained trajectories more precisely than other measures.

Citations: 0

Nearest neighbors with incremental learning for real-time forecasting of electricity demand

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00112

Laura Melgar-García, David Gutiérrez-Avilés, Cristina Rubio-Escudero, A. T. Lora

Electricity demand forecasting helps the different actors in the energy sector plan the supply chain (generation, storage, and distribution of energy). Today, energy demand data arrive as streams from smart meters and have to be processed in real time for more efficient demand management. In addition, such data can change over time, exhibiting new patterns, new trends, etc. Therefore, real-time forecasting algorithms have to adapt to data arriving online in order to provide timely and accurate responses. This work presents a new algorithm for real-time electricity demand forecasting. The proposed algorithm generates a prediction model based on the K-nearest neighbors algorithm, which is incrementally updated as online data arrive. Both time-frequency-based and error-threshold-based model updates have been evaluated. Results are reported on Spanish electricity demand data sampled every ten minutes; the best prediction model, obtained with daily updates, reaches an error of 2%.
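
A K-nearest-neighbors forecaster with incremental updates can be sketched as follows. The window length, horizon, Euclidean distance, unweighted averaging of neighbor continuations, and the split into `observe` (store new data) and `refit` (update the model) are assumptions made for illustration; the paper's exact formulation may differ.

```python
import math
from typing import List, Sequence


class IncrementalKNNForecaster:
    """Forecast the next `horizon` values as the mean of what followed the `k`
    historical windows closest (Euclidean distance) to the latest `window` values."""

    def __init__(self, window: int, horizon: int, k: int) -> None:
        self.window, self.horizon, self.k = window, horizon, k
        self.history: List[float] = []
        self.inputs: List[List[float]] = []    # past windows
        self.outputs: List[List[float]] = []   # values that followed each window
        self._next_start = 0                   # first window not yet indexed

    def observe(self, values: Sequence[float]) -> None:
        """Append newly arrived measurements without changing the model."""
        self.history.extend(values)

    def refit(self) -> None:
        """Incremental model update: index only patterns completed since the last refit."""
        w, h = self.window, self.horizon
        while self._next_start + w + h <= len(self.history):
            i = self._next_start
            self.inputs.append(self.history[i:i + w])
            self.outputs.append(self.history[i + w:i + w + h])
            self._next_start += 1

    def predict(self) -> List[float]:
        if not self.inputs or len(self.history) < self.window:
            raise ValueError("not enough data: call observe() and refit() first")
        query = self.history[-self.window:]
        ranked = sorted(range(len(self.inputs)),
                        key=lambda j: math.dist(self.inputs[j], query))
        nearest = ranked[:self.k]
        return [sum(self.outputs[j][t] for j in nearest) / len(nearest)
                for t in range(self.horizon)]
```

With ten-minute sampling, a daily update policy corresponds to calling `refit()` after every 144 new samples (24 × 6); an error-threshold policy would instead call `refit()` whenever the latest forecast error exceeds a chosen threshold.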

Citations: 0

Abnormal Entity-Aware Knowledge Graph Completion

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00118

Keyi Sun, Shuo Yu, Ciyuan Peng, Xiang Li, Mehdi Naseriparsa, Feng Xia

In real-world scenarios, knowledge graphs remain incomplete and contain abnormal information, such as redundancies, contradictions, inconsistencies, misspellings, and abnormal values. These shortcomings in knowledge graphs can degrade service quality in many applications. Although many approaches have been proposed for knowledge graph completion, they cannot handle the abnormal information in knowledge graphs. Therefore, to address abnormal information in the knowledge graph completion task, we design a novel knowledge graph completion framework called ABET, which focuses specifically on abnormal entities. ABET consists of two components: a) abnormal entity prediction and b) knowledge graph completion. First, the prediction component automatically predicts the abnormal entities in knowledge graphs. Then, the completion component effectively captures heterogeneous structural information and the high-order features of neighbours based on different relations. Experiments demonstrate that ABET is an effective knowledge graph completion framework that significantly improves on the baselines. We further verify that ABET is robust on the knowledge graph completion task with abnormal entities.

Citations: 0

Influence Maximization on Hypergraphs via Similarity-based Diffusion

2022 IEEE International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00158

M. E. Aktas, Sidra Jawaid, Ihsan Gokalp, Esra Akbas

Influence maximization is an important problem in network science that aims to detect critical structures, such as nodes and interactions, that have a strong influence on diffusion. It has applications in information spreading, rumor control, marketing, disease spreading, advertising, and more. Although the influence maximization problem in graphs has been studied extensively, only a few studies explore critical structures in hypergraphs, and these mostly focus on detecting influential nodes rather than higher-order interactions, i.e., hyperedges. In this paper, we study the problem of detecting influential hyperedges. We first design diffusion models on hypergraphs based on the similarity between hyperedges. Our claim is that the similarity between hyperedges is positively correlated with the diffusion process. To study this claim, we calculate similarity scores between hyperedges and construct similarity-based hypergraph Laplacians. Next, we use these Laplacians to extend standard graph centrality measures to hyperedges. We compare the similarity-based hypergraph Laplacians with the state-of-the-art influential hyperedge detection method using two evaluation metrics: the size of the giant component and the Susceptible-Infected-Recovered (SIR) simulation model. Our experimental results suggest that, overall, similarity-based Laplacians are more effective than the state-of-the-art method at finding influential higher-order hyperedges.
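
The pipeline of similarity scores, a similarity-based Laplacian, and a centrality measure can be sketched in plain Python. Jaccard overlap as the similarity score, weighted degree as the centrality, and the helper names below are assumptions chosen for illustration; the paper's similarity measures and Laplacian construction may differ.

```python
from typing import FrozenSet, List, Sequence

Hyperedge = FrozenSet[str]


def jaccard(a: Hyperedge, b: Hyperedge) -> float:
    """Assumed similarity score: shared nodes over all nodes of the two hyperedges."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_laplacian(edges: Sequence[Hyperedge]) -> List[List[float]]:
    """L = D - W over hyperedges, where W[i][j] is the similarity of hyperedges i and j
    (zero diagonal) and D is the diagonal matrix of the row sums of W."""
    m = len(edges)
    W = [[0.0 if i == j else jaccard(edges[i], edges[j]) for j in range(m)]
         for i in range(m)]
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(m)]
            for i in range(m)]


def degree_centrality(L: List[List[float]]) -> List[float]:
    """Weighted degree of each hyperedge, read off the Laplacian diagonal."""
    return [L[i][i] for i in range(len(L))]


def top_k_hyperedges(edges: Sequence[Hyperedge], k: int) -> List[int]:
    """Indices of the k hyperedges with the highest degree centrality."""
    scores = degree_centrality(similarity_laplacian(edges))
    return sorted(range(len(edges)), key=lambda i: -scores[i])[:k]
```

Hyperedges that overlap heavily with many others receive high scores, which matches the stated claim that similarity between hyperedges is positively correlated with diffusion; other standard centralities could be computed on the same weighted similarity graph.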

Citations: 1