Research on FCM-LR cross electricity theft detection based on big data user profile

International Journal of System Assurance Engineering and Management (IF 1.6, Q2, Engineering, Multidisciplinary) | Pub Date: 2024-04-18 | DOI: 10.1007/s13198-024-02333-8

Ronghui Hu, Tong Zhen


Citations: 0

Abstract



Research on FCM-LR cross electricity theft detection based on big data user profile

Data-driven electricity theft detection (ETD) based on machine learning and deep learning offers automation, real-time performance, and efficiency, but it requires a large amount of labeled data to train models. In practice, the ratio of positive to unlabeled samples has reached 1:200, which significantly limits the accuracy of ETD models. This setting is known as positive-unlabeled learning. Down-sampling wastes a large number of negative samples, while up-sampling makes the ETD model less robust. Both can lead to ETD models that perform well in experimental environments but poorly in production. To address this, the paper proposes a semi-supervised electricity theft detection algorithm based on cross detection with fuzzy c-means and logistic regression (FCM-LR). First, a statistical feature set built from business data and load data is proposed to profile electricity users, which reduces the complexity of the data structure. Then, the FCM-LR method maximizes the use of unlabeled data and can discover new electricity theft patterns. Simulation results show that the method detects theft effectively, with Precision, Recall, F1, and Area Under Curve all approaching 99%.
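The abstract does not spell out the feature set or the exact cross-detection rule, so the following is only a minimal sketch of how fuzzy c-means and logistic regression might be combined in a positive-unlabeled setting. Everything specific here is an assumption for illustration: the four summary statistics in `profile_features`, seeding the two cluster centers from the labeled and unlabeled means, the Elkan-Noto style rescaling of the logistic score (a swapped-in technique the abstract does not mention), the "both models must agree" rule in `cross_detect`, and the synthetic data, whose imbalance is far milder than 1:200.

```python
import numpy as np


def profile_features(load):
    """Summarise each user's daily load (rows = users) as four statistics.

    Hypothetical feature set: the abstract does not list the actual features.
    """
    feats = np.column_stack(
        [load.mean(axis=1), load.std(axis=1), load.min(axis=1), load.max(axis=1)]
    )
    # Standardise columns so no single statistic dominates the distances.
    return (feats - feats.mean(axis=0)) / feats.std(axis=0)


def fuzzy_c_means(X, centers, m=2.0, iters=100):
    """Textbook fuzzy c-means started from the given centers.

    Returns memberships U of shape (n_users, n_clusters) and the final centers.
    """
    centers = np.asarray(centers, dtype=float)
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)      # membership update
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # center update
    return U, centers


def logistic_scores(X, s, lr=0.5, iters=2000, l2=0.1):
    """L2-regularised logistic regression of s on X, fitted by gradient descent.

    s is 1 for confirmed theft and 0 for unlabeled users; returns P(s=1 | x).
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (p - s) / len(s)
        grad[1:] += l2 * w[1:]  # the intercept is not penalised
        w -= lr * grad
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def cross_detect(X, s, threshold=0.5):
    """Flag unlabeled users that BOTH models consider theft-like (assumed rule)."""
    # Cluster 0 is seeded at the confirmed-theft mean, cluster 1 at the unlabeled mean.
    init = np.vstack([X[s == 1].mean(axis=0), X[s == 0].mean(axis=0)])
    U, _ = fuzzy_c_means(X, init)
    score = logistic_scores(X, s)
    # Elkan-Noto style rescaling: divide by the mean score of known positives.
    adjusted = np.minimum(score / score[s == 1].mean(), 1.0)
    return (s == 0) & (U[:, 0] > threshold) & (adjusted > threshold)


def run_demo(seed=0):
    """Synthetic data: 180 normal users, 20 theft users, only 8 of them labeled."""
    rng = np.random.default_rng(seed)
    load = np.vstack([rng.normal(10, 1, (180, 30)), rng.normal(3, 1, (20, 30))])
    is_theft = np.arange(200) >= 180
    s = np.zeros(200, dtype=int)
    s[180:188] = 1
    flagged = cross_detect(profile_features(load), s)
    return flagged, is_theft, s


if __name__ == "__main__":
    flagged, is_theft, s = run_demo()
    print("flagged unlabeled users:", np.flatnonzero(flagged))
```

Requiring agreement between the unsupervised memberships and the supervised score is one plausible reading of "cross detection": the clustering can surface theft-like users that were never labeled, while the regression keeps the flagged set anchored to confirmed cases. The paper's actual procedure may differ.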


Source journal

International Journal of System Assurance Engineering and Management (Engineering, Multidisciplinary)

CiteScore: 4.30

Self-citation rate: 10.00%

Articles published: 252

About the journal: This Journal is established with a view to cater to increased awareness for high quality research in the seamless integration of heterogeneous technologies to formulate bankable solutions to the emergent complex engineering problems. Assurance engineering could be thought of as relating to the provision of higher confidence in the reliable and secure implementation of a system’s critical characteristic features through the espousal of a holistic approach by using a wide variety of cross disciplinary tools and techniques. Successful realization of sustainable and dependable products, systems and services involves an extensive adoption of Reliability, Quality, Safety and Risk related procedures for achieving high assurance levels of performance; also pivotal are the management issues related to risk and uncertainty that govern the practical constraints encountered in their deployment. It is our intention to provide a platform for the modeling and analysis of large engineering systems, among the other aforementioned allied goals of systems assurance engineering, leading to the enforcement of performance enhancement measures. Achieving a fine balance between theory and practice is the primary focus. The Journal only publishes high quality papers that have passed the rigorous peer review procedure of an archival scientific Journal. The aim is an increasing number of submissions, wide circulation and a high impact factor.