Research on FCM-LR cross electricity theft detection based on big data user profile

Journal: International Journal of System Assurance Engineering and Management
Impact factor: 1.6 (JCR Q2, Engineering, Multidisciplinary)
Pub Date: 2024-04-18
DOI: 10.1007/s13198-024-02333-8
Ronghui Hu, Tong Zhen
Citations: 0

Abstract

Data-driven electricity theft detection (ETD) based on machine learning and deep learning offers automation, real-time performance, and efficiency, but it requires a large amount of labeled data to train models. In practice, the ratio of positive to unlabeled samples has reached 1:200, which significantly limits the accuracy of ETD models. This setting is referred to as positive-unlabeled learning. Down-sampling wastes a large number of negative samples, while up-sampling makes the ETD model less robust. Either approach can produce ETD models that perform well in experimental environments but poorly in production environments. To address this, the paper proposes a semi-supervised electricity theft detection algorithm based on cross detection with fuzzy c-means and logistic regression (FCM-LR). First, a statistical feature set built from business data and load data is proposed to profile electricity users, which reduces the complexity of the data structure. Second, the FCM-LR method maximizes the use of unlabeled data and can discover new electricity theft patterns. Simulation results show that the method detects theft effectively, with Precision, Recall, F1, and Area Under the Curve all approaching 99%.
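
The abstract does not spell out how the fuzzy c-means and logistic regression stages are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes two hypothetical profile features scaled roughly to [0, 1], synthetic toy data, FCM centers initialised from the few known theft users, and a "cross" rule that flags a user only when both the FCM membership and the logistic regression probability exceed 0.5. The thresholds, feature values, and helper names are all illustrative assumptions.

```python
import math
import random

def memberships(points, centers, m=2.0):
    """Fuzzy membership of every point in every cluster (rows sum to 1)."""
    u = []
    for p in points:
        dist = [max(math.dist(p, c), 1e-12) for c in centers]
        u.append([1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in dist)
                  for dj in dist])
    return u

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate the standard FCM membership and center updates."""
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = []
        for j in range(len(u[0])):
            w = [row[j] ** m for row in u]  # membership^m weights
            centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                            for d in range(dim)])
    return centers, memberships(points, centers, m)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression fitted by plain batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errs = [sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) - t
                for x, t in zip(X, y)]
        w = [wk - lr * sum(e * x[k] for e, x in zip(errs, X)) / n
             for k, wk in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b

def mean_point(pts):
    return [sum(col) / len(pts) for col in zip(*pts)]

# Synthetic toy profiles (assumption): 40 normal users, 10 theft users,
# of which only three are labeled as known positives.
rng = random.Random(42)
normal = [[rng.gauss(0.8, 0.05), rng.gauss(0.2, 0.05)] for _ in range(40)]
theft = [[rng.gauss(0.2, 0.05), rng.gauss(0.7, 0.05)] for _ in range(10)]
users = normal + theft
known_theft = [40, 41, 42]

# Stage 1 (assumption): FCM with cluster 1 seeded at the known theft users.
init = [mean_point(users), mean_point([users[i] for i in known_theft])]
centers, u = fuzzy_c_means(users, init)

# Stage 2 (assumption): train LR on known positives versus users that FCM
# places firmly outside the suspect cluster ("reliable negatives").
train_idx = known_theft + [i for i in range(len(users))
                           if u[i][1] < 0.1 and i not in known_theft]
X = [users[i] for i in train_idx]
y = [1.0 if i in known_theft else 0.0 for i in train_idx]
w, b = fit_logistic(X, y)
prob = [sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) for x in users]

# Cross rule (assumption): flag a user only when both detectors agree.
flagged = [i for i in range(len(users)) if u[i][1] > 0.5 and prob[i] > 0.5]
print(flagged)
```

Requiring agreement between the clustering and the classifier is one way to use unlabeled users without treating them all as negatives, which matches the positive-unlabeled setting described above. Real load data would need proper feature engineering and validation against Precision, Recall, F1, and AUC.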

Source journal
CiteScore: 4.30
Self-citation rate: 10.00%
Articles published: 252
Journal description: This Journal is established with a view to cater to increased awareness for high quality research in the seamless integration of heterogeneous technologies to formulate bankable solutions to the emergent complex engineering problems. Assurance engineering could be thought of as relating to the provision of higher confidence in the reliable and secure implementation of a system’s critical characteristic features through the espousal of a holistic approach by using a wide variety of cross disciplinary tools and techniques. Successful realization of sustainable and dependable products, systems and services involves an extensive adoption of Reliability, Quality, Safety and Risk related procedures for achieving high assurance levels of performance; also pivotal are the management issues related to risk and uncertainty that govern the practical constraints encountered in their deployment. It is our intention to provide a platform for the modeling and analysis of large engineering systems, among the other aforementioned allied goals of systems assurance engineering, leading to the enforcement of performance enhancement measures. Achieving a fine balance between theory and practice is the primary focus. The Journal only publishes high quality papers that have passed the rigorous peer review procedure of an archival scientific Journal. The aim is an increasing number of submissions, wide circulation and a high impact factor.
Latest articles from this journal
Vision-based gait analysis to detect Parkinson’s disease using hybrid Harris hawks and Arithmetic optimization algorithm with Random Forest classifier
Zero crossing point detection in a distorted sinusoidal signal using random forest classifier
FL-XGBTC: federated learning inspired with XG-boost tuned classifier for YouTube spam content detection
A generalized product adoption model under random marketing conditions
Assessing e-learning platforms in higher education with reference to student satisfaction: a PLS-SEM approach