Atul Jaysing Patil;Ram Naresh;Raj Kumar Jarial;Hasmat Malik
Optimized Synthetic Data Integration With Transformer’s DGA Data for Improved ML-Based Fault Identification
IEEE Transactions on Dielectrics and Electrical Insulation, vol. 32, no. 1, pp. 598-607. Published 2024-07-02. DOI: 10.1109/TDEI.2024.3421915
URL: https://ieeexplore.ieee.org/document/10580982/
Impact factor: 3.1 (JCR Q2, Engineering, Electrical & Electronic)
Citations: 0
Abstract
Ensuring transformer health and accurate fault diagnosis is crucial for the reliable operation of power systems. Developing data-driven techniques for fault interpretation in mineral oil-filled transformers is challenging because real-world data are scarce. This research investigates the development of a novel optimized synthetic dataset for three different ML models, K-nearest neighbors (KNN), support vector machine (SVM), and random forest (RF), that maximizes the accuracy of these data-driven algorithms without adding so many training instances that the models overfit the training dataset. Using a dataset from 1135 diverse transformers for ML model training, the study introduces a novel two-step iterative and optimized methodology for generating a synthetic database. The integration of real and synthetic data enhances the overall efficacy of incipient fault identification using ML algorithms. To ensure robust evaluation and comparison of performance, the IEC TC 10 dataset is employed. With the optimized dataset, the accuracy of the KNN model increased to 90.26%, from 79.33% when it was trained only with real-world data. Verification of the synthetic data generated by the proposed method, compared with existing methods, demonstrated its superior dataset quality.
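The general idea of augmenting scarce real data with synthetic samples, while limiting how many are added, can be illustrated with a minimal sketch. This is not the authors' two-step methodology, which the abstract does not detail. It assumes a simple swapped-in technique (interpolation between same-class samples, similar in spirit to SMOTE), toy two-dimensional data rather than DGA measurements, and a plain validation search over a few candidate sizes. All function names, labels, and parameters below are hypothetical.

```python
import random
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of test points whose KNN prediction matches the label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def interpolate_synthetic(real, n_new, rng):
    """Create n_new samples by interpolating between same-class pairs.

    Assumption: a stand-in generator, not the paper's method.
    """
    by_class = {}
    for x, y in real:
        by_class.setdefault(y, []).append(x)
    labels = sorted(by_class)
    synthetic = []
    for i in range(n_new):
        label = labels[i % len(labels)]  # cycle through classes evenly
        a, b = rng.choice(by_class[label]), rng.choice(by_class[label])
        t = rng.random()
        point = tuple(p + t * (q - p) for p, q in zip(a, b))
        synthetic.append((point, label))
    return synthetic

def select_synthetic_size(real, validation, candidate_sizes, seed=0, k=3):
    """Keep the synthetic-set size that gives the best validation accuracy.

    Size 0 (real data only) is the starting point, so the result is never
    worse than the baseline on the validation set.
    """
    best_size, best_acc = 0, accuracy(real, validation, k)
    for n in candidate_sizes:
        rng = random.Random(seed)
        augmented = real + interpolate_synthetic(real, n, rng)
        acc = accuracy(augmented, validation, k)
        if acc > best_acc:
            best_size, best_acc = n, acc
    return best_size, best_acc

def make_toy(n_per_class, rng):
    """Toy 2-D Gaussian clusters standing in for labeled fault data."""
    centers = {"fault_a": (0.0, 0.0), "fault_b": (2.0, 2.0)}
    return [
        ((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label)
        for label, (cx, cy) in centers.items()
        for _ in range(n_per_class)
    ]

rng = random.Random(42)
real = make_toy(10, rng)          # small "real" training set
validation = make_toy(50, rng)    # held-out set used only for selection
baseline = accuracy(real, validation)
best_size, best_acc = select_synthetic_size(real, validation, [10, 20, 40])
print(f"baseline={baseline:.2f} best_size={best_size} best_acc={best_acc:.2f}")
```

In practice the final comparison would be made on an independent benchmark, as the study does with the IEC TC 10 dataset, rather than on the same validation set used to choose the synthetic-set size.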
About the journal:
Topics concerned with dielectric phenomena and measurements; with the development and characterization of gaseous, vacuum, liquid, and solid electrical insulating materials and systems; and with the utilization of these materials in circuits and systems under conditions of use.