Lu Jiangang, Zhao Ruifeng, Yu Zhiwen, Dai Yue, Shu Jiawei, Yang Ting
{"title":"Text classification for distribution substation inspection based on BERT-TextRCNN model","authors":"Lu Jiangang, Zhao Ruifeng, Yu Zhiwen, Dai Yue, Shu Jiawei, Yang Ting","doi":"10.3389/fenrg.2024.1411654","DOIUrl":null,"url":null,"abstract":"With the advancement of source-load interaction in the new power systems, data-driven approaches have provided a foundational support for aggregating and interacting between sources and loads. However, with the widespread integration of distributed energy resources, fine-grained perception of intelligent sensing devices, and the inherent stochasticity of source-load dynamics, a massive amount of raw data is being recorded and accumulated in the data center. Valuable information is often dispersed across different paragraphs of the raw data, making it challenging to extract effectively. Distribution substation inspection plays a crucial role in ensuring the safe operation of the power system. Traditional methods for inspection report text classification typically rely on manual judgment and accumulated experience, resulting in low efficiency and a significant misjudgment rate. Therefore, this paper proposes a text classification method for inspection reports based on the pre-trained BERT-TextRCNN model. By utilizing the dense connection between the BERT embedding layer and the neural network, the proposed method improves the accuracy of matching long texts. This article collected 2,831 maintenance data for the first quarter of 2023 from the distribution room, including approximately 58 environmental testing data, 738 environmental box testing data, approximately 672 distribution room testing data, and approximately 1,363 box type substation testing data. A text corpus was constructed for experiments. Experimental results demonstrate that the proposed model automatically classifies a large volume of manually recorded inspection report data based on time, location, and faults, achieving a classification accuracy of 94.7%, precision of 92%, recall of 92%, and F1 score of 90.3%.","PeriodicalId":12428,"journal":{"name":"Frontiers in Energy Research","volume":null,"pages":null},"PeriodicalIF":2.6000,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Energy Research","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.3389/fenrg.2024.1411654","RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENERGY & FUELS","Score":null,"Total":0}
引用次数: 0
Abstract
With the advancement of source-load interaction in the new power systems, data-driven approaches have provided a foundational support for aggregating and interacting between sources and loads. However, with the widespread integration of distributed energy resources, fine-grained perception of intelligent sensing devices, and the inherent stochasticity of source-load dynamics, a massive amount of raw data is being recorded and accumulated in the data center. Valuable information is often dispersed across different paragraphs of the raw data, making it challenging to extract effectively. Distribution substation inspection plays a crucial role in ensuring the safe operation of the power system. Traditional methods for inspection report text classification typically rely on manual judgment and accumulated experience, resulting in low efficiency and a significant misjudgment rate. Therefore, this paper proposes a text classification method for inspection reports based on the pre-trained BERT-TextRCNN model. By utilizing the dense connection between the BERT embedding layer and the neural network, the proposed method improves the accuracy of matching long texts. This article collected 2,831 maintenance data for the first quarter of 2023 from the distribution room, including approximately 58 environmental testing data, 738 environmental box testing data, approximately 672 distribution room testing data, and approximately 1,363 box type substation testing data. A text corpus was constructed for experiments. Experimental results demonstrate that the proposed model automatically classifies a large volume of manually recorded inspection report data based on time, location, and faults, achieving a classification accuracy of 94.7%, precision of 92%, recall of 92%, and F1 score of 90.3%.
期刊介绍:
Frontiers in Energy Research makes use of the unique Frontiers platform for open-access publishing and research networking for scientists, which provides an equal opportunity to seek, share and create knowledge. The mission of Frontiers is to place publishing back in the hands of working scientists and to promote an interactive, fair, and efficient review process. Articles are peer-reviewed according to the Frontiers review guidelines, which evaluate manuscripts on objective editorial criteria