Text classification for distribution substation inspection based on BERT-TextRCNN model

Frontiers in Energy Research (IF 2.6; Zone 4, Engineering Technology; JCR Q3, ENERGY & FUELS) Pub Date: 2024-07-31 DOI: 10.3389/fenrg.2024.1411654
Lu Jiangang, Zhao Ruifeng, Yu Zhiwen, Dai Yue, Shu Jiawei, Yang Ting
Citations: 0

Abstract

As source-load interaction advances in new power systems, data-driven approaches have become a foundation for aggregating sources and loads and coordinating their interaction. However, the widespread integration of distributed energy resources, the fine-grained sensing of intelligent devices, and the inherent stochasticity of source-load dynamics mean that massive amounts of raw data are recorded and accumulated in the data center. Valuable information is often scattered across different paragraphs of the raw data, which makes it hard to extract. Distribution substation inspection is crucial to the safe operation of the power system. Traditional classification of inspection report text relies on manual judgment and accumulated experience, which is inefficient and prone to misjudgment. This paper therefore proposes a text classification method for inspection reports based on the pre-trained BERT-TextRCNN model. By using a dense connection between the BERT embedding layer and the neural network, the method improves matching accuracy on long texts. The authors collected 2,831 maintenance records from the distribution room for the first quarter of 2023, comprising approximately 58 environmental testing records, 738 environmental box testing records, approximately 672 distribution room testing records, and approximately 1,363 box-type substation testing records, and built a text corpus from them for the experiments. Experimental results show that the model automatically classifies large volumes of manually recorded inspection report data by time, location, and fault, achieving a classification accuracy of 94.7%, a precision of 92%, a recall of 92%, and an F1 score of 90.3%.
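The figures in the abstract can be made concrete with a short sketch. The class counts below are the ones the abstract reports; they sum to 2,831, and the smallest class is only about 2% of the corpus. The abstract does not say how precision, recall, and F1 were averaged across classes, so macro averaging here is an assumption, and the confusion matrix is a made-up toy, not the paper's result. A reported F1 (90.3%) below both precision and recall (92%) would be consistent with per-class averaging, though the abstract alone cannot confirm it.

```python
from typing import Dict, List

# Record counts per inspection category, as reported in the abstract.
CLASS_COUNTS: Dict[str, int] = {
    "environmental testing": 58,
    "environmental box testing": 738,
    "distribution room testing": 672,
    "box-type substation testing": 1363,
}


def class_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each class's share of the corpus as a fraction of the total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def macro_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] is the number of records whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(k))  # column sum
        actual = sum(confusion[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,  # mean of per-class F1 (macro F1)
    }


# HYPOTHETICAL 3-class confusion matrix for illustration only
# (rows = true class, columns = predicted class). Not from the paper.
TOY_CONFUSION: List[List[int]] = [
    [5, 5, 0],    # small class: half of its records are misclassified
    [0, 40, 0],
    [0, 0, 50],
]

if __name__ == "__main__":
    for name, share in class_shares(CLASS_COUNTS).items():
        print(f"{name}: {share:.1%}")
    m = macro_metrics(TOY_CONFUSION)
    harmonic = 2 * m["precision"] * m["recall"] / (m["precision"] + m["recall"])
    print({key: round(value, 3) for key, value in m.items()})
    print("harmonic mean of macro P and macro R:", round(harmonic, 3))
```

In the toy matrix, macro F1 comes out below the harmonic mean of macro precision and macro recall, because the small class with low recall pulls its own F1 down. With a class as small as the 58-record one here, per-class scores are worth reporting alongside overall accuracy.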