{"title":"Feature selection algorithm for substation main equipment defect text mining based on natural language processing","authors":"Xiaoqing Mai, Tianhu Zhang, Changwu Hu, Yan Zhang","doi":"10.1049/cps2.12079","DOIUrl":null,"url":null,"abstract":"<p>The dimension of relevant text feature space and feature weight of substation main equipment defect information is high, so it is difficult to accurately select mining features. The Natural Language Processing (NLP) medium and short-term neural network model is used to realise the defect information text feature word segmentation in the log. After extracting the text features of defect information of main substation equipment with high categories to form the feature space; the TF-IDF algorithm is designed to calculate the importance weight of text keywords, judge the criticality of defect information text feature vocabulary, accurately locate defect information text features, and realise defect information text feature mining. Experiments show that the algorithm has high precision for specific word segmentation of massive substation main equipment log information.</p>","PeriodicalId":36881,"journal":{"name":"IET Cyber-Physical Systems: Theory and Applications","volume":"9 3","pages":"238-246"},"PeriodicalIF":1.7000,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cps2.12079","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Cyber-Physical Systems: Theory and Applications","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cps2.12079","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
The dimension of relevant text feature space and feature weight of substation main equipment defect information is high, so it is difficult to accurately select mining features. The Natural Language Processing (NLP) medium and short-term neural network model is used to realise the defect information text feature word segmentation in the log. After extracting the text features of defect information of main substation equipment with high categories to form the feature space; the TF-IDF algorithm is designed to calculate the importance weight of text keywords, judge the criticality of defect information text feature vocabulary, accurately locate defect information text features, and realise defect information text feature mining. Experiments show that the algorithm has high precision for specific word segmentation of massive substation main equipment log information.