{"title":"基于自然语言处理的变电站主设备缺陷文本挖掘特征选择算法","authors":"Xiaoqing Mai, Tianhu Zhang, Changwu Hu, Yan Zhang","doi":"10.1049/cps2.12079","DOIUrl":null,"url":null,"abstract":"<p>The dimension of relevant text feature space and feature weight of substation main equipment defect information is high, so it is difficult to accurately select mining features. The Natural Language Processing (NLP) medium and short-term neural network model is used to realise the defect information text feature word segmentation in the log. After extracting the text features of defect information of main substation equipment with high categories to form the feature space; the TF-IDF algorithm is designed to calculate the importance weight of text keywords, judge the criticality of defect information text feature vocabulary, accurately locate defect information text features, and realise defect information text feature mining. Experiments show that the algorithm has high precision for specific word segmentation of massive substation main equipment log information.</p>","PeriodicalId":36881,"journal":{"name":"IET Cyber-Physical Systems: Theory and Applications","volume":null,"pages":null},"PeriodicalIF":1.7000,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cps2.12079","citationCount":"0","resultStr":"{\"title\":\"Feature selection algorithm for substation main equipment defect text mining based on natural language processing\",\"authors\":\"Xiaoqing Mai, Tianhu Zhang, Changwu Hu, Yan Zhang\",\"doi\":\"10.1049/cps2.12079\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>The dimension of relevant text feature space and feature weight of substation main equipment defect information is high, so it is difficult to accurately select mining features. The Natural Language Processing (NLP) medium and short-term neural network model is used to realise the defect information text feature word segmentation in the log. After extracting the text features of defect information of main substation equipment with high categories to form the feature space; the TF-IDF algorithm is designed to calculate the importance weight of text keywords, judge the criticality of defect information text feature vocabulary, accurately locate defect information text features, and realise defect information text feature mining. Experiments show that the algorithm has high precision for specific word segmentation of massive substation main equipment log information.</p>\",\"PeriodicalId\":36881,\"journal\":{\"name\":\"IET Cyber-Physical Systems: Theory and Applications\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2023-09-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cps2.12079\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IET Cyber-Physical Systems: Theory and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1049/cps2.12079\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Cyber-Physical Systems: Theory and Applications","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cps2.12079","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Feature selection algorithm for substation main equipment defect text mining based on natural language processing
The dimension of relevant text feature space and feature weight of substation main equipment defect information is high, so it is difficult to accurately select mining features. The Natural Language Processing (NLP) medium and short-term neural network model is used to realise the defect information text feature word segmentation in the log. After extracting the text features of defect information of main substation equipment with high categories to form the feature space; the TF-IDF algorithm is designed to calculate the importance weight of text keywords, judge the criticality of defect information text feature vocabulary, accurately locate defect information text features, and realise defect information text feature mining. Experiments show that the algorithm has high precision for specific word segmentation of massive substation main equipment log information.