A Fault Analysis Method Based on Text Clustering
Guodong Li, Qiuyi Zhang, Rongrong Zheng, Chenhui Wang
2020 5th International Conference on Computer and Communication Systems (ICCCS), 2020-05-01. DOI: 10.1109/ICCCS49078.2020.9118528
The many typical fault cases accumulated in the informatization work of the State Grid Corporation of China consist mostly of descriptive text, which is difficult to interpret and analyze automatically. To address this problem, text mining is used to extract fault problems and their causes from the fault cases and to form the causal relationships between faults, providing the basis for the next step of fault text mining. The system uses text clustering to support fault location and related research. First, the fault information and the processing scheme are segmented into words using the Jieba Chinese word segmentation tool. Second, the segmentation results are cleaned and a corpus is built. Third, so that the computer can calculate similarity, the corpus is transformed into a frequency matrix. Then, instead of applying the traditional k-means clustering algorithm directly, we use the calinski_harabaz score to evaluate the best value of K. Finally, we apply the model in actual production to build a mapping table between fault information and solutions.
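The steps in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the records have already been segmented and cleaned (the paper uses Jieba for this; the English tokens in `DOCS` are invented stand-ins), and it uses only the Python standard library. The function names, the farthest-point seeding, and the candidate range for K are all assumptions made for illustration. The score is the standard Calinski-Harabasz index, the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom.

```python
from collections import Counter

# Hypothetical fault records, already split into tokens. Real input would be
# Chinese text segmented by a tool such as Jieba and then cleaned.
DOCS = [
    ["network", "timeout", "switch"],
    ["network", "timeout", "router"],
    ["network", "timeout", "switch", "router"],
    ["disk", "full", "database"],
    ["disk", "full", "log"],
    ["disk", "full", "database", "log"],
]


def frequency_matrix(docs):
    """Turn token lists into term-count rows over a shared, sorted vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    return [[float(Counter(doc)[t]) for t in vocab] for doc in docs]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(rows, k, iters=50):
    """Lloyd iterations with deterministic farthest-point seeding (an assumption)."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = mean(members)
    return labels


def calinski_harabasz(rows, labels):
    """Between-cluster over within-cluster dispersion, scaled by degrees of freedom."""
    n, clusters = len(rows), sorted(set(labels))
    k = len(clusters)
    if not 2 <= k < n:
        raise ValueError("the index needs 2 <= number of clusters < number of rows")
    overall = mean(rows)
    between = within = 0.0
    for j in clusters:
        members = [r for r, lab in zip(rows, labels) if lab == j]
        center = mean(members)
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(r, center) for r in members)
    if within == 0.0:
        return float("inf")  # degenerate case: every row sits on its centre
    return (between / (k - 1)) / (within / (n - k))


def choose_k(rows, k_values):
    """Cluster once per candidate K and keep the K with the highest score."""
    scores = {k: calinski_harabasz(rows, kmeans(rows, k)) for k in k_values}
    return max(scores, key=scores.get), scores


rows = frequency_matrix(DOCS)
best_k, scores = choose_k(rows, [2, 3, 4])
print("scores:", scores)
print("best K:", best_k)
```

In practice a library would likely replace the hand-written parts; scikit-learn, for instance, exposes this index as `calinski_harabasz_score` (older releases spelled it `calinski_harabaz_score`, the name used in the abstract). Once K is chosen, the cluster label of each record could serve as the key that links fault descriptions to their recorded solutions.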