Shaohua Sun, Zemei Dai, Xinkui Xi, Xin Shan, Bo Wang
2018 IEEE International Conference of Safety Produce Informatization (IICSPI), pp. 617-621. Published 2018-12-01. DOI: 10.1109/IICSPI.2018.8690379 (https://doi.org/10.1109/IICSPI.2018.8690379)
Power Fault Preplan Text Information Extraction Based on NLP
Power grid enterprises hold a large volume of text recorded in Chinese, and these texts contain abundant information about the power system. Mining this information manually is inefficient, and its accuracy varies from one dispatcher to another. This paper takes the power fault countermeasure text as its object and studies a method for extracting information from Chinese power texts. First, the texts are segmented using natural language processing (NLP), and an ontology lexicon is built according to the attributes of the power-domain words in the fault countermeasure texts. Second, drawing on the syntactic role of punctuation, the concept of a separate parsing phrase is introduced to guide the division of long texts, so that each resulting phrase contains only one power entity and its related information. Third, a syntax rule template applicable to separate parsing phrases is built from meta-character templates (generalization slot, fixed word combination, wildcard character, and registry function); it is used to extract information from power fault preplan texts and to output that information in structured form. Finally, the generalization ability and universality of the template are analyzed. Examples show that the rule template applies to the information extraction of most texts, with strong universality and high accuracy.
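The pipeline described in the abstract (punctuation-based splitting into single-entity phrases, then template matching with slots, fixed words, wildcards, and registered functions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lexicon entries, the template names, the English toy sentence standing in for Chinese preplan text, and the reading of "registry function" as a named post-processing function are all assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon (word -> attribute). A real ontology lexicon
# would be built from the preplan corpus; these entries are invented.
LEXICON = {
    "Line A": "DEVICE",
    "Plant B": "DEVICE",
    "Breaker C": "DEVICE",
    "trips": "EVENT",
    "overloads": "EVENT",
    "close": "ACTION",
    "open": "ACTION",
}

# Registry of named post-processing functions (an assumed interpretation
# of the "registry function" meta-character).
REGISTRY = {"to_float": float}


def slot(attr, name):
    """Generalization slot: named group matching any lexicon word with `attr`."""
    words = sorted((w for w, a in LEXICON.items() if a == attr),
                   key=len, reverse=True)
    return "(?P<%s>%s)" % (name, "|".join(re.escape(w) for w in words))


def fixed(text):
    """Fixed word combination: matches the literal text only."""
    return re.escape(text)


WILDCARD = r".*?"  # wildcard: skips filler text, consuming as little as possible

# Each template: (name, compiled pattern, {field: registry function name}).
TEMPLATES = [
    ("fault",
     re.compile(slot("DEVICE", "device") + r"\s+" + slot("EVENT", "event")),
     {}),
    ("adjust",
     re.compile(fixed("reduce") + WILDCARD + slot("DEVICE", "device")
                + WILDCARD + fixed("to") + r"\s+(?P<value>\d+)\s*MW"),
     {"value": "to_float"}),
    ("switch",
     re.compile(slot("ACTION", "action") + r"\s+" + slot("DEVICE", "device")),
     {}),
]


def split_phrases(text):
    """Split a long sentence on clause-level punctuation into short phrases."""
    return [p.strip() for p in re.split(r"[,;.]", text) if p.strip()]


def extract(text):
    """Return one structured record per phrase that matches a template."""
    records = []
    for phrase in split_phrases(text):
        for name, pattern, post in TEMPLATES:
            match = pattern.search(phrase)
            if match is None:
                continue
            record = {"template": name, **match.groupdict()}
            for field, func_name in post.items():
                record[field] = REGISTRY[func_name](record[field])
            records.append(record)
            break  # the first matching template wins
    return records
```

Calling `extract("If Line A trips, reduce the output of Plant B to 300 MW; close Breaker C.")` splits the sentence into three phrases and yields one record per phrase, each holding a single device and its related information. Compiling templates to regular expressions keeps the sketch short; a production system for Chinese text would need a word segmenter and a much larger lexicon in front of this step.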