ES-based full text retrieval and analysis of railway accident fault tracking report
Yang Lian-bao, Li Ping, Ma Xiao-ning, Li Xin-qin, Xue Rui, Wang Zhe
2017 7th IEEE International Symposium on Microwave, Antenna, Propagation, and EMC Technologies (MAPE), 2017-10-01
DOI: 10.1109/MAPE.2017.8250908
Citations: 5
Abstract
To address the difficulty of retrieving and analyzing large volumes of unstructured railway accident fault text, this paper proposes a retrieval scheme based on Elasticsearch (ES), a distributed full-text search engine. The scheme applies Chinese word segmentation that incorporates a railway domain dictionary, uses the mainstream inverted index technique to index the segmented text quickly, and applies the mature TF-IDF algorithm for text search. Exploiting the structural characteristics of railway accident fault tracking reports, a text feature extraction method based on text format and regular expressions extracts fields such as the accident name and accident location. Finally, experiments and analysis on the accident tracking reports of a railway company under a railway bureau, covering July to December 2016, verify that ES full-text retrieval achieves near-real-time performance. Using the extracted text features, this paper presents the key accident faults as word clouds, providing guidance for on-site work and laying a foundation for full-text retrieval and analysis in the railway industry.
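The abstract does not say which segmenter or dictionary format the scheme uses. As a minimal sketch of why a domain dictionary matters, the Python below uses forward maximum matching, a classic dictionary-based segmentation technique swapped in here for illustration; `RAILWAY_DICT` and its entries are hypothetical, not the paper's dictionary. Because the longest dictionary match wins, a domain compound such as 接触网故障 (catenary fault) stays a single token instead of being split into its parts.

```python
# Hypothetical railway-domain dictionary (illustrative only, not from the paper).
RAILWAY_DICT = {"接触网", "接触网故障", "信号机", "道岔", "故障", "导致", "列车", "晚点"}


def fmm_segment(text, dictionary, max_len=None):
    """Forward maximum matching: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found;
        # size == 1 always succeeds, so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # "A catenary fault caused a train delay."
    print(fmm_segment("接触网故障导致列车晚点", RAILWAY_DICT))
    # → ['接触网故障', '导致', '列车', '晚点']
```

Without the compound entry, the same text would be cut into 接触网 and 故障, so queries for the domain term would match less precisely. In Elasticsearch itself, segmentation is performed by the analyzer configured on the index, so this sketch only illustrates the idea.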
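An inverted index maps each term to the documents that contain it, so a query only touches the posting lists of its own terms instead of scanning every report. The sketch below assumes the reports have already been segmented into token lists; each posting keeps a per-report term count (which a TF-IDF scorer can reuse), and AND queries are answered by intersecting postings. The report IDs are hypothetical, and Elasticsearch builds and stores its own Lucene index internally, so this is only a model of the data structure.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}.

    `docs` is {doc_id: list of tokens} produced by a segmenter."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index


def docs_containing_all(index, terms):
    """Boolean AND query: intersect the posting lists of all query terms."""
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings) if postings else set()


if __name__ == "__main__":
    # Hypothetical segmented reports.
    reports = {
        "r1": ["接触网", "故障", "列车", "晚点"],
        "r2": ["道岔", "故障"],
        "r3": ["列车", "晚点", "晚点"],
    }
    idx = build_inverted_index(reports)
    print(sorted(docs_containing_all(idx, ["列车", "晚点"])))
```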
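The abstract names TF-IDF as the ranking algorithm. A minimal textbook version is sketched below, assuming tf is the raw count of a query term in a report and idf = log(N / df), where N is the number of reports and df the number of reports containing the term. Elasticsearch's own scoring adds normalization factors and differs in detail, so these scores will not match its output; the report IDs are hypothetical.

```python
import math
from collections import Counter


def tfidf_rank(docs, query_terms):
    """Rank documents by the summed TF-IDF weight of the query terms.

    `docs` is {doc_id: list of tokens}. A term that occurs in every
    document gets idf = log(1) = 0 and therefore does not affect ranking."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term at most once per document
    scores = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if df[term]:
                score += counts[term] * math.log(n_docs / df[term])
        if score > 0:
            scores[doc_id] = score
    # Highest score first; documents with no positive score are omitted.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The design choice worth noting is that idf down-weights terms common to many reports (for example a generic word for "fault"), so rarer, more specific terms dominate the ranking.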
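The extraction step relies on the report's fixed layout combined with regular expressions. The paper's actual templates are not given, so the field labels and sample text below are hypothetical: the sketch assumes each field sits on its own line as `label：value` (full-width or ASCII colon), with 事故名称 as the accident-name label, 事故地点 as the accident-location label and 发生时间 as an occurrence-date label in `YYYY-MM-DD` form.

```python
import re

# Hypothetical field labels; real tracking reports may use other headings.
FIELD_PATTERNS = {
    "accident_name": re.compile(r"事故名称[:：]\s*(.+)"),      # accident name
    "accident_location": re.compile(r"事故地点[:：]\s*(.+)"),  # accident location
    "occurrence_date": re.compile(r"发生时间[:：]\s*(\d{4}-\d{2}-\d{2})"),
}


def extract_fields(report_text):
    """Return the first match of each labelled field, or None if absent.

    `.+` does not cross a newline, so each value ends at the end of its line."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(report_text)
        fields[name] = match.group(1).strip() if match else None
    return fields


if __name__ == "__main__":
    # Hypothetical report excerpt ("某站3道" = "track 3 of a certain station").
    sample = "事故名称：接触网故障\n事故地点：某站3道\n发生时间：2016-08-15\n"
    print(extract_fields(sample))
```

Returning `None` for a missing field, rather than raising, lets the extracted records be aggregated even when some reports omit a heading.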
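A word cloud is driven by term frequencies. The minimal sketch below assumes the reports are already segmented and that the caller supplies a stopword list (the one in the example is hypothetical); it counts terms across all reports and returns the most frequent ones. Dropping single-character tokens is an extra heuristic added here, not something the abstract describes.

```python
from collections import Counter


def top_terms(token_lists, stopwords, k=10):
    """Count term frequencies across segmented reports for a word cloud,
    skipping stopwords and single-character tokens."""
    counts = Counter(
        tok
        for tokens in token_lists
        for tok in tokens
        if tok not in stopwords and len(tok) > 1
    )
    return counts.most_common(k)


if __name__ == "__main__":
    reports = [["接触网", "故障", "的"], ["道岔", "故障"], ["故障", "列车", "晚点"]]
    print(top_terms(reports, stopwords={"的"}, k=3))
```

The resulting (term, count) pairs can be handed to a renderer; one option is the third-party `wordcloud` package's `WordCloud.generate_from_frequencies`, which needs a CJK-capable font (via its `font_path` parameter) to draw Chinese terms.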