Technology Research of Tibetan Hot Topics Extraction
Guixian Xu, L. Qiu
2015 IEEE 29th International Conference on Advanced Information Networking and Applications Workshops, pp. 204-208
Published: 2015-03-24
DOI: 10.1109/WAINA.2015.17 (https://doi.org/10.1109/WAINA.2015.17)
Abstract
With the rapid growth of Tibetan-language information, Tibetan text processing has become increasingly important, and hot topic extraction has become one of the tools for analyzing Tibetan information. This paper describes a method for extracting hot topics from Tibetan text. First, the construction of the dataset is described. Second, Tibetan word segmentation is presented. Third, feature selection and text representation are carried out, with classical TF-IDF used to calculate the feature weights. Finally, a statistics-based method is used to extract the hot topics. The experiment shows that the method extracts topics effectively and that the results reflect the characteristics of each hot topic category. The method is useful for text classification, information retrieval, and the construction of high-quality corpora.
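The TF-IDF weighting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant or which statistical ranking the authors use, so everything here is an assumption: the function names `tfidf_weights` and `top_terms` are hypothetical, term frequency is normalized by document length, the inverse document frequency is the unsmoothed `log(N / df)`, and candidate topic terms are ranked by summing their weights over all documents. The input is assumed to be already word-segmented, and placeholder Latin tokens stand in for Tibetan words.

```python
import math
from collections import Counter, defaultdict


def tfidf_weights(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (text that is already word-segmented).
    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(docs, k=5):
    """Rank terms by their summed TF-IDF weight over all documents.

    Summation is an illustrative stand-in for the paper's unspecified
    statistics-based extraction step.
    """
    totals = defaultdict(float)
    for doc_weights in tfidf_weights(docs):
        for term, weight in doc_weights.items():
            totals[term] += weight
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    # Placeholder tokens; real input would be segmented Tibetan text.
    sample = [["a", "b", "a"], ["a", "c"], ["a", "d", "d"]]
    for term, score in top_terms(sample, k=3):
        print(term, round(score, 3))
```

With this unsmoothed variant, a term that occurs in every document receives a weight of zero, so very common words drop out of the ranking without a separate stop-word list.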