Extracting topic keywords from Sina Weibo text sets
S. Xu, Juncai Guo, Xue Chen
2016 International Conference on Audio, Language and Image Processing (ICALIP), 2016-07-01
DOI: 10.1109/ICALIP.2016.7846663 (https://doi.org/10.1109/ICALIP.2016.7846663)
Citations: 3
Abstract
Sina Weibo is one of the most popular microblogging websites in China. It has more than 500 million registered users who produce over 100 million posts per day, and its market penetration is similar to that of Twitter. Mining useful information from a large volume of fragmented short texts is a fundamental but very challenging research task. This paper proposes a method, LET (LDA & Entropy & TextRank), to extract topic keywords from Sina Weibo topic text sets. LET combines the merits of LDA, entropy, and TextRank, and considers both the topic influence and the topic discrimination of keywords. In addition, we design a new standard evaluation method, KESS (topic KEywords Standard Sequence). Based on KESS, we compute offset loss scores for four different keyword extraction methods. Extensive simulations show that LET is a comparatively efficient and effective method for obtaining topic words from hot topics on Sina Weibo.
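The abstract does not give LET's formulas, so the following is only an illustrative sketch of what an entropy-based "topic discrimination" signal could look like, not the authors' method. It assumes each topic is represented as a flat list of tokens and scores a word by the Shannon entropy of its occurrence distribution across topics: a word concentrated in one topic gets low entropy (more discriminative), while a word spread evenly across topics gets high entropy. The function name `topic_entropy` and the toy data are hypothetical.

```python
import math
from collections import Counter


def topic_entropy(word, topic_tokens):
    """Shannon entropy (in bits) of a word's distribution across topics.

    topic_tokens maps a topic name to a list of tokens (an assumed,
    simplified representation of a topic's text set).
    Returns None if the word never occurs in any topic.
    """
    counts = [Counter(tokens)[word] for tokens in topic_tokens.values()]
    total = sum(counts)
    if total == 0:
        return None
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical toy data: two topics with a few tokens each.
demo_topics = {
    "sports": ["match", "goal", "today"],
    "finance": ["stock", "today", "market"],
}

for w in ("goal", "today"):
    print(w, topic_entropy(w, demo_topics))
```

Here "goal" appears in only one topic, so its entropy is zero, whereas "today" is split evenly between the two topics and gets an entropy of one bit. How LET weights such a signal against LDA and TextRank scores is not stated in the abstract.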