Topic Analysis of Microblog About “Didi Taxi” Based on K-means Algorithm

Yonghe Lu, Xin Xiong
DOI: 10.11648/J.AJIST.20190303.13 (https://doi.org/10.11648/J.AJIST.20190303.13)
Journal: Journal of the American Society for Information Science and Technology
Published: 2019-09-02 (Journal Article)
Citations: 2

Abstract

In the age of information and digitization, most users publish and obtain real-time information through microblogs on social networks. With effective methods, the valuable information hidden in the massive volume of short social-network texts can be discovered, organized, and used. Hot topics on microblogs can then be identified, which supports public opinion monitoring and marketing. Didi Taxi has become a common travel choice for many users. This paper applies the K-means clustering algorithm to topic analysis of Sina microblog short texts about Didi Taxi. We crawled 17,226 microblog search results related to Didi Taxi from April 2019 to June 2019. After data cleaning and preprocessing, 15,054 texts remained, which we represented with TF-IDF. Based on the silhouette coefficient, we set the text vector dimension to 300 and the number of K-means clusters to 34. We then extracted 8 topic clusters from the 34 clusters, covering the advantages and disadvantages of Didi Taxi and its development status. Finally, we discussed the results through manual inspection from a semantic perspective. This topic analysis shows the public's attitude toward Didi Taxi and can provide a basis for future management decisions by the government or the company.
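
The pipeline summarized above (TF-IDF representation, K-means clustering, and selection of the cluster count by silhouette coefficient) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The toy English documents, whitespace tokenization, the smoothed IDF variant, the rule of keeping the `max_features` terms with the highest document frequency, and the helper names `tfidf`, `kmeans`, `silhouette`, and `pick_k` are all assumptions made for illustration; the paper works on Chinese microblog text and does not state these details in the abstract.

```python
import math
import random
from collections import Counter

def tfidf(docs, max_features=300):
    """L2-normalised TF-IDF vectors over at most max_features terms (assumed rule)."""
    tokenised = [doc.lower().split() for doc in docs]  # toy whitespace tokenizer
    df = Counter(term for tokens in tokenised for term in set(tokens))
    # Keep the terms with the highest document frequency (ties broken alphabetically).
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:max_features]
    vocab = [term for term, _ in ranked]
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}  # smoothed IDF
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, seed=0, max_iter=100):
    """Plain Lloyd's algorithm; returns one cluster label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(vectors, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; treated as 0 here
    total = 0.0
    for i, l in enumerate(labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(dist(vectors[i], vectors[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist(vectors[i], vectors[j]) for j in idx) / len(idx)
                for other, idx in clusters.items() if other != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(vectors)

def pick_k(vectors, candidates):
    """Return the candidate k with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette(vectors, kmeans(vectors, k)) for k in candidates}
    return max(scores, key=scores.get), scores

# Hypothetical toy corpus standing in for the cleaned microblog texts.
docs = [
    "fare price coupon discount cheap",
    "coupon discount price fare",
    "cheap fare discount",
    "driver safety complaint night",
    "safety complaint driver",
    "night driver safety",
]
vectors = tfidf(docs, max_features=300)
best_k, scores = pick_k(vectors, range(2, 5))
print(best_k, scores)
```

On a real corpus the candidate range would be much wider (the paper reports settling on 34 clusters), and a library implementation of TF-IDF, K-means, and the silhouette score would be faster than this quadratic-time sketch.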