Text embedding techniques for efficient clustering of twitter data.

Evolutionary Intelligence (IF 2.3, Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2023-02-07 | DOI: 10.1007/s12065-023-00825-3
Jayasree Ravi, Sushil Kulkarni
Pages: 1-11 | Publication type: Journal Article | Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9904526/pdf/
Citation count: 0

Abstract

The World Wide Web abounds with many types of information, such as blogs, social media posts, and news articles. Given the magnitude of this online content, there is a need to understand it deeply in order to use the information in practical applications such as event detection, polarity analysis, and sentiment analysis. Natural Language Processing (NLP) is the study of such information and is used for text classification, sentiment analysis, and clustering of similar text. NLP makes use of linguistic knowledge and builds machine learning models to analyse textual information. It is used in various applications, such as classifying online reviews as positive or negative without anyone actually reading the reviews and feedback. For text analysis, there should be a way to quantify text based on word frequency, correlation with neighbouring words, contextual similarity of words, and so on. One such way is word embedding. This study applies various word embedding techniques to tweets from popular news channels and clusters the resulting vectors using the K-means algorithm. The study finds that Bidirectional Encoder Representations from Transformers (BERT) achieved the highest accuracy when used with K-means clustering.
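The embed-then-cluster pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a plain TF-IDF embedding as a dependency-free stand-in for the embeddings the study compares (such as BERT), the sample tweets are invented for illustration, and the helper names `tfidf_vectors`, `sq_dist`, and `kmeans` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Embed each document as an L2-normalised TF-IDF vector.

    Assumption: TF-IDF is only a simple stand-in for the embeddings
    compared in the study (e.g. BERT)."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    # Smoothed inverse document frequency.
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = [tf[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest centroid.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(far)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented sample tweets (not data from the study).
tweets = [
    "election results announced tonight",
    "election vote count results",
    "football match final score",
    "football team wins match",
]
labels = kmeans(tfidf_vectors(tweets), k=2)
print(labels)  # the two election tweets share one label, the two football tweets the other
```

Replacing `tfidf_vectors` with a function that returns sentence vectors from a pretrained model would leave the clustering step unchanged, which is the comparison the study sets up: the same K-means algorithm applied to different embeddings.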

Source journal
Evolutionary Intelligence (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
CiteScore: 6.80
Self-citation rate: 0.00%
Articles published: 108
About the journal: This journal provides an international forum for the timely publication and dissemination of foundational and applied research in the domain of Evolutionary Intelligence. The spectrum of emerging fields in contemporary artificial intelligence, including Big Data, Deep Learning, and Computational Neuroscience, bridged with evolutionary computing and other population-based search methods, constitutes the flag of the Evolutionary Intelligence journal. Topics of interest for Evolutionary Intelligence cover different aspects of evolutionary models of computation empowered with intelligence-based approaches, including but not limited to architectures, model optimization and tuning, machine learning algorithms, life-inspired adaptive algorithms, swarm-oriented strategies, high-performance computing, and massive data processing, with applications to domains such as computer vision, image processing, simulation, robotics, computational finance, media, the Internet of Things, medicine, bioinformatics, smart cities, and similar. Surveys outlining the state of the art in specific subfields and applications are welcome.
Latest articles from the journal
Geometric mean optimizer for achieving efficiency in truss structural design
Fuzzy logic applied to mutation size in evolutionary strategies
On the computation of Delaunay triangulations via genetic algorithms
Marine predators social group optimization: a hybrid approach
Bilevel optimization based on foraging by different ant species for real-time transportation planning