Text embedding techniques for efficient clustering of twitter data.

Journal: Evolutionary Intelligence. Impact factor: 2.6. JCR: Q3 (Computer Science, Artificial Intelligence). Pub date: 2023-02-07. DOI: 10.1007/s12065-023-00825-3

Jayasree Ravi, Sushil Kulkarni

Pages: 1-11. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9904526/pdf/

Citations: 0

Abstract

The World Wide Web abounds with many types of information, such as blogs, social media posts, and news articles. Given the magnitude of this online content, it must be analysed in depth before it can serve practical applications such as event detection, polarity analysis, and sentiment analysis. Natural Language Processing (NLP) is the study of such textual information and is used for text classification, sentiment analysis, and clustering of similar text. NLP draws on linguistic knowledge and builds machine learning models to analyse textual information. It is applied in tasks such as classifying online reviews as positive or negative without anyone having to read them. Text analysis requires a way to quantify text based on word frequency, correlation with neighbouring words, contextual similarity of words, and so on. One such way is word embedding. This study applies various word embedding techniques to tweets from popular news channels and clusters the resulting vectors using the K-means algorithm. The study finds that Bidirectional Encoder Representations from Transformers (BERT) achieves the highest accuracy rate when used with K-means clustering.
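The embed-then-cluster pipeline described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: TF-IDF vectors stand in for the embedding techniques the study compares (BERT among them), the six tweets and their topic labels are invented for illustration, the starting centroids are fixed by hand so the run is repeatable, and cluster purity is only an assumed proxy for the "accuracy rate", which the abstract does not define.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document to an L2-normalised TF-IDF vector (stand-in embedding)."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokens for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokens:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, max_iter=100):
    """Lloyd's algorithm; centroids start at the vectors named by init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every vector to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(labels, truth):
    """Share of items that belong to the majority true class of their cluster."""
    total = 0
    for c in set(labels):
        classes = Counter(t for lab, t in zip(labels, truth) if lab == c)
        total += classes.most_common(1)[0][1]
    return total / len(labels)


# Hypothetical toy data: three sports tweets followed by three finance tweets.
tweets = [
    "Home team wins the football match with a late goal",
    "Striker scores a goal as the football team wins again",
    "Football coach praises team after the match",
    "Stock market falls as investors sell bank shares",
    "Investors buy shares as the stock market rallies",
    "Bank shares lead the stock market higher",
]
truth = ["sports", "sports", "sports", "finance", "finance", "finance"]

vectors = tfidf_vectors(tweets)
labels = kmeans(vectors, init_indices=[0, 3])  # hand-picked seeds, one per topic
score = purity(labels, truth)
print(labels, score)
```

Swapping in a different embedding only means replacing `tfidf_vectors` with another function that returns one fixed-length vector per tweet; the clustering and scoring steps stay the same, which is what makes a like-for-like comparison of embeddings possible.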


Source journal: Evolutionary Intelligence (Computer Science, Artificial Intelligence)

CiteScore: 6.80

Self-citation rate: 0.00%

Articles published: 108

Journal description: This journal provides an international forum for the timely publication and dissemination of foundational and applied research in the domain of Evolutionary Intelligence. The spectrum of emerging fields in contemporary artificial intelligence, including Big Data, Deep Learning, and Computational Neuroscience, bridged with evolutionary computing and other population-based search methods, constitutes the focus of the Evolutionary Intelligence journal. Topics of interest for Evolutionary Intelligence cover different aspects of evolutionary models of computation empowered with intelligence-based approaches, including but not limited to architectures, model optimization and tuning, machine learning algorithms, life-inspired adaptive algorithms, swarm-oriented strategies, high-performance computing, and massive data processing, with applications to domains such as computer vision, image processing, simulation, robotics, computational finance, media, internet of things, medicine, bioinformatics, and smart cities. Surveys outlining the state of the art in specific subfields and applications are welcome.
