Topic Modeling Using LDA and BERT Techniques: Teknofest Example

Ercan Atagün, Bengisu Hartoka, A. Albayrak
Published in: 2021 6th International Conference on Computer Science and Engineering (UBMK)
Publication date: 2021-09-15
DOI: 10.1109/UBMK52708.2021.9558988 (https://doi.org/10.1109/UBMK52708.2021.9558988)
Citations: 7

Abstract

This paper presents a natural language processing (NLP) study of topic modeling, one of the subfields of NLP. The dataset, which concerns the Teknofest competitions, was collected from social media by data scraping, a method that has become very popular in recent years. The scraping was done with Selenium, one of the popular libraries for this purpose. To make the prepared dataset suitable for analysis and to ensure the consistency of the clustering process, the text was preprocessed before analysis. After preprocessing, the dataset was clustered with NLP techniques such as BERT and LDA.
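
The abstract does not list the preprocessing steps that were applied, so the following is only a minimal sketch of what such a step might look like for Turkish social media posts before they are passed to a topic model. The stop-word list, the regular expressions, and the function names `turkish_lower` and `preprocess` are illustrative assumptions, not the authors' method.

```python
import re

# Hypothetical stop-word list for illustration only; the paper does not give one.
STOP_WORDS = {"ve", "ile", "bir", "bu", "da", "de", "için"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # user mentions
NON_LETTER_RE = re.compile(r"[^\w\s]|\d|_")     # punctuation, digits, underscores


def turkish_lower(text):
    # Python's str.lower() maps "I" to "i" and "İ" to "i" plus a combining dot.
    # Turkish expects "I" -> "ı" and "İ" -> "i", so handle both capitals first.
    # Assumption: the input is Turkish; English words containing "I" would be altered.
    return text.replace("I", "ı").replace("İ", "i").lower()


def preprocess(text):
    """Return a list of cleaned tokens for one post."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = turkish_lower(text)
    # Hashtag words are kept; only the "#" sign is dropped here.
    text = NON_LETTER_RE.sub(" ", text)
    return [t for t in text.split() if t not in STOP_WORDS and len(t) > 1]


if __name__ == "__main__":
    post = "TEKNOFEST 2021 İstanbul'da! https://t.co/abc @user #roket ve İHA yarışması"
    print(preprocess(post))
    # ['teknofest', 'istanbul', 'roket', 'iha', 'yarışması']
```

Token lists of this kind could then be turned into a bag-of-words matrix for LDA, or joined back into strings and embedded with a BERT model before clustering. Which of these steps the authors actually used is not stated in the abstract.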