Deep Learning-Based Classification of News Texts Using Doc2Vec Model

Hasibe Busra Dogru, Sahra Tilki, Akhtar Jamil, Alaa Ali Hameed
Published in: 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA)
Publication date: 2021-04-06
DOI: 10.1109/CAIDA51941.2021.9425290 (https://doi.org/10.1109/CAIDA51941.2021.9425290)
Citations: 16

Abstract

The rapid growth in internet usage has led to the generation of text data in bulk. Because managing unstructured text manually is challenging, new techniques for automatic classification of textual content need to be investigated. The main objective of text classification is to train a model that places an unseen text into the correct category. In this study, text classification was performed using the Doc2Vec embedding method on two datasets: the Turkish Text Classification 3600 (TTC-3600) dataset, which consists of Turkish news texts, and the BBC-News dataset, which consists of English news texts. For classification, a deep learning-based CNN and the traditional machine learning classifiers Gaussian Naive Bayes (GNB), Random Forest (RF), Naive Bayes (NB), and Support Vector Machine (SVM) were used. The best results were obtained with the CNN: 94.17% on the Turkish dataset and 96.41% on the English dataset.
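The pipeline described above has two stages: each document is embedded as a fixed-length vector, and a classifier is then trained on those vectors. The following is a minimal sketch of the second stage only, using a from-scratch Gaussian Naive Bayes (one of the baselines named in the abstract). It is not the authors' implementation. The 2-D vectors, the category names, and the function names `fit_gnb` and `predict_gnb` are hypothetical stand-ins; real Doc2Vec vectors would have many more dimensions and would come from a trained embedding model.

```python
import math
from collections import defaultdict


def fit_gnb(vectors, labels, var_floor=1e-9):
    """Estimate a log prior, per-feature means, and per-feature variances per class."""
    by_class = defaultdict(list)
    for vec, label in zip(vectors, labels):
        by_class[label].append(vec)

    n_total = len(vectors)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # var_floor keeps the variance strictly positive for constant features.
        variances = [
            sum((r[d] - means[d]) ** 2 for r in rows) / n + var_floor
            for d in range(dims)
        ]
        model[label] = (math.log(n / n_total), means, variances)
    return model


def predict_gnb(model, vec):
    """Return the class label with the highest log posterior for one vector."""
    def log_posterior(params):
        log_prior, means, variances = params
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for x, mu, var in zip(vec, means, variances)
        )
        return log_prior + log_likelihood

    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical toy document vectors standing in for Doc2Vec output.
train_vectors = [
    [0.9, 0.1], [0.8, 0.2], [1.0, 0.0],
    [0.1, 0.9], [0.2, 0.8], [0.0, 1.0],
]
train_labels = ["sport", "sport", "sport", "politics", "politics", "politics"]

model = fit_gnb(train_vectors, train_labels)
prediction = predict_gnb(model, [0.85, 0.15])
```

Gaussian Naive Bayes suits dense, real-valued embeddings because it models each vector component as a per-class normal distribution, whereas count-based Naive Bayes variants expect non-negative features.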