Hasibe Busra Dogru, Sahra Tilki, Akhtar Jamil, Alaa Ali Hameed
{"title":"Deep Learning-Based Classification of News Texts Using Doc2Vec Model","authors":"Hasibe Busra Dogru, Sahra Tilki, Akhtar Jamil, Alaa Ali Hameed","doi":"10.1109/CAIDA51941.2021.9425290","DOIUrl":null,"url":null,"abstract":"The rapid increment in internet usage has also resulted in bulk gerenation of text data . Therefore, investigation of new techniques for automatic classification of textual content is needed as manually managing unstructured text is challenging. The main objective of text classification is to train a model such that it should place an unseen text into correct category. In this study, text classification was performed using the Doc2vec word embedding method on the Turkish Text Classification 3600 (TTC-3600) dataset consisting of Turkish news texts and the BBC-News dataset consisting of English news texts. As the classification method, deep learning-based CNN and traditional machine learning classification methods Gauss Naive Bayes (GNB), Random Forest (RF), Naive Bayes (NB) and Support Vector Machine (SVM) are used. In the proposed model, the highest result was obtained as 94.17% in the Turkish dataset and 96.41% in the English dataset in the classification made with CNN.","PeriodicalId":272573,"journal":{"name":"2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CAIDA51941.2021.9425290","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16
Abstract
The rapid increment in internet usage has also resulted in bulk gerenation of text data . Therefore, investigation of new techniques for automatic classification of textual content is needed as manually managing unstructured text is challenging. The main objective of text classification is to train a model such that it should place an unseen text into correct category. In this study, text classification was performed using the Doc2vec word embedding method on the Turkish Text Classification 3600 (TTC-3600) dataset consisting of Turkish news texts and the BBC-News dataset consisting of English news texts. As the classification method, deep learning-based CNN and traditional machine learning classification methods Gauss Naive Bayes (GNB), Random Forest (RF), Naive Bayes (NB) and Support Vector Machine (SVM) are used. In the proposed model, the highest result was obtained as 94.17% in the Turkish dataset and 96.41% in the English dataset in the classification made with CNN.