Supervised Topic Modeling Using Word Embedding with Machine Learning Techniques
Authors: Rana Nassif, Mohamed Waleed Fahkr
Venue: 2019 International Conference on Advances in the Emerging Computing Technologies (AECT)
Publication date: 2020-02-01
DOI: 10.1109/AECT47998.2020.9194177 (https://doi.org/10.1109/AECT47998.2020.9194177)
Citations: 2
Abstract
Large amounts of text are collected on the internet every day. As more text documents become available, categorizing them becomes essential for efficient archiving, retrieval, and search. In this paper, we investigate statistical and machine learning techniques, namely hidden Markov models (HMMs) and deep neural networks, combined with two well-known word embedding models, word2vec and GloVe, for supervised document classification. The investigated combinations are compared with state-of-the-art approaches applied to the same data. The main contribution of this paper is to demonstrate the importance of both the meaning and the order of words in topic modeling. Previous work has often overlooked this: some approaches take neither into consideration, while others take only one. We show that one of our proposed models, a hybrid of LSTM and CNN neural networks, obtains better accuracy on the same dataset than all state-of-the-art models in the literature.
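The claim that word order matters alongside word meaning can be illustrated with a toy sketch. The embeddings below are hypothetical values invented for illustration, and the small recurrence is only a stand-in for a sequence model such as an LSTM; it is not the architecture evaluated in the paper. The sketch shows that averaging word vectors cannot distinguish two documents containing the same words in a different order, while a recurrent encoding can.

```python
import math

# Hypothetical 2-dimensional embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "dog": (1.0, 0.0),
    "bites": (0.0, 1.0),
    "man": (-1.0, 0.5),
}


def mean_embedding(tokens):
    """Order-insensitive document vector: the average of the word vectors."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokens]
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def recurrent_encoding(tokens, decay=0.5):
    """Order-sensitive document vector from a minimal recurrence.

    h_t = tanh(decay * h_{t-1} + x_t). This is an assumed stand-in for a
    sequence model, not the LSTM-CNN hybrid described in the abstract.
    """
    state = (0.0, 0.0)
    for t in tokens:
        x = TOY_EMBEDDINGS[t]
        state = tuple(math.tanh(decay * h + xi) for h, xi in zip(state, x))
    return state


a = ["dog", "bites", "man"]
b = ["man", "bites", "dog"]

# The averaged vectors coincide, so word order is lost.
same_mean = mean_embedding(a) == mean_embedding(b)
# The recurrent encodings differ, so word order is retained.
same_recurrent = recurrent_encoding(a) == recurrent_encoding(b)
print(same_mean, same_recurrent)
```

A classifier built on the averaged vector would have to assign both documents the same label, which is one way to see why models that read the embeddings in sequence may have an advantage.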