Supervised Topic Modeling Using Word Embedding with Machine Learning Techniques

Rana Nassif, Mohamed Waleed Fahkr
Published in: 2019 International Conference on Advances in the Emerging Computing Technologies (AECT)
Publication date: 2020-02-01
DOI: 10.1109/AECT47998.2020.9194177 (https://doi.org/10.1109/AECT47998.2020.9194177)
Citations: 2

Abstract

Large amounts of text are collected on the internet every day. As more text documents become available, categorizing them becomes essential for efficient archiving, retrieval, and search. In this paper, we investigate both statistical and machine learning techniques, such as hidden Markov models (HMMs) and deep learning networks, combined with two well-known word embedding models (word2vec and GloVe) for supervised document classification. The investigated combinations are compared with state-of-the-art approaches applied to the same data. The main contribution of this paper is to demonstrate the importance of both word meaning and word order in topic modeling. Previous work has often overlooked this: some studies took neither into consideration, while others took only one. We show that one of our proposed models, a hybrid of LSTM and CNN neural networks, obtained better accuracy on the same dataset than all state-of-the-art models in the literature.
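The point about word order can be illustrated with a small sketch. It is not the authors' model: the two-dimensional embeddings and the filter weights below are invented for illustration (they are not word2vec or GloVe vectors), and the hand-written convolution only stands in for the kind of order-sensitive layer a CNN or LSTM provides. Averaging word vectors gives the same document vector for two sentences with the same words in a different order, whereas a sliding filter over the sequence responds differently.

```python
# Toy 2-dimensional "embeddings"; the values are invented for illustration only.
EMBEDDINGS = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [-1.0, 0.5],
}

# Hypothetical width-2 filter that responds most strongly to "dog" followed by "bites".
KERNEL = [[1.0, 0.0], [0.0, 1.0]]


def embed(tokens):
    """Look up one vector per token, preserving token order."""
    return [EMBEDDINGS[t] for t in tokens]


def mean_pool(vectors):
    """Order-insensitive document vector: the average of the word vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def conv1d_max(vectors, kernel):
    """Order-sensitive feature: slide the filter over the sequence, then max-pool."""
    k = len(kernel)
    responses = []
    for start in range(len(vectors) - k + 1):
        window = vectors[start:start + k]
        # Dot product between the filter and the concatenated window.
        responses.append(
            sum(w * x
                for kernel_vec, word_vec in zip(kernel, window)
                for w, x in zip(kernel_vec, word_vec))
        )
    return max(responses)


a = embed(["dog", "bites", "man"])
b = embed(["man", "bites", "dog"])

print(mean_pool(a) == mean_pool(b))                    # same bag of words
print(conv1d_max(a, KERNEL), conv1d_max(b, KERNEL))    # filter response differs
```

In this toy setting the averaged representation cannot tell the two sentences apart, while the filter response does. A model that combines embeddings (meaning) with convolutional or recurrent layers (order) has access to both kinds of information.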