Neural Networks Based on Latent Dirichlet Allocation For News Web Page Classifications

Adel R. Alharbi, Shwaa D. Alharbi, Amer Aljaedi, Oluwatobi Akanbi
Published in: 2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology (IICAIET)
Publication date: 2020-09-26
DOI: 10.1109/IICAIET49801.2020.9257842 (https://doi.org/10.1109/IICAIET49801.2020.9257842)
Citations: 2

Abstract

Popular news websites deliver information to millions of users every day. As computer technology continues to advance, the volume of data keeps growing. How to structure documents so that their content can be recognized dynamically has become one of the main challenges for sophisticated web services. Traditional manual classification of news text not only demands substantial human and financial resources but also struggles to classify quickly. In this work, we introduce a new method for Arabic document classification that combines Latent Dirichlet Allocation with neural networks. Text classification applications typically adopt the Vector Space Model, in which a text is represented as a vector of terms or n-grams. Such representations cannot capture semantic or contextual content, which results in a very large feature space and a loss of semantic information. The proposed method instead uses topics sampled by Latent Dirichlet Allocation as text features, which effectively eliminates the problems described. We extract the important themes (topics) from all the texts. Each topic is characterized by its own distribution over terms, and each text is then represented as a vector over these topics. Our experiments indicate that the proposed solution achieves an accuracy of 85.11% on the Arabic text classification task.
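The pipeline outlined in the abstract (term vectors, then topic vectors from Latent Dirichlet Allocation, then a neural network classifier) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the use of scikit-learn, the toy English corpus, the number of topics, and the network size are all assumptions, since the abstract does not state the corpus, preprocessing, or architecture used in the paper.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

# Hypothetical toy corpus standing in for labeled news pages.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the players after the match",
    "the central bank raised interest rates",
    "stock markets fell as inflation rose",
    "investors fear higher interest rates",
]
labels = ["sports", "sports", "sports", "economy", "economy", "economy"]

# Step 1: Vector Space Model, one column per term (bag of words).
vectorizer = CountVectorizer(stop_words="english")
term_counts = vectorizer.fit_transform(docs)

# Step 2: LDA maps each sparse term vector to a short vector of topic
# proportions. The number of topics is an assumed value.
n_topics = 2
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
topic_vectors = lda.fit_transform(term_counts)

# Step 3: a small feed-forward network classifies the topic vectors.
# The hidden layer size is an assumed value.
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(topic_vectors, labels)

# Classify an unseen document by reusing the fitted vectorizer and LDA model.
new_doc = ["the bank cut interest rates"]
new_topics = lda.transform(vectorizer.transform(new_doc))
prediction = clf.predict(new_topics)
print(topic_vectors.shape, new_topics, prediction)
```

The point of the sketch is the change in dimensionality: the classifier sees `n_topics` features per document instead of one feature per vocabulary term, which is the reduction in feature space that the abstract attributes to topic-based representations. On a corpus this small the predicted label is not meaningful.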