An effective approach for Arabic document classification using machine learning

Abdullah Y. Muaad, G. Hemantha Kumar, J. Hanumanthappa, J.V. Bibal Benifa, M. Naveen Mourya, Channabasava Chola, M. Pramodha, R. Bhairava
Global Transitions Proceedings, Volume 3, Issue 1, Pages 267-271. Published 2022-06-01. Journal Article.
DOI: 10.1016/j.gltp.2022.03.003
URL: https://www.sciencedirect.com/science/article/pii/S2666285X22000036
Citations: 8

Abstract

Arabic text classification is an application of Natural Language Processing (NLP) used to analyze and categorize Arabic text. Text analysis has become an essential part of daily life because the volume of text data keeps growing, which makes text classification a big data problem. Arabic text classification systems have become important for maintaining vital information in domains such as education, health, and public services. In this work, Arabic text classification models are developed using several algorithms: Multinomial Naïve Bayes (MNB), Bernoulli Naïve Bayes (BNB), Stochastic Gradient Descent (SGD), Logistic Regression (LR), Support Vector Classifier (SVC), Linear SVC, and Convolutional Neural Networks (CNN). The algorithms are implemented and evaluated on the Al-Khaleej dataset. Experiments are carried out with several text representation models, and the CNN with a character-level representation outperforms the others. The CNN exceeds the state-of-the-art machine learning methods, with an accuracy equal to 98. The presented methods will be useful in different domains, particularly on social media.
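
The abstract names Multinomial Naïve Bayes and character-level representations but gives no implementation details. The following is a minimal sketch of how those two ideas can be combined, using only the Python standard library. It is not the authors' code: the class name `CharNgramMNB`, the trigram size, the Laplace smoothing value, and the four toy training sentences are all assumptions made for illustration, and the toy data is not the Al-Khaleej dataset. The paper's best-performing model is a character-level CNN, which this sketch does not reproduce.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a whitespace-normalised string."""
    text = " ".join(text.split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class CharNgramMNB:
    """Multinomial Naive Bayes over character n-gram counts (hypothetical helper)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n          # n-gram length (assumed value)
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        # Total n-gram count per class, used in the smoothed likelihood.
        self.total = {c: sum(self.feature_counts[c].values())
                      for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[c] / n_docs)
            for g in char_ngrams(text, self.n):
                if g not in self.vocab:
                    continue  # ignore n-grams never seen during training
                num = self.feature_counts[c][g] + self.alpha
                den = self.total[c] + self.alpha * vocab_size
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy, made-up training data for illustration only (NOT the Al-Khaleej dataset).
TRAIN = [
    ("فاز الفريق في مباراة كرة القدم", "sports"),
    ("سجل اللاعب هدفا في المباراة", "sports"),
    ("ارتفعت أسعار النفط في السوق", "economy"),
    ("أعلن البنك عن أرباح جديدة", "economy"),
]

model = CharNgramMNB().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("مباراة كرة القدم"))
```

Working on character n-grams rather than whole words avoids the need for an Arabic tokenizer or stemmer in this sketch, which is one plausible reason a character-level representation is attractive for a morphologically rich language. A real experiment would need a proper train/test split and a far larger corpus.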