Abdullah Y. Muaad, G. Hemantha Kumar, J. Hanumanthappa, J.V. Bibal Benifa, M. Naveen Mourya, Channabasava Chola, M. Pramodha, R. Bhairava
Global Transitions Proceedings, vol. 3, no. 1, pp. 267-271, published 2022-06-01. DOI: 10.1016/j.gltp.2022.03.003. URL: https://www.sciencedirect.com/science/article/pii/S2666285X22000036
An effective approach for Arabic document classification using machine learning
Arabic text classification is an application of Natural Language Processing (NLP) that analyzes and categorizes Arabic text. Text analysis has become an essential part of our lives because the volume of text data keeps growing, which makes text classification a big data problem. Arabic text classification systems are important for maintaining vital information in many domains, such as education, the health sector, and public services. In the presented research work, an Arabic text classification model is developed using several algorithms: Multinomial Naïve Bayes (MNB), Bernoulli Naïve Bayes (BNB), Stochastic Gradient Descent (SGD), Logistic Regression (LR), Support Vector Classifier (SVC), Linear SVC, and Convolutional Neural Networks (CNN). These algorithms are implemented and evaluated on the Al-Khaleej dataset. The experiments are carried out with various representation models, and the CNN with a character-level representation is observed to outperform the others. The CNN exceeds the state-of-the-art machine learning methods with an accuracy of 98%. The presented methods will be useful in different domains, particularly on social media.
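To make the idea of a character-level representation concrete, the following is a minimal sketch of one of the listed classifiers, Multinomial Naïve Bayes, applied to character trigram counts. It is an illustration only, not the authors' implementation: the class name `CharNgramMNB`, the helper `char_ngrams`, the trigram size, the add-one smoothing, and the four toy sentences with their labels are all assumptions introduced here, and the Al-Khaleej dataset is not used.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a whitespace-normalised string."""
    text = " ".join(text.split())
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class CharNgramMNB:
    """Hypothetical helper: Multinomial Naive Bayes over character n-gram counts."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n          # n-gram length (assumed value, not from the paper)
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        # Total number of n-gram occurrences observed per class.
        self.totals = {c: sum(self.feature_counts[c].values())
                       for c in self.class_counts}
        return self

    def log_scores(self, text):
        """Return the unnormalised log posterior of each class for one text."""
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c, doc_count in self.class_counts.items():
            score = math.log(doc_count / n_docs)  # log prior
            denom = self.totals[c] + self.alpha * vocab_size
            for gram in char_ngrams(text, self.n):
                if gram not in self.vocab:
                    continue  # skip n-grams never seen during training
                score += math.log((self.feature_counts[c][gram] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data invented for this sketch (two sports and two economy sentences).
train_texts = [
    "فاز الفريق في مباراة كرة القدم",
    "سجل اللاعب هدفا في المباراة",
    "ارتفعت أسعار النفط في الأسواق",
    "انخفضت أسعار الأسهم في البورصة",
]
train_labels = ["sports", "sports", "economy", "economy"]

model = CharNgramMNB(n=3).fit(train_texts, train_labels)
print(model.predict("مباراة الفريق اليوم"))
print(model.predict("أسعار النفط"))
```

Working at the character level means no tokenizer or stemmer is needed, which is one plausible reason such representations suit a morphologically rich language like Arabic. A realistic experiment would replace the toy sentences with a labeled corpus and a held-out test split.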