Automatic Multiclass Document Classification of Hindi Poems using Machine Learning Techniques

Kaushika Pal, B. Patel
2020 International Conference for Emerging Technology (INCET), published 2020-06-01
DOI: 10.1109/INCET49848.2020.9154001 (https://doi.org/10.1109/INCET49848.2020.9154001)
Citations: 8

Abstract

Text classification for Indic languages faces fundamental challenges in achieving good accuracy, because the languages are morphologically rich and a great deal of information is fused into individual words. This paper presents an experiment that classifies Hindi poem documents into three classes: Shringar, Karuna, and Veera. Poem content conveys mood and carries associated sentiment, and classifying emotions becomes more challenging when the language is morphologically rich. In the experiment, 122 documents manually collected from the web were preprocessed, yielding 122 documents containing only meaningful data. Features were then extracted from the processed documents using the Bag of Words model and converted into a numeric representation for input to the training model. Five machine-learning classification algorithms were used: Random Forest, Support Vector Machine, Decision Tree, K-Nearest Neighbors, and Naïve Bayes, each in two versions. The model is tested on 20% of the data, held out as test data, and the predictions are compared with the stored labels of this data to calculate accuracy. The experiments show that Naïve Bayes, with 64% accuracy, and Random Forest, with 56%, perform better than the other algorithms for Hindi poem classification.
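The pipeline described in the abstract (bag-of-words features, a classifier, a 20% held-out test set, accuracy against stored labels) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the tokenization, the library, or which two Naïve Bayes versions were used, so the multinomial variant with add-one smoothing, the whitespace tokenizer, the `holdout_split` helper, and the tiny transliterated corpus below are all assumptions invented for the example.

```python
import math
from collections import Counter

# Invented placeholder corpus (transliterated tokens), NOT the paper's data.
# It is ordered so that every fifth entry is held out for testing.
corpus = [
    ("prem milan roop", "Shringar"), ("prem roop prem", "Shringar"),
    ("milan prem chand", "Shringar"), ("roop milan chand", "Shringar"),
    ("prem chand roop", "Shringar"),
    ("dukh aansu viyog", "Karuna"), ("aansu dukh dukh", "Karuna"),
    ("viyog aansu raat", "Karuna"), ("dukh viyog raat", "Karuna"),
    ("aansu raat dukh", "Karuna"),
    ("yuddh veer talwar", "Veera"), ("veer yuddh veer", "Veera"),
    ("talwar veer rann", "Veera"), ("yuddh talwar rann", "Veera"),
    ("veer rann yuddh", "Veera"),
]


def bag_of_words(text):
    """Map a document to token counts (assumed whitespace tokenization)."""
    return Counter(text.split())


def holdout_split(items, every=5):
    """Hypothetical helper: hold out every `every`-th item (20% for every=5)."""
    train = [x for i, x in enumerate(items) if i % every != every - 1]
    test = [x for i, x in enumerate(items) if i % every == every - 1]
    return train, test


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.token_counts, self.total_tokens = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter()
            for d in class_docs:
                counts.update(d)
            self.token_counts[c] = counts
            self.total_tokens[c] = sum(counts.values())
        return self

    def predict(self, doc):
        vocab_size = len(self.vocab)

        def log_score(c):
            score = self.log_prior[c]
            for tok, n in doc.items():
                if tok in self.vocab:  # tokens unseen in training are ignored
                    p = (self.token_counts[c][tok] + 1) / (
                        self.total_tokens[c] + vocab_size
                    )
                    score += n * math.log(p)
            return score

        return max(self.classes, key=log_score)


train, test = holdout_split(corpus)
train_docs = [bag_of_words(text) for text, _ in train]
train_labels = [label for _, label in train]
test_docs = [bag_of_words(text) for text, _ in test]
test_labels = [label for _, label in test]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
predictions = [model.predict(doc) for doc in test_docs]

# Accuracy: fraction of test predictions that match the stored labels.
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(f"accuracy on held-out toy data: {accuracy:.2f}")
```

The toy corpus is deliberately easy to separate, so the accuracy it prints says nothing about real Hindi poems; the point is only to show how token counts become class scores and how accuracy is computed on the held-out 20%.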