Automatic Multiclass Document Classification of Hindi Poems using Machine Learning Techniques
Kaushika Pal, B. Patel
2020 International Conference for Emerging Technology (INCET), published 2020-06-01
DOI: https://doi.org/10.1109/INCET49848.2020.9154001
Citations: 8
Abstract
Text classification for Indic languages faces fundamental challenges in achieving good accuracy, because the languages are morphologically rich and a great deal of information is fused into individual words. This paper presents an experiment in classifying Hindi poem documents into three classes: Shringar, Karuna, and Veera. Poem content conveys mood and carries associated sentiment, and classifying emotion becomes more challenging when the language is morphologically rich. In the experiment, 122 documents manually collected from the web were preprocessed, yielding 122 documents containing only meaningful data. Features were then extracted from the processed documents using the Bag of Words model and converted into a numeric representation to pass to the training model. Five machine-learning classification algorithms were used, each in two versions: Random Forest, Support Vector Machine, Decision Tree, K-Nearest Neighbors, and Naive Bayes. The model is tested on a 20% test split, and its predictions are compared with the stored labels of that data to calculate accuracy. The experiments show that Naive Bayes, with 64% accuracy, and Random Forest, with 56%, perform better than the other algorithms for Hindi poem classification.
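The pipeline described in the abstract (Bag of Words features, an 80/20 split, accuracy against stored labels) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the whitespace tokenizer, the add-one smoothing, and the romanized placeholder tokens are all assumptions made for illustration, and only one of the five algorithms (a multinomial Naive Bayes) is shown, using the Python standard library alone.

```python
import math
import random
from collections import Counter, defaultdict


def bag_of_words(text):
    """Map a whitespace-tokenized document to its token counts."""
    return Counter(text.split())


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (add-one smoothing is an assumption)."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        bow = bag_of_words(doc)
        token_counts[label].update(bow)
        vocab.update(bow)
    return class_counts, token_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log-posterior for one document."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(token_counts[label].values()) + len(vocab)
        for token, n in bag_of_words(doc).items():
            if token in vocab:  # tokens never seen in training are ignored
                score += n * math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def split_80_20(docs, labels, seed=0):
    """Shuffle and hold out 20% of the documents as test data."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(0.2 * len(idx)))
    test, train = idx[:n_test], idx[n_test:]
    return ([docs[i] for i in train], [labels[i] for i in train],
            [docs[i] for i in test], [labels[i] for i in test])


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the stored label."""
    hits = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


if __name__ == "__main__":
    # Hypothetical placeholder corpus, NOT the paper's data.
    toy_docs = ["prem milan prem", "shok vilap shok", "yuddh veer yuddh"] * 5
    toy_labels = ["Shringar", "Karuna", "Veera"] * 5
    x_train, y_train, x_test, y_test = split_80_20(toy_docs, toy_labels)
    model = train_nb(x_train, y_train)
    print("test accuracy:", accuracy(model, x_test, y_test))
```

Under this sketch's rounding rule, a 20% split of 122 documents would hold out 24 documents for testing and leave 98 for training; the abstract does not state the exact split sizes the authors obtained.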