Phrases based Document Classification from Semi Supervised Hierarchical LDA
Rohit Agarwal
2021 2nd International Conference on Computation, Automation and Knowledge Management (ICCAKM), 2021-01-19
DOI: 10.1109/iccakm50778.2021.9357720
Citations: 3
Abstract
Many state-of-the-art document classification models, such as Support Vector Machine, Naive Bayes, and Neural Network classifiers, are based on the bag-of-words model. These models do not capture the semantic meaning of words. In any document, the meaning of a word is conveyed by its presence and by the particular words in its vicinity. Bag of Phrases is a technique that preserves this vicinity of words, and it can capture the contribution of phrases to document classification. In this paper, the author proposes a Semi-Supervised Hierarchical Latent Dirichlet Allocation (SSHLDA) model, which uses the most prominent topics to extract phrases from the corpus. The proposed model incorporates these phrases into the vector space model for document classification. Experiments are performed on organic documents with the Bag of Phrases technique and show effective classification when compared with state-of-the-art models.
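
The difference between a bag-of-words and a bag-of-phrases representation can be illustrated with a minimal Python sketch. It is not the SSHLDA method from the paper: as a simplifying assumption, every contiguous two-word sequence is treated as a phrase, whereas the paper selects phrases through topics. The function names `bag_of_words`, `bag_of_phrases`, and `to_vector` and the two example documents are hypothetical and chosen only for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Count single tokens; word order is discarded."""
    return Counter(text.lower().split())


def bag_of_phrases(text, n=2):
    """Count contiguous n-word sequences (assumption: every n-gram is a phrase)."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def to_vector(counts, vocabulary):
    """Project counts onto a fixed, ordered vocabulary (vector space model)."""
    return [counts[term] for term in vocabulary]


# Two hypothetical documents with the same words in a different order.
doc_a = "dog bites man"
doc_b = "man bites dog"

# Bag of words cannot tell the two documents apart.
same_words = bag_of_words(doc_a) == bag_of_words(doc_b)

# Bag of phrases keeps the vicinity of words, so the vectors differ.
phrase_vocab = sorted(set(bag_of_phrases(doc_a)) | set(bag_of_phrases(doc_b)))
vec_a = to_vector(bag_of_phrases(doc_a), phrase_vocab)
vec_b = to_vector(bag_of_phrases(doc_b), phrase_vocab)
```

Here `same_words` is true while `vec_a` and `vec_b` differ, which is the property the abstract attributes to phrases: they preserve which words occur next to each other. The resulting vectors could then be passed to any standard classifier.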