Performance Evaluation of Applying N-Gram Based Naïve Bayes Classifier for Hierarchical Classification

J. Shah
DOI: 10.1109/ICCMC.2019.8819751
Published in: 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), March 2019
Citations: 1

Abstract

Text classification is the process of assigning one or more class labels to a text document. When a classification problem has many categories, and some of those categories have few training documents, the task becomes difficult: recall suffers for the under-represented categories. To handle problems with many categories, and to exploit the parent-child and sibling relationships between categories in user and document profiles for content-based filtering, hierarchical classification is a better approach. Its main issue is error propagation: an error made at an early level of the hierarchy carries forward to all levels below it, so misclassification at early levels needs to be reduced. Term ambiguity may be one cause of classification error. Naïve Bayes is widely used for text classification because training and testing are fast, but the model assumes that terms are conditionally independent given the class; on data where terms are dependent on each other, its performance degrades. In this paper, a word-level n-gram based Multinomial Naïve Bayes classification method is combined with hierarchical classification to reduce misclassification at early levels of the hierarchy and to improve content-based filtering. The proposed algorithm also suggests a way to reduce the execution time required to compute term probabilities for the n-gram Naïve Bayes model.
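To make the core idea concrete, the following is a minimal sketch of a word-level n-gram Multinomial Naïve Bayes classifier with Laplace smoothing. It is not the paper's implementation: the toy dataset, the choice to mix unigrams with bigrams, and the whitespace tokenizer are all illustrative assumptions, and the hierarchical (per-level) application and the paper's execution-time optimization are omitted.

```python
import math
from collections import Counter, defaultdict

def ngrams(tokens, n=2):
    """All word-level k-grams for k = 1..n (illustrative feature choice)."""
    feats = []
    for k in range(1, n + 1):
        feats += [" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]
    return feats

class NGramMultinomialNB:
    """Minimal word-level n-gram Multinomial Naive Bayes with Laplace smoothing."""

    def fit(self, docs, labels, n=2):
        self.n = n
        self.class_counts = Counter(labels)                 # class priors
        self.term_counts = defaultdict(Counter)             # class -> term -> count
        for doc, y in zip(docs, labels):
            self.term_counts[y].update(ngrams(doc.split(), n))
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        self.totals = {y: sum(c.values()) for y, c in self.term_counts.items()}
        return self

    def predict(self, doc):
        feats = ngrams(doc.split(), self.n)
        n_docs, vocab_size = sum(self.class_counts.values()), len(self.vocab)
        best, best_lp = None, float("-inf")
        for y in self.class_counts:
            # log prior + sum of smoothed log likelihoods
            lp = math.log(self.class_counts[y] / n_docs)
            for t in feats:
                lp += math.log((self.term_counts[y][t] + 1)
                               / (self.totals[y] + vocab_size))
            if lp > best_lp:
                best, best_lp = y, lp
        return best

# Toy usage with a hypothetical two-class dataset
docs = ["cheap loans now", "win money now",
        "meeting schedule today", "project meeting agenda"]
labels = ["spam", "spam", "ham", "ham"]
clf = NGramMultinomialNB().fit(docs, labels, n=2)
```

In a hierarchical setting, one such classifier would be trained per node of the category tree and applied top-down, which is why reducing misclassification at the early levels matters: a wrong branch taken near the root cannot be recovered further down.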