News Topic Classification Using Mutual Information and Bayesian Network

Fahmi Salman Nurfikri, M. S. Mubarok, Adiwijaya
Published in: 2018 6th International Conference on Information and Communication Technology (ICoICT)
Publication date: 2018-05-03
DOI: https://doi.org/10.1109/ICOICT.2018.8528806
Citations: 19

Abstract

In this research, news topic classification means assigning a news article, given as text, to a category based on the information it contains. One method suited to this task is the Bayesian Network, an uncertainty-reasoning method that uses probability and a directed acyclic graph to model conditional dependencies among variables. However, textual data normally contains a large number of variables, which is a problem for a Bayesian Network because learning both its structure and its parameters becomes highly complex, especially in time. In addition, a large number of variables can degrade accuracy, since some variables may be irrelevant. In this research, we used Mutual Information as a text feature selection method to provide relevant features to the Bayesian Network classifier. Our results show that Mutual Information as a feature selector improves the classification performance of the Bayesian Network. The highest micro-average F1-score obtained with Mutual Information is 75.34%, compared with 45.95% without it.
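The feature selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their exact scoring formula, preprocessing, or number of selected features. The sketch assumes binary term presence per document, the standard mutual information between term presence and class label, and a toy corpus; the names `mutual_information`, `select_top_k`, `docs`, and `labels` are hypothetical.

```python
import math
from collections import Counter

# Toy corpus (an assumption for illustration): each document is a set of tokens.
docs = [
    {"goal", "match", "team"},
    {"team", "coach", "match"},
    {"election", "vote", "party"},
    {"party", "vote", "match"},
]
labels = ["sport", "sport", "politics", "politics"]


def mutual_information(docs, labels, term):
    """Mutual information (in bits) between presence of `term` and the class label."""
    n = len(docs)
    # Joint counts over (term present?, class label).
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    term_counts = Counter()
    class_counts = Counter()
    for (present, label), count in joint.items():
        term_counts[present] += count
        class_counts[label] += count
    mi = 0.0
    for (present, label), count in joint.items():
        # Adds p(t, c) * log2(p(t, c) / (p(t) * p(c))).
        # Cells with zero count never appear in `joint`, so they contribute 0.
        ratio = count * n / (term_counts[present] * class_counts[label])
        mi += (count / n) * math.log2(ratio)
    return mi


def select_top_k(docs, labels, k):
    """Return the k terms with the highest mutual information with the label."""
    vocab = sorted(set().union(*docs))
    scores = {term: mutual_information(docs, labels, term) for term in vocab}
    # sorted() is stable, so ties keep alphabetical order.
    return sorted(vocab, key=lambda term: -scores[term])[:k]


if __name__ == "__main__":
    for term in select_top_k(docs, labels, 3):
        print(term, round(mutual_information(docs, labels, term), 3))
```

In this toy corpus, terms that occur in only one class (such as "team" or "vote") score higher than "match", which occurs in both. Only the selected terms would then be passed to the Bayesian Network classifier as variables, which reduces the number of nodes whose structure and parameters must be learned.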