News Topic Classification Using Mutual Information and Bayesian Network
Fahmi Salman Nurfikri, M. S. Mubarok, Adiwijaya
2018 6th International Conference on Information and Communication Technology (ICoICT), published 2018-05-03
DOI: 10.1109/ICOICT.2018.8528806 (https://doi.org/10.1109/ICOICT.2018.8528806)
Citations: 19
Abstract
In this research, news topic classification means assigning a news article, in textual form, to a particular category based on the information it contains. One method suited to this task is the Bayesian Network, an uncertainty-reasoning method that uses probabilities and a directed acyclic graph to model conditional dependencies among variables. However, textual data typically contains a very large number of variables, which is a problem for a Bayesian Network: many variables make learning both the structure and the parameters of the network highly complex, especially in time. In addition, many variables can degrade accuracy, since some of them may be irrelevant. In this research, we used Mutual Information as a text feature selection method to supply relevant features to the Bayesian Network classifier. Our results show that Mutual Information feature selection improves the classification performance of the Bayesian Network. The highest classification rate obtained with Mutual Information is 75.34%, compared with 45.95% without it, both measured as micro-averaged F1-score.
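The abstract does not spell out the pipeline, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It treats each document as a set of tokens, scores each term by the mutual information between its presence T and the class C, I(T;C) = sum over t, c of P(t,c) log[P(t,c) / (P(t)P(c))], keeps the top k terms, and then classifies with Bernoulli naive Bayes. Naive Bayes is swapped in here as the simplest Bayesian Network (the class variable is the sole parent of every selected term); the paper learns a network structure, which this sketch does not attempt. All function names, the toy documents, the value of k, and the Laplace smoothing parameter alpha are hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """MI (natural log) between the presence of `term` in a document and its class label."""
    n = len(docs)
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    term_counts = Counter(term in doc for doc in docs)
    class_counts = Counter(labels)
    mi = 0.0
    # Only observed (presence, class) pairs appear in `joint`; unobserved pairs contribute 0.
    for (present, label), count in joint.items():
        p_joint = count / n
        p_term = term_counts[present] / n
        p_class = class_counts[label] / n
        mi += p_joint * math.log(p_joint / (p_term * p_class))
    return mi


def select_features(docs, labels, k):
    """Keep the k terms with the highest MI (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-mutual_information(docs, labels, t), t))
    return ranked[:k]


def train_naive_bayes(docs, labels, features, alpha=1.0):
    """Bernoulli naive Bayes: the class node is the only parent of every selected term."""
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / len(labels) for c in class_counts}
    cond = {}
    for c in class_counts:
        class_docs = [d for d, lab in zip(docs, labels) if lab == c]
        for f in features:
            present = sum(1 for d in class_docs if f in d)
            # Laplace smoothing keeps every probability strictly between 0 and 1.
            cond[(f, c)] = (present + alpha) / (len(class_docs) + 2 * alpha)
    return priors, cond


def predict(doc, priors, cond, features):
    """Return the class with the highest log posterior for the token set `doc`."""
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for f in features:
            p = cond[(f, c)]
            score += math.log(p if f in doc else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label, multi-class predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    # Each wrong prediction is one false positive (for the predicted class)
    # and one false negative (for the true class).
    fp = fn = len(y_true) - tp
    return 2 * tp / (2 * tp + fp + fn)


if __name__ == "__main__":
    # Hypothetical toy corpus: each document is a set of tokens.
    docs = [
        {"match", "goal", "team"},
        {"goal", "league", "team"},
        {"election", "vote", "party"},
        {"vote", "minister", "party"},
    ]
    labels = ["sport", "sport", "politics", "politics"]
    features = select_features(docs, labels, k=4)
    print(features)  # ['goal', 'party', 'team', 'vote']
    priors, cond = train_naive_bayes(docs, labels, features)
    print(predict({"goal", "team", "stadium"}, priors, cond, features))  # sport
```

In this single-label sketch every document receives exactly one predicted class, so micro-averaged F1 reduces to plain accuracy. Terms outside the selected feature set (such as "stadium" above) are simply ignored by the classifier, which is the point of the selection step: fewer variables for the Bayesian model to estimate.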