{"title":"A classification and extraction method of attribute hybrid big data based on Naive Bayes algorithm","authors":"Liantian Li, Ling Yang","doi":"10.3233/jcm-226802","DOIUrl":null,"url":null,"abstract":"In the identification of network text information, the existing technology is difficult to accurately extract and classify text information with high propagation speed and high update speed. In order to solve this problem, the research combines the Naive Bayes algorithm with the feature two-dimensional information gain weighting method, uses the feature weighting method to optimize the Naive Bayes algorithm, and calculates the dimension of different documents and data categories through a new feature operation method. The data gain between them can improve its classification performance, and the classification models are compared and analyzed in the actual Chinese and English databases. The research results show that the classification accuracy rates of the IGDC-DWNB model in the Sogou database, 20-newsgroup database, Fudan database and Ruster21578 database are 0.89, 0.89, 0.93, and 0.88, respectively, which are higher than other classification models in the same environment. It can be seen that the model designed in the research has higher classification accuracy, stronger overall performance, and stronger reliability and robustness in practical applications, which can provide a new development idea for big data classification technology.","PeriodicalId":14668,"journal":{"name":"J. Comput. Methods Sci. Eng.","volume":"30 1","pages":"1955-1970"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"J. Comput. Methods Sci. Eng.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/jcm-226802","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
In the identification of network text information, the existing technology is difficult to accurately extract and classify text information with high propagation speed and high update speed. In order to solve this problem, the research combines the Naive Bayes algorithm with the feature two-dimensional information gain weighting method, uses the feature weighting method to optimize the Naive Bayes algorithm, and calculates the dimension of different documents and data categories through a new feature operation method. The data gain between them can improve its classification performance, and the classification models are compared and analyzed in the actual Chinese and English databases. The research results show that the classification accuracy rates of the IGDC-DWNB model in the Sogou database, 20-newsgroup database, Fudan database and Ruster21578 database are 0.89, 0.89, 0.93, and 0.88, respectively, which are higher than other classification models in the same environment. It can be seen that the model designed in the research has higher classification accuracy, stronger overall performance, and stronger reliability and robustness in practical applications, which can provide a new development idea for big data classification technology.