Text Classification Algorithm Based on TF-IDF and BERT
Authors: Jian Sun, Jiajin Bao, Liping Bu
Published in: 2022 11th International Conference of Information and Communication Technology (ICTech)
Publication date: 2022-02-01
DOI: 10.1109/ICTech55460.2022.00112 (https://doi.org/10.1109/ICTech55460.2022.00112)
Citation count: 4
Abstract
Over the past decades, the rapid development of the Web and the large amount of data published through it have made the Web the largest public data source in the world, and the network has become a carrier of massive amounts of information. How to classify this acquired text efficiently is a hot topic in current research. Traditional machine learning algorithms for text classification have several disadvantages, such as indistinct text features, long training times and the loss of word order. This article proposes a BERT-based method for the automatic categorization of science and technology information texts, with the aim of improving classification accuracy in this domain. The results suggest that the proposed method significantly improves accuracy, recall and F1-score, and performs well on Chinese text classification.
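The abstract reports gains in accuracy, recall and F1-score but does not say how the per-class scores were averaged. The following is a minimal sketch of how these metrics can be computed from true and predicted labels. It assumes macro averaging, meaning an unweighted mean over classes, which the paper does not confirm. The function name `classification_metrics` and the example category labels are hypothetical. The sketch does not reproduce the authors' BERT model or their evaluation pipeline.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall and F1.

    Macro averaging (unweighted mean over classes) is an assumption made
    for illustration; the paper does not state which averaging it used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted but is wrong
            fn[t] += 1      # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        # F1 is the harmonic mean of precision and recall for this class
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical category labels, for illustration only
    y_true = ["tech", "tech", "sci", "sci", "news"]
    y_pred = ["tech", "sci", "sci", "sci", "tech"]
    print(classification_metrics(y_true, y_pred))
```

Macro averaging gives rare categories the same weight as frequent ones. Micro averaging would instead pool all counts, and for single-label classification its recall equals accuracy. This is why the choice matters when results are compared across papers.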