A Web News Classification Method: Fusion Noise Filtering and Convolutional Neural Network

Chunhui He, Yanli Hu, Aixia Zhou, Zhen Tan, Chong Zhang, Bin Ge
Published in: 2020 2nd Symposium on Signal Processing Systems
Publication date: 2020-07-11
DOI: 10.1145/3421515.3421523 (https://doi.org/10.1145/3421515.3421523)
Citations: 0

Abstract

As a channel for transferring information on the Internet, web news plays a significant role in information sharing. Web news pages usually contain a lot of content, and our analysis found that not all of it relates to the news topic: many pages include noise content that seriously interferes with the text classification task. Filtering this noise and purifying web news content to improve classification accuracy has therefore become a challenging problem. In this paper, we propose NF-CNN, a web news classification method that fuses noise detection, BERT-based semantic similarity noise filtering, and a convolutional neural network. We evaluate the method on a public Chinese news classification dataset. The experimental results show that it effectively detects and filters a large amount of noise text and reaches an average F1 score of 95.61% on the web news classification task.
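The abstract does not spell out how the semantic similarity filter works, so the following is only a minimal sketch of the general idea: score each sentence against a topic reference and drop low-scoring sentences before classification. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`bow_embed`, `cosine`, `filter_noise`), the use of the headline as the topic reference, the threshold value, and above all the bag-of-words embedding, which stands in for BERT sentence vectors so that the example runs without a model download.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def bow_embed(text: str) -> Vector:
    """Stand-in embedding: lowercase bag-of-words counts (NOT BERT)."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    return {token: float(n) for token, n in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_noise(
    title: str,
    sentences: List[str],
    embed: Callable[[str], Vector] = bow_embed,  # hypothetical hook for a BERT encoder
    threshold: float = 0.1,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Keep sentences whose similarity to the title reaches the threshold."""
    reference = embed(title)
    return [s for s in sentences if cosine(reference, embed(s)) >= threshold]


title = "Central bank raises interest rates"
sentences = [
    "The central bank raised interest rates by a quarter point.",
    "Click here to subscribe to our newsletter.",
    "Analysts expect rates to stay high this year.",
]
kept = filter_noise(title, sentences)
```

With this toy input the subscription prompt shares no tokens with the headline and is dropped, while the two on-topic sentences are kept. A real pipeline would replace `bow_embed` with an encoder that returns dense sentence vectors and would pass the filtered text on to the CNN classifier.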