Sandhya Aneja, Siti Nur Afikah Bte Abdul Mazid, Nagender Aneja
Machine translation has many applications, such as news translation, email translation, and official letter translation. Commercial translators such as Google Translate lag in regional vocabulary and are unable to learn the bilingual text in the source and target languages within the input. In this paper, a regional vocabulary-based, application-oriented Neural Machine Translation (NMT) model is proposed over a data set of emails used for communication at the university over a period of three years. A state-of-the-art sequence-to-sequence neural network for ML → EN (Malay to English) and EN → ML (English to Malay) translation, built on a Gated Recurrent Unit (GRU) recurrent neural network with an attention decoder, is compared with Google Translate. The lower BLEU score of Google Translate compared with our model indicates that application-based regional models perform better. The low BLEU scores of both our model and Google Translate for English to Malay indicate that the Malay language has complex language features relative to English.
{"title":"Neural Machine Translation model for University Email Application","authors":"Sandhya Aneja, Siti Nur Afikah Bte Abdul Mazid, Nagender Aneja","doi":"10.1145/3421515.3421522","DOIUrl":"https://doi.org/10.1145/3421515.3421522","url":null,"abstract":"Machine translation has many applications such as news translation, email translation, official letter translation etc. Commercial translators, e.g. Google Translation lags in regional vocabulary and are unable to learn the bilingual text in the source and target languages within the input. In this paper, a regional vocabulary-based application-oriented Neural Machine Translation (NMT) model is proposed over the data set of emails used at the University for communication over a period of three years. A state-of-the-art Sequence-to-Sequence Neural Network for ML → EN (Malay to English) and EN → ML (English to Malay) translations is compared with Google Translate using Gated Recurrent Unit Recurrent Neural Network machine translation model with attention decoder. The low BLEU score of Google Translation in comparison to our model indicates that the application based regional models are better. The low BLEU score of English to Malay of our model and Google Translation indicates that the Malay Language has complex language features corresponding to English.","PeriodicalId":294293,"journal":{"name":"2020 2nd Symposium on Signal Processing Systems","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133854331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In order to improve the robustness and accuracy of a SLAM system, an improved SIFT algorithm is used to extract image features. Firstly, the characteristics of SLAM images are analyzed and image preprocessing is carried out to reduce abrupt gray-level changes. Secondly, to meet real-time requirements, the SIFT feature descriptors are simplified to improve speed. Exploiting the continuity of SLAM image sequences, a pixel-neighborhood matching method reduces the feature-matching time and lowers the mismatch rate on repeated textures. The improved SIFT feature algorithm is implemented on a GPU. Finally, simulation results show that trajectory accuracy is improved by more than 35% and image processing time is about 12 ms, while overall system accuracy is also improved.
{"title":"Feature Extraction and Matching of Slam Image Based on Improved SIFT Algorithm","authors":"Xinrong Mao, Kaiming Liu, Y. Hang","doi":"10.1145/3421515.3421528","DOIUrl":"https://doi.org/10.1145/3421515.3421528","url":null,"abstract":"In order to improve the robustness and accuracy of slam system, the Improved SIFT algorithm is used to extract the image features. Firstly, the characteristics of the image in slam are analyzed and the image preprocessing is carried out to reduce the gray mutation. Secondly, in order to meet the real-time requirements, the feature descriptors of sift are simplified to improve the speed. Using the continuity of slam image, the method of pixel neighborhood matching reduces the time of feature matching and reduces the error matching rate of repeated texture. GPU is used to implement the Improved SIFT feature algorithm. Finally, the simulation results show that the trajectory accuracy is improved by more than 35% and the image processing time is about 12ms. At the same time, the system accuracy is improved.","PeriodicalId":294293,"journal":{"name":"2020 2nd Symposium on Signal Processing Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115254350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we investigate two sub-tasks of aspect-based sentiment analysis (ABSA) with the pre-trained language model BERT, namely opinion target extraction (OTE) and target-oriented opinion words extraction (TOWE). Specifically, we build a novel framework for joint extraction of opinion targets and target-oriented opinion words, which aims to extract each opinion target together with its corresponding opinion words. To accomplish the TOWE task more effectively, we propose an IO-LSTM+Transformer structure, termed IOT, which performs well on domain-specific datasets when combined with the BERT pre-trained model. To validate the effectiveness of our model, we develop a pipeline model for comparison. Experimental results show that our model extracts opinion target and opinion word pairs from sentences more effectively than the pipeline model. Therefore, our joint model has the potential to facilitate other ABSA tasks.
{"title":"Joint Opinion Target and Target-oriented Opinion Words Extraction by BERT and IOT Model","authors":"Yuanfa Zhu, Weiwen Zhang, Depei Wang","doi":"10.1145/3421515.3421536","DOIUrl":"https://doi.org/10.1145/3421515.3421536","url":null,"abstract":"In this paper, we investigate two sub-tasks of aspect-based sentiment analysis (ABSA) through the pre-trained language model BERT, namely opinion target extraction (OTE) and target-oriented opinion words extraction (TOWE). Specifically, we build a novel framework for the joint extraction model of opinion target and target-oriented opinion words feedback, which aims to extract the opinion target and corresponding opinion words. In order to accomplish the TOWE task more effectively, we proposed an IO-LSTM+Transformer structure, termed IOT, which has excellent performance in domain-specific datasets when combined with the BERT pre-training model. To validate the effectiveness of our model, we develop a pipeline model for comparison. Experiment results show that our model can extract the pair of opinion target and opinion words from the sentence more effectively than the pipeline model. Therefore, our joint model has the potential to facilitate other tasks of ABSA.","PeriodicalId":294293,"journal":{"name":"2020 2nd Symposium on Signal Processing Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114204988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Esophageal cancer is one of the diseases afflicting human beings. Automatic segmentation of the esophagus and esophageal tumors from computed tomography (CT) images is a challenging problem that can assist in the diagnosis of esophageal cancer. In this paper, DB M-Net is proposed for the segmentation of the esophagus and esophageal tumors from CT images; it combines M-Net, modified from U-Net, with an approximate binarization function called differentiable binarization (DB). We construct multi-scale input layers and multi-level output layers in the network to facilitate feature fusion, and DB is applied to enhance robustness. Our DB M-Net uses fewer parameters yet achieves better performance. The experiments are based on a dataset of 2,219 slices from 16 CT scans and show that DB M-Net outperforms other existing algorithms.
{"title":"DB M-Net: An Efficient Segmentation Network for Esophagus and Esophageal Tumor in Computed Tomography Images","authors":"Donghao Zhou, Guoheng Huang, W. Ling, Haomin Ni, Lianglun Cheng, Jian Zhou","doi":"10.1145/3421515.3421531","DOIUrl":"https://doi.org/10.1145/3421515.3421531","url":null,"abstract":"Esophageal cancer is one of the diseases afflicting human beings. Automatic segmentation of esophagus and esophageal tumor from computed tomography (CT) images is a challenging problem, which can assist in the diagnosis of esophageal cancer. In this paper, DB M-Net is proposed for the segmentation of esophagus and esophageal tumor from CT images, which combines M-Net modified from U-Net with an approximate function for binarization called differentiable binarization (DB). We construct the multi-scale input layers and the multi-level output layers in the network to facilitate features fusion, and DB is performed to enhance the robustness. Fewer parameters are applied in our DB M-Net but the network achieves a better performance. The experiments are based on the dataset of 2,219 slices from 16 CT scans, which show our DB M-Net outperforms other existing algorithms.","PeriodicalId":294293,"journal":{"name":"2020 2nd Symposium on Signal Processing Systems","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127938943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kanhong Xiao, Guoheng Huang, W. Ling, Lianglun Cheng, Tao Peng, Jian Zhou
Breast cancer is the most common cancer among women worldwide. Effective detection of the location of breast cancer in ultrasound images can assist doctors in diagnosing breast cancer. Diverse morphology, blurred edges, and a small amount of data cause great difficulty in breast cancer detection. Deep learning is advantageous when facing these problems; however, training on small-sample datasets and the imbalance between positive and negative samples remain problems to be solved. To improve the accuracy of ultrasound breast cancer detection, a small-sample breast cancer detection method based on data augmentation and corner pooling is proposed in this paper. The method addresses both the over-fitting caused by small samples and the imbalance between positive and negative samples: a data augmentation module based on geometric and noise transformations tackles the small-sample problem, and a detection module based on focal loss and corner pooling tackles the sample imbalance problem. Experiments show that the proposed method has advantages over mainstream methods on hard-to-distinguish samples, achieving an AP of 84.65%, higher than state-of-the-art methods.
{"title":"Breast Cancer Detection of Small Sample Based on Data Augmentation and Corner Pooling","authors":"Kanhong Xiao, Guoheng Huang, W. Ling, Lianglun Cheng, Tao Peng, Jian Zhou","doi":"10.1145/3421515.3421526","DOIUrl":"https://doi.org/10.1145/3421515.3421526","url":null,"abstract":"Breast cancer is the most common cancer among women worldwide. The effective detection the location of breast cancer from the ultrasound images can assist doctors in diagnosing breast cancer. Diverse morphology, blurred edges, and small amount of data causes great difficulty in the detection of breast cancer. Deep learning is very advantageous when facing these problems. However, the problems of training on small sample datasets and the imbalance of positive and negative samples are problems that need to be solved. In order to improve the accuracy of ultrasound breast cancer detection, a small sample breast cancer detection method based on data augmentation and corner pooling is proposed in this paper. In this method, we propose a way for solving over-fitting of small samples and solving the imbalance problem of positive and negative samples. Data augmentation module based on geometric and noise transformation is proposed to solve the problem of small samples, and detection module based on focal loss and corner pooling is proposed to solve the problem of imbalance samples. The experiment found that the method used in this paper has more advantages than the mainstream methods in difficult to distinguish samples. The method used in this paper has an AP of 84.65%, which is higher than state-of-the-art methods.","PeriodicalId":294293,"journal":{"name":"2020 2nd Symposium on Signal Processing Systems","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125012583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With about 300 million people worldwide suffering from depression, depressive disorder has become a major health problem. The 2017 Audio/Visual Emotion Challenge required participants to build a model to detect depression from audio, video, and text data. In this paper, we use a single modality, transcribed text data, for depression detection. We propose a decision fusion model that combines BERT text embeddings of interview transcripts with key phrase recognition. The text embedding module is composed of a BERT embedding model and an LSTM network. The key phrase recognition module recognizes phrases such as “depression” and “cannot sleep” that are believed to be valuable for improving recognition accuracy. We fuse the two identification methods at the decision level. Our proposed decision fusion model outperforms previous single-modality approaches in terms of classification accuracy, with an F1 score of 0.81 and a precision of 0.82.
{"title":"Text-based Decision Fusion Model for Detecting Depression","authors":"Yufeng Zhang, Yingxue Wang, Xueli Wang, Bochao Zou, Haiyong Xie","doi":"10.1145/3421515.3421516","DOIUrl":"https://doi.org/10.1145/3421515.3421516","url":null,"abstract":"With about 300 million people in the world suffer from depression, depressive disorder has become a major health problem in the world. The 2017 Audio/Visual Emotion Challenge required Participants to build a model in order to detect depression based on audio, video, and text data. In this paper, we use single-modality, transcribed text data, for depression detection. We proposed a decision fusion model which combines Bert text embedding of interview transcript and key phrases recognition. Text embedding module is composed of Bert embedding model and LSTM network. Key phrases recognition module recognizes words such as “depression”, “cannot sleep” that are believed to be valuable in improving the recognition accuracy. We fuse the two identification methods at the decision level. Our proposed decision fusion model outperforms previous single-modality approaches in terms of classification accuracy. The F1 scores and precision is 0.81 and 0.82, respectively.","PeriodicalId":294293,"journal":{"name":"2020 2nd Symposium on Signal Processing Systems","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122086407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Digital watermarking is a key technology for copyright protection and content authentication. Most existing watermarking algorithms are based on global embedding, which cannot balance the imperceptibility and robustness of the watermark well. This paper proposes an adaptive robust watermarking algorithm based on image texture, which mainly includes: (1) the color image is converted from RGB space to Lab space, and stable scale-invariant feature transform (SIFT) points are extracted from the L component as the embedding positions of the watermark; (2) the structured forest edge is extracted using machine learning as the watermark image, which is decomposed using the lifting wavelet transform (LWT) and then encrypted using a logistic chaotic transform; (3) taking the human visual system into consideration, the strength factor is adaptively selected using the brightness information and texture complexity of the L component. Experimental results show that the proposed algorithm in Lab space has better visual invisibility and robustness against various attacks, especially cropping, noise, and JPEG compression attacks, in comparison with other related algorithms.
{"title":"Adaptive Robust Watermarking Algorithm Based on Image Texture","authors":"Xing Yang, Y. Liu, Tingge Zhu","doi":"10.1145/3421515.3421518","DOIUrl":"https://doi.org/10.1145/3421515.3421518","url":null,"abstract":"Digital watermarking is a key technology to solve copyright protection and content authentication. Most existing watermarking algorithms are based on global embedding, which cannot well balance the imperceptibility and robustness of watermarking. This paper proposes an adaptive robust watermarking algorithm based on image texture, which mainly includes: (1) color image is converted from RGB space to Lab space and the stable scale invariant feature transform(SIFT) points are extracted based on L component as the embedding position of watermark; (2)the structured forest edge is extracted using machine learning as the watermark image which is decomposed by using lifting wavelet transform (LWT) and then encrypted using logical chaos transform; (3)in consideration of human visual system, the strength factor is adaptively selected by using the brightness information and texture complexity of the L component. Experimental results show that the proposed algorithm in Lab space has the better visual invisibility and robustness to resist various attacks, especially for cropping, noise and JPEG compression attacks in comparison with other related algorithms.","PeriodicalId":294293,"journal":{"name":"2020 2nd Symposium on Signal Processing Systems","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116780645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ophthalmologists often infer the health of the eye, the development of eye diseases, and recovery progress by observing morphological changes in the iris tissue. Therefore, accurate and automatic segmentation of the iris is a very important task. In this paper, we propose an iris segmentation method that handles partial occlusion; it includes fast eye detection based on MTCNN, iris segmentation based on a Weighted FCN and the Hough Transform, and coordinate correction for the real-world iris radius. Firstly, we apply Multi-task Cascaded Convolutional Networks (MTCNN) for eye detection, which is light and fast. Then we propose the Weighted FCN and Hough Transform to segment the iris, even when it is partially occluded. Finally, we design a calibration scheme to correct the iris radius in the real world. Experimental results show that the proposed method reaches an accuracy of 97.6% and a precision of 98.5%, superior to state-of-the-art methods.
{"title":"Fast Iris Segmentation under Partly Occlusion Based on MTCNN and Weighted FCN","authors":"Haomin Ni, Guoheng Huang, Lianglun Cheng, Donghao Zhou, Tao Wang, Feng Zhao","doi":"10.1145/3421515.3421529","DOIUrl":"https://doi.org/10.1145/3421515.3421529","url":null,"abstract":"Many times, an ophthalmologist will infer the health of the eye, the development of eye diseases, and the recovery by observing the morphological changes of the iris tissue. Therefore, accurate and automatic segmentation of the iris is a very important task. In this paper, we propose an iris segmentation method to tackle with the partly occlusion case that includes fast eye detection based on MTCNN, iris segmentation based on Weighted FCN and Hough Transform and coordinate correction for radius of iris in the real world. Firstly, we apply Multi-task Cascaded Convolutional Networks for eye detection, which is light and fast. Then we propose Weighted FCN and Hough Transform to segment the iris, even if the iris is partially occlusive. Finally, we design a calibration scheme to correct the iris radius in the real world. Experimental results show that the accuracy rate of the proposed method reaches 97.6% and precision rate 98.5%, superior to state-of-the-art methods.","PeriodicalId":294293,"journal":{"name":"2020 2nd Symposium on Signal Processing Systems","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125278468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Opioid addiction is a serious public health threat in the U.S., leading to many deaths and other social problems. Medical treatment and mental support are key factors in the rehabilitation process for opioid addicts, and families and friends play an important role in supporting the addict and helping them stay clean. However, they may not know the best action to take due to lack of knowledge or certainty. Therefore, addicts often turn to social media as a question-answering platform to seek answers to their inquiries. Unfortunately, searching over pages or different forums for a quick answer is difficult and can be time-consuming, confusing, and ultimately frustrating for the addicts. Hence, we propose a novel chatbot integrated with state-of-the-art deep learning techniques to retrieve an instant answer to a user's query from the Reddit social media platform. Our experiments illustrate that the chatbot provides answers in scenarios where there is no exactly matching question in the discussion forums but there are questions semantically similar to the user query. We also illustrate real use cases where our chatbot retrieves responses from Reddit forums.
{"title":"Robo : A Counselor Chatbot for Opioid Addicted Patients","authors":"M. Moghadasi, Yuan Zhuang, Hashim Abu-gellban","doi":"10.1145/3421515.3421525","DOIUrl":"https://doi.org/10.1145/3421515.3421525","url":null,"abstract":"Opioid as an addiction is a serious public health threat in the U.S., leads to massive deaths and other social problems. Medical treatment and mental supports are considering factors in rehabilitation process for opioid addicts. In this process families and friends play an important role in supporting and help the addict to stay clean. However, they may not know the best action to take due to lack of knowledge or certainty. Therefore, there are situations that addicts tend to use social media as a question/answering platform to seek answer for an inquiry. Unfortunately, It is often difficult to search over pages or different forums for a quick answer and it can be time-consuming, confusing and ultimately frustrating for the addicts. Hence, We propose a novel chatbot that is integrated with state-of-the-art deep learning techniques to retrieve an instant answer for a user’s query from Reddit social media. Our experiment illustrates that the chatbot provides answers in scenarios that there is no exact matched question in the discussion forums but there are questions with semantic similarities to the user query. Consequently, we illustrate real use cases where our chatbot retrieves responses from Reddit social media forums.","PeriodicalId":294293,"journal":{"name":"2020 2nd Symposium on Signal Processing Systems","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131736868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chunhui He, Yanli Hu, Aixia Zhou, Zhen Tan, Chong Zhang, Bin Ge
As a channel for information transfer on the Internet, web news plays a significant role in information sharing. Web news pages usually contain a lot of content, and in-depth analysis shows that not all of it is related to the news topic; much web news contains noise content, and this noise seriously interferes with the text classification task. How to filter noise and purify web news content to improve the accuracy of web news classification has therefore become a challenging problem. In this paper, we propose a web news classification method that fuses noise detection, BERT-based semantic-similarity noise filtering, and a convolutional neural network (NF-CNN) to solve this problem. To comprehensively evaluate the performance of the method, we evaluate it on a Chinese public news classification dataset. The experimental results demonstrate that our method can effectively detect and filter a large amount of noise text, and the average F1 score reaches 95.61% on the web news classification task.
{"title":"A Web News Classification Method: Fusion Noise Filtering and Convolutional Neural Network","authors":"Chunhui He, Yanli Hu, Aixia Zhou, Zhen Tan, Chong Zhang, Bin Ge","doi":"10.1145/3421515.3421523","DOIUrl":"https://doi.org/10.1145/3421515.3421523","url":null,"abstract":"As the way of Internet information transfer, web news plays a significant role in information sharing. Considering that web news usually contains a lot of content, after in-depth analysis, we found that not all content is related to the news topic, and a lot of web news contains some noise content, and these noises content have serious interference to the text classification task. So, how to filter noise and purify web news content to improve the accuracy of web news classification has become a challenging problem. In this paper, we proposed a web news classification method via fusing noise detection, BERT-based semantic similarity noise filtering and convolutional neural network (NF-CNN) to solve the problem. In order to comprehensively evaluate the performance of the method, we use the Chinese public news classification dataset to evaluate it. The experimental results demonstrate that our method can effectively detect and filter a lot of noise text and the average F1 score can reach 95.61% on web news classification task.","PeriodicalId":294293,"journal":{"name":"2020 2nd Symposium on Signal Processing Systems","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132452556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}