{"title":"基于加权平均词嵌入的文本分类器","authors":"AbdAllah Elsaadawy, Marwan Torki, Nagwa Ei-Makky","doi":"10.1109/JEC-ECC.2018.8679539","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a new technique for text representation by generating a sentence vector using a weighted average of words representation where Naive Bayes log count ratio is used as the weight of each word. The quality of this representation is measured in a text classification task using FastText and Word2Vec models. Results show accuracy improvement over unweighted average techniques using the same models. Also, we compare our results to other traditional text representation and classification techniques such as Term Frequency-Inverse Document Frequency (TF-IDF) and Naive Bayes Support Vector Machine (NBSVM).","PeriodicalId":197824,"journal":{"name":"2018 International Japan-Africa Conference on Electronics, Communications and Computations (JAC-ECC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"A Text Classifier Using Weighted Average Word Embedding\",\"authors\":\"AbdAllah Elsaadawy, Marwan Torki, Nagwa Ei-Makky\",\"doi\":\"10.1109/JEC-ECC.2018.8679539\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose a new technique for text representation by generating a sentence vector using a weighted average of words representation where Naive Bayes log count ratio is used as the weight of each word. The quality of this representation is measured in a text classification task using FastText and Word2Vec models. Results show accuracy improvement over unweighted average techniques using the same models. Also, we compare our results to other traditional text representation and classification techniques such as Term Frequency-Inverse Document Frequency (TF-IDF) and Naive Bayes Support Vector Machine (NBSVM).\",\"PeriodicalId\":197824,\"journal\":{\"name\":\"2018 International Japan-Africa Conference on Electronics, Communications and Computations (JAC-ECC)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 International Japan-Africa Conference on Electronics, Communications and Computations (JAC-ECC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/JEC-ECC.2018.8679539\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 International Japan-Africa Conference on Electronics, Communications and Computations (JAC-ECC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/JEC-ECC.2018.8679539","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Text Classifier Using Weighted Average Word Embedding
In this paper, we propose a new text representation technique that builds a sentence vector as a weighted average of word embeddings, where the Naive Bayes log-count ratio of each word is used as its weight. The quality of this representation is evaluated on a text classification task using FastText and Word2Vec embedding models. Results show an accuracy improvement over unweighted averaging with the same models. We also compare our results against traditional text representation and classification techniques such as Term Frequency-Inverse Document Frequency (TF-IDF) and the Naive Bayes Support Vector Machine (NBSVM).
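The sketch below illustrates the idea described in the abstract; it is not the authors' code. It assumes a binary classification setting, pre-trained word vectors supplied as a plain `{word: numpy array}` dictionary, and the absolute value of the log-count ratio as each word's weight; the paper's exact smoothing and weighting choices may differ.

```python
import numpy as np

def log_count_ratios(docs, labels, alpha=1.0):
    """Naive Bayes log-count ratio for every word in a binary-labeled corpus.

    docs   -- list of tokenized documents (lists of words)
    labels -- list of 0/1 class labels, one per document
    alpha  -- additive smoothing constant
    """
    vocab = {w for doc in docs for w in doc}
    pos_counts = {w: alpha for w in vocab}
    neg_counts = {w: alpha for w in vocab}
    for doc, y in zip(docs, labels):
        counts = pos_counts if y == 1 else neg_counts
        for w in doc:
            counts[w] += 1
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    # r(w) = log( P(w | positive class) / P(w | negative class) )
    return {w: np.log((pos_counts[w] / pos_total) / (neg_counts[w] / neg_total))
            for w in vocab}

def sentence_vector(tokens, embeddings, ratios, dim):
    """Weighted average of word vectors, with |log-count ratio| as each word's weight."""
    vecs, weights = [], []
    for w in tokens:
        if w in embeddings and w in ratios:
            vecs.append(embeddings[w])
            # tiny epsilon keeps the weight sum strictly positive
            weights.append(abs(ratios[w]) + 1e-9)
    if not vecs:
        return np.zeros(dim)
    return np.average(np.array(vecs), axis=0, weights=np.array(weights))
```

Under this reading, the resulting sentence vectors would then be fed to an ordinary classifier (for example, logistic regression or a linear SVM) to produce the text classification results reported in the paper.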