{"title":"跨机器学习和深度学习模型的不同词嵌入技术性能评价","authors":"Tanmoy Mazumder, Shawan Das, Md. Hasibur Rahman, Tanjina Helaly, Tanmoy Sarkar Pias","doi":"10.1109/ICCIT57492.2022.10055572","DOIUrl":null,"url":null,"abstract":"Sentiment analysis is one of the core fields of Natural Language Processing(NLP). Numerous machine learning and deep learning algorithms have been developed to achieve this task. Generally, deep learning models perform better in this task as they are trained on massive amounts of data. This, however, also poses a disadvantage as collecting sufficient amounts of data is a challenge and training with this data requires devices with high computational power. Word embedding is a vital step in applying machine learning models for NLP tasks. Different word embedding techniques affect the performance of machine learning algorithms. This paper evaluates GloVe, CountVectorizer, and TF-IDF embedding techniques with multiple machine learning models and proves that the right combination of embedding technique and machine learning model(TF-IDF+Logistic Regression: 87.75% accuracy) can achieve nearly the same performance or more as deep learning models (LSTM: 87.89%).","PeriodicalId":255498,"journal":{"name":"2022 25th International Conference on Computer and Information Technology (ICCIT)","volume":"42 11-12","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Performance Evaluation of Different Word Embedding Techniques Across Machine Learning and Deep Learning Models\",\"authors\":\"Tanmoy Mazumder, Shawan Das, Md. Hasibur Rahman, Tanjina Helaly, Tanmoy Sarkar Pias\",\"doi\":\"10.1109/ICCIT57492.2022.10055572\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sentiment analysis is one of the core fields of Natural Language Processing(NLP). 
Numerous machine learning and deep learning algorithms have been developed to achieve this task. Generally, deep learning models perform better in this task as they are trained on massive amounts of data. This, however, also poses a disadvantage as collecting sufficient amounts of data is a challenge and training with this data requires devices with high computational power. Word embedding is a vital step in applying machine learning models for NLP tasks. Different word embedding techniques affect the performance of machine learning algorithms. This paper evaluates GloVe, CountVectorizer, and TF-IDF embedding techniques with multiple machine learning models and proves that the right combination of embedding technique and machine learning model(TF-IDF+Logistic Regression: 87.75% accuracy) can achieve nearly the same performance or more as deep learning models (LSTM: 87.89%).\",\"PeriodicalId\":255498,\"journal\":{\"name\":\"2022 25th International Conference on Computer and Information Technology (ICCIT)\",\"volume\":\"42 11-12\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 25th International Conference on Computer and Information Technology (ICCIT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCIT57492.2022.10055572\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 25th International Conference on Computer and Information Technology 
(ICCIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCIT57492.2022.10055572","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Performance Evaluation of Different Word Embedding Techniques Across Machine Learning and Deep Learning Models
Sentiment analysis is one of the core tasks of Natural Language Processing (NLP), and numerous machine learning and deep learning algorithms have been developed for it. Deep learning models generally perform better on this task because they are trained on massive amounts of data. This, however, is also a disadvantage: collecting sufficient data is a challenge, and training on it requires devices with high computational power. Word embedding is a vital step in applying machine learning models to NLP tasks, and the choice of embedding technique affects the performance of the resulting model. This paper evaluates the GloVe, CountVectorizer, and TF-IDF embedding techniques with multiple machine learning models and shows that the right combination of embedding technique and machine learning model (TF-IDF + Logistic Regression: 87.75% accuracy) can achieve performance comparable to, or exceeding, that of deep learning models (LSTM: 87.89%).
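The TF-IDF + Logistic Regression combination highlighted in the abstract can be sketched as a short scikit-learn pipeline. This is a minimal illustrative sketch, not the authors' actual experimental setup: the toy reviews, labels, and default hyperparameters below are placeholders standing in for the paper's dataset and tuning.

```python
# Minimal sketch of a TF-IDF + Logistic Regression sentiment classifier.
# The texts/labels here are illustrative placeholders, not the paper's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "this movie was wonderful and moving",
    "absolutely terrible plot and acting",
    "a delightful, heartwarming story",
    "boring, predictable, and far too long",
]
labels = [1, 0, 1, 0]  # 1 = positive sentiment, 0 = negative

# TfidfVectorizer turns each review into a sparse, IDF-weighted
# bag-of-words vector; LogisticRegression then learns a linear
# decision boundary over those vectors.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["what a wonderful story"]))
```

The same `fit`/`predict` interface applies whether the vectorizer is `TfidfVectorizer` or `CountVectorizer`, which is what makes the paper's embedding-technique comparison straightforward to run: only the first pipeline stage changes.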