{"title":"Contextualized vs. Static Word Embeddings for Word-based Analysis of Opposing Opinions","authors":"Wassakorn Sarakul, Attapol T. Rutherford","doi":"10.1109/JCSSE58229.2023.10202014","DOIUrl":null,"url":null,"abstract":"Word embeddings are useful for studying public opinions by summarizing opinions about a concept by finding the nearest neighbors in the word embedding space. Static word embeddings such as word2vec are powerful for handling large amounts of text, while contextualized word embeddings from transformer-based models yield better embeddings by some evaluation metrics. In this study, we explore the differences between static and contextualized embeddings for word-based analysis of opposing opinions. We find that pre-training is necessary for static embeddings when the corpus is small, but contextualized embeddings are superior. When the focus corpus is large, static embeddings reflect related concepts, while contextualized embeddings often show synonyms or cohypernyms. Static embeddings trained only on the focus corpus capture opposing opinions better than contextualized embeddings.","PeriodicalId":298838,"journal":{"name":"2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/JCSSE58229.2023.10202014","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Word embeddings are useful for studying public opinion: opinions about a concept can be summarized by finding the concept's nearest neighbors in the embedding space. Static word embeddings such as word2vec are powerful for handling large amounts of text, while contextualized word embeddings from transformer-based models yield better embeddings by some evaluation metrics. In this study, we explore the differences between static and contextualized embeddings for word-based analysis of opposing opinions. We find that when the focus corpus is small, static embeddings require pre-training, and contextualized embeddings are superior. When the focus corpus is large, static embeddings reflect related concepts, while contextualized embeddings often surface synonyms or cohypernyms. Static embeddings trained only on the focus corpus capture opposing opinions better than contextualized embeddings do.
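The analysis described in the abstract reduces to a nearest-neighbor lookup in an embedding space. Below is a minimal sketch of that workflow for the static case, assuming gensim is installed; the toy corpus, hyperparameters, and the query word "vaccines" are illustrative stand-ins, not the authors' actual data or settings.

```python
# A minimal sketch of word-based opinion analysis with static embeddings.
# Assumption: gensim 4.x is installed; the corpus below is a hypothetical
# stand-in for a "focus corpus" containing opposing opinions.
from gensim.models import Word2Vec

# Hypothetical focus corpus: tokenized sentences from two opposing camps.
corpus = [
    ["vaccines", "are", "safe", "and", "effective"],
    ["vaccines", "cause", "harm", "and", "injury"],
    ["masks", "protect", "the", "community"],
    ["masks", "restrict", "personal", "freedom"],
]

# Train static embeddings only on the focus corpus (one of the settings
# compared in the paper); min_count=1 and many epochs because the toy
# corpus is tiny.
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, epochs=50)

# Summarize opinions about a concept via its nearest neighbors in the
# embedding space.
for word, score in model.wv.most_similar("vaccines", topn=5):
    print(f"{word}\t{score:.3f}")
```

For the contextualized case, one common approach (an assumption here, not necessarily the paper's exact procedure) is to run the focus corpus through a transformer such as BERT, average the token vectors across occurrences of the target word, and rank candidate words by cosine similarity to that average.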