通过多语言 twitter 文本嵌入分析政党立场。

IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Frontiers in Big Data Pub Date : 2024-05-30 eCollection Date: 2024-01-01 DOI:10.3389/fdata.2024.1330392

Jinghui Chen, Takayuki Mizuno, Shohei Doi

{"title":"通过多语言 twitter 文本嵌入分析政党立场。","authors":"Jinghui Chen, Takayuki Mizuno, Shohei Doi","doi":"10.3389/fdata.2024.1330392","DOIUrl":null,"url":null,"abstract":"Traditional monolingual word embedding models transform words into high-dimensional vectors which represent semantics relations between words as relationships between vectors in the high-dimensional space. They serve as productive tools to interpret multifarious aspects of the social world in social science research. Building on the previous research which interprets multifaceted meanings of words by projecting them onto word-level dimensions defined by differences between antonyms, we extend the architecture of establishing word-level cultural dimensions to the sentence level and adopt a Language-agnostic BERT model (LaBSE) to detect position similarities in a multi-language environment. We assess the efficacy of our sentence-level methodology using Twitter data from US politicians, comparing it to the traditional word-level embedding model. We also adopt Latent Dirichlet Allocation (LDA) to investigate detailed topics in these tweets and interpret politicians' positions from different angles. In addition, we adopt Twitter data from Spanish politicians and visualize their positions in a multi-language space to analyze position similarities across countries. The results show that our sentence-level methodology outperform traditional word-level model. We also demonstrate that our methodology is effective dealing with fine-sorted themes from the result that political positions towards different topics vary even within the same politicians. Through verification using American and Spanish political datasets, we find that the positioning of American and Spanish politicians on our defined liberal-conservative axis aligns with social common sense, political news, and previous research. Our architecture improves the standard word-level methodology and can be considered as a useful architecture for sentence-level applications in the future.","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1330392"},"PeriodicalIF":2.4000,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11169868/pdf/","citationCount":"0","resultStr":"{\"title\":\"Analyzing political party positions through multi-language twitter text embeddings.\",\"authors\":\"Jinghui Chen, Takayuki Mizuno, Shohei Doi\",\"doi\":\"10.3389/fdata.2024.1330392\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Traditional monolingual word embedding models transform words into high-dimensional vectors which represent semantics relations between words as relationships between vectors in the high-dimensional space. They serve as productive tools to interpret multifarious aspects of the social world in social science research. Building on the previous research which interprets multifaceted meanings of words by projecting them onto word-level dimensions defined by differences between antonyms, we extend the architecture of establishing word-level cultural dimensions to the sentence level and adopt a Language-agnostic BERT model (LaBSE) to detect position similarities in a multi-language environment. We assess the efficacy of our sentence-level methodology using Twitter data from US politicians, comparing it to the traditional word-level embedding model. We also adopt Latent Dirichlet Allocation (LDA) to investigate detailed topics in these tweets and interpret politicians' positions from different angles. In addition, we adopt Twitter data from Spanish politicians and visualize their positions in a multi-language space to analyze position similarities across countries. The results show that our sentence-level methodology outperform traditional word-level model. We also demonstrate that our methodology is effective dealing with fine-sorted themes from the result that political positions towards different topics vary even within the same politicians. Through verification using American and Spanish political datasets, we find that the positioning of American and Spanish politicians on our defined liberal-conservative axis aligns with social common sense, political news, and previous research. Our architecture improves the standard word-level methodology and can be considered as a useful architecture for sentence-level applications in the future.\",\"PeriodicalId\":52859,\"journal\":{\"name\":\"Frontiers in Big Data\",\"volume\":\"7 \",\"pages\":\"1330392\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-05-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11169868/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Big Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fdata.2024.1330392\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Big Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fdata.2024.1330392","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

传统的单语词嵌入模型将词转化为高维向量，将词与词之间的语义关系表示为高维空间中向量之间的关系。在社会科学研究中，它们是解释社会世界多方面的有效工具。以往的研究通过将词语投射到由反义词之间的差异所定义的词语层面维度来解释词语的多层面含义，在此基础上，我们将建立词语层面文化维度的架构扩展到句子层面，并采用语言无关的 BERT 模型（LaBSE）来检测多语言环境中的位置相似性。我们使用美国政治家的 Twitter 数据评估了句子级方法的有效性，并将其与传统的词级嵌入模型进行了比较。我们还采用 Latent Dirichlet Allocation (LDA) 来研究这些推文中的详细主题，并从不同角度解读政治家的立场。此外，我们还采用了西班牙政治家的 Twitter 数据，并在多语言空间中将他们的立场可视化，以分析各国立场的相似性。结果表明，我们的句子级方法优于传统的单词级模型。我们还证明了我们的方法能有效地处理细分类主题，因为即使是同一政治人物，对不同主题的政治立场也各不相同。通过使用美国和西班牙的政治数据集进行验证，我们发现美国和西班牙政治家在我们定义的自由-保守轴线上的定位与社会常识、政治新闻和以往的研究相一致。我们的架构改进了标准的词级方法，可被视为未来句子级应用的有用架构。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Analyzing political party positions through multi-language twitter text embeddings.

Traditional monolingual word embedding models transform words into high-dimensional vectors which represent semantics relations between words as relationships between vectors in the high-dimensional space. They serve as productive tools to interpret multifarious aspects of the social world in social science research. Building on the previous research which interprets multifaceted meanings of words by projecting them onto word-level dimensions defined by differences between antonyms, we extend the architecture of establishing word-level cultural dimensions to the sentence level and adopt a Language-agnostic BERT model (LaBSE) to detect position similarities in a multi-language environment. We assess the efficacy of our sentence-level methodology using Twitter data from US politicians, comparing it to the traditional word-level embedding model. We also adopt Latent Dirichlet Allocation (LDA) to investigate detailed topics in these tweets and interpret politicians' positions from different angles. In addition, we adopt Twitter data from Spanish politicians and visualize their positions in a multi-language space to analyze position similarities across countries. The results show that our sentence-level methodology outperform traditional word-level model. We also demonstrate that our methodology is effective dealing with fine-sorted themes from the result that political positions towards different topics vary even within the same politicians. Through verification using American and Spanish political datasets, we find that the positioning of American and Spanish politicians on our defined liberal-conservative axis aligns with social common sense, political news, and previous research. Our architecture improves the standard word-level methodology and can be considered as a useful architecture for sentence-level applications in the future.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Frontiers in Big Data Multiple-

CiteScore

5.20

自引率

3.20%

发文量

122

审稿时长

13 weeks