Using word embeddings to investigate human psychology: Methods and applications

H. Bao, Zi-Xi Wang, Xi Cheng, Zhan Su, Ying-Hong Yang, Guang-Yao Zhang, Bo Wang, Hua-Jian Cai
{"title":"Using word embeddings to investigate human psychology: Methods and applications","authors":"H. Bao, Zi-Xi Wang, Xi Cheng, Zhan Su, Ying-Hong Yang, Guang-Yao Zhang, Bo Wang, Hua-Jian Cai","doi":"10.3724/sp.j.1042.2023.00887","DOIUrl":null,"url":null,"abstract":": As a basic technique in natural language processing (NLP), word embedding represents a word with a low-dimensional, dense, and continuous numeric vector (i.e., word vector). Word embeddings can be obtained by using neural network algorithms to predict words from the surrounding words or vice versa (Word2Vec and FastText) or words’ probability of co-occurrence (GloVe) in large-scale text corpora. In this case, the values of dimensions of a word vector denote the pattern of how a word can be predicted in a context, substantially connoting its semantic information. Therefore, word embeddings can be utilized for semantic analyses of text. In recent years, word embeddings have been rapidly employed to study human psychology, including human semantic processing, cognitive judgment, individual divergent thinking (creativity), group-level social cognition, sociocultural changes, and so forth. We have developed the R package “PsychWordVec” to help researchers utilize and analyze word embeddings in a tidy approach. Future research using word embeddings should (1) distinguish between implicit and explicit components of social cognition, (2) train fine-grained word vectors in terms of time and region to facilitate cross-temporal and cross-cultural research, and (3) deepen and expand the application of contextualized word embeddings and large pre-trained language models such as GPT and BERT","PeriodicalId":62025,"journal":{"name":"心理科学进展","volume":"32 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"心理科学进展","FirstCategoryId":"95","ListUrlMain":"https://doi.org/10.3724/sp.j.1042.2023.00887","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

As a basic technique in natural language processing (NLP), word embedding represents a word with a low-dimensional, dense, continuous numeric vector (i.e., a word vector). Word embeddings can be obtained by training neural network algorithms to predict a word from its surrounding words, or vice versa (Word2Vec and FastText), or from word co-occurrence probabilities (GloVe) in large-scale text corpora. The values along the dimensions of a word vector thus encode the pattern by which a word can be predicted from its context, and thereby capture its semantic information. Therefore, word embeddings can be used for semantic analyses of text. In recent years, word embeddings have been rapidly adopted to study human psychology, including semantic processing, cognitive judgment, individual divergent thinking (creativity), group-level social cognition, sociocultural change, and so forth. We have developed the R package "PsychWordVec" to help researchers use and analyze word embeddings in a tidy workflow. Future research using word embeddings should (1) distinguish between implicit and explicit components of social cognition, (2) train word vectors that are fine-grained in time and region to facilitate cross-temporal and cross-cultural research, and (3) deepen and expand the application of contextualized word embeddings and large pre-trained language models such as GPT and BERT.
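
The abstract notes that word embeddings can be queried for semantic similarity and used to measure group-level (implicit) social cognition. Below is a minimal sketch of both ideas in Python with the gensim library rather than the authors' PsychWordVec R package; the pre-trained GloVe model name, the word sets, and the WEAT-style effect size function are illustrative assumptions, not material from the paper.

```python
# Minimal sketch (not the authors' PsychWordVec pipeline): query pre-trained
# word embeddings for semantic similarity and compute a WEAT-style effect size
# as one way to quantify group-level (implicit) associations in text.
import numpy as np
import gensim.downloader as api

# Pre-trained 50-dimensional GloVe vectors (illustrative model choice).
wv = api.load("glove-wiki-gigaword-50")

# Semantic similarity = cosine similarity between word vectors.
print(wv.similarity("doctor", "nurse"))       # higher = semantically closer
print(wv.most_similar("psychology", topn=5))  # nearest neighbours in vector space

def weat_effect_size(X, Y, A, B):
    """WEAT-style effect size: do target sets X vs. Y differ in their
    association with attribute sets A vs. B?"""
    def s(w):
        # Differential association of one word with the two attribute sets.
        return (np.mean([wv.similarity(w, a) for a in A])
                - np.mean([wv.similarity(w, b) for b in B]))
    x_scores = [s(x) for x in X]
    y_scores = [s(y) for y in Y]
    pooled_sd = np.std(x_scores + y_scores, ddof=1)
    return (np.mean(x_scores) - np.mean(y_scores)) / pooled_sd

# Illustrative word sets (flowers vs. insects, pleasant vs. unpleasant).
flowers    = ["rose", "tulip", "daisy", "lily"]
insects    = ["ant", "spider", "moth", "wasp"]
pleasant   = ["love", "peace", "pleasure", "friend"]
unpleasant = ["hatred", "pain", "ugly", "enemy"]
print(weat_effect_size(flowers, insects, pleasant, unpleasant))
```

The same kind of association test, run on word vectors trained separately by period or region, is what the abstract's call for temporally and regionally fine-grained embeddings would support.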