The text-package: An R-package for analyzing and visualizing human language using natural language processing and transformers.

IF 7.6 · Tier 1 (Psychology) · JCR Q1, PSYCHOLOGY, MULTIDISCIPLINARY · Psychological Methods · Pub Date: 2023-12-01 · Epub Date: 2023-05-01 · pp. 1478-1498 · DOI: 10.1037/met0000542

Oscar Kjell, Salvatore Giorgi, H Andrew Schwartz



Abstract

The language that individuals use to express themselves contains rich psychological information. Recent advances in Natural Language Processing (NLP) and Deep Learning (DL), namely transformers, have produced large performance gains on tasks related to understanding natural language. However, these state-of-the-art methods have not yet been made easily accessible to psychology researchers, nor have they been designed for human-level analyses. This tutorial introduces text (https://r-text.org/), a new R-package for analyzing and visualizing human language using transformers, the latest techniques from NLP and DL. The text-package is both a modular solution for accessing state-of-the-art language models and an end-to-end solution tailored to human-level analyses. Hence, text provides user-friendly functions for testing hypotheses in the social sciences on both relatively small and large data sets. The tutorial describes methods for analyzing text, providing functions with reliable defaults that can be used off-the-shelf as well as a framework that advanced users can build on for novel pipelines. The reader learns about three core methods: (1) textEmbed(): to transform text into modern transformer-based word embeddings; (2) textTrain() and textPredict(): to train predictive models with embeddings as input and use the trained models to make predictions; (3) textSimilarity() and textDistance(): to compute semantic similarity/distance scores between texts. The reader also learns about two extended methods: (1) textProjection()/textProjectionPlot() and (2) textCentrality()/textCentralityPlot(): to examine and visualize text within the embedding space. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
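Similarity, distance and centrality scores of the kind named in the abstract are all computed on embedding vectors. The sketch below illustrates the underlying arithmetic in Python; it is not the text-package's R implementation. It assumes cosine similarity as the similarity measure, distance as 1 minus cosine similarity, and a centrality score defined as the similarity between each word's embedding and the mean of all word embeddings. The function names are hypothetical, and the toy 2- and 3-dimensional vectors stand in for the much longer embeddings a transformer returns (768 dimensions for BERT-base).

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Vector, b: Vector) -> float:
    """Assumed distance measure: 1 - cosine similarity (range 0 to 2)."""
    return 1.0 - cosine_similarity(a, b)


def mean_embedding(vectors: Sequence[Vector]) -> List[float]:
    """Aggregate several embeddings by taking their element-wise mean."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def centrality(word_vectors: Dict[str, Vector]) -> Dict[str, float]:
    """Hypothetical centrality: similarity of each word to the mean embedding."""
    centre = mean_embedding(list(word_vectors.values()))
    return {word: cosine_similarity(vec, centre) for word, vec in word_vectors.items()}


if __name__ == "__main__":
    a = [1.0, 0.0, 1.0]  # toy embedding for one text
    b = [1.0, 1.0, 0.0]  # toy embedding for another text
    print(round(cosine_similarity(a, b), 3))  # 0.5
    print(round(cosine_distance(a, b), 3))  # 0.5

    words = {"x": [1.0, 0.0], "y": [1.0, 0.0], "z": [0.0, 1.0]}
    print({w: round(s, 3) for w, s in centrality(words).items()})
    # {'x': 0.894, 'y': 0.894, 'z': 0.447}
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for comparing embeddings of texts that differ in length. In the centrality example, "z" points away from the majority of the words, so it scores lower than "x" and "y".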


Source journal

Psychological Methods (PSYCHOLOGY, MULTIDISCIPLINARY)

CiteScore: 13.10
Self-citation rate: 7.10%
Articles published: 159

Journal description: Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.