The text-package: An R-package for analyzing and visualizing human language using natural language processing and transformers
Authors: Oscar Kjell, Salvatore Giorgi, H Andrew Schwartz
Journal: Psychological Methods, pp. 1478-1498. Published 2023-12-01 (Epub 2023-05-01).
DOI: 10.1037/met0000542 (https://doi.org/10.1037/met0000542)
Citation count: 0
Abstract
The language that individuals use to express themselves contains rich psychological information. Recent advances in Natural Language Processing (NLP) and Deep Learning (DL), namely transformers, have resulted in large performance gains in tasks related to understanding natural language. However, these state-of-the-art methods have not yet been made easily accessible to psychology researchers, nor have they been designed to be optimal for human-level analyses. This tutorial introduces text (https://r-text.org/), a new R-package for analyzing and visualizing human language using transformers, the latest techniques from NLP and DL. The text-package is both a modular solution for accessing state-of-the-art language models and an end-to-end solution designed for human-level analyses. Hence, text provides user-friendly functions tailored to testing hypotheses in the social sciences for both relatively small and large data sets. The tutorial describes methods for analyzing text, providing functions with reliable defaults that can be used off-the-shelf as well as a framework that advanced users can build on for novel pipelines. The reader learns about three core methods: (1) textEmbed(): to transform text to modern transformer-based word embeddings; (2) textTrain() and textPredict(): to train predictive models with embeddings as input and to use those models for prediction; (3) textSimilarity() and textDistance(): to compute semantic similarity/distance scores between texts. The reader also learns about two extended methods: (1) textProjection()/textProjectionPlot() and (2) textCentrality()/textCentralityPlot(): to examine and visualize text within the embedding space. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
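To make the idea of a semantic similarity or distance score between texts concrete, here is a minimal sketch in Python. It does not call the text-package. The abstract does not say which measure textSimilarity() or textDistance() uses, so cosine similarity is assumed here as a common choice for comparing embeddings. The function names and the tiny four-dimensional vectors are hypothetical stand-ins for real transformer embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Distance counterpart of the similarity score (assumed: 1 - similarity)."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical toy "embeddings"; real transformer embeddings typically
# have hundreds of dimensions and come from a language model.
emb_happy = [0.9, 0.1, 0.3, 0.0]
emb_joyful = [0.8, 0.2, 0.4, 0.1]
emb_tired = [0.1, 0.9, 0.0, 0.4]

sim_close = cosine_similarity(emb_happy, emb_joyful)
sim_far = cosine_similarity(emb_happy, emb_tired)

print(f"happy vs joyful: {sim_close:.3f}")
print(f"happy vs tired:  {sim_far:.3f}")
```

In this toy setup, vectors that point in similar directions receive a higher score, which is the intuition behind comparing texts through their embeddings.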
About the journal:
Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.