A guide on extracting and tidying tweets with R

Cadernos de Linguística Pub Date : 2021-12-03 DOI:10.25189/2675-4916.2021.v2.n4.id410

J. Adams, Carlos Augusto Jardim Chiarelli

Abstract

Social media platforms are a rich resource for academic research and offer linguists a wide range of untapped possibilities (D'ARCY; YOUNG, 2012). This rapidly developing field raises various ethical issues and poses unique challenges for the methods used to retrieve and analyze data. This tutorial provides a straightforward guide to harvesting and tidying Twitter data, focusing mainly on the text of tweets, using the R programming language (R CORE TEAM, 2020) and Twitter's APIs. The R code was developed in Adams (2020), is based on the rtweet package (KEARNEY, 2018), and resulted in a working script for corpus compilation. In this tutorial, we discuss the limitations, problems, and solutions in our framework for conducting ethical research on this social networking site. Our ethical concerns go beyond what we "agree to" in the terms of use and privacy policies; that is, we argue that the content of those documents does not cover all the concerns researchers need to attend to. Additionally, we aim to show that using Twitter as a data source does not require advanced computational skills.
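
As a rough illustration of what "tidying" tweet text can involve, the following base R sketch removes retweet prefixes, links, and user mentions from a small made-up sample. It is not the script from Adams (2020): the helper name `tidy_tweet_text`, the sample data, and the choice of cleaning steps are all assumptions made for this example, as is the idea that harvested tweets arrive in a data frame with a `text` column.

```r
# Hypothetical helper (not taken from the paper): clean a character vector
# of tweet texts using base R regular expressions only.
tidy_tweet_text <- function(text) {
  text <- gsub("^RT\\s+@\\w+:\\s*", "", text, perl = TRUE)  # drop retweet prefix
  text <- gsub("https?://\\S+", "", text, perl = TRUE)      # drop links
  text <- gsub("@\\w+", "", text, perl = TRUE)              # drop user mentions
  text <- gsub("\\s+", " ", text, perl = TRUE)              # collapse whitespace
  trimws(text)
}

# Made-up sample standing in for harvested data. In practice the tweets
# might come from a call such as rtweet::search_tweets(), which needs
# API credentials and is therefore not run here.
tweets <- data.frame(
  text = c("Loving this corpus!  https://t.co/abc123 @someone",
           "RT @user: new paper out\nhttps://example.org"),
  stringsAsFactors = FALSE
)

tweets$clean <- tidy_tweet_text(tweets$text)
print(tweets$clean)
```

Dropping user mentions is one possible response to the privacy concerns raised above, though whether it is sufficient depends on the research design.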
