Poisson regression for linguists: A tutorial introduction to modelling count data with brms

Language and Linguistics Compass (LANGUAGE & LINGUISTICS) | IF 2.8 | Pub Date: 2021-11-16 | DOI: 10.1111/lnc3.12439

Bodo Winter, Paul-Christian Bürkner

Volume 15, Issue 11 | Full text: https://onlinelibrary.wiley.com/doi/10.1111/lnc3.12439 | PDF: https://compass.onlinelibrary.wiley.com/doi/epdf/10.1111/lnc3.12439

Citations: 26

Abstract

Poisson regression for linguists: A tutorial introduction to modelling count data with brms

Count data is prevalent in many different areas of linguistics, such as when counting words, syntactic constructions, discourse particles, case markers, or speech errors. The Poisson distribution is the canonical distribution for characterising count data with no or unknown upper bound. Given the prevalence of count data in linguistics, Poisson regression has wide utility no matter what subfield of linguistics is considered. However, in contrast to logistic regression, Poisson regression is surprisingly little known. Here, we make a case for why linguists need to consider Poisson regression, and give recommendations for when Poisson regression is more appropriate compared to logistic regression. This tutorial introduces readers to foundational concepts needed to understand the basics of Poisson regression, followed by a hands-on tutorial using the R package brms. We discuss a dataset where Catalan and Korean speakers change the frequency of their co-speech gestures as a function of politeness contexts. This dataset also involves exposure variables (the incorporation of time to deal with unequal intervals) and overdispersion (excess variance). Altogether, we hope that more linguists will consider Poisson regression for the analysis of count data.
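
The two technical ideas named in the abstract, exposure variables and overdispersion, can be illustrated with a minimal sketch. It is written in plain Python rather than brms, it is not the authors' analysis, and it fits no Bayesian model. The trial data, the condition labels and the helper names `fit_rates` and `pearson_dispersion` are all hypothetical and chosen only for illustration. The sketch assumes a Poisson model with a log link, a single categorical predictor and log(duration) as an offset; in that special case the maximum-likelihood rate per condition is simply the total count divided by the total exposure.

```python
import math

# Hypothetical toy data (NOT from the paper): one row per trial with a
# condition label, a gesture count, and the trial duration in seconds.
# The duration plays the role of the exposure variable.
trials = [
    ("polite", 4, 20.0),
    ("polite", 7, 40.0),
    ("polite", 1, 10.0),
    ("informal", 12, 30.0),
    ("informal", 5, 10.0),
    ("informal", 9, 20.0),
]


def fit_rates(trials):
    """Per-condition Poisson rates (events per second).

    For log(mu) = log(duration) + b[condition], the maximum-likelihood
    estimate is closed-form: total count / total exposure per condition.
    """
    totals = {}
    for condition, count, duration in trials:
        c, d = totals.get(condition, (0, 0.0))
        totals[condition] = (c + count, d + duration)
    return {condition: c / d for condition, (c, d) in totals.items()}


def pearson_dispersion(trials, rates):
    """Pearson chi-square divided by the residual degrees of freedom.

    A Poisson model implies variance == mean, so values well above 1
    are a common informal warning sign of overdispersion.
    """
    chi2 = 0.0
    for condition, count, duration in trials:
        mu = rates[condition] * duration  # expected count for this trial
        chi2 += (count - mu) ** 2 / mu
    residual_df = len(trials) - len(rates)
    return chi2 / residual_df


rates = fit_rates(trials)
# Difference between conditions on the log scale, i.e. the coefficient a
# Poisson regression would report for the condition contrast.
log_rate_ratio = math.log(rates["informal"] / rates["polite"])
dispersion = pearson_dispersion(trials, rates)

print(rates)
print(log_rate_ratio, dispersion)
```

Dividing by the summed durations is what keeps long trials from looking more gesture-heavy simply because they last longer. With these made-up numbers the dispersion statistic falls below 1; real linguistic count data often shows the opposite pattern, which is when alternatives to the plain Poisson model, such as the negative binomial, are usually considered.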

Source journal

Language and Linguistics Compass (LANGUAGE & LINGUISTICS)

CiteScore: 5.40

Self-citation rate: 4.00%

Journal description: Unique in its range, Language and Linguistics Compass is an online-only journal publishing original, peer-reviewed surveys of current research from across the entire discipline. Language and Linguistics Compass publishes state-of-the-art reviews, supported by a comprehensive bibliography and accessible to an international readership. Language and Linguistics Compass is aimed at senior undergraduates, postgraduates and academics, and will provide a unique reference tool for researching essays, preparing lectures, writing a research proposal, or just keeping up with new developments in a specific area of interest.

Latest articles in this journal

A Guide to Build (ING) GLMM Trees in Canadian Maritime English: Part 2, Linguistic Factors

From Psycholinguistics to Computer Vision. A Comprehensive Review of Object Naming Data and Studies

Phonetics and Phonology of Voiceless Nasals

Issue Information