Everything, altogether, all at once: Addressing data challenges when measuring speech intelligibility through entropy scores.

Behavior Research Methods (IF 5; Region 2, Psychology; JCR Q1, Psychology, Experimental). Pub Date: 2024-10-01. Epub Date: 2024-07-24. DOI: 10.3758/s13428-024-02457-6

Jose Manuel Rivera Espejo, Sven De Maeyer, Steven Gillis

Pages: 8132-8154. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11362487/pdf/

Abstract

When investigating unobservable, complex traits, data collection and aggregation processes can introduce distinctive features into the data, such as boundedness, measurement error, clustering, outliers, and heteroscedasticity. Failing to address these features collectively can create statistical challenges that prevent the investigation of hypotheses regarding these traits. This study aimed to demonstrate the efficacy of the Bayesian beta-proportion generalized linear latent and mixed model (beta-proportion GLLAMM) (Rabe-Hesketh et al., Psychometrika, 69(2), 167-90, 2004a, Journal of Econometrics, 128(2), 301-23, 2004c, 2004b; Skrondal and Rabe-Hesketh 2004) in handling these data features when exploring research hypotheses concerning speech intelligibility. To achieve this objective, the study reexamined data from transcriptions of spontaneous speech samples initially collected by Boonen et al. (Journal of Child Language, 50(1), 78-103, 2023). The data were aggregated into entropy scores. The research compared the prediction accuracy of the beta-proportion GLLAMM with that of the normal linear mixed model (LMM) (Holmes et al., 2019) and investigated its capacity to estimate latent intelligibility from entropy scores. The study also illustrated how hypotheses concerning the impact of speaker-related factors on intelligibility can be explored with the proposed model. The beta-proportion GLLAMM was not free of challenges; its implementation required formulating assumptions about the data-generating process and knowledge of probabilistic programming languages, both central to Bayesian methods. Nevertheless, results indicated that the model was superior to the normal LMM in predicting empirical phenomena and was able to quantify a latent potential intelligibility. Additionally, the proposed model facilitated the exploration of hypotheses concerning speaker-related factors and intelligibility. Ultimately, this research has implications for researchers and data analysts interested in quantitatively measuring intricate, unobservable constructs while accurately predicting empirical phenomena.
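
To make the two central quantities more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that an entropy score is the Shannon entropy of the words several listeners transcribed for the same target word, divided by its maximum so that it is bounded between 0 and 1, and that "beta-proportion" refers to the beta distribution parameterized by a mean and a precision. The function names (`entropy_score`, `utterance_entropy`, `beta_proportion_logpdf`) and the example words are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy_score(transcriptions):
    """Normalized Shannon entropy of one target word across listeners.

    0.0 means every listener wrote the same word (full agreement);
    1.0 means every listener wrote a different word (no agreement).
    Assumed definition; the abstract does not spell out the formula.
    """
    n = len(transcriptions)
    if n < 2:
        raise ValueError("need at least two transcriptions to measure agreement")
    counts = Counter(t.strip().lower() for t in transcriptions)
    # Shannon entropy (bits) of the empirical distribution of transcribed words.
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    # log2(n) is the largest entropy n listeners can produce, so the ratio is in [0, 1].
    return h / math.log2(n)


def utterance_entropy(word_columns):
    """Average the per-word entropy scores of one utterance."""
    scores = [entropy_score(column) for column in word_columns]
    return sum(scores) / len(scores)


def beta_proportion_logpdf(y, mu, kappa):
    """Log density of a beta distribution with mean mu and precision kappa.

    Equivalent to Beta(a, b) with a = mu * kappa and b = (1 - mu) * kappa.
    Only defined for 0 < y < 1, which is one reason bounded scores that hit
    exactly 0 or 1 need extra care in a full model.
    """
    a, b = mu * kappa, (1.0 - mu) * kappa
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y)


if __name__ == "__main__":
    # Four hypothetical listeners transcribing the same target word.
    print(entropy_score(["dog", "dog", "dog", "fog"]))
    print(beta_proportion_logpdf(0.4, mu=0.3, kappa=10.0))
```

Because such a score lives on the unit interval and its variance shrinks near the bounds, a likelihood with support on (0, 1) is a more natural fit than a normal one, which is in line with the boundedness and heteroscedasticity issues the abstract lists.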

Source journal: Behavior Research Methods

CiteScore: 10.30
Self-citation rate: 9.30%
Articles published: 266

Journal description: Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.