Jose Manuel Rivera Espejo, Sven De Maeyer, Steven Gillis
{"title":"Everything, altogether, all at once: Addressing data challenges when measuring speech intelligibility through entropy scores.","authors":"Jose Manuel Rivera Espejo, Sven De Maeyer, Steven Gillis","doi":"10.3758/s13428-024-02457-6","DOIUrl":null,"url":null,"abstract":"<p><p>When investigating unobservable, complex traits, data collection and aggregation processes can introduce distinctive features to the data such as boundedness, measurement error, clustering, outliers, and heteroscedasticity. Failure to collectively address these features can result in statistical challenges that prevent the investigation of hypotheses regarding these traits. This study aimed to demonstrate the efficacy of the Bayesian beta-proportion generalized linear latent and mixed model (beta-proportion GLLAMM) (Rabe-Hesketh et al., Psychometrika, 69(2), 167-90, 2004a, Journal of Econometrics, 128(2), 301-23, 2004c, 2004b; Skrondal and Rabe-Hesketh 2004) in handling data features when exploring research hypotheses concerning speech intelligibility. To achieve this objective, the study reexamined data from transcriptions of spontaneous speech samples initially collected by Boonen et al. (Journal of Child Language, 50(1), 78-103, 2023). The data were aggregated into entropy scores. The research compared the prediction accuracy of the beta-proportion GLLAMM with the normal linear mixed model (LMM) (Holmes et al., 2019) and investigated its capacity to estimate a latent intelligibility from entropy scores. The study also illustrated how hypotheses concerning the impact of speaker-related factors on intelligibility can be explored with the proposed model. The beta-proportion GLLAMM was not free of challenges; its implementation required formulating assumptions about the data-generating process and knowledge of probabilistic programming languages, both central to Bayesian methods. Nevertheless, results indicated the superiority of the model in predicting empirical phenomena over the normal LMM, and its ability to quantify a latent potential intelligibility. Additionally, the proposed model facilitated the exploration of hypotheses concerning speaker-related factors and intelligibility. Ultimately, this research has implications for researchers and data analysts interested in quantitatively measuring intricate, unobservable constructs while accurately predicting the empirical phenomena.</p>","PeriodicalId":4,"journal":{"name":"ACS Applied Energy Materials","volume":null,"pages":null},"PeriodicalIF":5.4000,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11362487/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Energy Materials","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.3758/s13428-024-02457-6","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/7/24 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"CHEMISTRY, PHYSICAL","Score":null,"Total":0}
引用次数: 0
Abstract
When investigating unobservable, complex traits, data collection and aggregation processes can introduce distinctive features to the data such as boundedness, measurement error, clustering, outliers, and heteroscedasticity. Failure to collectively address these features can result in statistical challenges that prevent the investigation of hypotheses regarding these traits. This study aimed to demonstrate the efficacy of the Bayesian beta-proportion generalized linear latent and mixed model (beta-proportion GLLAMM) (Rabe-Hesketh et al., Psychometrika, 69(2), 167-90, 2004a, Journal of Econometrics, 128(2), 301-23, 2004c, 2004b; Skrondal and Rabe-Hesketh 2004) in handling data features when exploring research hypotheses concerning speech intelligibility. To achieve this objective, the study reexamined data from transcriptions of spontaneous speech samples initially collected by Boonen et al. (Journal of Child Language, 50(1), 78-103, 2023). The data were aggregated into entropy scores. The research compared the prediction accuracy of the beta-proportion GLLAMM with the normal linear mixed model (LMM) (Holmes et al., 2019) and investigated its capacity to estimate a latent intelligibility from entropy scores. The study also illustrated how hypotheses concerning the impact of speaker-related factors on intelligibility can be explored with the proposed model. The beta-proportion GLLAMM was not free of challenges; its implementation required formulating assumptions about the data-generating process and knowledge of probabilistic programming languages, both central to Bayesian methods. Nevertheless, results indicated the superiority of the model in predicting empirical phenomena over the normal LMM, and its ability to quantify a latent potential intelligibility. Additionally, the proposed model facilitated the exploration of hypotheses concerning speaker-related factors and intelligibility. Ultimately, this research has implications for researchers and data analysts interested in quantitatively measuring intricate, unobservable constructs while accurately predicting the empirical phenomena.
期刊介绍:
ACS Applied Energy Materials is an interdisciplinary journal publishing original research covering all aspects of materials, engineering, chemistry, physics and biology relevant to energy conversion and storage. The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrate knowledge in the areas of materials, engineering, physics, bioscience, and chemistry into important energy applications.