Everything, altogether, all at once: Addressing data challenges when measuring speech intelligibility through entropy scores.

IF 5.4; Zone 3 (Materials Science); JCR Q2 (Chemistry, Physical); ACS Applied Energy Materials; Pub Date: 2024-10-01; Epub Date: 2024-07-24; DOI: 10.3758/s13428-024-02457-6
Jose Manuel Rivera Espejo, Sven De Maeyer, Steven Gillis

Abstract

When investigating unobservable, complex traits, data collection and aggregation processes can introduce distinctive features into the data, such as boundedness, measurement error, clustering, outliers, and heteroscedasticity. Failure to address these features collectively can result in statistical challenges that prevent the investigation of hypotheses regarding these traits. This study aimed to demonstrate the efficacy of the Bayesian beta-proportion generalized linear latent and mixed model (beta-proportion GLLAMM) (Rabe-Hesketh et al., Psychometrika, 69(2), 167-90, 2004a; Journal of Econometrics, 128(2), 301-23, 2004c; 2004b; Skrondal and Rabe-Hesketh, 2004) in handling these data features when exploring research hypotheses concerning speech intelligibility. To achieve this objective, the study reexamined data from transcriptions of spontaneous speech samples initially collected by Boonen et al. (Journal of Child Language, 50(1), 78-103, 2023). The data were aggregated into entropy scores. The research compared the prediction accuracy of the beta-proportion GLLAMM with that of the normal linear mixed model (LMM) (Holmes et al., 2019) and investigated its capacity to estimate latent intelligibility from entropy scores. The study also illustrated how hypotheses concerning the impact of speaker-related factors on intelligibility can be explored with the proposed model. The beta-proportion GLLAMM was not free of challenges: its implementation required formulating assumptions about the data-generating process and knowledge of probabilistic programming languages, both central to Bayesian methods. Nevertheless, results indicated that the model predicted empirical phenomena better than the normal LMM and was able to quantify a latent potential intelligibility. Additionally, the proposed model facilitated the exploration of hypotheses concerning speaker-related factors and intelligibility. Ultimately, this research has implications for researchers and data analysts interested in quantitatively measuring intricate, unobservable constructs while accurately predicting empirical phenomena.
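
As a rough illustration of why entropy scores are bounded, the sketch below computes a normalized Shannon entropy over several listeners' transcriptions of one word. It is not the authors' code: the function names and the choice to normalize by the log of the number of listeners are assumptions made for this example. The second helper shows the mean/precision form that the name "beta-proportion" commonly refers to (for example, Stan's `beta_proportion` distribution), converted to the usual Beta shape parameters.

```python
import math
from collections import Counter


def normalized_entropy(transcriptions):
    """Shannon entropy of a set of transcriptions, scaled to [0, 1].

    Assumption for this sketch: the entropy is divided by log2 of the
    number of listeners, so 0 means all listeners agreed and 1 means
    every listener wrote something different.
    """
    n = len(transcriptions)
    if n < 2:
        raise ValueError("need at least two transcriptions")
    counts = Counter(transcriptions)
    # H = sum over distinct transcriptions of p * log2(1 / p), with p = c / n
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return entropy / math.log2(n)


def beta_shape_parameters(mean, precision):
    """Convert a mean/precision pair into Beta(a, b) shape parameters.

    a = mean * precision and b = (1 - mean) * precision, so the
    distribution has the requested mean on the open interval (0, 1).
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if precision <= 0.0:
        raise ValueError("precision must be positive")
    return mean * precision, (1.0 - mean) * precision


if __name__ == "__main__":
    # Hypothetical transcriptions of one word by four listeners.
    print(normalized_entropy(["ball", "ball", "ball", "ball"]))  # full agreement
    print(normalized_entropy(["ball", "ball", "doll", "doll"]))  # split in half
    print(normalized_entropy(["ball", "doll", "bowl", "bell"]))  # no agreement
    print(beta_shape_parameters(0.5, 10.0))
```

Because the score always falls in the closed interval from 0 to 1, a normal likelihood can place probability mass on impossible values, which is one reason a bounded distribution such as the beta is a natural candidate for this kind of outcome.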
