Automatic modelling of perceptual judges in the context of head and neck cancer speech intelligibility

IF 2.1 | Zone 3, Medicine | Q2, AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY | International Journal of Language & Communication Disorders | Pub Date: 2024-01-18 | DOI: 10.1111/1460-6984.13004

Sebastião Quintas, Mathieu Balaguer, Julie Mauclair, Virginie Woisard, Julien Pinquier

Volume 59, Issue 4, pp. 1422-1435. Publication type: Journal Article. Full text: https://onlinelibrary.wiley.com/doi/10.1111/1460-6984.13004

Citations: 0

Abstract



Automatic modelling of perceptual judges in the context of head and neck cancer speech intelligibility

Background

Perceptual measures such as speech intelligibility are known to be biased, variable and subjective, and automatic approaches have been seen as a more reliable alternative. However, automatic approaches tend to lack explainability, which can hinder the widespread clinical use of these technologies.

Aims

In the present work, we aim to study the relationship between four perceptual parameters and speech intelligibility by automatically modelling the behaviour of six perceptual judges, in the context of head and neck cancer. From this evaluation we want to assess the different levels of relevance of each parameter as well as the different judge profiles that arise, both perceptually and automatically.

Methods and Procedures

Based on a passage reading task from the Carcinologic Speech Severity Index (C2SI) corpus, six expert listeners assessed the voice quality, resonance, prosody and phonemic distortions, as well as the speech intelligibility, of patients treated for oral or oropharyngeal cancer. A statistical analysis was carried out and an ensemble of automatic systems, one per judge, was devised, in which speech intelligibility is predicted as a function of these four perceptual parameters.
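To make the idea of "one system per judge" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain least-squares linear regression per judge, synthetic ratings on an arbitrary 0 to 10 scale, and two hypothetical judges (`judge_A`, `judge_B`) with made-up weights; none of these details come from the paper. The fitted weights, normalised to sum to one, give a rough "judge profile" over the four parameters.

```python
import random

# The four perceptual parameters named in the abstract.
PARAMS = ["voice_quality", "resonance", "prosody", "phonemic_distortions"]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_judge(features, targets):
    """Least-squares fit of targets ~ intercept + weights . features."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return beta[0], beta[1:]


def judge_profile(weights):
    """Relative relevance of each parameter: normalised absolute weights."""
    total = sum(abs(w) for w in weights)
    return {p: abs(w) / total for p, w in zip(PARAMS, weights)}


# Synthetic data (assumption): 40 patients, parameter ratings in [0, 10].
rng = random.Random(0)
ratings = [[rng.uniform(0, 10) for _ in PARAMS] for _ in range(40)]

# Hypothetical judges with made-up weightings of the four parameters.
TRUE_WEIGHTS = {
    "judge_A": [0.1, 0.2, 0.1, 0.6],      # leans on phonemic distortions
    "judge_B": [0.25, 0.25, 0.25, 0.25],  # uniform profile
}

fitted, profiles = {}, {}
for name, true_w in TRUE_WEIGHTS.items():
    # Noise-free intelligibility scores generated from the judge's weights.
    scores = [sum(w * x for w, x in zip(true_w, row)) for row in ratings]
    fitted[name] = fit_judge(ratings, scores)
    profiles[name] = judge_profile(fitted[name][1])
```

Because the synthetic scores are noise-free, each fit recovers its judge's weights almost exactly; with real ratings the weights would only be estimates, and a non-linear model might fit better.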

Outcomes and Results

The results suggest that speech intelligibility can be predicted automatically as a function of the four perceptual parameters, achieving a high correlation of 0.775 (Spearman's ρ). Furthermore, the different judge profiles found perceptually were successfully modelled automatically.
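For readers unfamiliar with the metric, Spearman's ρ is the Pearson correlation computed on the ranks of the two score lists, so it rewards a prediction that orders patients the same way as the reference, regardless of scale. The sketch below is a generic illustration using only the Python standard library; the example scores are invented and are not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: reference intelligibility scores vs. predicted scores.
reference = [1, 2, 3, 4, 5]
predicted = [1.2, 1.9, 3.5, 3.1, 4.8]
rho = spearman_rho(reference, predicted)
print(f"Spearman's rho = {rho:.3f}")  # prints "Spearman's rho = 0.900"
```

Only one pair of patients is ordered differently in the prediction, which is why ρ stays close to 1 in this toy case.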

Conclusions and Implications

The four investigated perceptual parameters influence the global rating of speech intelligibility, and different judge profiles emerge. The proposed automatic approach displayed a more uniform profile across all judges, yielding a more reliable, unbiased and objective prediction. The system also adds an extra layer of interpretability, since speech intelligibility is regressed as a direct function of the individual predictions of the four perceptual parameters, an improvement over more black-box approaches.

WHAT THIS PAPER ADDS

What is already known on this subject

Speech intelligibility is a clinical measure typically used in the post-treatment assessment of speech-affecting disorders, such as head and neck cancer. Its perceptual assessment is currently the main method of evaluation; however, it is known to be quite subjective, since intelligibility can be seen as a combination of other perceptual parameters (voice quality, resonance, etc.). Given this, automatic approaches have been seen as a more viable alternative to the traditionally used perceptual assessments.

What this study adds to existing knowledge

The present work studies the relationship between four perceptual parameters (voice quality, resonance, prosody and phonemic distortions) and speech intelligibility by automatically modelling the behaviour of six perceptual judges. The results suggest that different judge profiles arise, both in the perceptual case and in the automatic models. These profiles reflect the different schools of thought among perceptual judges, whereas the automatic judges display more uniform levels of relevance across the four perceptual parameters. This suggests that an automatic approach promotes unbiased, reliable and more objective predictions.

What are the clinical implications of this work?

The automatic prediction of speech intelligibility, using a combination of four perceptual parameters, shows that these approaches can achieve high correlations with the reference scores while maintaining a certain degree of explainability. The more uniform judge profiles found in the automatic case also show less bias towards any of the four perceptual parameters. This facilitates the clinical implementation of this class of systems, as opposed to the more subjective and harder-to-reproduce perceptual assessments.


Source journal

International Journal of Language & Communication Disorders (AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY-REHABILITATION)

CiteScore: 3.30

Self-citation rate: 12.50%

Articles published: 116

Review time: 6-12 weeks

About the journal: The International Journal of Language & Communication Disorders (IJLCD) is the official journal of the Royal College of Speech & Language Therapists. The Journal welcomes submissions on all aspects of speech, language, communication disorders and speech and language therapy. It provides a forum for the exchange of information and discussion of issues of clinical or theoretical relevance in the above areas.