Human-Centered Design of Voice Communications: Gender Aspects

J. Holub, Yann Kowalczuk
Published in: Human Interaction and Emerging Technologies (IHIET-AI 2023): Artificial Intelligence and Future Applications
DOI: 10.54941/ahfe1002926 (https://doi.org/10.54941/ahfe1002926)

Abstract

Perceiving transmitted speech places a certain amount of cognitive load on the human brain. The degree of this load depends on several factors, e.g., the loudness of the perceived speech, the type and intensity of background noise, the quality and accent of the speech, and familiarity with the topic of the message. The load also differs depending on whether the speech is in the listener's native or a non-native language. Over longer periods of work (e.g., a work shift), different levels of this load manifest as different levels of overall fatigue, which in turn affects the worker's action or decision error rate when performing other concurrent tasks (the so-called parallel-task paradigm). For technologies used in speech transmission or synthesis, e.g., in telecommunications, radio communications, and machine-to-human communications, this implies a strong need to optimize the coding of human (or synthetic) voice so as to minimize listening effort during communication. Listening effort (LE) can be assessed by subjective tests following, e.g., ITU-T Recommendation P.800, along with listening quality (LQ), which is also specified in P.800. A natural (but nowhere explicitly stated) requirement is that male and female voices are transferred with similar LQ and LE parameters; in other words, the transmission technology, including coding algorithms, frequency filters, and sampling rates, should not privilege one gender over the other, so that working conditions and opportunities remain similar for all.

The subjective test laboratory has performed a gender analysis for all subjective test projects since 2018 to see how (im)balanced the transmission quality between male and female speakers is. The identified imbalance can affect many professionals who rely on remote voice communication in their daily duties. Think of female airport approach control dispatchers or other professionals (e.g., policewomen) who are inherently disadvantaged by technological aspects of their job: worse voice transmission quality means that higher listening effort is needed, which may lead to (subconscious) discomfort of their communication partners, or even to intelligibility issues. Of course, this is not surprising for narrow-band or even old analog AM transmissions (as still used in AIRCOM), where it can only serve as an argument for upgrading the communication means to a suitable digital format. Unfortunately, some contemporary wide-band or even full-band digital communications also show statistically significant differences between the quality of transferred male and female voices. The detailed results will be presented, including interesting systematic language dependencies (English, German, Mandarin).

The conclusions propose suggestions for future codec designs that take the human-centric, gender-balanced requirements into account. These include the minimum frequency response of future coders, the granularity of the perceptual frequency scaling, etc. Suggestions are also included for the gender neutrality of the original (studio-quality) recordings used to prepare the speech samples for the subjective tests.
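
As a rough illustration of the kind of gender analysis described above, the following Python sketch compares per-gender mean opinion scores and computes Welch's t statistic for the gap between them. It is only a sketch under stated assumptions: the abstract does not say which statistical test the laboratory uses, so Welch's t-test is swapped in here as one common choice; the function names are hypothetical; and the ratings are made-up placeholder values on a 5-point scale, not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of the mean, group a
    vb = variance(b) / len(b)  # squared standard error of the mean, group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def gender_gap(male_ratings, female_ratings):
    """Summarise the score gap between male and female talkers (hypothetical helper)."""
    t, df = welch_t(male_ratings, female_ratings)
    mos_male = mean(male_ratings)      # mean opinion score, male talkers
    mos_female = mean(female_ratings)  # mean opinion score, female talkers
    return {
        "mos_male": mos_male,
        "mos_female": mos_female,
        "gap": mos_male - mos_female,
        "t": t,
        "df": df,
    }


# Placeholder 1-5 ratings for one codec condition; NOT results from the paper.
male = [4, 4, 5, 4, 3, 4, 5, 4]
female = [3, 4, 3, 3, 4, 3, 2, 3]

result = gender_gap(male, female)
print(result)
```

The t statistic would then be compared with the critical value for the computed degrees of freedom to decide whether the gap is statistically significant. The same comparison could be run separately for LQ and LE ratings, and per language, under the same assumptions.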