Evaluating Descriptive Quality of AI-Generated Audio Using Image-Schemas

Purnima Kamath, Zhuoyao Li, Chitralekha Gupta, Kokil Jaidka, Suranga Nanayakkara, L. Wyse
Published in: Proceedings of the 28th International Conference on Intelligent User Interfaces
Publication date: 2023-03-27
DOI: 10.1145/3581641.3584083 (https://doi.org/10.1145/3581641.3584083)
Citations: 1

Abstract

Novel AI-generated audio samples are evaluated for descriptive qualities, such as the smoothness of a morph, using crowdsourced human listening tests. However, the methods used to design interfaces for such experiments, and to effectively articulate the descriptive audio quality under test, receive very little attention in the evaluation metrics literature. In this paper, we explore the use of visual metaphors of image-schemas to design interfaces to evaluate AI-generated audio. Furthermore, we highlight the importance of framing and contextualizing a descriptive audio quality under measurement using such constructs. Using both pitched sounds and textures, we conduct two sets of experiments to investigate how the quality of responses varies with audio and task complexities. Our results show that, in both cases, by using image-schemas we can improve the quality and consensus of AI-generated audio evaluations. Our findings reinforce the importance of interface design for listening tests and of stationary visual constructs to communicate temporal qualities of AI-generated audio samples, especially to naïve listeners on crowdsourced platforms.