Electroglottography based voice-to-MIDI real time converter with AI voice act classification

E. Donati, Christos Chousidis
2022 IEEE International Symposium on Medical Measurements and Applications (MeMeA), published 2022-06-22
DOI: https://doi.org/10.1109/MeMeA54994.2022.9856413
Citations: 1

Abstract

Real-time voice-to-MIDI conversion is a challenging task that presents a series of obstacles. The main issue is pitch tracking: frequency tracking of the human voice can be inaccurate and computationally expensive because of the spectral complexity of voice sounds. Moreover, in microphone-based systems, environmental noise and neighbouring sounds further degrade the accuracy of the frequency tracking. Another issue with converting voice into MIDI is the presence of non-singing phonemes. Because every sound picked up by the microphone goes through the conversion system, any vocal sound or phoneme produced by the user results in MIDI output. This research addresses these issues with a novel experimental method that uses electroglottography, known to the medical community as EGG, as the source for pitch tracking. Electroglottography improves both the accuracy of the tracking and the ease of processing, because it measures vocal fold activity directly and bypasses contamination from other sound sources. Furthermore, to address the issue of non-singing phonemes, the proposed method uses neural networks to classify, in real time, the vocal act produced by the user.
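The pipeline outlined in the abstract (estimate pitch from a clean periodic signal, suppress non-singing frames, emit a MIDI note) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the zero-crossing period estimator, the A4 = 440 Hz tuning reference, and the names `estimate_f0`, `hz_to_midi`, and `frame_to_note` are all assumptions chosen for illustration, and the `is_singing` flag merely stands in for the output of the neural-network classifier, which the abstract does not specify.

```python
import math

A4_HZ = 440.0   # assumed tuning reference
A4_MIDI = 69    # MIDI note number of A4


def hz_to_midi(freq_hz):
    """Return the nearest MIDI note number (0-127) for a frequency in Hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    note = round(A4_MIDI + 12 * math.log2(freq_hz / A4_HZ))
    return max(0, min(127, note))


def estimate_f0(samples, sample_rate):
    """Estimate the fundamental frequency from rising zero crossings.

    Assumes a zero-mean, roughly periodic frame with one rising zero
    crossing per cycle (a simplification of a real EGG waveform).
    Returns None when fewer than two crossings are found.
    """
    crossings = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0 <= cur:
            # Linear interpolation gives a sub-sample crossing position.
            crossings.append(i - 1 + (-prev) / (cur - prev))
    if len(crossings) < 2:
        return None
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period


def frame_to_note(samples, sample_rate, is_singing):
    """Map one frame to a MIDI note, or None if the frame is suppressed.

    `is_singing` is a hypothetical stand-in for the classifier decision.
    """
    if not is_singing:
        return None
    f0 = estimate_f0(samples, sample_rate)
    return None if f0 is None else hz_to_midi(f0)
```

For example, a frame containing a 220 Hz tone maps to MIDI note 57 (A3) when `is_singing` is true and yields no note otherwise. A zero-crossing estimator is only plausible here because the signal is assumed to be free of the spectral complexity and ambient noise that affect microphone recordings; a production system would likely need a more robust period detector.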