Gbenga Omotara, Mark L. Berardi, Maria Dietrich, G. DeSouza
2021 IEEE Symposium Series on Computational Intelligence (SSCI), published 2021-12-05
DOI: 10.1109/SSCI50451.2021.9659927 (https://doi.org/10.1109/SSCI50451.2021.9659927)
A Pipeline Consisting of Pattern Recognition and Finite Automata for Recognizing VCV Productions in the Study of Vocal Hyperfunction
Relative fundamental frequency (RFF) is an acoustic measure used in voice science to quantify vocal effort. Because RFF captures the transitions to and from steady-state vowels around an unvoiced consonant, a machine learning approach that recognizes patterns in these transitions would ordinarily need temporal modeling capable of identifying the sequence of phonemes. Neural Networks (NN) have become a ubiquitous solution for data-driven problems, and Recurrent NNs (RNN) provide a time-series schema for time-dependent problems. Indeed, typical Neural Network solutions require either a time-series schema, as in an RNN, or some spectral transformation to handle time-dependent data. In this study, we chose to ignore, at least initially, any time-series dependency in the data and employed a simple NN to classify elements of the speech signal. A State Machine was then used to identify their sequence, with the purpose of localizing the transitions between voiced and unvoiced sounds in vowel-consonant-vowel (VCV) productions. The goal of this study was to demonstrate that a pipeline consisting of time-agnostic (Neural Network) and time-dependent (State Machine) components can be used to recognize time-dependent patterns in VCV productions.
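The time-dependent half of such a pipeline can be illustrated with a small finite state machine. The sketch below is not the authors' implementation: it assumes that a frame-level classifier has already assigned each frame one of three hypothetical labels ("V" for a voiced vowel frame, "C" for an unvoiced consonant frame, "S" for silence or anything else), and the function name `find_vcv` and the state names are likewise illustrative. It reports, for each vowel-consonant-vowel sequence, the frame index where voicing stops and the frame index where it resumes.

```python
from enum import Enum, auto
from typing import List, Tuple


class State(Enum):
    IDLE = auto()       # no vowel seen yet, or the sequence was broken
    VOWEL = auto()      # inside a vowel
    CONSONANT = auto()  # inside an unvoiced consonant that followed a vowel


def find_vcv(labels: List[str]) -> List[Tuple[int, int]]:
    """Return (offset_idx, onset_idx) for every V-C-V sequence in `labels`.

    offset_idx is the first consonant frame (voicing offset);
    onset_idx is the first frame of the following vowel (voicing onset).
    The labels "V", "C", "S" are assumptions made for this sketch.
    """
    state = State.IDLE
    offset = -1
    found: List[Tuple[int, int]] = []

    for i, label in enumerate(labels):
        if state is State.IDLE:
            if label == "V":
                state = State.VOWEL
        elif state is State.VOWEL:
            if label == "C":
                state, offset = State.CONSONANT, i
            elif label != "V":
                state = State.IDLE  # vowel ended without a consonant
        elif state is State.CONSONANT:
            if label == "V":
                found.append((offset, i))  # V-C-V completed
                state = State.VOWEL        # this vowel may start another VCV
            elif label != "C":
                state = State.IDLE  # consonant not followed by a vowel

    return found


# Hypothetical per-frame classifier output for one production.
frames = list("SSVVVCCVVVSS")
print(find_vcv(frames))
```

Because the classifier labels each frame independently, all ordering constraints live in the state machine. A practical version would probably also require a minimum run length before accepting a state change, so that a single misclassified frame does not break or fake a transition; that refinement is omitted here for brevity.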