Automatic Prosody Markup Based on Fundamental Frequency
A. Shilkov, S. Ivanov, Maxim Sipatov, V. Golodov
2021 International Russian Automation Conference (RusAutoCon), 2021-09-05
DOI: 10.1109/RusAutoCon52004.2021.9537523
Abstract
Prosody refers to those elements of speech that characterize syllables and larger units of speech. It also covers individual linguistic functions such as rhythm, accent, and intonation. Prosody makes it possible to identify a speaker's vocal personality or characteristics of an utterance, such as the speaker's emotional state or the style of the utterance. Studying dialects or languages often requires prosody markup. This article addresses automatic prosody markup based on the fundamental frequency of the utterance. Because extracting the fundamental frequency is the main challenge, multiple methods are reviewed, including state-of-the-art neural networks and more conservative algorithms. After one of these methods is applied, SLAM markup is obtained. Finally, markups based on fundamental frequencies obtained by different methods are compared.
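The abstract does not name the specific extraction algorithms reviewed, so the following is only a minimal sketch of what a "more conservative" fundamental-frequency estimator might look like: a plain normalized autocorrelation over a single frame. The function name `estimate_f0`, the search range of 75 to 500 Hz, and the voicing threshold of 0.3 are illustrative assumptions, not values taken from the paper, and the sketch does not cover the SLAM markup step.

```python
import math


def estimate_f0(frame, sample_rate, f_min=75.0, f_max=500.0, voicing_threshold=0.3):
    """Estimate F0 in Hz for one frame via normalized autocorrelation.

    Returns None when the frame looks unvoiced (silent or weakly periodic).
    All parameter defaults are illustrative assumptions.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None                        # silent frame

    # Candidate pitch periods, in samples, for the allowed F0 range.
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(n - 1, int(sample_rate / f_min))

    best_lag, best_corr = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        corr = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr

    if best_lag is None or best_corr < voicing_threshold:
        return None                        # treat as unvoiced
    return sample_rate / best_lag


if __name__ == "__main__":
    # Synthetic 200 Hz tone, 50 ms at 8 kHz, standing in for a voiced frame.
    sr = 8000
    tone = [math.sin(2 * math.pi * 200 * i / sr) for i in range(400)]
    print(estimate_f0(tone, sr))
```

Applying such an estimator frame by frame yields an F0 contour, which is the kind of input that a markup stage would then consume. Real speech usually needs extra handling (windowing, octave-error correction, smoothing) that this sketch omits.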