Towards Integration of Embodiment Features for Prosodic Prominence Prediction from Text
P. Madhyastha
Companion Publication of the 2022 International Conference on Multimodal Interaction, published 2022-11-07
DOI: 10.1145/3536220.3558540
Citations: 0
Abstract
Prosodic prominence prediction is an important task in speech processing and an essential component of modern text-to-speech systems. Previous work has largely focused on acoustic and linguistic features (such as syntactic and semantic features) for predicting prosodic prominence. However, human models of prosody are known to be highly multimodal and grounded in the denotations of physical entities and in embodied experience. In this paper, we present a first study that integrates multimodal sensorimotor associations, drawn from the Lancaster Sensorimotor Norms, into prosodic prominence prediction. Our results highlight the importance of sensorimotor knowledge, especially for models in low-data regimes, where we show that it improves performance by a significant margin.
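The abstract does not say how the norms are turned into model inputs. As a minimal sketch under stated assumptions, one common way to use word-level norms is to look up each token's ratings and append them to its other per-token features. Everything below is a hypothetical illustration, not the paper's method. That includes the dimension key names (the Lancaster norms rate words on six perceptual modalities and five action effectors), the division by a 0-5 maximum rating, the out-of-vocabulary flag, and the helpers `sensorimotor_features` and `concat_features`. Loading the real dataset is left out.

```python
from typing import Dict, List, Sequence

# Hypothetical key names for the 11 Lancaster dimensions:
# six perceptual modalities followed by five action effectors.
DIMENSIONS: List[str] = [
    "auditory", "gustatory", "haptic", "interoceptive", "olfactory", "visual",
    "foot_leg", "hand_arm", "head", "mouth", "torso",
]

MAX_RATING = 5.0  # assumption: ratings lie on a 0-5 scale and are scaled to [0, 1]


def sensorimotor_features(
    tokens: Sequence[str],
    norms: Dict[str, Dict[str, float]],
) -> List[List[float]]:
    """Return one vector per token: 11 scaled ratings plus an out-of-vocabulary flag."""
    features: List[List[float]] = []
    for tok in tokens:
        entry = norms.get(tok.lower())
        if entry is None:
            # Word not covered by the norms: all-zero ratings, OOV flag set to 1.
            features.append([0.0] * len(DIMENSIONS) + [1.0])
        else:
            # Missing dimensions default to 0; OOV flag set to 0.
            scaled = [entry.get(d, 0.0) / MAX_RATING for d in DIMENSIONS]
            features.append(scaled + [0.0])
    return features


def concat_features(
    base: Sequence[Sequence[float]],
    extra: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate per-token base features (e.g. word embeddings) with norm features."""
    if len(base) != len(extra):
        raise ValueError("base and extra must have one row per token")
    return [list(b) + list(e) for b, e in zip(base, extra)]


if __name__ == "__main__":
    # Toy ratings for illustration only; these are NOT values from the norms.
    toy_norms = {"bell": {"auditory": 4.5, "visual": 3.0, "hand_arm": 2.0}}
    tokens = ["The", "bell", "rang"]
    feats = sensorimotor_features(tokens, toy_norms)
    toy_embeddings = [[0.1, 0.2]] * len(tokens)  # stand-in for real embeddings
    for tok, row in zip(tokens, concat_features(toy_embeddings, feats)):
        print(tok, [round(x, 2) for x in row])
```

The explicit OOV flag is there because a word-level norm list will not cover every token. It lets a downstream prominence classifier tell "no rating available" apart from "rated zero on every dimension".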