RhySpeech: A Deployable Rhythmic Text-to-Speech Based on Feed-Forward Transformer for Reading Disabilities

Yi-Hsien Lin
Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning, published 2023-03-17
DOI: https://doi.org/10.1145/3590003.3590062

Abstract

Dyslexia was first described in 1877, but this century-old problem still troubles many people today [1]. Dyslexia, marked by difficulty in reading despite normal or superior environmental conditions and intellectual ability, is curable using multi-sensory learning, which involves providing audio stimuli, sometimes generated by expressive text-to-speech. However, such generated audio lacks rhythmic features, a shortcoming marked by inadequate insertion of pauses. In response to this technical difficulty, this paper proposes RhySpeech, which models rhythm using feed-forward Transformer neural networks and an LRV (Latent Rhythm Vector). The LRV receives as input the pitch, energy, and duration features, encoded using a Transformer network, along with the numeric encoding of the previous 16 phonemes; together these build a strong sense of context for pause prediction. The LRV is trained to generate adequate lengths and positions of pauses, allowing the synthesized audio to have more accurate pausing.
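
As a rough illustration of the context described above, the following minimal Python sketch assembles one input vector for a pause predictor from prosody features and the preceding 16 phoneme IDs. It is not the paper's implementation: the abstract does not say how the inputs are combined, so plain concatenation is an assumption, as are the names `phoneme_context`, `lrv_input`, and `PAD_ID`, the left-padding at the start of an utterance, and the reading of "previous" as excluding the current phoneme. The `prosody` rows stand in for the Transformer-encoded pitch, energy, and duration features.

```python
from typing import List, Sequence

CONTEXT = 16  # number of previous phonemes, as stated in the abstract
PAD_ID = 0    # hypothetical padding id for positions before the utterance start


def phoneme_context(ids: Sequence[int], t: int, size: int = CONTEXT) -> List[int]:
    """Return the `size` phoneme ids preceding position t, left-padded with PAD_ID."""
    window = list(ids[max(0, t - size):t])
    return [PAD_ID] * (size - len(window)) + window


def lrv_input(ids: Sequence[int], prosody: Sequence[Sequence[float]], t: int) -> List[float]:
    """Concatenate the prosody features at position t with the phoneme context.

    Concatenation is an assumption made for illustration only.
    """
    return list(prosody[t]) + [float(p) for p in phoneme_context(ids, t)]


if __name__ == "__main__":
    ids = [5, 7, 9]                       # toy phoneme ids
    prosody = [[0.1, 0.2, 0.3]] * 3       # placeholder encoded (pitch, energy, duration)
    vec = lrv_input(ids, prosody, 2)
    print(len(vec))                       # 3 prosody values + 16 context ids
```

A learned pause predictor would consume vectors like `vec` and output whether a pause follows the phoneme and how long it lasts; that model is outside the scope of this sketch.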