Spontaneous speaking style differs notably from other speaking styles because of various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody variation (e.g., diverse pitch and duration variation, occasional non-verbal speech such as smiling), which makes spontaneous style challenging to model and predict. Moreover, the scarcity of high-quality spontaneous data constrains spontaneous speech generation for speakers without spontaneous data. To address these problems, we propose SponTTS, a two-stage approach based on bottleneck (BN) features to model and transfer spontaneous style for TTS. In the first stage, we adopt a Conditional Variational Autoencoder (CVAE) to capture spontaneous prosody from a BN feature, and we incorporate spontaneous phenomena by constraining the model with a spontaneous phenomena embedding prediction loss. In addition, we introduce a flow-based predictor that predicts a latent spontaneous style representation from the text, which enriches the prosody and context-specific spontaneous phenomena during inference. In the second stage, we adopt a VITS-like module to transfer the spontaneous style learned in the first stage to target speakers. Experiments demonstrate that SponTTS is effective in modeling spontaneous style and transferring it to target speakers, generating spontaneous speech with high naturalness, expressiveness, and speaker similarity. A zero-shot spontaneous style TTS test further verifies the generalization and robustness of SponTTS in generating spontaneous speech for unseen speakers.
{"title":"SponTTS: modeling and transferring spontaneous style for TTS","authors":"Li, Hanzhao, Zhu, Xinfa, Xue, Liumeng, Song, Yang, Chen, Yunlin, Xie, Lei","doi":"10.48550/arxiv.2311.07179","DOIUrl":"https://doi.org/10.48550/arxiv.2311.07179","url":null,"PeriodicalId":496270,"journal":{"name":"arXiv (Cornell University)","volume":"118 15","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136353280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Developing novel high-temperature van der Waals ferromagnetic semiconductor materials and investigating their interface coupling effects with two-dimensional topological semimetals are pivotal for advancing next-generation spintronic and quantum devices. However, most van der Waals ferromagnetic semiconductors exhibit ferromagnetism only at low temperatures, which limits proximity research on their interfaces with topological semimetals. Here, we report an intrinsic, van der Waals layered room-temperature ferromagnetic semiconductor crystal, FeCr0.5Ga1.5Se4 (FCGS), with a Curie temperature as high as 370 K, setting a new record for van der Waals ferromagnetic semiconductors. The saturation magnetization at low temperature (2 K) and room temperature (300 K) reaches 8.2 emu/g and 2.7 emu/g, respectively. Furthermore, FCGS possesses a bandgap of approximately 1.2 eV, which is comparable to that of widely used commercial silicon. The FCGS/graphene heterostructure exhibits an impeccably smooth and gapless interface, thereby inducing a robust magnetic proximity coupling effect between FCGS and graphene. After the proximity coupling, graphene undergoes a charge carrier transition from electrons to holes, accompanied by a transition from non-magnetic to ferromagnetic transport behavior with a robust anomalous Hall effect. Notably, the anomalous Hall effect remains robust even at temperatures up to 400 K.
{"title":"Robust magnetic proximity induced anomalous Hall effect in a room temperature van der Waals ferromagnetic semiconductor based 2D heterostructure","authors":"Wu, Hao, Yang, Li, Zhang, Gaojie, Jin, Wen, Xiao, Bichen, Zhang, Wenfeng, Chang, Haixin","doi":"10.48550/arxiv.2311.07183","DOIUrl":"https://doi.org/10.48550/arxiv.2311.07183","url":null,"PeriodicalId":496270,"journal":{"name":"arXiv (Cornell University)","volume":"118 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136353289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}