{"title":"语音合成软件程序中Luganda文本规范化模块的设计与实现","authors":"Ronald Kizito;Wayne S. Okello;Sulaiman Kagumire","doi":"10.23919/SAIEE.2020.9194384","DOIUrl":null,"url":null,"abstract":"This paper describes a Luganda text normalization module, a crucial component needed for a Luganda Text to Speech system. We describe the use of a rule-based approach for detection, classification and verbalization of Luganda text. At the core of this module are the Luganda grammar rules that were hand-built to normalize Non-Standard Words (NSWs) from different semiotic and noun classes. Input text is first analyzed, matched against handcrafted patterns developed using regular expressions to detect any NSWs. Upon detection, NSWs are tokenized and classified into one of the semiotic classes and then if necessary, into one of the Luganda noun classes. These are subsequently verbalized, each according to its semiotic as well as noun class, and a new text file is produced. We tested the module with 7 datasets and achieved average detection and normalization rates of 82% and 77.7% respectively.","PeriodicalId":42493,"journal":{"name":"SAIEE Africa Research Journal","volume":null,"pages":null},"PeriodicalIF":1.0000,"publicationDate":"2020-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.23919/SAIEE.2020.9194384","citationCount":"2","resultStr":"{\"title\":\"Design and implementation of a Luganda text normalization module for a speech synthesis software program\",\"authors\":\"Ronald Kizito;Wayne S. Okello;Sulaiman Kagumire\",\"doi\":\"10.23919/SAIEE.2020.9194384\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper describes a Luganda text normalization module, a crucial component needed for a Luganda Text to Speech system. We describe the use of a rule-based approach for detection, classification and verbalization of Luganda text. At the core of this module are the Luganda grammar rules that were hand-built to normalize Non-Standard Words (NSWs) from different semiotic and noun classes. Input text is first analyzed, matched against handcrafted patterns developed using regular expressions to detect any NSWs. Upon detection, NSWs are tokenized and classified into one of the semiotic classes and then if necessary, into one of the Luganda noun classes. These are subsequently verbalized, each according to its semiotic as well as noun class, and a new text file is produced. We tested the module with 7 datasets and achieved average detection and normalization rates of 82% and 77.7% respectively.\",\"PeriodicalId\":42493,\"journal\":{\"name\":\"SAIEE Africa Research Journal\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.0000,\"publicationDate\":\"2020-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.23919/SAIEE.2020.9194384\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"SAIEE Africa Research Journal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/9194384/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"SAIEE Africa Research Journal","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/9194384/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Design and implementation of a Luganda text normalization module for a speech synthesis software program
This paper describes a Luganda text normalization module, a crucial component needed for a Luganda Text to Speech system. We describe the use of a rule-based approach for detection, classification and verbalization of Luganda text. At the core of this module are the Luganda grammar rules that were hand-built to normalize Non-Standard Words (NSWs) from different semiotic and noun classes. Input text is first analyzed, matched against handcrafted patterns developed using regular expressions to detect any NSWs. Upon detection, NSWs are tokenized and classified into one of the semiotic classes and then if necessary, into one of the Luganda noun classes. These are subsequently verbalized, each according to its semiotic as well as noun class, and a new text file is produced. We tested the module with 7 datasets and achieved average detection and normalization rates of 82% and 77.7% respectively.