Toward Constructing A Multilingual Speech Corpus for Taiwanese (Min-nan), Hakka, and Mandarin Chinese
Ren-Yuan Lyu, Min-Siong Liang, Yuang-Chin Chiang
Int. J. Comput. Linguistics Chin. Lang. Process., published 2004-08-01. DOI: 10.30019/IJCLCLP.200408.0001
The Formosa speech database (ForSDat) is a multilingual speech corpus being collected at Chang Gung University and sponsored by the National Science Council of Taiwan. It is intended to cover the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin. The goal of this 3-year project is a phonetically rich speech corpus of more than 1,800 speakers and hundreds of hours of speech. The first version of the corpus, containing speech from 600 speakers of Taiwanese and Mandarin, has been completed and is ready for release. It contains about 49 hours of speech and 247,000 utterances.
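To put the first-release figures in perspective, the short sketch below derives rough averages from the three numbers quoted in the abstract. The function name and dictionary keys are illustrative choices, not part of ForSDat, and because the abstract gives "about 49 hours" the results are only approximate.

```python
# Figures quoted in the abstract for the first release (all approximate).
HOURS = 49
UTTERANCES = 247_000
SPEAKERS = 600


def release_averages(hours, utterances, speakers):
    """Return rough per-utterance and per-speaker averages (hypothetical helper)."""
    total_seconds = hours * 3600
    return {
        "seconds_per_utterance": total_seconds / utterances,
        "utterances_per_speaker": utterances / speakers,
        "minutes_per_speaker": hours * 60 / speakers,
    }


stats = release_averages(HOURS, UTTERANCES, SPEAKERS)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under these figures the first release averages roughly 0.71 seconds per utterance, and each speaker contributes on average about 412 utterances and 4.9 minutes of speech.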