The NCCU Corpus of Spoken Chinese: Mandarin, Hakka, and Southern Min

Taiwan Journal of Linguistics (Language & Linguistics), pp. 119-144. Pub Date: 2008-12-01. DOI: 10.6519/TJL.2008.6(2).5. Impact factor: 0.3.
Kawai Chui, Huei-ling Lai
Citations: 40

Abstract

In Taiwan, most people speak Mandarin, Southern Min, or Hakka. Not only are the three Chinese dialects undergoing linguistic changes, but the population of Southern Min and Hakka is also diminishing. The NCCU Corpus of Spoken Chinese is thus a project of language documentation whereby open online access to Mandarin, Hakka, and Southern Min data is provided for non-profit-making research. As a language documentation project, the NCCU spoken corpus focuses on collecting and archiving spoken forms of various types. It consists of three sub-corpora, namely the Corpus of Spoken Mandarin, the Corpus of Spoken Hakka, and the Corpus of Spoken Southern Min. The three corpora share a common scheme for the collection of spoken data, mostly in the form of spontaneous face-to-face conversations. The infrastructure of the corpus is designed in a simple yet user-friendly way, so that data can be processed efficiently in the database, and users can browse the spoken data directly from the web. We hope that our work can encourage more people to engage in building up spoken corpora from different perspectives and for different purposes.
Source journal: Taiwan Journal of Linguistics (Language & Linguistics)
CiteScore: 0.40
Self-citation rate: 0.00%
Articles published: 0
Review time: 20 weeks
Journal description: Taiwan Journal of Linguistics is an international journal dedicated to the publication of research papers in linguistics and welcomes contributions in all areas of the scientific study of language. Contributions may be submitted from all countries and are accepted all year round. The language of publication is English. There are no restrictions on regular submission; however, manuscripts simultaneously submitted to other publications cannot be accepted. TJL adheres to a strict standard of double-blind reviews to minimize biases that might be caused by knowledge of the author’s gender, culture, or standing within the professional community. Once a manuscript is determined as potentially suitable for the journal after an initial screening by the editor, all information that may identify the author is removed, and copies are sent to at least two qualified reviewers. The selection of reviewers is based purely on professional considerations and their identity will be kept strictly confidential by TJL. All feedback from the reviewers, except such comments as may be specifically referred to the attention of the editor, is faithfully relayed to the authors to assist them in improving their work, regardless of whether the paper is to be accepted, accepted upon minor revision, revised and resubmitted, or rejected.
Latest articles from the journal:
The Use of Classifiers in Vietnamese in Typical and Atypical Language Development
Semantics and Syntax of the Passive Construction in Hainan Min
Meaning in Repair: The Abstract Noun Yisi 'Meaning/Intention' in the Management of Intersubjectivity in Mandarin Conversation
Goal-of-Motion Readings in Thai Directional Serial Verbs
The Selection of Negative Words in Taiwanese: The Principles and Pedagogy