Better data for more researchers – using the audio features of BNCweb
S. Hoffmann, Sabine Arndt-Lappe
ICAME Journal: Computers in English Linguistics, pp. 125–154. Journal article, published 2021-05-01.
DOI: 10.2478/icame-2021-0004 (https://doi.org/10.2478/icame-2021-0004)
Citations: 4
Abstract
In spite of the wide agreement among linguists as to the significance of spoken language data, actual speech data have not formed the basis of empirical work on English to the extent one might expect. The present paper is intended to contribute to changing this situation, on both a theoretical and a practical level. On a theoretical level, we discuss different research traditions within (English) linguistics. Whereas speech data have become increasingly important in various linguistic disciplines, the major corpora of English developed within the corpus-linguistic community, carefully sampled to be representative of language usage, are usually restricted to orthographic transcriptions of spoken language. As a result, phonological phenomena have remained conspicuously understudied within traditional corpus linguistics. At the same time, working with current speech corpora often requires a considerable level of specialist knowledge and tailor-made solutions. On a practical level, we present a new feature of BNCweb (Hoffmann et al. 2008), a user-friendly interface to the British National Corpus, which gives users access to audio and phonemic transcriptions of more than five million words of spontaneous speech. With the help of a pilot study on the variability of intrusive r, we illustrate the scope of these new possibilities.