A database of North American double modals and self-repairs from YouTube

Psychology of Language and Communication, Vol. 26, No. 1, pp. 273-296 (JCR Q2, Arts and Humanities). Published 2022-01-01. DOI: 10.2478/plc-2022-13

Steven Coats





Abstract

Sequences of two modal verbs in spoken English can represent either the use of a nonstandard syntactic feature (double modal) or a corrected utterance in which a speaker begins with one modal auxiliary but switches to another (self-repair). This article presents the Double Modals and Self-Repairs (DMSR) database, a table of naturalistic double modals and self-repairs in videos from local government entities in North America, created from the Corpus of North American Spoken English (CoNASE). The paper describes the procedures used to create the database, discusses potential uses, and presents an exploratory analysis in which a logistic regression classifier is trained on CoNASE data to distinguish authentic double modals from self-repair sequences on the basis of local discourse context. The analysis demonstrates how large corpora of speech can be used to investigate the links between syntactic and pragmatic phenomena, and shows specifically that double modals are an interactive device, while two-modal sequences produced as self-repairs may be the result of high cognitive load. The paper concludes with a discussion of multimodal corpus creation from YouTube for the study of lexical, syntactic, and interactional phenomena in speech, as well as for the analysis of complex, multilevel computer-mediated communication (CMC) phenomena.
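The abstract describes two steps: locating two-modal sequences in transcripts, then training a logistic regression classifier on local discourse context to separate double modals from self-repairs. The following is a minimal, stdlib-only sketch of that general idea under stated assumptions. It is not the author's pipeline. The modal inventory (`MODALS`), the context window size, the bag-of-words feature set, the function names (`find_candidates`, `featurize`, `train_logreg`), and the toy sentences are all hypothetical. The paper's actual search procedure, annotation, and features may differ.

```python
import math
import re
from collections import Counter

# Hypothetical single-word modal inventory (multi-word forms such as
# "ought to" are omitted for brevity); the DMSR search list may differ.
MODALS = {"might", "may", "must", "can", "could", "should",
          "shall", "would", "will"}


def find_candidates(text, window=3):
    """Return ((modal1, modal2), left_context, right_context) for each
    pair of adjacent modal tokens in a transcript string."""
    tokens = re.findall(r"[a-z']+", text.lower())
    candidates = []
    for i in range(len(tokens) - 1):
        if tokens[i] in MODALS and tokens[i + 1] in MODALS:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 2:i + 2 + window]
            candidates.append(((tokens[i], tokens[i + 1]), left, right))
    return candidates


def featurize(candidate):
    """Bag-of-words features from the local context plus the modal pair itself
    (an assumed feature set, chosen for illustration)."""
    (m1, m2), left, right = candidate
    feats = Counter(f"L={w}" for w in left)
    feats.update(f"R={w}" for w in right)
    feats[f"PAIR={m1}_{m2}"] += 1
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, feats):
    """Probability that a candidate is a double modal (label 1)."""
    z = bias + sum(weights[f] * v for f, v in feats.items())
    return sigmoid(z)


def train_logreg(examples, epochs=200, lr=0.1):
    """Fit logistic regression by stochastic gradient ascent on the
    log-likelihood. examples: list of (feature Counter, label), where
    label 1 = double modal and 0 = self-repair."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for feats, y in examples:
            err = y - predict_proba(weights, bias, feats)
            bias += lr * err
            for f, v in feats.items():
                weights[f] += lr * err * v
    return weights, bias


# Invented toy sentences for illustration only (not taken from DMSR/CoNASE).
TOY_DATA = [
    ("well you might could ask the county about that", 1),
    ("i reckon we might could fix it next week", 1),
    ("so we will would like to thank everyone", 0),
    ("the board shall should review the motion", 0),
]

if __name__ == "__main__":
    examples = [(featurize(find_candidates(t)[0]), y) for t, y in TOY_DATA]
    weights, bias = train_logreg(examples)
    for cand in find_candidates("they might could come by later"):
        print(cand[0], round(predict_proba(weights, bias, featurize(cand)), 3))
```

With only four invented examples, the modal-pair feature alone separates the classes. A realistic setup would need a large annotated set of candidates and held-out evaluation, and would likely use a library implementation such as scikit-learn's `LogisticRegression` rather than hand-rolled gradient ascent.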
