Studying text coherence in Czech – a corpus-based analysis

Topics in Linguistics, vol. 18, no. 1, pp. 36-47. Published 2017-12-20. DOI: 10.1515/topling-2017-0009
Magdaléna Rysová

Abstract

The paper belongs to the field of Czech corpus linguistics and is one of several current studies that analyse text coherence through the interaction of language phenomena. It presents a corpus-based analysis of grammatical coreference and sentence information structure (in terms of contextual boundness) in Czech, focusing on how these two phenomena interact and where they jointly contribute to text structuring. Specifically, the paper analyses contextually bound and non-bound sentence items and examines whether, and how often, they are involved in relations of grammatical coreference in Czech newspaper articles. The analysis is carried out on the data of the Prague Dependency Treebank (PDT), which contains 3,165 Czech texts. The results are useful for automatic text annotation: the paper shows to what extent the annotation of grammatical coreference can be used for automatic (pre-)annotation of sentence information structure in Czech, that is, how accurately the value of contextual boundness can be (automatically) assigned to the antecedent and the anaphor, the two participants of a grammatical coreference relation. The results show that the contextual boundness of the anaphor of grammatical coreference is automatically predictable: the anaphor is a non-contrastive contextually bound sentence item in 99.18% of cases. By contrast, the contextual boundness of the antecedent is much harder to estimate: according to the PDT, the antecedent is contextually non-bound in 37% of cases, non-contrastive contextually bound in 50% and contrastive contextually bound in 13%.
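The pre-annotation use described above can be illustrated with a minimal Python sketch. It assumes a simple majority-class rule built only from the percentages reported in the abstract; it is an illustration, not the author's procedure. The data format (pairs of node IDs), the function names and the `min_accuracy` threshold are hypothetical, and the `t`/`c`/`f` labels are assumed to follow the PDT `tfa` convention.

```python
# Shares of contextual-boundness values reported in the abstract (PDT data).
# Assumed labels, following the PDT "tfa" convention:
#   "t" = non-contrastive contextually bound
#   "c" = contrastive contextually bound
#   "f" = contextually non-bound
ANAPHOR_DIST = {"t": 0.9918}  # the remaining 0.82% is not broken down in the abstract
ANTECEDENT_DIST = {"f": 0.37, "t": 0.50, "c": 0.13}


def majority_label(dist: dict[str, float]) -> tuple[str, float]:
    """Return the most frequent label and its share.

    The share equals the expected accuracy of always predicting that label.
    """
    label = max(dist, key=dist.get)
    return label, dist[label]


def preannotate(links, min_accuracy=0.95):
    """Propose a tfa label for both participants of each coreference link.

    `links` is a list of hypothetical (anaphor_id, antecedent_id) pairs.
    Each proposal is (label, needs_review); needs_review is True when the
    expected accuracy of the rule falls below `min_accuracy`.
    """
    ana_label, ana_acc = majority_label(ANAPHOR_DIST)
    ant_label, ant_acc = majority_label(ANTECEDENT_DIST)
    proposals = {}
    # Antecedents first, so that a node which is also an anaphor in another
    # link is overwritten by the more reliable anaphor-based proposal.
    for _, antecedent_id in links:
        proposals[antecedent_id] = (ant_label, ant_acc < min_accuracy)
    for anaphor_id, _ in links:
        proposals[anaphor_id] = (ana_label, ana_acc < min_accuracy)
    return proposals


if __name__ == "__main__":
    # Toy chain: n3 -> n2 -> n1 (n2 is both an anaphor and an antecedent)
    print(preannotate([("n2", "n1"), ("n3", "n2")]))
    # {'n1': ('t', True), 'n2': ('t', False), 'n3': ('t', False)}
```

Because the majority label is "t" for both roles, the rule proposes the same value everywhere; what differs is the expected reliability (about 99% for anaphors versus 50% for antecedents), which is why only antecedent proposals are flagged for manual review.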