Exploring and categorising the Arabic copula and auxiliary kāna through enhanced part-of-speech tagging

Corpora (LINGUISTICS, Q3; IF 0.8). Pub Date: 2021-11-01. DOI: 10.3366/cor.2021.0225
A. Hardie, Wesam M. A. Ibrahim
Citations: 1

Abstract

Arabic syntax has yet to be studied in detail from a corpus-based perspective. The Arabic copula kāna (‘be’) also functions as an auxiliary, creating periphrastic tense–aspect constructions, but the literature on these functions is far from exhaustive. To analyse kāna within the one-million-word Corpus of Contemporary Arabic, we apply part-of-speech tagging to disambiguate copula and auxiliary at a high rate of accuracy. The tagging relies on novel, targeted enhancements to a previously described program that makes the output of Habash et al.’s [2012] MADA disambiguator for the Buckwalter Arabic morphological analyser more accessible for linguistic analysis. Concordances of both uses are extracted, and 10 percent samples (499 instances of copula kāna and 387 of auxiliary kāna) are analysed manually to identify surface-level grammatical patterns and meanings. This raw analysis is then systematised according to the main parameters of variation of the more general patterns; special descriptions are developed for specific, apparently fixed-form expressions (including two phraseologies which afford expression of verbal and adjectival modality). Overall, we uncover substantial new detail not mentioned in existing grammars (e.g., the quantitative predominance of the past imperfect construction over other uses of auxiliary kāna). These corpus-based findings have notable potential to inform and enhance not only grammatical descriptions but also the pedagogy of Arabic as a first or second/foreign language.
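
The sampling step described above (separating concordance hits by tag, then drawing a 10 percent sample of each for manual analysis) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the samples were drawn, so the seeded random selection, the function name `sample_concordance`, the tag labels `COPULA` and `AUX`, and the toy data are all hypothetical rather than the authors' procedure or tagset.

```python
import random


def sample_concordance(lines, fraction=0.10, seed=0):
    """Return a reproducible random sample of concordance lines.

    Assumption: simple random sampling without replacement; the paper's
    actual sampling method is not stated in the abstract.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not lines:
        return []
    # Sample size: the given fraction of the hits, at least one line.
    k = max(1, round(len(lines) * fraction))
    rng = random.Random(seed)
    # Pick indices, then sort them so the sample keeps corpus order.
    picked = sorted(rng.sample(range(len(lines)), k))
    return [lines[i] for i in picked]


# Toy stand-in for tagged hits: (hypothetical tag, hit identifier).
hits = [("COPULA" if i % 2 else "AUX", f"hit-{i}") for i in range(200)]
copula_hits = [h for h in hits if h[0] == "COPULA"]
aux_hits = [h for h in hits if h[0] == "AUX"]

copula_sample = sample_concordance(copula_hits)
aux_sample = sample_concordance(aux_hits)
print(len(copula_hits), len(copula_sample))
print(len(aux_hits), len(aux_sample))
```

Fixing the seed keeps the sample stable across runs, which matters when the sampled lines are later coded by hand and need to be revisited.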