Lexical and syntactic features of academic Russian texts: a discriminant analysis

R. Kupriyanov, M. Solnyshkina, M. Dascalu, Tatyana A. Soldatkina
Journal: RESEARCH RESULT Theoretical and Applied Linguistics
Published: 2022-12-30
DOI: 10.18413/2313-8912-2022-8-4-0-8 (https://doi.org/10.18413/2313-8912-2022-8-4-0-8)
Citations: 4

Abstract

This article presents three mathematical models that differentiate Russian-language academic texts from three subject discourses (Philological, Mathematical, and Natural Sciences) and thereby enable the design and automated profiling of the corresponding typologies. Our models include five indices: one surface-level feature (sentence length) and four syntactic features (mean verbs per sentence, mean adjectives per sentence, local noun overlap, and global argument overlap). We identified and validated these five statistically significant features out of 45 linguistic features extracted from our research corpus of 91,185 tokens. The shortest sentences are found in Russian language textbooks, while the longest are found in Natural Science texts. The mean number of verbs, nouns, and adjectives per sentence is higher in Natural Science textbooks, whereas Mathematics discourse is characterized by the shortest word length, the highest local noun overlap, and the highest global argument overlap. We attribute the metric differences between the three discourses to their functions: Natural Science texts are characterized by descriptions and narrative passages, in contrast to Philology, which is associated with opinions. Mathematical discourse operates with precise definitions, explanations, and justifications, and thus exhibits numerous overlaps. The discriminant analysis built on these features supports the development of text profilers targeting parametric analyses. Automating these features, together with the provided classification formulas, enables the design and development of text profilers required for textbook writing and editing. Our findings can help professional linguists, technologists, and academic writers select and modify texts for their target audience.
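
As a rough illustration of how three of the indices named in the abstract could be computed, the sketch below works on sentences that have already been lemmatized and POS-tagged. It is not the authors' implementation: the function names, the tag labels ("NOUN", "VERB", "ADJ"), and the definition of local noun overlap (the share of adjacent sentence pairs with at least one noun lemma in common) are all assumptions made for this example, and the toy sentences are hand-tagged and not taken from the research corpus.

```python
from typing import List, Set, Tuple

# A sentence is a list of (lemma, pos) pairs. The tag names are an assumed
# convention for this sketch, not the tagset used in the paper.
Sentence = List[Tuple[str, str]]


def mean_sentence_length(sentences: List[Sentence]) -> float:
    """Average number of tokens per sentence (expects at least one sentence)."""
    return sum(len(s) for s in sentences) / len(sentences)


def mean_pos_per_sentence(sentences: List[Sentence], pos: str) -> float:
    """Average number of tokens with the given POS tag per sentence."""
    total = sum(1 for s in sentences for _, p in s if p == pos)
    return total / len(sentences)


def noun_lemmas(sentence: Sentence) -> Set[str]:
    """Set of noun lemmas occurring in one sentence."""
    return {lemma for lemma, p in sentence if p == "NOUN"}


def local_noun_overlap(sentences: List[Sentence]) -> float:
    """Share of adjacent sentence pairs with a noun lemma in common.

    This definition is an assumption; the source does not spell out
    how the index is calculated.
    """
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    shared = sum(1 for a, b in pairs if noun_lemmas(a) & noun_lemmas(b))
    return shared / len(pairs)


# Toy, hand-tagged input for illustration only.
toy: List[Sentence] = [
    [("треугольник", "NOUN"), ("иметь", "VERB"), ("три", "NUM"), ("сторона", "NOUN")],
    [("сторона", "NOUN"), ("треугольник", "NOUN"), ("быть", "VERB"), ("равный", "ADJ")],
    [("это", "PRON"), ("доказывать", "VERB"), ("теорема", "NOUN")],
]

features = {
    "sentence_length": mean_sentence_length(toy),
    "verbs_per_sentence": mean_pos_per_sentence(toy, "VERB"),
    "adjectives_per_sentence": mean_pos_per_sentence(toy, "ADJ"),
    "local_noun_overlap": local_noun_overlap(toy),
}
print(features)
```

A feature vector of this kind, computed per text, is what a discriminant analysis would take as input. The coefficients of the classification formulas would have to come from the paper itself and are not reproduced here.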