The Evolution of the Idiolect over the Lifetime: A Quantitative and Qualitative Study of French 19th Century Literature

Journal of Cultural Analytics (Q1, Arts and Humanities). Published 2022-09-01. DOI: 10.22148/001c.37588
Olga Seminck, P. Gambette, Dominique Legallois, T. Poibeau
Citations: 4

Abstract

The way in which authors express themselves is unique, but it changes over their lifetime. However, quantitative studies of this idiolectal evolution are rare. Using the Corpus for Idiolectal Research (CIDRE), which contains the dated works of 11 prolific 19th-century French fiction writers, we propose new methods that use lexico-morphosyntactic patterns, also called motifs, to identify, quantify and describe the grammatical-stylistic changes that take place. To examine the strength of the chronological signal of change, we developed a method to determine whether a distance matrix of literary works contains a stronger chronological signal than expected by chance. Ten out of 11 corpora showed a higher-than-chance chronological signal. This leads us to conclude that the evolution of the idiolect is, in a mathematical sense, monotonic, which supports the rectilinearity hypothesis previously put forward in the stylometric literature. The rectilinear evolution of the idiolect found for most authors in CIDRE subsequently enabled us to propose a machine learning task: predicting the year in which a work was written. For the majority of the authors in our corpus, both the accuracy of the model and the amount of variance it explains were high; we discuss why the technique might fail for the others. After applying a feature selection algorithm, we examined the most important features, i.e. the motifs that have the greatest influence on idiolectal evolution. We find that some of these features are stylistic and have previously been identified in qualitative literary studies. We report some remarkable stylistic constructions revealed by our algorithm to illustrate the kinds of stylistic patterns that can be extracted with our method.
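The abstract does not spell out how a distance matrix is tested for a stronger-than-chance chronological signal. As an illustration only, one standard way to ask the same question is a Mantel-style permutation test: correlate the stylistic distance between each pair of works with the gap in years between them, then compare that correlation with the correlations obtained after randomly relabelling which work is which. This is a generic technique swapped in for illustration, not necessarily the authors' procedure. The function names (`pearson`, `upper_triangle`, `chronological_signal_test`), the use of Pearson correlation, and the default of 999 permutations are all assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def upper_triangle(matrix, order):
    """Upper-triangle entries of `matrix`, with rows and columns relabelled by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def chronological_signal_test(dist, years, n_perm=999, seed=0):
    """Hypothetical Mantel-style test (an assumed stand-in, not the paper's exact method).

    dist  -- symmetric stylistic distance matrix between works (e.g. motif-based)
    years -- year of writing for each work, in the same order as `dist`
    Returns (observed correlation, one-sided permutation p-value).
    """
    n = len(years)
    identity = list(range(n))
    # Temporal distance for every pair of works, in upper-triangle order.
    time_gaps = [abs(years[i] - years[j]) for i in range(n) for j in range(i + 1, n)]
    observed = pearson(upper_triangle(dist, identity), time_gaps)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # shuffle which work each row/column of `dist` refers to
        if pearson(upper_triangle(dist, perm), time_gaps) >= observed:
            hits += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: stylistic distance grows exactly with the gap in years.
years = [1830, 1835, 1840, 1845, 1850, 1855, 1860]
dist = [[abs(a - b) for b in years] for a in years]
r, p = chronological_signal_test(dist, years, n_perm=199, seed=1)
print(f"r = {r:.2f}, p = {p:.3f}")
```

A small p-value means that works written close together in time are also stylistically closer than random orderings of the same works would produce, which is the kind of monotonic drift the rectilinearity hypothesis describes. Permuting rows and columns together, rather than shuffling individual distances, keeps the dependence between pairs that share a work, which is why Mantel-style tests permute labels.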