基于核心词用法演变差异的中古汉语文本定量文体分析

IF 1.8 4区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE ACM Transactions on Asian and Low-Resource Language Information Processing Pub Date : 2024-05-28 DOI:10.1145/3665794
Bing Qiu, Jiahao Huo
{"title":"基于核心词用法演变差异的中古汉语文本定量文体分析","authors":"Bing Qiu, Jiahao Huo","doi":"10.1145/3665794","DOIUrl":null,"url":null,"abstract":"<p>Stylistic analysis enables open-ended and exploratory observation of languages. To fill the gap in the quantitative analysis of the stylistic systems of Middle Chinese, we construct lexical features based on the evolutive core word usage and scheme a Bayesian method for feature parameters estimation. The lexical features are from the Swadesh list, each of which has different word forms along with the language evolution during the Middle Ages. We thus count the varied word of those entries along with the language evolution as the linguistic features. With the Bayesian formulation, the feature parameters are estimated to construct a high-dimensional random feature vector in order to obtain the pair-wise dissimilarity matrix of all the texts based on different distance measures. Finally, we perform the spectral embedding and clustering to visualize, categorize and analyze the linguistic styles of Middle Chinese texts. The quantitative result agrees with the existing qualitative conclusions and furthermore, betters our understanding of the linguistic styles of Middle Chinese from both the inter-category and intra-category aspects. It also helps unveil the special styles induced by the indirect language contact.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"47 1","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Quantitative Stylistic Analysis of Middle Chinese Texts Based on the Dissimilarity of Evolutive Core Word Usage\",\"authors\":\"Bing Qiu, Jiahao Huo\",\"doi\":\"10.1145/3665794\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Stylistic analysis enables open-ended and exploratory observation of languages. To fill the gap in the quantitative analysis of the stylistic systems of Middle Chinese, we construct lexical features based on the evolutive core word usage and scheme a Bayesian method for feature parameters estimation. The lexical features are from the Swadesh list, each of which has different word forms along with the language evolution during the Middle Ages. We thus count the varied word of those entries along with the language evolution as the linguistic features. With the Bayesian formulation, the feature parameters are estimated to construct a high-dimensional random feature vector in order to obtain the pair-wise dissimilarity matrix of all the texts based on different distance measures. Finally, we perform the spectral embedding and clustering to visualize, categorize and analyze the linguistic styles of Middle Chinese texts. The quantitative result agrees with the existing qualitative conclusions and furthermore, betters our understanding of the linguistic styles of Middle Chinese from both the inter-category and intra-category aspects. It also helps unveil the special styles induced by the indirect language contact.</p>\",\"PeriodicalId\":54312,\"journal\":{\"name\":\"ACM Transactions on Asian and Low-Resource Language Information Processing\",\"volume\":\"47 1\",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2024-05-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Asian and Low-Resource Language Information Processing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3665794\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Asian and Low-Resource Language Information Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3665794","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

文体分析可以对语言进行开放性和探索性的观察。为了填补中古汉语文体系统定量分析的空白,我们根据核心词的演变用法构建词性特征,并采用贝叶斯方法进行特征参数估计。词性特征来自 Swadesh 词表,每个词性特征都随着中古语言的演变而有不同的词形。因此,我们将这些词条中的不同单词以及语言演变过程视为语言特征。通过贝叶斯公式,我们估算了特征参数,构建了一个高维随机特征向量,从而根据不同的距离度量获得了所有文本的成对异质性矩阵。最后,我们进行频谱嵌入和聚类,对中古汉语文本的语言风格进行可视化、分类和分析。定量结果与已有的定性结论相吻合,并进一步从类别间和类别内两个方面加深了我们对中古汉语语言风格的理解。它还有助于揭示间接语言接触所引发的特殊语体。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Quantitative Stylistic Analysis of Middle Chinese Texts Based on the Dissimilarity of Evolutive Core Word Usage

Stylistic analysis enables open-ended and exploratory observation of languages. To fill the gap in the quantitative analysis of the stylistic systems of Middle Chinese, we construct lexical features based on the evolutive core word usage and scheme a Bayesian method for feature parameters estimation. The lexical features are from the Swadesh list, each of which has different word forms along with the language evolution during the Middle Ages. We thus count the varied word of those entries along with the language evolution as the linguistic features. With the Bayesian formulation, the feature parameters are estimated to construct a high-dimensional random feature vector in order to obtain the pair-wise dissimilarity matrix of all the texts based on different distance measures. Finally, we perform the spectral embedding and clustering to visualize, categorize and analyze the linguistic styles of Middle Chinese texts. The quantitative result agrees with the existing qualitative conclusions and furthermore, betters our understanding of the linguistic styles of Middle Chinese from both the inter-category and intra-category aspects. It also helps unveil the special styles induced by the indirect language contact.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
3.60
自引率
15.00%
发文量
241
期刊介绍: The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to: -Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc. -Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc. -Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition. -Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc. -Machine Translation involving Asian or low-resource languages. -Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc. -Information Extraction and Filtering: including automatic abstraction, user profiling, etc. -Speech processing: including text-to-speech synthesis and automatic speech recognition. -Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc. -Cross-lingual information processing involving Asian or low-resource languages. -Papers that deal in theory, systems design, evaluation and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.
期刊最新文献
Study on Intelligent Scoring of English Composition Based on Machine Learning from the Perspective of Natural Language Processing FedREAS: A Robust Efficient Aggregation and Selection Framework for Federated Learning X-Phishing-Writer: A Framework for Cross-Lingual Phishing Email Generation Automatic Algerian Sarcasm Detection from Texts and Images KannadaLex: A lexical database with psycholinguistic information
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1