Research on the Value and Language Features of Chinese Language and Literature Texts Based on Text Mining Technology

Archives Des Sciences (JCR Q3, Multidisciplinary) | Pub Date: 2024-05-15 | DOI: 10.62227/as/74221
Yiyi Ru
Volume: 33 | Publication type: Journal Article | Citations: 0

Abstract

The linguistic characteristics of a literary work reflect the way of thinking embodied in the author’s use of language. Starting from the textual value of Chinese language literature, this paper analyzes its spiritual connotation along two dimensions: reading and education. Using web crawler technology, we collect Chinese language literary texts by three writers, Ba Jin, Yu Zheng, and Qiong Yao, preprocess the data through data cleaning, Chinese word segmentation, de-duplication, and related steps, and extract textual feature values with the TF-IDF algorithm. The documents are then mapped onto vectors using the vector space model (VSM), and the parameters of the LDA topic model are estimated with the Gibbs sampling algorithm to better capture topic changes in the texts. Linguistic features are then verified in terms of the lexical and similarity features of the texts. The difference in lexical density between Ba Jin’s Cold Night and Resting Garden is only 2.1 percentage points, and the verb “to say” occurs 1,213 and 735 times, respectively. The average sentence lengths of Yu Zheng’s and Qiong Yao’s works fluctuate within the range [18.49, 34.27], and Qiong Yao’s works show a higher thematic concentration than Yu Zheng’s. Analyzing the linguistic features of Chinese language literary texts with text mining techniques helps clarify how authors use language and helps promote innovative paths of expression in literary texts.
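The TF-IDF weighting and vector space mapping named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the smoothed IDF variant, the function names, and the toy token lists are all assumptions made for illustration, and the input is assumed to be already cleaned, de-duplicated, and word-segmented.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map tokenized documents onto TF-IDF vectors over a shared vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, log((1 + N) / (1 + df)) + 1. This variant is an
    # assumption; the abstract does not state which formula was used.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        # Term frequency is normalized by document length.
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already segmented toy documents (not from the paper's corpus).
TOY_DOCS = [
    ["夜", "冷", "他", "说"],
    ["夜", "冷", "她", "说"],
    ["花园", "春天", "阳光"],
]

vocab, vectors = tf_idf_vectors(TOY_DOCS)
print("vocabulary size:", len(vocab))
print("similarity(doc 0, doc 1):", round(cosine_similarity(vectors[0], vectors[1]), 3))
print("similarity(doc 0, doc 2):", round(cosine_similarity(vectors[0], vectors[2]), 3))
```

In this sketch, documents that share tokens score a similarity above zero, while documents with no tokens in common score exactly zero. The LDA step with Gibbs sampling is left out here; in practice it would typically be handled by an existing topic-modeling library.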
Source journal
Archives Des Sciences (category: Multidisciplinary)
CiteScore: 1.10
Self-citation rate: 0.00%
Articles published: 0
Review time: 1 month
Journal description: Archives des Sciences is an international, multidisciplinary scientific journal. Articles are peer-reviewed.