Measuring Dynamic Correlations of Words in Written Texts with an Autocorrelation Function

Hiroshi Ogura, Hiromi Amano, Masato Kondo
DOI: 10.4236/JDAIP.2019.72004
Journal of Data Analysis and Information Processing, published 2019-03-04
Citations: 4

Abstract

In this study, we regard written texts as time series data and investigate dynamic correlations of word occurrences using an autocorrelation function (ACF). After defining a formula for the ACF that is suitable for expressing the dynamic correlations of words, we use it to calculate ACFs for frequent words in 12 books. The resulting ACFs fall into two groups. ACFs in the first group show dynamic correlations and are well described by a modified Kohlrausch-Williams-Watts (KWW) function; ACFs in the second group show no correlations and are fitted by a simple stepdown function. A word with an ACF of the first kind is called a Type-I word, and a word with an ACF of the second kind is called a Type-II word. We also show that the ACFs of Type-II words can be derived theoretically by assuming that the stochastic process governing word occurrence is a homogeneous Poisson point process. Based on the fits of the ACFs by the KWW and stepdown functions, we propose a measure of word importance that expresses the extent to which a word is important in a particular text. The validity of the measure is confirmed using Kleinberg's burst detection algorithm.
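The abstract does not reproduce the paper's ACF formula or the exact forms of its modified KWW and stepdown functions. As a rough illustration only, the sketch below assumes the textbook normalized autocorrelation of a word's binary occurrence series (1 where the token at position t is the word, 0 elsewhere) and the plain, unmodified KWW stretched exponential exp(-(t/tau)^beta). All function names are hypothetical, and the two example words are synthetic. The sketch also shows the intuition behind the Poisson result: when occurrences are independent (the discrete, Bernoulli analogue of a homogeneous Poisson process), the covariance at every nonzero lag vanishes, so the ACF is 1 at lag 0 and fluctuates around 0 afterwards, which is a stepdown shape.

```python
import math
import random
from typing import List, Sequence


def occurrence_series(tokens: Sequence[str], word: str) -> List[int]:
    """1 at positions where the token equals `word`, 0 elsewhere (hypothetical helper)."""
    return [1 if tok == word else 0 for tok in tokens]


def acf(x: Sequence[int], max_lag: int) -> List[float]:
    """Textbook normalized ACF C(t) for t = 0..max_lag (assumed, not the paper's formula).

    C(t) = (<x(s) x(s+t)> - <x>^2) / (<x^2> - <x>^2), with <.> averaging over s.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum(v * v for v in x) / n - mean * mean
    if var <= 0:
        raise ValueError("series is constant; ACF is undefined")
    result = []
    for t in range(max_lag + 1):
        pairs = n - t  # number of (s, s + t) pairs inside the series
        cov = sum(x[s] * x[s + t] for s in range(pairs)) / pairs - mean * mean
        result.append(cov / var)
    return result


def kww(t: float, tau: float, beta: float) -> float:
    """Plain KWW stretched exponential exp(-(t/tau)^beta); the paper uses a modified form."""
    return math.exp(-((t / tau) ** beta))


def step_down(t: int) -> float:
    """ACF expected for an uncorrelated series: 1 at lag 0, 0 at every other lag."""
    return 1.0 if t == 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    # Synthetic "Type-II-like" word: independent occurrences with p = 0.01.
    uniform = [1 if rng.random() < 0.01 else 0 for _ in range(n)]
    # Synthetic "Type-I-like" word: p = 0.2 inside 500-token bursts that start
    # every 5000 tokens, p = 0.002 elsewhere.
    bursty = [1 if rng.random() < (0.2 if s % 5000 < 500 else 0.002) else 0
              for s in range(n)]
    print([round(c, 3) for c in acf(uniform, 5)])
    print([round(c, 3) for c in acf(bursty, 5)])
```

Running the script should print 1.0 followed by values close to zero for the independent word, and 1.0 followed by clearly positive small-lag values for the bursty word. Fitting `kww` (or a modified form of it) to measured ACFs would typically be done with a least-squares routine such as `scipy.optimize.curve_fit`; that step is left out here to keep the sketch dependency-free.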