Berhane Abebe, M. Chebunin, A. Kovalevskii, N. Zakrevskaya
{"title":"Statistical tests for text homogeneity: using forward and backward processes of numbers of different words","authors":"Berhane Abebe, M. Chebunin, A. Kovalevskii, N. Zakrevskaya","doi":"10.53482/2022_53_401","DOIUrl":null,"url":null,"abstract":"The processes of growth in the number of diverse words in a text, when reading in the forward and backward directions, are studied in this article. Based upon the statistics achieved from the difference between these two processes, we construct a statistical test. This statistical test is used for text homogeneity checks. The elementary model states that words in a text are selected from some dictionary independent of each other according to the Zipf–Mandelbrot law. P-values of the statistical test are calculated based on the elementary probabilistic model using the asymptotic normality of corresponding statistics. At last but not least, this statistical test is applied for the analysis of homogeneity of sequences of sonnets.","PeriodicalId":51918,"journal":{"name":"Glottometrics","volume":"37 1","pages":"42-58"},"PeriodicalIF":0.2000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Glottometrics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.53482/2022_53_401","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"LINGUISTICS","Score":null,"Total":0}
引用次数: 1
Abstract
The processes of growth in the number of diverse words in a text, when reading in the forward and backward directions, are studied in this article. Based upon the statistics achieved from the difference between these two processes, we construct a statistical test. This statistical test is used for text homogeneity checks. The elementary model states that words in a text are selected from some dictionary independent of each other according to the Zipf–Mandelbrot law. P-values of the statistical test are calculated based on the elementary probabilistic model using the asymptotic normality of corresponding statistics. At last but not least, this statistical test is applied for the analysis of homogeneity of sequences of sonnets.
期刊介绍:
The aim of Glottometrics is quantification, measurement and mathematical modeling of any kind of language phenomena. We invite contributions on probabilistic or other mathematical models (e.g. graph theoretic or optimization approaches) which enable to establish language laws that can be validated by testing statistical hypotheses.