Attempting at parametrization of moderate-length poetic texts: Moses, a poem by Ivan Franko
S. Buk, Andrij Rovenchak
Glottometrics, volume 6 1, pages 1-23. Published 2022-01-01. Journal Article. JCR: Q4 (Linguistics). Impact factor: 0.2.
DOI: https://doi.org/10.53482/2022_53_399
Citations: 1
Abstract
The aim of this study is to find parameters that can be used to classify moderate-length texts, for example by author or genre. We review various known parameters and analyze to what extent they are useful for this purpose. We also suggest some improvements that require further verification. We calculate the values of the parameters at various points of the text, each comprising the first N tokens (running words) counted from the beginning. As parameters with prospects for author and/or language attribution we identify, in particular, the h-point scaling coefficient, Yule’s K, the relative repeat rate, and the fraction of dis legomena. These parameters remain fairly stable as N varies. Another set consists of the scaling exponents of parameters with respect to N. Certain modifications are suggested for Lambda and entropy, introducing logarithmic corrections in the form of powers of ln N. The results are applicable to texts of thousands to tens of thousands of words.
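As a rough illustration of evaluating parameters on the first N tokens, the sketch below computes a few of the quantities named in the abstract from a token list. It is not the authors' code. The function name `prefix_parameters` is hypothetical, and the definitions are common textbook forms that may differ from those used in the paper: Yule's K as 10^4 (Σ m² V_m − N) / N², the plain (not relative) repeat rate as Σ p_i², the dis legomena fraction taken relative to vocabulary size, and the h-point simplified to the largest rank r whose frequency is at least r.

```python
from collections import Counter


def prefix_parameters(tokens, n):
    """Compute a few frequency-based parameters for the first n tokens.

    Assumed (textbook) definitions; the paper's exact forms may differ.
    """
    prefix = tokens[:n]
    if not prefix:
        raise ValueError("need at least one token")
    total = len(prefix)              # N: number of tokens actually used
    freqs = Counter(prefix)          # type -> frequency
    vocab = len(freqs)               # V: number of distinct types
    # Frequency spectrum: spectrum[m] = number of types occurring exactly m times.
    spectrum = Counter(freqs.values())

    # Yule's K = 10^4 * (sum_m m^2 * V_m - N) / N^2
    yule_k = 1e4 * (sum(m * m * vm for m, vm in spectrum.items()) - total) / total**2
    # Plain repeat rate: sum of squared relative frequencies.
    repeat_rate = sum((f / total) ** 2 for f in freqs.values())
    # Share of types that occur exactly twice (assumed to be relative to V).
    dis_fraction = spectrum.get(2, 0) / vocab
    # Simplified h-point: largest rank r with f(r) >= r. Frequencies are sorted
    # in descending order, so counting the ranks that satisfy it gives that r.
    ranked = sorted(freqs.values(), reverse=True)
    h_point = sum(1 for rank, f in enumerate(ranked, start=1) if f >= rank)

    return {
        "N": total,
        "V": vocab,
        "yule_k": yule_k,
        "repeat_rate": repeat_rate,
        "dis_legomena_fraction": dis_fraction,
        "h_point": h_point,
    }


if __name__ == "__main__":
    # Toy input; a real run would use the tokenized poem.
    sample = "a b a c b a d".split()
    for n in (3, 5, 7):
        print(n, prefix_parameters(sample, n))
```

Calling the function at a series of increasing N values gives the kind of parameter-versus-N curve whose stability the abstract discusses. Tokenization and any lemmatization are left to the caller.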
About the journal:
The aim of Glottometrics is the quantification, measurement, and mathematical modeling of any kind of language phenomena. We invite contributions on probabilistic or other mathematical models (e.g. graph-theoretic or optimization approaches) that make it possible to establish language laws which can be validated by testing statistical hypotheses.