A New Universal Code Helps to Distinguish Natural Language from Random Texts
L. Debowski
Recent Contributions to Quantitative Linguistics
DOI: 10.1515/9783110420296-005 (https://doi.org/10.1515/9783110420296-005)
Citations: 5
Abstract
Using a new universal distribution called the switch distribution, we reveal a prominent statistical difference between a text in natural language and its unigram version. For the text in natural language, the cross mutual information grows as a power law, whereas for the unigram text it grows logarithmically. In this way, we corroborate Hilberg’s conjecture and disprove the alternative hypothesis that texts in natural language are generated by a unigram model.
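
The comparison described in the abstract can be illustrated with a rough sketch. It is not the paper's method: the switch distribution is replaced here by the general-purpose `zlib` compressor, and the paper's exact definition of cross mutual information is not reproduced. Instead, the sketch uses a simple compression-based estimate of the information shared by two adjacent blocks of length `n`. The function names (`unigram_version`, `code_length`, `mutual_information_estimate`) are hypothetical, and building the unigram text by sampling tokens independently from the empirical frequencies is an assumption about how such a text might be constructed.

```python
import random
import zlib


def unigram_version(tokens, seed=0):
    """Sample len(tokens) tokens i.i.d. from the empirical token frequencies.

    Assumption: this is one plausible way to build a "unigram version"
    of a text; the paper may construct it differently.
    """
    rng = random.Random(seed)
    return rng.choices(tokens, k=len(tokens))


def code_length(tokens):
    """Length in bits of the zlib-compressed token sequence.

    zlib is only a stand-in for a universal code, not the switch distribution.
    """
    data = " ".join(tokens).encode("utf-8")
    return 8 * len(zlib.compress(data, 9))


def mutual_information_estimate(tokens, n):
    """Estimate shared information between tokens[:n] and tokens[n:2n].

    Computed as C(first) + C(second) - C(first + second), where C is the
    compressed length in bits. This is a crude proxy, not the paper's
    cross mutual information.
    """
    if 2 * n > len(tokens):
        raise ValueError("need at least 2 * n tokens")
    first, second = tokens[:n], tokens[n:2 * n]
    return code_length(first) + code_length(second) - code_length(first + second)
```

To compare growth rates, one could evaluate `mutual_information_estimate` at increasing `n` (for example, powers of two) on both the original token list and on `unigram_version(tokens)`, then plot the two curves on log-log axes. Under the abstract's claim, a suitable universal code would show roughly power-law growth for the natural text and logarithmic growth for the unigram text; whether `zlib` is sensitive enough to show this is not guaranteed.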