Statistical bias control in typology
Authors: Matías Guzmán Naranjo, Laura Becker
Journal: Linguistic Typology, vol. 26, pp. 605-670 (JCR category: Language & Linguistics)
Published: 2021-11-17
DOI: 10.1515/lingty-2021-0002 (https://doi.org/10.1515/lingty-2021-0002)
Abstract: In this paper, we propose two new statistical controls for genealogical and areal bias in typological samples. Taking the effect of VO order on affix position (prefixation vs. suffixation) as our test case, we show how statistical modeling that includes a phylogenetic regression term (phylogenetic control) and a two-dimensional Gaussian Process (areal control) can capture genealogical and areal effects in a large but unbalanced sample. We find that, once these biases are controlled for, VO order has no effect on affix position. Another important finding, in line with previous studies, is that areal effects are as important as genealogical effects, which underlines the need for areal or contact control in typological studies built on language samples. We also show that strict probability sampling is not required with the statistical controls we propose, as long as the sample is a variety sample large enough to cover different areas and families. The crucial practical consequence is that we can include as much of the available information as possible, without artificially restricting the sample and potentially losing data.
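To give a rough intuition for what a two-dimensional Gaussian Process contributes as an areal control, the sketch below builds a squared-exponential covariance matrix over pairs of locations, so that nearby languages receive strongly correlated areal effects and distant ones nearly independent effects. The abstract does not state which kernel, distance measure, or software the authors use, so the kernel choice, the function name `sq_exp_kernel`, the parameters `length_scale` and `variance`, and the coordinates are all illustrative assumptions, not the paper's implementation.

```python
import math


def sq_exp_kernel(coords, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between all pairs of 2-D points.

    Assumption: plain Euclidean distance on planar coordinates. Real
    geographic data would need a proper distance on the globe.
    """
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for i, (x1, y1) in enumerate(coords):
        for j, (x2, y2) in enumerate(coords):
            d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
            cov[i][j] = variance * math.exp(-d2 / (2.0 * length_scale ** 2))
    return cov


# Hypothetical locations in arbitrary planar units (not real language data):
# two close neighbours and one far-away point.
coords = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)]
cov = sq_exp_kernel(coords, length_scale=1.0)

for row in cov:
    print(["{:.3g}".format(v) for v in row])
```

In a regression model, a matrix of this kind would serve as the prior covariance of a per-language spatial effect. The length scale then governs how quickly areal similarity decays with distance.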
About the journal:
Linguistic Typology provides a forum for all work of relevance to the study of language typology and cross-linguistic variation. It welcomes work taking a typological perspective on all domains of the structure of spoken and signed languages, including historical change, language processing, and sociolinguistics. Diverse descriptive and theoretical frameworks are welcomed so long as they have a clear bearing on the study of cross-linguistic variation. We welcome cross-disciplinary approaches to the study of linguistic diversity, as well as work dealing with just one or a few languages, as long as it is typologically informed and typologically and theoretically relevant, and contains new empirical evidence.