Do boys and girls write the same? Analysis of n-grams of morphological categories (¿Niños y niñas escriben igual? Análisis de n-gramas de categorías morfológicas)
Authors: Sheila Queralt, Jordi Cicres
Journal: Culture and Education, volume 11 1, pages 33 - 63 (JCR Q3, Education & Educational Research; impact factor 1.1)
Published: 2022-11-08
DOI: https://doi.org/10.1080/11356405.2022.2121130
Abstract
The objective of this study is to characterize, by means of syntactic patterns, texts written in Catalan by primary-school boys and girls aged 7 to 12. The corpus contains 169 texts divided by sex (76 by boys and 93 by girls), averaging 200 words each, for a total of 33,763 words. From this corpus, we extracted the 40 most frequent n-grams (bigrams and trigrams) of morphological categories. The data were analysed statistically using ANOVA and Linear Discriminant Analysis. In a cross-validation experiment, the writer’s gender was predicted with 60.4% accuracy using both bigrams and trigrams. When the children’s age was taken into account, accuracy exceeded 70% in both the original classification and the cross-validation. Identifying the most discriminating bigrams and trigrams allowed us to determine that girls show greater expressive capacity, superior syntactic maturity, and greater lexical and syntactic richness.
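As an illustration of the feature-extraction step described in the abstract, the following is a minimal sketch of counting bigrams and trigrams over sequences of morphological category tags and turning them into per-text relative frequencies. It is not the authors' pipeline: the function names, the tag labels (DET, NOUN, VERB, ADJ) and the toy texts are hypothetical, and the sketch assumes each text has already been tagged. The ANOVA and Linear Discriminant Analysis steps are not covered here.

```python
from collections import Counter
from typing import Iterable, Sequence

NGram = tuple[str, ...]


def tag_ngrams(tags: Sequence[str], n: int) -> list[NGram]:
    """Return all contiguous n-grams over a sequence of category tags."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def top_ngrams(texts: Iterable[Sequence[str]],
               sizes: Sequence[int] = (2, 3),
               k: int = 40) -> list[NGram]:
    """Pool n-gram counts across all texts and return the k most frequent."""
    counts: Counter = Counter()
    for tags in texts:
        for n in sizes:
            counts.update(tag_ngrams(tags, n))
    return [gram for gram, _ in counts.most_common(k)]


def relative_frequencies(tags: Sequence[str],
                         features: Sequence[NGram]) -> list[float]:
    """Feature vector for one text: each n-gram's count divided by the
    number of n-grams of the same size in that text (0.0 if there are none)."""
    vector = []
    for gram in features:
        grams = tag_ngrams(tags, len(gram))
        vector.append(grams.count(gram) / len(grams) if grams else 0.0)
    return vector


# Hypothetical, pre-tagged toy texts (not data from the study).
toy_texts = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB"],
]
features = top_ngrams(toy_texts, sizes=(2, 3), k=40)
matrix = [relative_frequencies(tags, features) for tags in toy_texts]
```

Each row of `matrix` is one text and each column one n-gram, which is the shape a discriminant classifier would expect. Normalising by the number of n-grams in the text is one possible design choice so that longer texts do not dominate; the source does not state how frequencies were normalised.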