Peter Organisciak, Michele Newman, David Eby, Selcuk Acar, Denis G. Dumas
Information and Learning Sciences. Published 2023-01-19. DOI: 10.1108/ils-06-2022-0082 (https://doi.org/10.1108/ils-06-2022-0082)
How do the kids speak? Improving educational use of text mining with child-directed language models
Purpose
Most educational assessments tend to be constructed in a closed-ended format, which is easier to score consistently and more affordable. However, recent work has leveraged computational text methods from the information sciences to make open-ended measurement more effective and reliable for older students. The purpose of this study is to determine whether the models used by computational text mining applications need to be adapted when used with samples of elementary-aged children.
Design/methodology/approach
This study introduces domain-adapted semantic models for child-specific text analysis to support better educational assessment of elementary-aged children. It presents a corpus compiled from a multimodal mix of spoken and written child-directed sources, uses that corpus to train a children’s language model, and evaluates the model against standard non-age-specific semantic models.
Findings
Child-oriented language is found to differ from general English in vocabulary and word-sense use, while exhibiting less gender and race bias. The model is evaluated in an educational application, divergent thinking measurement, and is shown to improve on generalized English models.
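The abstract does not state how responses are scored. As a rough illustration only, automated divergent thinking measurement is often done by taking the semantic distance between a prompt and a response in an embedding space, and the sketch below assumes that approach. The function names and the three-dimensional toy vectors are hypothetical stand-ins, not the authors' model or data; in practice the vectors would come from a trained language model such as the child-directed one described here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def originality_score(prompt_vec, response_vec):
    """Semantic distance (1 - cosine similarity): larger means further from the prompt."""
    return 1.0 - cosine_similarity(prompt_vec, response_vec)


# Hypothetical toy embeddings, invented for illustration only.
TOY_VECTORS = {
    "brick": [1.0, 0.0, 0.0],
    "wall": [0.9, 0.1, 0.0],
    "doorstop": [0.2, 0.9, 0.1],
}


def score_response(vectors, prompt, response):
    """Look up both words in an embedding table and score the response."""
    return originality_score(vectors[prompt], vectors[response])


if __name__ == "__main__":
    for response in ("wall", "doorstop"):
        score = score_response(TOY_VECTORS, "brick", response)
        print(f"brick -> {response}: {score:.3f}")
```

Under this assumption, swapping a general English embedding table for a child-directed one changes only the `vectors` argument, which is what makes the two kinds of model directly comparable on the same responses.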
Research limitations/implications
By showing a measurable difference in the language of children, the findings demonstrate the need for age-specific language models in the growing domain of automated divergent thinking assessment and strongly encourage the same for other educational uses of computational text analysis.
Social implications
Representing children’s language more faithfully in automated educational assessment allows for fairer and more equitable testing. Furthermore, child-specific language models have fewer gender and race biases.
Originality/value
Research in computational measurement of open-ended responses has thus far used models of language trained on general English sources or domain-specific sources such as textbooks. To the best of the authors’ knowledge, this paper is the first to study age-specific language models for educational assessment. In addition, while there have been several targeted, high-quality corpora of child-created or child-directed speech, the corpus presented here is the first developed with the breadth and scale required for large-scale text modeling.
About the journal:
Information and Learning Sciences advances interdisciplinary research that explores the scholarly intersections between two key fields: information science and the learning sciences / education sciences. The journal provides a publication venue for work that strengthens our scholarly understanding of human inquiry and learning phenomena, especially as they relate to the design and use of innovations in information and e-learning systems.