How do the kids speak? Improving educational use of text mining with child-directed language models

Information and Learning Sciences (IF 2.2; Q2, INFORMATION SCIENCE & LIBRARY SCIENCE); Pub Date: 2023-01-19; DOI: 10.1108/ils-06-2022-0082

Peter Organisciak, Michele Newman, David Eby, Selcuk Acar, Denis G. Dumas


Citations: 2

Abstract




Purpose
Most educational assessments tend to be constructed in a close-ended format, which is easier to score consistently and more affordable. However, recent work has leveraged computation text methods from the information sciences to make open-ended measurement more effective and reliable for older students. The purpose of this study is to determine whether models used by computational text mining applications need to be adapted when used with samples of elementary-aged children.

Design/methodology/approach
This study introduces domain-adapted semantic models for child-specific text analysis, to allow better elementary-aged educational assessment. A corpus compiled from a multimodal mix of spoken and written child-directed sources is presented, used to train a children’s language model and evaluated against standard non-age-specific semantic models.

Findings
Child-oriented language is found to differ in vocabulary and word sense use from general English, while exhibiting lower gender and race biases. The model is evaluated in an educational application of divergent thinking measurement and shown to improve on generalized English models.

Research limitations/implications
The findings demonstrate the need for age-specific language models in the growing domain of automated divergent thinking and strongly encourage the same for other educational uses of computation text analysis by showing a measurable difference in the language of children.

Social implications
Understanding children’s language more representatively in automated educational assessment allows for more fair and equitable testing. Furthermore, child-specific language models have fewer gender and race biases.

Originality/value
Research in computational measurement of open-ended responses has thus far used models of language trained on general English sources or domain-specific sources such as textbooks. To the best of the authors’ knowledge, this paper is the first to study age-specific language models for educational assessment. In addition, while there have been several targeted, high-quality corpora of child-created or child-directed speech, the corpus presented here is the first developed with the breadth and scale required for large-scale text modeling.


Source journal

Information and Learning Sciences (INFORMATION SCIENCE & LIBRARY SCIENCE)

CiteScore

9.50

Self-citation rate

2.90%

Articles published

Journal description: Information and Learning Sciences advances inter-disciplinary research that explores scholarly intersections shared within 2 key fields: information science and the learning sciences / education sciences. The journal provides a publication venue for work that strengthens our scholarly understanding of human inquiry and learning phenomena, especially as they relate to design and uses of information and e-learning systems innovations.

Latest articles from this journal

A critical (theory) data literacy: tales from the field
Toward a new framework for teaching algorithmic literacy
Promoting students’ informal inferential reasoning through arts-integrated data literacy education
The data awareness framework as part of data literacies in K-12 education
Learning experience network analysis for design-based research