How do the kids speak? Improving educational use of text mining with child-directed language models

Information and Learning Sciences (IF 1.6, JCR Q2, Information Science & Library Science) · Published 2023-01-19 · DOI: 10.1108/ils-06-2022-0082
Peter Organisciak, Michele Newman, David Eby, Selcuk Acar, Denis G. Dumas
Citations: 2

Abstract

Purpose
Most educational assessments are constructed in a closed-ended format, which is more affordable and easier to score consistently. However, recent work has used computational text methods from the information sciences to make open-ended measurement more effective and reliable for older students. The purpose of this study is to determine whether the models used by computational text mining applications need to be adapted when applied to samples of elementary-aged children.

Design/methodology/approach
This study introduces domain-adapted semantic models for child-specific text analysis, to support better educational assessment of elementary-aged children. The authors present a corpus compiled from a multimodal mix of spoken and written child-directed sources, use it to train a children’s language model and evaluate that model against standard non-age-specific semantic models.

Findings
Child-oriented language differs from general English in vocabulary and word-sense use, while exhibiting lower gender and race biases. The model is evaluated in an educational application, the measurement of divergent thinking, and is shown to improve on generalized English models.

Research limitations/implications
By showing a measurable difference in the language of children, the findings demonstrate the need for age-specific language models in the growing domain of automated divergent thinking assessment, and they strongly encourage the same for other educational uses of computational text analysis.

Social implications
Representing children’s language more accurately in automated educational assessment allows for fairer and more equitable testing. Furthermore, child-specific language models show fewer gender and race biases.

Originality/value
Research on the computational measurement of open-ended responses has so far used language models trained on general English sources or on domain-specific sources such as textbooks. To the best of the authors’ knowledge, this paper is the first to study age-specific language models for educational assessment. In addition, while several targeted, high-quality corpora of child-created or child-directed speech exist, the corpus presented here is the first developed with the breadth and scale required for large-scale text modeling.
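The abstract does not describe how divergent-thinking responses are scored. One widely used approach in automated divergent-thinking assessment, assumed here and not confirmed as the authors' pipeline, is semantic distance: a response is scored by how far its word vectors lie from the prompt's word vectors, so that unusual uses of an object score higher. The sketch below uses a hypothetical 3-dimensional `TOY_MODEL` in place of a trained semantic model, and the helper names `phrase_vector` and `semantic_distance` are illustrative only. Comparing a child-directed model with a general English model would amount to passing a different vector table to the same function.

```python
import math

# Hypothetical toy word vectors standing in for a trained semantic model.
# In practice these would come from a model trained on child-directed text
# or on general English text.
TOY_MODEL = {
    "brick": [1.0, 0.0, 0.0],
    "build": [0.9, 0.1, 0.0],
    "house": [0.8, 0.2, 0.0],
    "paperweight": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def phrase_vector(text, model):
    """Average the vectors of in-vocabulary words; None if no word is known."""
    words = [w for w in text.lower().split() if w in model]
    if not words:
        return None
    dim = len(next(iter(model.values())))
    return [sum(model[w][i] for w in words) / len(words) for i in range(dim)]


def semantic_distance(prompt, response, model):
    """1 - cosine similarity between prompt and response phrase vectors."""
    p = phrase_vector(prompt, model)
    r = phrase_vector(response, model)
    if p is None or r is None:
        return None  # cannot score text the model has no vectors for
    return 1.0 - cosine_similarity(p, r)


if __name__ == "__main__":
    for answer in ["build a house", "paperweight"]:
        print(answer, round(semantic_distance("brick", answer, TOY_MODEL), 3))
```

Out-of-vocabulary words (here "a") are simply skipped. This is one reason the choice of vocabulary matters: words common in children's language but rare in general English may be missing from, or poorly represented in, a general model.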
Source journal
Information and Learning Sciences (Information Science & Library Science)
CiteScore: 9.50
Self-citation rate: 2.90%
Articles published: 30
About the journal: Information and Learning Sciences advances interdisciplinary research that explores the scholarly intersections between two key fields: information science and the learning sciences/education sciences. The journal provides a publication venue for work that strengthens scholarly understanding of human inquiry and learning phenomena, especially as they relate to the design and use of information and e-learning system innovations.