Dataset of vocabulary in Uzbek primary education: Extraction and analysis in case of the school corpus

Data in Brief (IF 1.4, Q3, Multidisciplinary Sciences) | Pub Date: 2025-04-01 | Epub Date: 2025-02-03 | DOI: 10.1016/j.dib.2025.111349

Khabibulla Madatov, Sapura Sattarova, Jernej Vičič


Abstract

The main goal of this research is to determine how many new words a primary school pupil should acquire in each academic year. To this end, we created two datasets. The first was compiled from the "Explanatory Vocabulary of the Uzbek Language" (EDUL). The second was built from 35 primary school textbooks for grades 1-4 approved by the Ministry of Preschool and School Education of the Republic of Uzbekistan; the authors named it the "Uzbek Primary School Corpus" (UPSC). Using the "Comparative Lemma Extraction Method" (CLEM) proposed by the authors, a vocabulary for grades 1-4 was created, and the number of new words that primary school pupils should learn in each academic year was determined. Word forms were disregarded, since Uzbek is a morphologically rich language.
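The abstract does not spell out how CLEM works, so the following is only a minimal sketch of the general idea it describes: reduce textbook word forms to dictionary lemmas, then count the lemmas that first appear in each grade. The longest-prefix matching rule, the function names, and the toy word lists are all assumptions made for illustration. They are not the authors' method or data.

```python
from typing import Dict, Iterable, Optional, Set


def longest_prefix_lemma(token: str, dictionary: Set[str]) -> Optional[str]:
    """Return the longest dictionary headword that is a prefix of `token`.

    Assumption: because Uzbek is largely suffixing, a word form is treated
    as lemma + suffixes. This is a crude stand-in for real lemmatization.
    """
    token = token.lower()
    for end in range(len(token), 0, -1):
        candidate = token[:end]
        if candidate in dictionary:
            return candidate
    return None  # no headword matches this token


def new_lemmas_per_grade(
    tokens_by_grade: Dict[int, Iterable[str]],
    dictionary: Set[str],
) -> Dict[int, Set[str]]:
    """Map each grade to the lemmas that occur there but in no earlier grade."""
    seen: Set[str] = set()
    new_by_grade: Dict[int, Set[str]] = {}
    for grade in sorted(tokens_by_grade):
        lemmas = set()
        for token in tokens_by_grade[grade]:
            lemma = longest_prefix_lemma(token, dictionary)
            if lemma is not None:
                lemmas.add(lemma)
        new_by_grade[grade] = lemmas - seen  # first appearance only
        seen |= lemmas
    return new_by_grade


# Toy data (hypothetical; not taken from EDUL or UPSC).
toy_dictionary = {"kitob", "maktab", "bola", "uy"}
toy_corpus = {
    1: ["kitoblar", "maktabga", "kitob"],
    2: ["bolalar", "kitobim", "uyda"],
}

new_words = new_lemmas_per_grade(toy_corpus, toy_dictionary)
print({grade: len(words) for grade, words in new_words.items()})
# prints {1: 2, 2: 2}
```

Prefix matching keeps the sketch short, but it can over-match: a short headword may be picked for a longer, unrelated word that is missing from the dictionary. A real pipeline would need a proper morphological analyzer.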




Source journal

Data in Brief (Multidisciplinary Sciences)

CiteScore: 3.10

Self-citation rate: 0.00%

Articles published: 996

Review time: 70 days

Journal description: Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.