NLP-Based Management of Large Multiple-Choice Test Item Repositories

Journal of Learning Analytics; Pub Date: 2023-12-12; DOI: 10.18608/jla.2023.7897; IF 2.9; JCR Q1 (Education & Educational Research)

Valentina Albano, D. Firmani, Luigi Laura, Jerin George Mathew, Anna Lucia Paoletti, Irene Torrente



Abstract



Multiple-choice questions (MCQs) are widely used in educational assessments and professional certification exams. Managing large repositories of MCQs, however, poses several challenges due to the high volume of questions and the need to maintain their quality and relevance over time. One of these challenges is the presence of questions that duplicate concepts but are formulated differently. Such questions can elude syntactic controls yet provide no added value to the repository. In this paper, we focus on this specific challenge and propose a workflow for the discovery and management of potential duplicate questions in large MCQ repositories. Overall, the workflow comprises three main steps: MCQ preprocessing, similarity computation, and finally a graph-based exploration and analysis of the obtained similarity values. For the preprocessing phase, we consider three main strategies: (i) removing the list of candidate answers from each question, (ii) augmenting each question with the correct answer, or (iii) augmenting each question with all candidate answers. Then, we use deep learning–based natural language processing (NLP) techniques, based on the Transformers architecture, to compute similarities between MCQs based on semantics. Finally, we propose a new approach to graph exploration based on graph communities to analyze the similarities and relationships between MCQs in the graph. We illustrate the approach with a case study of the Competenze Digital program, a large-scale assessment project by the Italian government.
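The three-step workflow described in the abstract can be sketched roughly as follows. This is an illustrative toy and not the authors' implementation. To stay dependency-free it uses two stand-ins: a bag-of-words vector where the paper uses Transformer-based embeddings, and connected components of the thresholded similarity graph where the paper uses graph communities. The record layout (`id`, `stem`, `options`, `correct`), the strategy names, the sample questions, and the 0.8 threshold are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter
from itertools import combinations


def preprocess(mcq, strategy):
    """Build the text to compare, following one of the three strategies."""
    stem = mcq["stem"]
    if strategy == "stem_only":        # (i) drop the candidate answers
        return stem
    if strategy == "correct_answer":   # (ii) append the correct answer
        return f"{stem} {mcq['options'][mcq['correct']]}"
    if strategy == "all_answers":      # (iii) append every candidate answer
        return " ".join([stem, *mcq["options"]])
    raise ValueError(f"unknown strategy: {strategy}")


def embed(text):
    """ASSUMPTION: bag-of-words counts stand in for a Transformer encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_graph(mcqs, strategy, threshold):
    """Link every pair of questions whose similarity reaches the threshold."""
    vectors = {m["id"]: embed(preprocess(m, strategy)) for m in mcqs}
    graph = {qid: set() for qid in vectors}
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def candidate_groups(graph):
    """ASSUMPTION: connected components as a crude proxy for communities."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        if len(component) > 1:  # singletons are not duplicate candidates
            result.append(sorted(component))
    return result


# Hypothetical sample questions, invented for this sketch.
MCQS = [
    {"id": "q1", "stem": "Which protocol is used to send email?",
     "options": ["SMTP", "FTP", "SSH"], "correct": 0},
    {"id": "q2", "stem": "Which protocol is used to send email messages?",
     "options": ["HTTP", "SMTP", "DNS"], "correct": 1},
    {"id": "q3", "stem": "What does RAM stand for?",
     "options": ["Random Access Memory", "Read Access Mode",
                 "Rapid Array Module"], "correct": 0},
]

duplicates = candidate_groups(
    similarity_graph(MCQS, "correct_answer", threshold=0.8))
print(duplicates)  # [['q1', 'q2']]
```

On this toy data the choice of preprocessing strategy already matters: with `all_answers` the differing distractors pull the similarity of q1 and q2 below 0.8, so no group is reported. A word-count vector cannot recognize paraphrases that share few words, which is the case the semantic embeddings in the paper are meant to handle.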


Source Journal