Multimodal Capture of Teacher-Student Interactions for Automated Dialogic Analysis in Live Classrooms

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction
Pub Date: 2015-11-09
DOI: 10.1145/2818346.2830602

S. D’Mello, A. Olney, Nathaniel Blanchard, Borhan Samei, Xiaoyi Sun, Brooke Ward, Sean Kelly


Citations: 49

Abstract

We focus on data collection designs for the automated analysis of teacher-student interactions in live classrooms with the goal of identifying instructional activities (e.g., lecturing, discussion) and assessing the quality of dialogic instruction (e.g., analysis of questions). Our designs were motivated by multiple technical requirements and constraints. Most importantly, teachers could be individually mic'ed, but their audio needed to be of excellent quality for automatic speech recognition (ASR) and spoken utterance segmentation. Individual students could not be mic'ed, but classroom audio quality only needed to be sufficient to detect student spoken utterances. Visual information could only be recorded if students could not be identified. Design 1 used an omnidirectional laptop microphone to record both teacher and classroom audio and was quickly deemed unsuitable. In Designs 2 and 3, teachers wore a wireless Samson AirLine 77 vocal headset system, which is a unidirectional microphone with a cardioid pickup pattern. In Design 2, classroom audio was recorded with dual first-generation Microsoft Kinects placed at the front corners of the class. Design 3 used a Crown PZM-30D pressure zone microphone mounted on the blackboard to record classroom audio. Designs 2 and 3 were tested by recording audio in 38 live middle school classrooms from six U.S. schools while trained human coders simultaneously performed live coding of classroom discourse. Qualitative and quantitative analyses revealed that Design 3 was suitable for three of our core tasks: (1) ASR on teacher speech (word recognition rate of 66% and word overlap rate of 69% using the Google Speech ASR engine); (2) teacher utterance segmentation (F-measure of 97%); and (3) student utterance segmentation (F-measure of 66%). Ideas to incorporate video and skeletal tracking with dual second-generation Kinects to produce Design 4 are discussed.
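The abstract reports three kinds of scores (word recognition rate, word overlap rate, and F-measure) without defining them. The sketch below shows one common way such scores are computed. It is an illustration under stated assumptions, not the authors' evaluation code: word recognition rate is assumed to be 1 minus the word error rate (order-sensitive), word overlap rate is assumed to be the order-insensitive fraction of reference words that also appear in the ASR output, and F-measure is assumed to be the harmonic mean of precision and recall over detected utterances. All function names and the sample sentences are hypothetical.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def word_recognition_rate(reference, hypothesis):
    """ASSUMED definition: 1 - WER, so word order matters."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


def word_overlap_rate(reference, hypothesis):
    """ASSUMED definition: share of reference words found in the
    hypothesis, ignoring word order (bag-of-words intersection)."""
    ref = Counter(reference.lower().split())
    hyp = Counter(hypothesis.lower().split())
    if not ref:
        raise ValueError("reference transcript is empty")
    shared = sum((ref & hyp).values())
    return shared / sum(ref.values())


def f_measure(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall (F1)."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical human transcript and ASR output with two words swapped.
    human = "what is the main idea of this story"
    asr = "what is the main idea this of story"
    print(word_recognition_rate(human, asr))
    print(word_overlap_rate(human, asr))
    # Hypothetical utterance counts: 8 correct detections, 2 spurious, 2 missed.
    print(f_measure(8, 2, 2))
```

Under these assumed definitions, the order-insensitive overlap score can never be lower than the order-sensitive recognition score for the same transcript pair, which is consistent with the abstract reporting a slightly higher overlap rate (69%) than recognition rate (66%).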

