This study explores the application of Large Language Models (LLMs) to Mispronunciation Detection and Diagnosis (MDD) systems, which cover pronunciation error detection, feedback, and diagnosis. Accurate detection of mispronunciations, together with comprehensive and effective diagnosis, is key to guiding learners through corrective exercises. Traditional MDD research requires collecting task-specific data and training models similar to those used in speech recognition. Moreover, most previous research focuses on identifying error types rather than providing specific pronunciation guidance, so textual feedback on pronunciation errors remains relatively limited and its content insufficiently rich. Recent breakthroughs in LLMs have created new opportunities for pronunciation learning through their ability to generate fluent, educationally valuable feedback, such as explaining error types, demonstrating correct pronunciation, and providing personalized practice guidance. We investigate multimodal speech models for end-to-end pronunciation error detection and feedback generation. Our experimental results show that comprehensive fine-tuning of the Whisper model on second language (L2) speech data improves its ability to model L2 speech, thereby increasing the accuracy of mispronunciation detection. The feedback text generated by this model is comparable in quality to the current LLM-based state of the art (SOTA), with a G-Score of 0.52 against the SOTA’s 0.54. In addition, this study proposes an LLM-based pronunciation error feedback method built on pronunciation attribute features. By analyzing the pronunciation attribute features at the positions of mispronounced phonemes, the LLMs produce more accurate and effective feedback text. 
Evaluation of the LLM-generated feedback by L2 learners indicates a significant improvement in comprehensibility and helpfulness when these pronunciation representations are used. These results confirm the potential of articulatory feature engineering and strategic model optimization in computer-assisted pronunciation training (CAPT) systems to enhance learner engagement while reducing instructor workload.
