Classroom dialogue is crucial for effective teaching and learning, prompting many professional development (PD) programs to focus on dialogic pedagogy. Traditionally, these programs rely on manual analysis of classroom practices, which limits timely feedback to teachers. To address this, artificial intelligence (AI) has been employed for rapid dialogue analysis. However, practical applications of AI models remain limited, often prioritising state-of-the-art performance over educational impact. This study explores whether higher accuracy in AI models correlates with better educational outcomes. We evaluated the performance of two language models, BERT and Llama3, in dialogic analysis and assessed the impact of their performance differences on teachers' learning within a PD program. By fine-tuning BERT and engineering prompts for Llama3, we found that BERT exhibited substantially higher accuracy in analysing dialogic moves. Sixty preservice teachers were randomly assigned to either the BERT or Llama3 group, both participating in a workshop on the academically productive talk (APT) framework. The BERT group utilised the fine-tuned BERT model to facilitate their learning, while the Llama3 group employed the Llama3 model. Statistical analysis showed significant improvements in both groups' knowledge of and motivation to learn the APT framework, with high levels of satisfaction reported. Notably, no significant differences were found between the two groups in posttest knowledge, motivation, or satisfaction. Interviews further elucidated how both models facilitated teachers' learning of the APT framework. This study validates the use of AI in teacher training and is among the first to investigate the relationship between AI accuracy and educational outcomes.
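To make the prompt-engineering step concrete, the sketch below shows one plausible way to build a zero-shot classification prompt for a Llama3-style model that labels a teacher utterance with a dialogic move. It is a minimal illustration, not the study's actual prompt: the move labels, wording, and function name are all hypothetical examples, and a fine-tuned BERT classifier would replace this prompting step entirely in the BERT condition.

```python
# Illustrative sketch only: the APT move labels below are hypothetical
# examples, not the category scheme used in the study.
APT_MOVES = ["revoicing", "pressing for reasoning", "adding on", "wait time"]

def build_apt_prompt(utterance: str, moves: list[str] = APT_MOVES) -> str:
    """Assemble a zero-shot prompt asking the model to pick one APT move."""
    options = "\n".join(f"- {m}" for m in moves)
    return (
        "You are an analyst of classroom dialogue.\n"
        "Classify the teacher utterance below into exactly one of these "
        "academically productive talk (APT) moves:\n"
        f"{options}\n\n"
        f'Utterance: "{utterance}"\n'
        "Answer with the move name only."
    )

# Example: a prompt for a single utterance, ready to send to a chat model.
prompt = build_apt_prompt("Can you say more about why you think that?")
print(prompt)
```

In practice the returned string would be sent to the model's chat endpoint, and the reply parsed against the known label set; the analogous BERT pipeline would instead feed each utterance directly to a fine-tuned sequence classifier.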