Based on the Interaction Hypothesis, this study investigates the impact of different conversational Generative Artificial Intelligence (GenAI) chatbots on English as a Foreign Language (EFL) learners’ willingness to communicate (WTC), foreign language speaking anxiety (FLSA), self-perceived communicative competence (SPCC) and speaking performance. Three groups of Chinese undergraduate students were recruited: a control group (CG, N = 33) and two experimental groups (EG1, N = 33; EG2, N = 33). The CG interacted with the teacher and classmates during the speaking class. In contrast, EG1 interacted with Typebot, a text- and voice-based conversational GenAI chatbot, while EG2 interacted with D-ID Agent, a conversational GenAI chatbot that combined text and voice interaction with human-like avatars. Quantitative analysis using multilevel modelling revealed that EG2 showed significant improvements in WTC and SPCC and a significant reduction in FLSA compared with the CG. However, the pre- and post-speaking test results showed no significant differences in speaking performance across the groups. Qualitative data from semi-structured interviews supported these findings, highlighting the immersive learning experience and emotional support provided by the human-like avatars. These results suggest that visually embodied GenAI chatbots can enhance learners’ emotional experience during language learning. The study provides practical insights for language educators on integrating GenAI technologies into language teaching, emphasising the benefits of human-like avatars in fostering a more engaging and supportive learning environment.