The emergence of ChatGPT has drawn considerable attention in the NLP community for its impressive performance across a wide range of language tasks. However, its effectiveness in multi-label movie genre prediction remains underexplored. This study evaluates the genre prediction capabilities of several Large Language Models (LLMs), including ChatGPT, on the MovieLens-100K dataset of 1,682 movies spanning 18 genres. We investigate zero-shot and few-shot prompting strategies based on movie trailer transcripts and subtitles, where each movie may belong to multiple genres. Our results show that ChatGPT consistently outperforms earlier LLM baselines in both zero-shot and few-shot settings, while instruction fine-tuning further improves recall and overall predictive coverage. To explore multimodal extensions, we augment the textual prompts with visual cues extracted from movie posters using a Vision-Language Model (VLM). Incorporating visual information yields selective, genre-dependent benefits, most notably improved recall for visually distinctive genres, but the gains in aggregate performance metrics remain limited. Overall, our findings highlight the robustness of prompt-based and fine-tuned LLMs for genre prediction and suggest that multimodal information can provide complementary signals in specific cases, motivating future work on tighter, task-aligned vision-language integration.
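For concreteness, the sketch below illustrates one way the zero-shot multi-label prompting setup described above could be implemented; the model name, prompt wording, and the `predict_genres` helper are illustrative assumptions, not the paper's exact configuration. A few-shot variant would simply prepend labeled transcript-genre examples to the same prompt.

```python
# Minimal sketch of zero-shot multi-label genre prediction with an LLM.
# Assumptions (not from the paper): the OpenAI chat API, the model name,
# the prompt wording, and the fixed 18-genre MovieLens-100K label set.
from openai import OpenAI

GENRES = [
    "Action", "Adventure", "Animation", "Children's", "Comedy", "Crime",
    "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", "Musical",
    "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def predict_genres(title: str, transcript: str) -> list[str]:
    """Ask the model for a comma-separated subset of the 18 genres."""
    prompt = (
        f"Movie title: {title}\n"
        f"Trailer transcript / subtitles:\n{transcript}\n\n"
        f"Choosing only from this list: {', '.join(GENRES)}\n"
        "Return every genre that applies, comma-separated, nothing else."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; the paper's exact model may differ
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic labels for evaluation
    )
    raw = resp.choices[0].message.content or ""
    # Keep only answers that match the closed genre set (multi-label output).
    returned = {s.strip().lower() for s in raw.split(",")}
    return [g for g in GENRES if g.lower() in returned]


print(predict_genres("Toy Story (1995)", "A cowboy doll feels threatened..."))
```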
