The growing popularity of Large Vision-Language Models has highlighted and intensified one of the most well-known challenges in the field of Large Language Models: training is conducted mainly, and often exclusively, on English data. Consequently, the resulting models are more prone to error in non-English tasks, and this issue is exacerbated in multimodal settings, which are even more complex and rely on task-specific datasets. Given this, research on Large Language Models has turned toward adapting them to non-English languages. However, the scarcity of open and curated resources for these languages poses a significant limitation. In this work, we aim to tackle this challenge by exploring the adaptation of Large Vision-Language Models to non-English languages, using machine translation to overcome the lack of curated data. We also analyze how evaluation is affected when a vision-to-text adapter is trained across different languages, examining the performance variations and challenges associated with multilingual adaptation. Finally, we highlight the importance of using open resources to ensure transparency and reproducibility of the results. Following this philosophy, we provide open access to the entire codebase of the adaptation pipeline, along with the trained models and dataset, to foster further research.