Recent research shows that Large Language Models (LLMs) demonstrate human-comparable performance on various cognitive tasks, suggesting reasoning-like capabilities. However, the language dependency of these capabilities and the contribution of their internal network states remain underexplored. This study investigates how different prompts and languages influence the reasoning performance of LLMs compared to humans, while exploring the internal cognitive-like processes of LLMs through representational similarity analysis (RSA). Using scenario-based and mathematical Cognitive Reflection Test (CRT) questions across four languages, we evaluated the reasoning capabilities of the LLM Qwen 2.5 (with Gemma 2.9 and Llama 3.1 replications). Results showed that language significantly impacts performance on scenario-based CRT questions, which require nuanced semantic processing. However, RSA of the internal activations revealed that the LLM processed identical questions similarly across languages, suggesting that the model encodes semantics in a language-independent latent space. Additionally, the LLM's performance improved when it verbalised its reasoning, and this verbalisation increased the similarity of activations. Layer-wise analyses revealed a U-shaped similarity pattern from early to late layers in Qwen and Gemma, but not in Llama. Furthermore, scenario-based and equivalent mathematical CRT versions elicited similar activation patterns for the paired questions, even after controlling for input and output confounds, pointing to format-agnostic reasoning mechanisms. These results highlight that while LLMs exhibit language-invariant semantic representations and format-agnostic reasoning, their performance remains sensitive to linguistic nuances and self-generated verbalisations, offering insights into both the strengths and limitations of their cognitive-like processing.
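To make the RSA procedure concrete, the following is a minimal Python sketch, not the authors' pipeline: it assumes a Hugging Face Transformers checkpoint of Qwen 2.5 (the checkpoint name, mean-pooling strategy, and placeholder CRT items are illustrative assumptions). It extracts layer-wise hidden states for the same CRT items posed in two languages, builds a representational dissimilarity matrix (RDM) per language, and rank-correlates the two RDMs layer by layer.

```python
# Hedged RSA sketch: compare hidden-state geometry for the same CRT items
# presented in two languages, one Spearman correlation per layer.
import torch
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-7B-Instruct"  # assumed checkpoint; any causal LM works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def layer_vectors(prompt: str) -> list[torch.Tensor]:
    """Return one mean-pooled activation vector per layer for a prompt."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # out.hidden_states: tuple of (1, seq_len, hidden_dim), one entry per layer
    return [h.squeeze(0).mean(dim=0) for h in out.hidden_states]

def rdm(vectors: list[torch.Tensor]):
    """Condensed representational dissimilarity matrix (correlation distance)."""
    stacked = torch.stack(vectors).float().numpy()
    return pdist(stacked, metric="correlation")

# Hypothetical stimulus set: the same CRT items in English and German.
items_en = ["CRT item 1 in English", "CRT item 2 in English", "CRT item 3 in English"]
items_de = ["CRT-Item 1 auf Deutsch", "CRT-Item 2 auf Deutsch", "CRT-Item 3 auf Deutsch"]

acts_en = [layer_vectors(q) for q in items_en]  # items x layers
acts_de = [layer_vectors(q) for q in items_de]

for layer in range(len(acts_en[0])):
    # RSA: rank-correlate the two RDMs; high rho = similar representational geometry.
    rho, _ = spearmanr(rdm([a[layer] for a in acts_en]),
                       rdm([a[layer] for a in acts_de]))
    print(f"layer {layer:02d}: RSA rho = {rho:.3f}")
```

Plotting rho against layer index in a sketch like this is one way to visualise the kind of U-shaped cross-language similarity profile described above; a real analysis would use the full stimulus set and control for surface-level input and output overlap.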