Urban streetscape transitions influence emotions and perceived restorativeness, yet most studies rely on static imagery. This study investigates whether GPT-5 can approximate human perceptions of dynamic transitions, using Hong Kong as a case study. We applied a three-stage methodology (alignment, attribution, and interpretation) to systematically compare GPT-5 with human evaluations. Four five-minute walking videos were evaluated by 200 participants on three measures: affective states, using the Pleasure, Arousal, and Dominance (PAD) model; perceived restorativeness, using the Perceived Restorativeness Scale (PRS); and subjective impressions. GPT-5 received stitched video frames and performed three tasks: overall scoring, segment-based scoring with feature attribution, and short language-based explanations. Comparisons showed a moderate positive correlation between GPT-5 and human ratings (Pearson r = 0.44, Spearman ρ = 0.43, both p < 0.001). GPT-5 aligned more closely with humans on structural and physical cues (e.g., greenery and density indicators) but showed markedly lower agreement on affective and restorative dimensions. The language-based explanations showed both overlap and divergence: both GPT-5 and humans highlighted greenery, but humans emphasized openness and experiential cues, whereas GPT-5 focused on signage, buildings, and traffic. The findings show how generative AI responds to dynamic environmental transitions, filling a gap left by static-image studies and providing a useful complement to human judgement in high-density cities.
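The alignment step reports Pearson and Spearman correlations between GPT-5 and human ratings. As a minimal sketch of how such coefficients can be computed, the Python below implements both from first principles. The function names and the `human` and `model` lists are hypothetical and purely illustrative; they are not data or code from the study, and significance testing (the reported p-values) is not covered here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ratings for illustration only (not values from the study).
human = [3.2, 4.1, 2.5, 3.8, 4.6, 2.9]
model = [3.0, 3.9, 3.1, 3.5, 4.8, 2.2]

print(f"Pearson r  = {pearson(human, model):.2f}")
print(f"Spearman rho = {spearman(human, model):.2f}")
```

Reporting both coefficients is a common choice for rating data: Pearson measures linear agreement, while Spearman depends only on rank order and is therefore less sensitive to differences in how the two raters use the scale.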