The perception of safety significantly affects residents’ urban life and socio-economic development. However, gender differences in safety perception and their drivers have received insufficient attention, allowing gender inequality in urban environments to worsen gradually. To address this issue, we propose a methodology that integrates visual perception with socio-environmental characteristics to explain gender differences in safety perception more comprehensively. We conducted an empirical study in the main urban area of Nanjing, China. We mapped the spatial distribution of gender differences in safety perception using a Gradient Boosting Decision Tree model and spatial autocorrelation analysis, and we examined the influence of visual elements on these differences through ridge regression. Because urban environmental data and safety perceptions are spatially non-stationary, we used multi-scale geographically weighted regression models to capture the spatially varying relationships between safety perceptions and indicators of socio-economic characteristics, urban environmental characteristics, and social media vitality. The main findings are as follows: (1) Gender differences in safety perception were concentrated in high-density old urban areas and expansive agricultural land. (2) Women had more negative perceptions of the color richness of streets and the enclosure of street interfaces. (3) Indicators of local residents’ activities positively influenced perceptions of safety, whereas indicators of the activities of more diverse populations had a more negative influence on men’s perceptions of safety. This study contributes a comprehensive and replicable methodology for research on gender differences in urban perception, offering insights for urban planning decisions and for promoting gender inclusivity.