Passive surface-wave methods using dense seismic arrays have attracted growing attention for high-resolution near-surface imaging in urban environments. Applying deep learning (DL) to dispersion-curve extraction and inversion can relieve the heavy workload that dense seismic arrays impose. We present a case study of imaging the shear-wave velocity (Vs) structure and detecting a low-velocity layer (LVL) in the Hangzhou urban area (eastern China). We used traffic-induced passive surface-wave data recorded by dense linear arrays. We extracted phase-velocity dispersion curves from the noise recordings using seismic interferometry and multichannel analysis of surface waves. We adopted a convolutional neural network to estimate near-surface Vs models by inverting Rayleigh-wave fundamental-mode phase velocities. To improve the accuracy of the inversion, we weighted the loss function by the sensitivities. The average root-mean-square error of the weighted inversion is 46% lower than that of the unweighted DL inversion. The estimated pseudo-2D Vs profiles agree with the velocities obtained from downhole seismic measurements. Compared with an existing investigation of the same survey area, our inversion results are more consistent with the Vs provided by downhole seismic measurements at depths of 50–60 m, where the LVL exists. The trained neural network successfully located the LVL at a depth of 50–60 m. To check the applicability of the trained neural network, we applied it to a nearby passive surface-wave survey; the inversion results agree with the existing investigation results. The two applications demonstrate that the DL technique can delineate near-surface Vs structures containing an LVL from traffic-induced noise accurately and efficiently. The DL inversion has great potential for monitoring subsurface medium changes in urban areas.
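
The idea of weighting the loss function by sensitivities can be illustrated with a minimal sketch. It is not the authors' implementation: the helper name `weighted_rmse`, the normalization of the weights to sum to one, and all numbers are assumptions chosen for illustration. The sketch assumes one non-negative sensitivity value per model layer, so that a Vs misfit in a layer the dispersion curve responds to strongly costs more than the same misfit in a weakly constrained layer.

```python
import numpy as np


def weighted_rmse(vs_pred, vs_true, sensitivity):
    """Sensitivity-weighted root-mean-square error (hypothetical helper).

    vs_pred, vs_true : 1-D sequences of layer Vs values (m/s).
    sensitivity      : 1-D sequence of non-negative per-layer weights,
                       assumed to reflect how strongly the dispersion
                       curve responds to Vs in each layer.
    """
    vs_pred = np.asarray(vs_pred, dtype=float)
    vs_true = np.asarray(vs_true, dtype=float)
    w = np.asarray(sensitivity, dtype=float)

    if not (vs_pred.shape == vs_true.shape == w.shape):
        raise ValueError("all inputs must have the same shape")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")

    # Normalize so the weights sum to one (an assumption of this sketch);
    # uniform weights then reduce to the ordinary RMSE.
    w = w / w.sum()
    return float(np.sqrt(np.sum(w * (vs_pred - vs_true) ** 2)))


# Illustrative four-layer model with a 10 m/s error in the first layer only.
vs_true = [200.0, 300.0, 250.0, 400.0]
vs_pred = [210.0, 300.0, 250.0, 400.0]

plain = weighted_rmse(vs_pred, vs_true, [1, 1, 1, 1])     # uniform weights
weighted = weighted_rmse(vs_pred, vs_true, [4, 2, 1, 1])  # first layer emphasized
print(plain, weighted)
```

In this toy case the misfit sits in the most heavily weighted layer, so the weighted loss is larger than the unweighted one. In training, a loss of this form would push the network hardest toward the layers that the data constrain best.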