Dynamic detection is crucial for intelligent vision systems, enabling applications such as autonomous vehicles and advanced surveillance. Event-based sensors, which convert illumination variations into sparse event spikes, are well suited to dynamic detection because they produce little redundant data. However, current event-based vision sensors built on simplified photosensitive capacitor structures are limited, particularly in their spectral response, which hinders information acquisition in multispectral scenes. Here, we introduce a two-terminal thin-film event-based vision sensor that integrates an inorganic oxide p–n junction with the pyro-phototronic effect, combining the photovoltaic and pyroelectric mechanisms. This design yields spiking signals with a tenfold increase in responsivity, a dynamic range of 110 dB, and a spectral response extending from the ultraviolet (UV) to the near-infrared (NIR). In a thin-film sensor array, these spiking signals accurately extract fingerprint edge features even under low-light conditions, owing to their high sensitivity to minor luminance variations. In addition, the sensors' broadband spiking response captures richer information, achieving 99.25% accuracy in multispectral dynamic gesture recognition while reducing data processing by over 65%. This approach eliminates redundant data while minimizing information loss, offering a promising alternative to current dynamic perception technologies.
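
The idea that an event-based sensor reports only illumination changes, and therefore produces sparse output, can be illustrated with a minimal software sketch. The code below is a generic thresholded log-intensity model of event generation, not a model of the pyro-phototronic device itself; the function name `frames_to_events`, the `threshold` value, and the toy 3×3 scene are all hypothetical choices made for illustration.

```python
import numpy as np

def frames_to_events(frames, threshold=0.15, eps=1e-6):
    """Convert a stack of intensity frames into sparse events.

    Illustrative assumption: a pixel emits an event (t, y, x, polarity)
    whenever its log intensity has changed by at least `threshold`
    since the last event at that pixel. Static pixels emit nothing.
    """
    log_frames = np.log(np.asarray(frames, dtype=np.float64) + eps)
    reference = log_frames[0].copy()  # per-pixel level at the last event
    events = []
    for t in range(1, log_frames.shape[0]):
        diff = log_frames[t] - reference
        fired = np.abs(diff) >= threshold
        ys, xs = np.nonzero(fired)
        for y, x in zip(ys, xs):
            polarity = 1 if diff[y, x] > 0 else -1
            events.append((t, int(y), int(x), polarity))
        # Reset the reference only where an event was emitted.
        reference[fired] = log_frames[t][fired]
    return events

# Toy scene: a static 3x3 background in which one pixel brightens at t = 2.
frames = np.full((4, 3, 3), 0.5)
frames[2:, 1, 1] = 1.0

events = frames_to_events(frames)
# Fraction of frame-based pixel samples that produced no event.
sparsity = 1 - len(events) / frames[1:].size
```

In this toy scene only the single changing pixel generates an event, so nearly all of the frame-based samples are skipped. This is the qualitative sense in which event-based sensing removes redundant data; the quantitative figures reported above come from the sensor measurements, not from this sketch.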