Machine learning techniques are yielding better results than traditional statistical techniques in estimating traffic-related air pollutant (TRAP) concentrations. However, the required data inputs, particularly complex traffic data, are costly and rarely collected in real time. This study uses real-time object detection to predict TRAP concentrations from traffic variables extracted solely from videos. Fine particulate matter (PM2.5), nitrogen dioxide (NO2), and ozone (O3) concentrations are recorded by low-cost sensors, and traffic data are extracted using object detection and tracking algorithms. Extreme Gradient Boosting, random forest, and multilinear regression models are employed to predict concentrations across different predictor combinations. Our optimal models predict PM2.5, NO2, and O3 concentrations with R2 values of 0.94, 0.95, and 0.92, respectively. This study demonstrates a cost-effective approach that predicts real-time TRAP concentrations with high accuracy using a low-cost, low-maintenance tool: a video camera. Cities could similarly track TRAP using traffic camera infrastructure without additional sensor deployment.
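
For readers less familiar with the reported metric, the following minimal sketch shows how a coefficient of determination (R2) is conventionally computed from observed and predicted concentrations. The function name `r_squared` and the sample values are illustrative assumptions and are not taken from the study; the study's own evaluation procedure may differ (for example, in how data are split for testing).

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after prediction
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical concentrations for illustration only (not data from the study)
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 14.0, 15.0]
score = r_squared(observed, predicted)
print(f"R2 = {score:.2f}")
```

A value close to 1 means the model explains most of the variance in the observed concentrations, while a value near 0 means it does little better than predicting the mean.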