Film thickness is a critical quality attribute of coated pharmaceutical pellets, as it directly influences drug release profiles and stability. In fluidized bed coating processes, accurate in-line measurement with traditional imaging methods is difficult under complex conditions such as high material density, pellet overlap, and defocusing. This paper therefore introduces a visual imaging strategy that uses the Mask R-CNN algorithm for non-invasive, real-time film thickness monitoring during fluidized bed coating. The proposed approach achieves precise pellet segmentation and effectively addresses the challenges posed by pellet overlap, defocusing, and blurring. The accuracy of the Mask R-CNN algorithm was validated against traditional image-processing methods (Otsu thresholding and Canny edge detection) and against off-line techniques, including UV–visible spectrophotometry and laser diffraction. The sensitivity and robustness of the approach were further explored under conditions of high contamination, overexposure, and low contrast arising from color variations. The results demonstrate the potential of deep learning-based imaging to transform process analytical technology (PAT) by enabling dynamic and precise quality monitoring in pharmaceutical production.
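To make the measurement principle concrete, the following minimal sketch shows one way segmentation masks could be converted into a film thickness estimate. It is an illustration under stated assumptions, not the procedure reported in this paper: it assumes that each pellet is available as a binary mask (for example, from an instance segmentation model), that pellets are approximately spherical, and that thickness can be taken as half the increase in mean equivalent circular diameter relative to uncoated cores. The function names, the `um_per_pixel` calibration factor, and the synthetic disk masks are all hypothetical.

```python
import numpy as np


def equivalent_diameter_um(mask, um_per_pixel):
    """Diameter (micrometres) of the circle with the same area as a binary mask."""
    area_px = np.count_nonzero(mask)
    return 2.0 * np.sqrt(area_px / np.pi) * um_per_pixel


def film_thickness_um(core_masks, coated_masks, um_per_pixel):
    """Assumed estimator: half the growth in mean equivalent diameter."""
    d_core = np.mean([equivalent_diameter_um(m, um_per_pixel) for m in core_masks])
    d_coated = np.mean([equivalent_diameter_um(m, um_per_pixel) for m in coated_masks])
    return (d_coated - d_core) / 2.0


def disk_mask(radius_px, size=401):
    """Synthetic stand-in for a segmented pellet: a filled disk in a square image."""
    yy, xx = np.mgrid[:size, :size]
    c = size // 2
    return (xx - c) ** 2 + (yy - c) ** 2 <= radius_px ** 2


# Hypothetical calibration of 2 um per pixel; radii chosen only for illustration.
cores = [disk_mask(100), disk_mask(100)]
coated = [disk_mask(110), disk_mask(110)]
thickness = film_thickness_um(cores, coated, um_per_pixel=2.0)
print(f"estimated film thickness: {thickness:.1f} um")
```

In practice the masks would come from the segmentation step, and averaging over many pellets per frame would be needed to suppress the effect of the core size distribution. The equivalent-diameter definition used here is one common choice among several.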