Accurately differentiating the components of maize mixtures and assessing grain cleaning loss helps improve the efficiency and sustainability of agricultural systems. This study proposes a novel detection method that integrates time–frequency images of particle vibro-piezoelectric signals with machine learning to classify grain and impurity and to assess maize cleaning loss. Specifically, a self-developed vibro-piezoelectric detection setup is employed to capture the time-domain response signals of grain and impurity and to build a database of maize collision signals. Using the Short-Time Fourier Transform (STFT) and the Weighted Average Algorithm (WAA), 1D time-domain signals, which characterize only time-varying properties, are converted into 2D time–frequency images that carry rich spectral features and energy distribution information. Subsequently, 15 texture features are extracted from the 2D time–frequency images with the Grey-Level-Gradient Co-occurrence Matrix (GLGCM). After weakly correlated features are eliminated, 11 texture features are retained and condensed into the first four Principal Components (PCs). These four PCs and the traditional 1D time-domain signals are pre-processed and input into Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF) classifiers. The NB model with Savitzky-Golay (SG) pre-processing, using 2D time–frequency image features, achieves the highest accuracy of 95.74%, surpassing the best 1D time-domain classification model by 5.31 percentage points. Bench tests verify that the piezoelectric detection unit with the optimal NB model keeps the absolute error in grain loss rate within 0.43%. Notably, the proposed method can also be applied to classification and cleaning loss detection for other typical crops by replacing the collision signal database.
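As a rough illustration of the first conversion step described above, the sketch below turns a 1D signal into a 2D grey-level time–frequency image with a plain NumPy STFT. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `stft_image`, the window length, hop size, sampling rate, and the synthetic decaying sinusoid standing in for a collision signal are all hypothetical, and the WAA step is omitted because its details are not given here.

```python
import numpy as np


def stft_image(signal, fs, nperseg=128, hop=32, levels=256):
    """Return (freqs, times, image): STFT magnitude quantized to grey levels.

    All parameter defaults are illustrative assumptions, not values from the study.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < nperseg:
        raise ValueError("signal is shorter than one analysis window")

    window = np.hanning(nperseg)
    n_frames = 1 + (signal.size - nperseg) // hop

    # Slice the signal into overlapping frames, shape (n_frames, nperseg).
    frames = np.stack(
        [signal[i * hop : i * hop + nperseg] for i in range(n_frames)]
    )

    # Windowed FFT of each frame; transpose so rows are frequency bins.
    magnitude = np.abs(np.fft.rfft(frames * window, axis=1)).T

    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + nperseg / 2) / fs

    # Scale to [0, levels - 1] so the result can be treated as a grey image.
    peak = magnitude.max()
    scaled = magnitude / peak if peak > 0 else magnitude
    image = np.round(scaled * (levels - 1)).astype(np.uint8)
    return freqs, times, image


# Hypothetical stand-in for one collision: a 12 kHz decaying sinusoid.
fs = 100_000                      # assumed sampling rate in Hz
t = np.arange(1000) / fs          # 10 ms of samples
pulse = np.exp(-t / 0.002) * np.sin(2 * np.pi * 12_000 * t)

freqs, times, image = stft_image(pulse, fs)
peak_row, peak_col = np.unravel_index(np.argmax(image), image.shape)
print(image.shape, freqs[peak_row])
```

Texture descriptors such as those from the GLGCM would then be computed on `image`, followed by dimensionality reduction and classification; those later stages are not sketched here.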