In recent years, speech recognition based on millimetre-wave radio signals has developed rapidly. However, recognition accuracy in this field is undermined by the absence of high-frequency components, which results from the material vibration constraints of fully viewed indoor objects. This paper presents a new solution to the Chinese digit speech recognition problem that reconstructs the high-frequency harmonic and non-harmonic components from the radio signals received by millimetre-wave radar sensors. A time–frequency analysis converts the phase variations extracted from the radar I/Q signals into spectrograms. An improved threshold strategy then enhances the harmonic components on the spectrograms, and a CycleGAN-based network recovers the non-harmonic components. In an evaluation experiment, a 77-GHz frequency-modulated continuous-wave radar sensor measured the induced vibrations of aluminium foil, glass, and anti-static bags to recognise spoken standard Chinese digits (0–9). The F1 score in the speech recognition experiment reached 96.6%, with a micro-average accuracy exceeding 98.3%. These results show that the proposed method can improve recognition accuracy by generating finer signatures from radio signals.
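The first processing step described above, converting phase variations of the radar I/Q signal into a spectrogram, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `extract_phase` and `spectrogram`, the 1 kHz slow-time sampling rate, the 125 Hz test vibration, and the Hann window of 64 samples with a hop of 32 are all assumptions chosen for illustration, and a plain short-time Fourier transform stands in for whatever time–frequency analysis the paper actually uses. Only the Python standard library is used so that the sketch stays self-contained; a practical version would more likely rely on NumPy or SciPy.

```python
import cmath
import math


def extract_phase(iq):
    """Return the unwrapped phase (radians) of a sequence of complex I/Q samples."""
    phase, offset, prev = [], 0.0, None
    for sample in iq:
        raw = cmath.phase(sample)  # wrapped to (-pi, pi]
        if prev is not None:
            delta = raw - prev
            # A jump larger than pi between samples is treated as a wrap-around.
            if delta > math.pi:
                offset -= 2 * math.pi
            elif delta < -math.pi:
                offset += 2 * math.pi
        prev = raw
        phase.append(raw + offset)
    return phase


def spectrogram(x, win_len=64, hop=32):
    """Magnitude STFT with a Hann window; each frame has win_len // 2 + 1 bins."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        segment = x[start:start + win_len]
        mean = sum(segment) / win_len  # remove the static phase offset
        segment = [(v - mean) * w for v, w in zip(segment, window)]
        frames.append([
            abs(sum(v * cmath.exp(-2j * math.pi * k * n / win_len)
                    for n, v in enumerate(segment)))
            for k in range(win_len // 2 + 1)
        ])
    return frames


# Hypothetical test signal: a surface vibrating at 125 Hz modulates the phase
# of the radar return around a static offset of 3.0 rad (assumed values).
fs = 1000.0
win_len = 64
true_phase = [3.0 + 0.5 * math.sin(2 * math.pi * 125.0 * n / fs) for n in range(256)]
iq = [cmath.exp(1j * p) for p in true_phase]

phase = extract_phase(iq)
spec = spectrogram(phase, win_len=win_len, hop=32)

# Frequency (Hz) of the strongest bin in each frame.
peak_hz = [max(range(len(f)), key=f.__getitem__) * fs / win_len for f in spec]
print(peak_hz)
```

In this synthetic case the wrapped phase crosses the ±π boundary, so the unwrapping step is needed before the transform, and every frame of the resulting spectrogram peaks at the 125 Hz vibration frequency. The harmonic enhancement and CycleGAN-based recovery stages operate on spectrograms of this kind and are not sketched here, since the abstract does not specify them.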