This article presents an advanced synchronization and channel estimation architecture for multi-stream MIMO systems in sub-terahertz environments. To streamline hardware complexity, we employ Golay cross-correlation across all detection and estimation schemes. Key innovations include a precise timing detection algorithm that utilizes pulse shaping impulse response and quadratic regression, along with multiple window-based approaches to enhance performance against non-ideal effects. At the architectural level, a shared optimized Golay correlator reduces hardware usage by 23%, efficiently handling multiple correlation lengths in a single design. Additionally, we propose an indexing-count method that addresses sorting challenges, achieving notable improvements in processing speed and complexity reduction. The proposed design supports the highest modulation schemes defined in IEEE Std. 802.15.3d, achieving an uncoded bit error rate of $1.96times 10^{-4}$