This review examines the growing impact of Transformer-based models in neuroscience, neurology, and psychiatry. Originally developed for sequential data, the Transformer architecture has since been adapted to capture the complex spatiotemporal relationships and long-range dependencies common in biomedical data. Its adaptability and effectiveness in deciphering intricate patterns in medical studies have made it a key tool for advancing our understanding of neural functions and disorders, and a significant departure from traditional computational methods. The review begins by introducing the structure and principles of Transformer architectures. It then surveys their applications, ranging from disease diagnosis and prognosis to the evaluation of cognitive processes and neural decoding, and discusses the design modifications tailored to these applications and their impact on performance. We conclude by assessing recent advancements, prevailing challenges, and future directions, highlighting the shift in neuroscientific research and clinical practice towards an artificial intelligence-centric paradigm, particularly given the prominence of the Transformer architecture in the most successful large pre-trained models. This review is intended as a reference for researchers, clinicians, and other professionals interested in understanding and applying Transformer-based models in neuroscience, neurology, and psychiatry.