Objective: Reproducibility is key for diagnostic tests involving subjective evaluation by experts. Our aim was to systematically review the reproducibility of visual analysis of the clinical electroencephalogram (EEG). In this paper, we report the range of EEG features that have been studied, and detailed reproducibility data for the most studied feature.
Methods: We searched four databases for articles reporting reproducibility in clinical EEG, up to June 2023. Two raters screened 24 553 citations, and then 2736 full texts. Quality was assessed according to the Guidelines for Reporting Reliability and Agreement Studies (GRRAS).
Results: We found 275 studies (268 interrater and 20 intrarater), addressing 606 different EEG features. Only 38 EEG features had been examined in more than two studies. Most studies included fewer than 50 patients and EEGs. The most frequently studied feature was seizure detection (62 papers). Interrater reproducibility of seizure detection was substantial-to-almost-perfect with experienced raters and raw EEG (kappa .62-.88). With experienced raters and transformed EEG, reproducibility was substantial (kappa .63-.70). Inexperienced raters had lower reproducibility. Seizure lateralization reproducibility was moderate to substantial (kappa .58-.77), lower than for seizure detection.
Significance: For most EEG features, reproducibility has been studied only once. Intrarater studies are rare. The reproducibility of visual EEG analysis is variable: results differ both within the same rater and between raters. Interrater reproducibility for seizure detection is substantial-to-almost-perfect with experienced raters and raw EEG, and lower with inexperienced raters or transformed EEG. There is a need for larger collaborative studies, using improved methodology, as well as more intrarater studies of EEG interpretation.
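
For readers less familiar with the kappa statistic quoted above, the following is a minimal sketch of how Cohen's kappa for two raters could be computed and mapped to a verbal label. It assumes the labels "moderate", "substantial" and "almost perfect" follow the conventional Landis and Koch bands, which the abstract does not state explicitly. The function names and the ten example ratings are hypothetical and are not data from the review.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single identical label")
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Verbal label for kappa, assuming the conventional Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical example: two raters mark ten EEG epochs as seizure (1) or not (0).
rater_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
rater_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
kappa = cohen_kappa(rater_a, rater_b)
label = landis_koch_label(kappa)
```

In this hypothetical example the raters agree on 8 of 10 epochs, but because agreement expected by chance is 0.52, kappa is lower than the raw 80% agreement would suggest.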