Background: Peer review is central to maintaining scientific quality and helps editors reach publication decisions. However, the volume of scientific publications continues to grow, placing increasing pressure on the peer review system. With the emergence of generative AI, its potential role in supporting peer review is attracting attention. This study aims to compare human-written and AI-generated peer review reports.
Methods: We analysed 398 peer review reports linked to 119 research articles published in BMJ Open in 2024. Publicly available reports and manuscripts were included; editorials, corrections, and protocols were excluded. AI-generated reports were produced with ChatGPT. All reports were anonymised and assessed by two independent reviewers. We conducted a hybrid thematic analysis, calculating theme frequencies and comparing them by reviewer type. For quantitative comparison, we used the Mann-Whitney U test to assess differences in review quality scores and Fisher's exact test to compare the distribution of themes. All analyses were conducted in R.
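For illustration only: the study's analyses were run in R and its data are not reproduced here, but the sketch below applies the same two tests in Python with scipy, on hypothetical quality scores and theme counts.

```python
# Illustrative sketch (not the authors' R code): the same two tests applied in
# Python/scipy to hypothetical review quality scores and theme counts.
from scipy.stats import mannwhitneyu, fisher_exact

# Hypothetical review quality scores (e.g., on a 1-5 scale) for each reviewer type
human_scores = [4, 5, 3, 4, 4, 5, 3, 4]
ai_scores = [3, 3, 4, 2, 3, 3, 4, 3]

# Mann-Whitney U test: non-parametric comparison of the two score distributions
u_stat, u_p = mannwhitneyu(human_scores, ai_scores, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {u_p:.3f}")

# Fisher's exact test: hypothetical 2x2 table of reports that do or do not raise
# a given theme, by reviewer type
#                 theme present   theme absent
# human reviews        30              70
# AI reviews           12              88
table = [[30, 70], [12, 88]]
odds_ratio, f_p = fisher_exact(table, alternative="two-sided")
print(f"Fisher's exact: OR = {odds_ratio:.2f}, p = {f_p:.3f}")
```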
Results: Human reviewers provided more detailed and diverse comments and addressed deeper issues such as interpretation, originality, and applicability. AI reviews covered more manuscript sections but focused on routine or structural elements, and performed slightly better in format-related domains. Co-occurrence analysis showed that human reviews linked diverse themes, whereas AI comments clustered around structural points. The Shannon index confirmed that human reviews were more thematically diverse.
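For reference, the Shannon index used here is the standard diversity measure H = -Σ p_i ln(p_i) over theme proportions, where higher H indicates a more even spread across themes. A minimal sketch with hypothetical theme counts:

```python
# Illustrative sketch: Shannon diversity index H = -sum(p_i * ln(p_i)) over theme
# proportions; higher H means comments are spread more evenly across themes.
from math import log

def shannon_index(theme_counts):
    """Compute the Shannon diversity index from raw theme counts."""
    total = sum(theme_counts)
    proportions = [c / total for c in theme_counts if c > 0]
    return -sum(p * log(p) for p in proportions)

# Hypothetical theme counts per reviewer type
human_theme_counts = [12, 9, 8, 7, 6, 5, 4, 3]   # spread across many themes
ai_theme_counts = [25, 20, 5, 2, 1, 1, 0, 0]     # concentrated in a few themes

print(f"Human reviews H = {shannon_index(human_theme_counts):.2f}")
print(f"AI reviews    H = {shannon_index(ai_theme_counts):.2f}")
```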
Conclusions: AI can support peer review by screening for basic errors, but it lacks the insight, critical judgment, and contextual awareness of human reviewers. Human input remains essential for meaningful review. Future integration will require review-specific AI tools that preserve confidentiality.