Objective: Endoscopic scoring of disease activity is essential in ulcerative colitis trials but is limited by inter-reader variability and operational complexity. We evaluated a novel central reading paradigm integrating two independently developed machine learning models with human adjudication for discordant cases ('2M+1H') to determine whether it is statistically non-inferior to the traditional two-reader-plus-adjudicator model.
Methods: A total of 150 full-length endoscopic videos were retrospectively scored using both the conventional central reading approach and the 2M+1H paradigm. Each machine learning model was developed independently using distinct algorithms and datasets. When the two model-generated scores disagreed, a board-certified gastroenterologist adjudicated the final score. The primary endpoint was agreement with the reference standard, measured by quadratic weighted kappa. Secondary endpoints included agreement on binary outcomes (endoscopic improvement and remission), reduction in human reads and evaluation of outcome variability due to human reader mix.
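The decision rule and primary agreement metric described in the Methods can be illustrated with a minimal sketch. It is not the study's implementation. It assumes a four-level ordinal endoscopic score coded 0 to 3 (the abstract does not name the scale), and the function names and the `adjudicate` callback are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence


def final_score_2m1h(model_a: int, model_b: int,
                     adjudicate: Callable[[], int]) -> int:
    """2M+1H rule: accept concordant model scores, otherwise defer to a human.

    `adjudicate` is a hypothetical callback standing in for the
    gastroenterologist's read; it is only invoked on disagreement.
    """
    if model_a == model_b:
        return model_a
    return adjudicate()


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int],
                             n_categories: int = 4) -> float:
    """Quadratic weighted kappa between two sets of ordinal scores.

    Assumes scores are integers in 0..n_categories-1 (an assumption,
    not stated in the abstract).
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("score lists must be non-empty and of equal length")

    observed = Counter(zip(a, b))   # joint counts of (score_a, score_b)
    marg_a, marg_b = Counter(a), Counter(b)
    max_sq = (n_categories - 1) ** 2

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = (i - j) ** 2 / max_sq  # quadratic disagreement weight
            weighted_observed += w * observed[(i, j)] / n
            weighted_expected += w * marg_a[i] * marg_b[j] / n ** 2

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all scores are identical")
    return 1.0 - weighted_observed / weighted_expected
```

Under this rule a human read is needed only for discordant videos, which is the mechanism behind the reduction in human reads evaluated as a secondary endpoint.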
Results: The 2M+1H approach achieved a quadratic weighted kappa of 0.78 with the reference standard, meeting the prespecified threshold for non-inferiority. Agreement with the reference standard was 82.7% for endoscopic improvement and 89.3% for remission. Compared with the traditional method, the 2M+1H paradigm reduced human reads per video by 81%. Notably, 16% of cases in the human-only approach yielded different final scores depending on reader assignment.
Conclusion: The 2M+1H central reading paradigm provides statistically non-inferior accuracy with greater operational efficiency and potentially enhanced reproducibility. Prospective validation against clinical, biomarker and histological outcomes is warranted.
