Advances in Artificial Intelligence (AI) have revolutionized medical diagnostics and treatment, and large-scale public datasets are fueling research in this field. This systematic review presents a comprehensive analysis of 13 foundational medical datasets, evaluating their characteristics, performance metrics, and inherent biases across medical imaging, electronic health records, and genomics. The published literature is systematically reviewed to categorize these datasets, with a focus on performance metrics for common machine learning tasks. The review also documents evidence of systemic bias and of limitations that affect model generalizability and clinical equity. Our analysis reveals compelling evidence that significant limitations temper the remarkable progress of algorithms. AI models frequently suffer dramatic accuracy drops when tested beyond their training distribution, with the Area Under the Curve (AUC) declining from 0.95 to 0.63. The review also identifies consistent patterns of systemic bias that threaten the equitable application of healthcare. This bias stems from unrepresentative sampling, subjective annotation practices, label noise, and ground-truth labels derived through Natural Language Processing (NLP). Our findings demonstrate the urgent need for a paradigm shift in the development of medical AI applications. The AI and medical communities must prioritize generating diverse datasets and mitigating systemic bias. This study provides evidence-based recommendations and a technical toolkit to address these challenges and reduce health disparities.
