Introduction: Low-density lipoprotein cholesterol (LDL-C) is a significant cardiovascular risk factor, but its direct measurement is expensive and unavailable in many clinical laboratories, so it is usually estimated from other lipid parameters. The Friedewald formula (FD), despite its widespread use since 1972, has notable limitations, especially at high triglyceride levels and low LDL-C concentrations. Machine learning (ML) techniques offer promising alternatives for accurate LDL-C estimation, potentially overcoming the limitations of traditional formulas by leveraging complex pattern recognition in lipid profile data.
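For reference, the Friedewald estimate in mg/dL is total cholesterol minus HDL-C minus triglycerides divided by 5. The snippet below is a minimal sketch of that calculation; the function name and the example values are illustrative assumptions, not taken from the study.

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL-C in mg/dL as TC - HDL-C - TG/5 (Friedewald formula).

    All inputs are assumed to be in mg/dL. TG/5 approximates VLDL cholesterol,
    an approximation that becomes unreliable at high triglyceride levels.
    """
    return total_chol - hdl - triglycerides / 5.0


# Illustrative values only (not study data).
print(friedewald_ldl(200.0, 50.0, 150.0))  # 120.0
```

Because the triglyceride term is a fixed ratio, any error in that ratio grows with the triglyceride level, which is consistent with the limitation described above.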
Material and methods: This retrospective study analyzed 34,678 lipid profiles from patients over 18 years of age attending Hospital Virgen Macarena, Seville (January 2021-December 2022). The study was approved by the Ethics Committee (CEI HVM-VR_03/2024). All lipid parameters (total cholesterol, triglycerides, HDL-C, LDL-C) were measured using a Cobas 6000 analyzer. Twenty-two machine learning models were developed using Python's PyCaret library with an 80/20 train-test split. Models included Linear Regression, Random Forest, XGBoost, LightGBM, and Gradient Boosting, among others. Performance was evaluated using the coefficient of determination (R2), mean absolute error (MAE), and root mean square error (RMSE). Four triglyceride subgroups were analyzed: <150, 150-250, 250-400, and >400 mg/dL.
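The three evaluation metrics and the triglyceride stratification can be sketched in plain Python as follows. This is an illustrative sketch, not the study's code: the function names are hypothetical, the toy values are made up, and the handling of the boundary values (150, 250, 400 mg/dL) is an assumption, since the abstract does not state which subgroup they fall into.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between measured and estimated LDL-C."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between measured and estimated LDL-C."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def tg_subgroup(tg):
    """Assign a triglyceride value (mg/dL) to one of the four subgroups.

    Boundary handling is an assumption: 150 and 250 go to the higher
    subgroup, 400 stays in "250-400".
    """
    if tg < 150:
        return "<150"
    if tg < 250:
        return "150-250"
    if tg <= 400:
        return "250-400"
    return ">400"


# Toy values for illustration only (not study data).
measured = [100.0, 120.0, 140.0]
estimated = [110.0, 120.0, 130.0]
print(r2(measured, estimated), mae(measured, estimated), rmse(measured, estimated))
```

Computing the same metrics within each subgroup returned by `tg_subgroup` gives the kind of stratified comparison described above.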
Results: The dataset comprised 34,678 individuals with the following mean values: total cholesterol 204.6 ± 73.36 mg/dL, triglycerides 203.95 ± 143.94 mg/dL, HDL-C 51.83 ± 18.45 mg/dL, and LDL-C 120.38 ± 62.29 mg/dL. LightGBM achieved the highest performance (R2=0.965, RMSE=11.35, MAE=7.99), followed by Gradient Boosting (R2=0.962, RMSE=11.89, MAE=7.87) and XGBoost (R2=0.958, RMSE=12.49, MAE=8.3). Traditional formulas showed inferior performance: Martin-Hopkins (R2=0.951, RMSE=13.82, MAE=9.3) and Friedewald (R2=0.926, RMSE=16.92, MAE=11.97). Performance differences were more pronounced at triglyceride levels ≥250 mg/dL, with ML models maintaining R2>0.92 while classical formulas deteriorated significantly, particularly Friedewald (R2=0.34) at triglycerides >400 mg/dL.
Conclusions: Machine learning models, particularly boosting algorithms (LightGBM, Gradient Boosting, XGBoost), significantly outperformed traditional LDL-C calculation formulas across all triglyceride ranges. These AI-based approaches yielded superior accuracy and robustness, especially in challenging clinical scenarios with elevated triglycerides where conventional formulas fail. Implementation of ML models in clinical laboratories could provide more reliable LDL-C estimations, contributing to improved cardiovascular risk stratification and patient management. This technological advancement represents a promising transformation in laboratory medicine methodology.
