Background
Indirect methods for estimating clinical reference intervals (RIs) use statistical analysis to identify non-pathological sub-distributions within large datasets acquired from routine clinical testing. This approach has the potential to accelerate the estimation of precise RIs that account for influential variables such as age, gender, and ethnicity. Most existing methods are based on traditional statistics and hand-crafted algorithms. The investigation of supervised learning, which often outperforms traditional approaches, has been impeded by the limitations of real-world data. Previous studies, however, have widely used synthetic data for evaluating and benchmarking indirect methods because of its advantages over real-world data, including greater control, variability, accessibility, and the availability of exact ground-truth RIs. Synthetic data may therefore also provide a pathway for developing data-driven solutions for indirect RI estimation.
Methods
In this study, we leveraged synthetic data to train two convolutional neural networks (CNNs) to predict the parameters of underlying reference distributions (RDs) in diverse real-world clinical datasets. While one model was trained for standard univariate data, the other was extended to bivariate data, enabling the prediction of covariance between clinical analytes. Trained models were evaluated using both real-world and synthetic test datasets and compared with four alternative algorithms.
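The idea of a synthetic training sample with an exactly known ground truth can be illustrated with a minimal sketch. It assumes, purely for illustration, a Gaussian reference distribution contaminated by a single Gaussian pathological component; the function names, the mixture form, and all parameter values are hypothetical and are not taken from the study's data generator.

```python
import random
from statistics import NormalDist


def make_synthetic_sample(n, ref_mean, ref_sd, path_mean, path_sd, path_frac, seed=0):
    """Draw n values from a two-component mixture (assumed form, not the paper's).

    Each value comes from the pathological component with probability
    path_frac, otherwise from the reference (healthy) component.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        if rng.random() < path_frac:
            values.append(rng.gauss(path_mean, path_sd))
        else:
            values.append(rng.gauss(ref_mean, ref_sd))
    return values


def true_reference_interval(ref_mean, ref_sd, coverage=0.95):
    """Exact central interval of the Gaussian reference component."""
    dist = NormalDist(ref_mean, ref_sd)
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Hypothetical analyte: the mixed sample is the model input,
# the reference parameters and interval are the training label.
sample = make_synthetic_sample(
    n=5000, ref_mean=100.0, ref_sd=10.0,
    path_mean=140.0, path_sd=20.0, path_frac=0.2,
)
lower, upper = true_reference_interval(100.0, 10.0)
```

Because the reference parameters are chosen by the generator, the label is exact, which is the property that makes supervised training possible here.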
Results
Model predictions closely matched directly estimated RIs and RDs in real-world data and known RDs in synthetic data, outperforming four alternative indirect methods: GMM, refineR, reflimR, and RINetv1. Using labeled healthy and HCV-positive groups in real data, we compared established univariate RIs with predicted multivariate reference regions (MRRs). On average, the MRRs showed 1) higher coverage of healthy patients (closer to the desired 95%) and 2) smaller regions, which reduce the likelihood of including abnormal values.
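How a predicted mean and covariance translate into a bivariate reference region can be sketched as follows. The sketch assumes a bivariate Gaussian reference distribution, for which the 95% region is an ellipse defined by a Mahalanobis distance threshold; the function names and the example numbers are hypothetical, and any transformation applied to the analytes in the study is not modeled here.

```python
import math


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of a 2D point for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Quadratic form with the closed-form inverse of a 2x2 matrix.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def region_threshold(coverage=0.95):
    """Chi-square quantile with 2 degrees of freedom: -2 * ln(1 - coverage)."""
    return -2.0 * math.log(1.0 - coverage)


def in_reference_region(point, mean, cov, coverage=0.95):
    """True if the point lies inside the elliptical reference region."""
    return mahalanobis_sq(point, mean, cov) <= region_threshold(coverage)


def region_area(cov, coverage=0.95):
    """Area of the ellipse: pi * threshold * sqrt(det(cov))."""
    (a, b), (c, d) = cov
    return math.pi * region_threshold(coverage) * math.sqrt(a * d - b * c)


# Hypothetical predicted parameters for two correlated analytes.
mean = (30.0, 25.0)
cov = ((25.0, 18.0), (18.0, 16.0))
inside = in_reference_region((33.0, 27.0), mean, cov)
outside = in_reference_region((40.0, 20.0), mean, cov)
```

Under this Gaussian assumption, the area of the ellipse shrinks as the correlation between the two analytes grows, whereas the rectangle formed by two separate univariate RIs ignores the covariance entirely.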
Conclusions
Synthetic data training is a viable approach for developing accurate indirect RI estimation models for both univariate and bivariate clinical data. This strategy could help address some limitations of real-world data, direct analyses, and univariate RIs.