Background: Predicting health insurance uptake remains a critical challenge for policymakers and insurance providers seeking to optimise coverage strategies and resource allocation. In Sierra Leone, health insurance uptake is extremely low, and understanding its determinants is vital for achieving universal health coverage goals.
Objective: To develop and evaluate an innovative ensemble feature selection methodology for health insurance uptake prediction, establishing new performance benchmarks through systematic comparison of multiple machine learning algorithms using comprehensive validation strategies.
Methods: This study employed supervised machine learning to predict health insurance uptake among 15,574 women using data from the 2019 Sierra Leone Demographic and Health Survey (SLDHS). We implemented an ensemble feature selection approach that requires consensus across Adaptive Ant Colony Optimisation, Recursive Feature Elimination, and Backward Elimination techniques. Seven algorithms were systematically compared: Logistic Regression, Support Vector Machines, K-Nearest Neighbors, Random Forest, Gradient Boosting, XGBoost, and LightGBM. SMOTE was used to address class imbalance, whilst validation employed nested 5-fold cross-validation, 10-fold cross-validation, and hold-out testing to prevent information leakage.
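As an illustration of what "consensus" across feature selectors could look like, the following is a minimal sketch, not the study's implementation. It assumes each selector returns a list of feature names and that consensus means agreement among a minimum number of selectors (all of them by default). The function name `consensus_features`, the `min_votes` parameter, and the feature names are hypothetical.

```python
from collections import Counter


def consensus_features(selections, min_votes=None):
    """Return the features chosen by at least `min_votes` selectors.

    `selections` maps a selector name to the features it kept.
    With min_votes=None, every selector must agree (strict intersection).
    """
    if min_votes is None:
        min_votes = len(selections)
    # Count each feature once per selector, even if a selector lists it twice.
    votes = Counter(f for selected in selections.values() for f in set(selected))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical selector outputs; these are not features reported by the study.
selections = {
    "ant_colony": ["age", "education", "wealth_index", "residence"],
    "rfe": ["education", "wealth_index", "residence", "parity"],
    "backward_elimination": ["education", "wealth_index", "age"],
}

unanimous = consensus_features(selections)               # all three agree
majority = consensus_features(selections, min_votes=2)   # at least two agree
print(unanimous)
print(majority)
```

A strict intersection yields a smaller, more conservative feature set, whereas a majority vote keeps features that a single selector happened to drop. Which rule the study applied is not stated in the abstract.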
Results: On hold-out testing, Random Forest achieved 0.9973 for accuracy, precision, recall, and F1-score, with a ROC AUC of 1.0000. XGBoost delivered comparable results, with 0.9914 on the same four metrics and a ROC AUC of 0.9998. Backward Elimination consistently yielded the best results across ensemble methods. However, the near-perfect performance warrants cautious interpretation and requires external validation to confirm generalisability.
Conclusions: This research establishes new performance benchmarks for health insurance uptake prediction that exceed those reported in the existing literature, with direct implications for health insurance policy and practice in Sierra Leone. The ensemble feature selection methodology provides a framework for improving prediction accuracy across healthcare applications and offers practical value for stakeholders. Future work should prioritise external validation, explainability analysis, and temporal stability assessment to ensure readiness for practical deployment.
Objective: Medication discrepancies at hospital admission are common and can cause preventable patient harm. Predictive models can help prioritize medication reconciliation for high-risk patients. This study aimed to develop and validate machine learning (ML) models for predicting clinically relevant medication reconciliation discrepancies in emergency department (ED) patients, and to compare their performance with logistic regression.
Methods: We conducted a single-center, retrospective study at UZ Leuven. The dataset included patients admitted to the ED between 2017 and 2019 (development set) and between 2021 and 2022 (temporal validation set). The outcome variable was the presence of at least one clinically relevant medication discrepancy, defined by expert panel adjudication. Variables were extracted from the electronic health record, with care to avoid data leakage. Three models, logistic regression (LR), random forest (RF), and eXtreme Gradient Boosting (XGBoost), were developed using tailored variable selection strategies and validated temporally. Model performance was assessed via discrimination, calibration, and classification metrics. Clinical utility was assessed using decision curve analysis.
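Decision curve analysis compares models by their net benefit at a chosen threshold probability. The sketch below uses the standard net benefit definition, TP/n - (FP/n) * pt / (1 - pt), with the "treat all" strategy as a reference. It is an illustrative example, not the study's code. The function names and the toy labels and probabilities are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    n = len(y_true)
    flagged = [(y, p) for y, p in zip(y_true, y_prob) if p >= threshold]
    tp = sum(1 for y, _ in flagged if y == 1)  # flagged, discrepancy present
    fp = sum(1 for y, _ in flagged if y == 0)  # flagged, no discrepancy
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of flagging every patient, used as a reference strategy."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data for illustration only (1 = clinically relevant discrepancy).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(nb_model, nb_all)
```

Plotting these values over a range of thresholds gives the decision curve. A model is only useful at thresholds where its net benefit exceeds both "treat all" and "treat none" (net benefit 0).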
Results: The development and validation cohorts included 817 and 349 patients, respectively. The LR and RF models demonstrated moderate discrimination on temporal validation (area under the receiver operating characteristic curve [AUROC] 0.67-0.68). The XGBoost model showed lower discrimination (AUROC 0.63). Calibration was comparable across models. Decision curve analysis showed only small differences in net benefit between models across clinically relevant threshold probabilities.
Conclusion: ML models provided no clear improvement over logistic regression, which achieved similar predictive performance and greater interpretability. These findings highlight both the potential and the limitations of ML for supporting targeted medication reconciliation in ED workflows. Future research should explore the added value of richer data sources, such as unstructured clinical narratives.

