Dementia is a major global health issue that affects millions of individuals, families, and societies worldwide and places a substantial burden on healthcare systems. This study introduces an approach for predicting dementia that uses a Logistic Regression (LR) model enhanced with Recursive Feature Elimination (RFE), applied to a unique dataset of 1000 patients (49.60% male, 50.40% female). The LR model, recognized for its simplicity and effectiveness in binary classification tasks, is optimized through RFE, a technique that iteratively eliminates the least significant features to improve model performance. The model’s effectiveness was assessed using accuracy, precision, recall, F1-score, Matthews Correlation Coefficient (MCC), and Kappa score. Furthermore, SHapley Additive exPlanations (SHAP) values were used to increase the interpretability of the model, providing insight into the most influential features for dementia prediction. To address overfitting, a standardization technique was implemented, which enhanced the model’s predictive performance. The findings have potential implications for early dementia detection, for informing intervention strategies, and for optimizing healthcare resource allocation.
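The evaluation metrics named above can all be derived from a single binary confusion matrix. The following is a minimal sketch of those formulas, not the authors' evaluation code; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews Correlation Coefficient: correlation between predicted and true labels.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only (not results from the study).
example = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

With these illustrative counts, accuracy, precision, recall, and F1 are all 0.8, while MCC and kappa are both about 0.6, which shows how the chance-corrected measures can sit well below raw accuracy.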
Barrett's esophagus is an asymptomatic precursor to esophageal adenocarcinoma. Its rising incidence due to lifestyle factors, coupled with healthcare costs, calls for cost-effective alternatives for surveillance. We propose a decision-analytic Markov cohort model, built in TreeAge Pro, to simulate the natural progression of Barrett's esophagus to esophageal adenocarcinoma. Health states include metaplasia (non-dysplastic Barrett's esophagus), low-grade dysplasia, high-grade dysplasia, and esophageal adenocarcinoma. Triplicates of these health states represent one non-stratified and two risk-stratified cohorts for devising risk-based strategies. A cycle length of six months and a time horizon of 35 years, totaling 70 cycles, are used. Model inputs are derived from the literature and, when unavailable, from an extensive local database of 1087 patients (5081 person-years) from March 2003–2021, cleaned and analyzed with RStudio (R version 3.6.3). Analyses included descriptive statistics, Cox proportional hazards models, and graphing. A seven-step calibration process is performed for the risk-stratified and non-stratified groups simultaneously to match the progression to high-grade dysplasia and esophageal adenocarcinoma, which allows comparison between risk-based and non-risk-based strategies. The calibration process included input parameterization, optimization, goodness-of-fit calculation, selection of sets meeting convergence criteria, and integration into probabilistic sensitivity analysis. It generated 10,187 sets of transition probabilities, of which 4358 met the convergence criteria, ensuring equal model outputs in all groups. Cancer-related mortality was 10.7%, matching literature values. This process provides a robust framework for evaluating Barrett's esophagus progression and management strategies, supporting informed decision-making in healthcare.
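The cohort mechanics described above (four disease states, six-month cycles, 70 cycles) can be sketched in a few lines. This is an illustrative sketch only, not the TreeAge Pro model: the annual transition probabilities are hypothetical placeholders, the absorbing "Dead" state and the single forward path are simplifying assumptions, and regression between states and background mortality are omitted. Annual probabilities are converted to six-month probabilities under a constant-rate assumption.

```python
STATES = ["NDBE", "LGD", "HGD", "EAC", "Dead"]

# Hypothetical annual transition probabilities (placeholders, NOT study inputs).
ANNUAL = {
    ("NDBE", "LGD"): 0.04,
    ("LGD", "HGD"): 0.02,
    ("HGD", "EAC"): 0.06,
    ("EAC", "Dead"): 0.30,
}


def to_cycle_prob(p_annual, cycles_per_year=2):
    """Convert an annual probability to a per-cycle probability (constant rate)."""
    return 1.0 - (1.0 - p_annual) ** (1.0 / cycles_per_year)


def build_matrix(annual):
    """Build a row-stochastic per-cycle transition matrix over STATES."""
    idx = {s: i for i, s in enumerate(STATES)}
    n = len(STATES)
    matrix = [[0.0] * n for _ in range(n)]
    for (src, dst), p in annual.items():
        matrix[idx[src]][idx[dst]] = to_cycle_prob(p)
    for i in range(n):
        # Whatever probability mass does not leave the state stays in it.
        matrix[i][i] = 1.0 - sum(matrix[i])
    return matrix


def run_cohort(matrix, start, n_cycles):
    """Propagate the cohort distribution forward; returns one row per cycle."""
    dist = list(start)
    trace = [dist]
    n = len(dist)
    for _ in range(n_cycles):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        trace.append(dist)
    return trace


matrix = build_matrix(ANNUAL)
# Whole cohort starts in non-dysplastic Barrett's esophagus; 35 years = 70 cycles.
trace = run_cohort(matrix, [1.0, 0.0, 0.0, 0.0, 0.0], n_cycles=70)
final = dict(zip(STATES, trace[-1]))
```

In a calibration setting, a routine like `run_cohort` would be evaluated repeatedly for candidate probability sets and its outputs compared with observed progression targets; that loop is not shown here.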
The primary goal of this research is to examine the impact of balancing data on prediction quality and inference in multilevel logistic regression models. Logistic regression is a valuable approach for modeling the binary outcomes common in health applications. The class imbalance problem, where one of the two outcome categories occurs much more often than the other, is common in healthcare data, for example when modeling the risk factors for rare diseases. The issue is particularly relevant for medical data that combine individual measurements with other data sources measured at the geographic region level, such as environmental risk factors. For this work, both prediction and model interpretation are of interest. A simulation model is proposed to test the impact of balancing strategies on the logistic multilevel model's parameter estimation, inference, and predictive performance. The simulated data emulate characteristics of a Gestational Diabetes Mellitus (GDM) dataset from Indiana's Medicaid program. Several datasets were simulated with varying levels of complexity, involving the balance of the outcome variable and of the predictors. These datasets exhibited high- or low-frequency occurrences in specific intersections of variables, often called ‘cells.’ The impact of the balancing strategies on prediction and inference was assessed using several techniques, including the equivalence test (TOST), power analysis, and predictive measures. To the best of our knowledge, this is the first study to explore the impact of using balanced samples on coefficient estimation and prediction measures in logistic multilevel modeling, and it finds evidence of the benefits of using balanced samples in this context.
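As a rough illustration of the setting described above, the sketch below simulates an imbalanced binary outcome with a region-level random intercept and then balances it by random undersampling of the majority class. The abstract does not state which balancing strategies were used, so undersampling is an assumption chosen for simplicity, and every parameter value (number of regions, intercept, slope, random-effect standard deviation) is hypothetical rather than taken from the GDM data.

```python
import math
import random


def simulate(n_regions=20, n_per_region=200, intercept=-3.0,
             beta=0.8, sigma_u=0.5, seed=1):
    """Simulate two-level data: individuals nested in regions, binary outcome."""
    rng = random.Random(seed)
    rows = []
    for region in range(n_regions):
        u = rng.gauss(0.0, sigma_u)  # region-level random intercept
        for _ in range(n_per_region):
            x = rng.gauss(0.0, 1.0)  # individual-level predictor
            p = 1.0 / (1.0 + math.exp(-(intercept + beta * x + u)))
            y = 1 if rng.random() < p else 0
            rows.append({"region": region, "x": x, "y": y})
    return rows


def undersample(rows, seed=0):
    """Randomly drop majority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r["y"] == 1]
    neg = [r for r in rows if r["y"] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


data = simulate()
balanced = undersample(data)
prevalence = sum(r["y"] for r in data) / len(data)
balanced_prevalence = sum(r["y"] for r in balanced) / len(balanced)
```

Note that undersampling changes the outcome prevalence, which shifts the fitted intercept; comparing coefficient estimates from `data` and `balanced` is the kind of question an equivalence test would then address.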

