Machine learning (ML) algorithms are increasingly used to estimate propensity scores in the expectation of improving causal inference. However, the validity of data-driven ML-based approaches for confounder selection and adjustment remains unclear. In this study, we emulated the device-stratified secondary analysis of the PARADIGM-HF trial among U.S. veterans with heart failure and implanted cardiac devices from 2016 to 2020. We benchmarked observational estimates from three propensity score approaches against the trial results: (1) logistic regression with pre-specified confounders, (2) generalized boosted models (GBM) using the same pre-specified confounders, and (3) GBM with expanded covariates and automated feature selection. The logistic regression-based propensity score approach yielded estimates closest to the trial (HR = 0.93, 95% CI 0.61-1.42; 23-month RR = 0.86, 95% CI 0.57-1.24 vs. trial HR = 0.81, 95% CI 0.61-1.06). Despite better predictive performance, GBM with pre-specified confounders showed no improvement over the logistic regression approach (HR = 0.97, 95% CI 0.68-1.37; RR = 0.96, 95% CI 0.89-1.98). Moreover, GBM with expanded covariates and data-driven automated feature selection substantially increased bias (HR = 0.61, 95% CI 0.30-1.23; RR = 0.69, 95% CI 0.36-1.04). Our findings suggest that ML-based propensity score methods do not inherently improve causal estimation, possibly because of residual confounding from omitted or partially adjusted variables, and that they may introduce overadjustment bias when combined with automated feature selection. These results underscore the importance of careful confounder specification and causal reasoning over algorithmic complexity in causal inference.
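As an illustration of approach (1), the sketch below fits a main-effects logistic regression propensity score model and converts the scores into stabilized inverse probability of treatment weights. It is a minimal, hypothetical example, not the study's code. The toy data, the plain gradient-ascent fitter (standing in for a standard maximum-likelihood routine), and the choice of stabilized weighting are all assumptions, because the abstract does not state how the scores were applied. The GBM approaches would replace fit_logistic with a boosted-tree classifier and keep the weighting step.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _linear(beta, xi):
    # beta[0] is the intercept; beta[1:] are the confounder coefficients
    return beta[0] + sum(b * x for b, x in zip(beta[1:], xi))


def fit_logistic(X, a, lr=0.1, n_iter=5000):
    """Fit P(A=1 | X) by batch gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, ai in zip(X, a):
            resid = ai - _sigmoid(_linear(beta, xi))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    return [_sigmoid(_linear(beta, xi)) for xi in X]


def stabilized_iptw(a, ps):
    """Stabilized inverse probability of treatment weights (assumed weighting scheme)."""
    p_treat = sum(a) / len(a)
    return [p_treat / e if ai == 1 else (1 - p_treat) / (1 - e)
            for ai, e in zip(a, ps)]


# Hypothetical toy data: one binary pre-specified confounder, binary treatment.
X = [[0.0]] * 4 + [[1.0]] * 4
a = [0, 0, 0, 1, 0, 1, 1, 1]
ps = propensity_scores(X, fit_logistic(X, a))
w = stabilized_iptw(a, ps)
```

With a single binary confounder, the fitted scores reproduce the stratum-specific treatment proportions (0.25 and 0.75), and the stabilized weights average to about 1.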
The associations of coffee and tea intake with long-term risk of dementia have not been thoroughly established. Additionally, the potential mediating roles of circulating inflammatory biomarkers in these associations remain underexplored. We included 6,001 participants from the Health and Retirement Study (HRS, 2013-2020) and 2,650 participants from the Framingham Heart Study Offspring cohort (FOS, 1998-2018), all free of dementia at baseline. Coffee and tea intake was assessed using a semi-quantitative food frequency questionnaire in both cohorts. Dementia diagnosis was ascertained using a validated algorithm and a clinical review panel. Cox proportional hazards models were used to evaluate the associations of coffee and tea intake with dementia. Mediation analysis was conducted to examine whether circulating inflammatory biomarkers mediated these associations. During a median follow-up of 7.0 years in HRS and 11.1 years in FOS, 231 individuals in HRS and 204 in FOS developed all-cause dementia. Compared with those consuming less than 1 cup of coffee per day, participants consuming ≥ 2 cups daily had a 28-37% lower risk of dementia (hazard ratio [HR] = 0.72, 95% confidence interval [CI]: 0.52, 0.99, P-trend = 0.045 in HRS; HR = 0.63, 95% CI: 0.45, 0.90, P-trend = 0.015 in FOS). Compared with non-consumers, moderate tea consumption was associated with a lower dementia risk in HRS (HR = 0.65, 95% CI: 0.48, 0.89 for > 0 to < 1 cup/day; HR = 0.53, 95% CI: 0.30, 0.94 for ≥ 1 to < 2 cups/day), but no significant association was observed in FOS. In the mediation analysis, the association between coffee intake and dementia was partially mediated by interleukin-10 (IL-10, 29.30%), cystatin C (24.45%), C-reactive protein (CRP, 16.54%), interleukin-1 receptor antagonist (IL-1RA, 11.06%), and soluble tumor necrosis factor receptor-1 (sTNFR-1, 10.78%).
In conclusion, higher coffee consumption (≥ 2 cups per day) is associated with a lower risk of dementia, an association partially mediated by a set of inflammatory biomarkers. Moderate tea intake (> 0 to < 2 cups per day) may be associated with a lower risk of dementia. Further large-scale observational and interventional studies are warranted to confirm these findings.
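The abstract does not state how the mediated percentages were computed. One common option with Cox models is the difference method, which compares the log hazard ratio for the exposure before and after adjustment for the mediator. The sketch below assumes that approach and uses hypothetical hazard ratios, not values from either cohort.

```python
import math


def proportion_mediated(hr_total, hr_direct):
    """Difference method on the log-hazard scale (an assumed approach):
    PM = (log HR_total - log HR_direct) / log HR_total, where HR_total is the
    exposure HR without the mediator and HR_direct is the HR after adjusting for it."""
    b_total = math.log(hr_total)
    if b_total == 0:
        raise ValueError("total effect is null; proportion mediated is undefined")
    return (b_total - math.log(hr_direct)) / b_total


# Hypothetical example: the exposure HR attenuates from 0.64 to 0.80 after
# adding a biomarker to the model.
pm = proportion_mediated(0.64, 0.80)
print(f"{pm:.1%}")  # 50.0%
```

Because the measure is a ratio of log hazard ratios, it becomes unstable when the total effect is close to the null.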
The rapid expansion of large-scale electronic health record (EHR) data has underscored the need for advanced analytical methods, such as disease network analyses, to comprehensively identify and interpret multimorbidity patterns and disease progression pathways. To overcome existing obstacles to performing sophisticated disease network analyses on EHR data, we developed DiNetxify, an open-source Python package implementing our recently introduced three-dimensional (3D) disease network analysis method (https://hzcohort.github.io/DiNetxify/). DiNetxify provides a dedicated data class for handling various types of EHR data, comprehensive modular functions for executing complete 3D disease network analyses, and visualization functions for interactive exploration of results. The package is efficient, user-friendly, and optimized for large-scale EHR datasets. It supports diverse study designs, customizable analysis parameters, and parallel computing for enhanced performance. Through a case study using UK Biobank data to investigate disease networks associated with short leukocyte telomere length, we demonstrated the capability of DiNetxify to identify meaningful disease clusters and progression patterns consistent with established knowledge while uncovering novel insights. Computationally, the software completed analyses involving cohorts exceeding half a million exposed individuals within 17 h using moderate computational resources. We thus anticipate that DiNetxify can substantially reduce technical barriers and facilitate broader adoption of advanced disease network analysis techniques across research groups, thereby enhancing the exploration of EHR data and improving the understanding of holistic health dynamics.
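Disease network analyses generally start from pairwise associations between diagnoses. The sketch below computes one generic measure of this kind, the phi correlation between two diagnoses across a cohort, and keeps pairs above a threshold as network edges. It is a hypothetical illustration only. It does not use or reproduce DiNetxify's API or its 3D method, and the diagnosis codes, patient IDs, and min_phi threshold are invented for the example.

```python
import math
from itertools import combinations


def phi_coefficient(patients_a, patients_b, n_total):
    """Phi correlation between two binary diagnosis indicators in a cohort."""
    a, b = set(patients_a), set(patients_b)
    n11 = len(a & b)                    # both diagnoses
    n10 = len(a - b)                    # first diagnosis only
    n01 = len(b - a)                    # second diagnosis only
    n00 = n_total - n11 - n10 - n01     # neither diagnosis
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def build_edges(diagnoses, n_total, min_phi=0.1):
    """diagnoses: dict mapping a (hypothetical) diagnosis code to patient IDs.
    Returns undirected edges whose phi correlation is at least min_phi."""
    edges = []
    for d1, d2 in combinations(sorted(diagnoses), 2):
        phi = phi_coefficient(diagnoses[d1], diagnoses[d2], n_total)
        if phi >= min_phi:
            edges.append((d1, d2, round(phi, 3)))
    return edges


# Hypothetical cohort of 10 people with three recorded diagnoses.
diagnoses = {"I10": {1, 2, 3, 4}, "E11": {1, 2, 3, 5}, "J45": {9}}
print(build_edges(diagnoses, n_total=10))  # [('E11', 'I10', 0.583)]
```

A full analysis would also test each association for significance and correct for multiple comparisons before keeping an edge. Those steps are omitted here for brevity.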
Clinical trials have shown favorable effects of exercise on frailty, supporting physical activity (PA) as a treatment and prevention strategy. Proteomics studies suggest that PA alters the levels of many proteins, some of which may participate in the biological processes underlying frailty. However, these studies have focused on structured exercise programs or cross-sectional PA-protein associations. Therefore, the effects of long-term PA on frailty-associated proteins remain unknown. Among 14,898 middle-aged adults, we emulated a target trial that assigned individuals to either (i) achieve and maintain the recommended PA level (≥ 150 min/week of moderate-to-vigorous physical activity [MVPA]) through 6 (± 0.3) years of follow-up or (ii) follow a "natural course" strategy, under which individuals engage in their habitual amounts of MVPA. We estimated the effects of long-term adherence to recommended MVPA versus the natural course strategy on 45 previously identified frailty-associated proteins at the end of follow-up using inverse probability weighting (IPW) and iterative conditional expectations (ICE). We found that long-term adherence to recommended MVPA improved the population levels of many frailty-associated proteins (effect sizes ranging from 0.04 to 0.11 standard deviations); the greatest benefits were seen in proteins involved in the nervous system (e.g., voltage-dependent calcium channel subunit alpha-2/delta-3 [CACNA2D3], contactin-1 [CNTN1], neural cell adhesion molecule 1 [NCAM1], and transmembrane protein 132D [TMEM132D]) and inflammation (e.g., high-temperature requirement serine protease A1 [HTRA1] and C-reactive protein [CRP]). Our findings suggest that improved nervous system function and reduced inflammation may underlie the benefits of long-term engagement in adequate PA as an intervention strategy for frailty.
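The IPW estimator for the sustained strategy (i) can be sketched as follows. Individuals who miss the MVPA target at any visit receive a weight of 0. Adherent individuals receive the inverse of their cumulative probability of adherence. The weighted mean outcome is then compared with the observed mean under the natural course. This is a simplified, hypothetical example with several assumptions. The adherence probabilities are supplied directly rather than estimated from a model of time-varying confounders, loss to follow-up is ignored, and the ICE estimator is not shown.

```python
def cumulative_weights(adherence, p_adhere):
    """Inverse probability weights for the 'always meet the MVPA target' strategy.

    adherence[i][t]: 1 if individual i met the target at visit t, else 0.
    p_adhere[i][t]: P(meeting target at t | past covariates and adherence),
    assumed here to come from a previously fitted model.
    """
    weights = []
    for a_hist, p_hist in zip(adherence, p_adhere):
        w = 1.0
        for a_t, p_t in zip(a_hist, p_hist):
            if a_t == 0:
                w = 0.0  # deviated from the strategy: artificially censored
                break
            w /= p_t
        weights.append(w)
    return weights


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data: 4 individuals, 2 visits, standardized protein level at the end.
adherence = [[1, 1], [1, 1], [1, 0], [0, 0]]
p_adhere = [[0.8, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
protein = [0.3, 0.1, -0.2, -0.4]

w = cumulative_weights(adherence, p_adhere)
mean_adhere = weighted_mean(protein, w)     # IPW mean under sustained adherence
mean_natural = sum(protein) / len(protein)  # observed mean (no loss to follow-up assumed)
effect_sd = mean_adhere - mean_natural      # difference in SD units
```

The weights for the two fully adherent individuals are 2.5 and 4.0, so the person who was less likely to adhere represents more of the hypothetical always-adherent population.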

