Background: Breast cancer is the most common malignancy among women worldwide. Despite advances in treating breast cancer over the past decades, drug resistance and adverse effects remain challenging. Recent therapeutic progress has shifted toward using drug combinations for better treatment efficiency. However, with a growing number of potential small-molecule cancer inhibitors, in silico strategies to predict pharmacological synergy before experimental trials are required to compensate for time and cost restrictions. Many deep learning models have been previously proposed to predict the synergistic effects of drug combinations with high performance. However, these models heavily relied on a large number of drug chemical structural fingerprints as their main features, which made model interpretation a challenge.
Results: This study developed a deep neural network model that predicts synergy between small-molecule pairs based on their inhibitory activities against 13 selected key proteins. The synergy prediction model achieved a Pearson correlation coefficient between model predictions and experimental data of 0.63 across five breast cancer cell lines. BT-549 and MCF-7 achieved the highest correlation of 0.67 when considering individual cell lines. Despite achieving a moderate correlation compared to previous deep learning models, our model offers a distinctive advantage in terms of interpretability. Using the inhibitory activities against key protein targets as the main features allowed a straightforward interpretation of the model since the individual features had direct biological meaning. By tracing the synergistic interactions of compounds through their target proteins, we gained insights into the patterns our model recognized as indicative of synergistic effects.
Conclusions: The framework employed in the present study lays the groundwork for future advancements, especially in model interpretation. By combining deep learning techniques and target-specific models, this study shed light on potential patterns of target-protein inhibition profiles that could be exploited in breast cancer treatment.
Purpose: Epistasis, the interaction between two or more genes, is integral to the study of genetics and is present throughout nature. Yet, it is seldom fully explored as most approaches primarily focus on single-locus effects, partly because analyzing all pairwise and higher-order interactions requires significant computational resources. Furthermore, existing methods for epistasis detection only consider a Cartesian (multiplicative) model for interaction terms. This is likely limiting as epistatic interactions can evolve to produce varied relationships between genetic loci, some complex and not linearly separable.
Methods: We present new algorithms for computing the interaction coefficients of standard regression models for epistasis. These algorithms permit many varied models for the interaction terms between loci and use memory efficiently. The algorithms are given for two-way and three-way epistasis and may be generalized to higher-order epistasis. Statistical tests for the interaction coefficients are also provided. We also present an efficient matrix-based algorithm for permutation testing for two-way epistasis. We offer a proof and experimental evidence that methods that look for epistasis only at loci that have main effects may not be justified. Given the computational efficiency of the algorithm, we applied the method to a rat data set and a mouse data set, with at least 10,000 loci and 1,000 samples each, using the standard Cartesian model and the XOR model to explore body mass index.
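As a rough illustration of how the interaction term differs between the two models named above, the following sketch builds the interaction column for a pair of loci under a Cartesian (product) coding and under an XOR coding. It assumes each locus is coded as a 0/1 carrier indicator; the function names and the toy genotypes are hypothetical and are not taken from the study, whose actual encoding and fitting algorithms are not described here.

```python
def cartesian_term(locus_a, locus_b):
    """Interaction column under the Cartesian (multiplicative) model: a * b."""
    return [a * b for a, b in zip(locus_a, locus_b)]


def xor_term(locus_a, locus_b):
    """Interaction column under an XOR model: 1 when exactly one locus is a carrier.

    Assumption: genotypes are 0/1 carrier indicators (any non-zero value counts
    as a carrier).
    """
    return [int(bool(a) != bool(b)) for a, b in zip(locus_a, locus_b)]


# Toy genotypes for four samples covering every carrier combination (made up).
locus_a = [0, 0, 1, 1]
locus_b = [0, 1, 0, 1]

print("Cartesian:", cartesian_term(locus_a, locus_b))
print("XOR:      ", xor_term(locus_a, locus_b))
```

Either column could then be entered as the interaction covariate in a regression alongside the main effects; only the coding of the interaction changes between models.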
Results: This study reveals that although many of the loci found to exhibit significant statistical epistasis overlap between models in rats, the pairs are mostly distinct. Further, the XOR model found greater evidence for statistical epistasis in many more pairs of loci in both data sets with almost all significant epistasis in mice identified using XOR. In the rat data set, loci involved in epistasis under the XOR model are enriched for biologically relevant pathways.
Conclusion: Our results in both species show that many biologically relevant epistatic relationships would have been undetected if only one interaction model was applied, providing evidence that varied interaction models should be implemented to explore epistatic interactions that occur in living systems.
Background: Previous studies have shown an association between gut microbiota and cardiovascular diseases (CVDs). However, the underlying causal relationship remains unclear. This study aims to elucidate the causal relationship between gut microbiota and CVDs and to explore the pathogenic role of gut microbiota in CVDs.
Methods: In this two-sample Mendelian randomization study, we used genetic instruments from publicly available genome-wide association studies, including single-nucleotide polymorphisms (SNPs) associated with gut microbiota (n = 14,306) and CVDs (n = 2,207,591). We employed multiple statistical analysis methods, including inverse variance weighting, MR Egger, weighted median, MR pleiotropic residuals and outliers, and the leave-one-out method, to estimate the causal relationship between gut microbiota and CVDs. Additionally, we conducted multiple analyses to assess horizontal pleiotropy and heterogeneity.
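For readers unfamiliar with inverse variance weighting, a minimal sketch of the standard fixed-effect estimator is given here. It assumes per-SNP effect sizes on the exposure and on the outcome (log-odds scale) plus the outcome standard errors; the function names and all numbers are hypothetical illustrations, not data or code from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted causal estimate and its standard error.

    Each SNP contributes the ratio beta_outcome / beta_exposure, weighted by
    beta_exposure**2 / se_outcome**2.
    """
    weighted_products = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    weighted_squares = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    estimate = weighted_products / weighted_squares
    standard_error = math.sqrt(1.0 / weighted_squares)
    return estimate, standard_error


def odds_ratio_with_ci(log_odds, standard_error, z=1.96):
    """Convert a log-odds estimate to an odds ratio with an approximate 95% CI."""
    return (
        math.exp(log_odds),
        math.exp(log_odds - z * standard_error),
        math.exp(log_odds + z * standard_error),
    )


# Hypothetical summary statistics for three SNPs (made-up numbers).
snp_exposure = [0.10, 0.20, 0.40]
snp_outcome = [-0.01, -0.02, -0.04]
snp_outcome_se = [0.01, 0.02, 0.01]

est, se = ivw_estimate(snp_exposure, snp_outcome, snp_outcome_se)
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(est, se)
print(odds_ratio, ci_low, ci_high)
```

An odds ratio below 1 with a confidence interval excluding 1 would be read as a protective effect. The sensitivity methods listed above (MR Egger, weighted median, and so on) relax assumptions that this simple estimator makes and are not sketched here.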
Results: GWAS summary data were available from a pooled sample of 2,221,897 adult and adolescent participants. Our findings indicated that specific gut microbiota had either protective or detrimental effects on CVDs. Notably, Howardella (OR = 0.955, 95% CI: 0.913-0.999, P = .05), Intestinibacter (OR = 0.908, 95% CI: 0.831-0.993, P = .03), Lachnospiraceae (NK4A136 group) (OR = 0.904, 95% CI: 0.841-0.973, P = .007), Turicibacter (OR = 0.904, 95% CI: 0.838-0.976, P = .01), Holdemania (OR = 0.898, 95% CI: 0.810-0.995, P = .04), and Odoribacter (OR = 0.835, 95% CI: 0.710-0.993, P = .04) exhibited a protective causal effect on atrial fibrillation, while other microbiota had adverse causal effects. Similar effects were observed with respect to coronary artery disease, myocardial infarction, ischemic stroke, and hypertension. Furthermore, reversed Mendelian randomization analyses revealed that atrial fibrillation and ischemic stroke had causal effects on certain gut microbiotas.
Conclusion: Our study underscored the importance of gut microbiota in the context of CVDs and lent support to the hypothesis that increasing the abundance of probiotics or decreasing the abundance of harmful bacterial populations may offer protection against specific CVDs. Nevertheless, further research is essential to translate these findings into clinical practice.
Background: Prioritizing candidate drugs based on genome-wide expression data is an emerging approach in systems pharmacology due to its holistic perspective for preclinical drug evaluation. In the current study, a network-based approach was proposed and applied to prioritize plant polyphenols and identify potential drug combinations in breast cancer. We focused on MEK5/ERK5 signalling pathway genes, a recently identified potential drug target in cancer with roles spanning major carcinogenesis processes.
Results: By constructing and identifying perturbed protein-protein interaction networks for luminal A breast cancer, plant polyphenols and drugs from transcriptome data, we first demonstrated their systemic effects on the MEK5/ERK5 signalling pathway. Subsequently, we applied a pathway-specific network pharmacology pipeline to prioritize plant polyphenols and potential drug combinations for use in breast cancer. Our analysis prioritized genistein among plant polyphenols. Drug combination simulations predicted several FDA-approved drugs in breast cancer with well-established pharmacology as candidates for target network synergistic combination with genistein. This study also highlights the concept of target network enhancer drugs, with drugs previously not well characterised in breast cancer being prioritized for use in the MEK5/ERK5 pathway in breast cancer.
Conclusion: This study proposes a computational framework for drug prioritization and combination with the MEK5/ERK5 signalling pathway in breast cancer. The method is flexible and provides the scientific community with a robust method that can be applied to other complex diseases.
Background: 1-methyladenosine (m1A) is a variant of methyladenosine that carries a methyl substituent at the 1st position and has a prominent role in RNA stability and human metabolites.
Objective: Traditional approaches, such as mass spectrometry and site-directed mutagenesis, proved to be time-consuming and complicated.
Methodology: The present research focused on the identification of m1A sites within RNA sequences using novel feature development mechanisms. The obtained features were used to train the ensemble models, including blending, boosting, and bagging. Independent testing and k-fold cross validation were then performed on the trained ensemble models.
Results: The proposed model outperformed the preexisting predictors and revealed optimized scores based on major accuracy metrics.
Conclusion: For research purposes, a user-friendly webserver of the proposed model can be accessed through https://taseersuleman-m1a-ensem1.streamlit.app/
Background: Nowadays, the chance of discovering the best antibody candidates for predicting clinical malaria has notably increased due to the availability of multi-sera data. The analysis of these data is typically divided into a feature selection phase followed by a predictive one where several models are constructed for predicting the outcome of interest. A key question in the analysis is to determine which antibodies should be included in the predictive stage and whether they should be included in the original or a transformed scale (i.e. binary/dichotomized).
Methods: To answer this question, we developed three approaches for antibody selection in the context of predicting clinical malaria: (i) a basic and simple approach based on selecting antibodies via the nonparametric Mann-Whitney-Wilcoxon test; (ii) an optimal dichotomization approach where each antibody was selected according to the optimal cut-off via maximization of the chi-squared (χ2) statistic for two-way tables; (iii) a hybrid parametric/non-parametric approach that integrates Box-Cox transformation followed by a t-test, together with the use of finite mixture models and the Mann-Whitney-Wilcoxon test as a last resort. We illustrated the application of these three approaches with published serological data of 36 Plasmodium falciparum antigens for predicting clinical malaria in 121 Kenyan children. The predictive analysis was based on a Super Learner where predictions from multiple classifiers including the Random Forest were pooled together.
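The cut-off search in approach (ii) can be sketched as follows, under stated assumptions: candidate cut-offs are taken as midpoints between consecutive distinct antibody values, and the uncorrected Pearson chi-squared statistic for the resulting 2x2 table is maximized. The function names and the toy data are hypothetical, and the study's own implementation may differ (for example in the candidate grid or in any correction for multiple cut-offs).

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for the table
    [[a, b], [c, d]]; returns 0.0 when a row or column total is zero."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def best_cutoff(values, outcomes):
    """Return (cutoff, statistic) maximizing chi-squared over midpoint cut-offs.

    `outcomes` holds 1 for cases and 0 for controls. Returns (None, -1.0) if
    all values are identical, since no cut-off separates them.
    """
    distinct = sorted(set(values))
    best = (None, -1.0)
    for low, high in zip(distinct, distinct[1:]):
        cutoff = (low + high) / 2
        a = sum(1 for v, y in zip(values, outcomes) if v > cutoff and y == 1)
        b = sum(1 for v, y in zip(values, outcomes) if v > cutoff and y == 0)
        c = sum(1 for v, y in zip(values, outcomes) if v <= cutoff and y == 1)
        d = sum(1 for v, y in zip(values, outcomes) if v <= cutoff and y == 0)
        statistic = chi2_2x2(a, b, c, d)
        if statistic > best[1]:
            best = (cutoff, statistic)
    return best


# Toy antibody levels and case/control labels (made-up numbers).
antibody_levels = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
case_labels = [0, 0, 0, 1, 1, 1]
cutoff, statistic = best_cutoff(antibody_levels, case_labels)
print(cutoff, statistic)
```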
Results: Our results led to similar areas under the Receiver Operating Characteristic curves of 0.72 (95% CI = [0.62, 0.82]), 0.80 (95% CI = [0.71, 0.89]), 0.79 (95% CI = [0.70, 0.88]) for the simple, dichotomization and hybrid approaches, respectively. These approaches were based on 6, 20, and 16 antibodies, respectively.
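For reference, the area under the ROC curve reported here equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half (the normalized Mann-Whitney U statistic). A minimal sketch of that pairwise computation follows; the function name and the example scores are hypothetical and unrelated to the study data.

```python
def auc_pairwise(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    `labels` holds 1 for cases and 0 for controls; tied scores count as 0.5.
    """
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    correctly_ranked = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                correctly_ranked += 1.0
            elif case == control:
                correctly_ranked += 0.5
    return correctly_ranked / (len(case_scores) * len(control_scores))


# Made-up predicted scores for two controls and two cases.
example_auc = auc_pairwise([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(example_auc)
```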
Conclusions: The three feature selection strategies provided a better predictive performance of the outcome when compared to the previous results relying on Random Forest including all the 36 antibodies (AUC = 0.68, 95% CI = [0.57, 0.79]). Given the similar predictive performance, we recommend that the three strategies be used in conjunction on the same data set and selected according to their complexity.
Background: Although the 2019 EULAR/ACR classification criteria for systemic lupus erythematosus (SLE) require a positive anti-nuclear antibody (ANA) titer (≥ 1:80), it remains challenging for clinicians to identify patients with SLE. This study aimed to develop a machine learning (ML) approach to assist in the detection of SLE patients using genomic data and electronic health records.
Methods: Participants with a positive ANA (≥ 1:80) were enrolled from the Taiwan Precision Medicine Initiative cohort. The Taiwan Biobank version 2 array was used to detect single nucleotide polymorphism (SNP) data. Six ML models, Logistic Regression, Random Forest (RF), Support Vector Machine, Light Gradient Boosting Machine, Gradient Tree Boosting, and Extreme Gradient Boosting (XGB), were used to identify SLE patients. The importance of the clinical and genetic features was determined by Shapley Additive Explanation (SHAP) values. A logistic regression model was applied to identify genetic variations associated with SLE in the subset of patients with an ANA equal to or exceeding 1:640.
Results: A total of 946 SLE patients and 1,892 non-SLE controls were included in this analysis. Among the six ML models, RF and XGB demonstrated superior performance in the differentiation of SLE from non-SLE. The leading features in the SHAP diagram were anti-double-stranded DNA antibodies, ANA titers, AC4 ANA pattern, polygenic risk scores, complement levels, and SNPs. Additionally, in the subgroup with a high ANA titer (≥ 1:640), six SNPs positively associated with SLE and five SNPs negatively correlated with SLE were discovered.
Conclusions: ML approaches offer the potential to assist in diagnosing SLE and uncovering novel SNPs in a group of patients with autoimmunity.