Large-scale -omics datasets can provide new insights into normal and disease-related biology when analyzed through a systems biology framework. However, most -omics datasets contain technical artefacts arising from variations in sample preparation, batching, platform settings, personnel, and other experimental procedures, and these artefacts prevent useful analysis unless the data are first adjusted for the technical factors. Here, we demonstrate a tunable median polish of ratio (TAMPOR) approach for batch effect correction and agglomeration of multiple, multi-batch, site-specific cohorts into a single analyte abundance data matrix that is suitable for systems biology analyses. We illustrate the utility and versatility of TAMPOR through four distinct use cases in which the method has been applied to different proteomic datasets, some of which contain a specific defect that must be addressed prior to analysis. We compare quality control metrics and sources of variance before and after application of TAMPOR to show that TAMPOR is effective at removing batch effects and other unwanted sources of variance in -omics data. We also show how TAMPOR can be used to harmonize -omics datasets even when the data are acquired using different analytical approaches. TAMPOR is a powerful and flexible approach for cleaning and harmonization of -omics data prior to downstream systems biology analysis.
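To make the general idea of a median polish of ratios concrete, the following minimal sketch alternates two centering steps on log2 abundances: within each batch, each analyte is centered on its batch median, and then each sample is centered on its own median. This is an illustrative simplification under stated assumptions, not the TAMPOR implementation: the function name `median_polish_by_batch`, the use of all samples in a batch (rather than designated reference samples) for the row step, and the fixed iteration count are all hypothetical choices made for the example.

```python
from math import log2
from statistics import median


def median_polish_by_batch(abundance, batches, n_iter=10):
    """Simplified two-way median polish on log2 abundances (illustrative only).

    abundance: list of rows (analytes), each a list of positive values (samples).
    batches:   list of batch labels, one per sample column.
    Returns a new matrix of log2 ratios after alternating centering steps.
    """
    logs = [[log2(v) for v in row] for row in abundance]

    # Group column indices by batch label.
    batch_cols = {}
    for j, label in enumerate(batches):
        batch_cols.setdefault(label, []).append(j)

    for _ in range(n_iter):
        # Row step: within each batch, subtract each analyte's batch median.
        for row in logs:
            for cols in batch_cols.values():
                m = median(row[j] for j in cols)
                for j in cols:
                    row[j] -= m
        # Column step: subtract each sample's median across analytes.
        for j in range(len(batches)):
            m = median(row[j] for row in logs)
            for row in logs:
                row[j] -= m
    return logs


# Toy data (assumed for illustration): batch "B" repeats batch "A",
# but each analyte is scaled by a different batch-specific factor.
batch_a = [[10.0, 20.0, 40.0], [5.0, 8.0, 3.0], [100.0, 90.0, 120.0]]
factors = [4.0, 0.5, 2.0]
abundance = [row + [v * f for v in row] for row, f in zip(batch_a, factors)]
batches = ["A", "A", "A", "B", "B", "B"]

polished = median_polish_by_batch(abundance, batches)
```

In this toy case the per-analyte scaling of batch "B" is removed by the row step, so corresponding samples in the two batches end up with matching values. Real data would also call for handling of missing values and a convergence check, both omitted here.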
Background: Count scores, disease clustering, and pairwise associations between diseases remain ubiquitous in multimorbidity research despite two major shortcomings: they yield no insight into plausible mechanisms underlying multimorbidity, and they ignore higher-order interactions such as effect modification.
Objectives: We argue that two components are currently missing but vital to develop novel multimorbidity metrics. Firstly, networks should be constructed that consist simultaneously of signs, symptoms, and diseases, since only then can they yield insight into plausible shared biological mechanisms underlying diseases. Secondly, learning pairwise associations is insufficient to fully characterize the correlations in a system. That is, synergistic (e.g., cooperative or antagonistic) effects are widespread in complex systems, where two or more elements combined give a larger or smaller effect than the sum of their individual effects. It can even occur that symptoms show no pairwise associations whatsoever, yet in combination have a significant association. Therefore, higher-order interactions should be included in networks used to study multimorbidity, resulting in so-called hypergraphs.
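The claim that variables can lack any pairwise association yet be strongly associated in combination can be illustrated with a textbook exclusive-or construction. The sketch below is a hypothetical toy example, not the synthetic Bayesian Network used in this work: two independent binary symptoms `A` and `B` are each present with probability 0.5, and a binary disease `D` is assumed to occur exactly when one of the two symptoms is present. Mutual information is computed exactly from the joint distribution, so no sampling is involved.

```python
from itertools import product
from math import log2

# Exact joint distribution over (A, B, D), with D = A XOR B (assumed toy model).
joint = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}


def marginal(dist, idx):
    """Marginal distribution over the variable positions listed in idx."""
    out = {}
    for state, p in dist.items():
        key = tuple(state[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def mutual_information(dist, xs, ys):
    """Mutual information in bits between variable groups xs and ys."""
    pxy = marginal(dist, xs + ys)
    px = marginal(dist, xs)
    py = marginal(dist, ys)
    mi = 0.0
    for key, p in pxy.items():
        kx, ky = key[:len(xs)], key[len(xs):]
        mi += p * log2(p / (px[kx] * py[ky]))
    return mi


mi_a_d = mutual_information(joint, (0,), (2,))     # symptom A vs disease D
mi_b_d = mutual_information(joint, (1,), (2,))     # symptom B vs disease D
mi_ab_d = mutual_information(joint, (0, 1), (2,))  # A and B jointly vs D

print(mi_a_d, mi_b_d, mi_ab_d)  # prints 0.0 0.0 1.0
```

Each symptom on its own carries no information about the disease, so a pairwise network would contain no edge to `D`, whereas the pair jointly determines it. A hyperedge connecting all three variables captures this dependence.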
Methods: We illustrate our argument using a synthetic Bayesian Network model of symptoms, signs and diseases, composed of pairwise and higher-order interactions. We simulate network interventions on both individual and population levels and compare the ground-truth outcomes with the predictions from pairwise associations.
Conclusion: We find that, when judged purely from the pairwise associations, interventions can have unexpected 'side-effects' or the most opportune intervention could be missed. The hypergraph uncovers links missed in pairwise networks, giving a more complete overview of sign and disease associations.
The principles governing genotype-phenotype relationships are still emerging (1-3), and detailed translational as well as transcriptomic information is required to understand complex phenotypes, such as the pathogenesis of Alzheimer's disease (AD). For this reason, the proteomics of AD continues to be studied extensively. Although comparisons between data obtained from humans and mouse models have been reported, approaches that specifically address between-species statistical comparisons are understudied. Our study investigated the performance of two statistical methods, LASSO regression and partial least squares discriminant analysis, for identifying proteins and biological pathways associated with AD in cross-species comparisons, taking specific data analysis challenges into account, including collinearity, dimensionality reduction and cross-species protein matching. We used a human dataset from a well-characterized cohort followed for over 22 years with proteomic data available. For the mouse model, we generated proteomic data from whole brains of CVN-AD mice and matching controls. We used these analyses to determine how reliably a mouse model can forecast significant proteomic-based pathological changes in the brain that may mimic pathology in human AD. Compared with LASSO regression, partial least squares discriminant analysis provided better statistical performance for the proteomics analysis. The major biological finding of the study was that extracellular matrix proteins and integrin-related pathways were dysregulated in both the human and mouse data. This approach may help inform the development of mouse models that are more relevant to the study of human late-onset AD.