We have developed a method for defining and analyzing gene expression signatures for diagnostic purposes. Our approach relies on constructing a reference map of transcriptional signatures, from both healthy controls and affected patients, using the respective mRNA or miRNA profiles. Disease diagnosis can then be performed by determining the relative map position of an individual’s transcriptional signature. Our approach simultaneously addresses the poor repeatability of expression profiling methods and their high sensitivity to protocol variations, thereby providing a novel approach to RNA signature definition and analysis. Specifically, our method requires only that the relative position of RNA species be accurate in a ranking by value, not their absolute values. Furthermore, our method makes no assumptions about which RNA species must be included in the signature and, by considering a large subset (or even the whole set) of known RNAs, it can tolerate a moderate number of erroneous inversions in the ranking. The diagnostic power of our method was demonstrated in an open scientific competition (sbv IMPROVER Diagnostic Signature Challenge), where it scored second place overall and first place in one sub-challenge. In addition, we report the application of our method to published miRNA expression profile data sets, quantifying its performance in terms of predictive capability and robustness to batch effects, compared with current state-of-the-art methods.
{"title":"Rank-based transcriptional signatures","authors":"Mario Lauria","doi":"10.4161/sysb.25982","DOIUrl":"https://doi.org/10.4161/sysb.25982","journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"26 1","pages":"228 - 239"},"publicationDate":"2013-09-10","publicationTypes":"Journal Article"}
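The rank-based idea in the abstract above can be illustrated with a small sketch. It is not the author's implementation: the distance used here (a normalized Spearman footrule), the nearest-reference rule, and all function names are assumptions chosen for illustration. It only shows why a signature built from ranks is unaffected by monotone changes of scale such as those introduced by protocol variations.

```python
def rank_signature(profile):
    """Map each RNA name to its rank (0 = lowest value) within one sample.

    Ties are broken by insertion order, which is an arbitrary choice.
    """
    ordered = sorted(profile, key=lambda name: profile[name])
    return {name: rank for rank, name in enumerate(ordered)}


def rank_distance(sig_a, sig_b):
    """Normalized Spearman footrule distance between two rank signatures.

    0.0 means identical ordering; 1.0 is the largest possible disagreement.
    """
    shared = sorted(set(sig_a) & set(sig_b))
    n = len(shared)
    # Re-rank within the shared RNA set so both signatures use ranks 0..n-1.
    ra = {g: r for r, g in enumerate(sorted(shared, key=lambda g: sig_a[g]))}
    rb = {g: r for r, g in enumerate(sorted(shared, key=lambda g: sig_b[g]))}
    max_total = (n * n) // 2  # maximum footrule value for n items
    total = sum(abs(ra[g] - rb[g]) for g in shared)
    return total / max_total if max_total else 0.0


def classify(sample, references):
    """Return the label of the closest reference signature.

    `references` is a list of (label, rank_signature) pairs (hypothetical layout).
    """
    sig = rank_signature(sample)
    return min(references, key=lambda ref: rank_distance(sig, ref[1]))[0]


# Toy reference map with one healthy and one disease profile (made-up values).
healthy = rank_signature({"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0})
disease = rank_signature({"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0})
reference_map = [("healthy", healthy), ("disease", disease)]

# A sample measured on a very different scale but with the same ordering.
new_sample = {"a": 10.0, "b": 100.0, "c": 1000.0, "d": 5000.0}
label = classify(new_sample, reference_map)
```

Because only the ordering enters the distance, the rescaled sample lands on the healthy reference even though none of its absolute values match.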
Preetam Nandy, Michael Unger, C. Zechner, K. Dey, H. Koeppl
Making reliable diagnoses and predictions based on high-throughput transcriptional data has attracted immense attention in the past few years. While experimental gene profiling techniques, such as microarray platforms, are advancing rapidly, there is an increasing demand for computational methods that can handle such data efficiently. In this work we propose a computational workflow for extracting diagnostic gene signatures from high-throughput transcriptional profiling data. In particular, our research was performed within the scope of the first IMPROVER challenge. The goal of that challenge was to extract and verify diagnostic signatures based on microarray gene expression data in four different disease areas: psoriasis, multiple sclerosis, chronic obstructive pulmonary disease and lung cancer. Each disease area is handled using the same three-stage algorithm. First, the data are normalized using a multi-array average (RMA) normalization procedure to account for variability among different samples and data sets. Because of the vast dimensionality of the profiling data, we then perform a feature pre-selection using Wilcoxon’s rank sum statistic. The remaining features are used to train an L1-regularized logistic regression model, which acts as our primary classifier. Using the four different data sets, we analyze the proposed method and demonstrate its use in extracting diagnostic signatures from microarray gene expression data.
{"title":"Learning diagnostic signatures from microarray data using L1-regularized logistic regression","authors":"Preetam Nandy, Michael Unger, C. Zechner, K. Dey, H. Koeppl","doi":"10.4161/sysb.25271","DOIUrl":"https://doi.org/10.4161/sysb.25271","journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"240 - 246"},"publicationDate":"2013-08-30","publicationTypes":"Journal Article"}
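The last two stages of the workflow described above (rank-sum pre-selection, then an L1-regularized logistic regression) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: RMA normalization is assumed to have been done already, the solver is a simple proximal gradient loop chosen for brevity, and all names and parameter values (`lam`, `step`, `iters`) are hypothetical.

```python
import math


def rank_sum(values_a, values_b):
    """Wilcoxon rank-sum statistic for group A, using average ranks for ties."""
    pooled = sorted([(v, 0) for v in values_a] + [(v, 1) for v in values_b])
    total, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        total += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    return total


def preselect(X, y, k):
    """Indices of the k features whose rank sum deviates most from its null mean."""
    n_a = sum(1 for label in y if label == 1)
    expected = n_a * (len(y) + 1) / 2.0
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 1]
        b = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(rank_sum(a, b) - expected), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_l1_logistic(X, y, lam=0.05, step=0.1, iters=200):
    """Proximal gradient descent on mean logistic loss + lam * ||w||_1."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= step * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Made-up data: feature 0 separates the classes, features 1 and 2 do not.
X = [[0.1, 5.0, 1.0], [0.2, 3.0, 2.0], [0.3, 4.0, 1.5],
     [0.9, 4.5, 1.2], [1.0, 3.5, 1.8], [1.1, 4.2, 1.4]]
y = [0, 0, 0, 1, 1, 1]

selected = preselect(X, y, k=1)
X_selected = [[row[j] for j in selected] for row in X]
weights, intercept = fit_l1_logistic(X_selected, y)
```

The soft-thresholding step is what drives uninformative weights to exactly zero, which is why an L1 penalty yields a compact signature rather than a dense weight vector.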
Networks represent powerful inference tools for the analysis of complex biological systems. Inference is especially relevant when associations between network nodes are established by focusing on modularity. The problem of first identifying and then validating modules in networks has received substantial attention, and many approaches have been proposed. An important goal is functional validation of the identified modules, based on existing database resources. The quality and performance of algorithms can be assessed by evaluating the matching rate between retrieved and well-annotated modules, in addition to newly established associations. Because of the variety of algorithms, the concept of a module resolution spectrum has become central to this research field. In general, coarse-resolution modules reflect global network regulation patterns operating at the gene level or at the protein pathway scale. Fine-resolution modules localize dense regions, uncovering details of the variety of the constitutive connectivity patterns. The resolution limit problem is affected by uncertainty factors such as experimental accuracy and the detection power of inference methods, and it impacts the quality and accuracy of functional annotation. Our proposed approach works at the systems level; it aims to dissect networks and examine modularity breadth-first, followed by in-depth analysis. In particular, “slicing” the protein interactome under examination yields a sort of tomography scan implemented by eigendecomposition of network affinity matrices. Such affinity matrices can be designed ad hoc, characterized by topological attributes, and analyzed with spectral methods. Consequently, a selected interactome data set allows the modularity of disease protein maps to be explored through selected eigenmodes that are informative of both direct (protein-centric) and indirect (protein-neighbor centric) connectivity patterns of cancer targets and associated morbidities. The network tomography approach is thus recommended for inferring disease-induced multiscale modularity.
{"title":"Protein networks tomography","authors":"E. Capobianco","doi":"10.4161/sysb.25607","DOIUrl":"https://doi.org/10.4161/sysb.25607","journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"161 - 178"},"publicationDate":"2013-07-01","publicationTypes":"Journal Article"}
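As a rough illustration of how an eigenmode of a network matrix can expose modules, the sketch below computes the second-smallest eigenvector of a graph Laplacian (the Fiedler vector) by shifted power iteration and splits nodes by sign. The choice of the Laplacian as the affinity matrix, the toy graph, and every function name are assumptions made for this example; the abstract does not specify which affinity matrices or eigenmodes the author uses.

```python
import math


def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj


def fiedler_vector(adj, iters=500):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue.

    Power iteration is run on M = shift*I - L, which shares eigenvectors with
    the Laplacian L but reverses the eigenvalue order. The constant vector
    (eigenvalue 0 of L) is removed at every step by subtracting the mean.
    Assumes a connected graph with at least two nodes.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    shift = 2.0 * max(deg)  # upper bound on the largest Laplacian eigenvalue
    v = [i - (n - 1) / 2.0 for i in range(n)]  # deterministic, sums to zero
    for _ in range(iters):
        mv = [shift * v[i] - deg[i] * v[i]
              + sum(adj[i][j] * v[j] for j in range(n))
              for i in range(n)]
        mean = sum(mv) / n
        mv = [x - mean for x in mv]
        norm = math.sqrt(sum(x * x for x in mv))
        v = [x / norm for x in mv]
    return v


def split_by_sign(vector):
    """Partition node indices by the sign of their eigenvector entry."""
    negative = [i for i, x in enumerate(vector) if x < 0]
    non_negative = [i for i, x in enumerate(vector) if x >= 0]
    return negative, non_negative


# Toy "interactome": two triangles (0-1-2 and 3-4-5) joined by one edge.
toy = adjacency(6, [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)])
modules = split_by_sign(fiedler_vector(toy))
```

On this toy graph the sign pattern separates the two triangles, a coarse-resolution view; higher eigenmodes would be the natural place to look for finer structure.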
The central nervous system (CNS) is composed of hundreds of distinct cell types, each expressing different subsets of genes from the genome. High-throughput gene expression analysis of the CNS from patients and controls is a common method to screen for potentially pathological molecular mechanisms of psychiatric disease. One mechanism by which gene expression might be seen to vary across samples is an alteration in the cellular composition of the tissue. While the expression of gene 'markers' for each cell type can provide some information about cellularity, markers for many rare cell types are not well characterized. Moreover, if only small sets of markers are known, any substantial variation in a marker's expression pattern due to experimental conditions would result in poor sensitivity and specificity. Here, our proposed method combines prior information from mouse cell-specific transcriptome profiling experiments with co-expression network analysis to select large sets of potential cell type-specific gene markers in a systematic and unbiased manner. The method is efficient and robust, and identifies sufficient markers for further cellularity analysis. We then employ the markers to analytically detect changing cellular composition in the human brain. Application of our method to temporal human brain microarray data successfully detects changes in cellularity over time that roughly correspond to known epochs of human brain development. Furthermore, application of our method to human brain samples from individuals with the neurodevelopmental disorder autism supports the interpretation that changes in astrocytes and neurons might contribute to the disorder.
{"title":"Cell Type Specific Analysis of Human Brain Transcriptome Data to Predict Alterations in Cellular Composition.","authors":"Xiaoxiao Xu, Arye Nehorai, Joseph Dougherty","doi":"10.4161/sysb.25630","DOIUrl":"https://doi.org/10.4161/sysb.25630","journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 3","pages":"151-160"},"publicationDate":"2013-07-01","publicationTypes":"Journal Article"}
Paul Fritsch, T. Craddock, Ryan M del Rosario, Mark Rice, AnneLiese Smylie, V. A. Folcik, G. de Vries, M. Fletcher, N. Klimas, G. Broderick
Feedback mechanisms throughout the immune and endocrine systems play a significant role in maintaining physiological homeostasis. Specifically, the hypothalamic-pituitary-adrenal (HPA) and hypothalamic-pituitary-gonadal (HPG) axes contribute important oversight of immune activity and homeostatic regulation. We propose that these components form an overarching regulatory system capable of supporting multiple homeostatic regimes. These emerge as a result of the extensive feedback mechanisms involving cytokine and hormone signaling. Here we explore the possible role of such alternate regulatory programs in perpetuating chronic immune and endocrine dysfunction in males. To do this, we represent documented interactions within and between components of the male HPA-HPG-immune system as a set of discrete logic circuits. Analysis of these regulatory circuits indicated that, even in the absence of external perturbations, this model HPA-HPG-immune network supported three distinct and stable homeostatic regimes. To investigate the relevance of these predicted homeostatic regimes, we compared them to experimental data from male subjects with Gulf War illness (GWI) and chronic fatigue syndrome (CFS), two complex chronic conditions presenting with endocrine and immune dysregulation. Results indicated that molecular profiles observed experimentally in male GWI and CFS were both distinct from the normal resting state. Profile alignments suggest that regulatory circuitry is largely intact in male GWI and that the persistent immune dysfunction in this illness may, at least in part, be facilitated by the body’s own homeostatic drive. Conversely, the profile for male CFS was distant from all three stable states, suggesting the continued influence of an exogenous agent or lasting changes to the regulatory circuitry, such as epigenetic alterations.
{"title":"Succumbing to the laws of attraction","authors":"Paul Fritsch, T. Craddock, Ryan M del Rosario, Mark Rice, AnneLiese Smylie, V. A. Folcik, G. de Vries, M. Fletcher, N. Klimas, G. Broderick","doi":"10.4161/sysb.28948","DOIUrl":"https://doi.org/10.4161/sysb.28948","journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"179 - 194"},"publicationDate":"2013-07-01","publicationTypes":"Journal Article"}
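The notion of a logic-circuit model with several stable regimes can be made concrete with a toy example. The sketch below is purely illustrative: it uses a three-node Boolean network with invented rules (generic nodes, not hormones or cytokines), synchronous updates, and Hamming distance to compare an observed profile with the stable states. None of these choices is taken from the paper, whose network and update scheme are not described in the abstract.

```python
from itertools import product


def step(state, rules):
    """Apply every node's update rule once (synchronous update)."""
    return tuple(rule(state) for rule in rules)


def fixed_points(rules):
    """Enumerate all states that map to themselves, i.e. stable regimes."""
    n = len(rules)
    return [s for s in product((0, 1), repeat=n) if step(s, rules) == s]


def hamming(a, b):
    """Number of nodes on which two states disagree."""
    return sum(x != y for x, y in zip(a, b))


def nearest_stable_state(profile, rules):
    """Stable state closest to an observed (discretized) profile."""
    return min(fixed_points(rules), key=lambda s: hamming(s, profile))


# Hypothetical circuit: nodes 0 and 1 inhibit each other, node 2 follows node 0.
toy_rules = [
    lambda s: 1 - s[1],  # node 0 is on when node 1 is off
    lambda s: 1 - s[0],  # node 1 is on when node 0 is off
    lambda s: s[0],      # node 2 copies node 0
]

stable = fixed_points(toy_rules)
closest = nearest_stable_state((1, 0, 0), toy_rules)
```

Even this tiny circuit supports two distinct stable states with no external input, which is the kind of multistability the abstract attributes to feedback. A profile far from every stable state, as reported for male CFS, would show a large distance to all entries of `stable`.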
Djork-Arné Clevert, A. Mayr, Andreas Mitterecker, G. Klambauer, A. Valsesia, K. Forner, M. Tuefferd, W. Talloen, J. Wojcik, Hinrich W. H. Göhlmann, S. Hochreiter
Motivation: Current clinical and biological studies apply different biotechnologies and subsequently combine the resulting -omics data to test biological hypotheses. The plethora of -omics data and their combination generates a large number of hypotheses and apparently increases the study power. Contrary to these expectations, the wealth of -omics data may even reduce the statistical power of a study because of a large correction factor for multiple testing. Typically, this loss of power in analyzing -omics data is caused by an increased false detection rate (FDR) in measurements, such as falsely detected DNA copy number changes or falsely identified differentially expressed genes. The false detections are random and, therefore, not related to the tested conditions. Thus, a high FDR considerably decreases the discovery power of studies, especially if different -omics data are involved. Results: On a HapMap data set, where known CNVs have to be re-detected, I/NI call filtering was much more efficient than variance-based filtering. In particular, the I/NI call filter outperforms variance-based filters on data with rare events like the CNVs in the HapMap data set. We assessed the efficiency of the I/NI call filter in reducing the FDR on two different cancer cell lines, where it reduced the FDR 18- to 22-fold. Materials and Methods: A mitigation strategy for excessively high FDRs is to filter out putative false detections. We suggest using probabilistic latent variable models to identify putative false detections, which may be found via such models by high estimated noise or by model-based measurement inconsistencies across samples. To select such a model, a Bayesian approach starts with the maximum a priori model that assumes no detection and selects the maximum a posteriori model. Hence, a detection results in a deviation of the maximal posterior from the maximal prior model, measured by the information gain obtained from the data. 
If this information gain exceeds a threshold then the selected model obtains an Informative/Non-Informative (I/NI) call that indicates a detection. I/NI call filtering has been successfully applied in different projects, but it has so far not been shown that correction for multiple testing after I/NI call filtering still controls the type-I error rate. We prove this important property of the I/NI call and show that it is independent of commonly used test statistics for null hypotheses. We apply the I/NI call to transcriptomics (gene expression), where the prior model corresponds to a constant gene expression level across compared samples, and to genomics, analyzing copy number variation (CNV) data, where the prior model corresponds to a constant DNA copy number of 2 across compared samples.
{"title":"Increasing the discovery power of -omics studies","authors":"Djork-Arné Clevert, A. Mayr, Andreas Mitterecker, G. Klambauer, A. Valsesia, K. Forner, M. Tuefferd, W. Talloen, J. Wojcik, Hinrich W. H. Göhlmann, S. Hochreiter","doi":"10.4161/sysb.25774","DOIUrl":"https://doi.org/10.4161/sysb.25774","journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"84 - 93"},"publicationDate":"2013-04-11","publicationTypes":"Journal Article"}
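The argument that filtering before multiple-testing correction can recover power is easy to see numerically. The sketch below applies the standard Benjamini-Hochberg step-up procedure with and without a prior filter. The I/NI call itself is not implemented: the `informative` flags stand in for its output and are invented for the example, as are the p-values and function names. Type-I error control after filtering relies on the filter being independent of the test statistic under the null hypothesis, which is the property the abstract says the authors prove for the I/NI call.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is below its step-up threshold.
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


def filter_then_correct(p_values, informative, alpha=0.05):
    """Drop features flagged non-informative, then correct the remainder."""
    kept = [i for i, flag in enumerate(informative) if flag]
    rejected = benjamini_hochberg([p_values[i] for i in kept], alpha)
    return [kept[j] for j in rejected]


# Made-up p-values for ten features; only the first four pass the filter.
p_values = [0.004, 0.02, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
informative = [True] * 4 + [False] * 6

without_filter = benjamini_hochberg(p_values)
with_filter = filter_then_correct(p_values, informative)
```

With ten hypotheses the second feature misses its threshold (0.02 against 0.01), but once six uninformative features are removed the threshold relaxes to 0.025 and the feature is detected.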
Gang Feng, J. Hobbs, Xin Lu, Y. Yu, Pan Du, W. Kibbe, J. Chandler, L. Hou, Simon M. Lin
For the first time, we report here that Illumina high-density methylation arrays can also be used to estimate DNA copy number variations. We used the Illumina HM450K methylation array data to characterize the DNA copy number aberrations in the HT-29 colon cancer cell line to test our statistical model. Results were validated using an Affymetrix SNP array. Utilizing the CAMDA 2011 glioblastoma data set, we have demonstrated that our novel statistical method can potentially lower the cost and reduce the processing time of large-scale profiling studies where both DNA copy number and methylation status are of interest. Our new method, named methylCNV, is implemented in the Lumi package of Bioconductor.
{"title":"A statistical method to estimate DNA copy number from Illumina high-density methylation arrays","authors":"Gang Feng, J. Hobbs, Xin Lu, Y. Yu, Pan Du, W. Kibbe, J. Chandler, L. Hou, Simon M. Lin","doi":"10.4161/sysb.25896","DOIUrl":"https://doi.org/10.4161/sysb.25896","journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"94 - 98"},"publicationDate":"2013-04-11","publicationTypes":"Journal Article"}
Introduction: Glioblastoma multiforme (GBM) is the most common and lethal primary tumor of the brain and is associated with one of the worst 5-year survival rates among all human cancers. Identification of key molecular interactions and genetic variations that influence disease course and patient outcome may provide important insights into disease biology and treatment. Results: The P38 network and the microRNA hsa-miR-9 significantly correlate with patient outcome in a manner that suggests a possible control mechanism of the microRNA over the pathway. This control mechanism can possibly be mimicked by a set of drugs that target the P38 pathway. These drugs are part of the treatment regimen for a subpopulation of the patients who participated in the TCGA study and for whom the study provides clinical information. Conclusions: The results presented here call for attention to P38 network targeted treatments and identify the P38 network–hsa-miR-9 interaction as a critical control mechanism in GBM. Methods: The Cancer Genome Atlas (TCGA), http://cancergenome.nih.gov/, provides the molecular profiles of 373 patients. Using the TCGA data set and two additional independent molecular and clinical data sets with a set of network-based computational algorithms, we were able to identify a single pathway and a microRNA that were implicated in disease outcome.
{"title":"hsa-miR-9 and drug control over the P38 network as driving disease outcome in GBM patients","authors":"Rotem Ben-Hamo, S. Efroni","doi":"10.4161/sysb.25815","DOIUrl":"https://doi.org/10.4161/sysb.25815","url":null,"abstract":"Introduction: Glioblastoma multiforme (GBM) is the most common and lethal primary tumor of the brain and is associated with one of the worst 5-year survival rates among all human cancers. Identification of key molecular interactions and genetic variations that influence disease course and patient outcome may provide important insights into disease biology and treatment. Results: The P38 network and the micro RNA hsa-miR-9 significantly correlate with patient outcome in a manner that suggests a possible control mechanism of the microRNA over the pathway. This control mechanism can possibly be mimicked by a set of drugs that target the P38 pathway. These drugs are part of the treatment regimen for a subpopulation of the patients that participated in the TCGA study and for which the study provides clinical information. Conclusions: The results presented here call for attention to P38 network targeted treatments and identify the P38 network–hsa-miR-9 interaction as a critical control mechanism in GBM. Methods The Cancer Genome Atlas (TCGA), http://cancergenome.nih.gov/, provides the molecular profiles of 373 patients. 
Using the TCGA data set and two additional independent molecular and clinical data sets with a set of network-based computational algorithms, we were able to identify a single pathway and a microRNA that were implicated with disease outcome.","PeriodicalId":90057,"journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"76 - 83"},"PeriodicalIF":0.0,"publicationDate":"2013-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.25815","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70654867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Louhimo, V. Aittomäki, A. Faisal, M. Laakso, Ping Chen, K. Ovaska, E. Valo, L. Lahti, V. Rogojin, Samuel Kaski, S. Hautaniemi
Background: Cancers are complex diseases whose comprehensive characterization requires genome-scale molecular data at multiple levels from genetics to transcriptomics and clinical data. Using our recently published Anduril bioinformatics framework and novel computational approaches, such as dependency analysis, we identify key variables at miRNA, copy number variation, expression, methylation, and pathway levels in glioblastoma multiforme (GBM) progression and drug resistance. Furthermore, we identify characteristics of clinically relevant subgroups, such as patients treated with temozolomide and patients with an EGFRvIII mutation, which is a constitutively active variant of EGFR. Results: We identify several novel genomic regions and transcript profiles that may contribute to GBM progression and drug resistance. All results and Anduril scripts are available at http://csbi.ltdk.helsinki.fi/camda/. Conclusions: Our results highlight the need for approaches that define context at several levels in order to identify genomic regions or transcript profiles playing key roles in cancer progression and drug resistance.
{"title":"Systematic use of computational methods allows stratification of treatment responders in glioblastoma multiforme","authors":"R. Louhimo, V. Aittomäki, A. Faisal, M. Laakso, Ping Chen, K. Ovaska, E. Valo, L. Lahti, V. Rogojin, Samuel Kaski, S. Hautaniemi","doi":"10.4161/sysb.28904","DOIUrl":"https://doi.org/10.4161/sysb.28904","url":null,"abstract":"Background: Cancers are complex diseases whose comprehensive characterization requires genome-scale molecular data at multiple levels from genetics to transcriptomics and clinical data. Using our recently published Anduril bioinformatics framework and novel computational approaches, such as dependency analysis, we identify key variables at miRNA, copy number variation, expression, methylation, and pathway levels in glioblastoma multiforme (GBM) progression and drug resistance. Furthermore, we identify characteristics of clinically relevant subgroups, such as patients treated with temozolomide and patients with an EGFRvIII mutation, which is a constitutively active variant of EGFR. Results: We identify several novel genomic regions and transcript profiles that may contribute to GBM progression and drug resistance. All results and Anduril scripts are available at http://csbi.ltdk.helsinki.fi/camda/. 
Conclusions: Our results highlight the need for approaches that define context at several levels in order to identify genomic regions or transcript profiles playing key roles in cancer progression and drug resistance.","PeriodicalId":90057,"journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"130 - 136"},"PeriodicalIF":0.0,"publicationDate":"2013-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.28904","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70655411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper outlines the construction of statistical models for liver pathology in rats and for drug-induced liver injury. The envisioned purpose of these models is to reduce the cost of discovering compound toxicity and thereby the overall cost of drug discovery. The size and breadth of the CAMDA liver toxicity data set present a unique opportunity to test whether statistical toxicity models can serve this purpose. The paper develops models for predicting toxicity from gene expression data. These models purposely exclude the physiology and pathology data available in the CAMDA data set, because such data require live rats and expensive, time-consuming processing that are antithetical to the goal of reducing the time and cost required to determine compound toxicity. Two models are described. One employs Lasso regression and the glmnet algorithm to extract models for rat liver pathology. The other employs stochastic gradient boosting to extract models for drug-induced liver injury. This paper demonstrates that, given a data set of the size and quality of the CAMDA data, modern machine learning algorithms can extract high-quality models, that is, models with sufficient accuracy and specificity to serve the goal of reducing the costs of discovering compound toxicity.
{"title":"Statistical models for predicting liver toxicity from genomic data","authors":"Mike Bowles, R. Shigeta","doi":"10.4161/sysb.24254","DOIUrl":"https://doi.org/10.4161/sysb.24254","url":null,"abstract":"This paper outlines the construction of statistical models for liver pathology in rats and for drug induced liver injury. The envisioned purpose for these models would be to improve the cost of discovering compound toxicity in order to improve the overall cost of drug discovery. The size and breadth of the CAMDA liver toxicity data set presents unique opportunity to test whether statistical toxicity models can serve this purpose. The paper develops models for predicting toxicity from gene expression data. These models purposely exclude physiology and pathology data available in the CAMDA data. Physiology and pathology data require live rats and expensive time-consuming processing that are antithetical to the goal of reducing the time and cost required to determine compound toxicity. Two models are described. One employs Lasso regression and glmnet algorithm to extract models for rat liver pathology. The other employs stochastic gradient boosting to extract models for drug induced liver injury. 
This paper demonstrates that, given a data set of the size and quality of the CAMDA data, modern machine learning algorithms can extract high quality models—models with sufficient accuracy and specificity to serve the goal of reducing the costs of discovering compound toxicity.","PeriodicalId":90057,"journal":{"name":"Systems biomedicine (Austin, Tex.)","volume":"1 1","pages":"144 - 149"},"PeriodicalIF":0.0,"publicationDate":"2013-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4161/sysb.24254","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70654151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
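The Lasso approach named in the abstract above can be illustrated with a minimal sketch. It is not the authors' code and not the glmnet package itself: it only shows the cyclic coordinate descent with soft-thresholding that glmnet is built around, without the intercept, standardization, or regularization path that glmnet adds. The function names, the toy expression matrix, and the penalty value `alpha=0.1` are all assumptions made for illustration; none of them come from the paper or the CAMDA data.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the core of the Lasso update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    X is a list of rows (samples x genes), y a list of responses.
    No intercept is fitted, so the inputs are assumed to be centred.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, since w starts at zero
    col_norm = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_norm[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of column j with the residual once w[j] is taken out.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_norm[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Toy data (made up): 6 samples, 3 "genes"; only gene 0 drives the response.
X = [
    [-2.5, 1.0, 1.0],
    [-1.5, -1.0, 0.0],
    [-0.5, 0.0, -1.0],
    [0.5, 0.0, -1.0],
    [1.5, -1.0, 0.0],
    [2.5, 1.0, 1.0],
]
y = [2.0 * row[0] for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(selected)  # → [0]
```

The L1 penalty sets the coefficients of the two uninformative columns exactly to zero, which is the property that makes Lasso attractive when the number of genes far exceeds the number of samples. The coefficient of gene 0 comes out slightly below its true value of 2 because the penalty also shrinks the retained coefficient.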