Pub Date : 2014-10-02 DOI: 10.1080/21628130.2015.1127498
Mark Rice, T. Craddock, V. A. Folcik, Ryan M del Rosario, Zachary M. Barnes, N. Klimas, M. Fletcher, Joel P. Zysman, G. Broderick
We reported previously that the persistence of complex immune, endocrine and neurological symptoms that afflict up to one third of veterans from the 1990-91 Gulf War might be supported by a misdirected regulatory drive. Here we use a detailed model of immune signaling in concert with an overarching circuit model of known sex and stress hormone co-regulation to explore how the failure of regulatory elements may further establish a self-perpetuating imbalance that closely resembles Gulf War Illness (GWI). Defects were introduced into the model iteratively, and the stable regulatory modes supported by these altered immune-endocrine circuits were identified using repeated simulation experiments. In each case the predicted homeostatic regimes were compared to experimental data collected in male GWI (n=20) and matched healthy veterans (n=22). We found that alignment of GWI with a new homeostatic regime improved significantly when cortisol's normal anti-inflammatory activity was interrupted. Alignment improved further when this cortisol insensitivity was compounded by the loss of the normal antagonistic effects of Th1 cytokines on Th2 lymphocyte activation. Together these simulation results suggest that altered glucocorticoid gene regulation, compounded by possible changes in IGF-1 regulation of Th1:Th2 immune balance, may be key underlying features of GWI.
Title: Gulf War Illness: Is there lasting damage to the endocrine-immune circuitry? Journal: Systems biomedicine (Austin, Tex.), pp. 80-89.
Pub Date : 2014-07-03 DOI: 10.1080/21628130.2015.1016702
M. Zitnik, B. Zupan
Any knowledge discovery could in principle benefit from the fusion of directly or even indirectly related data sources. In this paper we explore whether data fusion by simultaneous matrix factorization could be adapted for survival regression. We propose a new method that jointly infers latent data factors from a number of heterogeneous data sets and estimates regression coefficients of a survival model. We have applied the method to the CAMDA 2014 large-scale Cancer Genomes Challenge and modeled survival time as a function of gene, protein and miRNA expression data, and data on methylated and mutated regions. We find that both the joint inference of data factors and regression coefficients and the data fusion procedure are crucial for performance. Our approach is substantially more accurate than the baseline Aalen's additive model. Latent factors inferred by our approach could be mined further; for the CAMDA challenge, we found that the most informative factors are related to known cancer processes.
Title: Survival regression by data fusion. Journal: Systems biomedicine (Austin, Tex.), pp. 47-53.
Pub Date : 2014-07-03 DOI: 10.1080/21628130.2015.1040618
S. Sikdar, Hyoyoung Choo Wosoba, Younathan Abdia, S. Dutta, R. Gill, S. Datta, S. Datta
It is known that all agents that cause cancer (carcinogens) also cause a change in the DNA sequence. In order to identify such often subtle changes, we attempt to integrate multiple molecular profile data sets released by the International Cancer Genome Consortium (ICGC). The list of data sets includes matched gene and microRNA expression profiles, somatic copy number variation, DNA methylation, and protein expression profiles for lung adenocarcinoma patients receiving treatments. We consider both unsupervised and supervised learning techniques (clustering and penalized regression) to identify interesting molecular markers, corresponding to each type of –omics profile, that can differentiate patients. Associations between important markers of 2 types have been studied. An adaptive ensemble binary regression model is presented that uses the entirety of the available –omics profiles, leading to a more accurate clinical prognosis for the patients in the given sample. This integrated study provides a more comprehensive picture of lung adenocarcinoma.
Title: An integrative exploratory analysis of –omics data from the ICGC cancer genomes lung adenocarcinoma study. Journal: Systems biomedicine (Austin, Tex.), pp. 54-62.
D. Jennen, J. Polman, M. Bessem, Maarten Coonen, J. V. van Delft, J. Kleinjans
In this study, we developed a transcriptomics-based human in vitro model for predicting drug-induced liver injury (DILI) in humans. The transcriptomics data (Affymetrix GeneChip Human Genome U133 Plus 2.0) from primary human hepatocytes were provided by the Japanese Toxicogenomics Project (TGP). The selected compounds were divided into two groups, i.e., most-DILI and no-DILI, based on FDA-approved drug labels. The compounds were further grouped into a training set and a validation set. The training set, containing the most extreme most-DILI and no-DILI compounds based on the in vivo rat clinical chemistry measurements from TGP, was used to develop the prediction model. The validation set showed high accuracy (> 90%), and this split performed better than splitting the compounds into training and validation sets randomly.
Title: Drug-induced liver injury classification model based on in vitro human transcriptomics and in vivo rat clinical chemistry data. Journal: Systems biomedicine (Austin, Tex.), pp. 63-70. Pub Date : 2014-06-03 DOI: 10.4161/sysb.29400
T. Suvitaival, J. Parkkinen, S. Virtanen, Samuel Kaski
We investigate the problem of detecting toxicogenomic associations that generalize across organisms, that is, statistical dependencies between transcriptional responses of multiple organisms and toxicological outcomes. We apply an interpretable probabilistic model to detect cross-organism toxicogenomic associations and propose an approach for drug toxicity analysis based on the interactive retrieval of drugs with similar toxicogenomic properties. We show that our approach can give relevant information about the properties of a drug even when direct prediction of toxicity is not feasible. Moreover, we show that a search from a cross-organism database can improve accuracy in the analysis.
Title: Cross-organism toxicogenomics with group factor analysis. Journal: Systems biomedicine (Austin, Tex.), pp. 71-80. Pub Date : 2014-05-29 DOI: 10.4161/sysb.29291
Pub Date : 2014-04-03 DOI: 10.1080/21628130.2015.1010928
H. Luuk
Our aim was to predict drug-induced liver injury potential from differential expression profiles in the Japanese Toxicogenomic Project's data. Additionally, we asked which drug concentrations and treatment periods were most informative for classification. The study confirmed that the effect of drugs on differential gene expression is dose- and time-dependent, suggesting that less information is contained in responses to low doses and short treatment periods. Cross-validation performance of predictions was low due to a high false-positive rate. The present results indicate that distinguishing between differential expression responses to nontoxic and toxic agents is non-trivial. The value of the Japanese Toxicogenomic Project's data for classification purposes would be increased if gene expression responses to additional nontoxic drugs were available.
Title: CAMDA 2014: Learning from differential expression in the Japanese Toxicogenomic Project. Journal: Systems biomedicine (Austin, Tex.), pp. 41-46.
Pub Date : 2014-04-03 DOI: 10.1080/21628130.2015.1010923
O. Moskvin, S. McIlwain, I. Ong
Numerous methods of RNA-Seq data analysis have been developed, and there are more under active development. In this paper, our focus is on evaluating the impact of each processing stage (from pre-processing of sequencing reads through alignment/counting, count normalization and differential expression testing to downstream functional analysis) on the inferred functional pattern of biological response. We assess the impact of 6,912 combinations of technical and biological factors on the resulting signature of transcriptomic functional response. Given the absence of ground truth, we use 2 complementary evaluation criteria: a) consistency of the functional patterns identified in 2 similar comparisons, namely effects of a naturally toxic medium and a medium with artificially reconstituted toxicity, and b) consistency of the results in RNA-Seq and microarray versions of the same study. Our results show that despite high variability at the low-level processing stage (read pre-processing, alignment and counting) and the differential expression calling stage, the impact of these stages on the inferred pattern of biological response was surprisingly low; they were instead overshadowed by the choice of the functional enrichment method. The latter has an impact comparable in magnitude to the impact of biological factors per se.
Title: CAMDA 2014: Making sense of RNA-Seq data: From low-level processing to functional analysis. Journal: Systems biomedicine (Austin, Tex.), pp. 31-40.
Since experiments involving animal models are labor and time intensive, there is an attempt to replace these measurements on animal models with in vitro assays, which have higher acceptance in the population concerning ethical issues. In this work, we explore to what extent animal models can be replaced by in vitro assays in the context of a toxicogenomics study. The data from the Japanese Toxicogenomics Project are gene expression profiles measured by microarrays from both in vitro and animal samples. We apply a comprehensive genomic association network analysis in order to study the comparative behavior of the genomic networks for the in vivo vs. in vitro data. The genomic networks are computed based on association scores of gene-gene pairs using partial least squares modeling of gene expression values adjusted for sacrifice time and dosage. We apply permutation-based statistical tests to compare the connectivity of a given gene, as well as of a class of genes that may be affected by a given drug, in the two networks. The goal is to identify parts of these networks, including key genes, that are not significantly altered for in vivo vs. in vitro samples for the majority of the drugs.
Title: Bridging in vivo and in vitro data from Japanese Toxicogenomics Project using network analyses. Authors: R. Gill, S. Datta, S. Datta. Journal: Systems biomedicine (Austin, Tex.), pp. 1-7. Pub Date : 2014-01-02 DOI: 10.4161/sysb.28527
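The permutation-based testing mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not specify the test statistic, so the sketch assumes a generic two-sample permutation test on a difference in means, and the function name `permutation_p_value`, its parameters, and the toy values are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Illustrative only: group_a and group_b stand in for any per-sample
    quantity (for example, connectivity-like scores) measured under two
    conditions such as in vivo and in vitro.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))

    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        # Reassign condition labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / n_a - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    in_vivo = [0.0, 0.0, 0.0, 0.0, 0.0]        # toy values, not real data
    in_vitro = [10.0, 10.0, 10.0, 10.0, 10.0]  # toy values, not real data
    print(permutation_p_value(in_vivo, in_vitro))
```

A small p-value suggests the two conditions differ on the chosen statistic; a large one is consistent with the quantity being preserved across conditions, which is the kind of network component the abstract aims to identify.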
Traditional studies of liver toxicity involve screening compounds through in vivo and in vitro tests. They need to distinguish between compounds that represent little or no health concern and those with the greatest likelihood to cause adverse effects in humans. High-throughput and toxicogenomic screening methods coupled with a plethora of circumstantial evidence provide a challenge for improved toxicity prediction and require appropriate computational methods that integrate various biological, chemical and toxicological data. We report on a data fusion approach for prediction of drug-induced liver injury potential in humans using microarray data from the Japanese Toxicogenomics Project (TGP) as provided for the contest by the CAMDA 2013 conference. Our aim was to investigate whether the data from different TGP studies could be fused together to boost prediction accuracy. We were also interested in whether in vitro studies provided sufficient information to refrain from studies in animals. We show that our recently proposed matrix factorization-based data fusion provides an elegant computational framework for integration of the TGP and related data sets, 29 data sets in total. Fusion yields a high cross-validated accuracy (AUC of 0.819 for in vivo assays), which is above the accuracy of the established machine learning procedure of stacked classification with feature selection. Our data analysis shows that animal studies may be replaced with in vitro assays (AUC = 0.799) and that liver injury in humans can be predicted from animal data (AUC = 0.811). Our principal contribution is a demonstration that analysis of toxicogenomic data can substantially benefit from data fusion with directly and circumstantially related data sets.
Title: Matrix factorization-based data fusion for drug-induced liver injury prediction. Authors: M. Zitnik, B. Zupan. Journal: Systems biomedicine (Austin, Tex.), pp. 16-22. Pub Date : 2014-01-02 DOI: 10.4161/sysb.29072
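For readers unfamiliar with the AUC figures quoted in this abstract, the following minimal sketch shows how an area under the ROC curve can be computed from binary labels and predicted scores. It does not reproduce the authors' pipeline or data; the function name `roc_auc` and the toy labels and scores are hypothetical, and the pairwise (Mann-Whitney) formulation is one standard way to compute the quantity.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    AUC equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: 1 = compound flagged as liver-injury concern, 0 = not flagged.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported values of roughly 0.8.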
The 12th Annual International Conference on the Critical Assessment of Massive Data Analysis (CAMDA) used data from the massive Japanese Toxicogenomics Project (TGP) to predict drug-induced liver injury (DILI) concern provided by the U.S. Food and Drug Administration (FDA). The challenge was to predict DILI concern by means of gene expression data. Analysis of this high-dimensional toxicogenomic data requires statistical methodologies that can detect the transcriptomic associations with toxicity. We propose an analysis technique that involves sparse principal component analysis to efficiently reduce the dimension of the analysis problem. Sparse principal component variables are composed of groups of expressed genes. Associations between DILI concern and sparse principal component variables were tested and further scrutinized with sparse regression methodology to identify concise transcriptomic structures potentially responsible for and predictive of drug toxicity. Working with a subset of the TGP data with FDA DILI concern classification, we identified 5 transcriptomic structures (sparse principal component variables) statistically associated with DILI concern. The most statistically significant structure consists of the genes ZBTB16, FLVCR2, TNS3, and ASB13. Sparse statistical methods offer a new way to handle analysis issues with massive omic data. Sparse PCA can efficiently extract groups of transcriptomic markers that may indicate drug toxicity.
Title: Detecting networks of genes associated with human drug induced liver injury (DILI) concern using sparse principal components. Authors: A. Bonner, J. Beyene. Journal: Systems biomedicine (Austin, Tex.), pp. 23-30. Pub Date : 2014-01-02 DOI: 10.4161/sysb.29413