Z Dimitrova, D S Campo, S Ramachandran, G Vaughan, L Ganova-Raeva, Y Lin, J C Forbi, G Xia, P Skums, B Pearlman, Y Khudyakov
Hepatitis C Virus sequence studies mainly focus on the viral amplicon containing the Hypervariable region 1 (HVR1) to obtain a sample of sequences from which several population genetics parameters can be calculated. Recent advances in sequencing methods allow for analyzing an unprecedented number of viral variants from infected patients and present a novel opportunity for understanding viral evolution, drug resistance and immune escape. In the present paper, we compared three recent technologies for amplicon analysis: (i) Next-Generation Sequencing; (ii) Clonal sequencing using End-point Limiting-dilution for isolation of individual sequence variants followed by Real-Time PCR and sequencing; and (iii) Mass spectrometry of base-specific cleavage reactions of a target sequence. These three technologies were used to assess intra-host diversity and inter-host genetic relatedness in HVR1 amplicons obtained from 38 patients (subgenotypes 1a and 1b). Assessments of intra-host diversity varied greatly between sequence-based and mass-spectrometry-based data. However, assessments of inter-host variability by all three technologies were equally accurate in identification of genetic relatedness among viral strains. These results support the application of all three technologies for molecular epidemiology and population genetics studies. Mass spectrometry is especially promising given its high throughput, low cost and comparable results with sequence-based methods.
{"title":"Evaluation of viral heterogeneity using next-generation sequencing, end-point limiting-dilution and mass spectrometry.","authors":"Z Dimitrova, D S Campo, S Ramachandran, G Vaughan, L Ganova-Raeva, Y Lin, J C Forbi, G Xia, P Skums, B Pearlman, Y Khudyakov","doi":"10.3233/ISB-2012-0453","DOIUrl":"https://doi.org/10.3233/ISB-2012-0453","url":null,"PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"11 5-6","pages":"183-92"},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/ISB-2012-0453","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31088153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
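The abstract above compares technologies by their estimates of intra-host diversity but does not say which diversity statistic was used. As a minimal sketch, assuming diversity is measured as the mean pairwise Hamming distance per site among aligned variants (the function names `hamming` and `mean_pairwise_distance` and the toy sequences are hypothetical, not taken from the paper):

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_distance(seqs: list[str]) -> float:
    """Average per-site Hamming distance over all unordered pairs of variants.

    Assumption: this is only one common diversity statistic; the paper's
    own measure may differ.
    """
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    length = len(seqs[0])
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * length)


# Toy intra-host sample of three aligned variants (illustrative data only).
variants = ["ACGT", "ACGA", "TCGA"]
diversity = mean_pairwise_distance(variants)
```

Computing the same statistic on the variant sets returned by each technology would give one simple basis for the kind of comparison the abstract describes.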
D S Campo, Z Dimitrova, J Lara, M Purdy, H Thai, S Ramachandran, L Ganova-Raeva, X Zhai, J C Forbi, C G Teo, Y Khudyakov
The detection of compensatory mutations that abrogate negative fitness effects of drug-resistance and vaccine-escape mutations indicates the important role of epistatic connectivity in the evolution of viruses, especially under strong selection pressures. Mapping of epistatic connectivity in the form of coordinated substitutions should help to characterize molecular mechanisms shaping viral evolution and provide a tool for the development of novel anti-viral drugs and vaccines. We analyzed coordinated variation among amino acid sites in 370 hepatitis B virus (HBV) polymerase sequences using Bayesian networks. Among the HBV polymerase domains, the spacer domain (Spacer), which separates the terminal protein from the reverse-transcriptase domain, showed the highest network centrality. Coordinated substitutions preserve the hydrophobicity and charge of Spacer. Maximum likelihood estimates of codon selection showed that Spacer contains the highest number of positively selected sites. Identification of 67% of the domain lacking an ordered structure suggests that Spacer belongs to the class of intrinsically disordered domains and proteins whose crucial functional role in the regulation of transcription, translation and cellular signal transduction has only recently been recognized. Spacer plays a central role in the epistatic network associating substitutions across the HBV genome, including those conferring viral virulence, drug resistance and vaccine escape. The data suggest that Spacer is extensively involved in coordination of HBV evolution.
{"title":"Coordinated evolution of the hepatitis B virus polymerase.","authors":"D S Campo, Z Dimitrova, J Lara, M Purdy, H Thai, S Ramachandran, L Ganova-Raeva, X Zhai, J C Forbi, C G Teo, Y Khudyakov","doi":"10.3233/ISB-2012-0452","DOIUrl":"https://doi.org/10.3233/ISB-2012-0452","url":null,"PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"11 5-6","pages":"175-82"},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/ISB-2012-0452","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31088152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Serghei Mangul, Adrian Caciula, Olga Glebova, Ion Mandoiu, Alex Zelikovsky
The paper addresses the problem of how to use RNA-Seq data for transcriptome reconstruction and quantification, as well as novel transcript discovery in partially annotated genomes. We present a novel annotation-guided general framework for transcriptome discovery, reconstruction and quantification in partially annotated genomes and compare it with existing annotation-guided and genome-guided transcriptome assembly methods. Our method, referred to as Discovery and Reconstruction of Unannotated Transcripts (DRUT), can be used to enhance existing transcriptome assemblers, such as Cufflinks, as well as to accurately estimate transcript frequencies. Empirical analysis on synthetic datasets confirms that Cufflinks enhanced by DRUT has superior quality of reconstruction and frequency estimation of transcripts.
{"title":"Improved transcriptome quantification and reconstruction from RNA-Seq reads using partial annotations.","authors":"Serghei Mangul, Adrian Caciula, Olga Glebova, Ion Mandoiu, Alex Zelikovsky","doi":"10.3233/ISB-2012-0459","DOIUrl":"https://doi.org/10.3233/ISB-2012-0459","url":null,"PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"11 5-6","pages":"251-61"},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/ISB-2012-0459","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31091782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Y Y Vaskin, I V Khomicheva, E V Ignatieva, E E Vityaev
The task of automatic extraction of the hierarchical structure of eukaryotic gene regulatory regions lies at the junction of biology, mathematics and information technology. Solving the problem involves understanding the sophisticated mechanisms of eukaryotic gene regulation and applying advanced data mining technologies. This paper discusses an integrated system that implements a powerful relational data mining method for biological data. The system allows the biologist's prior knowledge about the gene regulatory regions to be taken into account, performs the analysis on each hierarchical level, and searches for a solution by moving from simple hypotheses to complex ones. The integration of the ExpertDiscovery system into the UGENE toolkit provides a convenient environment for conducting complex research and automating the work of a biologist. For demonstration, the system has been applied to the recognition of SF1, SREBP and HNF4 vertebrate binding sites and to the analysis of the human gene regulatory regions that promote liver-specific transcription.
{"title":"ExpertDiscovery and UGENE integrated system for intelligent analysis of regulatory regions of genes.","authors":"Y Y Vaskin, I V Khomicheva, E V Ignatieva, E E Vityaev","doi":"10.3233/ISB-2012-0448","DOIUrl":"https://doi.org/10.3233/ISB-2012-0448","url":null,"PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"11 3-4","pages":"97-108"},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/ISB-2012-0448","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30870648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Li-Ping Long, Changhe Yuan, Zhipeng Cai, Huiping Xu, Xiu-Feng Wan
Influenza A viruses have been responsible for large losses of life around the world and continue to present a great public health challenge. In April 2009, a novel swine-origin H1N1 virus emerged in North America and caused the first pandemic of the 21st century. Toward the end of 2009, two waves of outbreaks occurred, and then the disease moderated. It will be critical to understand how this novel pandemic virus invaded and adapted to a human population. To understand the molecular dynamics and evolution of this pandemic H1N1 virus, we applied an Expectation-Maximization algorithm to estimate the Gaussian mixture in the genetic population of the hemagglutinin (HA) gene of these H1N1 viruses from April 2009 to January 2010 and compared them with the viruses that cause seasonal H1N1 influenza. Our results show that, after it was introduced to the human population, the 2009 H1N1 viral HA gene changed its population structure from a single Gaussian distribution to two major Gaussian distributions. The breadth of HA genetic diversity of the 2009 H1N1 virus also increased from the first wave to the second wave of this pandemic. Phylogenetic analyses demonstrated that only certain HA sublineages of 2009 H1N1 viruses were able to circulate throughout the pandemic period. In contrast, the HA population structure of seasonal H1N1 virus was relatively stable, and the breadth of HA genetic diversity within a single-season population remained similar. This study revealed an evolutionary mechanism for a novel pandemic virus. After the virus is introduced to the human population, it expands its molecular diversity through both random mutations (genetic drift) and selection. Eventually, multiple levels of hierarchical Gaussian distributions replace the earlier single distribution. An evolutionary model for pandemic H1N1 influenza A virus was proposed and demonstrated with a simulation.
{"title":"Mixture model analysis reflecting dynamics of the population diversity of 2009 pandemic H1N1 influenza virus.","authors":"Li-Ping Long, Changhe Yuan, Zhipeng Cai, Huiping Xu, Xiu-Feng Wan","doi":"10.3233/ISB-2012-0457","DOIUrl":"10.3233/ISB-2012-0457","url":null,"PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"11 5-6","pages":"225-36"},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4710479/pdf/nihms749403.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31091781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
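The Expectation-Maximization fit of a Gaussian mixture mentioned in the abstract above can be illustrated on one-dimensional data. This is only a sketch under stated assumptions: the input is taken to be a list of scalar genetic distances, the number of components `k` is fixed in advance, and the names `normal_pdf` and `fit_gmm_1d` are hypothetical. The abstract does not describe the authors' actual feature space, initialization, or model selection.

```python
import math


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data: list[float], k: int = 2, iters: int = 200):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns a list of (weight, mean, variance) tuples sorted by mean.
    Assumption: means start at evenly spaced quantiles of the data.
    """
    n = len(data)
    ordered = sorted(data)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, 1e-6)] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])

        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps the density finite

    return sorted(zip(weights, means, variances), key=lambda t: t[1])
```

Fitting with `k=1` and `k=2` and comparing the fits (for example with an information criterion) would be one way to ask whether a population has split into two groups, as the abstract reports. That comparison step is not shown here.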
MicroRNA expression profiles can improve classification, diagnosis, and prognostic information of malignancies, including lung cancer. In this paper, we developed a miRNA-mRNA network to uncover unique growth-suppressive miRNAs in lung cancer using microarray data. The miRNA-mRNA network was built using a bipartite graph approach, and a number of miRNA-mRNA modules were identified to mine associations between miRNAs and mRNAs. Because we restricted our search to protective miRNAs, we identified a total of 29 protective miRNA-mRNA regulatory modules from the network. Subsequently, we analyzed the pathways for the target genes in the protective miRNA-mRNA modules using Pathway-Express. The miRNA-mRNA network efficiently detects hub mRNAs deregulated by the protective miRNAs and identifies cancer-specific miRNAs in lung cancer. From the pathway analysis results, the ECM receptor pathway, focal adhesion pathway and cell adhesion molecules pathway seem the most interesting to investigate, since these pathways were related to all ten protective miRNAs. Furthermore, protective miRNA target analysis revealed that the genes VCAN, SIL, CD44 and MMP14 play an important role in these pathways. Hence, it was inferred that these genes can be important putative targets for those protective miRNAs. A greater understanding of the mechanisms regulating VCAN, SIL, CD44 and MMP14 expression and activity will assist in the development of specific inhibitors of cancer cell metastasis. These observations are expected to have important implications for cancer and may be useful for further research.
{"title":"miRNA-mRNA network detects hub mRNAs and cancer specific miRNAs in lung cancer.","authors":"Saranya Devaraj, Jeyakumar Natarajan","doi":"10.3233/ISB-2012-0444","DOIUrl":"https://doi.org/10.3233/ISB-2012-0444","url":null,"PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"11 5-6","pages":"281-95"},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/ISB-2012-0444","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31091786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Next generation sequencing technologies have recently been applied to characterize mutational spectra of the heterogeneous population of viral genotypes (known as a quasispecies) within HIV-infected patients. Such information is clinically relevant because minority genetic subpopulations of HIV within patients enable viral escape from selection pressures such as the immune response and antiretroviral therapy. However, methods for quasispecies sequence reconstruction from next generation sequencing reads are not yet widely used and remain an emerging area of research. Furthermore, the majority of research methodology in HIV has focused on 454 sequencing, while many next-generation sequencing platforms used in practice are limited to shorter read lengths relative to 454 sequencing. Little work has been done in determining how best to address the read length limitations of other platforms. The approach described here incorporates graph representations of both read differences and read overlap to conservatively determine the regions of the sequence with sufficient variability to separate quasispecies sequences. Within these tractable regions of quasispecies inference, we use constraint programming to solve for an optimal quasispecies subsequence determination via vertex coloring of the conflict graph, a representation which also lends itself to data with non-contiguous reads such as paired-end sequencing. We demonstrate the utility of the method by applying it to simulations based on actual intra-patient clonal HIV-1 sequencing data.
{"title":"QColors: an algorithm for conservative viral quasispecies reconstruction from short and non-contiguous next generation sequencing reads.","authors":"Austin Huang, Rami Kantor, Allison DeLong, Leeann Schreier, Sorin Istrail","doi":"10.3233/ISB-2012-0454","DOIUrl":"10.3233/ISB-2012-0454","url":null,"PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"11 5-6","pages":"193-201"},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530257/pdf/nihms879660.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31088150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
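The conflict-graph coloring idea in the abstract above can be sketched on toy data. The sketch makes several assumptions that are not from the paper: a read is a `(start, sequence)` pair on a shared reference coordinate, two reads conflict when they overlap and disagree at any shared position, and a small exact backtracking search stands in for the constraint-programming solver the authors describe. All function names are hypothetical.

```python
def reads_conflict(a, b) -> bool:
    """True if two (start, sequence) reads overlap and disagree somewhere."""
    (sa, qa), (sb, qb) = a, b
    lo = max(sa, sb)
    hi = min(sa + len(qa), sb + len(qb))
    return any(qa[i - sa] != qb[i - sb] for i in range(lo, hi))


def build_conflict_graph(reads):
    """Adjacency sets: an edge joins reads that cannot share one variant."""
    adj = {i: set() for i in range(len(reads))}
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if reads_conflict(reads[i], reads[j]):
                adj[i].add(j)
                adj[j].add(i)
    return adj


def color_with(adj, k):
    """Try to color the graph with at most k colors; return dict or None."""
    order = sorted(adj, key=lambda v: -len(adj[v]))  # high degree first
    colors = {}

    def place(idx):
        if idx == len(order):
            return True
        v = order[idx]
        used = {colors[u] for u in adj[v] if u in colors}
        # Symmetry breaking: permit at most one not-yet-used color.
        opened = max(colors.values()) + 1 if colors else 0
        for c in range(min(k, opened + 1)):
            if c not in used:
                colors[v] = c
                if place(idx + 1):
                    return True
                del colors[v]
        return False

    return dict(colors) if place(0) else None


def min_coloring(adj):
    """Smallest proper coloring found by trying k = 1, 2, ... in turn."""
    if not adj:
        return {}
    for k in range(1, len(adj) + 1):
        result = color_with(adj, k)
        if result is not None:
            return result


# Toy reads drawn from two variants, ACGTAA and ACCTAA (illustrative only).
reads = [(0, "ACGT"), (2, "GTAA"), (0, "ACCT"), (2, "CTAA"), (4, "AA")]
coloring = min_coloring(build_conflict_graph(reads))
n_variants_lower_bound = len(set(coloring.values()))
```

Reads that receive the same color are mutually compatible, so the number of colors is a lower bound on the number of distinct variants in the region. The last read is compatible with both variants, which shows why an assignment of this kind is conservative rather than unique.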
The identification of common tumor signatures can reveal the shared molecular mechanisms underlying tumorigenesis, which in turn may allow tumors to be prevented and treated through system-level intervention. We identified tumor-associated signatures, including pathways, transcription factors, microRNAs and gene ontology categories, by analyzing gene sets for differential expression between normal and tumor phenotype classes in various tumor gene expression datasets. We obtained the common tumor signatures based on the frequencies with which they were identified across different tumor types. Some shared signatures important for various tumor types were uncovered and discussed. We propose that interventions aimed at both the shared tumor signatures and the tissue-specific tumor signatures might be a potential approach to overcoming cancer.
{"title":"Identification of common tumor signatures based on gene set enrichment analysis.","authors":"Xiaosheng Wang","doi":"10.3233/ISB-2012-0440","DOIUrl":"10.3233/ISB-2012-0440","url":null,"PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"11 1-2","pages":"1-10"},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3579559/pdf/nihms443974.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30550460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rajarshi Guha, Gary D Wiggins, David J Wild, Mu-Hyun Baik, Marlon E Pierce, Geoffrey C Fox
Some of the latest trends in cheminformatics, computation, and the world wide web are reviewed with predictions of how these are likely to impact the field of cheminformatics in the next five years. The vision and some of the work of the Chemical Informatics and Cyberinfrastructure Collaboratory at Indiana University are described, which we base around the core concepts of e-Science and cyberinfrastructure that have proven successful in other fields. Our chemical informatics cyberinfrastructure is realized by building a flexible, generic infrastructure for cheminformatics tools and databases, exporting "best of breed" methods as easily-accessible web APIs for cheminformaticians, scientists, and researchers in other disciplines, and hosting a unique chemical informatics education program aimed at scientists and cheminformatics practitioners in academia and industry.
{"title":"Improving usability and accessibility of cheminformatics tools for chemists through cyberinfrastructure and education.","authors":"Rajarshi Guha, Gary D Wiggins, David J Wild, Mu-Hyun Baik, Marlon E Pierce, Geoffrey C Fox","doi":"10.3233/CI-2008-0015","DOIUrl":"https://doi.org/10.3233/CI-2008-0015","url":null,"PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"11 1-2","pages":"41-60"},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/CI-2008-0015","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30550463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I V Astrakhantseva, D S Campo, A Araujo, C-G Teo, Y Khudyakov, S Kamili
Distinguishing between acute and chronic HCV infections is clinically important given that early treatment of infected patients leads to high rates of sustained virological response. Analysis of 2179 clonal sequences derived from hypervariable region 1 (HVR1) of the HCV genome in samples obtained from patients with acute (n = 49) and chronic (n = 102) HCV infection showed that intra-host HVR1 diversity was 1.8 times higher in patients with chronic than acute infection. Significant differences in frequencies of 5 amino acids (positions 5, 7, 12, 16 and 18) and the average genetic distances among intra-host HVR1 variants were found using analysis of molecular variance. Differences were also observed in the polarity, volume and hydrophobicity of 10 amino acids (at positions 1, 4, 5, 12, 14, 15, 16, 21, 22 and 29). Based on these properties, a classification model could be constructed, which permitted HVR1 variants from acute and chronic cases to be discriminated with an accuracy of 88%. Progression from acute to chronic stage of HCV infection is accompanied by characteristic changes in amino acid composition of HVR1. Identifying these changes may permit diagnosis of recent HCV infection.
{"title":"Differences in variability of hypervariable region 1 of hepatitis C virus (HCV) between acute and chronic stages of HCV infection.","authors":"I V Astrakhantseva, D S Campo, A Araujo, C-G Teo, Y Khudyakov, S Kamili","doi":"10.3233/ISB-2012-0451","DOIUrl":"https://doi.org/10.3233/ISB-2012-0451","url":null,"abstract":"<p><p>Distinguishing between acute and chronic HCV infections is clinically important given that early treatment of infected patients leads to high rates of sustained virological response. Analysis of 2179 clonal sequences derived from hypervariable region 1 (HVR1) of the HCV genome in samples obtained from patients with acute (n = 49) and chronic (n = 102) HCV infection showed that intra-host HVR1 diversity was 1.8 times higher in patients with chronic than acute infection. Significant differences in frequencies of 5 amino acids (positions 5, 7, 12, 16 and 18) and the average genetic distances among intra-host HVR1 variants were found using analysis of molecular variance. Differences were also observed in the polarity, volume and hydrophobicity of 10 amino acids (at positions 1, 4, 5, 12, 14, 15, 16, 21, 22 and 29). Based on these properties, a classification model could be constructed, which permitted HVR1 variants from acute and chronic cases to be discriminated with an accuracy of 88%. Progression from acute to chronic stage of HCV infection is accompanied by characteristic changes in amino acid composition of HVR1. Identifying these changes may permit diagnosis of recent HCV infection.</p>","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"11 5-6","pages":"163-73"},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/ISB-2012-0451","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31088149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}