Microarray technology makes it possible to monitor the expression levels of thousands of genes across different experimental conditions simultaneously. Clustering is a popular data mining tool that can be applied to microarray gene expression data to identify co-expressed genes. Most traditional clustering methods optimize a single clustering goodness criterion and thus may not perform well on all kinds of datasets. Motivated by this, in this article a multiobjective clustering technique that simultaneously optimizes cluster compactness and separation has been improved through a novel cluster ensemble method based on support vector machine classification. The superiority of MOCSVMEN (MultiObjective Clustering with Support Vector Machine based ENsemble) has been established by comparing its performance with that of several well-known existing microarray data clustering algorithms. Two real-life benchmark gene expression datasets have been used to test the comparative performance of the different algorithms. A recently developed metric, the Biological Homogeneity Index (BHI), which measures clustering goodness with respect to functional annotation, has been used for the comparison.
Gene expression data analysis using multiobjective clustering improved with SVM based ensemble. Anirban Mukhopadhyay, Ujjwal Maulik, Sanghamitra Bandyopadhyay. In Silico Biology 11(1-2): 19-27 (2011). DOI: 10.3233/ISB-2012-0441 (https://doi.org/10.3233/ISB-2012-0441)
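The central idea above is to optimize compactness and separation at the same time instead of a single criterion. The following is a minimal sketch of that idea. It assumes mean point-to-centroid distance as the compactness measure and minimum centroid-to-centroid distance as the separation measure, because the abstract does not name the paper's exact indices. The sketch scores candidate partitions on both objectives and keeps the Pareto-nondominated ones. The function names are hypothetical, and the SVM-based ensemble step is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def objectives(data: Sequence[Point], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (compactness, separation) for a hard partition with >= 2 clusters.

    compactness: mean distance of each point to its own cluster centroid (minimize)
    separation:  smallest distance between any two cluster centroids (maximize)
    """
    clusters: Dict[int, List[Point]] = {}
    for x, c in zip(data, labels):
        clusters.setdefault(c, []).append(x)
    if len(clusters) < 2:
        raise ValueError("separation needs at least two clusters")
    cents = {c: centroid(pts) for c, pts in clusters.items()}
    compactness = sum(math.dist(x, cents[c]) for x, c in zip(data, labels)) / len(data)
    keys = sorted(cents)
    separation = min(
        math.dist(cents[a], cents[b]) for i, a in enumerate(keys) for b in keys[i + 1:]
    )
    return compactness, separation


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """a dominates b: no worse on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    better = a[0] < b[0] or a[1] > b[1]
    return no_worse and better


def pareto_front(scores: Sequence[Tuple[float, float]]) -> List[int]:
    """Indices of the nondominated (compactness, separation) score pairs."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]
```

For example, take the points (0, 0), (0, 1), (5, 5) and (5, 6). The labelling [0, 0, 1, 1] is both more compact and better separated than [0, 1, 0, 1], so only the first labelling survives in `pareto_front`.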
Z Dimitrova, D S Campo, S Ramachandran, G Vaughan, L Ganova-Raeva, Y Lin, J C Forbi, G Xia, P Skums, B Pearlman, Y Khudyakov
Hepatitis C virus sequence studies mainly focus on the viral amplicon containing hypervariable region 1 (HVR1) to obtain a sample of sequences from which several population genetics parameters can be calculated. Recent advances in sequencing methods allow an unprecedented number of viral variants from infected patients to be analyzed and present a novel opportunity for understanding viral evolution, drug resistance and immune escape. In the present paper, we compared three recent technologies for amplicon analysis: (i) next-generation sequencing; (ii) clonal sequencing using end-point limiting dilution to isolate individual sequence variants, followed by real-time PCR and sequencing; and (iii) mass spectrometry of base-specific cleavage reactions of a target sequence. These three technologies were used to assess intra-host diversity and inter-host genetic relatedness in HVR1 amplicons obtained from 38 patients (subgenotypes 1a and 1b). Assessments of intra-host diversity varied greatly between sequence-based and mass-spectrometry-based data. However, when assessing inter-host variability, all three technologies were equally accurate in identifying genetic relatedness among viral strains. These results support the application of all three technologies to molecular epidemiology and population genetics studies. Mass spectrometry is especially promising given its high throughput, low cost and results comparable to those of sequence-based methods.
Evaluation of viral heterogeneity using next-generation sequencing, end-point limiting-dilution and mass spectrometry. Z Dimitrova, D S Campo, S Ramachandran, G Vaughan, L Ganova-Raeva, Y Lin, J C Forbi, G Xia, P Skums, B Pearlman, Y Khudyakov. In Silico Biology 11(5-6): 183-92 (2011). DOI: 10.3233/ISB-2012-0453 (https://doi.org/10.3233/ISB-2012-0453)
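The comparison above rests on measuring intra-host diversity and inter-host relatedness from sets of HVR1 variants. The following is a minimal sketch. It assumes equal-length aligned sequences. It uses the mean pairwise p-distance (the proportion of differing sites) as the diversity measure and the minimum between-host p-distance as a relatedness signal. These are common choices but not necessarily the exact parameters the study computed, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def intra_host_diversity(variants: Sequence[str]) -> float:
    """Mean pairwise p-distance among one host's variants (0.0 if fewer than two)."""
    pairs = list(combinations(variants, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def min_inter_host_distance(host_a: Sequence[str], host_b: Sequence[str]) -> float:
    """Smallest p-distance between any variant of host A and any variant of host B.

    Small values suggest the two infections may be genetically related.
    """
    return min(p_distance(a, b) for a in host_a for b in host_b)
```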
D S Campo, Z Dimitrova, J Lara, M Purdy, H Thai, S Ramachandran, L Ganova-Raeva, X Zhai, J C Forbi, C G Teo, Y Khudyakov
The detection of compensatory mutations that abrogate the negative fitness effects of drug-resistance and vaccine-escape mutations indicates an important role of epistatic connectivity in the evolution of viruses, especially under strong selection pressure. Mapping epistatic connectivity in the form of coordinated substitutions should help to characterize the molecular mechanisms shaping viral evolution and provide a tool for developing novel antiviral drugs and vaccines. We analyzed coordinated variation among amino acid sites in 370 hepatitis B virus (HBV) polymerase sequences using Bayesian networks. Among the HBV polymerase domains, the spacer domain, which separates the terminal protein from the reverse-transcriptase domain, showed the highest network centrality. Coordinated substitutions preserve the hydrophobicity and charge of Spacer. Maximum likelihood estimates of codon selection showed that Spacer contains the highest number of positively selected sites. The finding that 67% of the domain lacks an ordered structure suggests that Spacer belongs to the class of intrinsically disordered domains and proteins. The crucial functional role of such domains and proteins in the regulation of transcription, translation and cellular signal transduction has only recently been recognized. Spacer plays a central role in the epistatic network associating substitutions across the HBV genome, including those conferring viral virulence, drug resistance and vaccine escape. The data suggest that Spacer is extensively involved in coordinating HBV evolution.
Coordinated evolution of the hepatitis B virus polymerase. D S Campo, Z Dimitrova, J Lara, M Purdy, H Thai, S Ramachandran, L Ganova-Raeva, X Zhai, J C Forbi, C G Teo, Y Khudyakov. In Silico Biology 11(5-6): 175-82 (2011). DOI: 10.3233/ISB-2012-0452 (https://doi.org/10.3233/ISB-2012-0452)
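The study detects coordinated substitutions with Bayesian networks and then ranks domains by network centrality. Learning a Bayesian network is beyond a short example, so this sketch swaps in a simpler and plainly different technique. It computes pairwise mutual information between alignment columns and thresholds it into an undirected coupling graph. The degree centrality of that graph flags the most connected sites. The threshold and all function names are assumptions for illustration only.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Set


def column(alignment: Sequence[str], i: int) -> List[str]:
    """Residues at alignment position i across all sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(x: Sequence[str], y: Sequence[str]) -> float:
    """Empirical mutual information (in nats) between two alignment columns."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def coupling_graph(alignment: Sequence[str], threshold: float) -> Dict[int, Set[int]]:
    """Connect positions whose mutual information exceeds the threshold."""
    length = len(alignment[0])
    edges: Dict[int, Set[int]] = {i: set() for i in range(length)}
    for i, j in combinations(range(length), 2):
        if mutual_information(column(alignment, i), column(alignment, j)) > threshold:
            edges[i].add(j)
            edges[j].add(i)
    return edges


def degree_centrality(edges: Dict[int, Set[int]]) -> Dict[int, float]:
    """Fraction of the other positions that each position is coupled to."""
    n = len(edges)
    if n < 2:
        return {v: 0.0 for v in edges}
    return {v: len(nbrs) / (n - 1) for v, nbrs in edges.items()}
```

Mutual information captures co-variation between pairs of sites but, unlike a Bayesian network, it does not separate direct dependencies from indirect ones.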
Serghei Mangul, Adrian Caciula, Olga Glebova, Ion Mandoiu, Alex Zelikovsky
The paper addresses the problem of using RNA-Seq data for transcriptome reconstruction and quantification, as well as for novel transcript discovery in partially annotated genomes. We present a novel annotation-guided general framework for transcriptome discovery, reconstruction and quantification in partially annotated genomes and compare it with existing annotation-guided and genome-guided transcriptome assembly methods. Our method, referred to as Discovery and Reconstruction of Unannotated Transcripts (DRUT), can be used to enhance existing transcriptome assemblers, such as Cufflinks, and to accurately estimate transcript frequencies. Empirical analysis on synthetic datasets confirms that Cufflinks enhanced by DRUT achieves superior transcript reconstruction and frequency estimation.
Improved transcriptome quantification and reconstruction from RNA-Seq reads using partial annotations. Serghei Mangul, Adrian Caciula, Olga Glebova, Ion Mandoiu, Alex Zelikovsky. In Silico Biology 11(5-6): 251-61 (2011). DOI: 10.3233/ISB-2012-0459 (https://doi.org/10.3233/ISB-2012-0459)
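Transcript frequency estimation is commonly done with expectation-maximization (EM) over reads that are compatible with several transcripts. This sketch shows that generic EM idea, not DRUT itself. It assumes each read is summarized by the set of transcript indices it is compatible with. It also ignores transcript length and fragment-length effects, which real estimators account for. The function name and input format are hypothetical.

```python
from typing import List, Sequence, Set


def em_transcript_frequencies(
    compat: Sequence[Set[int]], n_transcripts: int, iters: int = 200
) -> List[float]:
    """Estimate relative transcript frequencies from read-transcript compatibility.

    E-step: split each read among its compatible transcripts in proportion to
    the current frequencies. M-step: renormalize the expected read counts.
    """
    freqs = [1.0 / n_transcripts] * n_transcripts
    for _ in range(iters):
        expected = [0.0] * n_transcripts
        for transcripts in compat:
            z = sum(freqs[t] for t in transcripts)
            if z == 0.0:
                continue  # read compatible with nothing (or only zero-frequency transcripts)
            for t in transcripts:
                expected[t] += freqs[t] / z
        total = sum(expected)
        if total == 0.0:
            raise ValueError("no read is compatible with any transcript")
        freqs = [c / total for c in expected]
    return freqs
```

Suppose three reads are unique to transcript 0, one read is unique to transcript 1, and four reads are shared. The estimate then converges to about [0.75, 0.25], because EM apportions the shared reads in the same 3:1 ratio as the unique ones.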
Y Y Vaskin, I V Khomicheva, E V Ignatieva, E E Vityaev
The task of automatically extracting the hierarchical structure of eukaryotic gene regulatory regions lies at the junction of biology, mathematics and information technology. Solving it requires understanding the sophisticated mechanisms of eukaryotic gene regulation and applying advanced data mining technologies. This paper discusses an integrated system that implements a powerful method for relational mining of biological data. The system takes into account prior information about gene regulatory regions known to the biologist, performs the analysis at each hierarchical level, and searches for a solution from simple hypotheses to complex ones. Integrating the ExpertDiscovery system into the UGENE toolkit provides a convenient environment for conducting complex research and automating the work of a biologist. For demonstration, the system has been applied to the recognition of SF1, SREBP and HNF4 vertebrate binding sites and to the analysis of human gene regulatory regions that promote liver-specific transcription.
ExpertDiscovery and UGENE integrated system for intelligent analysis of regulatory regions of genes. Y Y Vaskin, I V Khomicheva, E V Ignatieva, E E Vityaev. In Silico Biology 11(3-4): 97-108 (2011). DOI: 10.3233/ISB-2012-0448 (https://doi.org/10.3233/ISB-2012-0448)
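The system is described as searching from simple hypotheses to complex ones. This is only a toy illustration of that idea, not ExpertDiscovery's algorithm. Conjunctive rules over hypothetical boolean features (for example "motif A present") are grown one condition at a time. A rule is kept only if enough samples support it and it predicts the target (here a hypothetical "site" label) with strictly higher conditional probability than every simpler sub-rule. The acceptance criterion and all names are assumptions.

```python
from itertools import combinations
from typing import Dict, Mapping, Sequence, Tuple

Sample = Mapping[str, bool]


def conditional_prob(
    samples: Sequence[Sample], conds: Tuple[str, ...], target: str
) -> Tuple[float, int]:
    """P(target | all conds true) and the number of samples satisfying conds."""
    covered = [s for s in samples if all(s[c] for c in conds)]
    if not covered:
        return 0.0, 0
    return sum(1 for s in covered if s[target]) / len(covered), len(covered)


def refine_rules(
    samples: Sequence[Sample],
    features: Sequence[str],
    target: str,
    max_len: int = 2,
    min_support: int = 2,
) -> Dict[Tuple[str, ...], float]:
    """Grow conjunctive rules from simple to complex.

    A rule is kept only if it has enough support and its conditional probability
    strictly exceeds that of every shorter sub-rule, including the empty rule
    (the base rate). This acceptance criterion is an illustrative assumption.
    """
    accepted: Dict[Tuple[str, ...], float] = {}
    for k in range(1, max_len + 1):
        for conds in combinations(features, k):
            p, support = conditional_prob(samples, conds, target)
            if support < min_support:
                continue
            sub_probs = [
                conditional_prob(samples, sub, target)[0]
                for r in range(k)  # r = 0 yields the empty rule (base rate)
                for sub in combinations(conds, r)
            ]
            if all(p > q for q in sub_probs):
                accepted[conds] = p
    return accepted
```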
Li-Ping Long, Changhe Yuan, Zhipeng Cai, Huiping Xu, Xiu-Feng Wan
Influenza A viruses have been responsible for large losses of life around the world and continue to present a great public health challenge. In April 2009, a novel swine-origin H1N1 virus emerged in North America and caused the first pandemic of the 21st century. By the end of 2009, two waves of outbreaks had occurred, after which the disease moderated. Understanding how this novel pandemic virus invaded and adapted to the human population is critical. To understand the molecular dynamics and evolution of this pandemic H1N1 virus, we applied an Expectation-Maximization algorithm to estimate the Gaussian mixture in the genetic population of the hemagglutinin (HA) gene of these H1N1 viruses from April 2009 to January 2010. We then compared them with the viruses that cause seasonal H1N1 influenza. Our results show that, after its introduction into the human population, the 2009 H1N1 viral HA gene changed its population structure from a single Gaussian distribution to two major Gaussian distributions. The breadth of HA genetic diversity of the 2009 H1N1 virus also increased from the first wave to the second wave of this pandemic. Phylogenetic analyses demonstrated that only certain HA sublineages of 2009 H1N1 viruses were able to circulate throughout the pandemic period. In contrast, the HA population structure of seasonal H1N1 virus was relatively stable, and the breadth of HA genetic diversity within a single season's population remained similar. This study revealed an evolutionary mechanism for a novel pandemic virus. After the virus is introduced into the human population, it would expand its molecular diversity through both random mutation (genetic drift) and selection. Eventually, multiple levels of hierarchical Gaussian distributions would replace the earlier single distribution. An evolutionary model for pandemic H1N1 influenza A virus was proposed and demonstrated with a simulation.
Mixture model analysis reflecting dynamics of the population diversity of 2009 pandemic H1N1 influenza virus. Li-Ping Long, Changhe Yuan, Zhipeng Cai, Huiping Xu, Xiu-Feng Wan. In Silico Biology 11(5-6): 225-36 (2011). DOI: 10.3233/ISB-2012-0457 (https://doi.org/10.3233/ISB-2012-0457). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4710479/pdf/nihms749403.pdf
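The study fits a Gaussian mixture to the HA gene population with the Expectation-Maximization algorithm. This is a minimal sketch of EM for a one-dimensional mixture of k Gaussians. It assumes each sequence has already been reduced to a single number, for example a hypothetical distance to a reference HA sequence; the abstract does not describe the paper's actual feature representation. The caller supplies the initial means, and a variance floor guards against collapse. Both are assumptions of this sketch.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(
    xs: Sequence[float],
    init_means: Sequence[float],
    iters: int = 100,
    min_var: float = 1e-6,
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    n, k = len(xs), len(init_means)
    mus = list(init_means)
    grand_mean = sum(xs) / n
    start_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, min_var)
    variances = [start_var] * k  # every component starts with the overall variance
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [weights[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            z = sum(dens)
            resp.append([d / z for d in dens] if z > 0 else [1.0 / k] * k)
        # M-step: re-estimate weights, means and variances from responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj == 0.0:
                continue  # empty component: keep its previous mean and variance
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var
            )
    return weights, mus, variances
```

In the study's terms, a shift from one dominant component to two well-weighted components would correspond to the population splitting from a single Gaussian distribution into two.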
MicroRNA expression profiles can improve the classification, diagnosis and prognostic assessment of malignancies, including lung cancer. In this paper, we used microarray data to develop a miRNA-mRNA network and to uncover unique growth-suppressive miRNAs in lung cancer. The miRNA-mRNA network was built with a bipartite graph theory approach, and a number of miRNA-mRNA modules were identified to mine associations between miRNAs and mRNAs. Because we restricted our search to protective miRNAs, we identified a total of 29 protective miRNA-mRNA regulatory modules in the network. We then analyzed the pathways of the target genes in these protective miRNA-mRNA modules using Pathway-Express. The miRNA-mRNA network efficiently detects hub mRNAs deregulated by the protective miRNAs and identifies cancer-specific miRNAs in lung cancer. In the pathway analysis results, the ECM receptor, focal adhesion and cell adhesion molecule pathways appear the most worth investigating, since these pathways were related to all ten protective miRNAs. Furthermore, protective miRNA target analysis revealed that the genes VCAN, SIL, CD44 and MMP14 play an important role in these pathways. Hence, we inferred that these genes may be important putative targets of those protective miRNAs. A greater understanding of the mechanisms regulating VCAN, SIL, CD44 and MMP14 expression and activity will assist in the development of specific inhibitors of cancer cell metastasis. These observations are therefore expected to have important implications for cancer and may be useful for further research.
miRNA-mRNA network detects hub mRNAs and cancer specific miRNAs in lung cancer. Saranya Devaraj, Jeyakumar Natarajan. In Silico Biology 11(5-6): 281-95 (2011). DOI: 10.3233/ISB-2012-0444 (https://doi.org/10.3233/ISB-2012-0444)
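The abstract builds a bipartite miRNA-mRNA network and looks for hub mRNAs. This is a minimal sketch. It assumes expression profiles measured over the same samples. Its edge criterion, strong negative Pearson correlation, is a hypothetical choice motivated by miRNAs typically repressing their targets. The sketch does not reproduce the paper's actual edge rule or its module-mining step. Hubs are ranked by how many miRNAs connect to them, and the gene and miRNA names in the test are made up.

```python
import math
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Set


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def build_bipartite(
    mirna_expr: Mapping[str, Sequence[float]],
    mrna_expr: Mapping[str, Sequence[float]],
    threshold: float = -0.8,
) -> Dict[str, Set[str]]:
    """Map each mRNA to the miRNAs whose profiles are strongly anti-correlated with it."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    for mirna, xv in mirna_expr.items():
        for mrna, yv in mrna_expr.items():
            if pearson(xv, yv) <= threshold:
                edges[mrna].add(mirna)
    return dict(edges)


def hub_mrnas(edges: Mapping[str, Set[str]], top: int = 1) -> List[str]:
    """mRNAs with the most miRNA neighbours (ties broken alphabetically)."""
    return sorted(edges, key=lambda m: (-len(edges[m]), m))[:top]
```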
Next-generation sequencing technologies have recently been applied to characterize the mutational spectra of the heterogeneous population of viral genotypes (known as a quasispecies) within HIV-infected patients. Such information is clinically relevant because minority genetic subpopulations of HIV within patients enable viral escape from selection pressures such as the immune response and antiretroviral therapy. However, methods for reconstructing quasispecies sequences from next-generation sequencing reads are not yet widely used and remain an emerging area of research. Furthermore, most HIV research methodology has focused on 454 sequencing, whereas many next-generation sequencing platforms used in practice produce shorter reads than 454 sequencing. Little work has been done on how best to address the read-length limitations of these other platforms. The approach described here uses graph representations of both read differences and read overlaps to conservatively determine the regions of the sequence with enough variability to separate quasispecies sequences. Within these tractable regions of quasispecies inference, we use constraint programming to find an optimal determination of quasispecies subsequences via vertex coloring of the conflict graph. This representation also lends itself to data with non-contiguous reads, such as paired-end sequencing. We demonstrate the utility of the method by applying it to simulations based on actual intra-patient clonal HIV-1 sequencing data.
QColors: an algorithm for conservative viral quasispecies reconstruction from short and non-contiguous next generation sequencing reads. Austin Huang, Rami Kantor, Allison DeLong, Leeann Schreier, Sorin Istrail. In Silico Biology 11(5-6): 193-201 (2011). DOI: 10.3233/ISB-2012-0454 (https://doi.org/10.3233/ISB-2012-0454). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530257/pdf/nihms879660.pdf
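QColors colors a conflict graph in which reads that cannot come from the same quasispecies are adjacent. This sketch builds such a graph for reads encoded as position-to-base maps, an encoding that naturally covers non-contiguous reads such as paired-end reads. It then colors the graph. The sketch swaps the paper's constraint-programming solver for a simple largest-degree-first greedy coloring, which is not guaranteed to use the minimum number of colors. The read encoding and function names are assumptions.

```python
from typing import Dict, List, Mapping, Set

Read = Mapping[int, str]  # reference position -> observed base


def conflicts(r1: Read, r2: Read) -> bool:
    """True if two reads disagree at some position they both cover."""
    shared = r1.keys() & r2.keys()
    return any(r1[p] != r2[p] for p in shared)


def conflict_graph(reads: List[Read]) -> Dict[int, Set[int]]:
    """Adjacency sets: an edge means the two reads cannot share a quasispecies."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(reads))}
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if conflicts(reads[i], reads[j]):
                adj[i].add(j)
                adj[j].add(i)
    return adj


def greedy_coloring(adj: Dict[int, Set[int]]) -> Dict[int, int]:
    """Largest-degree-first greedy coloring; each color is a candidate variant."""
    colors: Dict[int, int] = {}
    for v in sorted(adj, key=lambda u: -len(adj[u])):
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors
```

A paired-end read is simply a map with a gap in its positions, such as `{0: "A", 10: "G"}`. Conflicts are still checked only at positions that two reads both cover.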
The identification of common tumor signatures can reveal the shared molecular mechanisms underlying tumorigenesis, which could allow tumors to be prevented and treated through systems-level intervention. We identified tumor-associated signatures, including pathways, transcription factors, microRNAs and gene ontology categories, by analyzing gene sets for differential expression between normal and tumor phenotype classes in various tumor gene expression datasets. We obtained the common tumor signatures based on how frequently they were identified across different tumor types. Some shared signatures important for various tumor types were uncovered and are discussed. We propose that interventions targeting both the shared tumor signatures and the tissue-specific tumor signatures might be a potential approach to overcoming cancer.
{"title":"Identification of common tumor signatures based on gene set enrichment analysis.","authors":"Xiaosheng Wang","doi":"10.3233/ISB-2012-0440","DOIUrl":"10.3233/ISB-2012-0440","url":null,"abstract":"<p><p>The identification of common tumor signatures can discover the shared molecular mechanisms underlying tumorigenesis whereby we can prevent and treat tumors by a system intervention. We identified tumor-associated signatures including pathways, transcription factors, microRNAs and gene ontology categories by analyzing gene sets for differential expression between normal vs. tumor phenotype classes in various tumor gene expression datasets. We obtained the common tumor signatures based on their identified frequencies for different tumor types. Some shared signatures important for various tumor types were uncovered and discussed. We proposed that the interventions aiming at both the shared tumor signatures and the tissue-specific tumor signatures might be a potential approach to overcoming cancer.</p>","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"11 1-2","pages":"1-10"},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3579559/pdf/nihms443974.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30550460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
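The tumor-signature abstract says common signatures were selected by how often they were identified across tumor types. A minimal sketch of that counting step follows, assuming the per-tumor-type enrichment results are already available as sets of signature names (the gene set enrichment analysis itself is not shown). The cutoff `min_types` for "common", the rule that a signature found in exactly one tumor type counts as tissue-specific, and all signature names are hypothetical placeholders, not values or results from the paper.

```python
from collections import Counter


def signature_frequencies(hits_by_tumor: dict[str, set[str]]) -> Counter:
    """Count the number of tumor types in which each signature was identified."""
    counts: Counter = Counter()
    for signatures in hits_by_tumor.values():
        counts.update(set(signatures))  # each signature counted once per tumor type
    return counts


def split_signatures(hits_by_tumor: dict[str, set[str]], min_types: int):
    """Return (common, tissue_specific) signature lists.

    Assumptions: 'common' means found in at least `min_types` tumor types;
    'tissue-specific' means found in exactly one tumor type.
    """
    counts = signature_frequencies(hits_by_tumor)
    common = sorted(s for s, n in counts.items() if n >= min_types)
    specific = sorted(s for s, n in counts.items() if n == 1)
    return common, specific


# Placeholder inputs standing in for per-tumor-type enrichment results.
hits = {
    "tumor_type_1": {"pathway_A", "TF_B", "miRNA_C"},
    "tumor_type_2": {"pathway_A", "TF_B", "GO_D"},
    "tumor_type_3": {"pathway_A", "GO_E"},
}
common, specific = split_signatures(hits, min_types=2)
```

Raising `min_types` trades breadth for stringency; the abstract does not state the cutoff that was used, so any value here is illustrative.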
Rajarshi Guha, Gary D Wiggins, David J Wild, Mu-Hyun Baik, Marlon E Pierce, Geoffrey C Fox
Some of the latest trends in cheminformatics, computation, and the World Wide Web are reviewed, with predictions of how they are likely to affect the field of cheminformatics over the next five years. We describe the vision and some of the work of the Chemical Informatics and Cyberinfrastructure Collaboratory at Indiana University, which is built on the core concepts of e-Science and cyberinfrastructure that have proven successful in other fields. Our chemical informatics cyberinfrastructure is realized by building a flexible, generic infrastructure for cheminformatics tools and databases; exporting "best of breed" methods as easily accessible web APIs for cheminformaticians, scientists, and researchers in other disciplines; and hosting a unique chemical informatics education program aimed at scientists and cheminformatics practitioners in academia and industry.
{"title":"Improving usability and accessibility of cheminformatics tools for chemists through cyberinfrastructure and education.","authors":"Rajarshi Guha, Gary D Wiggins, David J Wild, Mu-Hyun Baik, Marlon E Pierce, Geoffrey C Fox","doi":"10.3233/CI-2008-0015","DOIUrl":"https://doi.org/10.3233/CI-2008-0015","url":null,"abstract":"<p><p>Some of the latest trends in cheminformatics, computation, and the world wide web are reviewed with predictions of how these are likely to impact the field of cheminformatics in the next five years. The vision and some of the work of the Chemical Informatics and Cyberinfrastructure Collaboratory at Indiana University are described, which we base around the core concepts of e-Science and cyberinfrastructure that have proven successful in other fields. Our chemical informatics cyberinfrastructure is realized by building a flexible, generic infrastructure for cheminformatics tools and databases, exporting \"best of breed\" methods as easily-accessible web APIs for cheminformaticians, scientists, and researchers in other disciplines, and hosting a unique chemical informatics education program aimed at scientists and cheminformatics practitioners in academia and industry.</p>","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"11 1-2","pages":"41-60"},"PeriodicalIF":0.0,"publicationDate":"2011-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3233/CI-2008-0015","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30550463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}