In Silico Biology: Latest Articles, Page 4

Gene expression data analysis using multiobjective clustering improved with SVM based ensemble.

Q2 Medicine

In Silico Biology

Pub Date : 2011-01-01 DOI: 10.3233/ISB-2012-0441

Anirban Mukhopadhyay, Ujjwal Maulik, Sanghamitra Bandyopadhyay

Microarray technology makes it possible to monitor the expression levels of thousands of genes across different experimental conditions simultaneously. Clustering is a popular data mining tool that can be applied to microarray gene expression data to identify co-expressed genes. Most traditional clustering methods optimize a single clustering goodness criterion and thus may not perform well on all kinds of datasets. Motivated by this, in this article a multiobjective clustering technique that simultaneously optimizes cluster compactness and separation has been improved through a novel cluster ensemble method based on support vector machine classification. The superiority of MOCSVMEN (MultiObjective Clustering with Support Vector Machine based ENsemble) has been established by comparing its performance with that of several well-known microarray data clustering algorithms. Two real-life benchmark gene expression datasets have been used to test the comparative performance of the algorithms. A recently developed metric, the Biological Homogeneity Index (BHI), which measures clustering goodness with respect to functional annotation, has been used for the comparison.
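
The two objectives named above, cluster compactness and separation, and the BHI used for evaluation can be made concrete with a short Python sketch. The specific formulas (mean squared distance to the own centroid, minimum squared distance between centroids) are illustrative assumptions, not the exact indices MOCSVMEN optimizes, and the SVM-based ensemble step is not shown. A multiobjective method keeps solutions that no other solution Pareto-dominates on both objectives; BHI follows its common definition as the average, over clusters, of the fraction of annotated gene pairs that share a functional class.

```python
from itertools import combinations
from math import dist


def centroids(points, labels):
    """Mean vector of every cluster, keyed by cluster label."""
    groups = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    return {c: tuple(sum(axis) / len(g) for axis in zip(*g)) for c, g in groups.items()}


def compactness(points, labels):
    """Mean squared distance from each point to its own cluster centroid (lower is better)."""
    cent = centroids(points, labels)
    return sum(dist(p, cent[c]) ** 2 for p, c in zip(points, labels)) / len(points)


def separation(points, labels):
    """Smallest squared distance between two cluster centroids (higher is better)."""
    cent = list(centroids(points, labels).values())
    return min(dist(a, b) ** 2 for a, b in combinations(cent, 2))


def dominates(a, b):
    """Pareto dominance on (compactness, separation): minimise the first, maximise the second."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def bhi(labels, annotations):
    """Biological Homogeneity Index: per cluster, the fraction of annotated gene pairs that
    share at least one functional class, averaged over clusters. `annotations` maps a gene
    index to a set of class labels; clusters with fewer than two annotated genes are
    skipped (an assumption of this sketch)."""
    scores = []
    for c in set(labels):
        genes = [g for g, lab in enumerate(labels) if lab == c and annotations.get(g)]
        pairs = list(combinations(genes, 2))
        if not pairs:
            continue
        shared = sum(1 for i, j in pairs if annotations[i] & annotations[j])
        scores.append(shared / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: four 2-D expression profiles and two candidate clusterings.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
score_good = (compactness(points, good), separation(points, good))
score_bad = (compactness(points, bad), separation(points, bad))
print(score_good, score_bad, dominates(score_good, score_bad))
```

Keeping the two objectives separate, rather than summing them into one score, is what lets a multiobjective search return a set of trade-off solutions instead of a single optimum.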

Volume 11 (1-2), pp. 19-27

Citations: 6

Evaluation of viral heterogeneity using next-generation sequencing, end-point limiting-dilution and mass spectrometry.

Q2 Medicine

In Silico Biology

Pub Date : 2011-01-01 DOI: 10.3233/ISB-2012-0453

Z Dimitrova, D S Campo, S Ramachandran, G Vaughan, L Ganova-Raeva, Y Lin, J C Forbi, G Xia, P Skums, B Pearlman, Y Khudyakov

Hepatitis C virus sequence studies mainly focus on the viral amplicon containing hypervariable region 1 (HVR1) to obtain a sample of sequences from which several population genetics parameters can be calculated. Recent advances in sequencing methods allow the analysis of an unprecedented number of viral variants from infected patients and present a novel opportunity for understanding viral evolution, drug resistance and immune escape. In the present paper, we compared three recent technologies for amplicon analysis: (i) next-generation sequencing; (ii) clonal sequencing using end-point limiting dilution to isolate individual sequence variants, followed by real-time PCR and sequencing; and (iii) mass spectrometry of base-specific cleavage reactions of a target sequence. These three technologies were used to assess intra-host diversity and inter-host genetic relatedness in HVR1 amplicons obtained from 38 patients (subgenotypes 1a and 1b). Assessments of intra-host diversity varied greatly between sequence-based and mass-spectrometry-based data. However, assessments of inter-host variability by all three technologies were equally accurate in identifying genetic relatedness among viral strains. These results support the application of all three technologies to molecular epidemiology and population genetics studies. Mass spectrometry is especially promising given its high throughput, low cost and results comparable to those of sequence-based methods.
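
Intra-host diversity of an HVR1 sample is often summarized with simple population genetics statistics. The Python sketch below computes two common ones for aligned, equal-length sequences: nucleotide diversity (the mean pairwise proportion of differing sites) and the Shannon entropy of variant frequencies. The abstract does not say which parameters the authors computed, and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned, equal-length sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(s, t)) / length for s, t in pairs) / len(pairs)


def shannon_entropy(seqs):
    """Shannon entropy (natural log) of the frequencies of distinct variants in the sample."""
    n = len(seqs)
    return -sum((c / n) * log(c / n) for c in Counter(seqs).values())


sample = ["ACGT", "ACGT", "ACGA", "TCGA"]  # hypothetical aligned HVR1-like fragments
print(nucleotide_diversity(sample), shannon_entropy(sample))
```

Nucleotide diversity reflects how different the variants are, while entropy only reflects how evenly the sample is split among distinct variants, so the two can disagree on the same sample.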

Volume 11 (5-6), pp. 183-92

Citations: 17

ExpertDiscovery and UGENE integrated system for intelligent analysis of regulatory regions of genes.

Q2 Medicine

In Silico Biology

Pub Date : 2011-01-01 DOI: 10.3233/ISB-2012-0448

Y Y Vaskin, I V Khomicheva, E V Ignatieva, E E Vityaev

The automatic extraction of the hierarchical structure of eukaryotic gene regulatory regions lies at the junction of biology, mathematics and information technology. Solving this problem requires understanding the sophisticated mechanisms of eukaryotic gene regulation and applying advanced data mining technologies. This paper discusses an integrated system that implements a powerful method for relation mining of biological data. The system can take into account prior information about gene regulatory regions that is known to the biologist, perform the analysis at each hierarchical level, and search for a solution proceeding from simple hypotheses to complex ones. Integrating the ExpertDiscovery system into the UGENE toolkit provides a convenient environment for conducting complex research and automating a biologist's work. As a demonstration, the system has been applied to the recognition of vertebrate SF1, SREBP and HNF4 binding sites and to the analysis of human gene regulatory regions that promote liver-specific transcription.

Volume 11 (3-4), pp. 97-108

Citations: 6

Coordinated evolution of the hepatitis B virus polymerase.

Q2 Medicine

In Silico Biology

Pub Date : 2011-01-01 DOI: 10.3233/ISB-2012-0452

D S Campo, Z Dimitrova, J Lara, M Purdy, H Thai, S Ramachandran, L Ganova-Raeva, X Zhai, J C Forbi, C G Teo, Y Khudyakov

The detection of compensatory mutations that abrogate the negative fitness effects of drug-resistance and vaccine-escape mutations indicates the important role of epistatic connectivity in the evolution of viruses, especially under strong selection pressure. Mapping epistatic connectivity in the form of coordinated substitutions should help characterize the molecular mechanisms shaping viral evolution and provide a tool for developing novel antiviral drugs and vaccines. We analyzed coordinated variation among amino acid sites in 370 hepatitis B virus (HBV) polymerase sequences using Bayesian networks. Among the HBV polymerase domains, the spacer domain, which separates the terminal protein from the reverse-transcriptase domain, showed the highest network centrality. Coordinated substitutions preserve the hydrophobicity and charge of Spacer. Maximum likelihood estimates of codon selection showed that Spacer contains the highest number of positively selected sites. The finding that 67% of the domain lacks an ordered structure suggests that Spacer belongs to the class of intrinsically disordered domains and proteins, whose crucial functional role in the regulation of transcription, translation and cellular signal transduction has only recently been recognized. Spacer plays a central role in the epistatic network associating substitutions across the HBV genome, including those conferring viral virulence, drug resistance and vaccine escape. The data suggest that Spacer is extensively involved in coordinating HBV evolution.
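
The idea of a network of coordinated substitutions, and of ranking domains by how central their sites are, can be illustrated with a simplified Python sketch. It deliberately swaps in pairwise mutual information between alignment columns for the Bayesian network the authors used, and uses mean degree as the centrality measure; the threshold, the site-to-domain mapping and the toy alignment are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log


def mutual_information(col_a, col_b):
    """Mutual information (nats) between two alignment columns."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    joint = Counter(zip(col_a, col_b))
    return sum((c / n) * log(c * n / (count_a[x] * count_b[y])) for (x, y), c in joint.items())


def coordination_edges(alignment, threshold):
    """Pairs of alignment columns whose mutual information exceeds `threshold`."""
    columns = list(zip(*alignment))
    return [(i, j) for i, j in combinations(range(len(columns)), 2)
            if mutual_information(columns[i], columns[j]) > threshold]


def mean_degree_by_domain(edges, domain_of_site):
    """Average number of coordination partners per site, grouped by protein domain."""
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    totals = defaultdict(int)
    for site, domain in domain_of_site.items():
        totals[domain] += degree[site]
    sizes = Counter(domain_of_site.values())
    return {domain: totals[domain] / sizes[domain] for domain in sizes}


alignment = ["AAL", "GGL", "AAL", "GGL"]  # hypothetical 3-site alignment, one row per sequence
edges = coordination_edges(alignment, threshold=0.1)
print(edges, mean_degree_by_domain(edges, {0: "spacer", 1: "spacer", 2: "RT"}))
```

A Bayesian network, unlike this pairwise score, can separate direct dependencies from ones mediated through a third site, which is one reason to prefer it on real alignments.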

Volume 11 (5-6), pp. 175-82

Citations: 8

Mixture model analysis reflecting dynamics of the population diversity of 2009 pandemic H1N1 influenza virus.

Q2 Medicine

In Silico Biology

Pub Date : 2011-01-01 DOI: 10.3233/ISB-2012-0457

Li-Ping Long, Changhe Yuan, Zhipeng Cai, Huiping Xu, Xiu-Feng Wan

Influenza A viruses have caused large losses of life around the world and continue to present a great public health challenge. In April 2009, a novel swine-origin H1N1 virus emerged in North America and caused the first pandemic of the 21st century. Toward the end of 2009, two waves of outbreaks occurred, after which the disease moderated. Understanding how this novel pandemic virus invaded and adapted to the human population is critical. To understand the molecular dynamics and evolution of this pandemic H1N1 virus, we applied an Expectation-Maximization algorithm to estimate the Gaussian mixture in the genetic population of the hemagglutinin (HA) gene of these H1N1 viruses from April 2009 to January 2010 and compared them with the viruses that cause seasonal H1N1 influenza. Our results show that, after its introduction into the human population, the 2009 H1N1 viral HA gene changed its population structure from a single Gaussian distribution to two major Gaussian distributions. The breadth of HA genetic diversity of the 2009 H1N1 virus also increased from the first wave to the second wave of the pandemic. Phylogenetic analyses demonstrated that only certain HA sublineages of the 2009 H1N1 viruses were able to circulate throughout the pandemic period. In contrast, the HA population structure of seasonal H1N1 virus was relatively stable, and the breadth of HA genetic diversity within a single-season population remained similar. This study revealed an evolutionary mechanism for a novel pandemic virus: after the virus is introduced into the human population, it expands its molecular diversity through both random mutation (genetic drift) and selection. Eventually, multiple levels of hierarchical Gaussian distributions replace the earlier single distribution. An evolutionary model for the pandemic H1N1 influenza A virus was proposed and demonstrated with a simulation.
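
Expectation-Maximization for a Gaussian mixture alternates between giving each observation a responsibility under every component (E-step) and re-estimating component weights, means and variances from those responsibilities (M-step). The minimal one-dimensional Python sketch below assumes the HA population has already been reduced to scalar values, for example hypothetical genetic distances to a reference sequence; the abstract does not state the features, initialization or iteration count the authors used.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gaussian_mixture(xs, k, iters=200, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture to xs by Expectation-Maximization."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start: spread the means over the sorted data, share the overall variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow for extreme outliers
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    return weights, means, variances


# Hypothetical 1-D values with two clear groups.
data = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
weights, means, variances = em_gaussian_mixture(data, k=2)
print([round(w, 3) for w in weights], [round(m, 3) for m in means])
```

Comparing fits with different `k` (for example by a likelihood-based criterion) is how a shift from one to two major distributions, as reported above, would typically be detected.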

Volume 11 (5-6), pp. 225-36; open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4710479/pdf/nihms749403.pdf

Citations: 0

Improved transcriptome quantification and reconstruction from RNA-Seq reads using partial annotations.

Q2 Medicine

In Silico Biology

Pub Date : 2011-01-01 DOI: 10.3233/ISB-2012-0459

Serghei Mangul, Adrian Caciula, Olga Glebova, Ion Mandoiu, Alex Zelikovsky

This paper addresses the problem of using RNA-Seq data for transcriptome reconstruction and quantification, as well as for novel transcript discovery, in partially annotated genomes. We present a novel annotation-guided general framework for transcriptome discovery, reconstruction and quantification in partially annotated genomes and compare it with existing annotation-guided and genome-guided transcriptome assembly methods. Our method, referred to as Discovery and Reconstruction of Unannotated Transcripts (DRUT), can be used to enhance existing transcriptome assemblers such as Cufflinks, as well as to accurately estimate transcript frequencies. Empirical analysis on synthetic datasets confirms that Cufflinks enhanced by DRUT achieves superior transcript reconstruction and frequency estimation.
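
Estimating transcript frequencies from reads that map ambiguously to several isoforms is commonly done with an EM algorithm that splits each read among its compatible transcripts in proportion to the current estimates. The Python sketch below shows only that generic step; it ignores transcript lengths (a simplifying assumption) and is not DRUT's published procedure. The read compatibility sets are hypothetical.

```python
def estimate_transcript_frequencies(read_sets, transcripts, iters=500):
    """EM estimate of relative transcript frequencies from read-transcript compatibility.

    read_sets: one set per read, holding the ids of the transcripts the read is compatible with.
    Transcript lengths are ignored (a simplifying assumption of this sketch).
    """
    freq = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(iters):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each read among compatible transcripts in proportion to current frequencies.
        for compatible in read_sets:
            total = sum(freq[t] for t in compatible)
            if total == 0:
                continue
            for t in compatible:
                expected[t] += freq[t] / total
        # M-step: the new frequency is the expected share of all reads.
        freq = {t: expected[t] / len(read_sets) for t in transcripts}
    return freq


# Hypothetical example: 6 reads unique to A, 2 unique to B, 4 compatible with both.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
print(estimate_transcript_frequencies(reads, ["A", "B"]))
```

Here the ambiguous reads end up split 3:1 in favour of A, matching the ratio of the uniquely assigned reads.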

Citations: 8

miRNA-mRNA network detects hub mRNAs and cancer specific miRNAs in lung cancer.

Q2 Medicine

In Silico Biology

Pub Date : 2011-01-01 DOI: 10.3233/ISB-2012-0444

Saranya Devaraj, Jeyakumar Natarajan

MicroRNA expression profiles can improve the classification, diagnosis and prognostic assessment of malignancies, including lung cancer. In this paper, we developed a miRNA-mRNA network and used microarray data to uncover unique growth-suppressive miRNAs in lung cancer. The miRNA-mRNA network was built using a bipartite graph approach, and a number of miRNA-mRNA modules were identified to mine associations between miRNAs and mRNAs. Because we restricted our search to protective miRNAs, we identified a total of 29 protective miRNA-mRNA regulatory modules from the network. We then analyzed the pathways of the target genes in the protective miRNA-mRNA modules using Pathway-Express. The miRNA-mRNA network efficiently detects hub mRNAs deregulated by the protective miRNAs and identifies cancer-specific miRNAs in lung cancer. According to the pathway analysis, the ECM receptor pathway, the focal adhesion pathway and the cell adhesion molecules pathway appear the most interesting to investigate, since these pathways were related to all ten protective miRNAs. Furthermore, protective miRNA target analysis revealed that the genes VCAN, SIL, CD44 and MMP14 play an important role in these pathways. Hence, these genes were inferred to be important putative targets of the protective miRNAs. A greater understanding of the mechanisms regulating VCAN, SIL, CD44 and MMP14 expression and activity will assist in the development of specific inhibitors of cancer cell metastasis. These observations are therefore expected to have significant implications for cancer and may be useful for further research.
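
In a bipartite miRNA-mRNA network, edges only join a miRNA to an mRNA, so a hub mRNA can be read off as an mRNA regulated by many miRNAs. The Python sketch below assumes that hubs are defined by a degree cutoff and that a module is simply a miRNA together with its targets; both definitions, the optional restriction to a set of protective miRNAs, and the edge list are illustrative assumptions rather than the authors' exact criteria.

```python
from collections import defaultdict


def hub_mrnas(pairs, min_degree, mirna_subset=None):
    """Rank mRNAs by how many (optionally filtered) miRNAs target them in a bipartite network.

    pairs: iterable of (miRNA, mRNA) edges; mirna_subset: e.g. a set of protective miRNAs.
    Returns (mRNA, degree) tuples with degree >= min_degree, highest degree first.
    """
    regulators = defaultdict(set)  # mRNA -> set of miRNAs targeting it
    for mirna, mrna in pairs:
        if mirna_subset is None or mirna in mirna_subset:
            regulators[mrna].add(mirna)
    hubs = [(mrna, len(mirnas)) for mrna, mirnas in regulators.items() if len(mirnas) >= min_degree]
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


def modules(pairs):
    """Hypothetical module definition: each miRNA together with the set of its targets."""
    targets = defaultdict(set)
    for mirna, mrna in pairs:
        targets[mirna].add(mrna)
    return dict(targets)


# Hypothetical edge list with placeholder names.
edges = [("miR-1", "gene1"), ("miR-2", "gene1"), ("miR-3", "gene1"),
         ("miR-1", "gene2"), ("miR-2", "gene2"), ("miR-4", "gene3")]
print(hub_mrnas(edges, min_degree=2))
print(hub_mrnas(edges, min_degree=2, mirna_subset={"miR-1", "miR-3"}))
```

Filtering edges before counting, as `mirna_subset` does, means hub status depends on which miRNAs are considered, so the same mRNA can be a hub in the full network but not among protective miRNAs alone.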

Volume 11 (5-6), pp. 281-95

Citations: 11

QColors: an algorithm for conservative viral quasispecies reconstruction from short and non-contiguous next generation sequencing reads.

Q2 Medicine

In Silico Biology

Pub Date : 2011-01-01 DOI: 10.3233/ISB-2012-0454

Austin Huang, Rami Kantor, Allison DeLong, Leeann Schreier, Sorin Istrail

Next generation sequencing technologies have recently been applied to characterize the mutational spectra of the heterogeneous population of viral genotypes (known as a quasispecies) within HIV-infected patients. Such information is clinically relevant because minority genetic subpopulations of HIV within patients enable viral escape from selection pressures such as the immune response and antiretroviral therapy. However, methods for reconstructing quasispecies sequences from next generation sequencing reads are not yet widely used and remain an emerging area of research. Furthermore, most research methodology in HIV has focused on 454 sequencing, while many next-generation sequencing platforms used in practice produce shorter reads than 454 sequencing. Little work has been done on how best to address the read-length limitations of other platforms. The approach described here incorporates graph representations of both read differences and read overlap to conservatively determine the regions of the sequence with sufficient variability to separate quasispecies sequences. Within these tractable regions of quasispecies inference, we use constraint programming to determine an optimal quasispecies subsequence via vertex coloring of the conflict graph, a representation that also lends itself to data with non-contiguous reads such as paired-end sequencing. We demonstrate the utility of the method by applying it to simulations based on actual intra-patient clonal HIV-1 sequencing data.
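
The conflict-graph idea can be sketched as follows: two reads conflict when they cover a common position with different bases, so they cannot come from the same haplotype, and any proper vertex coloring groups the reads into haplotype-consistent classes. With error-free reads, the minimum number of colors is therefore a lower bound on the number of haplotypes needed to explain the region. Representing each read as a list of (start, sequence) segments lets paired-end reads with a gap be handled the same way. This Python sketch replaces the authors' constraint programming with plain backtracking, which is exponential and only suitable for small regions; the reads are hypothetical.

```python
from itertools import combinations


def to_positions(read):
    """Map position -> base for a read given as (start, sequence) segments (paired-end = 2 segments)."""
    return {start + i: base for start, seq in read for i, base in enumerate(seq)}


def conflict_graph(reads):
    """Adjacency sets: reads i and j conflict if they disagree at any shared position."""
    pos = [to_positions(r) for r in reads]
    adj = {i: set() for i in range(len(reads))}
    for i, j in combinations(range(len(reads)), 2):
        shared = pos[i].keys() & pos[j].keys()
        if any(pos[i][p] != pos[j][p] for p in shared):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def _assign(order, idx, adj, colors, k):
    """Backtracking: try to colour vertices order[idx:] with at most k colours."""
    if idx == len(order):
        return True
    v = order[idx]
    used = {colors[u] for u in adj[v] if u in colors}
    for c in range(k):
        if c not in used:
            colors[v] = c
            if _assign(order, idx + 1, adj, colors, k):
                return True
            del colors[v]
    return False


def min_coloring(adj):
    """Exact minimum vertex colouring; returns (number_of_colours, vertex -> colour)."""
    order = sorted(adj, key=lambda v: -len(adj[v]))  # high-degree vertices first
    for k in range(1, len(order) + 1):
        colors = {}
        if _assign(order, 0, adj, colors, k):
            return k, colors
    return 0, {}


reads = [[(0, "ACGT")], [(2, "GTTA")], [(0, "ACCT")], [(0, "AC"), (4, "TA")], [(4, "GA")]]
adj = conflict_graph(reads)
print(min_coloring(adj))
```

The fourth read is a hypothetical paired-end read whose two mates cover positions 0-1 and 4-5; it conflicts only with the read that disagrees at position 4.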

Volume 11 (5-6), pp. 193-201; open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530257/pdf/nihms879660.pdf

Citations: 0

Improving usability and accessibility of cheminformatics tools for chemists through cyberinfrastructure and education.

Q2 Medicine

In Silico Biology

Pub Date : 2011-01-01 DOI: 10.3233/CI-2008-0015

Rajarshi Guha, Gary D Wiggins, David J Wild, Mu-Hyun Baik, Marlon E Pierce, Geoffrey C Fox

Some of the latest trends in cheminformatics, computation, and the World Wide Web are reviewed, with predictions of how they are likely to affect the field of cheminformatics over the next five years. We describe the vision and some of the work of the Chemical Informatics and Cyberinfrastructure Collaboratory at Indiana University, which is built around the core concepts of e-Science and cyberinfrastructure that have proven successful in other fields. Our chemical informatics cyberinfrastructure is realized by building a flexible, generic infrastructure for cheminformatics tools and databases, exporting "best of breed" methods as easily accessible web APIs for cheminformaticians, scientists, and researchers in other disciplines, and hosting a unique chemical informatics education program aimed at scientists and cheminformatics practitioners in academia and industry.


Citations: 5

Differences in variability of hypervariable region 1 of hepatitis C virus (HCV) between acute and chronic stages of HCV infection.

Q2 Medicine

In Silico Biology

Pub Date : 2011-01-01 DOI: 10.3233/ISB-2012-0451

I V Astrakhantseva, D S Campo, A Araujo, C-G Teo, Y Khudyakov, S Kamili

Distinguishing between acute and chronic HCV infections is clinically important, given that early treatment of infected patients leads to high rates of sustained virological response. Analysis of 2179 clonal sequences derived from hypervariable region 1 (HVR1) of the HCV genome in samples obtained from patients with acute (n = 49) and chronic (n = 102) HCV infection showed that intra-host HVR1 diversity was 1.8 times higher in patients with chronic than with acute infection. Using analysis of molecular variance, significant differences were found in the amino acid frequencies at 5 positions (5, 7, 12, 16 and 18) and in the average genetic distances among intra-host HVR1 variants. Differences were also observed in the polarity, volume and hydrophobicity of amino acids at 10 positions (1, 4, 5, 12, 14, 15, 16, 21, 22 and 29). A classification model constructed from these properties discriminated HVR1 variants from acute and chronic cases with an accuracy of 88%. Progression from the acute to the chronic stage of HCV infection is accompanied by characteristic changes in the amino acid composition of HVR1. Identifying these changes may permit diagnosis of recent HCV infection.

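The abstract summarizes intra-host HVR1 diversity as an average genetic distance among a patient's clonal variants. A common way to compute such a summary is the mean pairwise distance over all pairs of variants from one host. The sketch below assumes uncorrected p-distance (fraction of differing sites) on pre-aligned, gap-free, equal-length sequences; the distance measure the authors actually used may differ. The function names are hypothetical, and the sequences are invented toy strings, not data from the study, so the ratio they produce bears no relation to the reported 1.8.

```python
from itertools import combinations
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def intra_host_diversity(variants: list[str]) -> float:
    """Mean pairwise p-distance among one host's clonal variants (0.0 if fewer than 2)."""
    dists = [p_distance(a, b) for a, b in combinations(variants, 2)]
    return mean(dists) if dists else 0.0


# Invented toy variants, NOT data from the study.
acute = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTAC"]
chronic = ["ACGTACGTAC", "ACGAACGTAA", "TCGTACGTAC"]

d_acute = intra_host_diversity(acute)      # pairwise 0.1, 0.0, 0.1 -> mean about 0.067
d_chronic = intra_host_diversity(chronic)  # pairwise 0.2, 0.1, 0.3 -> mean 0.2
print(round(d_chronic / d_acute, 2))       # 3.0
```

In a per-patient comparison like the one in the abstract, this value would be computed once per host and the acute and chronic groups compared; which distance model and which summary statistic to use are choices this sketch does not settle.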

In Silico Biology, vol. 11 (5-6), pp. 163-173.

Citations: 22