Genome informatics. International Conference on Genome Informatics: latest articles, page 9

Calculation of protein-ligand binding free energy using smooth reaction path generation (SRPG) method: a comparison of the explicit water model, gb/sa model and docking score function.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2009-10-01 DOI: 10.1142/9781848165632_0008

D. Mitomo, Y. Fukunishi, J. Higo, Haruki Nakamura

We compared the protein-ligand binding free energies (ΔG) obtained with the explicit water model, the MM-GB/SA (molecular-mechanics generalized Born surface area) model, and a docking scoring function. The free energies for the explicit water model and the MM-GB/SA model were calculated with the previously developed Smooth Reaction Path Generation (SRPG) method. In the SRPG method, a smooth reaction path is generated by linking two sets of coordinates, one for the bound state and the other for the unbound state. The free energy surface along the path is calculated by a molecular dynamics (MD) simulation, and the binding free energy is estimated from that surface. We applied these methods to the streptavidin-biotin system. The ΔG value from the explicit water model was close to the experimental value, whereas the MM-GB/SA model overestimated ΔG and the scoring function underestimated it. The free energy surface from the explicit water model was close to that from the GB/SA model around the bound state (distances < 6 Å), but the two diverged at distances > 6 Å. Thus, the difference in the long-range Coulomb interaction is the likely cause of the error in ΔG. The scoring function cannot take into account the entropy change of the protein, so its error in ΔG could depend on the target protein.
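The final step described in this abstract, reading a binding free energy off a free energy surface along a path, can be illustrated with a minimal sketch. The function name `binding_free_energy`, the toy profile values, the 12 Å plateau boundary, and the reuse of 6 Å as the bound-state cutoff are all assumptions made for illustration; this is not the SRPG implementation, and it ignores standard-state and volume corrections.

```python
def binding_free_energy(distances, profile, bound_cutoff=6.0, plateau_start=12.0):
    """Estimate a binding free energy from a 1-D free energy profile.

    Hypothetical helper: takes the lowest value in the bound region and
    subtracts the mean value of the unbound plateau.
    """
    bound = [g for d, g in zip(distances, profile) if d < bound_cutoff]
    unbound = [g for d, g in zip(distances, profile) if d >= plateau_start]
    if not bound or not unbound:
        raise ValueError("profile must cover both the bound and unbound regions")
    g_bound = min(bound)                   # free energy at the bound-state minimum
    g_unbound = sum(unbound) / len(unbound)  # average over the flat unbound region
    return g_bound - g_unbound


# Toy profile (kcal/mol) versus protein-ligand distance (angstrom); invented numbers.
distances = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
profile = [-18.0, -12.0, -6.0, -3.0, -1.0, 0.0, 0.2, -0.2]
dg = binding_free_energy(distances, profile)
print(f"estimated binding free energy: {dg:.1f} kcal/mol")
```

Averaging over the plateau rather than taking a single far point makes the estimate less sensitive to noise in the unbound region, which is where the abstract reports the two solvent models diverging.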


Pages: 85-97

Citations: 10

Tools for investigating mechanisms of antigenic variation: new extensions to varDB.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2009-10-01 DOI: 10.1142/9781848165632_0005

C. Hayes, Diego Diez, Nicolas Joannin, M. Kanehisa, M. Wahlgren, C. Wheelock, S. Goto

The varDB project (http://www.vardb.org) aims to create and maintain a curated database of antigenic variation sequences as well as a platform for online sequence analysis. Along with the evolution of drug resistance, antigenic variation presents a moving target for public health endeavors and greatly complicates vaccination and eradication efforts. However, careful analysis of a large number of variant forms may reveal structural and functional constraints that can be exploited to identify stable and cross-reactive targets. VarDB attempts to facilitate this effort by providing streamlined interfaces to standard tools to help identify and prepare sequences for various forms of analysis. We have newly implemented such tools for codon usage, selection, recombination, secondary and tertiary structure, and sequence diversity analysis. Just as the adaptive immune system encodes a mechanism for dynamically generating diverse receptors instead of encoding a receptor for every possible epitope, many pathogens take advantage of heritable diversity generating mechanisms to produce progeny able to evade immune recognition. Instead of merely cataloging the observed variation, a major goal of varDB is to characterize and predict the potential range of antigenic variation within a pathogen by investigating the mechanisms by which it attempts to expand its implicit genome. We believe that the new sequence analysis tools will improve the usefulness and range of varDB.
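Of the analyses listed in this abstract, codon usage is the simplest to illustrate. The sketch below computes relative codon frequencies for an in-frame coding sequence; the function name `codon_usage` and the example sequence are assumptions for illustration and say nothing about how varDB implements its tools.

```python
from collections import Counter


def codon_usage(seq):
    """Return the relative frequency of each codon in an in-frame coding sequence.

    Hypothetical helper: trailing bases that do not fill a codon are ignored.
    """
    seq = seq.upper().replace("U", "T")   # accept RNA input as well
    usable = len(seq) - len(seq) % 3      # drop an incomplete final codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


# Invented example sequence: ATG GCT GCT GCA AAA TAA
usage = codon_usage("ATGGCTGCTGCAAAATAA")
for codon, freq in sorted(usage.items()):
    print(codon, round(freq, 3))
```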


Pages: 46-59

Citations: 2

Predicting protein-protein relationships from literature using latent topics.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2009-10-01 DOI: 10.1142/9781848165632_0001

T. Aso, K. Eguchi

This paper investigates applying statistical topic models to extract and predict relationships between biological entities, especially protein mentions. A statistical topic model, Latent Dirichlet Allocation (LDA), is promising; however, it has not been investigated for such a task. In this paper, we apply state-of-the-art Collapsed Variational Bayesian inference and Gibbs sampling inference to estimate the LDA model. We also apply probabilistic Latent Semantic Analysis (pLSA) as a baseline, and compare the methods in terms of log-likelihood, classification accuracy and retrieval effectiveness. We demonstrate through experiments that Collapsed Variational LDA gives better results than the others, especially in terms of classification accuracy and retrieval effectiveness in the task of protein-protein relationship prediction.
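For readers unfamiliar with Gibbs sampling for LDA, the following is a minimal sketch of a collapsed Gibbs sampler on a toy corpus. The function name `lda_gibbs`, the hyperparameter values, and the integer-coded documents are assumptions for illustration; it is not the authors' code, and it does not cover the collapsed variational method that the abstract reports as performing best.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA on documents given as lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional p(z = k | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Toy corpus: word ids 0-1 and 3-4 tend to co-occur; id 2 is shared.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 2], [0, 1, 1, 0], [4, 3, 3, 4]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=5)
print(doc_topic)
```

The per-document topic counts in `doc_topic` are the kind of low-dimensional representation that could then feed a classifier or a retrieval ranking.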


Citations: 10

A new generation of homology search tools based on probabilistic inference.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2009-10-01

Sean R Eddy

Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: BLAST's programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST's speed while further improving the power of probabilistic inference based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm. Combined with efficient vector-parallel implementations on modern processors, these improvements synergize. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) in each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.
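The distinction between a score summed over alignment uncertainty and the score of a single optimal alignment corresponds to the Forward and Viterbi algorithms for hidden Markov models. The sketch below contrasts the two on a generic two-state HMM; the state names, probabilities, and uniform background are invented for illustration, and the model is far simpler than a profile HMM and is not HMMER code.

```python
import math


def forward_and_viterbi(obs, states, start, trans, emit):
    """Return (Forward probability, Viterbi probability) for a sequence.

    Forward sums over all state paths; Viterbi keeps only the best path.
    """
    fwd = {s: start[s] * emit[s][obs[0]] for s in states}
    vit = dict(fwd)
    for symbol in obs[1:]:
        fwd = {s: emit[s][symbol] * sum(fwd[p] * trans[p][s] for p in states)
               for s in states}
        vit = {s: emit[s][symbol] * max(vit[p] * trans[p][s] for p in states)
               for s in states}
    return sum(fwd.values()), max(vit.values())


def log_odds_bits(score, obs, background):
    """Log-odds score in bits against an independent background model."""
    null = math.prod(background[x] for x in obs)
    return math.log2(score / null)


# Invented toy model: "M" is a match-like state, "B" a background-like state.
states = ("M", "B")
start = {"M": 0.5, "B": 0.5}
trans = {"M": {"M": 0.9, "B": 0.1}, "B": {"M": 0.1, "B": 0.9}}
emit = {"M": {"A": 0.8, "C": 0.2}, "B": {"A": 0.5, "C": 0.5}}
background = {"A": 0.5, "C": 0.5}

obs = "AACA"
forward_score, viterbi_score = forward_and_viterbi(obs, states, start, trans, emit)
print("Forward bits:", round(log_odds_bits(forward_score, obs, background), 3))
print("Viterbi bits:", round(log_odds_bits(viterbi_score, obs, background), 3))
```

Because the Forward score adds the probability of every path, it is never lower than the Viterbi score for the same sequence and model.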


Pages: 205-11

Citations: 0

Comparative analysis of topological patterns in different mammalian networks.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2009-10-01

Bjoern Goemann, Anatolij P Potapov, Michael Ante, Edgar Wingender

We have systematically analyzed various topological patterns comprising 1, 2 or 3 nodes in the mammalian metabolic, signal transduction and transcription networks. These patterns were analyzed with regard to their frequency and statistical over-representation in each network, as well as their topological significance for the coherence of the networks. The latter property was evaluated using the pairwise disconnectivity index, which we have recently introduced to quantify how critical network components are for the internal connectedness of a network. The 1-node pattern made up of a vertex with a self-loop has been found to exhibit particular properties in all three networks. In general, vertices with a self-loop tend to be topologically more important than other vertices. Moreover, self-loops have been found to be attached to most 2-node and 3-node patterns, thereby emphasizing a particular role of self-loop components in the architectural organization of the networks. In none of the networks could a positive correlation between the mean topological significance and the Z-score of a pattern be observed. That is, in general, motifs are not per se more important for the overall network coherence than patterns that are not over-represented. All 2- and 3-node patterns that are over-represented and thus qualify as motifs in all three networks exhibit a loop structure. This intriguing observation can be viewed as an advantage of loop-like structures in building up the regulatory circuits of the whole cell. The transcription network has been found to differ from the other networks in that (i) self-loops play an even greater role, (ii) its binary loops are highly enriched with self-loops attached, and (iii) feed-back loops are not over-represented. Metabolic networks reveal some particular topological properties which may reflect the fact that metabolic paths are, to a large extent, reversible. Interestingly, some of the most important 3-node patterns of both the transcription and the signaling network can be concatenated to subnetworks comprising many genes that play a particular role in the regulation of cell proliferation.
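A few of the patterns named in this abstract (self-loops, binary loops, binary loops with an attached self-loop, and 3-node feed-back loops) can be counted in a directed graph with a short sketch. The function name `loop_patterns` and the example edge list are assumptions for illustration; the sketch does not compute over-representation Z-scores or the pairwise disconnectivity index.

```python
def loop_patterns(edges):
    """Count simple loop patterns in a directed graph given as (source, target) pairs."""
    edge_set = set(edges)
    nodes = {n for edge in edge_set for n in edge}
    # 1-node pattern: a vertex with an edge to itself.
    self_loops = {a for a, b in edge_set if a == b}
    # 2-node binary loop: edges in both directions between two distinct vertices.
    binary = {frozenset((a, b)) for a, b in edge_set
              if a != b and (b, a) in edge_set}
    binary_with_self = {pair for pair in binary if pair & self_loops}
    # 3-node feed-back loop a -> b -> c -> a, counted once per set of three vertices.
    cycles3 = set()
    for a, b in edge_set:
        if a == b:
            continue
        for c in nodes:
            if c in (a, b):
                continue
            if (b, c) in edge_set and (c, a) in edge_set:
                cycles3.add(frozenset((a, b, c)))
    return {
        "self_loops": len(self_loops),
        "binary_loops": len(binary),
        "binary_loops_with_self_loop": len(binary_with_self),
        "three_node_feedback_loops": len(cycles3),
    }


# Invented example network.
edges = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "C"), ("C", "D"), ("D", "B")]
patterns = loop_patterns(edges)
print(patterns)
```

To judge over-representation, the same counts would be compared against counts from randomized networks, which is outside the scope of this sketch.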


Pages: 32-45

Citations: 0

Comprehensive analysis of sequence-structure relationships in the loop regions of proteins.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2009-10-01

Shugo Nakamura, Kentaro Shimizu

Local sequence-structure relationships in the loop regions of proteins were comprehensively estimated using simple prediction tools based on support vector regression (SVR). End-to-end distance was selected as a rough structural property of fragments, and the end-to-end distances of an enormous number of loop fragments from a wide variety of protein folds were directly predicted from sequence information by using SVR. We found that our method was more accurate than random prediction for predicting the structure of fragments comprising 5, 9, and 17 amino acids; moreover, the extended loop fragments could be successfully distinguished from turn structures on the basis of their sequences, which implies that the sequence-structure relationships were significant for loop fragments with a wide range of end-to-end distances. These results suggest that many loop regions as well as helices and strands restrict the conformational space of the entire tertiary structure of proteins to some extent; moreover, our findings throw light on the mechanism of protein folding and prediction of the tertiary structure of proteins without using structural templates.
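The regression target described here, the end-to-end distance of a fragment, is straightforward to compute from coordinates. The sketch below slides a window over a chain and returns the distance between the first and last residue of each fragment. Using one coordinate per residue (for example a Cα position) and the name `end_to_end_distances` are assumptions; the abstract does not say which atoms were used, and the SVR step itself is not shown.

```python
import math


def end_to_end_distances(coords, fragment_length):
    """Distance between the first and last residue of every fragment of a given length.

    `coords` holds one (x, y, z) position per residue (assumed here, e.g. C-alpha).
    """
    last = fragment_length - 1
    return [math.dist(coords[i], coords[i + last])
            for i in range(len(coords) - last)]


# Invented fully extended chain: 10 residues spaced 3.8 angstrom apart along x.
chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]
for length in (5, 9):
    print(length, [round(d, 1) for d in end_to_end_distances(chain, length)])
```

A chain shorter than the fragment length simply yields an empty list, so the 17-residue case needs at least 17 residues of input.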


Citations: 0

Comparative analysis of aerobic and anaerobic prokaryotes to identify correlation between oxygen requirement and gene-gene functional association patterns.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2009-10-01 DOI: 10.1142/9781848165632_0007

Yaming Lin, Hongwei Wu

Activities of prokaryotes are pivotal in shaping the environment, and are also greatly influenced by the environment. With the substantial progress in genome and metagenome sequencing and the about-to-be-standardized ecological context information, environment-centric comparative genomics will complement species-centric comparative genomics, illuminating how environments have shaped and maintained prokaryotic diversity. In this paper we report our preliminary studies on the association analysis of a particular pair of genomic and ecological traits of prokaryotes: gene-gene functional association patterns versus oxygen requirement conditions. We first establish a stochastic model to describe gene arrangements on chromosomes, based on which the functional associations between genes are quantified. The gene-gene functional association measures are validated using biological process ontology and KEGG pathway annotations. Student's t-tests are then performed on the aerobic and anaerobic organisms to identify those gene pairs that exhibit different functional association patterns in the two oxygen requirement conditions. As it is difficult to design and conduct biological experiments to validate genome-environment associations that have resulted from long-term cumulative genome-environment interactions, we finally conduct computational validations to determine whether the oxygen requirement condition of an organism is predictable from gene-gene functional association patterns. The reported study demonstrates the existence and significance of associations between certain gene-gene functional association patterns and the oxygen requirement conditions of prokaryotes, as well as the effectiveness of the adopted methodology for such association analysis.
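The comparison step in this abstract, a Student's t-test between aerobic and anaerobic organisms for one gene pair, can be sketched with the standard library alone. The function name `student_t` and the association scores below are invented for illustration; the sketch returns only the pooled-variance t statistic and degrees of freedom, not a p-value, and does not address correction for testing many gene pairs.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Invented association scores for one gene pair across organisms.
aerobic = [0.80, 0.90, 0.70, 0.85]
anaerobic = [0.30, 0.20, 0.40, 0.25]
t_stat, dof = student_t(aerobic, anaerobic)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```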


Pages: 72-84

Citations: 2

Strategies toward CNS-regeneration using induced pluripotent stem cells.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2009-10-01 DOI: 10.1142/9781848165632_0022

H. Okano

Induced pluripotent stem (iPS) cells are pluripotent stem cells directly reprogrammed from cultured mouse fibroblasts by introducing Oct3/4, Sox2, c-Myc, and Klf4. Cells obtained using this technology, which allows the ethical issues and immunological rejection associated with embryonic stem (ES) cells to be avoided, might be a clinically useful source for cell replacement therapies. Here we demonstrate that murine iPS cells formed neurospheres that produced electrophysiologically functional neurons, astrocytes, and oligodendrocytes. Secondary neurospheres (SNSs) generated from various mouse iPS cells showed neural differentiation capacity and teratoma formation after transplantation into the brain of immunodeficient NOD/SCID mice. We found that the origin (source of somatic cells) of the iPS cells is the crucial determinant of the potential tumorigenicity of iPS-derived neural stem/progenitor cells and that their tumorigenicity results from the persistent presence of undifferentiated cells within the SNSs. Furthermore, transplantation of non-tumorigenic Nanog-iPS-derived SNSs into a mouse spinal cord injury (SCI) model promoted locomotor function recovery. Surprisingly, SNSs derived from c-Myc minus iPS cells generated without drug selection showed robust tumorigenesis, in spite of their potential to contribute to adult chimeric mice without tumor formation.


Pages: 217-20

Citations: 14

Recount: expectation maximization based error correction tool for next generation sequencing data.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2009-10-01

Edward Wijaya, Martin C Frith, Yutaka Suzuki, Paul Horton

Next generation sequencing technologies enable rapid, large-scale production of sequence data sets. Unfortunately, these technologies also have a non-negligible sequencing error rate, which biases their outputs by introducing false reads and reducing the quantity of the real reads. Although methods developed for SAGE data can reduce these false counts to a considerable degree, until now they have not been implemented in a scalable way. Recently, a program named FREC has been developed to address this problem for next generation sequencing data. In this paper, we introduce RECOUNT, our implementation of an Expectation Maximization algorithm for tag count correction, and compare it to FREC. Using both the reference genome and simulated data, we find that RECOUNT performs as well as or better than FREC, while using much less memory (e.g. 5GB vs. 75GB). Furthermore, we report the first analysis of tag count correction with real data in the context of gene expression analysis. Our results show that tag count correction not only increases the number of mappable tags, but can make a real difference in the biological interpretation of next generation sequencing data. RECOUNT is an open-source C++ program available at http://seq.cbrc.jp/recount.
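The idea of Expectation Maximization for tag count correction can be illustrated with a minimal Python sketch. This is not the RECOUNT implementation (which is a C++ program); the function names, the uniform per-base substitution error rate `err`, the equal tag length, and the restriction to neighbours within Hamming distance 1 are all assumptions made for illustration. Each observed tag is treated as a possibly mis-read copy of a true tag, and the expected true counts are re-estimated until they stabilize.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))


def emission(true_tag, seen_tag, err):
    """P(seen_tag | true_tag) under an assumed uniform substitution error rate."""
    d = hamming(true_tag, seen_tag)
    return (err / 3) ** d * (1 - err) ** (len(true_tag) - d)


def em_correct(counts, err=0.01, iters=50, max_dist=1):
    """Return expected true counts per tag (hypothetical helper, not RECOUNT's API)."""
    tags = list(counts)
    total = sum(counts.values())
    # theta[t]: current estimate of the fraction of reads truly originating from t
    theta = {t: counts[t] / total for t in tags}
    # For each observed tag s, the candidate true tags t and P(s | t)
    reach = {
        s: {t: emission(t, s, err) for t in tags if hamming(t, s) <= max_dist}
        for s in tags
    }
    expected = {t: float(counts[t]) for t in tags}
    for _ in range(iters):
        expected = dict.fromkeys(tags, 0.0)
        for s, n in counts.items():
            # E-step: share the n reads observed as s among candidate true tags
            z = sum(theta[t] * p for t, p in reach[s].items())
            for t, p in reach[s].items():
                expected[t] += n * (theta[t] * p / z)
        # M-step: re-estimate the true tag proportions from the expected counts
        theta = {t: expected[t] / total for t in tags}
    return expected


corrected = em_correct({"ACGT": 1000, "ACGA": 2, "TTTT": 50})
```

In this toy input, the two reads of `ACGA` are fewer than the number of erroneous reads that the abundant neighbour `ACGT` would be expected to produce, so their count is largely reassigned to `ACGT`, while the isolated tag `TTTT` keeps its count. The total number of reads is preserved.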

Citations: 0

Predicting protein-protein relationships from literature using latent topics.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2009-10-01

Tatsuya Aso, Koji Eguchi

This paper investigates applying statistical topic models to extract and predict relationships between biological entities, especially protein mentions. A statistical topic model, Latent Dirichlet Allocation (LDA), is promising; however, it has not been investigated for such a task. In this paper, we apply the state-of-the-art Collapsed Variational Bayesian inference and Gibbs sampling inference to estimate the LDA model. We also apply probabilistic Latent Semantic Analysis (pLSA) as a baseline, and compare the methods in terms of log-likelihood, classification accuracy, and retrieval effectiveness. We demonstrate through experiments that the Collapsed Variational LDA gives better results than the others, especially in terms of classification accuracy and retrieval effectiveness in the task of protein-protein relationship prediction.
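Of the inference methods named above, collapsed Gibbs sampling is the simplest to sketch. The following is a minimal illustration only, not the authors' code: the function name, the hyperparameters `alpha` and `beta`, and the encoding of documents as lists of integer word ids (for example, protein mentions mapped to ids) are assumptions. It does not cover the Collapsed Variational Bayesian method that the paper reports as giving the best results.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for LDA; returns (doc-topic, topic-word) count tables."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]                # tokens in doc d assigned to topic k
    n_kw = [[0] * vocab_size for _ in range(n_topics)]   # times word w is assigned to topic k
    n_k = [0] * n_topics                                 # total tokens assigned to topic k
    z = []                                               # topic assignment of every token

    # Random initialization of topic assignments
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment of this token from the counts
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional p(z = j | all other assignments), up to a constant
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw


# Toy corpus: three "documents" over a vocabulary of four word ids
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=4)
```

The returned count tables can be normalized (after adding `alpha` and `beta` respectively) to obtain estimates of the per-document topic distributions and per-topic word distributions.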

Citations: 0