
Computational systems bioinformatics. Computational Systems Bioinformatics Conference: latest articles

CBioC: beyond a prototype for collaborative annotation of molecular interactions from the literature.
C Baral, G Gonzalez, A Gitter, C Teegarden, A Zeigler, G Joshi-Topé

In molecular biology research, a search for information on a particular entity such as a gene or a protein may return thousands of articles, making it impossible for a researcher to read all of them individually, or even just their abstracts. There is therefore a need to curate the literature, extracting nuggets of knowledge such as an interaction between two proteins, and to store them in a database. However, the body of biomedical articles is growing so quickly that manual curation is impossible, while automatic extraction by computer has problems with accuracy. We propose to leverage the advantages of both techniques: binary relationships between biological entities are extracted automatically from the biomedical literature, and a platform allows community collaboration in annotating the extracted relationships. The community of researchers who write and read biomedical texts can thus use the server both to search our database of extracted facts and as an easy-to-use web platform for annotating the facts relevant to them. We presented a preliminary prototype as a proof of concept earlier (1). This paper presents the working implementation, available for download at http://www.cbioc.org as a browser plug-in for both Internet Explorer and Firefox. The current version has been available since June 2006 and has over 160 registered users from around the world. Aside from its use as an annotation tool, data from CBioC have also been used in computational methods with encouraging results.
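The abstract does not describe CBioC's extraction internals. As a rough illustration of what "extracting binary relationships" plus "community annotation" can mean, here is a toy Python sketch. It assumes a hand-picked list of trigger phrases (hypothetical, not CBioC's lexicon) and a simple +1/-1 vote tally for annotations; practical extractors rely on named-entity recognition and parsing rather than regular expressions.

```python
import re
from collections import Counter
from itertools import permutations

# Hypothetical trigger phrases; not CBioC's actual lexicon.
TRIGGERS = ("interacts with", "binds", "phosphorylates", "activates", "inhibits")


def extract_relations(sentence, entities):
    """Return (left, trigger, right) triples where a trigger phrase appears
    between two known entity mentions, in left-to-right order."""
    found = []
    for left, right in permutations(entities, 2):
        for trigger in TRIGGERS:
            pattern = (rf"\b{re.escape(left)}\b.*?\b{re.escape(trigger)}\b"
                       rf".*?\b{re.escape(right)}\b")
            if re.search(pattern, sentence):
                found.append((left, trigger, right))
    return found


def tally_votes(votes):
    """Net community score per extracted fact from (fact, +1 or -1) votes."""
    score = Counter()
    for fact, vote in votes:
        score[fact] += vote
    return score


if __name__ == "__main__":
    facts = extract_relations("MDM2 binds p53 in vitro.", ["p53", "MDM2"])
    print(facts)  # [('MDM2', 'binds', 'p53')]
    print(tally_votes([(facts[0], +1), (facts[0], +1), (facts[0], -1)]))
```

The vote tally stands in for the human side of the hybrid approach: automatically extracted facts are kept or discarded according to community judgments rather than extractor confidence alone.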

Pages 381-384 (2007). DOI: https://doi.org/10.1142/9781860948732_0038
Citations: 15
Algorithm for peptide sequencing by tandem mass spectrometry based on better preprocessing and anti-symmetric computational model.
Kang Ning, Hon Wai Leong

Peptide sequencing by tandem mass spectrometry is a very important, interesting, yet challenging problem in proteomics. It has been investigated extensively in recent years, and peptide sequencing results are becoming increasingly accurate. However, many of these algorithms use computational models based on unverified assumptions. We believe that investigating the validity of these assumptions, and related problems, will lead to improvements in current algorithms. In this paper, we first investigate peptide sequencing without preprocessing the spectrum, and show that introducing spectrum preprocessing makes peptide sequencing faster, easier and more accurate. We then investigate a very important issue in peptide sequencing, the anti-symmetric problem, and show experimentally that models that either simply ignore anti-symmetry or remove all anti-symmetric instances are too simple for the peptide sequencing problem. We propose a new, more realistic model for the anti-symmetric problem, together with a novel algorithm that incorporates both preprocessing and this new model; experiments show that the algorithm performs better on the datasets examined.
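To make the anti-symmetric issue concrete: for a singly charged fragment peak of mass m and a peptide of neutral mass M, the peak may be a b-ion (prefix residue mass m - H+) or a y-ion (prefix residue mass M - m + H+), so every peak yields two candidate vertices in a spectrum graph. This Python sketch, assuming monoisotopic masses and singly charged ions, computes both candidates and checks the strict rule that a path uses at most one interpretation per peak. That strict rule is the simple baseline the abstract argues is too restrictive; the authors' more realistic model is not reproduced here.

```python
PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def candidate_prefix_masses(peak_mass, precursor_mass):
    """Two prefix-residue masses implied by one singly charged peak.

    precursor_mass is the neutral monoisotopic peptide mass M.
    b-ion: peak = prefix + H+                 -> prefix = peak - H+
    y-ion: peak = suffix + H2O + H+, with suffix = (M - H2O) - prefix
                                              -> prefix = M - peak + H+
    """
    as_b = peak_mass - PROTON
    as_y = precursor_mass - peak_mass + PROTON
    return as_b, as_y


def violates_strict_antisymmetry(path):
    """True if a path of (peak_index, 'b' or 'y') vertices uses some peak twice."""
    used = set()
    for peak_index, _ion_type in path:
        if peak_index in used:
            return True
        used.add(peak_index)
    return False


if __name__ == "__main__":
    gly, ala = 57.02146, 71.03711          # residue masses of G and A
    m_peptide = gly + ala + WATER          # neutral mass of the dipeptide "GA"
    b1 = gly + PROTON
    y1 = ala + WATER + PROTON
    print(candidate_prefix_masses(b1, m_peptide))  # b1 read as b, then as y
    print(candidate_prefix_masses(y1, m_peptide))  # y1 read as b, then as y
    print(violates_strict_antisymmetry([(0, "b"), (0, "y")]))  # True
```

The b-reading of b1 and the y-reading of y1 both give the glycine prefix (about 57.021 Da); the other two readings are spurious mirror vertices, which is exactly what an anti-symmetry model has to handle.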

Pages 19-30 (2007).
Citations: 0
Clustering of main orthologs for multiple genomes.
Zheng Fu, Tao Jiang
The identification of orthologous genes shared by multiple genomes is critical for both functional and evolutionary studies in comparative genomics. While in practice it is usually done by sequence similarity search and reconciled tree construction, a new combinatorial approach and a high-throughput system, MSOAR, were recently proposed in (11) for ortholog identification between closely related genomes based on genome rearrangement and gene duplication. MSOAR assumes that orthologous genes correspond to each other in the most parsimonious evolutionary scenario, i.e. the one minimizing the number of genome rearrangement and (post-speciation) gene duplication events. However, the parsimony approach used by MSOAR limits it to pairwise genome comparisons. In this paper, we extend MSOAR to multiple (closely related) genomes and propose an ortholog clustering method, called MultiMSOAR, to infer main orthologs in multiple genomes. As a preliminary experiment, we apply MultiMSOAR to the rat, mouse and human genomes, and validate our results using gene annotations and gene function classifications in public databases. We further compare our results to the ortholog clusters predicted by MultiParanoid, an extension to multiple genomes of the well-known program Inparanoid for pairwise genome comparisons. The comparison reveals that MultiMSOAR gives more detailed and accurate orthology information, since it can effectively distinguish main orthologs from inparalogs.
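MSOAR itself infers orthologs from a parsimony model of genome rearrangements and duplications, which is beyond a short example. For contrast, here is a minimal Python sketch of the sequence-similarity baseline the abstract mentions, in its common reciprocal-best-hit form. This is a swapped-in standard technique, not MSOAR or MultiMSOAR, and the gene names and similarity scores are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its best-scoring subject; ties go to the smaller name."""
    best = {}
    for (query, subject), score in scores.items():
        current = best.get(query)
        if current is None or (-score, subject) < (-current[1], current[0]):
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical similarity scores (e.g. alignment bit scores) in both directions.
    a_vs_b = {("hA1", "mB1"): 90.0, ("hA1", "mB2"): 40.0, ("hA2", "mB2"): 70.0}
    b_vs_a = {("mB1", "hA1"): 88.0, ("mB2", "hA2"): 65.0, ("mB2", "hA1"): 30.0}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('hA1', 'mB1'), ('hA2', 'mB2')]
```

Like MSOAR, this baseline compares only two genomes at a time, which is the limitation a multi-genome clustering method is meant to lift.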
Pages 195-201 (2007). DOI: https://doi.org/10.1142/9781860948732_0022
Citations: 1
IEM: an algorithm for iterative enhancement of motifs using comparative genomics data.
Erliang Zeng, K. Mathee, G. Narasimhan
Understanding gene regulation is a key step in investigating gene functions and their relationships. Many algorithms have been developed to discover transcription factor binding sites (TFBS), which are predominantly located in upstream regions of genes and contribute to transcriptional regulation when bound by a specific transcription factor. However, traditional motif-finding methods have shortcomings, which can be overcome by using comparative genomics data that are now increasingly available. Traditional methods for scoring motifs also have their limitations. In this paper, we propose a new algorithm, called IEM, that refines motifs using comparative genomics data. We show the effectiveness of our techniques on several data sets: two sets of experiments were performed with comparative genomics data on five strains of P. aeruginosa, and one set with similar data on four species of yeast. The weighted conservation score proposed in this paper is an improvement over existing motif scores.
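The abstract does not define IEM's weighted conservation score. As background, here is a minimal Python sketch of the conventional pieces such scores build on: a log-odds position weight matrix (PWM) estimated from aligned sites with pseudocounts, a best-window scan, and an unweighted conservation measure (the fraction of orthologous upstream regions carrying a hit). The pseudocount, the uniform background, and the threshold are illustrative assumptions, not IEM's parameters.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Log2-odds PWM from equal-length aligned binding sites."""
    background = background or {base: 0.25 for base in ALPHABET}
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background[base])
            for base in ALPHABET
        })
    return pwm


def best_site(pwm, seq):
    """(score, offset) of the highest-scoring window of seq."""
    width = len(pwm)
    return max(
        (sum(col[seq[j + k]] for k, col in enumerate(pwm)), j)
        for j in range(len(seq) - width + 1)
    )


def conservation(pwm, orthologous_upstreams, threshold):
    """Unweighted fraction of orthologous upstream regions with a hit >= threshold."""
    hits = sum(1 for seq in orthologous_upstreams if best_site(pwm, seq)[0] >= threshold)
    return hits / len(orthologous_upstreams)


if __name__ == "__main__":
    pwm = build_pwm(["TTGACA", "TTGACA", "TTGACT"])
    print(best_site(pwm, "GGGTTGACAGGG"))  # best window starts at offset 3
    print(conservation(pwm, ["GGGTTGACAGGG", "AAAAAAAAAAAA"], threshold=0.0))  # 0.5
```

A weighted variant would, for example, scale each ortholog's contribution by its evolutionary distance; how IEM actually weights is specified in the paper, not here.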
Pages 227-235 (2007). DOI: https://doi.org/10.1142/9781860948732_0025
Citations: 8
A Markov model based analysis of stochastic biochemical systems.
P. Ghosh, Samik Ghosh, K. Basu, Sajial K Das
The molecular networks regulating basic physiological processes in a cell are generally converted into rate equations that treat the numbers of biochemical molecules as deterministic variables. At steady state these rate equations give a set of differential equations that are solved using numerical methods. However, the stochastic cellular environment motivates us to propose a mathematical framework for analyzing such biochemical molecular networks. Stochastic simulators that solve a system of differential equations include this stochasticity in the model, but suffer from simulation stiffness and require huge computational overheads. This paper describes a new Markov chain based model for simulating such complex biological systems with reduced computation and memory overheads. The central idea is to transform the continuous-domain method based on the chemical master equation (CME) into a discrete domain of molecular states with corresponding state transition probabilities and times. Our methodology accommodates the basic optimization schemes devised for the CME, and can also be extended to reduce the computational and memory overheads appreciably at some cost in accuracy. Simulation results for the standard Enzyme-Kinetics and Transcriptional Regulatory systems show promising correspondence with CME-based methods and point to the efficacy of our scheme.
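For the simplest case, a birth-death process (production 0 -> X at rate k, degradation X -> 0 at rate gamma*n), the discrete-state view in the abstract amounts to the following: from copy number n the total propensity is a0(n) = k + gamma*n, the next jump goes up with probability k/a0(n) and down with probability gamma*n/a0(n), and the expected holding time is 1/a0(n). This Python sketch tabulates those quantities on a state space truncated at a hypothetical n_max, and computes the stationary distribution by detailed balance (Poisson with mean k/gamma when truncation is negligible). It is a textbook illustration under these assumptions, not the authors' scheme or their optimizations.

```python
def birth_death_chain(k, gamma, n_max):
    """Embedded jump chain and mean holding times for 0 -> X (rate k), X -> 0 (rate gamma*n).

    Production is switched off at n_max to truncate the state space; k and gamma must be > 0.
    """
    chain = {}
    for n in range(n_max + 1):
        birth = k if n < n_max else 0.0
        death = gamma * n
        a0 = birth + death                  # total propensity in state n
        chain[n] = {
            "p_up": birth / a0,             # probability the next reaction is production
            "p_down": death / a0,           # probability the next reaction is degradation
            "mean_holding_time": 1.0 / a0,  # expected time spent in state n
        }
    return chain


def stationary_distribution(k, gamma, n_max):
    """Stationary CME solution on 0..n_max via detailed balance: pi(n+1)/pi(n) = k / (gamma*(n+1))."""
    weights = [1.0]
    for n in range(n_max):
        weights.append(weights[-1] * k / (gamma * (n + 1)))
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    chain = birth_death_chain(k=10.0, gamma=1.0, n_max=60)
    print(chain[5])   # p_up = 10/15, p_down = 5/15, holding time 1/15
    pi = stationary_distribution(10.0, 1.0, 60)
    print(sum(n * p for n, p in enumerate(pi)))  # close to k/gamma = 10
```

Working with jump probabilities and holding times per discrete state, rather than integrating continuous equations, is what lets such a model trade state-space truncation (here, n_max) for computation and memory.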
Pages 121-132 (2007). DOI: https://doi.org/10.1142/9781860948732_0016
Citations: 6
Consensus contact prediction by linear programming.
Xin Gao, D. Bu, S. Li, Ming Li, Jinbo Xu
Protein inter-residue contacts are of great use for protein structure determination and prediction. Recent CASP events have shown that a few accurately predicted contacts can help improve both the computational efficiency and the prediction accuracy of ab initio folding methods. This paper develops an integer linear programming (ILP) method for consensus-based contact prediction. In contrast to the simple "majority voting" method, which assumes that all individual servers are equal and independent, our method evaluates their correlations using the maximum likelihood method and constructs latent independent servers using principal component analysis. We then use an integer linear programming model to assign weights to these latent servers so as to maximize the deviation between correct and incorrect contacts; our consensus prediction server is the weighted combination of these latent servers. In addition to the consensus information, our method also uses server-independent correlated mutation (CM) as one of the prediction features. Experimental results demonstrate that our contact prediction server performs better than the "majority voting" method. The accuracy of our method for the top L/5 contacts on CASP7 targets is 73.41%, much higher than previously reported results. On the 16 free modeling (FM) targets, our method achieves an accuracy of 37.21%.
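The two consensus ideas in the abstract, equal-weight "majority voting" versus a weighted combination of servers, and the top-L/5 accuracy used to report results, can be sketched in a few lines of Python. Here the server weights are fixed by hand for illustration; in the paper they come from an ILP over PCA-derived latent servers, which is not reproduced. Residue pairs are assumed to be (i, j) tuples with i < j, and the sequence-separation filters usually applied in contact evaluation are ignored.

```python
def consensus_scores(server_predictions, weights):
    """Weighted vote per residue pair; equal weights reproduce majority voting."""
    scores = {}
    for contacts, weight in zip(server_predictions, weights):
        for pair in contacts:
            scores[pair] = scores.get(pair, 0.0) + weight
    return scores


def top_l5_accuracy(scores, true_contacts, length):
    """Fraction of the top max(1, L // 5) highest-scoring pairs that are true contacts."""
    k = max(1, length // 5)
    ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))[:k]
    return sum(1 for pair in ranked if pair in true_contacts) / k


if __name__ == "__main__":
    servers = [{(1, 8), (2, 9)}, {(1, 8), (3, 7)}, {(3, 7), (4, 9)}]
    truth = {(1, 8), (2, 9)}
    majority = consensus_scores(servers, [1.0, 1.0, 1.0])
    weighted = consensus_scores(servers, [1.0, 0.2, 0.2])  # hand-picked weights
    print(top_l5_accuracy(majority, truth, length=10))  # 0.5
    print(top_l5_accuracy(weighted, truth, length=10))  # 1.0
```

In this toy case the second and third servers agree on a false contact, so majority voting promotes it; down-weighting correlated servers is the intuition behind replacing equal votes with learned weights.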
Pages 323-334 (2007). DOI: https://doi.org/10.1142/9781860948732_0033
Citations: 0
fRMSDPred: predicting local RMSD between structural fragments using sequence information.
H. Rangwala, G. Karypis
The effectiveness of comparative modeling approaches for protein structure prediction can be substantially improved by incorporating predicted structural information in the initial sequence-structure alignment. Motivated by the approaches used to align protein structures, this paper focuses on developing machine learning approaches for estimating the RMSD value of a pair of protein fragments. These estimated fragment-level RMSD values can be used to construct the alignment, assess the quality of an alignment, and identify high-quality alignment segments. We present algorithms to solve this fragment-level RMSD prediction problem using a supervised learning framework based on support vector regression and classification that incorporates protein profiles, predicted secondary structure, effective information encoding schemes, and novel second-order pairwise exponential kernel functions. Our comprehensive empirical study shows superior results compared to the profile-to-profile scoring schemes.
Pages 311-322 (2007). DOI: https://doi.org/10.1142/9781860948732_0032
Citations: 16
Effective labeling of molecular surface points for cavity detection and location of putative binding sites.
Mary Ellen Bock, Claudio Garutti, Conettina Guerra

We present a method for detecting and comparing cavities on protein surfaces that is useful for protein binding site recognition. The method is based on a representation of the protein structures by a collection of spin-images and their associated spin-image profiles. Results of the cavity detection procedure are presented for a large set of non-redundant proteins and compared with SURFNET-ConSurf. Our comparison method is used to find a surface region in one cavity of a protein that is geometrically similar to a surface region in the cavity of another protein. Such a finding would be an indication that the two regions likely bind to the same ligand. Our overall approach for cavity detection and comparison is benchmarked on several pairs of known complexes, obtaining a good coverage of the atoms of the binding sites.
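The abstract does not spell out how a spin image is computed. In the standard construction, each neighbouring point x of an oriented surface point (p, n) is mapped to two coordinates: beta, its signed distance along the unit normal, and alpha, its distance from the line through p along n; the spin image is a 2D histogram of these (alpha, beta) pairs. A minimal pure-Python sketch follows. The bin size, image dimensions, and centring of the beta bins on zero are illustrative choices, and the authors' spin-image profiles and surface-point labeling are not reproduced.

```python
import math


def spin_coordinates(p, n, x):
    """(alpha, beta) of point x relative to the oriented surface point (p, n)."""
    norm = math.sqrt(sum(c * c for c in n))
    u = [c / norm for c in n]                      # unit normal
    d = [xc - pc for xc, pc in zip(x, p)]          # vector from p to x
    beta = sum(dc * uc for dc, uc in zip(d, u))    # signed height along the normal
    alpha = math.sqrt(max(0.0, sum(dc * dc for dc in d) - beta * beta))  # radial distance
    return alpha, beta


def spin_image(p, n, points, bin_size, n_alpha, n_beta):
    """2D histogram (rows = beta bins, columns = alpha bins); beta bins are centred on 0."""
    image = [[0] * n_alpha for _ in range(n_beta)]
    for x in points:
        alpha, beta = spin_coordinates(p, n, x)
        i = int(alpha // bin_size)
        j = int((beta + n_beta * bin_size / 2) // bin_size)
        if 0 <= i < n_alpha and 0 <= j < n_beta:   # drop points outside the image support
            image[j][i] += 1
    return image


if __name__ == "__main__":
    print(spin_coordinates((0, 0, 0), (0, 0, 2), (3, 4, 5)))  # (5.0, 5.0)
    print(spin_image((0, 0, 0), (0, 0, 1), [(1, 0, 0.5), (0, 1, -0.5)], 1.0, 2, 2))  # [[0, 1], [0, 1]]
```

Because alpha and beta depend only on distances relative to (p, n), the histogram is invariant to rotations about the normal, which is what makes spin images usable for comparing surface regions across different proteins.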

Pages 263-274 (2007).
Citations: 0
An information theoretic method for reconstructing local regulatory network modules from polymorphic samples.
Manjunatha Jagalur, David Kulp

Statistical relations between genome-wide mRNA transcript levels have been successfully used to infer regulatory relations among genes; however, the most successful methods have relied on additional data and focused on small sub-networks of genes. Along these lines, we recently demonstrated a model for simultaneously incorporating micro-array expression data with whole-genome genotype marker data to identify causal pairwise relationships among genes. In this paper we extend this methodology to the principled construction of networks describing local regulatory modules. Our method is a two-step process: starting with a seed gene of interest, a Markov blanket over genotype and gene expression observations is inferred according to differential entropy estimation; a Bayes net is then constructed from the resulting variables, with important biological constraints yielding causally correct relationships. We tested our method by simulating a regulatory network within the background of a real data set. We found that 45% of the genes in a regulatory module can be identified, and that the relations among the genes can be recovered with moderately high accuracy (> 70%). Since sample size is a practical and economic limitation, we considered the impact of increasing the number of samples and found that recovery of true gene-gene relationships only doubled with ten times the number of samples. This suggests that useful networks can be achieved with current experimental designs, but that significant improvements are not expected without major increases in the number of samples. When we applied this method to an actual data set of 111 back-crossed mice, we were able to recover local gene regulatory networks supported by the biological literature.
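The abstract does not say which differential entropy estimator is used. Under the simplifying assumption (ours) that expression values are Gaussian, differential entropy is h = 0.5 * ln(2*pi*e*variance), and the mutual information between two variables reduces to I = -0.5 * ln(1 - rho^2), where rho is their correlation. This Python sketch uses that formula to rank candidate variables around a seed gene. It is only a first screening step: a real Markov blanket search also tests conditional independence, and the gene names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def gaussian_entropy(xs):
    """Differential entropy in nats, assuming xs is Gaussian: 0.5*ln(2*pi*e*var)."""
    return 0.5 * math.log(2 * math.pi * math.e * pvariance(xs))


def gaussian_mutual_information(xs, ys):
    """I(X;Y) = h(X) + h(Y) - h(X,Y) = -0.5*ln(1 - rho^2) for jointly Gaussian X, Y."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    rho = cov / math.sqrt(pvariance(xs, mx) * pvariance(ys, my))
    return -0.5 * math.log(1 - rho * rho)   # undefined when |rho| == 1


def rank_by_mutual_information(seed, candidates):
    """Candidate names sorted by estimated mutual information with the seed profile."""
    return sorted(candidates, key=lambda name: -gaussian_mutual_information(seed, candidates[name]))


if __name__ == "__main__":
    seed = [1.0, 2.0, 3.0, 4.0]
    candidates = {"geneA": [1.0, 2.0, 3.0, 5.0], "geneB": [1.0, -1.0, -1.0, 1.0]}
    print(rank_by_mutual_information(seed, candidates))  # ['geneA', 'geneB']
```

With only a handful of samples such estimates are noisy, which is consistent with the abstract's observation that recovery improves only slowly as the number of samples grows.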

Computational systems bioinformatics. Computational Systems Bioinformatics Conference, 2007, pp. 133-43.
Citations: 0
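The Jagalur and Kulp abstract infers a Markov blanket around a seed gene "according to differential entropy estimation." The sketch below illustrates that idea under loudly stated assumptions. It uses a multivariate Gaussian plug-in estimate of differential entropy, which may differ from the authors' estimator, to compute conditional mutual information I(X; Y | Z) = h(X,Z) + h(Y,Z) - h(Z) - h(X,Y,Z). A hypothetical greedy forward selection then grows the blanket. The function names, the 0.05-nat `threshold` and `max_size` are illustrative assumptions. The second step (a Bayes net with biological constraints) is not shown.

```python
import numpy as np


def gaussian_entropy(data, cols):
    """Differential entropy (nats) of columns `cols`, assuming a multivariate Gaussian."""
    if len(cols) == 0:
        return 0.0
    cov = np.atleast_2d(np.cov(data[:, cols], rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (len(cols) * np.log(2 * np.pi * np.e) + logdet)


def conditional_mi(data, x, y, z):
    """Gaussian plug-in estimate of I(X; Y | Z) from column indices x, y and list z."""
    z = list(z)
    return (gaussian_entropy(data, [x] + z) + gaussian_entropy(data, [y] + z)
            - gaussian_entropy(data, z) - gaussian_entropy(data, [x, y] + z))


def greedy_blanket(data, seed, threshold=0.05, max_size=5):
    """Hypothetical forward selection of a Markov blanket for column `seed`.

    At each step, add the candidate with the largest conditional MI with the
    seed given the current blanket; stop when the best score is below
    `threshold` or the blanket reaches `max_size`.
    """
    blanket = []
    candidates = [j for j in range(data.shape[1]) if j != seed]
    while candidates and len(blanket) < max_size:
        scores = {j: conditional_mi(data, seed, j, blanket) for j in candidates}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break
        blanket.append(best)
        candidates.remove(best)
    return blanket
```

Conditioning on the current blanket is what distinguishes this from ranking genes by marginal correlation. In a chain x0 -> x1 -> x3, x3 is strongly correlated with x0, but it carries no extra information once x1 is in the blanket, so it is not selected.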
Learning position weight matrices from sequence and expression data.
Xin Chen, Lingqiong Guo, Zhaocheng Fan, Tao Jiang
Position weight matrices (PWMs) are widely used to depict the DNA binding preferences of transcription factors (TFs) in computational molecular biology and regulatory genomics. Thus, learning an accurate PWM to characterize the binding sites of a specific TF is a fundamental problem that plays an important role in modeling regulatory motifs and discovering the binding targets of TFs. Given a set of binding sites bound by a TF, the learning problem can be formulated as a straightforward maximum likelihood problem, namely, finding a PWM such that the likelihood of the observed binding sites is maximized, and is usually solved by counting the base frequencies at each position of the aligned binding sequences. In this paper, we study the question of accurately learning a PWM from both binding site sequences and gene expression (or ChIP-chip) data. We revise the above maximum likelihood framework by taking into account the given gene expression or ChIP-chip data. More specifically, we attempt to find a PWM such that the likelihood of simultaneously observing both the binding sequences and the associated gene expression (or ChIP-chip) values is maximized, by using the sequence weighting scheme introduced in our recent work. We have incorporated this new approach for estimating PWMs into the popular motif finding program AlignACE. The modified program, called W-AlignACE, is compared with three other programs (AlignACE, MDscan, and MotifRegressor) on a variety of datasets, including simulated data, publicly available mRNA expression data, and ChIP-chip data. These large-scale tests demonstrate that W-AlignACE is an effective tool for discovering TF binding sites from gene expression or ChIP-chip data and, in particular, has the ability to find very weak motifs.
Computational systems bioinformatics. Computational Systems Bioinformatics Conference, 2007, vol. 6(1), pp. 249-60. DOI: 10.1142/9781860948732_0027
Citations: 13
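The Chen, Guo, Fan and Jiang abstract notes that the maximum-likelihood PWM for a set of aligned binding sites is obtained by counting base frequencies at each position. The sketch below shows that counting estimator. It also accepts optional per-site weights, which gives the weighted maximum-likelihood estimate (the PWM maximizing the sum over sites of w_i log P(site_i)). The abstract does not specify the paper's sequence weighting scheme, so the weights here are simply supplied by the caller. Deriving them from expression or ChIP-chip values is a hypothetical assumption. The name `learn_pwm` is also illustrative.

```python
import numpy as np

ALPHABET = "ACGT"  # row order of the returned matrix


def learn_pwm(sites, weights=None):
    """(Weighted) maximum-likelihood PWM from equal-length aligned sites.

    Returns a (4, L) array; column j holds the frequencies of A, C, G, T at
    position j. weights=None counts every site once (plain counting).
    Otherwise, weights are hypothetical non-negative per-site weights whose
    total must be positive.
    """
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    if weights is None:
        weights = [1.0] * len(sites)
    if len(weights) != len(sites):
        raise ValueError("one weight per site is required")
    counts = np.zeros((4, length))
    for site, w in zip(sites, weights):
        for j, base in enumerate(site.upper()):
            # ALPHABET.index raises ValueError for non-ACGT symbols such as N.
            counts[ALPHABET.index(base), j] += w
    # Normalizing each column turns (weighted) counts into ML frequencies.
    return counts / counts.sum(axis=0, keepdims=True)
```

Up-weighting a site shifts each column toward that site's bases. This is how side information such as expression values could, in principle, sharpen a weak motif that unweighted counting would dilute.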