Proceedings. IEEE Computational Systems Bioinformatics Conference: Latest Articles

Inverse protein folding in 2D HP mode (extended abstract).
Arvind Gupta, Ján Manuch, Ladislav Stacho

The inverse protein folding problem is that of designing an amino acid sequence that has a particular native protein fold. This problem arises in drug design, where a particular structure is necessary to ensure proper protein-protein interactions. In this paper we show that in Dill's 2D HP model it is possible to solve this problem for a broad class of structures. These structures can be used to closely approximate any given structure. One of the most important properties of a good protein is its stability, that is, its tendency not to fold simultaneously into other structures. We show that for a number of basic structures, our sequences have a unique fold.
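
The abstract does not spell out how a fold is scored, so the following is only a minimal sketch of the standard 2D HP model, not the authors' construction. It assumes a sequence over the letters H (hydrophobic) and P (polar), a fold given as a string of lattice moves (the U/D/L/R encoding and the name hp_energy are illustrative choices), and the usual energy of minus one per pair of H residues that are lattice neighbours but not consecutive in the chain.

```python
def hp_energy(sequence, moves):
    """Return the HP-model energy of a fold: minus the number of H-H contacts.

    sequence: string over {"H", "P"}.
    moves: string over {"U", "D", "L", "R"}, one move per bond (assumed encoding).
    """
    steps = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")

    # Walk the chain on the square lattice, starting at the origin.
    coords = [(0, 0)]
    for move in moves:
        dx, dy = steps[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        raise ValueError("fold is not self-avoiding")

    index_at = {point: i for i, point in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts
```

Under these assumptions, a sequence has a unique fold when exactly one self-avoiding walk attains the minimum of hp_energy; enumerating all walks to check this is only feasible for very short chains.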

Pub Date : 2004-01-01 Pages: 311-8
Citations: 0
Multiple RNA structure alignment.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332438
Zhuozhi Wang, Kaizhong Zhang

RNA structures can be viewed as a special kind of string in which some characters are bonded to each other. The problem of aligning two RNA structures has been studied for some time, and there are several successful algorithms based on different models. In this paper, adopting the model introduced in [18], we propose two algorithms for aligning multiple RNA structures. We reduce the multiple RNA structure alignment problem to the problem of aligning two RNA structure alignments.
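
To make the "string with bonded characters" view concrete, here is a small sketch that recovers the bonded positions from a structure annotation. It assumes the common dot-bracket notation, which the abstract does not mention and which may differ from the model of [18]; the function name parse_bonds is hypothetical.

```python
def parse_bonds(structure):
    """Return the sorted list of bonded index pairs in a dot-bracket string.

    "(" and ")" mark the two ends of a bond, "." marks an unbonded character.
    """
    stack = []
    pairs = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)
```

An RNA structure is then the pair (sequence, bonds), for example "GGAACC" together with parse_bonds("((..))"), and an alignment has to respect both the characters and the bonds.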

Pages: 246-54
Citations: 0
Gridding and compression of microarray images.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332424
Stefano Lonardi, Yu Luo

With the recent explosion of interest in microarray technology, massive amounts of microarray images are currently being produced. The storage and transmission of this type of data are becoming increasingly challenging. Here we propose lossless and lossy compression algorithms for microarray images originally digitized at 16 bpp (bits per pixel) that achieve an average of 9.5-11.5 bpp (lossless) and 4.6-6.7 bpp (lossy, with a PSNR of 63 dB). The lossy compression is applied only to the background of the image, thereby preserving the regions of interest. The methods are based on a fully automatic gridding procedure for the image.
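
The two figures of merit quoted above can be computed as in the sketch below. It is not the authors' code: it assumes the usual PSNR definition, 10 log10(peak^2 / MSE), with peak = 2^16 - 1 for 16 bpp images, and treats an image as a flat list of pixel values. The function names are illustrative.

```python
import math


def psnr(original, reconstructed, bits_per_pixel=16):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("images must be non-empty and of equal size")
    peak = 2 ** bits_per_pixel - 1
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images, e.g. after lossless compression
    return 10 * math.log10(peak ** 2 / mse)


def compression_ratio(compressed_bpp, original_bpp=16):
    """Size of the original relative to the compressed image."""
    return original_bpp / compressed_bpp
```

With these definitions, the reported 9.5-11.5 bpp corresponds to a lossless ratio of roughly 1.4 to 1.7, and 4.6-6.7 bpp to a lossy ratio of roughly 2.4 to 3.5.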

Pages: 122-30
Citations: 0
Compressed pattern matching in DNA sequences.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332418
Lei Chen, Shiyong Lu, Jeffrey Ram

We propose derivative Boyer-Moore (d-BM), a new compressed pattern matching algorithm for DNA sequences. This algorithm is based on the Boyer-Moore method, which is one of the most popular string matching algorithms. In this approach, we compress both DNA sequences and patterns by using two bits to represent each A, T, C, G character. Experiments indicate that this compressed pattern matching algorithm searches long DNA patterns (length > 50) more than 10 times faster than the exact match routine of the software package Agrep, which is known as the fastest pattern matching tool. Moreover, compression of DNA sequences by this method gives a guaranteed space saving of 75%. The enhanced speed of the algorithm is due in part to the increased efficiency of the Boyer-Moore method resulting from an increase in alphabet size from 4 to 256.
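
The 75% figure follows directly from the encoding: four 2-bit bases fit in one byte instead of four. The sketch below shows only this packing step, not the d-BM search itself. The bit assignment (A=00, C=01, G=10, T=11), the zero padding of a short final byte, and the function names are assumptions, since the abstract does not specify them.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit assignment
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for start in range(0, len(seq), 4):
        chunk = seq[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])
```

Each packed byte can take any of 256 values, which is the larger alphabet the abstract refers to: a Boyer-Moore shift table indexed by bytes rather than by single bases can skip further on a mismatch.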

Pages: 62-8
Citations: 0
Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures.
Hiroshi Matsui, Kengo Sato, Yasubumi Sakakibara

Motivation: Since the whole genome sequences for many species are currently available, computational prediction of RNA secondary structures and computational identification of non-coding RNA regions by comparative genomics have become important and require more advanced alignment methods. Recently, structural alignment of RNA sequences has been introduced to solve these problems. By structural alignment, we mean a pairwise alignment that aligns an unfolded RNA sequence to a folded RNA sequence of known secondary structure. Pair HMMs on tree structures (PHMMTSs), proposed by Sakakibara, are efficient automata-theoretic models for structural alignment of RNA secondary structures, but are incapable of handling pseudoknots. On the other hand, tree adjoining grammars (TAGs) are a subclass of context-sensitive grammars that is suitable for modeling pseudoknots. Our goal is to extend PHMMTSs by incorporating TAGs so that they can handle pseudoknots.

Results: We propose pair stochastic tree adjoining grammars (PSTAGs) for modeling RNA secondary structures including pseudoknots, and show strong experimental evidence that modeling pseudoknot structures significantly improves the prediction accuracy of RNA secondary structures. First, we extend the notion of PHMMTSs, defined on alignments of "trees", to PSTAGs, defined on alignments of "TAG (derivation) trees", which represent a top-down parsing process of TAGs and are functionally equivalent to derived trees of TAGs. Second, we modify PSTAGs so that they take as input a pair consisting of a linear sequence and a TAG tree representing a pseudoknot structure of RNA, and produce a structural alignment. Then, we develop a polynomial-time algorithm for obtaining an optimal structural alignment by PSTAGs, based on a dynamic programming parser. We have performed several computational experiments on predicting pseudoknots by PSTAGs, and these experiments suggest that prediction of RNA pseudoknot structures by our method is more efficient and biologically plausible than by other conventional methods. The binary code for the PSTAG method is freely available from our website at http://www.dna.bio.keio.ac.jp/pstag/.
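
Why tree-based models cannot represent pseudoknots can be seen from the base pairs alone: a nested structure maps to a tree, while a pseudoknot contains two pairs that cross. The following sketch, which is not part of PSTAG, tests a set of base pairs for such a crossing. The representation of a structure as a list of index pairs and the name has_pseudoknot are illustrative assumptions.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(ordered):
        # Later pairs start at or after i, so only the crossing condition remains.
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return True
    return False
```

For example, the pairs (0, 5) and (1, 4) are nested, whereas (0, 4) and (2, 6) cross and therefore need a formalism beyond plain trees, such as the TAGs used here.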

Pub Date : 2004-01-01 Pages: 290-9
Citations: 0
Rec-I-DCM3: a fast algorithmic technique for reconstructing large phylogenetic trees.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332422
Usman W Roshan, Bernard M Moret, Tandy Warnow, Tiffani L Williams

Phylogenetic trees are commonly reconstructed based on hard optimization problems such as maximum parsimony (MP) and maximum likelihood (ML). Conventional MP heuristics for phylogenetic tree reconstruction produce good solutions within a reasonable time on small datasets (up to a few thousand sequences), while ML heuristics are limited to smaller datasets (up to a few hundred sequences). However, since MP (and presumably ML) is NP-hard, such approaches do not scale when applied to large datasets. In this paper, we present a new technique called Recursive-Iterative-DCM3 (Rec-I-DCM3), which belongs to our family of Disk-Covering Methods (DCMs). We tested this new technique on ten large biological datasets ranging from 1,322 to 13,921 sequences and obtained dramatic speedups as well as significant improvements in accuracy (better than 99.99%) in comparison to existing approaches. Thus, high-quality reconstructions can be obtained for datasets at least ten times larger than was previously possible.

Pages: 98-109
Citations: 87
A theoretical analysis of gene selection.
Sach Mukherjee, Stephen J Roberts

A great deal of recent research has focused on the challenging task of selecting differentially expressed genes from microarray data ('gene selection'). Numerous gene selection algorithms have been proposed in the literature, but it is often unclear exactly how these algorithms respond to conditions like small sample sizes or differing variances. Choosing an appropriate algorithm can therefore be difficult in many cases. In this paper we propose a theoretical analysis of gene selection, in which the probability of successfully selecting relevant genes, using a given gene ranking function, is explicitly calculated in terms of population parameters. The theory developed is applicable to any ranking function which has a known sampling distribution, or one which can be approximated analytically. In contrast to empirical methods, the analysis can easily be used to examine the behaviour of gene selection algorithms under a wide variety of conditions, even when the number of genes involved runs into the tens of thousands. The utility of our approach is illustrated by comparing three well-known gene ranking functions.
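
The abstract does not name the three ranking functions it compares. Purely as an illustration of what a gene ranking function is, the sketch below ranks genes by the absolute value of Welch's t-statistic between two groups of samples; the choice of statistic, the dictionary-based data layout and the function names are all assumptions.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two samples (each needs at least two values)."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


def rank_genes(expr_a, expr_b):
    """Order gene names from most to least differentially expressed.

    expr_a, expr_b: dicts mapping gene name -> list of expression values
    measured under condition A and condition B respectively.
    """
    scores = {gene: abs(welch_t(expr_a[gene], expr_b[gene])) for gene in expr_a}
    return sorted(scores, key=scores.get, reverse=True)
```

Gene selection then amounts to keeping the top few names from rank_genes. The sketch fails with a division by zero if a gene has zero variance in both groups, which a real pipeline would need to handle.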

Pub Date : 2004-01-01 Pages: 131-41
Citations: 0
FastR: fast database search tool for non-coding RNA.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332417
Vineet Bafna, Shaojie Zhang

The discovery of novel non-coding RNAs has been among the most exciting recent developments in biology. Yet, many more remain undiscovered. It has been hypothesized that there is in fact an abundance of functional non-coding RNA (ncRNA) with various catalytic and regulatory functions. Computational methods tailored specifically for ncRNA are being actively developed. As the inherent signal for ncRNA is weaker than that for protein-coding genes, comparative methods offer the most promising approach, and are the subject of our research. We consider the following problem: given an RNA sequence with a known secondary structure, efficiently compute all structural homologs (computed as a function of sequence and structural similarity) in a genomic database. Our approach, based on structural filters that eliminate a large portion of the database while retaining the true homologs, allows us to search a typical bacterial database in minutes on a standard PC, with high sensitivity and specificity. This is two orders of magnitude better than currently available software for the problem.

Pages: 52-61
Citations: 0
Gene Ontology friendly biclustering of expression profiles.
Jinze Liu, Wei Wang, Jiong Yang

The soundness of clustering in the analysis of gene expression profiles and gene function prediction rests on the hypothesis that similar expression profiles imply strongly correlated functions in biological activities. Gene Ontology (GO) has become a well-accepted standard for organizing gene function categories. Different gene function categories in GO can have very sophisticated relationships, such as 'part of' and 'overlapping'. Until now, no clustering algorithm has been able to generate gene clusters whose relationships naturally reflect those of the gene function categories in the GO hierarchy. Failure to reproduce these relationships may reduce confidence in clustering for gene function prediction. In this paper, we present a new clustering technique, Smart Hierarchical Tendency Preserving clustering (SHTP-clustering), based on a bicluster model, Tendency Preserving cluster (TP-Cluster). By directly incorporating Gene Ontology information into the clustering process, the SHTP-clustering algorithm yields a TP-cluster tree within which any subtree can be well mapped to a part of the GO hierarchy. Our experiments on yeast cell cycle data demonstrate that this method is efficient and effective in generating biologically relevant TP-Clusters.

Pub Date : 2004-01-01 Pages: 436-47
Citations: 0
AZuRE, a scalable system for automated term disambiguation of gene and protein names.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332454
Raf M Podowski, John G Cleary, Nicholas T Goncharoff, Gregory Amoutzias, William S Hayes

Researchers, hindered by a lack of standard gene and protein-naming conventions, endure long, sometimes fruitless, literature searches. A system is described which is able to automatically assign gene names to their LocusLink ID (LLID) in previously unseen MEDLINE abstracts. The system is based on supervised learning and builds a model for each LLID. The training sets for all LLIDs are extracted automatically from MEDLINE references in the LocusLink and SwissProt databases. Performance was validated for all 20,546 human genes with LLIDs. Of these, 7,344 produced good quality models (F-measure > 0.7, nearly 60% of which were > 0.9) and 13,202 did not, mainly due to insufficient numbers of known document references. A hand validation of MEDLINE documents for a set of 66 genes agreed well with the system's internal accuracy assessment. It is concluded that it is possible to achieve high quality gene disambiguation using scalable automated techniques.
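
The quality threshold above is stated in terms of the F-measure. As a reminder of how that number is obtained, here is a small sketch assuming the balanced form (F1, the harmonic mean of precision and recall); the abstract does not say which weighting the authors used, and the function name is illustrative.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Balanced F-measure (F1) from raw classification counts."""
    if true_pos == 0:
        return 0.0  # no correct assignments, so precision and recall are both zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a per-LLID model that makes 8 correct assignments, 2 wrong ones and misses 2 documents scores 0.8 and would count among the good quality models. The two groups reported above, 7,344 and 13,202, add up to the 20,546 genes evaluated.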

Proceedings. IEEE Computational Systems Bioinformatics Conference, pages 415-24. Publication type: Journal Article.
Citations: 0