
Proceedings 2nd Annual IEEE International Symposium on Bioinformatics and Bioengineering (BIBE 2001): latest papers

Mining genome variation to associate disease with transcription factor binding site alteration
J. Ponomarenko, T. Merkulova, G. Orlova, E. Gorshkova, Oleg N. Fokin, M. Ponomarenko
In the post-genome era, single nucleotide polymorphism (SNP) analysis has become a crossroads of bioinformatics, bioengineering and human health care. We have developed a data mining system, rSNP-Guide (http://wwmgs.bionet.nsc.ru/mgs/systems/rsnp/), devoted to predicting transcription factor (TF) binding sites on DNA whose alterations are associated with disease. rSNP-Guide formalizes disease-related experimental data on alterations in DNA binding to an unknown TF and estimates the ability of DNA carrying disease-associated mutations to bind each known TF examined. It then singles out the known TF whose site is altered by the mutations in the way most consistent with the alteration observed experimentally for the unknown TF. rSNP-Guide has been control-tested on SNPs with known site-disease relationships. Two TF sites associated with disease were predicted and confirmed experimentally: a GATA site in the K-ras gene (lung tumor) and a YY1 site in the TDO2 gene (mental disorders).
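The idea of estimating how a mutation changes a sequence's ability to bind a known TF can be illustrated with a generic position weight matrix score. This is only a sketch under stated assumptions: the count matrix below is a made-up, GATA-like toy, and the log-odds scoring is a standard textbook technique, not the scoring that rSNP-Guide itself uses.

```python
import math

# Hypothetical toy count matrix for a 4-bp GATA-like motif (illustrative values only).
COUNTS = {
    "A": [1, 17, 1, 17],
    "C": [1, 1, 1, 1],
    "G": [17, 1, 1, 1],
    "T": [1, 1, 17, 1],
}
WIDTH = len(COUNTS["A"])


def log_odds_score(site, background=0.25):
    """Sum of per-position log2(frequency / background) for one candidate site."""
    total = sum(COUNTS[base][0] for base in "ACGT")  # every column has the same total
    return sum(
        math.log2((COUNTS[base][i] / total) / background)
        for i, base in enumerate(site)
    )


def best_window_score(seq):
    """Best motif score over all windows of the sequence."""
    return max(
        log_odds_score(seq[i:i + WIDTH]) for i in range(len(seq) - WIDTH + 1)
    )


def snp_effect(ref_seq, alt_seq):
    """Change in the best motif score caused by the variant (negative = weaker site)."""
    return best_window_score(alt_seq) - best_window_score(ref_seq)


# A single-base change inside the motif lowers the predicted binding score.
print(snp_effect("CCGATACC", "CCGCTACC"))
```

Repeating such a comparison for every known TF matrix, and ranking the TFs by how well the predicted change agrees with the experimentally observed one, is the general shape of the selection step the abstract describes.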
DOI: https://doi.org/10.1109/BIBE.2001.974424 (published 2001-03-04)
Citations: 0
Integrating co-regulated gene-groups and pair-wise genome comparisons to automate reconstruction of microbial pathways
A. Bansal
This paper extends previously described automated techniques by integrating three kinds of automatically derived information to derive microbial metabolic pathways: co-transcribed gene-groups, functionally similar gene-groups obtained from automated pair-wise genome comparisons, and orthologs (functionally equivalent genes). The method integrates the co-transcribed gene-groups with orthologous and homologous gene-groups (http://www.mcs.kent.edu/~arvind/orthos.html), the biochemical pathway templates available at the KEGG database (http://www.genome.ad.jp), and enzyme information derived from the SwissProt enzyme database (http://expasy.hcuge.ch/) and the Ligand database (http://www.genome.ad.jp). The technique refines existing pathways (based upon networks of enzyme reactions) by associating corresponding non-enzymatic, regulatory, and co-transcribed proteins with enzymes. The technique is illustrated by deriving a major pathway of M. tuberculosis by comparison with seven microbial genomes, including E. coli and B. subtilis, two microbes well explored in wet laboratories.
DOI: https://doi.org/10.1109/BIBE.2001.974431 (published 2001-03-04)
Citations: 8
Development of a robotic device for MRI-guided interventions in the breast
N. Tsekos, J. Shudy, E. Yacoub, Panagiotis V. Tsekos, I. Koutlas
The objective of this work was to develop a robotic apparatus for MR-guided biopsy and therapeutic interventions in the breast. The device facilitates (i) conditioning of the breast, by setting the orientation and degree of compression, (ii) definition of the interventional probe trajectory, by setting the height and angulation of a probe guide, and (iii) positioning of an interventional probe, by setting the depth of insertion. The apparatus is fitted with computer-controlled degrees of freedom so that interventions under MR guidance, such as diagnostic or therapeutic trans-cannula or subcutaneous minimally invasive procedures, can be delivered and monitored along an optimal approach. The entire device is constructed of MR-compatible material, i.e. non-magnetic and non-conductive, to eliminate artifacts and distortion of the local magnetic field. The apparatus is remotely controlled by means of ultrasonic actuators and a graphical user interface, providing real-time MR-guided planning and monitoring of the operation.
DOI: https://doi.org/10.1109/BIBE.2001.974430 (published 2001-03-04)
Citations: 41
TRUFFLER: programs to study microbial community composition and flux from fluorescent DNA fingerprinting data
M. Wise, A. Osborn
Terminal-restriction fragment length polymorphism (T-RFLP) and length heterogeneity-polymerase chain reaction (LH-PCR) are DNA fingerprinting technologies that use PCR amplification of a gene of interest, e.g. the small subunit rRNA gene, to study microbial community structure and dynamics. One or both of the forward and reverse strand primers used to amplify the gene are fluorescently labelled. In T-RFLP, the products of restriction endonuclease digestion are electrophoresed with an automated sequencer that detects only the terminal (labelled) restriction fragments (T-RFs). In LH-PCR, products are electrophoresed without digestion, with different fragment lengths being due to inherent variation in the amplified sequence. A novel software system, TRUFFLER, has been developed to mimic this process in silico, allowing comparison of experimental data against databases of theoretically determined T-RFs. As a given combination of forward and reverse primers and restriction endonuclease can yield identical T-RFs across a number of species, combinations of different endonucleases (and/or primers) are typically used. In addition to fragment length data, data on fluorescence levels are also available. Computationally, this can be viewed as a constraint satisfaction problem which can be solved to allow identification of the dominant members of the microbial community, often down to individual species or at least genus level, and their relative proportions.
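The in silico digestion and the constraint-style matching described above can be sketched roughly as follows. This is a minimal illustration, not TRUFFLER's code: the function names, the toy amplicons and the tolerance parameter are assumptions, only the forward (5') labelled fragment is modelled, and fluorescence levels are ignored. The recognition sites and cut positions used for MspI (C^CGG), HhaI (GCG^C) and RsaI (GT^AC) are the standard ones for those enzymes.

```python
# Recognition site and cut offset (bases from the start of the site) per enzyme.
ENZYMES = {"MspI": ("CCGG", 1), "HhaI": ("GCGC", 3), "RsaI": ("GTAC", 2)}


def terminal_fragment_length(amplicon, enzyme):
    """Length of the 5' terminal fragment after digesting the amplicon."""
    site, cut_offset = ENZYMES[enzyme]
    position = amplicon.find(site)
    if position == -1:
        return len(amplicon)  # no site: the whole amplicon stays intact
    return position + cut_offset


def consistent_taxa(observed, database, tolerance=1):
    """Taxa whose predicted T-RF matches an observed peak for every enzyme."""
    hits = []
    for taxon, amplicon in database.items():
        if all(
            any(
                abs(terminal_fragment_length(amplicon, enzyme) - peak) <= tolerance
                for peak in peaks
            )
            for enzyme, peaks in observed.items()
        ):
            hits.append(taxon)
    return hits


# Hypothetical toy database of amplicon sequences and observed peak lengths.
database = {
    "taxonA": "AAAACCGGTTTTGCGCAA",
    "taxonB": "AACCGGTTTTTTTTGCGC",
}
observed = {"MspI": [5, 40], "HhaI": [15]}
print(consistent_taxa(observed, database))
```

Each additional enzyme adds one more constraint a taxon must satisfy, which is why combining endonucleases narrows down candidates that share a T-RF length for any single enzyme.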
DOI: https://doi.org/10.1109/BIBE.2001.974421 (published 2001-03-04)
Citations: 7
Comparing algorithms for large-scale sequence analysis
Hadon Nash, Douglas Blair, J. Grefenstette
The first step in homology analysis is usually the comparison of sequences by similarity search. The explosive growth of genomic databases makes it increasingly important to develop more rapid approaches to the comparison of large sequence databases while using the most sensitive methods available. This paper explores the consequences of this trade-off, comparing the results produced by BLAST and Smith-Waterman on genomic-scale sequence searches. Such comparisons are now possible thanks to the development of novel distributed computing platforms. This study uses the Parabon Frontier™ Internet computing platform, which enables the effective use of the vast supply of idle computer cycles on the Internet for high-performance computing. We have ported both Smith-Waterman and BLAST to the Frontier platform, enabling the efficient use of these algorithms on large sequence databases. In addition, we present a novel visualization tool along with quantitative metrics for comparing the results of alternative sequence alignment algorithms. Our results compare the sensitivity of Smith-Waterman and BLAST for identifying homologies on proteome databases.
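For readers unfamiliar with the more sensitive of the two methods, the Smith-Waterman local alignment score can be computed with a short dynamic program. The sketch below is a plain textbook version with a linear gap penalty and illustrative scoring values; it is not the ported Frontier implementation, and real protein searches would use a substitution matrix and affine gaps.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b, keeping only two matrix rows."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                                # start a new local alignment
                previous[j - 1] + substitution,   # align a[i-1] with b[j-1]
                previous[j] + gap,                # gap in b
                current[j - 1] + gap,             # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("ACACACTA", "AGCACACA"))
```

Every cell of the matrix is filled, which is what makes the method exhaustive and slow on large databases; BLAST trades some of that sensitivity for speed by extending only from short seed matches.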
DOI: https://doi.org/10.1109/BIBE.2001.974416 (published 2001-03-04)
Citations: 13
Searching online journals for fluorescence microscope images depicting protein subcellular location patterns
R. Murphy, M. Velliste, Jie Yao, G. Porreca
There is extensive interest in automating the collection, organization and analysis of biological data. Data in the form of images present special challenges for such efforts. Since fluorescence microscope images are a primary source of information about the location of proteins within cells, we have set as a long-term goal the building of a knowledge base system that can interpret such images in online journals. To this end, we first developed a robot that searches online journals and finds fluorescence microscope images of individual cells. We then characterized the applicability of pattern classification methods we have previously used on images obtained under controlled conditions to images from different sources and to images subjected to manipulations commonly performed during publication. The results indicate the feasibility of developing search engines to find fluorescence microscope images depicting particular subcellular patterns.
DOI: https://doi.org/10.1109/BIBE.2001.974420 (published 2001-03-04)
Citations: 76
A fast pruning algorithm for optimal sequence alignment
Aaron Davidson
Sequence alignment is an important operation in computational biology. Both dynamic programming and A* heuristic search algorithms for optimal sequence alignment are discussed and evaluated. Presented here are two new algorithms for optimal pairwise sequence alignment which outperform traditional methods on very large problem instances (hundreds of thousands of characters, for example). The technique combines the benefits of dynamic programming and A* heuristic search, with a minimal amount of additional overhead. The dynamic programming matrix is traversed along antidiagonals, bounding the computation to exclude portions of the matrix that cannot contain optimal paths. An admissible heuristic assists in pruning away unnecessary areas of the matrix, while preserving optimal solutions for any given scoring function. Since memory requirements are a major concern for large sequence alignment problems, it is shown how the standard algorithm (requiring quadratic space) can be reformulated as a divide and conquer algorithm (requiring only linear space, at the cost of some recomputation).
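The antidiagonal traversal mentioned above can be sketched for a plain global alignment score. This is an illustration only, under stated assumptions: it uses simple match/mismatch/linear-gap scoring, keeps just the two previous antidiagonals in memory, and deliberately omits the paper's heuristic bounding and pruning as well as the divide and conquer traceback.

```python
def global_score_antidiagonal(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score, filling the DP matrix one antidiagonal at a time."""
    n, m = len(a), len(b)
    # Antidiagonals d-2 and d-1, stored as {row index i: score of cell (i, d - i)}.
    older, newer = {}, {0: 0}
    for d in range(1, n + m + 1):
        current = {}
        for i in range(max(0, d - m), min(n, d) + 1):
            j = d - i
            if i == 0:
                current[i] = j * gap          # first row: only gaps
            elif j == 0:
                current[i] = i * gap          # first column: only gaps
            else:
                substitution = match if a[i - 1] == b[j - 1] else mismatch
                current[i] = max(
                    older[i - 1] + substitution,  # diagonal neighbour (i-1, j-1)
                    newer[i - 1] + gap,           # neighbour above (i-1, j)
                    newer[i] + gap,               # neighbour to the left (i, j-1)
                )
        older, newer = newer, current
    return newer[n]


print(global_score_antidiagonal("ACGT", "AGT"))
```

Because every cell on an antidiagonal depends only on the two preceding antidiagonals, a bound on the best achievable score can be checked per antidiagonal, which is the natural place to cut away cells that cannot lie on an optimal path.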
DOI: https://doi.org/10.1109/BIBE.2001.974411 (published 2001-03-04)
Citations: 26
GIMS - a data warehouse for storage and analysis of genome sequence and functional data
M. Cornell, N. Paton, Shengli Wu, C. Goble, Crispin J. Miller, Paul Kirby, K. Eilbeck, A. Brass, A. Hayes, S. Oliver
Effective analysis of genome sequences and associated functional data requires access to many different kinds of biological information. For example, when analysing gene expression data, it may be useful to have access to the sequences upstream of the genes, or to the cellular location of their protein products. Such information is currently stored in different formats at different sites in a way that does not readily allow integrated analyses. The Genome Information Management System (GIMS) is an object database that integrates genome sequence data with functional data on the transcriptome and on protein-protein interactions in a single data warehouse. We have used GIMS to store the Saccharomyces cerevisiae (yeast) genome and to demonstrate how the integrated storage of diverse kinds of genomic data can be beneficial for analysing data using context-rich queries and analyses. GIMS allows data to be stored in a way that reflects the underlying mechanisms in the organism, and permits complex questions to be asked of the data. This paper provides an overview of the GIMS system and describes some analyses that illustrate its use for analysing functional data sets for S. cerevisiae.
DOI: https://doi.org/10.1109/BIBE.2001.974407 (published 2001-03-04)
Citations: 36
Interrelated two-way clustering: an unsupervised approach for gene expression data analysis
Chun Tang, Li Zhang, A. Zhang, M. Ramanathan
DNA arrays can be used to measure the expression levels of thousands of genes simultaneously. Most research is focusing on interpretation of the meaning of the data. However, the majority of methods are supervised, with less attention having been paid to unsupervised approaches which are important when domain knowledge is incomplete or hard to obtain. In this paper we present a new framework for unsupervised analysis of gene expression data which applies an interrelated two-way clustering approach to the gene expression matrices. The goal of clustering is to find important gene patterns and perform cluster discovery on samples. The advantage of this approach is that we can dynamically use the relationships between the groups of genes and samples while iteratively clustering through both gene-dimension and sample-dimension. We illustrate the method on gene expression data from a study of multiple sclerosis patients. The experiments demonstrate the effectiveness of this approach.
DOI: https://doi.org/10.1109/BIBE.2001.974410 (published 2001-03-04)
Citations: 189
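The abstract above describes clustering that alternates between the gene dimension and the sample dimension, but it gives no algorithmic detail. The following is only a minimal sketch of that alternating idea, not the authors' algorithm: the k-means grouping, the `separation` score, the `threshold` value, and the toy `matrix` are all assumptions made for illustration.

```python
from statistics import mean

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iters=20):
    """Plain k-means with deterministic farthest-first initialisation."""
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [mean(col) for col in zip(*members)]
    return labels

def split_samples(profile):
    """Two-cluster split of samples from a 1-D profile; returns (labels, gap)."""
    labels = kmeans([[v] for v in profile], 2)
    a = [v for v, lab in zip(profile, labels) if lab == 0]
    b = [v for v, lab in zip(profile, labels) if lab == 1]
    gap = abs(mean(a) - mean(b)) if a and b else 0.0
    return labels, gap

def separation(row, labels):
    """Assumed gene score: class-mean difference relative to the gene's range."""
    a = [v for v, lab in zip(row, labels) if lab == 0]
    b = [v for v, lab in zip(row, labels) if lab == 1]
    spread = max(row) - min(row)
    if not a or not b or spread == 0:
        return 0.0
    return abs(mean(a) - mean(b)) / spread

def two_way_cluster(matrix, k=2, threshold=0.5, rounds=5):
    """Alternate between gene groups and sample groups until the gene set is stable."""
    genes = list(range(len(matrix)))
    labels = [0] * len(matrix[0])
    for _ in range(rounds):
        sub = [matrix[g] for g in genes]
        gene_labels = kmeans(sub, k)
        # Sample dimension: each gene group proposes a two-way sample split.
        candidates = []
        for j in range(k):
            group = [r for r, lab in zip(sub, gene_labels) if lab == j]
            if group:
                candidates.append(split_samples([mean(c) for c in zip(*group)]))
        labels, _ = max(candidates, key=lambda c: c[1])
        # Gene dimension: keep genes that separate the chosen sample groups.
        kept = [g for g in genes if separation(matrix[g], labels) >= threshold]
        if not kept or kept == genes:
            break
        genes = kept
    return genes, labels

# Toy data (rows = genes, columns = samples); values are invented.
matrix = [
    [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],  # gene 0: differs between sample groups
    [0.8, 1.1, 1.0, 4.9, 5.2, 5.0],  # gene 1: differs between sample groups
    [3.0, 3.1, 2.9, 3.0, 3.1, 2.9],  # gene 2: uninformative
    [2.9, 3.0, 3.1, 3.1, 2.9, 3.0],  # gene 3: uninformative
]
genes, labels = two_way_cluster(matrix)
print(genes, labels)
```

On the toy matrix the loop keeps the two informative genes and splits the samples into the first three and the last three. A real analysis would need a more careful gene score and a principled choice of `k`.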
An algebra for semantic interoperability of information sources
P. Mitra, G. Wiederhold
Resolving heterogeneity among the various biological information systems is a crucial problem if we wish to gain value from the many distributed resources available to us. For example, information from multiple protein databases (e.g., Swiss-Prot and PDB) might need to be composed to answer queries posed by end-users. Problems of heterogeneity in hardware, operating systems, interfaces and data structures have been widely addressed, but issues of diverse semantics have been handled mainly in an ad-hoc fashion. This paper highlights the ONION (ONtology compositION) system that enables semantic interoperation among various information sources by articulating the ontologies associated with them. An articulation focuses on the semantically relevant intersection of information resources. Although the generation of articulations (semantic correspondences between the ontologies) cannot be fully automated, we take a semi-automatic approach. ONION uses heuristic algorithms for the automatic generation of suggested articulations. This paper outlines an algebra for ontology composition based on their articulations. We show the properties of the algebraic operators and how they depend upon the articulation functions that generate the articulations. Query optimization is enabled based on the properties of the algebraic operators.
DOI: 10.1109/BIBE.2001.974427 (https://doi.org/10.1109/BIBE.2001.974427). Publication date in the listing metadata: 2001-03-04.
Citations: 51
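The abstract above says that the algebraic operators are built on articulations and that their properties depend on the articulation function, without listing the operators. The sketch below assumes three set-style operators (intersection, union, difference) and a toy articulation function for illustration only. The term names and the `synonyms` table are hypothetical and do not reflect the real Swiss-Prot or PDB schemas or the ONION implementation.

```python
def articulate(terms1, terms2, synonyms):
    """Hypothetical articulation function: suggest (term1, term2) correspondences
    by case-insensitive name match or a one-directional synonym table."""
    return {
        (a, b)
        for a in terms1
        for b in terms2
        if a.lower() == b.lower() or synonyms.get(a) == b
    }

def intersection(terms1, terms2, synonyms):
    """Terms of either source that take part in an articulation rule."""
    rules = articulate(terms1, terms2, synonyms)
    return {a for a, _ in rules} | {b for _, b in rules}

def difference(terms1, terms2, synonyms):
    """Terms of the first source with no correspondence in the second."""
    rules = articulate(terms1, terms2, synonyms)
    return set(terms1) - {a for a, _ in rules}

def union(terms1, terms2, synonyms):
    """All terms of both sources, with matched pairs merged into the first source's term."""
    rules = articulate(terms1, terms2, synonyms)
    matched2 = {b for _, b in rules}
    return set(terms1) | (set(terms2) - matched2)

# Invented term sets standing in for two protein information sources.
swissprot = {"Protein", "Organism", "Sequence"}
pdb = {"protein", "Structure", "Source"}
synonyms = {"Organism": "Source"}  # expert-supplied, one direction only

print(sorted(articulate(swissprot, pdb, synonyms)))
print(sorted(difference(swissprot, pdb, synonyms)))
print(sorted(union(swissprot, pdb, synonyms)))

# Operator properties depend on the articulation function:
# the one-directional synonym table makes intersection order-sensitive,
# while pure name matching (empty table) is symmetric.
print(intersection(swissprot, pdb, synonyms) == intersection(pdb, swissprot, synonyms))
print(intersection(swissprot, pdb, {}) == intersection(pdb, swissprot, {}))
```

In this toy setting, a query optimizer could reorder the operands of an intersection only when the articulation function is known to be symmetric, which is the kind of dependency the abstract refers to.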