Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings: latest articles

Uses of multiagents systems for simulation of MAPK pathway
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188982
G. Querrec, V. Rodin, J. Abgrall, S. Kerdélo, J. Tisseau
Since the emergence of molecular biology, knowledge of the intracellular networks that control cell behavior has improved. In parallel, advances in mathematics and computer science make it possible to simulate such complex phenomena. However, most methods need a global resolution of the system, which makes models difficult to create and modify. In this study, we propose a distributed approach based on a multiagent system (MAS) to simulate the MAPK pathway. Our results show that such a simulation is possible and allows "in virtuo" experimentation, i.e. perturbation of the model during its execution.
Citations: 16
Nodal distance algorithm: calculating a phylogenetic tree comparison metric
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188933
John Bluis, Dong-Guk Shin
Maintaining a repository of phylogenetic relationships requires tools for mining the data it stores. One way to query a database of phylogenetic information is to compare phylogenetic trees. Because the existing tree comparison methods are computationally intensive, this has not been a practical task. We present the nodal distance algorithm, which requires significantly less computation time than the most widely used comparison method, the partition metric. When the metric is calculated for trees in which one species has been repositioned to a distant part of the tree, no further computation is required, unlike the partition metric. The nodal distance algorithm thus provides a way to compare large sets of phylogenetic trees in a reasonable amount of time.
Citations: 41
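The tree comparison described in the nodal distance abstract above can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not spell out: that the nodal distance between two species is the number of edges on the path between their leaves, and that two trees are scored by summing the absolute differences of these distances over all species pairs. The function names and the adjacency-dictionary tree encoding are hypothetical choices made for the example, not the authors' implementation.

```python
from collections import deque


def leaf_distances(adjacency):
    """Return {frozenset({a, b}): edge count} for every pair of leaves.

    `adjacency` maps each node name to the list of its neighbours
    (an unrooted tree); leaves are the nodes with exactly one neighbour.
    """
    leaves = sorted(n for n, nbrs in adjacency.items() if len(nbrs) == 1)
    distances = {}
    for source in leaves:
        # Breadth-first search gives the edge count from `source` to every node.
        depth = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in depth:
                    depth[neighbour] = depth[node] + 1
                    queue.append(neighbour)
        for target in leaves:
            if target != source:
                distances[frozenset((source, target))] = depth[target]
    return distances


def nodal_distance(tree_a, tree_b):
    """Sum of absolute differences of leaf-to-leaf path lengths (assumed score)."""
    da, db = leaf_distances(tree_a), leaf_distances(tree_b)
    if da.keys() != db.keys():
        raise ValueError("trees must share the same leaf set")
    return sum(abs(da[pair] - db[pair]) for pair in da)


# Two four-species trees: ((A,B),(C,D)) and ((A,C),(B,D)).
tree1 = {"A": ["x"], "B": ["x"], "C": ["y"], "D": ["y"],
         "x": ["A", "B", "y"], "y": ["C", "D", "x"]}
tree2 = {"A": ["x"], "C": ["x"], "B": ["y"], "D": ["y"],
         "x": ["A", "C", "y"], "y": ["B", "D", "x"]}
print(nodal_distance(tree1, tree2))
```

Only leaf names are compared, so the internal node labels (`x`, `y`) may differ freely between trees.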
An open multiple instance learning framework and its application in drug activity prediction problems
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188929
Xin Huang, Shu‐Ching Chen, M. Shyu
This paper proposes a powerful open Multiple Instance Learning (MIL) framework. The framework is open in the sense that different sub-methods can be plugged into it to generate different specific MIL algorithms. In the proposed framework, the MIL problem is first converted to an unconstrained optimization problem by the Minimum Square Error (MSE) criterion, and the framework is then constructed with an open form of hypothesis and gradient search method. The framework is applied to drug activity prediction problems in bioinformatics. Specifically, experiments are conducted on the Musk-I dataset to predict the binding activity of drug molecules. In the experiments, an algorithm with the exponential hypothesis model and the Quasi-Newton method is embedded into the framework. A comparison with other existing algorithms shows that the framework yields good classification accuracy, which demonstrates its feasibility and effectiveness.
Citations: 4
Vessel extraction techniques and algorithms: a survey
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188957
C. Kirbas, Francis K. H. Quek
Vessel segmentation algorithms are critical components of circulatory blood vessel analysis systems. We present a survey of vessel extraction techniques and algorithms, putting the various approaches and techniques in perspective by means of a classification of the existing research. While we mainly target the extraction of blood vessels, neurovascular structure in particular, we also review some segmentation methods for tubular objects that show characteristics similar to those of vessels. We divide vessel segmentation algorithms and techniques into six main categories: (1) pattern recognition techniques, (2) model-based approaches, (3) tracking-based approaches, (4) artificial intelligence-based approaches, (5) neural network-based approaches, and (6) miscellaneous tube-like object detection approaches. Some of these categories are further divided into sub-categories. A table compares the papers against such criteria as dimensionality, input type, preprocessing, user interaction, and result type.
Citations: 164
A robotic device for minimally invasive breast interventions with real-time MRI guidance
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188946
B. Larson, N. Tsekos, A. Erdman
We have developed a device to perform minimally invasive interventions in the breast with real-time MRI guidance for the early detection and treatment of breast cancer. The device uses five computer-controlled degrees of freedom to perform minimally invasive interventions inside a closed MRI scanner. Typically, the intervention consists of a biopsy of the suspicious lesion for diagnosis, but it may also involve therapies to destroy or remove malignant tissue in the breast. The procedure comprises: (a) conditioning of the breast along a prescribed orientation, (b) definition of an insertion vector by its height and pitch angle, and (c) insertion into the breast. The entire device is made of MRI-compatible materials, avoiding artifacts and distortion of the local magnetic field. The device is remotely controlled via a graphical user interface. This is the first surgical robotic device to perform real-time MRI-guided breast interventions in the United States.
Citations: 21
Enhanced biclustering on expression data
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188969
Jiong Yang, Haixun Wang, Wei Wang, Philip S. Yu
Microarrays are one of the latest breakthroughs in experimental molecular biology. They provide a powerful tool for monitoring the expression patterns of thousands of genes simultaneously and are already producing huge amounts of valuable data. The concept of a bicluster was introduced by Cheng and Church (2000) to capture the coherence of a subset of genes and a subset of conditions. A set of heuristic algorithms was also designed to find either one bicluster or a set of biclusters; these algorithms consist of iterations of masking null values and discovered biclusters, coarse and fine node deletion, node addition, and the inclusion of inverted data. These heuristics inevitably suffer from some serious drawbacks. Masking null values and discovered biclusters with random numbers may result in random interference, which in turn impacts the discovery of high-quality biclusters. To address this issue and to further accelerate the biclustering process, we generalize the bicluster model to incorporate null values and propose a probabilistic algorithm (FLOC) that can discover a set of k possibly overlapping biclusters simultaneously. Furthermore, this algorithm can easily be extended to support additional features that suit different requirements at very little cost. An experimental study on yeast gene expression data shows that the FLOC algorithm offers substantial improvements over the previously proposed algorithm.
Citations: 342
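The notion of bicluster coherence mentioned in the biclustering abstract above is commonly measured with the mean squared residue of Cheng and Church. The Python sketch below computes that score for a chosen set of rows and columns. The abstract does not give the formula, so this is an illustration under assumptions: null values are marked with `None` and are simply left out of every mean, which is one plausible reading of "generalize the bicluster model to incorporate null values" rather than a confirmed description of FLOC. The function name and the list-of-lists matrix layout are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix rows x cols.

    residue(i, j) = a_ij - row_mean_i - col_mean_j + overall_mean,
    with all means taken over the specified (non-None) cells only.
    """
    cells = {(i, j): matrix[i][j]
             for i in rows for j in cols
             if matrix[i][j] is not None}
    if not cells:
        raise ValueError("submatrix has no specified values")

    # Group the specified values by row and by column.
    row_vals, col_vals = {}, {}
    for (i, j), value in cells.items():
        row_vals.setdefault(i, []).append(value)
        col_vals.setdefault(j, []).append(value)

    row_mean = {i: sum(v) / len(v) for i, v in row_vals.items()}
    col_mean = {j: sum(v) / len(v) for j, v in col_vals.items()}
    all_mean = sum(cells.values()) / len(cells)

    return sum((value - row_mean[i] - col_mean[j] + all_mean) ** 2
               for (i, j), value in cells.items()) / len(cells)


# Rows that differ only by a constant shift form a perfectly coherent bicluster.
expression = [[1, 2, 3],
              [2, 3, 4],
              [4, 5, 6]]
print(mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2]))
```

A score near zero indicates that the selected genes rise and fall together across the selected conditions; larger scores indicate less coherent submatrices.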
An empirical comparison of tools for phylogenetic footprinting
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188931
M. Blanchette, Samson Kwong, M. Tompa
Phylogenetic footprinting is an increasingly popular comparative genomics method for detecting regulatory elements in DNA sequences. With the profusion of possible methods to use for phylogenetic footprinting, the biologist needs some guidance to choose the most appropriate tool. We present methods for comparing tools on phylogenetic footprinting data. More specifically, we discuss two different classes of comparative experiments: those on simulated data and those on real orthologous promoter regions. We then report the results of a series of such empirical comparisons. The tools compared are the alignment-based methods using ClustalW and Dialign, and the motif-finding programs MEME and FootPrinter. Our results show that methods taking the species' phylogenetic relationships into consideration obtain better accuracy.
Citations: 9
An algorithm to reconstruct a target DNA sequence from its spectrum connected at a given level
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188947
Fang-Xiang Wu, W. Zhang, A. Kusalik
To sequence a target DNA, it is first cleaved into many shorter overlapping fragments by chemical or physical techniques. The nucleotide sequence of each fragment is then determined (read) by established methods. The set of all read fragments that cover the target DNA sequence is called its spectrum. It is believed that the shortest superstring of a spectrum is the best candidate for the target DNA sequence. The general problem of finding the shortest superstring for an arbitrary set of strings S is NP-hard. Fortunately, the biological instance of this problem is easier. Two read fragments of several hundred letters each that come from consecutive locations on the target DNA sequence are unlikely to overlap by only a few letters; typically, the overlap is longer. One may therefore reasonably assume that two strings in the spectrum have significant overlap (connectivity) if they come from consecutive locations on the target DNA sequence. An important class of instances satisfying this assumption consists of those whose spectra come from DNA microarrays. This assumption allows us to claim and show the following: if the spectrum S of a target DNA sequence is substring-free and connected at level t, and the target DNA sequence has no repeats of size t or larger, then there exists an algorithm that reconstructs the target DNA sequence in linear time O(|S|) once an overlap graph of the spectrum has been built.
Citations: 0
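The reconstruction idea in the spectrum abstract above can be sketched in Python as follows. This is an illustration under the abstract's stated assumptions (substring-free spectrum, connectivity at level t, no repeats of length t or more in the target), not the authors' algorithm: the sketch keeps, for each fragment, the heaviest overlap edge of weight at least t and then walks the resulting chain. The naive pairwise overlap computation stands in for building the overlap graph and is not linear; only the final walk corresponds to the linear-time step the abstract refers to. All function names are hypothetical.

```python
def overlap(a, b):
    """Length of the longest proper suffix of `a` that is a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def reconstruct(spectrum, t):
    """Chain fragments whose best suffix-prefix overlap is at least `t`."""
    # Overlap graph restricted to edges of weight >= t: keep only the
    # heaviest outgoing edge of each fragment as its successor.
    successor = {}
    for a in spectrum:
        best = max((b for b in spectrum if b != a),
                   key=lambda b: overlap(a, b), default=None)
        if best is not None and overlap(a, best) >= t:
            successor[a] = best

    # The first fragment is the only one that is nobody's successor.
    followers = set(successor.values())
    starts = [f for f in spectrum if f not in followers]
    if len(starts) != 1:
        raise ValueError("spectrum is not connected at level t")

    current = starts[0]
    sequence, seen = current, {current}
    while current in successor:
        nxt = successor[current]
        if nxt in seen:
            raise ValueError("cycle in overlap graph")
        # Append only the part of the next fragment that is not overlapped.
        sequence += nxt[overlap(current, nxt):]
        seen.add(nxt)
        current = nxt

    if len(seen) != len(spectrum):
        raise ValueError("spectrum is not connected at level t")
    return sequence


# Fragments of ATGCGTACCTGA, given in arbitrary order.
fragments = ["GTACCT", "ATGCGT", "CCTGA", "GCGTAC"]
print(reconstruct(fragments, t=3))
```

Choosing the heaviest outgoing edge matters because a fragment can overlap both its immediate neighbour and a later fragment by at least t letters; in a substring-free spectrum the immediate neighbour is the one with the longest overlap.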
DHC: a density-based hierarchical clustering method for time series gene expression data
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188978
D. Jiang, J. Pei, A. Zhang
Clustering time series gene expression data is an important task in bioinformatics research and biomedical applications. Recently, some clustering methods have been adapted or proposed. However, some concerns remain, such as the robustness of the mining methods and the quality and interpretability of the mining results. In this paper, we tackle the problem of effectively clustering time series gene expression data by proposing DHC, a density-based hierarchical clustering method. We use a density-based approach to identify the clusters so that the clustering results are of high quality and robust. Moreover, the mining result takes the form of a density tree, which uncovers the embedded clusters in a data set. The inner structures, borders, and outliers of the clusters can be further investigated using the attraction tree, which is an intermediate result of the mining. With these two trees, the internal structure of the data set can be visualized effectively. Our empirical evaluation using some real-world data sets shows that the method is effective, robust, and scalable. It matches the ground truth provided by bioinformatics experts very well in the sample data sets.
Citations: 193
A computational pipeline for protein structure prediction and analysis at genome scale
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188923
M. Shah, S. Passovets, Dongsup Kim, K. Ellrott, Li Wang, Inna Vokler, P. LoCascio, Dong Xu, Ying Xu
Traditionally, protein 3D structures are solved using experimental techniques such as X-ray crystallography or nuclear magnetic resonance (NMR). While these experimental techniques have been the main workhorses of protein structure studies over the past few decades, it is becoming increasingly apparent that they alone cannot keep up with the rate at which protein sequences are produced. Fortunately, computational techniques for protein structure prediction have matured to the point that they can complement the existing experimental techniques. In this paper, we present an automated pipeline for protein structure prediction. The centerpiece of the pipeline is a threading-based protein structure prediction system called PROSPECT, which we have been developing for the past few years. The pipeline consists of seven logical phases and uses a dozen tools. It has been implemented to run in a heterogeneous computational environment as a client/server system with a web interface. A number of genome-scale applications have been carried out on microbial genomes. Here we present one genome-scale application on Caenorhabditis elegans.
Citations: 26