
Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings: Latest Articles

Empirical evaluation of ensemble feature subset selection methods for learning from a high-dimensional database in drug design
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188959
Hiroshi Mamitsuka
Discovering a new drug is one of the most important goals not only in the pharmaceutical field but also in a variety of other fields, including molecular biology, chemistry and medical science. The importance of computationally understanding the relationship between a given chemical compound and its drug activity has become pronounced. In a data set on the drug activity of chemical compounds, each row corresponds to a chemical compound, and the columns are the descriptors of the compound plus a label indicating its drug activity. Recently, the number of descriptors has grown in order to obtain more detailed information from a given set of compounds; the number of columns (attributes or features) of some drug data sets reaches hundreds of thousands or even a million. The purpose of this paper is to empirically evaluate the performance of ensemble feature subset selection strategies by applying them to such a high-dimensional data set actually used in the process of drug design. We examined the performance of three ensemble methods, including a query learning based method, and compared it with that of one of the latest feature subset selection methods. The evaluation was performed on a data set that contains approximately 140,000 features. Our results show that the query learning based methodology outperformed the other three methods in terms of both final prediction accuracy and time efficiency. We also examined the effect of noise in the data and found that the advantage of the method becomes more pronounced at larger noise levels.
Citations: 4
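The abstract above does not spell out the ensemble methods it evaluates, so the following is only a minimal sketch of the general idea of ensemble feature subset selection, using a plain random-subspace ensemble: each member is trained on a different random subset of the descriptor columns and the members vote. The nearest-centroid learner, the function names and the toy data are all assumptions made for illustration; this is not the query learning based method from the paper.

```python
import random


def train_centroid(rows, labels, features):
    """Train a nearest-centroid learner that only looks at `features` (column indices)."""
    centroids = {}
    for label in set(labels):
        members = [r for r, y in zip(rows, labels) if y == label]
        centroids[label] = [sum(r[f] for r in members) / len(members) for f in features]

    def predict(row):
        def dist(label):
            return sum((row[f] - c) ** 2 for f, c in zip(features, centroids[label]))
        # sorted() makes tie-breaking deterministic
        return min(sorted(centroids), key=dist)

    return predict


def random_subspace_ensemble(rows, labels, n_members, subset_size, seed=0):
    """Train n_members learners, each on its own random feature subset; predict by majority vote."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    members = [
        train_centroid(rows, labels, rng.sample(range(n_features), subset_size))
        for _ in range(n_members)
    ]

    def predict(row):
        votes = [member(row) for member in members]
        return max(sorted(set(votes)), key=votes.count)

    return predict


# Hypothetical toy data: rows are compounds, columns are binary descriptors,
# label 1 = active, label 0 = inactive.
rows = [[1, 1, 1, 1, 0, 1], [1, 1, 0, 1, 1, 1], [0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0]]
labels = [1, 1, 0, 0]
model = random_subspace_ensemble(rows, labels, n_members=5, subset_size=3)
print(model([1, 1, 1, 1, 1, 1]), model([0, 0, 0, 0, 0, 0]))
```

With roughly 140,000 columns, the appeal of such a scheme is that each member only ever touches a small slice of the columns; how the subsets are chosen is exactly where the methods compared in the paper differ.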
Automatic detection of premature ventricular contraction using quantum neural networks
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188943
Jie Zhou
Premature ventricular contractions (PVCs) are ectopic heart beats originating from the ventricular area. They are a common form of heart arrhythmia. Electrocardiogram (ECG) recordings have been widely used to help cardiologists diagnose the problem. In this paper, we study the automatic detection of PVCs using a fuzzy artificial neural network named the Quantum Neural Network (QNN). With the quantum neurons in the network, a trained QNN can model the levels of uncertainty arising from complex classification problems. This fuzzy feature is expected to enhance the reliability of the algorithm, which is critical for applications in the biomedical domain. Experiments were conducted on ECG records in the MIT-BIH Arrhythmia Database. On all the available records, the reliability of the QNN was consistently higher than or equal to that of the backpropagation network. The QNN, however, has a relatively higher resource requirement for training.
Citations: 39
Analysis of Atlantic salmon skin mucus: COPS-a computer-based system for protein pattern analysis of 1D SDS-PAGE gels
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188928
Richard Eibrand, P. Kennedy, D. Cotter, U. MacEvilly, Bing Wu
This paper presents an approach that applies a combination of computing techniques, including image processing and analysis, syntactic pattern matching, clustering techniques and artificial neural networks, to interpret biological data. The application domain is the analysis of 1D SDS-PAGE gels of Atlantic salmon skin mucus. Researchers in our group have visually identified protein band intensity patterns in the salmon's skin mucus. The objective is to produce a system that minimizes the loss of livestock in the fish farming industry. Initial results from the gel image analysis application and from manual data analysis have shown that reproducible patterns exist within the gel band data and can be classified as either increasing or decreasing patterns. This type of analysis is not restricted to Atlantic salmon skin mucus proteins; it can be extended to other proteins that exhibit recurring patterns over a period of time and require identification and classification.
Citations: 8
A model of random sequences for de novo peptide sequencing
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188948
K. Jarman, W. Cannon, Kristin H. Jarman, A. Heredia-Langner
We present a model for the probability of random sequences appearing in product ion spectra obtained from tandem mass spectrometry experiments using collision-induced dissociation. We demonstrate the use of these probabilities for ranking candidate peptide sequences obtained using a de novo algorithm. Sequence candidates are obtained from a spectrum graph that is greatly reduced in size from those in previous graph-theoretical de novo approaches. Evidence of multiple instances of subsequences of each candidate, due to different fragment ion type series as well as isotopic peaks, is incorporated in a hierarchical scoring scheme. This approach is shown to be useful for confirming results from database search and as a first step towards a statistically rigorous de novo algorithm.
Citations: 8
A biological mapping of a learned avoidance behavior model to the basal ganglia
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188963
K. Biddell, Jinghong Li, Jeffrey D. Johnson
In this paper we map a computational model of learned avoidance behavior in a one-way avoidance experiment to the biology of the basal ganglia. We extend our previous work to develop a more biologically accurate mapping. Learned avoidance behavior is a critical component of animal survival; thus, a model of animal learning should account for this phenomenon. We propose that the animal generates a prediction of the expected future benefit through long-term potentiation and long-term depression at the corticostriatal synapses. We map a reinforcement center of the model to the indirect pathway of the basal ganglia and a motor center to the direct pathway. Finally, we propose that an external reinforcement signal, in the form of pain caused by an electric shock, is transferred from the thalamus to the subthalamic nucleus.
Citations: 0
GenoMosaic: on-demand multiple genome comparison and comparative annotation
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188942
C. Gibas, D. Sturgill, J. Weller
GenoMosaic is a portable database application for on-demand multiple genome comparison. We discuss the methods used to generate a GenoMosaic data set from genome sequence data, and present the relational data model used in the application. We define an abstraction of genome sequence data (the feature mosaic) that allows us to bridge between annotation that describes features within single genes and annotation that covers possibly multiple genes and intergenic features over long stretches of genomic sequence. The goal of this project is to support new method development for on-demand multiple genome comparison. Each genome to be compared can be modeled as a string of generic features of any type that can be computationally defined, related by adjacency information within and among genomes. The generic feature abstraction makes it possible to study the arrangement of features in the genome at a level of detail that includes RNA genes, putative regulatory regions, SNPs, overlapping transcripts, intron splice junctions and alternative polyadenylation signals; in short, it makes it possible to incorporate significant sequence details that are not necessarily within protein-coding regions. This abstraction is amenable to functional implementation as a relational data model upon which novel query capabilities can be built, and it provides objects that can be analyzed using algorithms for comparing strings and lists. As an initial effort, we have implemented a prototype that uses a representative set of comparative and content-based annotation methods to reduce a collection of prokaryotic genomes to a feature mosaic representation. Entity-Relationship modeling was then used to develop a data model capable of storing detailed results, including complete parameters for each instance of analysis.
Citations: 2
Augmenting SSEs with structural properties for rapid protein structure comparison
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188972
C. Chionh, Zhiyong Huang, K. Tan, Zhen Yao
Comparing protein structures in three dimensions is a computationally expensive process, which makes a full scan of a protein against a library of known protein structures impractical. To reduce the cost, we can use an approximation of the three-dimensional structure that allows protein comparison to be performed quickly, so that dissimilar proteins can be filtered away. In this paper we present a new algorithm, called SCALE, for protein structure comparison. In SCALE, a protein is represented as a sequence of secondary structure elements (SSEs) augmented with 3D structural properties such as the distances and angles between the SSEs. As such, the comparison between two proteins is reduced to a sequence alignment problem between their corresponding sequences of SSEs. The 3D structural properties of the proteins contribute to the similarity score between the two sequences. We have implemented SCALE and compared its performance against existing schemes. Our performance study shows that SCALE outperforms existing methods in terms of both efficiency and effectiveness (measured in terms of precision and recall).
Citations: 22
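The reduction described in the SCALE abstract, from 3D comparison to alignment of SSE sequences whose score also reflects structural properties, can be illustrated with a small global-alignment sketch. Everything specific here is an assumption made for illustration: the `SSE` fields, the `pair_score` formula, the gap penalty and the use of Needleman-Wunsch style dynamic programming are not taken from the paper, whose actual scoring scheme the abstract does not give.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSE:
    kind: str      # "H" for helix, "E" for strand
    length: int    # number of residues in the element
    angle: float   # hypothetical: angle in degrees (0 to 180) to the next SSE


def pair_score(a, b):
    """Score two SSEs: mismatched types are penalized, matching types are
    rewarded by how similar their length and angle are (range 0 to 2)."""
    if a.kind != b.kind:
        return -1.0
    length_term = 1.0 - abs(a.length - b.length) / max(a.length, b.length)
    angle_term = 1.0 - abs(a.angle - b.angle) / 180.0
    return length_term + angle_term


def align_score(p, q, gap=-0.5):
    """Global alignment score of two SSE sequences by dynamic programming."""
    rows, cols = len(p) + 1, len(q) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(p[i - 1], q[j - 1]),  # align the two SSEs
                dp[i - 1][j] + gap,                                  # gap in q
                dp[i][j - 1] + gap,                                  # gap in p
            )
    return dp[-1][-1]


protein_a = [SSE("H", 10, 30.0), SSE("E", 5, 90.0), SSE("H", 8, 45.0)]
protein_b = [SSE("E", 5, 90.0)]
print(align_score(protein_a, protein_a), align_score(protein_a, protein_b))
```

Because a protein has far fewer SSEs than residues or atoms, the dynamic programming table stays small, which is what makes this kind of representation attractive as a fast filter before a full 3D comparison.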
Streamlining biological data analysis using BioFlow
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188960
Zhijie Guan, H. Jamil
For several obvious and practical reasons, the resources needed for biological data analysis are often geographically distributed and accessible through the Internet. Such resources usually include data repositories, analysis tools, digital documents, and so on. Designing ad hoc higher-level applications on top of these online resources therefore calls for sophisticated data and process integration tools. In this paper we present such a system, called BioFlow, that exploits recent advances in workflow technology and Internet computing to support ad hoc application development by hiding the heterogeneity and distributed nature of the resources required by user applications. We introduce the salient features of the BioFlow system and briefly discuss its architecture and implementation issues using simple but real-life applications. We demonstrate that the declarative language on which BioFlow is based makes our system intuitive, easy to use, effective and efficient for ad hoc application design. The approach taken in BioFlow is somewhat similar to the idea of web services in semantic web computing for biological applications.
Citations: 13
Combining few neural networks for effective secondary structure prediction
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188981
K. Guimaraes, J. Melo, George D. C. Cavalcanti
The prediction of secondary structure is treated with a simple and efficient method. By combining only three neural networks, an average per-residue Q3 prediction accuracy of 75.93% is achieved. This value is better than the best results reported on the same test and training database, CB396, using the same validation method. For a second database, RS126, an average Q3 accuracy of 74.13% is attained, which is better than every individual method and is surpassed only by CONSENSUS, a rather intricate engine that combines several methods. The networks are trained with RPROP, an efficient variation of the back-propagation algorithm. Five combination rules are then applied independently. Each one increases the prediction accuracy by at least 1%, because each network used converges to a different local minimum. The Product rule gives the best results. The predictor described here can be accessed at http://biolab.cin.ufpe.br/tools/.
Citations: 7
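The Product rule named in the abstract above can be illustrated with a minimal sketch. It assumes each of the three networks emits a probability triple per residue over the classes helix (H), strand (E) and coil (C); the numbers, the function names `product_rule` and `q3`, and the class ordering are hypothetical and are not taken from the paper.

```python
import math

# Assumed three-state alphabet: helix, strand, coil.
CLASSES = ("H", "E", "C")


def product_rule(network_outputs):
    """Combine per-network class probabilities for one residue.

    network_outputs: one (P(H), P(E), P(C)) tuple per network.
    The class with the largest product of probabilities wins.
    """
    scores = [
        math.prod(output[i] for output in network_outputs)
        for i in range(len(CLASSES))
    ]
    return CLASSES[scores.index(max(scores))]


def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one."""
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up outputs of three networks for three residues (illustration only).
residue_outputs = [
    [(0.6, 0.3, 0.1), (0.5, 0.4, 0.1), (0.7, 0.2, 0.1)],
    [(0.2, 0.5, 0.3), (0.1, 0.3, 0.6), (0.2, 0.3, 0.5)],
    [(0.1, 0.7, 0.2), (0.3, 0.4, 0.3), (0.2, 0.6, 0.2)],
]
predicted = [product_rule(outputs) for outputs in residue_outputs]
```

In the second residue the first network favours E, but the product of the three outputs favours C, which shows how networks that settle in different local minima can correct one another.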
A framework for cancer-related genes mining over the Internet
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188983
J. Tsai, Jan-Gowth Chang, S. H. Shih, Rong-Ming Chen, H. Hsiao, Rouh-Mei Hu, S. N. Chen, M. M. Lee, Falcon F. M. Liu, Wen-Ling Chan
Clinically, cancer is a complex family of diseases. From the viewpoint of molecular biology, cancer is a genetic disease resulting from abnormal gene expression. This alteration of gene expression can result from DNA instability, such as translocation, amplification, deletion or point mutation. A large amplification or deletion of a chromosome region can be readily detected by two methods: loss of heterozygosity (LOH) and comparative genomic hybridization (CGH). Differences in gene expression patterns can be monitored by high-throughput microarray analysis. These technologies have produced an enormous amount of data, and the data pool continues to grow at a remarkable rate. To help investigators mine useful information from these data, new data storage and analysis tools must be developed. Two value-added databases are constructed for this purpose. The first contains information on genes in the unstable regions of cancer cells, based on data accumulated from LOH and CGH experiments; the second contains cancer cell gene expression profiles obtained by microarray analysis. An automatic system that retrieves gene information of interest, compares it with known databases, analyzes and predicts protein functions, and groups genes with the same function will be integrated into the database circuit. An automatic update system will be installed and run once the two databases are set up. The system also retains the possibility of modifying existing data and accepting new data obtained with new techniques. Our goal is to help biologists find the needles in a haystack, that is, to find the real cancer-related genes (oncogenes or tumor suppressor genes) for further research.
Citations: 2
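The mining step described in the abstract above (cross-referencing genes in unstable regions with expression profiles, then grouping by function) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the record layout, the placeholder gene names, the coordinates, the fold-change values and the 2-fold threshold are hypothetical and do not come from the paper or from any real database.

```python
from collections import defaultdict

# Hypothetical unstable regions from LOH/CGH data: (chromosome, start, end).
unstable_regions = [
    ("chr17", 7_000_000, 8_000_000),
    ("chr13", 32_000_000, 33_000_000),
]

# Hypothetical gene annotations (placeholder symbols, not real genes).
genes = [
    {"symbol": "GENE_A", "chrom": "chr17", "start": 7_500_000, "end": 7_520_000, "function": "transcription regulation"},
    {"symbol": "GENE_B", "chrom": "chr13", "start": 32_500_000, "end": 32_580_000, "function": "DNA repair"},
    {"symbol": "GENE_C", "chrom": "chr1", "start": 1_000_000, "end": 1_050_000, "function": "cell adhesion"},
    {"symbol": "GENE_D", "chrom": "chr13", "start": 32_700_000, "end": 32_760_000, "function": "DNA repair"},
    {"symbol": "GENE_E", "chrom": "chr17", "start": 7_900_000, "end": 7_950_000, "function": "transcription regulation"},
]

# Hypothetical tumour/normal expression ratios from microarray data.
fold_change = {"GENE_A": 0.3, "GENE_B": 1.1, "GENE_C": 4.0, "GENE_D": 3.5, "GENE_E": 2.5}


def in_unstable_region(gene, regions):
    """True if the gene overlaps any region on the same chromosome."""
    return any(
        gene["chrom"] == chrom and gene["start"] <= end and gene["end"] >= start
        for chrom, start, end in regions
    )


def candidate_genes(genes, regions, fold_change, threshold=2.0):
    """Keep genes that lie in an unstable region and whose expression
    changes by at least `threshold`-fold in either direction."""
    selected = []
    for gene in genes:
        ratio = fold_change.get(gene["symbol"])
        if ratio is None:
            continue
        changed = ratio >= threshold or ratio <= 1.0 / threshold
        if changed and in_unstable_region(gene, regions):
            selected.append(gene)
    return selected


def group_by_function(genes):
    """Group gene symbols by their annotated function."""
    groups = defaultdict(list)
    for gene in genes:
        groups[gene["function"]].append(gene["symbol"])
    return dict(groups)


candidates = candidate_genes(genes, unstable_regions, fold_change)
groups = group_by_function(candidates)
```

Requiring both lines of evidence is one possible design choice; a real system could equally rank genes by a combined score instead of applying hard filters.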