Proceedings. International Conference on Intelligent Systems for Molecular Biology: Latest Articles

Analysis of gene expression microarrays for phenotype classification.
A Califano, G Stolovitzky, Y Tu

Several microarray technologies that monitor the expression levels of a large number of genes have recently emerged. Given DNA-microarray data for a set of cells characterized by a given phenotype and for a set of control cells, an important problem is to identify "patterns" of gene expression that can be used to predict cell phenotype. The potential number of such patterns is exponential in the number of genes. In this paper, we propose a solution to this problem based on a supervised learning algorithm, which differs substantially from previous schemes. It couples a complex, non-linear similarity metric, which maximizes the probability of discovering discriminative gene expression patterns, with a pattern discovery algorithm called SPLASH. The latter efficiently and deterministically discovers all statistically significant gene expression patterns in the phenotype set. Statistical significance is evaluated based on the probability that a pattern occurs by chance in the control set. Finally, a greedy set covering algorithm is used to select an optimal subset of statistically significant patterns, which form the basis for a standard likelihood ratio classification scheme. We analyze data from 60 human cancer cell lines using this method, and compare our results with those of other supervised learning schemes. Different phenotypes are studied. These include cancer morphologies (such as melanoma), molecular targets (such as mutations in the p53 gene), and therapeutic targets related to sensitivity to anticancer compounds. We also analyze a synthetic data set that shows that this technique is especially well suited for the analysis of sub-phenotype mixtures. For complex phenotypes, such as p53, our method produces an encouragingly low rate of false positives and false negatives and seems to outperform the others. Similar low rates are reported when predicting the efficacy of experimental anticancer compounds. This counts among the first reported studies where drug efficacy has been successfully predicted from large-scale expression data analysis.
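
The greedy set covering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `greedy_set_cover`, the pattern names, and the sample ids are hypothetical, and each pattern is simply assumed to come with the set of phenotype samples it matches. Greedy selection is a heuristic, so the chosen subset is not guaranteed to be the smallest possible cover.

```python
def greedy_set_cover(universe, patterns):
    """Pick patterns until every coverable sample is covered.

    universe: set of sample ids (hypothetical phenotype samples).
    patterns: dict mapping a pattern name to the set of samples it matches.
    Returns (chosen pattern names in pick order, samples left uncovered).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered and patterns:
        # Choose the pattern that covers the most still-uncovered samples.
        best = max(patterns, key=lambda name: len(patterns[name] & uncovered))
        gained = patterns[best] & uncovered
        if not gained:
            break  # the remaining samples match no pattern at all
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered
```

For example, with samples 1 to 6 and patterns `p1 = {1, 2, 3}`, `p2 = {3, 4}`, `p3 = {4, 5, 6}`, `p4 = {1}`, the sketch picks `p1` and then `p3`, leaving nothing uncovered.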

Publication date: 2000-01-01
Citations: 0
A multiple alignment algorithm for metabolic pathway analysis using enzyme hierarchy.
Y Tohsato, H Matsuda, A Hashimoto

In many of the chemical reactions in living cells, enzymes act as catalysts in the conversion of certain compounds (substrates) into other compounds (products). Comparative analyses of the metabolic pathways formed by such reactions give important information on their evolution and on pharmacological targets (Dandekar et al. 1999). Each of the enzymes that constitute a pathway is classified according to the EC (Enzyme Commission) numbering system, which consists of four sets of numbers that categorize the type of the chemical reaction catalyzed. In this study, we consider that reaction similarities can be expressed by the similarities between EC numbers of the respective enzymes. Therefore, in order to find a common pattern among pathways, it is desirable to be able to use the functional hierarchy of EC numbers to express the reaction similarities. In this paper, we propose a multiple alignment algorithm utilizing information content that is extended to symbols having a hierarchical structure. The effectiveness of our method is demonstrated by applying the method to pathway analyses of sugar, DNA and amino acid metabolisms.
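
The idea that reaction similarity follows the EC hierarchy can be sketched in a few lines. This is a simplified stand-in, not the information-content score the abstract refers to: it only counts how many leading fields two four-field EC numbers share. The function names `ec_shared_levels` and `ec_similarity` are hypothetical.

```python
def ec_shared_levels(ec_a, ec_b):
    """Count the leading EC fields two enzyme numbers have in common (0 to 4)."""
    shared = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b:
            break  # the hierarchy diverges here; deeper fields are not comparable
        shared += 1
    return shared


def ec_similarity(ec_a, ec_b):
    """Scale the shared depth to the range 0.0 (different class) to 1.0 (identical)."""
    return ec_shared_levels(ec_a, ec_b) / 4
```

Under this sketch, EC 2.7.1.1 and EC 2.7.1.2 share three levels and score 0.75, while EC 1.1.1.1 and EC 2.7.1.1 differ at the top level and score 0.0.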

Publication date: 2000-01-01
Citations: 0
An Algorithm Combining Discrete and Continuous Methods for Optical Mapping
R. Karp, I. Pe’er, R. Shamir
Optical mapping is a novel technique for generating the restriction map of a DNA molecule by observing many single, partially digested, copies of it, using fluorescence microscopy. The real-life problem is complicated by numerous factors: false positive and false negative cut observations, inaccurate location measurements, unknown orientations and faulty molecules. We present an algorithm for solving the real-life problem. The algorithm combines continuous optimization and combinatorial algorithms, applied to a non-uniform discretization of the data. We present encouraging results on real experimental data.
DOI: 10.1089/106652701446189
Publication date: 1999-08-06
Citations: 1
A motion planning approach to flexible ligand binding.
A P Singh, J C Latombe, D L Brutlag

Most computational models of protein-ligand interactions consider only the energetics of the final bound state of the complex and do not examine the dynamics of the ligand as it enters the binding site. We have developed a novel technique for studying the dynamics of protein-ligand interactions based on motion planning algorithms from the field of robotics. Our algorithm uses electrostatic and van der Waals potentials to compute the most energetically favorable path between any given initial and goal ligand configurations. We use probabilistic motion planning to sample the distribution of possible paths to a given goal configuration and compute an energy-based "difficulty weight" for each path. By statistically averaging this weight over several randomly generated starting configurations, we compute the relative difficulty of entering and leaving a given binding configuration. This approach yields details of the energy contours around the binding site and can be used to characterize and predict good binding sites. Results from tests with three protein-ligand complexes indicate that our algorithm is able to detect energy barriers around the true binding site that distinguish this site from other predicted low-energy binding sites.

Publication date: 1999-01-01
Citations: 0
Automatic extraction of biological information from scientific text: protein-protein interactions.
C Blaschke, M A Andrade, C Ouzounis, A Valencia

We describe the basic design of a system for automatic detection of protein-protein interactions extracted from scientific abstracts. By restricting the problem domain and imposing a number of strong assumptions which include pre-specified protein names and a limited set of verbs that represent actions, we show that it is possible to perform accurate information extraction. The performance of the system is evaluated with different cases of real-world interaction networks, including the Drosophila cell cycle control. The results obtained computationally are in good agreement with current biological knowledge and demonstrate the feasibility of developing a fully automated system able to describe networks of protein interactions with sufficient accuracy.
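
The two assumptions named in the abstract, a pre-specified list of protein names and a limited set of action verbs, suggest a very simple extraction rule. The sketch below is only an illustration of that idea and not the system described: the function `extract_interactions`, the tokenization, and the rule of pairing the nearest protein name on each side of a verb are all assumptions, and the protein names in the example are placeholders.

```python
import re


def extract_interactions(sentence, proteins, verbs):
    """Return (protein, verb, protein) triples found in one sentence.

    proteins: set of known protein names (matched case-sensitively).
    verbs: set of lower-case action verbs.
    """
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in verbs:
            continue
        # Nearest known protein name before and after the verb.
        left = [t for t in tokens[:i] if t in proteins]
        right = [t for t in tokens[i + 1:] if t in proteins]
        if left and right:
            hits.append((left[-1], tok.lower(), right[0]))
    return hits
```

Called on the sentence "ProtA binds ProtB in vitro." with `proteins = {"ProtA", "ProtB"}` and `verbs = {"binds"}`, the sketch yields the single triple `("ProtA", "binds", "ProtB")`. A sentence with no listed verb yields nothing.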

Publication date: 1999-01-01
Citations: 0
A linear time algorithm for finding all maximal scoring subsequences.
W L Ruzzo, M Tompa

Given a sequence of real numbers ("scores"), we present a practical linear time algorithm to find those nonoverlapping, contiguous subsequences having greatest total scores. This improves on the best previously known algorithm, which requires quadratic time in the worst case. The problem arises in biological sequence analysis, where the high-scoring subsequences correspond to regions of unusual composition in a nucleic acid or protein sequence. For instance, Altschul, Karlin, and others have used this approach to identify transmembrane regions, DNA binding domains, and regions of high charge in proteins.
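
A compact sketch of a list-merging procedure for this problem is given below. It scans the scores once, keeps a list of candidate segments with their cumulative totals on the left (`L`) and right (`R`) boundaries, and merges a new positive score into earlier segments when that raises the total. It is written for readability under stated assumptions: the backward search is a plain scan, so this version does not by itself guarantee the linear worst-case bound claimed in the abstract, and the function name and return format are hypothetical.

```python
def maximal_scoring_subsequences(scores):
    """Return (start, end, total) for each maximal segment; end is exclusive."""
    segments = []  # each entry: [start, end, L, R] with cumulative totals L and R
    total = 0
    for i, s in enumerate(scores):
        prev = total
        total += s
        if s <= 0:
            continue  # non-positive scores never start a segment
        cand = [i, i + 1, prev, total]
        while True:
            # Find the rightmost earlier segment whose left total is smaller.
            j = len(segments) - 1
            while j >= 0 and segments[j][2] >= cand[2]:
                j -= 1
            # No such segment, or it already reaches at least as high: keep cand.
            if j < 0 or segments[j][3] >= cand[3]:
                segments.append(cand)
                break
            # Otherwise extend cand leftwards over segment j and retry.
            cand = [segments[j][0], cand[1], segments[j][2], cand[3]]
            del segments[j:]
    return [(a, b, r - l) for a, b, l, r in segments]
```

For the scores `[4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]`, the sketch returns three segments: positions 0 to 1 (total 4), positions 2 to 3 (total 3), and positions 4 to 11 (total 7).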

Publication date: 1999-01-01
Citations: 0
ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences.
C Iseli, C V Jongeneel, P Bucher

One of the problems associated with the large-scale analysis of unannotated, low-quality EST sequences is the detection of coding regions and the correction of the frameshift errors that they often contain. We introduce a new type of hidden Markov model that explicitly deals with the possibility of errors in the sequence under analysis, and incorporates a method for correcting these errors. This model was implemented in an efficient and robust program, ESTScan. We show that ESTScan can detect and extract coding regions from low-quality sequences with high selectivity and sensitivity, and is able to accurately correct frameshift errors. In the framework of genome sequencing projects, ESTScan could become a very useful tool for gene discovery, for quality control, and for the assembly of contigs representing the coding regions of genes.

Publication date: 1999-01-01
Citations: 0
Protein fold class prediction: new methods of statistical classification.
J Grassmann, M Reczko, S Suhai, L Edler

Feed-forward neural networks are compared with standard and new statistical classification procedures for the classification of proteins. We applied logistic regression, an additive model, and projection pursuit regression from the methods based on a posteriori probabilities; linear, quadratic, and flexible discriminant analysis from the methods based on class conditional probabilities; and the K-nearest-neighbors classification rule. The apparent error rate obtained with the training sample (n = 143), the test error rate obtained with the test sample (n = 125), and the 10-fold cross-validation error were calculated. We conclude that some of the standard statistical methods are potent competitors to the more flexible tools of machine learning.
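
The difference between the apparent error rate and the test error rate can be shown with the K-nearest-neighbors rule, the simplest of the methods listed. The sketch below is a generic illustration with made-up two-dimensional points, not the study's data or code. The names `knn_predict` and `error_rate` are hypothetical, and Euclidean distance is an assumption.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to query.

    train: list of (feature_tuple, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def error_rate(train, samples, k=3):
    """Fraction of samples whose predicted label differs from the true label."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in samples)
    return wrong / len(samples)
```

Calling `error_rate(train, train)` gives the apparent error rate, which tends to be optimistic because each point is among its own neighbors. Calling `error_rate(train, test)` on held-out samples gives the test error rate.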

Publication date: 1999-01-01
Citations: 0
Crystallographic threading.
A Ableson, J I Glasgow

Crystallographic studies play a major role in current efforts towards protein structure determination. Despite recent advances in computational tools for molecular modeling and graphics, the construction of a three-dimensional protein backbone model from crystallographic data remains complex and time-consuming. This paper describes a unique contribution to an automated approach to protein model construction and evaluation, where a model is represented as an annotated trace (or partial trace) of a structure. Candidate models are derived through a topological analysis of the electron density map of a protein. Using sequence alignment techniques, we determine an optimal threading of the known sequence onto the candidate protein structure models. In this threading, connected nodes on the model are associated with adjacent amino acids in the sequence and a fitness score is assigned based on features extracted from the electron density map for the protein. Experimental results demonstrate that crystallographic threading provides an effective means for evaluating the "goodness" of experimentally derived protein models.

Publication date: 1999-01-01
Citations: 0
Analysis of ribosomal RNA sequences by combinatorial clustering.
P Xing, C Kulikowski, I Muchnik, I Dubchak, D M Wolf, S Spengler, M Zorn

We present an analysis of multi-aligned eukaryotic and procaryotic small subunit rRNA sequences using a novel segmentation and clustering procedure capable of extracting subsets of sequences that share common sequence features. This procedure consists of: i) segmentation of aligned sequences using a dynamic programming procedure, and subsequent identification of likely conserved segments; ii) for each putative conserved segment, extraction of a locally homogeneous cluster using a novel polynomial procedure; and iii) intersection of clusters associated with each conserved segment. Aside from their utility in processing large gap-filled multi-alignments, these algorithms can be applied to a broad spectrum of rRNA analysis functions such as subalignment, phylogenetic subtree extraction and construction, and organism tree-placement, and can serve as a framework to organize sequence data in an efficient and easily searchable manner. The sequence classification we obtained using the method presented here shows a remarkable consistency with the independently constructed eukaryotic phylogenetic tree.

Publication date: 1999-01-01
Citations: 0