Proceedings. International Conference on Intelligent Systems for Molecular Biology: latest articles

A new fast heuristic for computing the breakpoint phylogeny and experimental phylogenetic analyses of real and synthetic data.
M E Cosner, R K Jansen, B M Moret, L A Raubeson, L S Wang, T Warnow, S Wyman

The breakpoint phylogeny is an optimization problem proposed by Blanchette et al. for reconstructing evolutionary trees from gene order data. These same authors also developed and implemented BPAnalysis [3], a heuristic method (based upon solving many instances of the travelling salesman problem) for estimating the breakpoint phylogeny. We present a new heuristic for this purpose; although not polynomial-time, our heuristic is much faster in practice than BPAnalysis. We present and discuss the results of experimentation on synthetic datasets and on the flowering plant family Campanulaceae with three methods: our new method, BPAnalysis, and the neighbor-joining method [25] using several distance estimation techniques. Our preliminary results indicate that, on datasets with slow evolutionary rates and large numbers of genes in comparison with the number of taxa (genomes), all methods recover quite accurate reconstructions of the true evolutionary history (although BPAnalysis is too slow to be practical), but that on datasets where the rate of evolution is high relative to the number of genes, the accuracy of all three methods is poor.
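
As a rough illustration of the quantity that the breakpoint phylogeny is built on, the sketch below counts breakpoints between two signed, circular gene orders. It is neither BPAnalysis nor the authors' new heuristic; the function names and the toy gene orders are assumptions made for this example.

```python
def adjacencies(genome):
    """Return all signed adjacencies of a circular gene order."""
    n = len(genome)
    adj = set()
    for i in range(n):
        a, b = genome[i], genome[(i + 1) % n]
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read from the opposite strand
    return adj


def breakpoint_distance(g1, g2):
    """Count adjacencies of g1 that are not preserved in g2."""
    if sorted(abs(g) for g in g1) != sorted(abs(g) for g in g2):
        raise ValueError("both genomes must contain the same genes")
    adj2 = adjacencies(g2)
    n = len(g1)
    return sum((g1[i], g1[(i + 1) % n]) not in adj2 for i in range(n))


# Toy data (hypothetical): genes 2 and 3 are inverted as a block.
reference = [1, 2, 3, 4, 5]
inverted = [1, -3, -2, 4, 5]
print(breakpoint_distance(reference, inverted))  # 2
```

A single inversion creates two breakpoints, one at each end of the inverted segment, which is why the toy pair above has distance 2.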

Published: 2000-01-01. Type: Journal Article. Source: Semanticscholar, paper ID 21812143.
Citations: 0
Genes, themes and microarrays: using information retrieval for large-scale gene analysis.
H Shatkay, S Edwards, W J Wilbur, M Boguski

The immense volume of data resulting from DNA microarray experiments, accompanied by an increase in the number of publications discussing gene-related discoveries, presents a major data analysis challenge. Current methods for genome-wide analysis of expression data typically rely on cluster analysis of gene expression patterns. Clustering indeed reveals potentially meaningful relationships among genes, but cannot explain the underlying biological mechanisms. In an attempt to address this problem, we have developed a new approach for utilizing the literature in order to establish functional relationships among genes on a genome-wide scale. Our method is based on revealing coherent themes within the literature, using a similarity-based search in document space. Content-based relationships among abstracts are then translated into functional connections among genes. We describe preliminary experiments applying our algorithm to a database of documents discussing yeast genes. A comparison of the produced results with well-established yeast gene functions demonstrates the effectiveness of our approach.
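
To make the idea of a similarity-based search in document space concrete, here is a minimal sketch that ranks documents by cosine similarity of raw term counts. The abstract does not specify the similarity measure, so cosine similarity is a swapped-in stand-in; the function names and the three toy documents are invented for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(query_id, docs, k):
    """Return the ids of the k documents most similar to docs[query_id]."""
    q = tf_vector(docs[query_id])
    scored = [(cosine(q, tf_vector(text)), doc_id)
              for doc_id, text in docs.items() if doc_id != query_id]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Toy "abstracts", one per hypothetical gene (not real data).
docs = {
    "geneA": "heat shock protein folding chaperone",
    "geneB": "chaperone protein folding stress",
    "geneC": "ribosome translation rna",
}
print(nearest("geneA", docs, 1))  # ['geneB']
```

In this toy setting, two genes would be considered functionally related when the documents retrieved for them overlap in content, which mirrors the step of translating relationships among abstracts into connections among genes.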

Published: 2000-01-01. Type: Journal Article. Source: Semanticscholar, paper ID 21812563.
Citations: 0
Mining for putative regulatory elements in the yeast genome using gene expression data.
J Vilo, A Brazma, I Jonassen, A Robinson, E Ukkonen

We have developed a set of methods and tools for automatic discovery of putative regulatory signals in genome sequences. The analysis pipeline consists of gene expression data clustering, sequence pattern discovery from upstream sequences of genes, a control experiment for pattern significance threshold limit detection, selection of interesting patterns, grouping of these patterns, representing the pattern groups in a concise form and evaluating the discovered putative signals against existing databases of regulatory signals. The pattern discovery is computationally the most expensive and crucial step. Our tool performs a rapid exhaustive search for a priori unknown statistically significant sequence patterns of unrestricted length. The statistical significance is determined for a set of sequences in each cluster with respect to a set of background sequences, allowing the detection of subtle regulatory signals specific for each cluster. The potentially large number of significant patterns is reduced to a small number of groups by clustering them by mutual similarity. Automatically derived consensus patterns of these groups represent the results in a comprehensive way for a human investigator. We have performed a systematic analysis for the yeast Saccharomyces cerevisiae. We created a large number of independent clusterings of expression data, simultaneously assessing the "goodness" of each cluster. For each of the over 52,000 clusters acquired in this way, we discovered significant patterns in the upstream sequences of the respective genes. We selected nearly 1,500 significant patterns by formal criteria and matched them against the experimentally mapped transcription factor binding sites in the SCPD database. We clustered the 1,500 patterns into 62 groups, for which we automatically derived alignments and consensus patterns. Of these 62 groups, 48 had patterns with matching sites in the SCPD database.
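
The significance step, scoring a pattern in a cluster against background sequences, can be sketched with a hypergeometric tail probability. The abstract does not say which statistic the authors use, so the hypergeometric test is an assumption, as are the function names, the restriction to fixed substrings, and the toy sequences.

```python
from math import comb


def count_with_pattern(sequences, pattern):
    """Number of sequences that contain the pattern at least once."""
    return sum(pattern in s for s in sequences)


def enrichment_p_value(cluster, background, pattern):
    """P(at least k hits in the cluster) if the cluster were a random
    draw from the background. Assumes cluster is a subset of background."""
    N, n = len(background), len(cluster)
    K = count_with_pattern(background, pattern)
    k = count_with_pattern(cluster, pattern)
    favorable = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(n, K) + 1))
    return favorable / comb(N, n)


# Toy upstream sequences (invented): the cluster holds every TGACTC hit.
cluster = ["AATGACTCAA", "TGACTCGG", "CCTGACTC"]
background = cluster + ["ACGTACGT", "GGGGCCCC", "TTTTAAAA", "CACACACA",
                        "GTGTGTGT", "AGAGAGAG", "TCTCTCTC"]
print(enrichment_p_value(cluster, background, "TGACTC"))  # 1/120
```

A small probability suggests the pattern is over-represented in the cluster relative to the background; a real pipeline would also need a threshold, which the abstract derives from a control experiment.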

Published: 2000-01-01. Type: Journal Article. Source: Semanticscholar, paper ID 21813098.
Citations: 0
Towards a systematics for protein subcellular location: quantitative description of protein localization patterns and automated analysis of fluorescence microscope images.
R F Murphy, M V Boland, M Velliste

Determination of the functions of all expressed proteins represents one of the major upcoming challenges in computational molecular biology. Since subcellular location plays a crucial role in protein function, the availability of systems that can predict location from sequence or high-throughput systems that determine location experimentally will be essential to the full characterization of expressed proteins. The development of prediction systems is currently hindered by an absence of training data that adequately captures the complexity of protein localization patterns. What is needed is a systematics for the subcellular locations of proteins. This paper describes an approach to the quantitative description of protein localization patterns using numerical features and the use of these features to develop classifiers that can recognize all major subcellular structures in fluorescence microscope images. Such classifiers provide a valuable tool for experiments aimed at determining the subcellular distributions of all expressed proteins. The features also have application in automated interpretation of imaging experiments, such as the selection of representative images or the rigorous statistical comparison of protein distributions under different experimental conditions. A key conclusion is that, at least in certain cases, these automated approaches are better able to distinguish similar protein localization patterns than human observers.

Published: 2000-01-01. Type: Journal Article. Source: Semanticscholar, paper ID 21811351.
Citations: 0
Glimmers in the midnight zone: characterization of aligned identical residues in sequence-dissimilar proteins sharing a common fold.
I Friedberg, T Kaplan, H Margalit

Sequence comparison of proteins that adopt the same fold has revealed a large degree of sequence variation. There are many pairs of structurally similar proteins with only a very low percentage of identical residues at structurally aligned positions. It is not clear whether these few identical residues have been conserved just by coincidence, or due to their structural and/or functional role. The current study focuses on characterization of STructurally Aligned Identical ResidueS (STAIRS) in a data set of protein pairs that are structurally similar but sequentially dissimilar. The conservation pattern of the residues at structurally aligned positions has been characterized within the protein families of the two pair members, and mutually highly and weakly conserved positions of STAIRS could be identified. About 40% of the STAIRS are only moderately conserved, suggesting that their maintenance may have been coincidental. The mutually highly conserved STAIRS show distinct features that are associated with protein structure and function: a relatively high fraction of these STAIRS are buried within their protein structures. Glycine, cysteine, histidine, and tryptophan are significantly over-represented among the mutually conserved STAIRS. A detailed survey of these STAIRS reveals residue-specific roles in the determination of the protein's structure and function.
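
For readers who want to see what identical residues at structurally aligned positions means operationally, the following sketch extracts them from a pairwise alignment given as two gapped strings. The representation (equal-length strings with "-" for gaps), the function names, and the toy alignment are assumptions; the study itself works from structural alignments and family conservation profiles.

```python
def stairs(aligned_a, aligned_b, gap="-"):
    """Return (column, residue) for every aligned column where both
    proteins carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return [(i, a) for i, (a, b) in enumerate(zip(aligned_a, aligned_b))
            if a == b and a != gap]


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical residues as a percentage of gap-free aligned columns."""
    aligned = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a != gap and b != gap)
    if aligned == 0:
        return 0.0
    return 100.0 * len(stairs(aligned_a, aligned_b, gap)) / aligned


# Toy alignment (invented): 2 identical residues in 3 gap-free columns.
print(stairs("GA-CW", "GTHC-"))  # [(0, 'G'), (3, 'C')]
```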

Published: 2000-01-01. Type: Journal Article. Source: Semanticscholar, paper ID 21812149.
Citations: 0
An exact algorithm to identify motifs in orthologous sequences from multiple species.
M Blanchette, B Schwikowski, M Tompa

The identification of sequence motifs is a fundamental method for suggesting good candidates for biologically functional regions such as promoters, splice sites, binding sites, etc. We investigate the following approach to identifying motifs: given a collection of orthologous sequences from multiple species related by a known phylogenetic tree, search for motifs that are well conserved (according to a parsimony measure) in the species. We present an exact algorithm for solving this problem. We then discuss experimental results on finding promoters of the rbcS gene for a family of 10 plants, on finding promoters of the adh gene for 12 Drosophila species, and on finding promoters of several chloroplast-encoded genes.
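
One way to read the problem statement is as a dynamic program over the tree: pick one k-mer per leaf sequence and label the internal nodes so that the total number of substitutions along the edges is minimal. The sketch below implements that reading for small k with Hamming distance as the parsimony cost. It is a simplified illustration under these assumptions, not the authors' algorithm, and the tree encoding, names, and toy sequences are invented.

```python
from itertools import product

INF = float("inf")


def hamming(s, t):
    """Number of mismatching positions between two equal-length strings."""
    return sum(a != b for a, b in zip(s, t))


def substring_parsimony(tree, sequences, k, root="root"):
    """tree maps each internal node to its children; leaves are the keys
    of `sequences`. Returns (best score, root k-mers achieving it)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]

    def table(node):
        if node in sequences:  # leaf: cost 0 for k-mers that occur in it
            seq = sequences[node]
            present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
            return {s: (0 if s in present else INF) for s in kmers}
        child_tables = [table(child) for child in tree[node]]
        # cost of labeling this node s: best label for each child subtree
        return {s: sum(min(ct[t] + hamming(s, t) for t in kmers)
                       for ct in child_tables)
                for s in kmers}

    root_table = table(root)
    best = min(root_table.values())
    return best, sorted(s for s, v in root_table.items() if v == best)


# Toy tree and sequences (hypothetical): all three species share ACG.
tree = {"root": ["anc", "sp3"], "anc": ["sp1", "sp2"]}
sequences = {"sp1": "TTACGTT", "sp2": "GGACGGG", "sp3": "CACGC"}
print(substring_parsimony(tree, sequences, 3))  # (0, ['ACG'])
```

The table for each node has 4^k entries, so this brute-force form is only practical for short motifs; it is meant to show the recurrence, not to be efficient.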

Published: 2000-01-01. Type: Journal Article. Source: Semanticscholar, paper ID 21812195.
Citations: 0
Robust parametric and semi-parametric spot fitting for spot array images.
N Brändle, H Y Chen, H Bischof, H Lapp

In this paper we address the problem of reliably fitting parametric and semi-parametric models to spots in high density spot array images obtained in gene expression experiments. The goal is to measure the amount of label bound to an array element. A lot of spots can be modelled accurately by a Gaussian shape. In order to deal with highly overlapping spots we use robust M-estimators. When the parametric method fails (which can be detected automatically) we use a novel, robust semi-parametric method which can handle spots of different shapes accurately. The introduced techniques are evaluated experimentally.
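
To show why a robust M-estimator helps when neighbouring spots contaminate the data, here is a one-dimensional stand-in: a Huber M-estimate of location computed by iteratively reweighted least squares. The paper fits Gaussian spot shapes to images, which this sketch does not attempt; the tuning constant 1.345, the MAD-based scale, the function name, and the toy intensities are all assumptions chosen for illustration.

```python
from statistics import median


def huber_location(values, c=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging."""
    est = median(values)
    # Robust scale from the median absolute deviation (MAD).
    scale = 1.4826 * median([abs(v - est) for v in values])
    if scale == 0:
        return est
    for _ in range(max_iter):
        # Full weight near the estimate, down-weighted beyond c * scale.
        weights = [1.0 if abs(v - est) <= c * scale
                   else c * scale / abs(v - est)
                   for v in values]
        new_est = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_est - est) < tol:
            return new_est
        est = new_est
    return est


# Toy pixel intensities (invented): one value is contaminated by a neighbour.
intensities = [10.0, 10.2, 9.8, 10.1, 9.9, 50.0]
plain_mean = sum(intensities) / len(intensities)
print(plain_mean, huber_location(intensities))
```

The plain mean is pulled far towards the contaminated value, whereas the Huber estimate stays close to the bulk of the data; the same down-weighting idea carries over to fitting a parametric spot model.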

Published: 2000-01-01. Type: Journal Article. Source: Semanticscholar, paper ID 21812196.
Citations: 0
An evaluation of ontology exchange languages for bioinformatics.
R McEntire, P Karp, N Abernethy, D Benton, G Helt, M DeJongh, R Kent, A Kosky, S Lewis, D Hodnett, E Neumann, F Olken, D Pathak, P Tarczy-Hornoch, L Toldo, T Topaloglou

Ontologies are specifications of the concepts in a given field, and of the relationships among those concepts. The development of ontologies for molecular-biology information and the sharing of those ontologies within the bioinformatics community are central problems in bioinformatics. If the bioinformatics community is to share ontologies effectively, ontologies must be exchanged in a form that uses standardized syntax and semantics. This paper reports on an effort among the authors to evaluate alternative ontology-exchange languages, and to recommend one or more languages for use within the larger bioinformatics community. The study selected a set of candidate languages, and defined a set of capabilities that the ideal ontology-exchange language should satisfy. The study scored the languages according to the degree to which they satisfied each capability. In addition, the authors performed several ontology-exchange experiments with the two languages that received the highest scores: OML and Ontolingua. The result of those experiments, and the main conclusion of this study, was that the frame-based semantic model of Ontolingua is preferable to the conceptual graph model of OML, but that the XML-based syntax of OML is preferable to the Lisp-based syntax of Ontolingua.

Published: 2000-01-01. Type: Journal Article. Source: Semanticscholar, paper ID 21811350.
Citations: 0
Biclustering of expression data.
Y Cheng, G M Church

An efficient node-deletion algorithm is introduced to find submatrices in expression data that have low mean squared residue scores and it is shown to perform well in finding co-regulation patterns in yeast and human. This introduces "biclustering", or simultaneous clustering of both genes and conditions, to knowledge discovery from expression data. This approach overcomes some problems associated with traditional clustering methods, by allowing automatic discovery of similarity based on a subset of attributes, simultaneous clustering of genes and conditions, and overlapped grouping that provides a better representation for genes with multiple functions or regulated by many factors.
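
The mean squared residue score and a greedy node-deletion loop can be sketched as follows. The score follows the usual definition (each entry minus its row mean and column mean, plus the submatrix mean, squared and averaged); the deletion loop is a simplified single-node variant written for this example, and the function names, the threshold `delta`, and the toy matrix are assumptions.

```python
def residues(matrix, rows, cols):
    """Residue of every entry in the submatrix defined by rows x cols."""
    row_mean = {i: sum(matrix[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / len(rows) for j in cols}
    all_mean = sum(row_mean.values()) / len(rows)
    return {(i, j): matrix[i][j] - row_mean[i] - col_mean[j] + all_mean
            for i in rows for j in cols}


def mean_squared_residue(matrix, rows, cols):
    res = residues(matrix, rows, cols)
    return sum(r * r for r in res.values()) / len(res)


def delete_nodes(matrix, delta):
    """Drop the worst row or column until the score is at most delta."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    while (len(rows) > 1 and len(cols) > 1
           and mean_squared_residue(matrix, rows, cols) > delta):
        res = residues(matrix, rows, cols)
        row_score = {i: sum(res[i, j] ** 2 for j in cols) / len(cols)
                     for i in rows}
        col_score = {j: sum(res[i, j] ** 2 for i in rows) / len(rows)
                     for j in cols}
        worst_row = max(rows, key=row_score.get)
        worst_col = max(cols, key=col_score.get)
        if row_score[worst_row] >= col_score[worst_col]:
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)
    return rows, cols


# Toy expression matrix (invented): the last gene breaks the additive pattern.
expression = [[1, 2, 3], [2, 3, 4], [4, 5, 6], [9, 0, 5]]
print(delete_nodes(expression, 0.01))  # ([0, 1, 2], [0, 1, 2])
```

Rows 0 to 2 differ only by a constant offset, so their residues vanish and the score of the remaining submatrix drops to zero once the fourth row is removed.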

{"title":"Biclustering of expression data.","authors":"Y Cheng,&nbsp;G M Church","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>An efficient node-deletion algorithm is introduced to find submatrices in expression data that have low mean squared residue scores and it is shown to perform well in finding co-regulation patterns in yeast and human. This introduces \"biclustering\", or simultaneous clustering of both genes and conditions, to knowledge discovery from expression data. This approach overcomes some problems associated with traditional clustering methods, by allowing automatic discovery of similarity based on a subset of attributes, simultaneous clustering of genes and conditions, and overlapped grouping that provides a better representation for genes with multiple functions or regulated by many factors.</p>","PeriodicalId":79420,"journal":{"name":"Proceedings. International Conference on Intelligent Systems for Molecular Biology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2000-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"21812142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reducing mass degeneracy in SAR by MS by stable isotopic labeling.
C Bailey-Kellogg, J J Kelley, C Stein, B R Donald

Mass spectrometry (MS) promises to be an invaluable tool for functional genomics, by supporting low-cost, high-throughput experiments. However, large-scale MS faces the potential problem of mass degeneracy--indistinguishable masses for multiple biopolymer fragments (e.g. from a limited proteolytic digest). This paper studies the tasks of planning and interpreting MS experiments that use selective isotopic labeling, thereby substantially reducing potential mass degeneracy. Our algorithms support an experimental-computational protocol called Structure-Activity Relation by Mass Spectrometry (SAR by MS), for elucidating the function of protein-DNA and protein-protein complexes. SAR by MS enzymatically cleaves a crosslinked complex and analyzes the resulting mass spectrum for mass peaks of hypothesized fragments. Depending on binding mode, some cleavage sites will be shielded; the absence of anticipated peaks implicates corresponding fragments as either part of the interaction region or inaccessible due to conformational change upon binding. Thus different mass spectra provide evidence for different structure-activity relations. We address combinatorial and algorithmic questions in the areas of data analysis (constraining binding mode based on mass signature) and experiment planning (determining an isotopic labeling strategy to reduce mass degeneracy and aid data analysis). We explore the computational complexity of these problems, obtaining upper and lower bounds. We report experimental results from implementations of our algorithms.
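
To make the notion of mass degeneracy concrete, the following is a minimal sketch under stated assumptions, not the algorithms from the paper. It uses nominal integer residue masses for a handful of amino acids, treats two fragments as degenerate when their nominal masses are equal, and models selective labeling as a fixed hypothetical shift of 1 Da for every residue of a labeled type. The names `RESIDUE_MASS`, `fragment_mass`, and `degenerate_groups` are illustrative.

```python
from collections import defaultdict

# Nominal residue masses in Da for a few amino acids (toy table for illustration).
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "V": 99, "L": 113, "N": 114}


def fragment_mass(seq, labeled=frozenset(), shift=1):
    """Mass of a fragment; every residue whose type is in `labeled` gains `shift` Da.

    The per-residue shift is a hypothetical parameter, not a measured value.
    """
    return sum(RESIDUE_MASS[r] + (shift if r in labeled else 0) for r in seq)


def degenerate_groups(fragments, labeled=frozenset(), shift=1):
    """Return the groups of fragments that share exactly the same mass."""
    by_mass = defaultdict(list)
    for seq in fragments:
        by_mass[fragment_mass(seq, labeled, shift)].append(seq)
    return [group for group in by_mass.values() if len(group) > 1]


# Hypothetical fragments: "GGS" and "SN" both have nominal mass 201.
fragments = ["GGS", "SN", "AL", "GV"]
unlabeled = degenerate_groups(fragments)                  # -> [["GGS", "SN"]]
label_gly = degenerate_groups(fragments, labeled={"G"})   # -> []
label_ser = degenerate_groups(fragments, labeled={"S"})   # -> [["GGS", "SN"]]
```

Labeling glycine separates the two colliding fragments because they contain different numbers of glycine residues, whereas labeling serine shifts both by the same amount and leaves the collision in place. This is the kind of choice that the experiment-planning problem in the abstract is concerned with.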

Reducing mass degeneracy in SAR by MS by stable isotopic labeling. C Bailey-Kellogg, J J Kelley, C Stein, B R Donald. Proceedings. International Conference on Intelligent Systems for Molecular Biology. Publication date: 2000-01-01. Journal Article.
Citations: 0