Proceedings. International Conference on Intelligent Systems for Molecular Biology: Latest Articles

UTR reconstruction and analysis using genomically aligned EST sequences.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 2000-01-01

Z Kan, W Gish, E Rouchka, J Glasscock, D States

Untranslated regions (UTR) play important roles in the posttranscriptional regulation of mRNA processing. There is a wealth of UTR-related information to be mined from the rapidly accumulating EST collections. A computational tool, UTR-extender, has been developed to infer UTR sequences from genomically aligned ESTs. It can completely and accurately reconstruct 72% of the 3' UTRs and 15% of the 5' UTRs when tested using 908 functionally cloned transcripts. In addition, it predicts extensions for 11% of the 5' UTRs and 28% of the 3' UTRs. These extension regions are validated by examining splicing frequencies and conservation levels. We also developed a method called polyadenylation site scan (PASS) to precisely map polyadenylation sites in human genomic sequences. A PASS analysis of 908 genic regions estimates that 40-50% of human genes undergo alternative polyadenylation. Using EST redundancy to assess expression levels, we also find that genes with short 3' UTRs tend to be highly expressed.

Citations: 0

Accelerating protein classification using suffix trees.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 2000-01-01

B Dorohonceanu, C G Nevill-Manning

Position-specific scoring matrices have been used extensively to recognize highly conserved protein regions. We present a method for accelerating these searches using a suffix tree data structure computed from the sequences to be searched. Building on earlier work that allows evaluation of a scoring matrix to be stopped early, the suffix tree-based method excludes many protein segments from consideration at once by pruning entire subtrees. Although suffix trees are usually expensive in space, the fact that scoring matrix evaluation requires an in-order traversal allows nodes to be stored more compactly without loss of speed, and our implementation requires only 17 bytes of primary memory per input symbol. Searches are accelerated by up to a factor of ten.
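
The pruning idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a fixed-depth trie of sequence windows rather than a compact suffix tree, and the names `build_window_trie` and `search`, the dictionary-based matrix layout, and the threshold rule are all assumptions made for illustration. A subtree is skipped as soon as the partial score plus the best score still attainable from the remaining matrix columns cannot reach the threshold.

```python
def build_window_trie(seqs, width):
    """Index every length-`width` window of every sequence in a nested-dict trie.

    Leaves store (sequence index, start offset) pairs under the key "$".
    """
    root = {}
    for si, seq in enumerate(seqs):
        for start in range(len(seq) - width + 1):
            node = root
            for ch in seq[start:start + width]:
                node = node.setdefault(ch, {})
            node.setdefault("$", []).append((si, start))
    return root


def search(trie, pssm, threshold):
    """Return sorted (sequence index, start, score) hits scoring >= threshold.

    `pssm` is a list of per-column dicts mapping a symbol to its score.
    """
    width = len(pssm)
    # best_rest[d] is the maximum score attainable from columns d..width-1.
    best_rest = [0.0] * (width + 1)
    for d in range(width - 1, -1, -1):
        best_rest[d] = best_rest[d + 1] + max(pssm[d].values())

    hits = []

    def walk(node, depth, score):
        if depth == width:
            hits.extend((si, start, score) for si, start in node["$"])
            return
        for ch, child in node.items():
            s = score + pssm[depth].get(ch, float("-inf"))
            # Prune the whole subtree if even a perfect remainder falls short.
            if s + best_rest[depth + 1] >= threshold:
                walk(child, depth + 1, s)

    walk(trie, 0, 0.0)
    return sorted(hits)
```

Because windows that share a prefix share a path in the trie, one failed bound check discards all of them at once, which is the source of the speed-up the abstract describes. The memory figure quoted in the abstract depends on the authors' compact node layout and is not reproduced by this nested-dict sketch.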

Citations: 0

Computation and visualization of degenerate repeats in complete genomes.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 2000-01-01

S Kurtz, E Ohlebusch, C Schleiermacher, J Stoye, R Giegerich

The repetitive structure of genomic DNA holds many secrets to be discovered. A systematic study of repetitive DNA on a genomic or inter-genomic scale requires extensive algorithmic support. The REPuter family of programs described herein was designed to serve as a fundamental tool in such studies. Efficient and complete detection of various types of repeats is provided together with an evaluation of significance, interactive visualization, and simple interfacing to other analysis programs.

Citations: 0

Matching protein beta-sheet partners by feedforward and recurrent neural networks.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 2000-01-01

P Baldi, G Pollastri, C A Andersen, S Brunak

Predicting the secondary structure (alpha-helices, beta-sheets, coils) of proteins is an important step towards understanding their three-dimensional conformations. Unlike alpha-helices, which are built up from one contiguous region of the polypeptide chain, beta-sheets are more complex, resulting from a combination of two or more disjoint regions. The exact nature of these long-distance interactions remains unclear. Here we introduce two neural-network-based methods for the prediction of amino acid partners in parallel as well as anti-parallel beta-sheets. The neural architectures predict whether two residues located at the center of two distant windows are paired or not in a beta-sheet structure. Variations on these architectures, including profiles and ensembles, are trained and tested via five-fold cross-validation using a large corpus of curated data. Prediction on both coupled and non-coupled residues currently approaches 84% accuracy, better than any previously reported method.

Citations: 0

Pattern recognition of genomic features with microarrays: site typing of Mycobacterium tuberculosis strains.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 2000-01-01

S Raychaudhuri, J M Stuart, X Liu, P M Small, R B Altman

Mycobacterium tuberculosis (M. tb.) strains differ in the number and locations of a transposon-like insertion sequence known as IS6110. Accurate detection of this sequence can be used as a fingerprint for individual strains, but can be difficult because of noisy data. In this paper, we propose a non-parametric discriminant analysis method for predicting the locations of the IS6110 sequence from microarray data. Polymerase chain reaction extension products generated from primers specific for the insertion sequence are hybridized to a microarray containing targets corresponding to each open reading frame in M. tb. To test for insertion sites, we use microarray intensity values extracted from small windows of contiguous open reading frames. Rank-transformation of spot intensities and first-order differences in local windows provide enough information to reliably determine the presence of an insertion sequence. The nonparametric approach outperforms all other methods tested in this study.
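
The two features named in this abstract, rank-transformed spot intensities and first-order differences within a local window, can be sketched as follows. This is only an illustration under stated assumptions: ranking is done within each window, ties are broken by position, and the function names, the `half_width` parameter, and the example intensities are hypothetical. The abstract does not specify these details, and the discriminant step itself is not shown.

```python
def rank_transform(values):
    """Replace each value by its 1-based rank (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def window_features(intensities, center, half_width):
    """Return (ranks, first-order differences of ranks) for a local window.

    The window covers `half_width` open reading frames on each side of
    `center`, truncated at the ends of the list.
    """
    lo = max(0, center - half_width)
    hi = min(len(intensities), center + half_width + 1)
    ranks = rank_transform(intensities[lo:hi])
    diffs = [b - a for a, b in zip(ranks, ranks[1:])]
    return ranks, diffs


# Hypothetical spot intensities for seven contiguous open reading frames.
example = [10, 12, 11, 90, 85, 13, 9]
ranks, diffs = window_features(example, center=3, half_width=2)
```

Working on ranks rather than raw intensities makes the features insensitive to the overall brightness of an array, which is one plausible reason a rank-based approach copes with noisy data.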

Citations: 0

CLICK: a clustering algorithm with applications to gene expression analysis.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 2000-01-01

R Sharan, R Shamir

Novel DNA microarray technologies enable the monitoring of expression levels of thousands of genes simultaneously. This allows a global view of the transcription levels of many (or all) genes when the cell undergoes specific conditions or processes. Analyzing gene expression data requires the clustering of genes into groups with similar expression patterns. We have developed a novel clustering algorithm, called CLICK, which is applicable to gene expression analysis as well as to other biological applications. No prior assumptions are made about the structure or the number of clusters. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups of highly similar elements (kernels), which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clustering. CLICK has been implemented and tested on a variety of biological datasets, ranging from gene expression and cDNA oligo-fingerprinting to protein sequence similarity. In all those applications it outperformed extant algorithms according to several common figures of merit. CLICK is also very fast, allowing clustering of thousands of elements in minutes, and over 100,000 elements in a couple of hours on a regular workstation.

Volume 8, pp. 307-16.

Citations: 0

Prediction of the number of residue contacts in proteins.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 2000-01-01

P Fariselli, R Casadio

Knowing the number of residue contacts in a protein is crucial for deriving constraints useful in modeling protein folding and/or scoring remote homology searches. Here we focus on the prediction of residue contacts and show that this figure can be predicted with a neural-network-based method. The accuracy of the prediction is 12 percentage points higher than that of a simple statistical method. The neural network is used to discriminate between two different states of residue contacts, characterized by a contact number higher or lower than the average value of the residue distribution. When evolutionary information is taken into account, our method correctly predicts 69% of the residue states in the database and adds to the prediction of residue solvent accessibility. The predictor is available at http://www.biocomp.unibo.it
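
The two-state target described in this abstract can be illustrated with a short sketch that derives labels from known coordinates. It shows only how such labels might be computed, not the neural network predictor. The distance cutoff, the minimum sequence separation, the use of a single global mean as the threshold, and the function names are all assumptions, since the abstract gives none of these details.

```python
import math


def contact_numbers(coords, cutoff=6.5, min_separation=3):
    """Count, for each residue, the residues within `cutoff` of it.

    `coords` holds one (x, y, z) point per residue. Pairs closer than
    `min_separation` positions along the chain are ignored. Both default
    values are illustrative assumptions.
    """
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


def two_state_labels(counts):
    """Label a residue 1 if its contact number exceeds the mean, else 0."""
    mean = sum(counts) / len(counts)
    return [1 if c > mean else 0 for c in counts]
```

A classifier trained on sequence-derived inputs would then be evaluated against labels of this kind; the 69% figure in the abstract refers to the authors' own data set and definitions.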

Citations: 0

Combinatorial approaches to finding subtle signals in DNA sequences.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 2000-01-01

P A Pevzner, S H Sze

Signal finding (pattern discovery in unaligned DNA sequences) is a fundamental problem in both computer science and molecular biology with important applications in locating regulatory sites and drug target identification. Despite many studies, this problem is far from being resolved: most signals in DNA sequences are so complicated that we don't yet have good models or reliable algorithms for their recognition. We complement existing statistical and machine learning approaches to this problem by a combinatorial approach that proved to be successful in identifying very subtle signals.

Citations: 0

Search for a new description of protein topology and local structure.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 2000-01-01

L Jaroszewski, A Godzik

A novel description of protein structure in terms of generalized secondary structure elements (GSSEs) is proposed. GSSEs are defined as fragments of the protein structure where the chain does not radically change its direction. In this new language, global protein topology becomes a particular arrangement of a relatively small number of large, rod-like GSSEs. Protein topology can be described by an adjacency matrix that records which GSSEs are close to each other in space and defines a graph in which GSSEs correspond to vertices and interactions between them to edges. The information about the local structure is translated into the local density of pseudo-Cα atoms along the chain and the curvature of the chain. This new description has a number of interesting and useful features. For instance, enumeration theorems of graph theory can be used to estimate the number of possible topologies for a protein built from a given number of elements. Different topologies, including novel ones, can be generated from the known ones by various permutations of elements. Many new regularities in protein structures suddenly become visible in the new description. The new local structure description is more amenable to prediction and easier to use in fold prediction.
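
The adjacency-matrix view of topology in this abstract can be sketched in a few lines. This is a minimal illustration, assuming each GSSE is given as a list of 3-D points and that two elements interact when any pair of their points lies within a distance cutoff. The cutoff value, the closeness rule, and the function names are hypothetical and are not taken from the paper.

```python
import math


def adjacency_matrix(elements, cutoff=10.0):
    """Build a symmetric 0/1 matrix marking which elements are close in space.

    `elements` is a list of elements, each a list of (x, y, z) points.
    """
    n = len(elements)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            close = any(
                math.dist(p, q) <= cutoff
                for p in elements[i]
                for q in elements[j]
            )
            if close:
                adj[i][j] = adj[j][i] = 1
    return adj


def edges(adj):
    """List the graph edges (i, j) with i < j encoded by the matrix."""
    n = len(adj)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i][j]]
```

Once the topology is held as a graph, comparing folds or enumerating alternative arrangements of the same elements becomes a question about graphs with a fixed number of vertices, which is where the enumeration theorems mentioned in the abstract apply.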

Volume 8, pp. 211-7.

Citations: 0

Small subunit ribosomal RNA modeling using stochastic context-free grammars.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 2000-01-01

M P Brown

We introduce a model based on stochastic context-free grammars (SCFGs) that can construct small subunit ribosomal RNA (SSU rRNA) multiple alignments. The method takes into account both primary sequence and secondary structure base-pairing interactions. We show that this method produces multiple alignments of quality close to hand-edited ones and outperforms several other methods. We also introduce a method of SCFG constraints that dramatically reduces the computer resources needed to use SCFGs effectively on large problems such as SSU rRNA. Without such constraints, the resource requirements are infeasible for most computers. This work has applications to fields such as phylogenetic tree construction.

Citations: 0
