Proceedings. IEEE Computational Systems Bioinformatics Conference: Latest Articles

High-throughput 3D structural homology detection via NMR resonance assignment.
Christopher James Langmead, Bruce Randall Donald

One goal of the structural genomics initiative is the identification of new protein folds. Sequence-based structural homology prediction methods are an important means of prioritizing unknown proteins for structure determination. However, an important challenge remains: two highly dissimilar sequences can have similar folds, so how can we detect this rapidly in the context of structural genomics? High-throughput NMR experiments, coupled with novel algorithms for data analysis, can address this challenge. We report an automated procedure, called HD, for detecting 3D structural homologies from sparse, unassigned protein NMR data. Our method identifies 3D models in a protein structural database whose geometries best fit the unassigned experimental NMR data. HD does not use, and is thus not limited by, sequence homology. The method can also be used to confirm or refute structural predictions made by other techniques such as protein threading or homology modelling. The algorithm runs in O(pn + pn^(5/2) log(cn) + p log p) time, where p is the number of proteins in the database, n is the number of residues in the target protein, and c is the maximum edge weight in an integer-weighted bipartite graph. Our experiments on real NMR data from 3 different proteins against a database of 4,500 representative folds demonstrate that the method identifies closely related protein folds, including sub-domains of larger proteins, with as little as 10-30% sequence homology between the target protein (or sub-domain) and the computed model. In particular, we report no false negatives or false positives despite significant percentages of missing experimental data.
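
The running-time bound above mentions an integer-weighted bipartite graph and a p log p term. One plausible reading, which is an inference from the abstract and not a description of HD itself, is that each database model is scored by a weighted bipartite matching and the models are then sorted by score. The Python sketch below illustrates only that general pattern. It uses exhaustive search as a stand-in for an efficient matching algorithm, and the function names, weight tables, and model names are all hypothetical.

```python
from itertools import permutations


def best_matching_score(weights):
    """Maximum total weight of a perfect matching, found by exhaustive search.

    weights[i][j] is a hypothetical integer score for pairing item i on one
    side of the bipartite graph with item j on the other side.
    """
    n = len(weights)
    return max(
        sum(weights[row][col] for row, col in enumerate(perm))
        for perm in permutations(range(n))
    )


def rank_models(weight_tables):
    """Return model names sorted by decreasing matching score."""
    scores = {name: best_matching_score(w) for name, w in weight_tables.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up weight tables for two hypothetical database models.
tables = {
    "model_a": [[5, 1, 0], [2, 4, 1], [0, 2, 6]],
    "model_b": [[1, 1, 1], [1, 2, 1], [1, 1, 1]],
}
print(rank_models(tables))
```

Exhaustive search takes factorial time and is usable only for tiny examples. A practical implementation would need a polynomial-time weighted matching algorithm.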

Proceedings. IEEE Computational Systems Bioinformatics Conference, pp. 278-89. Published 2004-01-01.

Improved Fourier transform method for unsupervised cell-cycle regulated gene prediction.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332433
Karuturi R Murthy, Liu Jian Hua

Motivation: Several researchers have predicted cell-cycle regulated genes from microarray time-course measurements of mRNA expression levels. The most commonly used approach is the Fourier transform (FT) method in conjunction with a set of known cell-cycle regulated genes. In the absence of training data, the Fourier transform method is sensitive to noise, to the additive monotonic component arising from cell population growth, and to deviation from a strictly sinusoidal form of expression. Known cell-cycle regulated genes may not be available for certain organisms, and using them for training may bias the prediction.

Results: In this paper we propose an Improved Fourier Transform (IFT) method that accounts for several factors, such as the monotonic additive component of cell-cycle expression and irregular or partial-cycle sampling of gene expression. The proposed algorithm does not need any known cell-cycle regulated genes for prediction. Apart from removing the need for a training set, it also removes the bias towards genes similar to the training set. We have evaluated the method on two publicly available datasets: yeast cell-cycle data and HeLa cell-cycle data. On both datasets the proposed algorithm performed competitively with the supervised Fourier transform method. It outperformed other unsupervised methods such as Partial Least Squares (PLS) and Single Pulse Modeling (SPM). The method is easy to comprehend and implement, and runs faster.
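
As a rough illustration of the ideas named above, and not the authors' IFT implementation, the Python sketch below subtracts a fitted straight line from one gene's time course and then measures the Fourier magnitude at an assumed cell-cycle period. It uses the actual sampling times, so uneven spacing is tolerated. The linear detrending, the normalisation, the function names, and the sample data are all assumptions made for this example.

```python
import cmath
import math


def detrend_linear(times, values):
    """Subtract the least-squares straight line from the values."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t
    return [v - (mean_v + slope * (t - mean_t)) for t, v in zip(times, values)]


def periodicity_score(times, values, period):
    """Normalised Fourier magnitude at a single frequency, after detrending."""
    residual = detrend_linear(times, values)
    energy = sum(r * r for r in residual)
    if energy < 1e-12:
        # Nothing is left once the trend is removed, so there is no periodic signal.
        return 0.0
    omega = 2.0 * math.pi / period
    coeff = sum(r * cmath.exp(-1j * omega * t) for t, r in zip(times, residual))
    # By the Cauchy-Schwarz inequality the score lies between 0 and 1.
    return abs(coeff) ** 2 / (len(times) * energy)


# Made-up time course: a 60-minute oscillation on top of a growth trend.
times = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
periodic = [math.sin(2 * math.pi * t / 60) + 0.01 * t for t in times]
monotone = [0.01 * t for t in times]

print(periodicity_score(times, periodic, 60))
print(periodicity_score(times, monotone, 60))
```

A gene would be ranked by this score without reference to any training set. How the score is thresholded, and how the period is chosen, are left open here.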

Proceedings. IEEE Computational Systems Bioinformatics Conference, pp. 194-203. Published 2004-01-01.

Automated protein classification using consensus decision.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332436
Tolga Can, Orhan Camoğlu, Ambuj K Singh, Yuan-Fang Wang

We propose a novel technique for automatically generating the SCOP classification of a protein structure with high accuracy. High accuracy is achieved by combining the decisions of multiple methods using the consensus of a committee (or ensemble) classifier. Our technique is rooted in machine learning, which shows that by judiciously employing component classifiers, an ensemble classifier can be constructed to outperform its components. We use two sequence-comparison and three structure-comparison tools as component classifiers. Given a protein structure, using the joint hypothesis, we first determine whether the protein belongs to an existing category (family, superfamily, fold) in the SCOP hierarchy. For the proteins that are predicted to be members of existing categories, we compute their family-, superfamily-, and fold-level classifications using the consensus classifier. We show that we can significantly improve the classification accuracy compared to the individual component classifiers. In particular, we achieve error rates that are 3-12 times lower than the individual classifiers' error rates at the family level, 1.5-4.5 times lower at the superfamily level, and 1.1-2.4 times lower at the fold level.
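
The simplest form of a committee decision is a vote. The Python sketch below shows plain majority voting over five component predictions and is offered only as an illustration. The abstract does not say how the authors combine their components, so the voting rule, the `min_votes` threshold, and the labels are assumptions.

```python
from collections import Counter


def consensus_label(predictions, min_votes):
    """Return the label backed by at least `min_votes` classifiers, else None."""
    if not predictions:
        return None
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    # A tie for first place, or too little support, means no consensus.
    tied = sum(1 for v in counts.values() if v == votes) > 1
    if tied or votes < min_votes:
        return None
    return label


# Hypothetical fold-level predictions from five component classifiers.
component_votes = ["fold_a", "fold_a", "fold_b", "fold_a", "fold_c"]
print(consensus_label(component_votes, min_votes=3))
```

Returning `None` when the components disagree leaves room for a separate decision about proteins that may not fit any existing category.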

Proceedings. IEEE Computational Systems Bioinformatics Conference, pp. 224-35. Published 2004-01-01.

Mapping of microbial pathways through constrained mapping of orthologous genes.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332449
Victor Olman, Hanchuan Peng, Zhengchang Su, Ying Xu

We present a novel computer algorithm for mapping biological pathways from one prokaryotic genome to another. The algorithm maps genes in a known pathway to their homologous genes (if any) in a target genome, choosing the mapping that is most consistent with (a) predicted orthologous gene relationships, (b) predicted operon structures, and (c) predicted co-regulation relationships of operons. Mathematically, we have formulated this problem as a constrained minimum spanning tree problem (called a Steiner network problem), and demonstrated through applications that this formulation has the desired property. We have solved this mapping problem using a combinatorial optimization algorithm with guaranteed global optimality. We have implemented this algorithm as a computer program called P-MAP. Our test results on pathway mapping are highly encouraging: we have used P-MAP to map a number of pathways of H. influenzae, B. subtilis, H. pylori, and M. tuberculosis to E. coli, where the homologous pathways are known and hence the mapping accuracy could be checked. We have then mapped known E. coli pathways in the EcoCyc database to the newly sequenced organism Synechococcus sp. WH8102, and predicted 158 Synechococcus pathways. Detailed analyses of the predicted pathways indicate that P-MAP's mapping results are consistent with our general knowledge about (local) pathways. We believe that P-MAP will be a useful tool for microbial genome annotation projects and inference of individual microbial pathways.

Proceedings. IEEE Computational Systems Bioinformatics Conference, pp. 363-70. Published 2004-01-01.

A mixed factors model for dimension reduction and extraction of a group structure in gene expression data.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332429
Ryo Yoshida, Tomoyuki Higuchi, Seiya Imoto

When we cluster tissue samples on the basis of genes, the number of observations to be grouped is much smaller than the dimension of the feature vector. In such a case, the applicability of conventional model-based clustering is limited, since the high dimensionality of the feature vector leads to overfitting during the density estimation process. To overcome this difficulty, we attempt a methodological extension of factor analysis. Our approach enables us not only to prevent overfitting, but also to handle the issues of clustering, data compression, and extraction of a set of genes relevant to explaining the group structure. The potential usefulness is demonstrated with an application to the leukemia dataset.

Proceedings. IEEE Computational Systems Bioinformatics Conference, pp. 161-72. Published 2004-01-01.

Fractal genomics modeling: a new approach to genomic analysis and biomarker discovery.
Sandy Shaw, Paul Shapshak

Reverse engineering of genetic networks generally requires establishing correlative behavior within and between a very large number of genes. This becomes a difficult analytical problem for even a few hundred genes, and the difficulty tends to grow exponentially as more genes are examined. Using a hybrid data analysis method known as Fractal Genomics Modeling (FGM), this problem is reduced to examining correlative behavior within small gene groups that can then be compared and integrated to produce a picture of larger networks using a type of shotgun approach. We have applied FGM to the examination of genetic networks involved in HIV infection in the brain. These networks have relevance both to processes related to HIV infection and to neurodegenerative disorders. Our preliminary findings have produced conjectures of related pathways and networks, as well as new candidates for genetic markers in HIV brain infection. Evidence has also been produced that appears to show the presence of a hierarchical network structure within the genes studied. We will discuss the background and methodology of FGM as well as our recent findings.

Proceedings. IEEE Computational Systems Bioinformatics Conference, pp. 9-18. Published 2004-01-01.

Segmental duplications containing tandem repeated genes encoding putative deubiquitinating enzymes.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332414
Hong Liu, Li Li, Asher Zilberstein, Chang S Hahn

Both inter- and intra-chromosomal segmental duplications are known to have occurred in the human genome during evolution. Few cases of such segments involving functional genes have been reported. While searching for the human orthologs of murine hematopoietic deubiquitinating enzymes (DUBs), we identified four clusters of DUB-like genes on chromosome 4p15 and chromosome 8p22-23 that are over 90% identical to each other at the DNA level. These genes are expressed in a cell type- and activation-specific manner, with different clusters possessing potentially distinct expression profiles. Examining the sequences surrounding these gene duplication events, we have identified previously unreported conserved sequence elements, as large as 35 to 74 kb, encircling the gene clusters. Traces of these elements are also found on chromosome 12p13 and chromosome 11q13. The coding and immediate upstream sequences for DUB-like genes, as well as the surrounding conserved elements, are present in the chimpanzee trace database but not in the rodent genome. We hypothesize that the segments containing these DUB clusters and surrounding elements arose relatively recently in evolution through inter- and intra-chromosomal duplicative transpositions, following the divergence of primates and rodents. A genome-wide systematic search for segmental duplications containing duplicated gene clusters has been performed.

Proceedings. IEEE Computational Systems Bioinformatics Conference, pp. 31-9. Published 2004-01-01.

A self-tuning method for one-chip SNP identification.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332419
Michael Molla, Jude Shavlik, Todd Richmond, Steven Smith

Current methods for interpreting oligonucleotide-based SNP-detection microarrays (SNP chips) are based on statistics and require extensive parameter tuning as well as extremely high-resolution images of the chip being processed. We present a method, based on a simple data-classification technique called nearest neighbors, that, on haploid organisms, produces results comparable to the published results of the leading statistical methods and requires very little parameter tuning. Furthermore, it can interpret SNP chips using lower-resolution scanners of the type more typically used in current microarray experiments. Along with our algorithm, we present the results of a SNP-detection experiment in which, when independently applying this algorithm to six identical SARS SNP chips, we correctly identify all 24 SNPs in a particular strain of the SARS virus, with between 6 and 13 false positives across the six experiments.
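
For readers unfamiliar with nearest-neighbor classification, the Python sketch below shows the generic k-nearest-neighbors technique on made-up two-dimensional feature vectors. It does not reproduce the authors' self-tuning procedure or their choice of features. The feature values, the labels `"reference"` and `"snp"`, and the function name are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical (feature vector, label) pairs, e.g. normalised probe intensities.
train = [
    ((1.0, 0.9), "reference"),
    ((0.9, 1.0), "reference"),
    ((1.1, 1.0), "reference"),
    ((0.2, 0.9), "snp"),
    ((0.3, 1.0), "snp"),
    ((0.1, 0.8), "snp"),
]

print(knn_predict(train, (0.25, 0.95)))
print(knn_predict(train, (1.0, 1.0)))
```

The only parameter here is k, which is one reason nearest-neighbor methods need little tuning compared with parametric statistical models.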

目前用于解释基于寡核苷酸的SNP检测微阵列(SNP芯片)的方法是基于统计数据的,需要大量的参数调整以及正在处理的芯片的极高分辨率图像。我们提出了一种方法,基于一种简单的数据分类技术,称为最近邻,在单倍体生物上,产生的结果与已发表的主要统计方法的结果相当,并且只需要很少的参数调整。此外,它可以使用当前微阵列实验中更常用的低分辨率扫描仪来解释SNP芯片。随着我们的算法,我们提出了一个SNP检测实验的结果,其中,当独立地将该算法应用于六个相同的SARS SNP芯片时,我们正确地识别出特定SARS病毒株中的所有24个SNP,在六个实验中有6到13个假阳性。
{"title":"A self-tuning method for one-chip SNP identification.","authors":"Michael Molla,&nbsp;Jude Shavlik,&nbsp;Todd Richmond,&nbsp;Steven Smith","doi":"10.1109/csb.2004.1332419","DOIUrl":"https://doi.org/10.1109/csb.2004.1332419","url":null,"abstract":"<p><p>Current methods for interpreting oligonucleotide-based SNP-detection microarrays, SNP chips, are based on statistics and require extensive parameter tuning as well as extremely high-resolution images of the chip being processed. We present a method, based on a simple data-classification technique called nearest-neighbors that, on haploid organisms, produces results comparable to the published results of the leading statistical methods and requires very little in the way of parameter tuning. Furthermore, it can interpret SNP chips using lower-resolution scanners of the type more typically used in current microarray experiments. Along with our algorithm, we present the results of a SNP-detection experiment where, when independently applying this algorithm to six identical SARS SNP chips, we correctly identify all 24 SNPs in a particular strain of the SARS virus, with between 6 and 13 false positives across the six experiments.</p>","PeriodicalId":87417,"journal":{"name":"Proceedings. IEEE Computational Systems Bioinformatics Conference","volume":" ","pages":"69-79"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/csb.2004.1332419","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25829774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Comparative analysis of gene sets in the Gene Ontology space under the multiple hypothesis testing framework.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332455
Sheng Zhong, Lu Tian, Cheng Li, Kai-Florian Storch, Wing H Wong

The Gene Ontology (GO) resource can be used as a powerful tool to uncover the properties shared among, and specific to, a list of genes produced by high-throughput functional genomics studies, such as microarray studies. In the comparative analysis of several gene lists, researchers may be interested in knowing which GO terms are enriched in one list of genes but relatively depleted in another. Statistical tests such as Fisher's exact test or the chi-square test can be performed to search for such GO terms. However, because multiple GO terms are tested simultaneously, individual p-values from individual tests do not serve as good indicators for picking GO terms. Furthermore, because these multiple tests are highly correlated, the usual multiple testing procedures that work under an independence assumption are not applicable. In this paper we introduce a procedure, based on the False Discovery Rate (FDR), to treat this correlated multiple testing problem. This procedure calculates a moderately conservative estimator of the q-value for every GO term. We identify the GO terms with q-values that satisfy a desired level as the significant GO terms. This procedure has been implemented in the GoSurfer software. GoSurfer is a Windows-based graphical data mining tool. It is freely available at http://www.gosurfer.org.
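The two ingredients mentioned in this abstract, a per-term Fisher's exact test and an FDR-based q-value, can be sketched as follows. This is only a baseline under stated assumptions: the q-values below use the standard Benjamini-Hochberg adjustment, which relies on independence (or positive dependence) between tests, the very assumption the abstract says does not hold for GO terms. It is therefore not the GoSurfer procedure. The function names and the example numbers are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: gene list 1, gene list 2. Columns: annotated with the GO term, not annotated.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that list 1 holds x of the annotated genes.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical example: one GO term annotated in 3 of 3 genes in list 1, 0 of 3 in list 2.
p = fisher_exact_two_sided(3, 0, 0, 3)
q = bh_q_values([0.01, 0.04, 0.03, 0.5])
print(p, q)
```

Terms would then be called significant when their q-value falls below a chosen level; for correlated GO terms a procedure designed for dependence, such as the one the paper proposes, is the appropriate choice.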

Pages: 425-35
Citations: 0
MISAE: a new approach for regulatory motif extraction.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332430
Zhaohui Sun, Jingyi Yang, Jitender S Deogun

Recognizing the regulatory motifs of co-regulated genes is essential for understanding regulatory mechanisms. However, automatically extracting regulatory motifs from a given data set of the upstream non-coding DNA sequences of a family of co-regulated genes is difficult because regulatory motifs are often subtle and inexact. The problem is further complicated by corruption of the data sets. In this paper, a new approach called Mismatch-allowed Probabilistic Suffix Tree Motif Extraction (MISAE) is proposed. It combines a mismatch-allowed probabilistic suffix tree, which is a probabilistic model, with local prediction to extract regulatory motifs. The proposed approach is tested on 15 co-regulated gene families and compares favorably with other state-of-the-art approaches. Moreover, MISAE performs well on "corrupted" data sets: it is able to extract the motif from a "corrupted" data set in which fewer than one fourth of the sequences contain the real motif.
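To make the notion of an inexact motif concrete, the sketch below searches for the k-mer that occurs, with a bounded number of mismatches, in the largest number of input sequences. This is plain mismatch-tolerant k-mer counting, a much simpler technique than the probabilistic suffix tree described in the abstract, and it is not MISAE. The function names, the sequences, and the parameter values are assumptions chosen for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_mismatches(candidate, seq, max_mismatches):
    """True if some window of `seq` matches `candidate` within the mismatch budget."""
    k = len(candidate)
    return any(
        hamming(candidate, seq[i:i + k]) <= max_mismatches
        for i in range(len(seq) - k + 1)
    )


def best_motif(sequences, k, max_mismatches):
    """Return (motif, support): the k-mer found, inexactly, in the most sequences."""
    # Candidates are restricted to k-mers that literally appear in the input.
    candidates = sorted(
        {s[i:i + k] for s in sequences for i in range(len(s) - k + 1)}
    )
    scored = [
        (sum(occurs_with_mismatches(c, s, max_mismatches) for s in sequences), c)
        for c in candidates
    ]
    # Ties are broken by taking the first candidate in sorted order.
    support, motif = max(scored, key=lambda t: t[0])
    return motif, support


# Hypothetical upstream sequences; the second carries an inexact copy (ACGA) of ACGT.
seqs = ["AAACGTAA", "CCACGACC", "GGACGTGG"]
motif, support = best_motif(seqs, k=4, max_mismatches=1)
print(motif, support)
```

With short motifs and a generous mismatch budget, several candidates typically tie for the top support, which illustrates why the abstract calls motifs "subtle and inexact" and why a probabilistic model is used to rank them.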

Pages: 173-81
Citations: 0