
Proceedings of the ... Asia-Pacific bioinformatics conference: Latest Articles

Seed Optimization Is No Easier than Optimal Golomb Ruler Design
Pub Date : 2007-12-01 DOI: 10.1142/9781848161092_0016
Bin Ma, Hongyi Yao
Spaced seeds are a filter method invented to efficiently identify regions of interest in similarity searches. It is now well known that certain spaced seeds hit (detect) a randomly sampled similarity region with higher probability than others. Assume that each position of the similarity region is an identity (match) independently with probability p. The seed optimization problem seeks the optimal seed, that is, the seed achieving the highest hit probability for a given length and weight. Although the problem was previously shown not to be NP-hard, in practice it seems difficult to solve: the only known algorithm for computing the optimal seed is still exhaustive search in exponential time. In this article we offer some insight into the hardness of the seed design problem by demonstrating the relation between the seed optimization problem and the optimal Golomb ruler design problem, a well-known difficult problem in combinatorial design.
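To make the problem statement concrete, here is a minimal Python sketch, assuming a seed is written as a string of '1' (must-match) and '0' (don't-care) positions and a similarity region is a 0/1 string whose positions match independently with probability p. It computes the hit probability exactly by brute-force enumeration and finds the best seed by exhaustive search, the exponential-time approach the abstract mentions. The function names are hypothetical, the requirement that seeds start and end with '1' is an assumed convention, and this is not the authors' code.

```python
from itertools import combinations, product


def hits(seed: str, region: tuple[int, ...]) -> bool:
    """Return True if all '1' positions of the seed match at some offset of the 0/1 region."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(region) - len(seed) + 1):
        if all(region[start + i] for i in care):
            return True
    return False


def hit_probability(seed: str, region_len: int, p: float) -> float:
    """Exact hit probability by enumerating all 2**region_len match/mismatch regions."""
    total = 0.0
    for region in product((0, 1), repeat=region_len):
        if hits(seed, region):
            k = sum(region)  # number of matching positions in this region
            total += p**k * (1 - p) ** (region_len - k)
    return total


def optimal_seed(length: int, weight: int, region_len: int, p: float) -> tuple[str, float]:
    """Exhaustive search over seeds of the given length and weight that start and end with '1'."""
    best_seed, best_prob = "", -1.0
    for middle in combinations(range(1, length - 1), weight - 2):
        chars = ["0"] * length
        for i in (0, length - 1, *middle):
            chars[i] = "1"
        seed = "".join(chars)
        prob = hit_probability(seed, region_len, p)
        if prob > best_prob:
            best_seed, best_prob = seed, prob
    return best_seed, best_prob
```

For example, `optimal_seed(6, 4, 10, 0.7)` compares all six candidate seeds of length 6 and weight 4. Each `hit_probability` call here costs 2^region_len evaluations; the hit probability can be computed more efficiently by dynamic programming, but the outer search over seeds still grows exponentially with seed length.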
Pages: 133-144
Citations: 13
GenePC and ASPIC Integrate Gene Predictions with Expressed Sequence Alignments To Predict Alternative Transcripts
Pub Date : 2007-12-01 DOI: 10.1142/9781848161092_0037
T. Alioto, R. Guigó, E. Picardi, G. Pesole
We have developed a generic framework for combining introns from genomically aligned expressed sequence tag (EST) clusters with a set of exon predictions to produce alternative transcript predictions. Our current implementation uses ASPIC to generate alternative transcripts from EST mappings. Introns from ASPIC and a set of gene predictions from many diverse gene prediction programs are passed to the gene prediction combiner GenePC, which then generates alternative consensus splice forms. We evaluated our method on the ENCODE regions of the human genome. In general, we see a marked improvement in transcript-level sensitivity because more than one transcript per gene can now be predicted. GenePC, which alone is highly specific at the transcript level, balances the lower specificity of ASPIC.
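The general idea of turning introns plus exon predictions into several transcripts per gene can be illustrated with a toy splice graph. This Python sketch is a hypothetical illustration, not the GenePC or ASPIC algorithm: exons and introns are assumed to be 1-based inclusive (start, end) coordinates on one strand, an intron links exon u to exon v when it begins right after u and ends right before v, and every maximal path through the resulting graph is reported as a candidate transcript. Scoring and the consensus step GenePC performs are not modeled.

```python
from collections import defaultdict

Exon = tuple[int, int]  # hypothetical (start, end), 1-based inclusive


def build_splice_graph(exons: list[Exon], introns: list[tuple[int, int]]) -> dict:
    """Link exon u -> exon v when an intron starts right after u and ends right before v."""
    intron_set = set(introns)
    graph = defaultdict(list)
    for u in exons:
        for v in exons:
            if (u[1] + 1, v[0] - 1) in intron_set:
                graph[u].append(v)
    return graph


def enumerate_transcripts(exons: list[Exon], introns: list[tuple[int, int]]) -> list[list[Exon]]:
    """Report every maximal exon path (no incoming edge to no outgoing edge) as a transcript."""
    graph = build_splice_graph(exons, introns)
    has_incoming = {v for targets in graph.values() for v in targets}
    transcripts = []

    def extend(path: list[Exon]) -> None:
        successors = sorted(graph.get(path[-1], []))
        if not successors:
            transcripts.append(path)
            return
        for nxt in successors:
            extend(path + [nxt])

    for first in sorted(e for e in exons if e not in has_incoming):
        extend([first])
    return transcripts
```

With three exons and a cassette-skipping intron, for example exons (100, 200), (300, 400), (500, 600) and introns (201, 299), (401, 499), (201, 499), the sketch reports both the inclusion and the skipping isoform.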
Pages: 363-372
Citations: 0
Comparing and Analysing Gene Expression Patterns Across Animal Species Using 4DXpress
Pub Date : 2007-12-01 DOI: 10.1142/9781848161092_0038
Yannick Haudry, C. Ong, L. Ettwiller, Hugo Bérubé, Ivica Letunic, M. Kapushesky, Paul-Daniel Weeber, Xi Wang, J. Gagneur, Charles Girardot, D. Arendt, P. Bork, A. Brazma, E. Furlong, J. Wittbrodt, T. Henrich
High-resolution spatial information on gene expression over time can be acquired through whole-mount in situ hybridisation experiments in animal model species such as fish, fly or mouse. Expression patterns of many genes have been studied, and the data have been integrated into dedicated model organism databases such as ZFIN for zebrafish, MEPD for medaka, BDGP for Drosophila and MGI for mouse. Nevertheless, a central repository that allows users to query and compare gene expression patterns across different species has not yet been established. For this purpose we have integrated gene expression data for zebrafish, medaka, Drosophila and mouse into a central public repository named 4DXpress (http://ani.embl.de/4DXpress). 4DXpress allows users to query anatomy-ontology-based expression annotations across species and to jump quickly from one gene to its orthologs in other species based on Ensembl Compara relationships. We have set up a linked resource for microarray data at ArrayExpress. In addition, we have mapped developmental stages between the species so that corresponding developmental time phases can be compared. We have used clustering algorithms to classify genes based on their expression pattern annotations. To illustrate the use of 4DXpress, we systematically analysed the relationships between conserved regulatory inputs and spatio-temporal gene expression derived from 4DXpress, and found significant correlation between the expression patterns of genes predicted to have similar regulatory elements in their promoters.
Pages: 373-382
Citations: 0
Recent Progress in Phylogenetic Combinatorics
Pub Date : 2007-12-01 DOI: 10.1142/9781848161092_0001
A. Dress
of $D$ is an $\mathbb{R}$-tree. (ii) There exists a tree $(V, E)$ whose vertex set $V$ contains $X$, and an edge weighting $\ell : E \to \mathbb{R}$ that assigns a positive length $\ell(e)$ to each edge $e \in E$, such that $D$ is the restriction to $X$ of the shortest-path metric induced on $V$. (iii) There exists a map $w : \mathcal{S}(X) \to \mathbb{R}_{\ge 0}$ from the set $\mathcal{S}(X)$ of all bipartitions, or splits, of $X$ into the set $\mathbb{R}_{\ge 0}$ of non-negative real numbers such that, for any two splits $S = \{A, B\}$ and $S' = \{A', B'\}$ in $\mathcal{S}(X)$ with $w(S), w(S') \neq 0$, at least one of the four intersections $A \cap A'$, $B \cap A'$, $A \cap B'$ and $B \cap B'$ is empty, and $D(x, y) = \sum_{S \in \mathcal{S}(X : x \leftrightarrow y)} w(S)$ holds, where $\mathcal{S}(X : x \leftrightarrow y)$ denotes the set of splits $S = \{A, B\} \in \mathcal{S}(X)$ that separate $x$ and $y$. (iv) $D(x, y) + D(u, v) \le \max\big(D(x, u) + D(y, v),\, D(x, v) + D(y, u)\big)$ holds for all $x, y, u, v \in X$.
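Condition (iv), the four-point condition, can be tested directly. This minimal Python sketch assumes $D$ is given as a symmetric nested dict with zero diagonal and checks the inequality over all quadruples, with repetitions allowed (which also covers the triangle inequality). It is a brute-force $O(n^4)$ check for illustration only; the helper names are hypothetical and nothing here is taken from the talk itself.

```python
from itertools import product


def from_pairs(points: list[str], pairs: dict[tuple[str, str], float]) -> dict:
    """Build a symmetric distance table with zero diagonal from unordered pair distances."""
    D = {x: {y: 0.0 for y in points} for x in points}
    for (x, y), d in pairs.items():
        D[x][y] = D[y][x] = float(d)
    return D


def is_tree_metric(D: dict, points: list[str], tol: float = 1e-9) -> bool:
    """Check D(x,y) + D(u,v) <= max(D(x,u) + D(y,v), D(x,v) + D(y,u)) for all x, y, u, v."""
    for x, y, u, v in product(points, repeat=4):
        lhs = D[x][y] + D[u][v]
        rhs = max(D[x][u] + D[y][v], D[x][v] + D[y][u])
        if lhs > rhs + tol:
            return False
    return True
```

For example, the leaf distances of the unit-length quartet tree ((a, b), (c, d)) pass the check, while the shortest-path metric of a 4-cycle a-b-c-d-a fails it, since $D(a, c) + D(b, d) = 4$ exceeds both other pairings, which equal 2.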
Pages: 1-4
Citations: 0
Classification of Protein Sequences Based on Word Segmentation Methods
Pub Date : 2007-12-01 DOI: 10.1142/9781848161092_0020
Yang Yang, Bao-Liang Lu, Wen-Yun Yang
Protein sequences hold great potential for revealing protein function, structural families and evolutionary information. Classifying protein sequences into functional groups or families based on their sequence patterns has attracted considerable research effort over the last decade. A key issue for these classification systems is how to interpret and represent protein sequences, which largely determines classifier performance. Inspired by text classification and Chinese word segmentation techniques, we propose a segmentation-based feature extraction method. The extracted features include selected words, i.e., substrings of the sequences, as well as motifs specified in public databases. These are segmented out of the sequences, and their occurrence frequencies are recorded as the feature vector values. We conducted experiments on two protein data sets: a set of SCOP families and the GPCR family. Experiments on classifying SCOP protein families show that the proposed method not only yields an extremely condensed feature set but also achieves higher accuracy than methods based on the whole k-spectrum feature space. It also performs comparably to the most powerful classifiers for GPCR level I and level II subfamily recognition, with 92.6% and 88.8% accuracy, respectively.
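The segmentation-then-count step can be sketched with greedy forward maximum matching, a classic Chinese word segmentation technique used here as a stand-in: the abstract does not say which segmentation algorithm or word-selection criterion the authors use. In this Python sketch the vocabulary is hypothetical, residues not covered by any vocabulary word become single-letter tokens, and only vocabulary words are counted in the feature vector.

```python
from collections import Counter


def segment_fmm(seq: str, vocabulary: set[str], max_len: int) -> list[str]:
    """Greedy forward maximum matching: take the longest vocabulary word at each position."""
    tokens, i = [], 0
    while i < len(seq):
        for size in range(min(max(max_len, 1), len(seq) - i), 0, -1):
            piece = seq[i:i + size]
            if size == 1 or piece in vocabulary:  # fall back to a single residue
                tokens.append(piece)
                i += size
                break
    return tokens


def feature_vector(seq: str, vocabulary: set[str]) -> list[int]:
    """Occurrence counts of each vocabulary word (sorted order) in the segmented sequence."""
    words = sorted(vocabulary)
    counts = Counter(segment_fmm(seq, vocabulary, max(map(len, words))))
    return [counts[w] for w in words]
```

For instance, with the hypothetical vocabulary {"GPC", "LLV", "KR"}, the sequence "MGPCLLVKRGPC" segments into M / GPC / LLV / KR / GPC, giving the vector [2, 1, 1] over the sorted words GPC, KR, LLV.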
Pages: 177-186
Citations: 15
Semantic Similarity Definition over Gene Ontology by Further Mining of the Information Content
Pub Date : 2007-12-01 DOI: 10.1142/9781848161092_0018
Yuan-Peng Li, Bao-Liang Lu
The similarity of two gene products can be used to solve many problems in bioinformatics. Since one gene product corresponds to several GO (Gene Ontology) terms, one way to calculate gene product similarity is to use the similarity of their GO terms, which can be defined as semantic similarity on the GO graph. Many definitions of similarity between two GO terms exist, but they do not use the information in the GO graph efficiently. This paper presents a new way to mine more information from the GO graph by treating edges as information content and by using negation information on the semantic graph. In a simple experiment, accuracy increased by 8.3% on average compared with the traditional method, which uses nodes as the information source.
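For reference, a typical node-based measure of the kind the abstract calls traditional can be sketched as Resnik-style information content: a term's probability is the fraction of annotations at that term or its descendants, $IC(t) = -\log p(t)$, and two terms are as similar as their most informative common ancestor. This Python sketch is not the authors' edge-based method, which the abstract does not describe in enough detail to reproduce; the toy DAG and annotation counts are hypothetical, and a single root is assumed.

```python
import math


def ancestors(term: str, parents: dict[str, tuple[str, ...]]) -> set[str]:
    """The term itself plus every term reachable through parent links."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def information_content(parents: dict[str, tuple[str, ...]], direct_counts: dict[str, int]) -> dict[str, float]:
    """IC(t) = -log p(t), where annotations propagate up to all ancestors (single root assumed)."""
    totals = {t: 0 for t in parents}
    for term, n in direct_counts.items():
        for a in ancestors(term, parents):
            totals[a] += n
    root_total = max(totals.values())  # the root accumulates every annotation
    return {t: -math.log(c / root_total) for t, c in totals.items() if c > 0}


def resnik_similarity(t1: str, t2: str, parents: dict[str, tuple[str, ...]], ic: dict[str, float]) -> float:
    """Similarity = IC of the most informative common ancestor."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max(ic[t] for t in common)
```

In a toy DAG where A1 and A2 are children of A, A and B are children of the root, and annotation counts are A1: 2, A2: 2, B: 4, the similarity of A1 and A2 is $IC(A) = \log 2$, while A1 and B share only the root and score 0.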
Pages: 155-164
Citations: 4
A New Strategy of Geometrical Biclustering for Microarray Data Analysis
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0008
Hongya Zhao, Alan Wee-Chung Liew, Hong Yan
In this paper, we present a new biclustering algorithm that provides a geometrical interpretation of similar microarray gene expression profiles. Unlike standard clustering analyses, biclustering can classify the row and column dimensions of a data matrix simultaneously. The main objective of the strategy is to reveal submatrices in which a subset of genes exhibits a consistent pattern over a subset of conditions. However, searching for such subsets is computationally complex. We propose a new algorithm based on the Hough transform in the column-pair space to perform pattern identification. The algorithm is especially suitable for biclustering analysis of large-scale microarray data. Our simulation studies show that the method is robust to noise and computationally efficient. Furthermore, we have applied it to a large database of gene expression profiles from multiple human organs, and the resulting biclusters have clear biological meaning.
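The column-pair idea can be sketched as follows: for two conditions i and j, each gene becomes a point $(x_i, x_j)$, and genes whose expression differs by a constant shift across the two conditions lie on a line of slope 1. A standard $(\theta, \rho)$ Hough transform, with $\rho = x\cos\theta + y\sin\theta$, lets every point vote for all lines through it and returns the genes on the most-voted line. This Python sketch handles a single column pair only; the angle resolution `n_theta` and bin width `rho_step` are assumed parameters, and combining gene sets across column pairs into full biclusters, as the paper's algorithm must do, is not shown.

```python
import math
from collections import defaultdict


def hough_line_genes(points: list[tuple[float, float]], n_theta: int = 180, rho_step: float = 0.1):
    """Return (theta, rho, gene indices) of the line with the most votes in column-pair space."""
    votes = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(k, round(rho / rho_step))].append(idx)  # quantize (theta, rho) into a bin
    (k, r), members = max(votes.items(), key=lambda kv: len(kv[1]))
    return math.pi * k / n_theta, r * rho_step, sorted(members)
```

For example, five genes with values (0, 1), (1, 2), (2, 3), (3, 4), (4, 5) lie on $x_j = x_i + 1$, so they share the bin near $\theta = 3\pi/4$, $\rho \approx 0.71$; scattered genes such as (0, 5), (5, 0) and (2, -3) do not join that bin.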
Pages: 47-56
Citations: 10
Complexities and Algorithms for Glycan Structure Sequencing using Tandem Mass Spectrometry
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0032
B. Shan, B. Ma, Kaizhong Zhang, G. Lajoie
Determining glycan structures is vital to understanding cell-matrix, cell-cell, and even intracellular biological events. Glycan structure sequencing, which determines the primary structure of a glycan using tandem mass spectrometry (MS/MS), remains one of the most important tasks in proteomics. Analogous to peptide de novo sequencing, glycan de novo sequencing determines the structure without the aid of a database of known glycans. We show in this paper that glycan de novo sequencing is NP-hard. We then provide a heuristic algorithm and develop a software program to solve the problem in practical cases. Experiments on real MS/MS data of glycopeptides demonstrate that our heuristic algorithm gives satisfactory results on practical data.
Pages: 297-306
Citations: 4
Exact and Heuristic Approaches for Identifying Disease-Associated SNP Motifs
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0020
Gaofeng Huang, P. Jeavons, D. Kwiatkowski
A Single Nucleotide Polymorphism (SNP) is a small DNA variation which occurs naturally between different individuals of the same species. Some combinations of SNPs in the human genome are known to increase the risk of certain complex genetic diseases. This paper formulates the problem of identifying such disease-associated SNP motifs as a combinatorial optimization problem and shows it to be NP-hard. Both exact and heuristic approaches for this problem are developed and tested on simulated data and real clinical data. Computational results are given to demonstrate that these approaches are sufficiently effective to support ongoing biological research.
Pages: 175-184
Citations: 1
An Effective Promoter Detection Method using the Adaboost Algorithm
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0007
Xudong Xie, Shuanhu Wu, K. Lam, Hong Yan
In this paper, an effective promoter detection algorithm called PromoterExplorer is proposed. In our approach, various features, namely the local distribution of pentamers, positional CpG island features and the digitized DNA sequence, are combined into a high-dimensional input vector. A cascade AdaBoost-based learning procedure is adopted to select the most “informative” or “discriminating” features and build a sequence of weak classifiers. A number of weak classifiers are combined into a strong classifier, which achieves better performance. To reduce false positives, a cascade structure is used for detection. PromoterExplorer is tested on large-scale DNA sequences from different databases, including EPD, GenBank and human chromosome 22. The proposed method consistently outperforms PromoterInspector and Dragon Promoter Finder.
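How AdaBoost turns weak classifiers into a strong one can be sketched with textbook discrete AdaBoost over one-feature threshold stumps. This Python sketch is a generic illustration, not PromoterExplorer's code: the feature vectors are hypothetical numeric rows (standing in for values such as pentamer frequencies or CpG scores), labels are +1 for promoter and -1 otherwise, and the cascade structure used for detection is not shown.

```python
import math


def train_stump(X, y, w):
    """Weighted-error-minimizing stump: (error, feature, threshold, polarity)."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[f] >= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(stump, row):
    _, f, thr, polarity = stump
    return polarity if row[f] >= thr else -polarity


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost: reweight samples so later stumps focus on earlier mistakes."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            break  # no stump is better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] == 0:
            break  # training data already separated
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Strong classifier: sign of the alpha-weighted vote of the weak stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Each round's weight `alpha` grows as the stump's weighted error shrinks, which is how the final vote favors the most discriminating features.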
Pages: 37-46
Citations: 1