Proceedings. IEEE Computational Systems Bioinformatics Conference: latest articles

MISAE: a new approach for regulatory motif extraction.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332430
Zhaohui Sun, Jingyi Yang, Jitender S Deogun

Recognizing the regulatory motifs of co-regulated genes is essential for understanding regulatory mechanisms. However, automatically extracting regulatory motifs from a data set of upstream non-coding DNA sequences of a family of co-regulated genes is difficult because regulatory motifs are often subtle and inexact. The problem is further complicated by corruption of the data sets. In this paper, a new approach called Mismatch-allowed Probabilistic Suffix Tree Motif Extraction (MISAE) is proposed. It combines a mismatch-allowed probabilistic suffix tree, which is a probabilistic model, with local prediction to extract regulatory motifs. The proposed approach is tested on 15 co-regulated gene families and compares favorably with other state-of-the-art approaches. Moreover, MISAE performs well on "corrupted" data sets: it is able to extract the motif from a "corrupted" data set in which fewer than one fourth of the sequences contain the real motif.

Pages: 173-81
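
The idea of allowing mismatches when looking for a motif, mentioned in the MISAE abstract above, can be illustrated with a much simpler stand-in. The sketch below is not MISAE and does not build a probabilistic suffix tree; it only counts how many sequences contain a window within a given Hamming distance of a candidate string. The function names, the example sequences, and the candidate `ACGTA` are all hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def support(candidate, sequences, max_mismatches):
    """Count sequences that contain at least one window of len(candidate)
    differing from the candidate in at most max_mismatches positions."""
    k = len(candidate)
    count = 0
    for seq in sequences:
        windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
        if any(hamming(candidate, w) <= max_mismatches for w in windows):
            count += 1
    return count


# Hypothetical upstream sequences; the last one acts as a "corrupted" entry
# that does not contain the motif.
sequences = ["TTACGTAA", "GGACGAAC", "CCTCGTAG", "AAAAAAAA"]
print(support("ACGTA", sequences, max_mismatches=0))
print(support("ACGTA", sequences, max_mismatches=1))
```

With exact matching only the first sequence supports the candidate, while tolerating one mismatch also recovers the inexact copies in the second and third sequences, which is the kind of subtlety the abstract describes.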
Biclustering in gene expression data by tendency.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332431
Jinze Liu, Jiong Wang, Wei Wang

The advent of DNA microarray technologies has revolutionized the experimental study of gene expression. Clustering is the most popular approach for analyzing gene expression data and has proven successful in many applications. Our work focuses on discovering a subset of genes that exhibit similar expression patterns along a subset of conditions in the gene expression matrix. Specifically, we look for Order Preserving Clusters (OP-Clusters), in each of which a subset of genes induces a similar linear ordering along a subset of conditions. The pioneering OPSM model [3], which enforces a strict order shared by the genes in a cluster, is included in our model as a special case. Our model is more robust than OPSM because similarly expressed conditions are allowed to form order-equivalent groups, and no restriction is placed on the order within a group. Guided by our model, we design and implement a deterministic algorithm, OPCTree, to discover OP-Clusters. An experimental study on two real datasets demonstrates the effectiveness of the algorithm in tissue classification and cell cycle identification. In addition, a large percentage of OP-Clusters exhibit significant enrichment of one or more function categories, which implies that OP-Clusters carry significant biological relevance.

Pages: 182-93
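
The order-preserving idea in the biclustering abstract above can be made concrete with a small check. This is a sketch under assumptions, not the OPCTree algorithm: it takes the ordered groups of conditions as given (the abstract does not say how they are formed) and only tests which genes respect that ordering. The names `follows_order` and `supporting_genes` and the toy matrix are hypothetical.

```python
def follows_order(row, groups):
    """True if every value in an earlier group is strictly below every value
    in the next group; the order inside a group is unrestricted."""
    for earlier, later in zip(groups, groups[1:]):
        if max(row[c] for c in earlier) >= min(row[c] for c in later):
            return False
    return True


def supporting_genes(matrix, groups):
    """Indices of the genes (rows) whose expression follows the group order."""
    return [g for g, row in enumerate(matrix) if follows_order(row, groups)]


# Toy expression matrix: rows are genes, columns are conditions 0..3.
matrix = [
    [1.0, 5.0, 5.2, 9.0],
    [2.0, 6.1, 6.0, 8.0],
    [7.0, 3.0, 4.0, 1.0],
]

# Conditions 1 and 2 form one order-equivalent group.
print(supporting_genes(matrix, [[0], [1, 2], [3]]))
# Singleton groups enforce a strict order, as in the OPSM special case.
print(supporting_genes(matrix, [[0], [1], [2], [3]]))
```

Grouping conditions 1 and 2 lets the second gene join the cluster even though its two middle values are swapped, which illustrates why the abstract calls the grouped model more robust than a strict ordering.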
A hierarchical mixture of Markov models for finding biologically active metabolic paths using gene expression and protein classes.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332447
Hiroshi Mamitsuka, Yasushi Okuno

With the recent development of high-throughput experimental techniques, the types and volume of accumulated biological data have increased dramatically in the last few years. Mining different types of data together may lead to new biological insights. We present a new methodology for systematically combining three different datasets to find biologically active metabolic paths/patterns. The method consists of two steps. First, it efficiently synthesizes metabolic paths from a given set of chemical reactions that are already known and whose enzymes are co-expressed. It then represents the obtained metabolic paths in a more comprehensible way by estimating the parameters of a probabilistic model from these synthesized paths. The model is built on the assumption that an entire set of chemical reactions corresponds to a Markov state transition diagram. Furthermore, it is a hierarchical latent variable model, containing a set of protein classes as a latent variable, for clustering input paths in terms of existing knowledge of protein classes. We tested our method on a main pathway of glycolysis and found that it achieved higher predictive performance in classifying gene expressions than other unsupervised methods. We further analyzed the estimated parameters of our probabilistic models and found that biologically active paths were clustered into only two or three patterns for each expression experiment type, and that each pattern suggested some new long-range relations in the glycolysis pathway.

Pages: 341-52
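
The assumption in the abstract above, that a set of chemical reactions corresponds to a Markov state transition diagram, can be illustrated with a plain first-order Markov chain. The sketch below is deliberately simpler than the paper's model: it has no latent protein classes and no mixture components, and it estimates transition probabilities by simple counting. The reaction labels `R1` to `R5` and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(paths):
    """Maximum-likelihood transition probabilities from observed paths."""
    counts = defaultdict(Counter)
    for path in paths:
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }


def path_probability(path, transitions):
    """Probability of the transitions along a path; unseen steps score 0."""
    prob = 1.0
    for current, following in zip(path, path[1:]):
        prob *= transitions.get(current, {}).get(following, 0.0)
    return prob


# Hypothetical synthesized paths, each a sequence of reaction labels.
paths = [["R1", "R2", "R3"], ["R1", "R2", "R4"], ["R1", "R5", "R3"]]
transitions = estimate_transitions(paths)
print(transitions["R1"])
print(path_probability(["R1", "R2", "R3"], transitions))
```

A hierarchical mixture, as described in the abstract, would maintain several such transition tables and weight them by latent class; that extension is omitted here.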
Separation of ion types in tandem mass spectrometry data interpretation -- a graph-theoretic approach.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332437
Bo Yan, Chongle Pan, Victor N Olman, Robert L Hettich, Ying Xu

Mass spectrometry is one of the most popular analytical techniques for identifying individual proteins in a protein mixture, one of the basic problems in proteomics. It identifies a protein by identifying its unique mass spectral pattern. While the problem is theoretically solvable, it remains computationally challenging. One of the key challenges is the difficulty of distinguishing the N- and C-terminus ions, mostly b- and y-ions respectively. In this paper, we present a graph algorithm for separating b- from y-ions in a set of mass spectra. We represent each spectral peak as a node and consider two types of edges, both predicted from local information: a type-1 edge connects two peaks possibly of the same ion type, and a type-2 edge connects two peaks possibly of different ion types. The ion-separation problem is then formulated and solved as a graph partition problem: partition the graph into three subgraphs, namely b-ions, y-ions, and others, so as to maximize the total weight of type-1 edges while minimizing the total weight of type-2 edges within each subgraph. We have developed a dynamic programming algorithm for rigorously solving this graph partition problem and implemented it as a computer program, PRIME. We have tested PRIME on 18 data sets of highly accurate FT-ICR tandem mass spectra and found that it achieved ~90% accuracy in separating b- and y-ions.

Pages: 236-44
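
The partition objective described in the abstract above can be written down as a scoring function. This is only one possible reading, stated as an assumption: type-1 edge weights inside a subgraph are added and type-2 edge weights inside a subgraph are subtracted. It does not reproduce PRIME's dynamic programming search, and the peak names, weights, and the function name `partition_score` are hypothetical.

```python
def partition_score(labels, type1_edges, type2_edges):
    """Score a labeling of peaks into subgraphs.

    labels maps each peak to a subgraph name ("b", "y" or "other").
    Each edge dictionary maps a (peak, peak) pair to a non-negative weight.
    Assumed objective: reward type-1 edges and penalize type-2 edges
    whose two endpoints fall in the same subgraph.
    """
    score = 0.0
    for (u, v), weight in type1_edges.items():
        if labels[u] == labels[v]:
            score += weight
    for (u, v), weight in type2_edges.items():
        if labels[u] == labels[v]:
            score -= weight
    return score


# Hypothetical peaks and edge weights.
type1 = {("p1", "p2"): 2.0, ("p2", "p3"): 1.0}
type2 = {("p1", "p3"): 1.5}

separated = {"p1": "b", "p2": "b", "p3": "y", "p4": "other"}
lumped = {"p1": "b", "p2": "b", "p3": "b", "p4": "b"}
print(partition_score(separated, type1, type2))
print(partition_score(lumped, type1, type2))
```

Under this assumed objective, the labeling that keeps `p3` apart from `p1` scores higher than lumping all peaks together, because the type-2 edge between them is no longer inside a single subgraph.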