
Proceedings. IEEE Computational Systems Bioinformatics Conference: Latest Articles

Reasoning about molecular similarity and properties.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332440
Rahul Singh

Ascertaining the similarity amongst molecules is a fundamental problem in biology and drug discovery. Since similar molecules tend to have similar biological properties, the notion of molecular similarity plays an important role in the exploration of molecular structural space, query-retrieval in molecular databases, and structure-activity modeling. This problem is related to the issue of molecular representation. Approaches with high descriptive power, such as 3D surface-based representations, are currently available. However, most techniques tend to focus on 2D graph-based molecular similarity because of the complexity that accompanies reasoning with more elaborate representations. This paper addresses the problem of determining similarity when molecules are described using complex surface-based representations. It proposes an intrinsic, spherical representation that systematically maps points on a molecular surface to points on a standard coordinate system (a sphere). Molecular geometry, molecular fields, and effects due to field superposition can then be captured as distributions on the surface of the sphere. Molecular similarity is obtained by computing the similarity of the corresponding property distributions using a novel formulation of histogram intersection. This method is robust to noise, obviates molecular pose optimization, can incorporate conformational variations, and facilitates highly efficient determination of similarity. Retrieval performance, applications in structure-activity modeling of complex biological properties, and comparisons with existing research and commercial methods demonstrate the validity and effectiveness of the approach.
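To make the idea of comparing property distributions concrete, here is a minimal sketch of the classic histogram intersection on two binned distributions. It is not the paper's novel formulation, which the abstract does not spell out; the function name `histogram_intersection` and the assumption that each molecule's surface property has already been binned into a flat list of counts are illustrative only.

```python
def histogram_intersection(p, q):
    """Classic histogram intersection of two binned distributions.

    Both inputs are lists of non-negative bin counts of equal length.
    Each histogram is normalized to sum to 1, so the result lies in [0, 1]:
    1.0 for identical distributions, 0.0 for disjoint ones.
    NOTE: illustrative stand-in, not the formulation proposed in the paper.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p == 0 or total_q == 0:
        return 0.0  # an empty histogram overlaps with nothing
    # Sum, bin by bin, the mass that both normalized histograms share.
    return sum(min(a / total_p, b / total_q) for a, b in zip(p, q))
```

Because the comparison works on bins of a fixed coordinate system rather than on aligned 3D structures, a measure of this kind needs no pose optimization step, which matches the motivation given in the abstract.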

Pages: 266-77
Biclustering in gene expression data by tendency.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332431
Jinze Liu, Jiong Wang, Wei Wang

The advent of DNA microarray technologies has revolutionized the experimental study of gene expression. Clustering is the most popular approach to analyzing gene expression data and has proven successful in many applications. Our work focuses on discovering a subset of genes that exhibit similar expression patterns along a subset of conditions in the gene expression matrix. Specifically, we look for Order-Preserving Clusters (OP-Clusters), in each of which a subset of genes induces a similar linear ordering along a subset of conditions. The pioneering OPSM model [3], which enforces a strict order shared by the genes in a cluster, is included in our model as a special case. Our model is more robust than OPSM because similarly expressed conditions are allowed to form order-equivalent groups, and no restriction is placed on the order within a group. Guided by our model, we design and implement a deterministic algorithm, namely OPCTree, to discover OP-Clusters. An experimental study on two real datasets demonstrates the effectiveness of the algorithm in tissue classification and cell cycle identification. In addition, a large percentage of OP-Clusters exhibit significant enrichment of one or more function categories, which implies that OP-Clusters indeed carry significant biological relevance.
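The order-preserving idea can be illustrated with a small sketch. It checks only the between-group ordering described in the abstract and does not model the similarity threshold that decides which conditions form an order-equivalent group, nor the OPCTree search itself. The names `supports_order` and `genes_in_cluster`, and the representation of a pattern as a list of groups of condition indices, are assumptions made for this example.

```python
def supports_order(row, groups):
    """Return True if `row` follows the ordering given by `groups`.

    `row` is one gene's expression values, indexed by condition.
    `groups` is a list of lists of condition indices, from lowest to
    highest expression. Every value in one group must be strictly below
    every value in the next group; order inside a group is unconstrained.
    """
    for lower, upper in zip(groups, groups[1:]):
        if max(row[c] for c in lower) >= min(row[c] for c in upper):
            return False
    return True


def genes_in_cluster(matrix, groups):
    """Indices of the genes (rows of `matrix`) that support the pattern."""
    return [g for g, row in enumerate(matrix) if supports_order(row, groups)]
```

With single-condition groups the check reduces to a strict shared ordering, which is the OPSM special case mentioned above.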

Pages: 182-93
A hierarchical mixture of Markov models for finding biologically active metabolic paths using gene expression and protein classes.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332447
Hiroshi Mamitsuka, Yasushi Okuno

With the recent development of experimental high-throughput techniques, the type and volume of accumulated biological data have grown dramatically in the last few years. Mining different types of data together may yield new biological insights. We present a new methodology for systematically combining three different datasets to find biologically active metabolic paths/patterns. This method consists of two steps. First, it efficiently synthesizes metabolic paths from a given set of chemical reactions that are already known and whose enzymes are co-expressed. Second, it represents the obtained metabolic paths in a more comprehensible way by using these synthesized paths to estimate the parameters of a probabilistic model. This model is built upon the assumption that an entire set of chemical reactions corresponds to a Markov state transition diagram. Furthermore, this model is a hierarchical latent variable model, containing a set of protein classes as a latent variable, for clustering input paths in terms of existing knowledge of protein classes. We tested the performance of our method using a main pathway of glycolysis, and found that it achieved higher predictive performance in classifying gene expressions than other unsupervised methods. We further analyzed the estimated parameters of our probabilistic models, and found that biologically active paths were clustered into only two or three patterns for each expression experiment type, and that each pattern suggested some new long-range relations in the glycolysis pathway.
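As a rough illustration of how a path is scored once reactions are treated as Markov state transitions, the sketch below computes the probability of a path under a flat mixture of first-order Markov chains. This is a simplification: the hierarchical structure with protein classes as a latent variable, and the parameter estimation, are not shown. The function names and the dictionary layout for initial and transition probabilities are assumptions for this example.

```python
def path_probability(path, init, trans):
    """Probability of `path` under one first-order Markov chain.

    `init` maps a state to its initial probability; `trans` maps a state
    to a dict of next-state probabilities. Missing entries count as 0.
    """
    if not path:
        return 0.0
    prob = init.get(path[0], 0.0)
    for current, nxt in zip(path, path[1:]):
        prob *= trans.get(current, {}).get(nxt, 0.0)
    return prob


def mixture_probability(path, weights, components):
    """Probability of `path` under a weighted mixture of Markov chains.

    `components` is a list of (init, trans) pairs, one per mixture
    component, and `weights` holds the matching mixing proportions.
    """
    return sum(
        w * path_probability(path, init, trans)
        for w, (init, trans) in zip(weights, components)
    )
```

In a mixture of this kind, each component can specialize in one pattern of paths, which is how a small number of patterns per experiment type can emerge from the fitted parameters.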

Pages: 341-52
Separation of ion types in tandem mass spectrometry data interpretation -- a graph-theoretic approach.
Pub Date : 2004-01-01 DOI: 10.1109/csb.2004.1332437
Bo Yan, Chongle Pan, Victor N Olman, Robert L Hettich, Ying Xu

Mass spectrometry is one of the most popular analytical techniques for identifying individual proteins in a protein mixture, one of the basic problems in proteomics. It identifies a protein by identifying its unique mass spectral pattern. While the problem is theoretically solvable, it remains computationally challenging. One of the key challenges comes from the difficulty of distinguishing the N- and C-terminus ions, mostly b- and y-ions respectively. In this paper, we present a graph algorithm for solving the problem of separating b- from y-ions in a set of mass spectra. We represent each spectral peak as a node and consider two types of edges, predicted from local information: a type-1 edge connects two peaks possibly of the same ion type, and a type-2 edge connects two peaks possibly of different ion types. The ion-separation problem is then formulated and solved as a graph partition problem: partition the graph into three subgraphs, namely b-ions, y-ions, and others, so as to maximize the total weight of type-1 edges while minimizing the total weight of type-2 edges within each subgraph. We have developed a dynamic programming algorithm for rigorously solving this graph partition problem and implemented it as a computer program, PRIME. We have tested PRIME on 18 data sets of highly accurate FT-ICR tandem mass spectra and found that it achieved ~90% accuracy for the separation of b- and y-ions.
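The partition objective can be sketched as a scoring function for one candidate labeling of peaks. Combining the two goals into a single difference (type-1 weight minus type-2 weight inside subgraphs) is an assumption made here for illustration; the abstract does not state how PRIME weighs them, and the dynamic programming search is not reproduced. The name `partition_score` and the `(u, v, weight)` edge tuples are likewise hypothetical.

```python
def partition_score(labels, type1_edges, type2_edges):
    """Score a labeling of peaks into 'b', 'y', and 'other' subgraphs.

    `labels` maps a peak id to its subgraph label. Each edge is a tuple
    (u, v, weight). Type-1 edges (same ion type likely) add their weight
    when both ends share a subgraph; type-2 edges (different ion types
    likely) subtract their weight when both ends share a subgraph.
    ASSUMPTION: the single-difference objective is illustrative only.
    """
    def same_subgraph(u, v):
        return labels[u] == labels[v]

    gain = sum(w for u, v, w in type1_edges if same_subgraph(u, v))
    penalty = sum(w for u, v, w in type2_edges if same_subgraph(u, v))
    return gain - penalty
```

A higher score means the labeling keeps more likely-same-type peaks together and fewer likely-different-type peaks together, which is the direction the abstract's objective describes.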

Pages: 236-44