Proceedings. IEEE Computational Systems Bioinformatics Conference: latest articles

Estimating time-dependent gene networks from time series microarray data by dynamic linear models with Markov switching.
Pub Date : 2005-01-01 DOI: 10.1109/csb.2005.32
Ryo Yoshida, Seiya Imoto, Tomoyuki Higuchi

In gene network estimation from time series microarray data, dynamic models such as differential equations and dynamic Bayesian networks assume that the network structure is stable across all time points, whereas the real network might change its structure over time, in response to shocks, and so on. If the true network structure underlying the data changes at certain points, fitting the usual dynamic linear models fails to estimate the structure of the gene network, and we cannot obtain useful information from the data. To solve this problem, we propose a dynamic linear model with Markov switching for estimating time-dependent gene network structure from time series gene expression data. With the proposed method, the network structure between genes and its change points are estimated automatically. We demonstrate the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae cell cycle time series data.
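To make the model class concrete, the following is a minimal sketch of a linear system whose coefficient matrix is selected by a hidden Markov chain. It only simulates data from such a model; it is not the authors' estimation procedure, and the function names, the noise level and the starting vector are assumptions chosen for illustration.

```python
import random

def simulate_switching_ar(transition, coeffs, n_steps, noise_sd=0.1, seed=0):
    """Simulate x_t = A[s_t] x_{t-1} + noise, where s_t follows a Markov chain.

    transition[i][j] is the probability of moving from regime i to regime j.
    coeffs[k] is the square coefficient matrix (list of rows) for regime k.
    """
    rng = random.Random(seed)
    dim = len(coeffs[0])
    state = 0                 # assumption: the chain starts in regime 0
    x = [1.0] * dim           # assumption: arbitrary starting expression vector
    states, series = [], []
    for _ in range(n_steps):
        # Draw the next regime from the current row of the transition matrix.
        state = rng.choices(range(len(transition)), weights=transition[state])[0]
        a = coeffs[state]
        x = [sum(a[i][j] * x[j] for j in range(dim)) + rng.gauss(0.0, noise_sd)
             for i in range(dim)]
        states.append(state)
        series.append(x)
    return states, series

def change_points(states):
    """Return the time indices at which the regime differs from the previous step."""
    return [t for t in range(1, len(states)) if states[t] != states[t - 1]]
```

In this sketch each regime's matrix plays the role of one network structure, and the entries of `change_points` are the times at which the structure switches. Estimating the regimes and matrices from observed data is the hard part that the paper addresses and is not shown here.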

Pages: 289-98
Citations: 43
Learning yeast gene functions from heterogeneous sources of data using hybrid weighted Bayesian networks.
Pub Date : 2005-01-01 DOI: 10.1109/csb.2005.38
Xutao Deng, Huimin Geng, Hesham Ali

We developed a machine learning system for determining gene functions from heterogeneous data sources using a Weighted Naive Bayesian Network (WNB). Knowledge of gene functions is crucial for understanding many fundamental biological mechanisms such as regulatory pathways, cell cycles and diseases. Our major goal is to accurately infer the functions of putative genes or ORFs (Open Reading Frames) from existing databases using computational methods. However, this task is intrinsically difficult since the underlying biological processes represent complex interactions of multiple entities. Many functional links would therefore be missed when only one or two sources of data are used in the prediction. Our hypothesis is that integrating evidence from multiple complementary sources could significantly improve prediction accuracy. In this paper, our experimental results not only suggest that this hypothesis is valid, but also provide guidelines for using the WNB system for data collection, training and prediction. The combined training data sets contain information from gene annotations, gene expression, clustering outputs, keyword annotations and sequence homology from public databases. The current system is trained and tested on the genes of the budding yeast Saccharomyces cerevisiae. Our WNB model can also be used to analyze the contribution of each source of information to prediction performance through the weight training process. This contribution analysis could potentially lead to significant scientific discovery by facilitating the interpretation and understanding of the complex relationships between biological entities.
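The abstract does not give the WNB scoring rule, so the sketch below assumes one common form of weighted naive Bayes, in which each source's log-likelihood is multiplied by a per-source weight. The data layout and all names are hypothetical, and the weights are supplied by hand rather than trained.

```python
import math

def weighted_nb_log_scores(features, priors, likelihoods, weights):
    """Score each class as log P(c) + sum_i w_i * log P(x_i | c).

    features:    dict source name -> observed discrete value
    priors:      dict class -> P(class)
    likelihoods: dict class -> source name -> value -> P(value | class)
    weights:     dict source name -> non-negative weight for that source
    """
    scores = {}
    for cls, prior in priors.items():
        score = math.log(prior)
        for source, value in features.items():
            # A weight of 0 ignores the source; a weight of 1 is plain naive Bayes.
            score += weights[source] * math.log(likelihoods[cls][source][value])
        scores[cls] = score
    return scores

def predict(features, priors, likelihoods, weights):
    """Return the class with the highest weighted log score."""
    scores = weighted_nb_log_scores(features, priors, likelihoods, weights)
    return max(scores, key=scores.get)
```

Under this assumed form, comparing predictions with a source's weight set to 1 and to 0 gives a rough view of how much that source contributes, which is the kind of contribution analysis the abstract describes.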

Pages: 25-34
Citations: 8
Analysis of SNP-expression association matrices.
Pub Date : 2005-01-01 DOI: 10.1109/csb.2005.14
Anya Tsalenko, Roded Sharan, Hege Edvardsen, Vessela Kristensen, Anne-Lise Børresen-Dale, Amir Ben-Dor, Zohar Yakhini

High-throughput expression profiling and genotyping technologies provide the means to study the genetic determinants of population variation in gene expression. In this paper we present a general statistical framework for the simultaneous analysis of gene expression data and SNP genotype data measured for the same cohort. The framework consists of methods to associate transcripts with SNPs affecting their expression, algorithms to detect subsets of transcripts that share significantly many associations with a subset of SNPs, and methods to visualize the identified relations. We apply our framework to SNP-expression data collected from 49 breast cancer patients. Our results demonstrate an overabundance of transcript-SNP associations in these data, and pinpoint SNPs that are potential master regulators of transcription. We also identify several statistically significant transcript subsets with common putative regulators that fall into well-defined functional categories.

Pages: 135-43
Citations: 4
Identification of post-translational modifications via blind search of mass-spectra.
Pub Date : 2005-01-01 DOI: 10.1109/csb.2005.34
Dekel Tsur, Stephen Tanner, Ebrahim Zandi, Vineet Bafna, Pavel A Pevzner

Post-translational modifications (PTMs) are of great biological importance. Most existing approaches perform a restrictive search that can only take into account a few types of PTMs and ignore all others. We describe an unrestrictive PTM search algorithm that searches for all types of PTMs at once in a blind mode, i.e., without knowing which PTMs exist in a sample. The blind PTM identification opens a possibility to study the extent and frequencies of different types of PTMs, still an open problem in proteomics. Using our new algorithm, we were able to construct a two-dimensional PTM frequency matrix that reflects the number of MS/MS spectra in a sample for each putative PTM type and each amino acid. Application of this approach to a large IKKb dataset resulted in the largest set of PTMs reported for a single MS/MS sample so far. We demonstrate an excellent correlation between high values in the PTM frequency matrix and known PTMs thus validating our approach. We further argue that the PTM frequency matrix may reveal some still unknown modifications that warrant further experimental validation.
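The two-dimensional PTM frequency matrix can be pictured as a table of counts indexed by mass offset and amino acid. The sketch below builds such a table from a list of already-identified modifications; the input format, the rounding of offsets to whole daltons and the function names are assumptions, and the blind search that produces the identifications is not shown.

```python
from collections import Counter

def ptm_frequency_matrix(identifications):
    """Count spectra for each (mass offset, amino acid) pair.

    identifications: iterable of (mass_offset_da, residue) tuples, one per
    MS/MS spectrum matched with a single modified residue.
    Offsets are rounded to the nearest integer dalton (an assumption).
    """
    counts = Counter()
    for mass_offset, residue in identifications:
        counts[(round(mass_offset), residue)] += 1
    return counts

def top_cells(counts, n=3):
    """Return the n most frequent (offset, residue) cells with their counts."""
    return counts.most_common(n)
```

Cells with high counts are the candidates one would compare against known modifications, in the spirit of the validation described in the abstract.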

Pages: 157-66
Citations: 51
A topological measurement for weighted protein interaction network.
Pub Date : 2005-01-01 DOI: 10.1109/csb.2005.8
Pengjun Pei, Aidong Zhang

High-throughput methods for detecting protein-protein interactions (PPI) have given researchers an initial global picture of protein interactions on a genomic scale. The usefulness of this understanding is, however, typically compromised by noisy data. How to integrate and use these non-congruent data sets effectively has received little attention to date. This paper proposes a model to integrate different data sets. We construct this model using our prior knowledge of data set reliability. Based on this model, we propose a topological measurement to select reliable interactions and to quantify the similarity between two proteins' interaction profiles. Our measurement exploits the small-world topological properties of protein interaction networks. In addition, we discovered some further properties of the network. We show that our measurement can be used to find reliable interactions with improved performance and to find protein pairs with higher functional homogeneity.

Pages: 268-78
Citations: 39
Application of a generalized MWC model for the mathematical simulation of metabolic pathways regulated by allosteric enzymes.
Pub Date : 2005-01-01 DOI: 10.1109/csb.2005.15
Tarek S Najdi, Chin-Rang Yang, Bruce E Shapiro, G Wesley Hatfield, Eric D Mjolsness

In our effort to elucidate the systems biology of the model organism Escherichia coli, we have developed a mathematical model that simulates the allosteric regulation of the threonine biosynthesis pathway starting from aspartate. To achieve this goal, we used kMech, a Cellerator language extension that describes enzyme mechanisms for the mathematical modeling of metabolic pathways. These mechanisms are converted by Cellerator into ordinary differential equations (ODEs) solvable by Mathematica. In this paper, we describe a more flexible model in Cellerator, which generalizes the Monod, Wyman, Changeux (MWC) model for enzyme allosteric regulation to allow for multiple substrate, activator and inhibitor binding sites. Furthermore, we have developed a model that describes the behavior of the bifunctional allosteric enzyme aspartate kinase I-homoserine dehydrogenase I (AKI-HDHI). This model predicts the partition of enzyme activities in the steady state, which paves the way for a more generalized prediction of the behavior of bifunctional enzymes.
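For reference, the classical single-ligand MWC saturation function that the paper generalizes can be written in a few lines. This sketch covers only the textbook case of one ligand with n equivalent sites; it is not the generalized multi-site model of the paper and does not use kMech or Cellerator syntax. The function and parameter names are chosen here for illustration.

```python
def mwc_fractional_saturation(alpha, n, L, c):
    """Classical MWC fractional saturation of an n-site allosteric protein.

    alpha: ligand concentration divided by the R-state dissociation constant
    n:     number of equivalent binding sites
    L:     allosteric constant [T0]/[R0] in the absence of ligand
    c:     ratio of R-state to T-state dissociation constants (K_R / K_T)
    """
    numerator = (alpha * (1 + alpha) ** (n - 1)
                 + L * c * alpha * (1 + c * alpha) ** (n - 1))
    denominator = (1 + alpha) ** n + L * (1 + c * alpha) ** n
    return numerator / denominator
```

With L = 0 the protein is always in the R state and the expression reduces to simple hyperbolic binding, alpha / (1 + alpha). Large L with small c gives the sigmoidal response characteristic of allosteric enzymes.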

Pages: 279-88
Citations: 18
Identifying simple discriminatory gene vectors with an information theory approach.
Pub Date : 2005-01-01 DOI: 10.1109/csb.2005.35
Zheng Yun, Kwoh Chee Keong

In feature selection for cancer classification problems, many existing methods consider genes individually, choosing the top genes with the most significant signal-to-noise statistic or correlation coefficient. However, the information about the class distinction provided by such genes may overlap heavily, since their gene expression patterns are similar. The redundancy of including many genes with similar expression patterns results in highly complex classifiers. According to the principle of Occam's razor, simple models are preferable to complex ones if they produce comparable prediction performance. In this paper, we introduce a new method to learn accurate and low-complexity classifiers from gene expression profiles. In our method, we use mutual information to measure the relation between a set of genes, called a gene vector, and the class attribute of the samples. Gene vectors lie in higher-dimensional spaces than individual genes; they are therefore more diverse and contain more information than individual genes. Hence, gene vectors are preferable to individual genes for describing the class distinctions between samples, since they contain more information about the class attribute. We validate our method on 3 gene expression profiles. Compared with results from the literature and with other well-known classification methods, our method achieved better or comparable prediction performance with lower-complexity models.
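The advantage of scoring a gene vector rather than single genes can be seen in a small worked example. The sketch below computes mutual information from empirical frequencies of discretized expression values; the function names and the toy data are assumptions, and the paper's search over candidate gene vectors is not shown.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of a sequence of hashable values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())

def mutual_information(gene_vector, labels):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for discretized expression X and class labels Y.

    gene_vector: one entry per sample; either a single discretized value or a
                 tuple of values when several genes are considered jointly.
    labels:      class label of each sample, in the same order.
    """
    joint = list(zip(gene_vector, labels))
    return entropy(gene_vector) + entropy(labels) - entropy(joint)

# Toy data: the class is the XOR of two binary genes.
gene1 = [0, 0, 1, 1]
gene2 = [0, 1, 0, 1]
labels = [0, 1, 1, 0]
pair = list(zip(gene1, gene2))
```

Here `mutual_information(gene1, labels)` and `mutual_information(gene2, labels)` are both 0 bits, while `mutual_information(pair, labels)` is 1 bit: neither gene alone says anything about the class, but the two-gene vector determines it completely.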

Pages: 13-24
Citations: 16
A learned comparative expression measure for Affymetrix GeneChip DNA microarrays.
Pub Date : 2005-01-01 DOI: 10.1109/csb.2005.5
Will Sheffler, Eli Upfal, John Sedivy, William Stafford Noble

Perhaps the most common question that a microarray study can ask is, "Between two given biological conditions, which genes exhibit changed expression levels?" Existing methods for answering this question either generate a comparative measure based upon a static model, or take an indirect approach, first estimating absolute expression levels and then comparing the estimated levels to one another. We present a method for detecting changes in gene expression between two samples based on data from Affymetrix GeneChips. Using a library of over 200,000 known cases of differential expression, we create a learned comparative expression measure (LCEM) based on classification of probe-level data patterns as changed or unchanged. LCEM uses perfect match probe data only; mismatch probe values did not prove to be useful in this context. LCEM is particularly powerful in the case of small microarray studies, in which a regression-based method such as RMA cannot generalize, and in detecting small expression changes. At the levels of selectivity that are typical in microarray analysis, the LCEM shows a lower false discovery rate than either MAS5 or RMA trained from a single chip. When many chips are available to RMA, LCEM performs better on two out of the three data sets, and nearly as well on the third. Performance of the MAS5 log ratio statistic was notably poor on all data sets.

Pages: 144-54
Citations: 8
Accurate prediction of orthologous gene groups in microbes.
Pub Date : 2005-01-01 DOI: 10.1109/csb.2005.10
Hongwei Wu, Fenglou Mao, Victor Olman, Ying Xu

We present a new computational method for predicting orthologous gene groups in microbial genomes, based on predicted co-occurrences of homologous genes. The method is inspired by the observation that homologous genes are highly likely to be orthologous if their neighboring genes are also homologous. Based on co-occurrences of homologous genes, we have grouped the (predicted) operons of 77 selected sequenced microbial genomes so that operons in the same group are highly likely to be functionally similar or related. We then cluster the homologous genes within each operon group so that genes in the same cluster are highly likely to be similar in sequence and function, i.e., they are predicted to be orthologous genes. By comparing our predicted orthologous gene groups with the COG assignments and NCBI annotations, we conclude that our method promises more accurate and more specific predictions than existing methods.
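
The neighborhood observation in this abstract can be illustrated with a small sketch. It is not the authors' operon-grouping and clustering pipeline; it only shows the underlying idea under simplifying assumptions: each genome is an ordered list of gene names, homology is given as a precomputed set of pairs, and a pair counts as "supported" when some pair of neighboring genes is also homologous. The function names and the `window` parameter are hypothetical.

```python
def neighbors(genes, index, window):
    """Genes within `window` positions of `index`, excluding the gene itself."""
    return genes[max(0, index - window):index] + genes[index + 1:index + 1 + window]


def neighbor_supported_pairs(genome_a, genome_b, homologs, window=1):
    """Return the homologous pairs whose neighborhoods contain another homologous pair.

    genome_a, genome_b: lists of gene names in chromosomal order (assumed input format).
    homologs: set of (gene_in_a, gene_in_b) pairs, assumed to come from a
              separate sequence-similarity step that is not shown here.
    """
    pos_a = {gene: i for i, gene in enumerate(genome_a)}
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    supported = set()
    for a, b in homologs:
        if a not in pos_a or b not in pos_b:
            continue  # skip pairs that mention genes missing from either genome
        near_a = neighbors(genome_a, pos_a[a], window)
        near_b = neighbors(genome_b, pos_b[b], window)
        # The pair is supported if any neighboring gene pair is also homologous.
        if any((x, y) in homologs for x in near_a for y in near_b):
            supported.add((a, b))
    return supported


genome_a = ["a1", "a2", "a3", "a4"]
genome_b = ["b1", "b2", "b3", "b4"]
homologs = {("a1", "b1"), ("a2", "b2"), ("a4", "b3")}
print(sorted(neighbor_supported_pairs(genome_a, genome_b, homologs)))
```

In this toy input, the pairs (a1, b1) and (a2, b2) support each other because they are adjacent in both genomes, while (a4, b3) has no homologous neighbors and is left out.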

Citations: 7
Multi-metric and multi-substructure biclustering analysis for gene expression data.
Pub Date : 2005-01-01 DOI: 10.1109/csb.2005.40
S Y Kung, Man-Wai Mak, Ilias Tagkopoulos

A good number of biclustering algorithms have been proposed for grouping gene expression data. Many of them have adopted matrix norms to define the similarity score of a bicluster. We shall show that almost all matrix metrics can be converted into vector norms while preserving the rank equivalence. Vector norms provide a much more efficient vehicle for biclustering analysis and computation. The advantages are twofold: ease of analysis and savings in computation. Most existing biclustering algorithms have also implicitly assumed the use of univariate (i.e., single-metric) evaluation for identifying biclusters. Such an approach, however, overlooks the fundamental principle that genes (even though they may belong to the same gene group) (1) may be subdivided into different substructures and (2) may be co-expressed via a diversity of coherence models (a gene may participate in multiple pathways that may or may not be co-active under all conditions). The former leads to the adoption of a multi-substructure analysis, while the latter leads to multivariate analysis. This paper will show that the proposed multivariate and multi-subcluster analysis is very effective in identifying and classifying biologically relevant groups in genes and conditions. For example, it has successfully yielded highly discriminant and accurate classification based on known ribosomal gene groups.
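
One elementary instance of relating a matrix score to vector norms can be sketched as follows. This is an illustration only, not necessarily the conversion derived in the paper: it assumes a bicluster is a list of gene rows over the same conditions and scores its coherence as the Frobenius norm of the deviations from the column means. That matrix score equals the square root of the summed squared Euclidean distances of each row vector from the centroid, so both forms rank biclusters identically. All function names are hypothetical.

```python
import math


def column_means(matrix):
    """Mean expression of each condition (column) across the gene rows."""
    n_rows = len(matrix)
    return [sum(column) / n_rows for column in zip(*matrix)]


def frobenius_residual(matrix):
    """Matrix form: Frobenius norm of the entries' deviations from the column means."""
    means = column_means(matrix)
    return math.sqrt(sum((x - m) ** 2 for row in matrix for x, m in zip(row, means)))


def row_vector_residual(matrix):
    """Vector form: root of the summed squared distances of each row to the centroid."""
    centroid = column_means(matrix)
    return math.sqrt(sum(math.dist(row, centroid) ** 2 for row in matrix))


# Toy bicluster: three genes (rows) measured under three conditions (columns).
bicluster = [
    [1.0, 2.0, 3.0],
    [1.5, 2.5, 2.5],
    [0.5, 1.5, 3.5],
]
print(math.isclose(frobenius_residual(bicluster), row_vector_residual(bicluster)))
```

The vector form makes it clear that the score decomposes row by row, which is what allows per-gene contributions to be computed and updated independently.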

Citations: 11