Clustering expressed genes on the basis of their association with a quantitative phenotype.

Zhenyu Jia, Shizhong Xu
Genetical Research, vol. 86, no. 3, pp. 193-207. Published 2005-12-01. DOI: 10.1017/S0016672305007822 (https://doi.org/10.1017/S0016672305007822)
Citations: 38

Abstract

Cluster analyses of gene expression data are usually conducted based on their associations with the phenotype of a particular disease. Many disease traits have a clearly defined binary phenotype (presence or absence), so that genes can be clustered based on the differences of expression levels between the two contrasting phenotypic groups. For example, cluster analysis based on binary phenotype has been successfully used in tumour research. Some complex diseases have phenotypes that vary in a continuous manner and the method developed for a binary trait is not immediately applicable to a continuous trait. However, understanding the role of gene expression in these complex traits is of fundamental importance. Therefore, it is necessary to develop a new statistical method to cluster expressed genes based on their association with a quantitative trait phenotype. We developed a model-based clustering method to classify genes based on their association with a continuous phenotype. We used a linear model to describe the relationship between gene expression and the phenotypic value. The model effects of the linear model (linear regression coefficients) represent the strength of the association. We assumed that the model effects of each gene follow a mixture of several multivariate Gaussian distributions. Parameter estimation and cluster assignment were accomplished via an Expectation-Maximization (EM) algorithm. The method was verified by analysing two simulated datasets, and further demonstrated using real data generated in a microarray experiment for the study of gene expression associated with Alzheimer's disease.
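
The abstract does not give the model equations, so the following is only a minimal two-stage sketch of the general idea, not the authors' algorithm. It assumes each gene's association can be summarised by a single ordinary least-squares slope of expression on the phenotype, and that those slopes follow a univariate Gaussian mixture fitted by EM (the abstract describes multivariate Gaussian components). The function names `per_gene_slopes` and `em_gaussian_mixture`, the simulated data, and every parameter value are illustrative assumptions.

```python
import numpy as np

def per_gene_slopes(expr, phenotype):
    """OLS slope of each gene's expression (one gene per row) on the phenotype."""
    z = phenotype - phenotype.mean()
    centred = expr - expr.mean(axis=1, keepdims=True)
    return centred @ z / (z @ z)

def em_gaussian_mixture(x, n_clusters, n_iter=200, tol=1e-8):
    """EM for a univariate Gaussian mixture (a simplification of the
    multivariate mixture mentioned in the abstract)."""
    # Deterministic start: spread the initial means over quantiles of x.
    means = np.quantile(x, (np.arange(n_clusters) + 0.5) / n_clusters)
    variances = np.full(n_clusters, x.var())
    weights = np.full(n_clusters, 1.0 / n_clusters)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each gene.
        log_p = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: re-estimate mixing weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12  # guard against an empty cluster
        weights = nk / nk.sum()
        means = resp.T @ x / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-10
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, variances, resp

# Simulated example (all values are assumptions): 300 genes, 40 individuals,
# three groups of genes with negative, null and positive association.
rng = np.random.default_rng(0)
phenotype = rng.normal(size=40)
true_slopes = np.repeat([-1.0, 0.0, 1.0], 100)
noise = rng.normal(scale=0.5, size=(300, 40))
expr = true_slopes[:, None] * phenotype + noise

slopes = per_gene_slopes(expr, phenotype)
weights, means, variances, resp = em_gaussian_mixture(slopes, n_clusters=3)
labels = resp.argmax(axis=1)  # assign each gene to its most probable cluster
print(np.round(means, 2), np.bincount(labels, minlength=3))
```

Estimating the slopes first and clustering them afterwards keeps the sketch short. A model-based method of the kind the abstract describes would more plausibly estimate the regression effects and the mixture parameters jointly within one EM procedure.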
