A clustering model for identification of time course gene expression patterns

P. Ochieng, S. I. Tarigan, H. Didik
2016 1st International Conference on Biomedical Engineering (IBIOMED), published 2016-10-01. DOI: 10.1109/IBIOMED.2016.7869819 (https://doi.org/10.1109/IBIOMED.2016.7869819)

Abstract

Identifying gene expression patterns is critical when studying complex, dynamic biological processes such as gene regulatory functions. Gene expression is a continuous biological phenomenon and can be represented by a continuous function (curve). Genes whose expression follows such continuous functions often share similar functional forms. However, the number and shape of these forms, and the identities of the genes sharing them, remain unknown. To identify such functional forms, we introduce a clustering model for time course gene expression patterns. The method uses an S-spline approach to model the functional curves and a penalized log-likelihood approach to fit the model. In addition, a rejection-controlled EM algorithm is designed to minimize the error and computational cost of mean curve estimation. Furthermore, the method uses generalized cross-validation to select smoothing parameters and the Bayesian information criterion to measure clustering uncertainty. The method is illustrated by its application to D. melanogaster life cycle datasets. Simulation results indicate that our method accurately recovers the true functional forms: it assigns each gene to a cluster, predicts the mean curve, and provides an associated 95% confidence band for each cluster. Based on Gene Ontology term descriptions, the estimated mean curve in each cluster reflects true gene functional annotations with biologically meaningful expression patterns. Finally, a comparison of clustering performance indicates that our method outperforms Fuzzy c-Means and K-Means, with a misclassification rate of 0.1289 and an overall success rate of 98.71%.
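The abstract does not spell out how the rejection-controlled EM step works. As a rough sketch only, and assuming the commonly used rejection-control rule rather than the authors' exact procedure, small posterior cluster-membership probabilities can be randomly either zeroed or raised to a threshold, so that most genes contribute to only a few clusters in the following M-step. The function name `rejection_control`, the threshold `c`, and the toy posterior matrix `post` are all hypothetical and chosen for illustration.

```python
import numpy as np

def rejection_control(post, c, rng):
    """Sparsify posterior membership probabilities (assumed rejection-control rule).

    post : (n_genes, n_clusters) array, each row sums to 1.
    c    : threshold in (0, 1); entries >= c are kept unchanged.
    rng  : numpy random Generator used for the accept/reject draws.
    """
    post = np.asarray(post, dtype=float)
    u = rng.random(post.shape)
    small = post < c
    # A small entry p is raised to c with probability p / c, otherwise set to 0.
    resampled = np.where(u < post / c, c, 0.0)
    out = np.where(small, resampled, post)
    # Renormalize each row; if a row became all zeros, fall back to the original row.
    row_sums = out.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    return np.where(row_sums > 0, out / safe_sums, post)

# Toy example: two genes, three clusters (made-up numbers).
rng = np.random.default_rng(0)
post = np.array([[0.98, 0.015, 0.005],
                 [0.40, 0.35, 0.25]])
sparse = rejection_control(post, c=0.05, rng=rng)
print(sparse)
```

In this sketch the second gene is left untouched because all of its probabilities exceed the threshold, while the first gene's two small probabilities are each either dropped or raised to the threshold before renormalization. Entries that end up at zero can be skipped when updating the corresponding cluster's mean curve, which is where the computational saving would come from.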