Biclustering Models Under Collinearity in Simulated Biological Experiments

IF 0.3 Q4 MATHEMATICS Matematika Pub Date : 2023-11-30 DOI:10.11113/matematika.v39.n3.1461
Chibuike Nnamani, Norhaiza Ahmad

Abstract

Biclustering models allow simultaneous detection of groups of observations that are related to subsets of variables in a data matrix. Such methods have been applied to biological data for classification. Collinearity is a common feature of biological data because genes and proteins interact within their respective pathways. Such relationships could seriously reduce the efficiency of biclustering models. In this study, synthetic data are generated to investigate the effect of collinearity on the performance of biclustering models. Specifically, the data are generated with varying degrees of collinearity induced by Cholesky decomposition and are then implanted with biclusters to produce different sets of synthetic data. The effectiveness of three models, namely Biclustering by Cheng and Church (BCCC), Spectral Bicluster (BCSpectral) and the Plaid Model, in correctly detecting three types of biclusters in the generated data matrix was compared. The results show that all the models investigated are sensitive to changes in the level of collinearity. At low collinearity, all biclustering models were able to detect the implanted biclusters in the data correctly. As the level of collinearity in the data rises, the proportion of implanted biclusters detected by the models decreases. In particular, BCCC outperformed the other two models for moderate to high collinearity, with Jaccard coefficients of 0.499 to 0.875 and 0.746 to 0.936 for one and two implanted biclusters, respectively.
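
The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the equicorrelated target matrix, the constant-valued bicluster, the cell-based definition of the Jaccard coefficient, and all function names (`correlated_matrix`, `implant_constant_bicluster`, `jaccard`) are assumptions made for illustration only. The "detected" bicluster is a made-up stand-in for the output of a biclustering model.

```python
import numpy as np


def correlated_matrix(n_rows, n_cols, rho, rng):
    """Draw n_rows observations of n_cols variables with pairwise correlation rho."""
    # Assumed target: every pair of columns has correlation rho
    # (illustrative only; the paper's exact correlation design is not given here).
    sigma = np.full((n_cols, n_cols), float(rho))
    np.fill_diagonal(sigma, 1.0)
    chol = np.linalg.cholesky(sigma)           # lower triangular, sigma = chol @ chol.T
    z = rng.standard_normal((n_rows, n_cols))  # independent standard normal columns
    return z @ chol.T                          # rows now have covariance sigma


def implant_constant_bicluster(data, rows, cols, value):
    """Return a copy of data with the (rows x cols) submatrix set to a constant."""
    out = data.copy()
    out[np.ix_(rows, cols)] = value
    return out


def jaccard(rows_a, cols_a, rows_b, cols_b):
    """Jaccard coefficient between two biclusters, compared as sets of matrix cells."""
    cells_a = {(r, c) for r in rows_a for c in cols_a}
    cells_b = {(r, c) for r in rows_b for c in cols_b}
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 1.0


rng = np.random.default_rng(0)
data = correlated_matrix(500, 10, rho=0.8, rng=rng)  # high collinearity
true_rows, true_cols = list(range(20)), [0, 1, 2]
data = implant_constant_bicluster(data, true_rows, true_cols, value=5.0)

# Hypothetical detector output that misses five rows of the implanted bicluster.
found_rows, found_cols = list(range(5, 20)), [0, 1, 2]
score = jaccard(true_rows, true_cols, found_rows, found_cols)
print(f"Jaccard coefficient: {score:.3f}")  # 45 shared cells / 60 cells in the union
```

A Jaccard coefficient of 1 means the detected bicluster coincides exactly with the implanted one, and values near 0 mean almost no overlap. Varying `rho` in this sketch mimics moving from low to high collinearity.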