Biclustering Models Under Collinearity in Simulated Biological Experiments

Matematika (IF 0.6, Q4, Mathematics) | Pub Date: 2023-11-30 | DOI: 10.11113/matematika.v39.n3.1461

Chibuike Nnamani, Norhaiza Ahmad



Abstract

Biclustering models allow simultaneous detection of groups of observations that are related to variables in a data matrix. Such methods have been applied to biological data for classification. Collinearity is a common feature of biological data because genes and proteins interact within their respective pathways, and such relationships could seriously reduce the efficiency of biclustering models. In this study, synthetic data are generated to investigate the effect of collinearity on the performance of biclustering models. Specifically, the data are generated with varying degrees of collinearity induced by Cholesky decomposition and are then implanted with biclusters to produce different sets of synthetic data. The effectiveness of three models, namely Biclustering by Cheng and Church (BCCC), Spectral Bicluster (BCSpectral) and the Plaid Model, in correctly detecting three types of biclusters in the generated data matrix was compared. The results show that all the models investigated are sensitive to changes in the level of collinearity. At low collinearity, all biclustering models were able to detect the implanted biclusters in the data correctly. As the level of collinearity in the data rises, the proportion of detected biclusters captured by the models decreases. In particular, BCCC outperformed the other two models for moderate to high collinearity, with Jaccard coefficients of 0.499 to 0.875 and 0.746 to 0.936 for one and two implanted biclusters, respectively.
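The data-generation and scoring steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the equicorrelation covariance (a single value `rho` between every pair of variables), the constant-valued bicluster, the cell-based form of the Jaccard coefficient, and all function names are assumptions made here, since the abstract does not give those details.

```python
import numpy as np


def collinear_matrix(n_rows, n_cols, rho, rng):
    """Draw n_rows samples whose columns share pairwise correlation rho.

    Assumption: an equicorrelation covariance with 0 <= rho < 1.
    """
    sigma = np.full((n_cols, n_cols), float(rho))
    np.fill_diagonal(sigma, 1.0)
    # Lower-triangular factor with chol @ chol.T == sigma.
    chol = np.linalg.cholesky(sigma)
    z = rng.standard_normal((n_rows, n_cols))  # independent columns
    # Each row x = chol @ z_row has covariance sigma.
    return z @ chol.T


def implant_constant_bicluster(data, rows, cols, value):
    """Return a copy of data with a constant block at rows x cols."""
    out = data.copy()
    out[np.ix_(rows, cols)] = value
    return out


def jaccard(rows_a, cols_a, rows_b, cols_b):
    """Jaccard coefficient between two biclusters, counted over cells."""
    cells_a = {(r, c) for r in rows_a for c in cols_a}
    cells_b = {(r, c) for r in rows_b for c in cols_b}
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 1.0


rng = np.random.default_rng(0)
background = collinear_matrix(100, 50, rho=0.8, rng=rng)
true_rows, true_cols = list(range(10, 30)), list(range(5, 15))
data = implant_constant_bicluster(background, true_rows, true_cols, 5.0)
# A bicluster returned by any biclustering method would be scored
# against (true_rows, true_cols) with jaccard(...).
```

A score of 1 means the detected bicluster coincides with the implanted one, and values near 0 mean little overlap, which is how the reported ranges such as 0.499 to 0.875 can be read.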
