Wilks’ dissimilarity for gene clustering: computational issues

Epidemiology Biostatistics and Public Health (JCR: Q3, Nursing). Pub Date: 2022-07-07. DOI: 10.2427/8761

F. M. L. D. Lascio, A. Roverato

Citations: 0

Abstract

Clustering methods are widely used in the analysis of gene expression data for their ability to uncover coordinated expression profiles. One important goal of clustering is to discover co-regulated genes, because it has been postulated that co-regulation implies a similar function. In the context of agglomerative hierarchical clustering, we introduced a dissimilarity measure based on Wilks’ Λ statistic, which we called the Wilks’ dissimilarity, and showed its usefulness in the identification of transcription modules. In this paper, we discuss the ability of the Wilks’ dissimilarity to identify clusters of co-expressed genes by providing an example where the most commonly used dissimilarity measures fail. Furthermore, we carry out a set of simulations aimed at investigating the use of a sparse canonical correlation technique in the estimation of the Wilks’ dissimilarity, and we provide guidelines for its use.
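
The abstract does not give the formula for the dissimilarity. As a rough illustration only, the sketch below assumes the standard Wilks’ Λ statistic for independence between two sets of variables, Λ = det(S) / (det(S_AA) det(S_BB)), where S is the sample covariance matrix of the two gene groups taken together (samples as rows, genes as columns) and S_AA, S_BB are its diagonal blocks. Whether this matches the authors' exact definition is an assumption, and the function name `wilks_lambda` and the simulated data are hypothetical.

```python
import numpy as np


def wilks_lambda(x, y):
    """Wilks' Lambda for independence between two sets of variables.

    x: array of shape (n_samples, p), expression of the genes in group A.
    y: array of shape (n_samples, q), expression of the genes in group B.
    Returns det(S) / (det(S_AA) * det(S_BB)), a value in [0, 1].
    Values near 1 indicate weak linear association between the groups,
    values near 0 indicate strong association.
    Assumption: this is the textbook statistic, not necessarily the
    exact definition used in the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = x.shape[1]
    # Joint sample covariance of all p + q genes (columns are variables).
    s = np.cov(np.hstack([x, y]), rowvar=False)
    # Log-determinants are numerically safer than raw determinants.
    _, logdet_joint = np.linalg.slogdet(s)
    _, logdet_a = np.linalg.slogdet(s[:p, :p])
    _, logdet_b = np.linalg.slogdet(s[p:, p:])
    return float(np.exp(logdet_joint - logdet_a - logdet_b))


# Simulated (made-up) data: group A and group B share a latent signal z,
# while group C is independent noise.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
group_a = np.outer(z, [1.0, -1.0]) + 0.5 * rng.normal(size=(n, 2))
group_b = np.outer(z, [1.0, 0.5]) + 0.5 * rng.normal(size=(n, 2))
group_c = rng.normal(size=(n, 2))

print("Lambda(A, B):", wilks_lambda(group_a, group_b))
print("Lambda(A, C):", wilks_lambda(group_a, group_c))
```

Under this definition Λ equals the product of (1 − ρ_i²) over the sample canonical correlations ρ_i between the two groups, which is the link to canonical correlation analysis. The sample covariance matrix is singular when the number of genes in the two groups exceeds the number of samples, so the determinant ratio above is only usable when samples outnumber genes; this is plausibly the setting in which a sparse canonical correlation estimate becomes relevant, although the abstract does not state the motivation explicitly.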

Source journal

Epidemiology Biostatistics and Public Health (category: PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH)

Self-citation rate: 0.00%

Journal description: Epidemiology, Biostatistics, and Public Health (EBPH) is a multidisciplinary journal with the following broad aims:
- To support the international public health community with publications on health service research, health care management, health policy, and health economics.
- To strengthen the evidence on effective preventive interventions.
- To advance public health methods, including biostatistics and epidemiology.
EBPH welcomes submissions on all public health issues (including topics such as eHealth, big data, personalized prevention, and the epidemiology and risk factors of chronic and infectious diseases), on basic and applied research in epidemiology, and on biostatistics methodology. Primary studies, systematic reviews, and meta-analyses are all welcome, as are research protocols for observational and experimental studies. EBPH aims to be a cross-discipline, international forum for scientific integration and evidence-based policymaking, combining the methodological aspects of epidemiology, biostatistics, and public health research with their practical applications.