Wilks’ dissimilarity for gene clustering: computational issues
Authors: F. M. L. D. Lascio, A. Roverato
Journal: Epidemiology Biostatistics and Public Health, volume 11 1
DOI: 10.2427/8761 (https://doi.org/10.2427/8761)
Publication date (as listed): 2022-07-07
Clustering methods are widely used in the analysis of gene expression data for their ability to uncover coordinated expression profiles. One important goal of clustering is to discover co-regulated genes, because it has been postulated that co-regulation implies a similar function. In the context of agglomerative hierarchical clustering, we previously introduced a dissimilarity measure based on the Wilks’ Λ statistic, which we called the Wilks’ dissimilarity, and showed its usefulness in the identification of transcription modules. In this paper, we discuss the ability of the Wilks’ dissimilarity to identify clusters of co-expressed genes by providing an example where the most commonly used dissimilarity measures fail. Furthermore, we carry out a set of simulations to investigate the use of a sparse canonical correlation technique in the estimation of the Wilks’ dissimilarity, and we provide guidelines for its use.
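The abstract does not spell out the formula, so the following is only a minimal sketch under a stated assumption: that the dissimilarity between two gene clusters is taken to be the Wilks’ Λ statistic for independence between the two sets of variables, Λ = |S| / (|S_AA| |S_BB|), where S is the sample covariance matrix of the combined set. The function name `wilks_lambda` and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np


def wilks_lambda(x_a, x_b):
    """Wilks' Lambda for independence between two sets of variables.

    x_a: array of shape (n_samples, p), expression profiles of cluster A.
    x_b: array of shape (n_samples, q), expression profiles of cluster B.
    Assumes n_samples > p + q so that the covariance matrices are nonsingular.
    """
    p = x_a.shape[1]
    joint = np.hstack([x_a, x_b])
    s = np.cov(joint, rowvar=False)  # (p + q) x (p + q) sample covariance
    # Log-determinants are numerically safer than raw determinants.
    _, logdet_full = np.linalg.slogdet(s)
    _, logdet_a = np.linalg.slogdet(s[:p, :p])
    _, logdet_b = np.linalg.slogdet(s[p:, p:])
    return float(np.exp(logdet_full - logdet_a - logdet_b))


# Hypothetical simulated data: clusters A and B share a common signal z,
# while cluster C is independent noise.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=(n, 1))
x_a = z + 0.1 * rng.normal(size=(n, 2))
x_b = z + 0.1 * rng.normal(size=(n, 2))
x_c = rng.normal(size=(n, 2))

lam_ab = wilks_lambda(x_a, x_b)  # strongly related sets: value near 0
lam_ac = wilks_lambda(x_a, x_c)  # unrelated sets: value near 1
print(lam_ab, lam_ac)
```

Under this assumption, values close to 0 indicate strongly linearly related clusters and values close to 1 indicate nearly independent ones. The sketch requires more samples than genes in the two clusters combined; when that fails the determinants degenerate, which is the kind of situation where a regularised or sparse estimate, such as the sparse canonical correlation approach the abstract mentions, would be needed instead.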
About the journal:
Epidemiology, Biostatistics, and Public Health (EBPH) is a multidisciplinary journal with the following broad aims:
- To support the international public health community with publications on health service research, health care management, health policy, and health economics.
- To strengthen the evidence on effective preventive interventions.
- To advance public health methods, including biostatistics and epidemiology.
EBPH welcomes submissions on all public health issues (including topics such as eHealth, big data, personalized prevention, and the epidemiology and risk factors of chronic and infectious diseases), on basic and applied research in epidemiology, and on biostatistics methodology. Primary studies, systematic reviews, and meta-analyses are all welcome, as are research protocols for observational and experimental studies. EBPH aims to be a cross-discipline, international forum for scientific integration and evidence-based policymaking, combining the methodological aspects of epidemiology, biostatistics, and public health research with their practical applications.