{"title":"Haplotype Classification Using Copy Number Variation and Principal Components Analysis","authors":"K. Blighe","doi":"10.2174/1875036201307010019","DOIUrl":null,"url":null,"abstract":"Elaborate downstream methods are required to analyze large microarray data-sets. At times, where the end goal is to look for relationships between (or patterns within) different subgroups or even just individual samples, large data-sets must first be filtered using statistical thresholds in order to reduce their overall volume. As an example, in anthropological microarray studies, such 'dimension reduction' techniques are essential to elucidate any links between polymorphisms and phenotypes for given populations. In such large data-sets, a subset can first be taken to represent the larger data-set. For example, polling results taken during elections are used to infer the opinions of the population at large. However, what is the best and easiest method of capturing a sub-set of variation in a data-set that can represent the overall portrait of variation? In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and in which ways it can be applied to modern large-scale biological datasets. New methods of analysis using PCA are also suggested, with tentative results outlined.","PeriodicalId":38956,"journal":{"name":"Open Bioinformatics Journal","volume":"7 1","pages":"19-24"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Open Bioinformatics Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2174/1875036201307010019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 2
Abstract
Elaborate downstream methods are required to analyze large microarray data-sets. At times, where the end goal is to look for relationships between (or patterns within) different subgroups or even just individual samples, large data-sets must first be filtered using statistical thresholds in order to reduce their overall volume. As an example, in anthropological microarray studies, such 'dimension reduction' techniques are essential to elucidate any links between polymorphisms and phenotypes for given populations. In such large data-sets, a subset can first be taken to represent the larger data-set. For example, polling results taken during elections are used to infer the opinions of the population at large. However, what is the best and easiest method of capturing a sub-set of variation in a data-set that can represent the overall portrait of variation? In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and in which ways it can be applied to modern large-scale biological datasets. New methods of analysis using PCA are also suggested, with tentative results outlined.
期刊介绍:
The Open Bioinformatics Journal is an Open Access online journal, which publishes research articles, reviews/mini-reviews, letters, clinical trial studies and guest edited single topic issues in all areas of bioinformatics and computational biology. The coverage includes biomedicine, focusing on large data acquisition, analysis and curation, computational and statistical methods for the modeling and analysis of biological data, and descriptions of new algorithms and databases. The Open Bioinformatics Journal, a peer reviewed journal, is an important and reliable source of current information on the developments in the field. The emphasis will be on publishing quality articles rapidly and freely available worldwide.