Rong Jin, Luo Si, Shireesh Srivastava, Zheng Li, Christina Chan
{"title":"基因表达和微阵列分析的知识驱动回归模型。","authors":"Rong Jin, Luo Si, Shireesh Srivastava, Zheng Li, Christina Chan","doi":"10.1109/IEMBS.2006.260347","DOIUrl":null,"url":null,"abstract":"<p><p>The linear regression model has been widely used in the analysis of gene expression and microarray data to identify a subset of genes that are important to a given metabolic function. One of the key challenges in applying the linear regression model to gene expression data analysis arises from the sparse data problem, in which the number of genes is significantly larger than the number of conditions. To resolve this problem, we present a knowledge driven regression model that incorporates the knowledge of genes from the Gene Ontology (GO) database into the linear regression model. It is based on the assumption that two genes are likely to be assigned similar weights when they share similar sets of GO codes. Empirical studies show that the proposed knowledge driven regression model is effective in reducing the regression errors, and furthermore effective in identifying genes that are relevant to a given metabolite.</p>","PeriodicalId":72689,"journal":{"name":"Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference","volume":" ","pages":"5326-9"},"PeriodicalIF":0.0000,"publicationDate":"2006-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/IEMBS.2006.260347","citationCount":"7","resultStr":"{\"title\":\"A knowledge driven regression model for gene expression and microarray analysis.\",\"authors\":\"Rong Jin, Luo Si, Shireesh Srivastava, Zheng Li, Christina Chan\",\"doi\":\"10.1109/IEMBS.2006.260347\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The linear regression model has been widely used in the analysis of gene expression and microarray data to identify a subset of genes that are important to a given metabolic function. One of the key challenges in applying the linear regression model to gene expression data analysis arises from the sparse data problem, in which the number of genes is significantly larger than the number of conditions. To resolve this problem, we present a knowledge driven regression model that incorporates the knowledge of genes from the Gene Ontology (GO) database into the linear regression model. It is based on the assumption that two genes are likely to be assigned similar weights when they share similar sets of GO codes. Empirical studies show that the proposed knowledge driven regression model is effective in reducing the regression errors, and furthermore effective in identifying genes that are relevant to a given metabolite.</p>\",\"PeriodicalId\":72689,\"journal\":{\"name\":\"Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference\",\"volume\":\" \",\"pages\":\"5326-9\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1109/IEMBS.2006.260347\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IEMBS.2006.260347\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IEMBS.2006.260347","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A knowledge driven regression model for gene expression and microarray analysis.
The linear regression model has been widely used in the analysis of gene expression and microarray data to identify a subset of genes that are important to a given metabolic function. One of the key challenges in applying the linear regression model to gene expression data analysis arises from the sparse data problem, in which the number of genes is significantly larger than the number of conditions. To resolve this problem, we present a knowledge driven regression model that incorporates the knowledge of genes from the Gene Ontology (GO) database into the linear regression model. It is based on the assumption that two genes are likely to be assigned similar weights when they share similar sets of GO codes. Empirical studies show that the proposed knowledge driven regression model is effective in reducing the regression errors, and furthermore effective in identifying genes that are relevant to a given metabolite.