U. Erdogdu, Mehmet Tan, R. Alhajj, Faruk Polat, D. Demetrick, J. Rokne
2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 238-242. Published 2011-11-12. DOI: 10.1109/BIBM.2011.105 (https://doi.org/10.1109/BIBM.2011.105)
Employing Machine Learning Techniques for Data Enrichment: Increasing the Number of Samples for Effective Gene Expression Data Analysis
For certain domains, e.g. bioinformatics, producing more real samples is costly, error-prone, and time-consuming. There is therefore a need for an intelligent automated process that can replace real samples with artificial samples carrying the same characteristics, so that the artificial samples can be used for comprehensive testing of new methodologies. Motivated by this need, we describe a novel approach that integrates Probabilistic Boolean Network and genetic-algorithm-based techniques into a framework that takes existing real samples as input and produces new samples as output. The new samples inherit the characteristics of the existing samples without duplicating them. This leads to diversity in the samples and hence a richer set of samples for testing. The developed framework incorporates two models (perspectives) for sample generation. We illustrate its applicability for producing new gene expression data samples, a high-demand area that has not received attention. The two perspectives employed in the process are based on models that are not closely related; this independence eliminates the bias that would arise if the approach covered only certain characteristics of the domain, which would skew the samples in one direction. The results are promising and show the effectiveness, usefulness, and applicability of the proposed multi-model framework.
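
To make the idea of deriving new, non-duplicate samples from existing ones more concrete, the following is a minimal Python sketch. It is not the authors' framework: it omits the Probabilistic Boolean Network model and any fitness-based selection, and only applies crossover and mutation operators of the kind used in genetic algorithms. The function name generate_samples, the binarized (0/1) expression encoding, and the parameters mutation_rate, seed, and max_tries are all assumptions made for illustration.

```python
import random


def generate_samples(real_samples, n_new, mutation_rate=0.05, seed=0, max_tries=10000):
    """Build new binarized expression profiles by recombining real ones.

    Hypothetical helper: each sample is a list of 0/1 values, one per gene.
    Requires at least two real samples.
    """
    rng = random.Random(seed)
    # Track every profile seen so far so that no output duplicates an input
    # or another output.
    seen = {tuple(s) for s in real_samples}
    new_samples = []
    tries = 0
    while len(new_samples) < n_new and tries < max_tries:
        tries += 1
        # Pick two distinct real samples as parents.
        p1, p2 = rng.sample(real_samples, 2)
        # Uniform crossover: take each gene from one parent at random.
        child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
        # Mutation: flip each gene with a small probability.
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        key = tuple(child)
        if key not in seen:
            seen.add(key)
            new_samples.append(child)
    return new_samples


if __name__ == "__main__":
    real = [
        [0, 1, 1, 0, 1, 0, 0, 1],
        [1, 1, 0, 0, 1, 1, 0, 0],
        [0, 0, 1, 1, 0, 1, 1, 0],
    ]
    for sample in generate_samples(real, n_new=5):
        print(sample)
```

Because each child takes every gene value from one of two real parents, apart from rare mutations, it stays close to the observed data while the duplicate check keeps it distinct from the inputs. A second, unrelated generator (the role the Probabilistic Boolean Network plays in the abstract) would be needed to obtain the two independent perspectives described above.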