Pub Date: 2015-09-01 DOI: 10.1504/IJDMB.2015.072101
Nadia Ben Nsira, T. Lecroq, M. Elloumi
In the last decade, biology and medicine have undergone a fundamental change: next generation sequencing (NGS) technologies have made it possible to obtain genomic sequences very quickly and at low cost compared to the traditional Sanger method. These NGS technologies have thus made it possible to collect genomic sequences (genes, exomes or even full genomes) of individuals of the same species. These sequences are more than 99% identical. There is thus a strong need for efficient algorithms for indexing and performing fast pattern matching in such specific sets of sequences. In this paper we propose a very efficient algorithm that solves the exact pattern matching problem in a set of highly similar DNA sequences where only the pattern can be pre-processed. This new algorithm extends variants of the Boyer-Moore exact string matching algorithm. Experimental results show that it exhibits the best performance in practice.
{"title":"A fast Boyer-Moore type pattern matching algorithm for highly similar sequences","authors":"Nadia Ben Nsira, T. Lecroq, M. Elloumi","doi":"10.1504/IJDMB.2015.072101","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.072101","journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":"45 1","pages":"266-88"},"PeriodicalIF":0.3,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","isOpenAccess":false}
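The abstract above mentions Boyer-Moore variants in which only the pattern is pre-processed. As a rough illustration of that setting, and not of the authors' algorithm, the sketch below uses the well-known Horspool variant: a bad-character shift table is built once from the pattern and then reused across every sequence in a collection. The function names and the dictionary layout of `sequences` are assumptions made for this example.

```python
def horspool_shift_table(pattern):
    """Bad-character shifts: distance from the rightmost occurrence of each
    character (ignoring the final position) to the end of the pattern."""
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_search(pattern, text, shift=None):
    """Return all start positions of pattern in text (overlaps included)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    if shift is None:
        shift = horspool_shift_table(pattern)
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift by the table entry for the text character aligned with the
        # last pattern position; unseen characters allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return hits


def search_collection(pattern, sequences):
    """Pre-process the pattern once, then scan each sequence (hypothetical helper)."""
    shift = horspool_shift_table(pattern)
    return {name: horspool_search(pattern, seq, shift)
            for name, seq in sequences.items()}
```

This sketch scans each sequence independently and does not exploit the similarity between sequences, which is the point of the paper's contribution.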
Pub Date: 2015-09-01 DOI: 10.1504/IJDMB.2015.072092
C. Gunavathi, K. Premalatha
The Cuckoo Search (CS) optimisation algorithm is used for feature selection in cancer classification using microarray gene expression data. Since gene expression data contain thousands of genes and a small number of samples, feature selection methods can be used to select informative genes and improve the classification accuracy. Initially, the genes are ranked based on T-statistics, Signal-to-Noise Ratio (SNR) and F-statistics values. CS is then used to find the informative genes among the top-m ranked genes. The classification accuracy of the k-Nearest Neighbour (kNN) technique is used as the fitness function for CS. The proposed method is evaluated and analysed on ten different cancer gene expression datasets. The results show that CS gives 100% average accuracy for the DLBCL Harvard, Lung Michigan, Ovarian Cancer, AML-ALL and Lung Harvard2 datasets and that it outperforms the existing techniques on the DLBCL outcome and prostate datasets.
{"title":"Cuckoo search optimisation for feature selection in cancer classification: a new approach","authors":"C. Gunavathi, K. Premalatha","doi":"10.1504/IJDMB.2015.072092","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.072092","journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":"13 3 1","pages":"248-65"},"PeriodicalIF":0.3,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","isOpenAccess":false}
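The ranking step described in the abstract above can be sketched for the SNR criterion alone. The abstract does not give the formula, so this example assumes the commonly used two-class definition, (mean_a - mean_b) / (sd_a + sd_b), with sample standard deviations. The names `snr` and `top_m_genes` and the dictionary layout of `expression` are hypothetical, and the Cuckoo Search and kNN stages are not shown.

```python
from statistics import mean, stdev


def snr(values_a, values_b):
    """Two-class signal-to-noise ratio (assumed definition, see lead-in)."""
    denom = stdev(values_a) + stdev(values_b)
    if denom == 0:
        return 0.0
    return (mean(values_a) - mean(values_b)) / denom


def top_m_genes(expression, labels, m):
    """Rank genes by |SNR| and keep the m best.

    expression: dict mapping gene name -> list of values, one per sample
    labels:     list of 0/1 class labels, one per sample
                (each class needs at least two samples for stdev)
    """
    scores = {}
    for gene, values in expression.items():
        class0 = [v for v, y in zip(values, labels) if y == 0]
        class1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(snr(class0, class1))
    return sorted(scores, key=scores.get, reverse=True)[:m]
```

The resulting top-m list would then be the search space handed to the optimiser.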
Pub Date: 2015-09-01 DOI: 10.1504/IJDMB.2015.072091
Wael Zakaria Abd Allah, Y. Kotb, F. Ghaleb
The MCR-Miner algorithm aims to mine all maximal highly confident association rules from a microarray data set of up/down-expressed genes. This paper introduces two new algorithms: IMCR-Miner and PMCR-Miner. The IMCR-Miner algorithm is an extension of the MCR-Miner algorithm with some improvements. These improvements implement a novel way of storing the samples of each gene in a list of unsigned integers in order to benefit from bitwise operations. In addition, the IMCR-Miner algorithm overcomes the drawbacks of the MCR-Miner algorithm by setting restrictions that avoid repeated comparisons. The PMCR-Miner algorithm is a parallel version of the newly proposed IMCR-Miner algorithm. The PMCR-Miner algorithm is based on shared-memory systems and task parallelism, where no time is needed for sharing and combining data between processors. The experimental results on real microarray data sets show that the PMCR-Miner algorithm is more efficient and scalable than its counterparts.
{"title":"PMCR-Miner: parallel maximal confident association rules miner algorithm for microarray data set","authors":"Wael Zakaria Abd Allah, Y. Kotb, F. Ghaleb","doi":"10.1504/IJDMB.2015.072091","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.072091","journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":"13 3 1","pages":"225-47"},"PeriodicalIF":0.3,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","isOpenAccess":false}
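The idea in the abstract above of storing each gene's samples as unsigned integers, so that rule statistics reduce to bitwise operations, can be illustrated roughly as follows. This is a sketch under assumptions rather than the IMCR-Miner data structure: a single arbitrary-precision Python integer stands in for the list of unsigned integers, and the item labels and function names are invented for the example.

```python
def to_bitset(sample_indices):
    """Set bit i for every sample index i in which the item occurs."""
    bits = 0
    for i in sample_indices:
        bits |= 1 << i
    return bits


def support(itemset, bitsets):
    """Number of samples containing every item of a non-empty itemset."""
    items = iter(itemset)
    common = bitsets[next(items)]
    for item in items:
        common &= bitsets[item]      # intersection of sample sets
    return bin(common).count("1")    # population count


def confidence(antecedent, consequent, bitsets):
    """support(antecedent and consequent) / support(antecedent)."""
    sup_a = support(antecedent, bitsets)
    if sup_a == 0:
        return 0.0
    return support(list(antecedent) + list(consequent), bitsets) / sup_a
```

With this layout, intersecting two sample sets costs one AND per machine word instead of one comparison per sample.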
Pub Date: 2015-09-01 DOI: 10.1504/IJDMB.2015.072103
S. Richter, I. Fetzer, M. Thullner, F. Centler, P. Dittrich
Knowledge of metabolic processes is collected in easily accessible online databases, which are increasing rapidly in content and detail. Using these databases for the automatic construction of metabolic network models requires high accuracy and consistency. In this two-part study we evaluate current accuracy and consistency problems using the KEGG database as a prominent example and propose design principles for dealing with such problems. In the first part, we present our computational approach for classifying inconsistencies and provide an overview of the classes of inconsistencies we identified. We detected inconsistencies both in database entries referring to substances and in entries referring to reactions. In the second part, we present strategies to deal with the detected problem classes. In particular, we propose a rule-based database approach which allows for the inclusion of parameterised molecular species and parameterised reactions. Detailed case studies and a comparison of explicit networks from KEGG with their anticipated rule-based representation underline the applicability and scalability of this approach.
{"title":"Towards rule-based metabolic databases: a requirement analysis based on KEGG","authors":"S. Richter, I. Fetzer, M. Thullner, F. Centler, P. Dittrich","doi":"10.1504/IJDMB.2015.072103","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.072103","journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":"13 3 1","pages":"289-319"},"PeriodicalIF":0.3,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","isOpenAccess":false}
Pub Date: 2015-08-01 DOI: 10.1504/IJDMB.2015.071553
Limin Li, Shuqin Zhang
The existence of confounders such as population structure in genome-wide association studies makes it difficult to apply machine learning methods directly to solve biological problems. It is still unclear how to correct for confounders effectively. In this work, we propose an Orthogonal Projection Correction (OPC) method to correct for confounders. This is achieved by orthogonally decomposing each feature into a confounding component and a non-confounding component, such that the original data can be best reconstructed by only the non-confounding components of the features. The confounder space is built based on prior knowledge, and each feature is projected onto its orthogonal complement. This OPC procedure is shown to be kernelisable. We then propose a ProSVM method by integrating the OPC method with a support vector machine for classification. In the experiments, our OPC method for confounder correction improves tumour diagnosis based on samples from different labs and phenotype prediction in the presence of population structure.
{"title":"Orthogonal projection correction for confounders in biological data classification","authors":"Limin Li, Shuqin Zhang","doi":"10.1504/IJDMB.2015.071553","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.071553","journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":"13 2 1","pages":"181-96"},"PeriodicalIF":0.3,"publicationDate":"2015-08-01","publicationTypes":"Journal Article","isOpenAccess":false}
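The projection step named in the abstract above, splitting a feature into a component inside the confounder space and a component in its orthogonal complement, can be sketched with plain linear algebra. This is only the textbook linear projection, assuming the confounders are given as explicit vectors over the samples; the paper's reconstruction criterion and kernelised form are not reproduced, and all function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis of the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:               # skip linearly dependent vectors
            basis.append([wi / norm for wi in w])
    return basis


def remove_confounding(feature, confounders):
    """Project a feature onto the orthogonal complement of the confounder space."""
    residual = list(feature)
    for b in orthonormal_basis(confounders):
        coeff = dot(residual, b)
        residual = [r - coeff * bi for r, bi in zip(residual, b)]
    return residual
```

For example, with two lab-indicator vectors `[1, 1, 0, 0]` and `[0, 0, 1, 1]` as confounders, the projection subtracts each lab's mean from the feature.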
Pub Date: 2015-08-01 DOI: 10.1504/IJDMB.2015.071523
Ali Katanforoush, Ehsan Mahdavi
MicroRNAs (miRNAs) are a class of short RNA molecules that regulate gene expression by binding directly to messenger RNAs. Conventional approaches to miRNA target prediction estimate the accessibility of target sites and the strength of miRNA binding by finding optima of energy models, which involves O(n^3) computations. Alternatively, we narrow down the potential binding sites of miRNAs to suboptimal hits of a pairwise alignment algorithm called Fitting Alignment, which runs in O(n^2). We invoke the same algorithm once for all candidate sites to measure site accessibility. These features are fed to a binary classifier trained to predict true associations between miRNAs and target genes. Training the classifier requires negative samples indicating non-affected genes. Experiments verifying such negative associations have rarely been performed, so we exploit tissue-specific gene expression data to impute the negative associations. The recall rate of our method is above 70% (at 85% precision).
{"title":"miRNA target recognition using features of suboptimal alignments","authors":"Ali Katanforoush, Ehsan Mahdavi","doi":"10.1504/IJDMB.2015.071523","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.071523","journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":"13 2 1","pages":"171-80"},"PeriodicalIF":0.3,"publicationDate":"2015-08-01","publicationTypes":"Journal Article","isOpenAccess":false}
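The quadratic-time fitting alignment mentioned in the abstract above aligns a whole short sequence against the best-matching substring of a longer one. The sketch below is the textbook dynamic programme with an assumed identity-based scoring scheme (match +1, mismatch -1, gap -1). The paper's scoring, which would have to reflect base-pair complementarity, and its handling of suboptimal hits are not shown, and the function name is hypothetical.

```python
def fitting_alignment_score(query, target, match=1, mismatch=-1, gap=-1):
    """Best score of aligning all of `query` to some substring of `target`.

    Returns (score, end) where `end` is the target position just past the
    aligned substring. Runs in O(len(query) * len(target)) time.
    """
    m, n = len(query), len(target)
    prev = [0] * (n + 1)             # row 0: starting anywhere in target is free
    for i in range(1, m + 1):
        cur = [i * gap] + [0] * n    # column 0: query prefix aligned to gaps
        for j in range(1, n + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,   # align the two characters
                         prev[j] + gap,       # gap in target
                         cur[j - 1] + gap)    # gap in query
        prev = cur
    # Ending anywhere in target is free: take the best cell of the last row.
    best_end = max(range(n + 1), key=prev.__getitem__)
    return prev[best_end], best_end
```

Keeping the whole last row rather than only its maximum is what would expose suboptimal hits as candidate sites.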
Pub Date: 2015-08-01 DOI: 10.1504/IJDMB.2015.071544
T. Johnsten, Laura Fain, Leanna Fain, Ryan G. Benton, Ethan Butler, L. Pannell, Ming Tan
Analysing and classifying sequences based on similarities and differences is a mathematical problem of growing relevance and importance in many scientific disciplines. One of the primary challenges in applying machine learning algorithms to sequential data, such as biological sequences, is the extraction and representation of significant features from the data. To address this problem, we have recently developed a representation called Multi-Layered Vector Spaces (MLVS), a simple mathematical model that maps sequences into a set of MLVS. We demonstrate the usefulness of the model by applying it to the problem of identifying signal peptides. MLVS feature vectors are generated from a collection of protein sequences, and the resulting vectors are used to create support vector machine classifiers. Experiments show that the MLVS-based classifiers outperform or perform on par with several existing methods that are specifically designed for identifying signal peptides.
{"title":"Exploiting multi-layered vector spaces for signal peptide detection","authors":"T. Johnsten, Laura Fain, Leanna Fain, Ryan G. Benton, Ethan Butler, L. Pannell, Ming Tan","doi":"10.1504/IJDMB.2015.071544","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.071544","journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":"13 2 1","pages":"141-57"},"PeriodicalIF":0.3,"publicationDate":"2015-08-01","publicationTypes":"Journal Article","isOpenAccess":false}
Pub Date: 2015-08-01 DOI: 10.1504/IJDMB.2015.071515
Fei Han, Shanxiu Yang, Jian Guan
In this paper, a hybrid approach based on clustering and Particle Swarm Optimisation (PSO) is proposed to perform gene selection and classification for microarray data. In the new method, genes are first partitioned into a predetermined number of clusters by the K-means method. Since the genes in each cluster are highly redundant, the Max-Relevance Min-Redundancy (mRMR) strategy is used to reduce the redundancy of the clustered genes. Then, PSO is used to perform further gene selection from the remaining clustered genes. Because of its better generalisation performance and much faster convergence rate compared with other learning algorithms for neural networks, the Extreme Learning Machine (ELM) is chosen to evaluate the candidate gene subsets selected by PSO and to perform sample classification in this study. The proposed method selects less redundant genes and increases prediction accuracy. Its efficiency and effectiveness are verified by extensive comparisons with other classical methods on three public microarray data sets.
{"title":"An effective hybrid approach of gene selection and classification for microarray data based on clustering and particle swarm optimisation","authors":"Fei Han, Shanxiu Yang, Jian Guan","doi":"10.1504/IJDMB.2015.071515","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.071515","journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":"33 1","pages":"103-21"},"PeriodicalIF":0.3,"publicationDate":"2015-08-01","publicationTypes":"Journal Article","isOpenAccess":false}
Pub Date: 2015-08-01 DOI: 10.1504/IJDMB.2015.071534
Yuan Zhang, Yue Cheng, Liang Ge, Nan Du, Ke-bin Jia, A. Zhang
Many clustering methods have been developed to identify functional modules in Protein-Protein Interaction (PPI) networks, but the results are far from satisfactory. To overcome the noise and incompleteness of PPI networks and find more accurate and stable functional modules, we propose an integrative method, the bipartite graph-based Non-negative Matrix Factorisation method (BiNMF), in which we adopt multiple biological data sources as different views that describe PPIs. Specifically, traditional clustering models are adopted as a preliminary analysis of different views of protein functional similarity. The intermediate clustering results are then represented by a bipartite graph, which can comprehensively represent the relationships between proteins and intermediate clusters, and finally overlapping clustering results are obtained. Through extensive experiments, we see that our method is superior to baseline methods, and detailed analysis demonstrates the benefits of integrating diverse clustering methods and multiple biological information sources.
{"title":"A graph-based integrative method of detecting consistent protein functional modules from multiple data sources","authors":"Yuan Zhang, Yue Cheng, Liang Ge, Nan Du, Ke-bin Jia, A. Zhang","doi":"10.1504/IJDMB.2015.071534","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.071534","journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":"13 2 1","pages":"122-40"},"PeriodicalIF":0.3,"publicationDate":"2015-08-01","publicationTypes":"Journal Article","isOpenAccess":false}
Pub Date: 2015-08-01 DOI: 10.1504/IJDMB.2015.071556
M. Farhadian, H. Mahjub, A. Moghimbeigi, P. Lisboa, J. Poorolajal, Muharram Mansoorizadeh
Microarray technology allows simultaneous measurement of expression levels for thousands of genes. An important aspect of microarray studies is the prediction of patient survival based on gene expression profiles. This naturally calls for the use of a dimension reduction procedure together with the survival prediction model. In this study, a new method based on the wavelet transform for survival-relevant gene selection is presented. The Cox proportional hazards model is typically used to build a prediction model for patients' survival using the selected genes. The prediction model is evaluated with R², the concordance index, the likelihood ratio statistic and the Akaike information criterion. The results show that good survival prediction performance is achieved based on the selected genes. The results suggest the possibility of developing more advanced wavelet-based tools for gene selection from microarray data sets in the context of survival analysis.
{"title":"Wavelet-based gene selection method for survival prediction in diffuse large B-cell lymphomas patients","authors":"M. Farhadian, H. Mahjub, A. Moghimbeigi, P. Lisboa, J. Poorolajal, Muharram Mansoorizadeh","doi":"10.1504/IJDMB.2015.071556","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.071556","journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":"13 2 1","pages":"197-210"},"PeriodicalIF":0.3,"publicationDate":"2015-08-01","publicationTypes":"Journal Article","isOpenAccess":false}
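Of the evaluation measures listed in the abstract above, the concordance index is the easiest to show concretely. The sketch below is a simplified form of Harrell's C for right-censored data: a pair of patients is comparable when the one with the shorter follow-up time had an observed event, and the pair is concordant when that patient was also assigned the higher predicted risk. The function name and argument layout are assumptions, and pairs with tied times are ignored.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C.

    times:  follow-up time per patient
    events: 1 if the event was observed, 0 if censored
    risks:  predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable: patient i had an observed event strictly before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5    # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to uninformative predictions and 1.0 to a perfect ranking of patients by risk.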