Pub Date: 2015-01-01; DOI: 10.1504/ijdmb.2015.070072
Dhanasekaran Kuttiyapillai, R Rajeswari
A method for information extraction that processes unstructured data from a document collection is introduced. A dynamic programming technique, adopted to find relevant genes from the longest and most accurate sequences, is used to find matching sequences and to identify the effects of various factors. The proposed method can handle complex information sequences that carry different meanings in different situations, while eliminating irrelevant information. The text was pre-processed using a general-purpose method and then passed through an entity tagging component. Bottom-up scanning of key-value pairs improves content finding, so that sequences relevant to the testing task are generated. This paper highlights a context-based method for extracting food safety information from articles, guideline documents and laboratory results. The graphical disease model verifies the weak component using a development data set. This improves the accuracy of information retrieval in biological text analysis and reporting applications.
Dhanasekaran Kuttiyapillai, R Rajeswari. A method for extracting task-oriented information from biological text sources. International Journal of Data Mining and Bioinformatics, 12(4), 387-99, 2015. https://doi.org/10.1504/ijdmb.2015.070072
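The abstract above does not spell out its dynamic programming step. As a minimal sketch of the general idea of finding the longest matching sequence between two token streams, the classic longest-common-subsequence recurrence is shown below. This is an assumed stand-in, not the authors' algorithm; the function name and the example sentences are hypothetical.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of sequences a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover the matched tokens.
    matched = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            matched.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return matched[::-1]


# Hypothetical food-safety sentences, tokenised on whitespace.
report = "salmonella detected in raw milk samples".split()
guideline = "salmonella was detected in milk".split()
shared = longest_common_subsequence(report, guideline)
```

The table costs O(nm) time and memory for sequences of lengths n and m, which is acceptable for sentence-sized inputs but would need a banded or linear-space variant for long documents.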
Pub Date: 2015-01-01; DOI: 10.1504/ijdmb.2015.070065
Rohayanti Hassan, Razib M Othman, Zuraini A Shah
Classification of structural class using local protein structure rather than the whole structure has been gaining widespread attention. The structural class lies in the local composition or arrangement of secondary structure, whereas threshold-based classification methods apply restrictive rules when determining these classes. As a consequence, some structures remain unclassified. To determine these unknown structural classes, we propose a fusion algorithm, abbreviated as GSVM-SigLpsSCPred (Granular Support Vector Machine with Significant Local protein structure for Structural Class Prediction), which consists of two major components: an optimal local protein structure to represent the feature vector, and a granular support vector machine to predict the unknown structural classes. The results highlight the performance of GSVM-SigLpsSCPred as an alternative computational method for low-identity sequences.
Rohayanti Hassan, Razib M Othman, Zuraini A Shah. Granular support vector machine to identify unknown structural classes of protein. International Journal of Data Mining and Bioinformatics, 12(4), 451-67, 2015. https://doi.org/10.1504/ijdmb.2015.070065
In this paper we present a systems biology approach to understanding the miRNA-regulatory network in colon rectal cancer. An initial set of significant genes in Colon Rectal Cancer (CRC) was obtained by mining relevant literature. An initial set of cancer-related miRNAs was obtained from three databases (miRBase, miRWalk and TargetScan) and a GEO microarray experiment. First-principle methods were then used to generate the global miRNA-gene network. Significant miRNAs and associated transcription factors in the global miRNA-gene network were identified using topological and sub-graph analyses. Eleven novel miRNAs were identified, and three of them, hsa-miR-630, hsa-miR-100 and hsa-miR-99a, were further analysed to elucidate their role in CRC. The proposed methodology made effective use of literature data and was able to show novel, significant miRNA-transcription associations in CRC.
Meeta Pradhan, Kshithija Nagulapalli, Lakenvia Ledford, Yogesh Pandit, Mathew Palakal. A system biology approach for understanding the miRNA regulatory network in colon rectal cancer. International Journal of Data Mining and Bioinformatics, 11(1), 1-30, 2015. https://doi.org/10.1504/ijdmb.2015.066332
Pub Date: 2015-01-01; DOI: 10.1504/ijdmb.2015.069414
Christine W Duarte, Yann C Klimentidis, Jacqueline J Harris, Michelle Cardel, José R Fernández
Genome-wide Association Studies (GWAS) have discovered many risk variants for several obesity-related traits. However, before these discoveries can become clinically relevant, the molecular or physiological mechanisms of these risk variants need to be discovered. One strategy is to mine phenotypically rich data sources, such as those present in dbGaP (database of Genotypes and Phenotypes), for hypothesis generation. Here we propose a technique that combines the power of existing Bayesian Network (BN) learning algorithms with the statistical rigour of Structural Equation Modelling (SEM) to produce an overall phenotypic network discovery system with optimal properties. We illustrate our method by analysing a candidate SNP data set from the AMERICO sample, a multi-ethnic cross-sectional cohort of roughly 300 children with detailed obesity-related phenotypes. We demonstrate our approach by showing genetic mechanisms for three obesity-related SNPs.
Christine W Duarte, Yann C Klimentidis, Jacqueline J Harris, Michelle Cardel, José R Fernández. Discovery of phenotypic networks from genotypic association studies with application to obesity. International Journal of Data Mining and Bioinformatics, 12(2), 129-43, 2015. https://doi.org/10.1504/ijdmb.2015.069414
Pub Date: 2015-01-01; DOI: 10.1504/ijdmb.2015.067957
Swati Vipsita, Santanu Kumar Rath
Protein superfamily classification deals with the problem of predicting the family membership of a newly discovered amino acid sequence. Although previous researchers have already developed many trivial alignment methods, the present trend demands the application of computational intelligence techniques. As biological databases grow exponentially in size, retrieving and inferring essential knowledge in the biological domain becomes a very cumbersome task. Intelligent techniques can handle this problem because they tolerate imprecision, uncertainty, approximate reasoning and partial truth. This paper discusses the various global and local features extracted from full-length protein sequences that are used for the approximation and generalisation of the classifier. The various parameters used for evaluating the performance of the classifiers are also discussed. This review article can therefore point present researchers in the right direction for improving on existing methods.
Swati Vipsita, Santanu Kumar Rath. Sequence-based protein superfamily classification using computational intelligence techniques: a review. International Journal of Data Mining and Bioinformatics, 11(4), 424-57, 2015. https://doi.org/10.1504/ijdmb.2015.067957
Pub Date: 2015-01-01; DOI: 10.1504/ijdmb.2015.068951
Jun Ren, Jianxin Wang, Min Li, Fangxiang Wu
Most computational methods for identifying essential proteins focus on the topological centrality of protein-protein interaction (PPI) networks. However, these methods have limitations, such as difficulty in identifying essential proteins with low centrality values and poor performance on incomplete PPI networks. In this paper, the protein complex is shown to be an important factor in determining protein essentiality, and a new centrality measure, complex centrality, is proposed. The weighted average of complex centrality and subgraph centrality, called harmonic centrality (HC), is proposed to predict essential proteins. It combines PPI network topology with protein complex information and performs better than methods based on the PPI network alone. The improvement is larger when the PPI network is incomplete. Furthermore, a weighted PPI network is generated by integrating cellular localisation and biological process information into a PPI network. The performance of the HC measure improves by 5% in this weighted PPI network.
Jun Ren, Jianxin Wang, Min Li, Fangxiang Wu. Discovering essential proteins based on PPI network and protein complex. International Journal of Data Mining and Bioinformatics, 12(1), 24-43, 2015. https://doi.org/10.1504/ijdmb.2015.068951
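The weighted average described in the abstract above can be sketched as follows. Subgraph centrality is computed from its standard series definition, SC(i) = sum over k of (A^k)_ii / k!. The abstract does not define complex centrality, so this sketch assumes a simple stand-in (the number of known complexes that contain the protein), and the max-normalisation and the weight `a` are likewise assumptions rather than the authors' formulation.

```python
import math


def subgraph_centrality(adj, terms=25):
    """SC(i) = sum_k (A^k)[i][i] / k!, truncated after `terms` powers."""
    n = len(adj)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    sc = [1.0] * n  # the k = 0 term contributes 1 for every node
    for k in range(1, terms):
        # power becomes A^k by multiplying the previous power by A.
        power = [[sum(power[i][t] * adj[t][j] for t in range(n))
                  for j in range(n)] for i in range(n)]
        for i in range(n):
            sc[i] += power[i][i] / math.factorial(k)
    return sc


def complex_membership(n, complexes):
    """Assumed stand-in for complex centrality: complexes containing node i."""
    return [sum(1 for c in complexes if i in c) for i in range(n)]


def combined_centrality(adj, complexes, a=0.5):
    """Weighted average of the two max-normalised scores (a is hypothetical)."""
    sc = subgraph_centrality(adj)
    cc = complex_membership(len(adj), complexes)
    sc_max = max(sc)
    cc_max = max(cc) or 1  # avoid division by zero when no complexes are given
    return [a * c / cc_max + (1 - a) * s / sc_max for c, s in zip(cc, sc)]


# Toy PPI network: a path 0 - 1 - 2, with one complex containing proteins 0 and 1.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = combined_centrality(adj, [{0, 1}])
```

In this toy network protein 1 ranks first because it is both topologically central and a complex member, while protein 0 outranks protein 2 purely through its complex membership, which is the kind of low-centrality case the abstract says topology-only methods miss.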
Pub Date: 2015-01-01; DOI: 10.1504/ijdmb.2015.068955
Abdul Hakim Mohamed Salleh, Mohd Saberi Mohamad, Safaai Deris, Rosli Md Illias
With advances in metabolic engineering technologies, the genome of a host organism can be reconstructed to achieve desired phenotypes. However, owing to the complexity and size of genome-scale metabolic networks, significant components tend to be invisible. We propose a two-step approach to improve metabolite production. First, we find the essential genes and identify the minimal genome through a single-gene deletion process using Flux Balance Analysis (FBA). Second, we identify the significant pathway for metabolite production using gene expression data. A genome-scale model of Saccharomyces cerevisiae for the production of vanillin and acetate is used to test this approach. The results show that the approach reliably finds essential genes, reduces genome size and identifies production pathways that can further optimise the production yield. The identified genes and pathways can be extended to other applications, especially strain optimisation.
Abdul Hakim Mohamed Salleh, Mohd Saberi Mohamad, Safaai Deris, Rosli Md Illias. Metabolites production improvement by identifying minimal genomes and essential genes using flux balance analysis. International Journal of Data Mining and Bioinformatics, 12(1), 85-99, 2015. https://doi.org/10.1504/ijdmb.2015.068955
To determine which software is more effective for pathway and network analysis of Kashin-Beck disease, gene microarray data were analysed with TranscriptomeBrowser, MetaCore and GeneMANIA. TranscriptomeBrowser identified three significant chondrocytic pathways and one network; MetaCore identified one significant pathway and one network. BAX, APAF1, CASP6, BCL2, VEGF, SOCS3, BAK, TGFBI, TNFAIP6, TNFRSF11B and THBS1 were significant genes in the TranscriptomeBrowser or MetaCore results that are associated with the biological function of chondrocytes or cartilage. The interactions between the significant genes and their adjacent genes were searched and classified in GeneMANIA. In the pathway analysis results, TranscriptomeBrowser was superior to MetaCore at capturing pathway and co-expression interactions, whereas MetaCore was superior to TranscriptomeBrowser at capturing physical interactions. In the network analysis results, TranscriptomeBrowser contained more co-localisation interactions and MetaCore contained more co-expression interactions.
Sen Wang, Weizhuo Wang, Junjie Zhao, Feng Zhang, Shulan He, Xiong Guo. To screen the effective software for analysing gene interactions from Kashin-Beck disease genome profiling pathway and network, according to the tool of GeneMANIA. International Journal of Data Mining and Bioinformatics, 12(1), 100-14, 2015. https://doi.org/10.1504/ijdmb.2015.068963
Pub Date: 2015-01-01; DOI: 10.1504/ijdmb.2015.068950
Yulin Zhang, Shudong Wang, Dazhi Meng
Analysing the structure of gene networks is an important way to understand the regulatory mechanisms of an organism at the molecular level. In this work, gene mutual information networks are constructed from gene expression profiles in prostate tissues with and without cancer. To contrast the structural differences between the normal and diseased networks, curves of four structural parameters are plotted as the threshold changes. Threshold discrimination intervals and discrimination weights are then defined. A method for finding structural key genes with a significant degree difference is proposed. Finding these key genes will help biomedical scientists to further investigate the pathogenesis of prostate cancer. Finally, a randomisation test is performed and compared with the results on real data, to confirm that these structural parameters can distinguish normal networks from prostate cancer networks.
Yulin Zhang, Shudong Wang, Dazhi Meng. Modelling and structural characteristics analysis of gene networks for prostate cancer. International Journal of Data Mining and Bioinformatics, 12(1), 14-23, 2015. https://doi.org/10.1504/ijdmb.2015.068950
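The construction in the abstract above (threshold a pairwise mutual-information matrix into a network, then compare node degrees between conditions) can be sketched as follows. This is a minimal illustration that assumes expression values have already been discretised into bins; the function names, the toy profiles and the threshold of 0.5 bits are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete vectors."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2((c * n) / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def degrees(profiles, threshold):
    """Degree of each gene when edges join pairs with MI >= threshold."""
    genes = list(profiles)
    deg = {g: 0 for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if mutual_information(profiles[g], profiles[h]) >= threshold:
                deg[g] += 1
                deg[h] += 1
    return deg


def degree_difference(normal, tumour, threshold):
    """Tumour degree minus normal degree for every gene."""
    dn, dt = degrees(normal, threshold), degrees(tumour, threshold)
    return {g: dt[g] - dn[g] for g in dn}


# Hypothetical binarised expression profiles (four samples per condition).
normal = {"A": [0, 0, 1, 1], "B": [0, 0, 1, 1], "C": [0, 1, 0, 1]}
tumour = {"A": [0, 0, 1, 1], "B": [0, 1, 0, 1], "C": [0, 1, 0, 1]}
diff = degree_difference(normal, tumour, threshold=0.5)
```

Genes with a large absolute difference are candidates for the "structural key genes" the abstract mentions. Repeating the calculation over a range of thresholds would give curves of the kind the abstract describes, and shuffling sample labels would give a simple randomisation baseline.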
Pub Date: 2015-01-01; DOI: 10.1504/ijdmb.2015.069657
Huijuan Lu, Shasha Wei, Zili Zhou, Yanzi Miao, Yi Lu
The main purpose of traditional classification algorithms in bioinformatics applications is to achieve better classification accuracy. However, these algorithms cannot meet the requirement of minimising the average misclassification cost. In this paper, a new cost-sensitive regularised extreme learning machine (CS-RELM) algorithm is proposed, which uses probability estimation and misclassification cost to reconstruct the classification results. By improving the classification accuracy for the small group of samples that carry a higher misclassification cost, CS-RELM can minimise the classification cost. A 'rejection cost' is integrated into the CS-RELM algorithm to further reduce the average misclassification cost. Using the Colon Tumour dataset and the SRBCT (Small Round Blue Cells Tumour) dataset, CS-RELM was compared with other algorithms such as the extreme learning machine (ELM), cost-sensitive extreme learning machine, regularised extreme learning machine and cost-sensitive support vector machine (SVM). The experimental results show that CS-RELM with embedded rejection cost reduced the average misclassification cost and made more credible classification decisions than the other methods.
Huijuan Lu, Shasha Wei, Zili Zhou, Yanzi Miao, Yi Lu. Regularised extreme learning machine with misclassification cost and rejection cost for gene expression data classification. International Journal of Data Mining and Bioinformatics, 12(3), 294-312, 2015. https://doi.org/10.1504/ijdmb.2015.069657
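The decision step described in the abstract above (use class-probability estimates and a misclassification cost matrix to choose the label, with the option to reject) can be sketched as a generic minimum-expected-cost rule. This does not implement the extreme learning machine itself: the probabilities are assumed to come from any classifier, and the function name, cost matrix and rejection costs are hypothetical.

```python
def decide(probs, cost, rejection_cost):
    """Pick the label with minimum expected cost, or reject if that is cheaper.

    probs[j]   : estimated probability that the true class is j
    cost[j][k] : cost of predicting class k when the true class is j
    Returns (label, expected_cost); label is None when the sample is rejected.
    """
    n = len(probs)
    expected = [sum(probs[j] * cost[j][k] for j in range(n)) for k in range(n)]
    best = min(range(n), key=expected.__getitem__)
    if expected[best] > rejection_cost:
        return None, rejection_cost
    return best, expected[best]


# Hypothetical two-class setting: class 1 is tumour, and missing a tumour
# (true class 1, predicted 0) costs five times as much as a false alarm.
cost = [[0, 1],
        [5, 0]]

confident = decide([0.95, 0.05], cost, rejection_cost=0.5)   # predicts class 0
cautious = decide([0.70, 0.30], cost, rejection_cost=1.0)    # predicts class 1
rejected = decide([0.70, 0.30], cost, rejection_cost=0.5)    # defers the sample
```

The second case shows the cost-sensitive effect: class 0 is more probable, yet predicting class 1 has the lower expected cost (0.7 against 1.5). The third shows how a low enough rejection cost turns a borderline prediction into a deferral.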