Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.066767
Jinsoo Lee, Romans Kasperovics, Wook-Shin Han, Jeong-Hoon Lee, Min Soo Kim, Hune Cho
The Resource Description Framework (RDF) is widely used for sharing biomedical data, such as gene ontology or the online protein database UniProt. SPARQL is a native query language for RDF, featuring regular expressions in queries for which exact values are either irrelevant or unknown. The use of regular expression indexes in SPARQL query processing improves the performance of queries containing regular expressions by up to two orders of magnitude. In this study, we address the update operation for regular expression indexes in RDF databases. We identify major performance problems of straightforward index update algorithms and propose a new algorithm that utilises unique properties of regular expression indexes to increase performance. Our contributions can be summarised as follows: (1) we propose an efficient update algorithm for regular expression indexes in RDF databases, (2) we build a prototype system for the proposed algorithm in C++ and (3) we conduct extensive experiments demonstrating the improvement of our algorithm over the straightforward approaches by an order of magnitude.
An efficient algorithm for updating regular expression indexes in RDF databases. International Journal of Data Mining and Bioinformatics, vol. 11, no. 2, pp. 205-22 (2015). https://doi.org/10.1504/ijdmb.2015.066767
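To make the update problem in the entry above concrete, here is a minimal sketch of an n-gram inverted index over RDF literals that supports insertion and deletion. It is not the authors' algorithm: the class name `TrigramIndex`, the choice of character trigrams, and the requirement that the caller supply a substring every match must contain are all assumptions made for illustration.

```python
import re
from collections import defaultdict


class TrigramIndex:
    """Toy inverted index from character trigrams to literal ids (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> ids of literals containing it
        self.literals = {}                # id -> literal text

    @staticmethod
    def _grams(text, n=3):
        return {text[i:i + n] for i in range(len(text) - n + 1)}

    def insert(self, lit_id, text):
        # Straightforward update: touch every posting list of the new literal.
        self.literals[lit_id] = text
        for gram in self._grams(text):
            self.postings[gram].add(lit_id)

    def delete(self, lit_id):
        # Remove the literal and drop posting lists that become empty.
        text = self.literals.pop(lit_id)
        for gram in self._grams(text):
            ids = self.postings[gram]
            ids.discard(lit_id)
            if not ids:
                del self.postings[gram]

    def search(self, required_substring, pattern):
        # Filter candidates by the trigrams of a substring that every match
        # must contain, then verify the survivors with the full regex.
        grams = self._grams(required_substring)
        if grams:
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            candidates = set(self.literals)
        rx = re.compile(pattern)
        return sorted(i for i in candidates if rx.search(self.literals[i]))
```

Each insert or delete here touches one posting list per distinct trigram of the literal, which is the kind of per-update cost a straightforward approach pays.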
Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.066768
Shuo Li, James O Nyagilo, Digant P Dave, Wei Wang, Baoju Zhang, Jean Gao
With recent advances in the Surface-Enhanced Raman Scattering (SERS) technique, quantitative analysis of Raman spectra has shown promise for in vivo molecular imaging. Partial Least Squares Regression (PLSR) is the state-of-the-art method, but it relies only on training samples, which makes it difficult to incorporate complex domain knowledge. Based on probabilistic Principal Component Analysis (PCA) and the idea of probabilistic curve fitting, we propose a probabilistic PLSR (PPLSR) model and an Expectation Maximisation (EM) algorithm for estimating its parameters. This model explains PLSR from a probabilistic viewpoint, describes its essential meaning and provides a foundation for developing future Bayesian nonparametric models. Two real Raman spectra datasets were used to evaluate this model, and experimental results show its effectiveness.
Probabilistic partial least squares regression for quantitative analysis of Raman spectra. International Journal of Data Mining and Bioinformatics, vol. 11, no. 2, pp. 223-43 (2015). https://doi.org/10.1504/ijdmb.2015.066768
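As background for the entry above, the following is a minimal sketch of classical single-response PLSR (the NIPALS-style PLS1 procedure), which the abstract names as the baseline. It is not the probabilistic PPLSR model or its EM algorithm. The function names `pls1_fit` and `pls1_predict` are hypothetical, and rows of `X` are assumed to be spectra with `y` the quantity to predict.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit PLS1 by successive deflation; returns (x_mean, y_mean, components)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in X space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict y for one sample by replaying the deflation steps."""
    x_mean, y_mean, comps = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in comps:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat
```

The fit depends only on the training pairs `(X, y)`, which is the limitation the abstract points to.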
Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.069418
Chandan K Reddy, Mohammad S Aziz
The functional classification of genes plays a vital role in molecular biology. Detecting previously unknown roles of genes and their products in physiological and pathological processes is an important and challenging problem. In this work, information from several biological sources, such as comparative genome sequences, gene expression and protein interactions, is combined to obtain robust results in predicting gene functions. The information in such heterogeneous sources is often incomplete, so making maximum use of all the available information is itself a challenge. We propose an algorithm that improves the prediction performance of different models built on individual sources. We also develop a heterogeneous boosting framework that uses all the available information even if some sources provide no information about some of the genes. We demonstrate the superior performance of the proposed methods in terms of accuracy and F-measure compared to several imputation and integration schemes.
Predicting gene functions from multiple biological sources using novel ensemble methods. International Journal of Data Mining and Bioinformatics, vol. 12, no. 2, pp. 184-206 (2015). https://doi.org/10.1504/ijdmb.2015.069418
Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.067956
Haochang Wang, Yu Li
As a new branch of data mining and knowledge discovery, biomedical text mining is progressing rapidly. Biomedical named entity (BNE) recognition is a basic technique in biomedical knowledge discovery, and its performance directly affects further discovery and processing in biomedical texts. In this paper, we present an improved method based on a co-decision matrix framework for Biomedical Named Entity Recognition (BNER). The relationship between classifiers is exploited by using the co-decision matrix to exchange decision information among them. The experiments are carried out on the GENIA corpus, with a best result of 75.9% F-score. Experimental results show that the proposed co-decision matrix framework can yield promising performance.
Co-decision matrix framework for name entity recognition in biomedical text. International Journal of Data Mining and Bioinformatics, vol. 11, no. 4, pp. 412-23 (2015). https://doi.org/10.1504/ijdmb.2015.067956
Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.067973
Xianjun Shen, Yanli Zhao, Yanan Li, Yang Yi, Tingting He, Jincai Yang
To overcome the limitations of global modularity and the deficiencies of local modularity, we propose a hybrid modularity measure, Local-Global Quantification (LGQ), which considers global and local modularity together. LGQ uses an adjustable module-feature parameter to balance global detection capability against local search capability in Protein-Protein Interaction (PPI) networks. Furthermore, we develop a new protein complex mining algorithm called Best Neighbour and Local-Global Quantification (BN-LGQ), which integrates the best neighbour node and the modularity increment. BN-LGQ expands a protein complex by quickly searching for the best neighbour node of the current cluster and using the modularity increment as the metric that determines whether that node can join the cluster. The experimental results show that BN-LGQ predicts protein complexes more accurately and matches the reference protein complexes more closely than the MCL and MCODE algorithms. Moreover, BN-LGQ can effectively discover protein complexes with better biological significance in the PPI network.
An integrated approach to identify protein complex based on best neighbour and modularity increment. International Journal of Data Mining and Bioinformatics, vol. 11, no. 4, pp. 458-73 (2015). https://doi.org/10.1504/ijdmb.2015.067973
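The expansion loop described in the entry above (pick the best neighbour, admit it only if modularity increases) can be sketched as follows. The abstract does not give the LGQ formula, so this sketch substitutes a simple local modularity score, internal edges divided by internal plus boundary edges, and defines the "best neighbour" as the frontier node with the most links into the cluster. Both choices, and the function names, are assumptions.

```python
def local_modularity(graph, cluster):
    """Stand-in score: internal edges / (internal + boundary edges)."""
    internal = boundary = 0
    for u in cluster:
        for v in graph[u]:
            if v in cluster:
                internal += 1  # each internal edge is seen from both ends
            else:
                boundary += 1
    internal //= 2
    total = internal + boundary
    return internal / total if total else 0.0


def expand_cluster(graph, seed):
    """Greedily grow a cluster from `seed` while the score keeps increasing.

    `graph` maps each node to the set of its neighbours (undirected).
    """
    cluster = {seed}
    score = local_modularity(graph, cluster)
    while True:
        frontier = {v for u in cluster for v in graph[u]} - cluster
        if not frontier:
            break
        # Best neighbour: most links into the current cluster (ties broken by name).
        best = max(sorted(frontier), key=lambda v: len(graph[v] & cluster))
        new_score = local_modularity(graph, cluster | {best})
        if new_score <= score:  # no modularity increment, so stop expanding
            break
        cluster.add(best)
        score = new_score
    return cluster
```

On a graph made of a 4-clique joined to a triangle by a single edge, starting from a clique member, the loop absorbs the clique and stops at the bridge, because adding the node across the bridge lowers the score.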
Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.066334
Arash Shaban-Nejad, Volker Haarslev
The issue of ontology evolution and change management is inadequately addressed by available tools and algorithms, mostly because of the lack of suitable knowledge representation formalisms for dealing with temporal abstract notations and because of overreliance on human factors. In addition, most current approaches focus on changes within the internal structure of ontologies, while interactions with other existing ontologies have been widely neglected. In our research, after revealing and classifying some of the common alterations in a number of popular biomedical ontologies, we present a novel agent-based framework, Represent, Legitimate and Reproduce (RLR), to manage the evolution of bio-ontologies semi-automatically and with minimal human intervention, with emphasis on the FungalWeb Ontology. RLR assists and guides ontology engineers through the change management process in general and aids in tracking and representing the changes, particularly through the use of category theory and hierarchical graph transformation.
Managing changes in distributed biomedical ontologies using hierarchical distributed graph transformation. International Journal of Data Mining and Bioinformatics, vol. 11, no. 1, pp. 53-83 (2015). https://doi.org/10.1504/ijdmb.2015.066334
Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.067321
Adetayo Kasim, Ziv Shkedy, Dan Lin, Suzy Van Sanden, Josè Cortiñas Abrahantes, Hinrich W H Göhlmann, Luc Bijnens, Dani Yekutieli, Michael Camilleri, Jeroen Aerssens, Willem Talloen
It has recently been shown that disease-associated gene signatures can be identified by profiling tissue other than the disease-related tissue. In this paper, we investigate gene signatures for Irritable Bowel Syndrome (IBS) using gene expression profiling of both disease-related tissue (colon) and surrogate tissue (rectum). Gene-specific joint ANOVA models were used to investigate differentially expressed genes between the IBS patients and the healthy controls, taking into account both intra- and inter-tissue dependencies among expression levels of the same gene. Classification algorithms in combination with feature selection methods were used to investigate the predictive power of gene expression levels from the surrogate and the target tissues. We conclude from the analyses that expression profiles of the colon and the rectum tissue could yield better predictive accuracy if the disease-associated genes are known.
Translation of disease associated gene signatures across tissues. International Journal of Data Mining and Bioinformatics, vol. 11, no. 3, pp. 301-13 (2015). https://doi.org/10.1504/ijdmb.2015.067321
Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.069658
Chen Zhou, Shao-Wu Zhang, Fei Liu
During the past decades, numerous computational approaches have been introduced for inferring gene regulatory networks (GRNs). The PCA-CMI approach achieves the highest precision on the benchmark GRN datasets; however, it does not recover meaningful edges that may have been deleted in an earlier iteration. To overcome this disadvantage and enhance the precision and robustness of the inferred GRNs, we present an ensemble method, named JRAMF, which infers GRNs from gene expression data by combining two strategies: resampling and arithmetic mean fusion. The jackknife resampling procedure was first employed to form a series of sub-datasets of gene expression data, then PCA-CMI was used to generate the corresponding sub-networks from the sub-datasets, and the final GRN was inferred by integrating these sub-networks with an arithmetic mean fusion strategy. The results show that JRAMF significantly outperforms the PCA-CMI method and achieves high and robust performance.
An ensemble method for reconstructing gene regulatory network with jackknife resampling and arithmetic mean fusion. International Journal of Data Mining and Bioinformatics, vol. 12, no. 3, pp. 328-42 (2015). https://doi.org/10.1504/ijdmb.2015.069658
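The three-stage pipeline in the entry above (jackknife sub-datasets, one sub-network per sub-dataset, arithmetic mean fusion) can be sketched as below. This is an illustration under stated assumptions, not the JRAMF implementation: a thresholded absolute Pearson correlation stands in for PCA-CMI, the jackknife is leave-one-sample-out, and the `cutoff` and `keep` parameters are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def infer_edges(expr, cutoff):
    """Stand-in base learner: link genes whose |correlation| reaches `cutoff`.

    `expr` maps each gene name to its list of expression values.
    """
    return {
        frozenset((g, h))
        for g, h in combinations(sorted(expr), 2)
        if abs(pearson(expr[g], expr[h])) >= cutoff
    }


def jackknife_mean_fusion(expr, cutoff=0.8, keep=0.5):
    """Average edge indicators over leave-one-out sub-networks.

    Returns a dict mapping each retained edge to its mean score in [0, 1].
    """
    n_samples = len(next(iter(expr.values())))
    votes = {}
    for left_out in range(n_samples):
        # Jackknife sub-dataset: drop one sample from every gene's profile.
        sub = {g: v[:left_out] + v[left_out + 1:] for g, v in expr.items()}
        for edge in infer_edges(sub, cutoff):
            votes[edge] = votes.get(edge, 0) + 1
    # Arithmetic mean of the 0/1 indicators; absent edges count as 0.
    return {e: c / n_samples for e, c in votes.items() if c / n_samples >= keep}
```

Because the fused score is an average over sub-networks, an edge that the base learner drops on some sub-datasets can still be retained if it appears in enough of the others.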
Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.068953
Nicola Bernabò, Mauro Mattioli, Barbara Barboni
In this paper we represented Spermatozoa Activation (SA), the process by which male gametes acquire their fertilising ability, in sea urchin, Caenorhabditis elegans and human as biological networks, i.e. as networks of nodes (molecules) linked by edges (their interactions). We then compared them with networks representing ten pathways of physio-pathological importance and with a computer-generated network. We found that the number of nodes and edges composing each network is not related to the number of published papers on each specific topic, and that all the topological parameters examined are similar in all the networks, giving them a scale-free topology and small-world behaviour. In conclusion, SA topology, independently of the reproductive biology of the organism considered, is characterised, like other signalling networks, by robustness against random failure, controllability and efficiency in signal transmission.
Signal transduction in the activation of spermatozoa compared to other signalling pathways: a biological networks study. International Journal of Data Mining and Bioinformatics, vol. 12, no. 1, pp. 59-69 (2015). https://doi.org/10.1504/ijdmb.2015.068953
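The abstract above does not list which topological parameters were examined. As an illustration of the kind of measure used to assess small-world behaviour, the sketch below computes two standard ones, the average clustering coefficient and the average shortest path length, for an undirected network stored as a dict of neighbour sets. The function names and the graph representation are assumptions.

```python
from collections import deque


def average_clustering(graph):
    """Mean over all nodes of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for node, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in graph[u])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def average_path_length(graph):
    """Mean shortest path length over all reachable ordered node pairs (BFS)."""
    total = pairs = 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

A network is usually called small-world when its clustering is high relative to a random graph of the same size while its average path length stays short.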
Pub Date: 2015-01-01 DOI: 10.1504/ijdmb.2015.066333
P Ganesh Kumar, C Rani, D Mahibha, T Aruldoss Albert Victoire
The greatest restriction in estimating the information measure for microarray data is the continuous nature of gene expression values. The traditional criterion function of f-information discretises the continuous gene expression values in order to calculate the probability function during gene selection. This leads to a loss of the biological meaning of microarray data and results in poor classification accuracy. To overcome this difficulty, the concepts of fuzzy and rough sets are combined to redefine the criterion functions of f-information, which are then used to form candidate genes from which informative genes are selected using a neural network. The performance of the proposed Fuzzy-Rough-Neural-based f-Information (FRNf-I) is evaluated using ten gene expression datasets. Simulation results show that the proposed approach computes the f-information measure easily without discretisation. Statistical analysis of the test results shows that the proposed FRNf-I selects comparatively fewer genes and achieves higher classification accuracy than the other approaches reported in the literature.
Fuzzy-rough-neural-based f-information for gene selection and sample classification. International Journal of Data Mining and Bioinformatics, vol. 11, no. 1, pp. 31-52 (2015). https://doi.org/10.1504/ijdmb.2015.066333