The rapid growth of microarray data repositories calls for computer-aided analysis techniques. Combined with a growing knowledge base about molecular processes, these data enable intelligent machine learning algorithms to expand that knowledge base. In this paper, we propose a novel algorithm, the iterated Hidden Markov Model, which queries microarray expression data with genes known to be involved in a given function in order to identify novel genes involved in the same cellular function. We run this algorithm on publicly available benchmark data sets and show that it outperforms comparable machine learning approaches.
Functional Gene Detection and Clustering from Seed Gene Sets. Alexander Senf, Xue-wen Chen. 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 179-184. Published 2011-11-12. DOI: 10.1109/BIBM.2011.48
Pub Date: 2011-11-12, DOI: 10.1109/BIBMW.2011.6112522
Rajeshree Joshi, N. Yanasak
In our lab, we are studying a mouse model of Hereditary Hemorrhagic Telangiectasia (HHT), a genetic disorder that leads to arteriovenous malformations (AVMs) in the brain. Using this mouse model and Magnetic Resonance Angiography (MRA), we aim to track AVMs over time and derive a classification system that stratifies them by their longitudinal evolution. Before this work can proceed, we need a three-dimensional (3D) MRA atlas of a healthy normal mouse brain for comparison. The 3D atlas we have built presents the vascular structure of the healthy normal mouse brain through a graphical software tool.
Magnetic resonance angiography study of a normal mouse brain for creating a three-dimensional cerebral vasculature atlas and software for labeling vessels. Rajeshree Joshi, N. Yanasak. 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 966-968. Published 2011-11-12. DOI: 10.1109/BIBMW.2011.6112522
Pub Date: 2011-11-12, DOI: 10.1109/BIBMW.2011.6112357
I. Astrakhantseva, David S. Campo, Aufra Araujo, C. Teo, Y. Khudyakov, S. Kamili
Differentiating between acute and chronic HCV infections is clinically important, because early treatment of infected patients leads to high rates of sustained virological response. Analysis of HVR1 sequences (n=2179) from samples of patients with acute (n=49) and chronic (n=102) HCV infection showed that intra-host HVR1 diversity was 1.8 times higher in chronic than in acute infection. Analysis of molecular variance showed significant differences between sequences from acute and chronic patients. We found statistically significant differences in the polarity, volume, and hydrophobicity of amino acids at 10 HVR1 positions. A classification model built on these 10 positions distinguished acute from chronic cases with 88% accuracy in cross-validation experiments. The results indicate that progression from the acute to the chronic stage of HCV infection is accompanied by characteristic changes in the amino acid composition of HVR1, suggesting substantial regularity in intra-host HVR1 evolution.
Variation in physicochemical properties of the hypervariable region 1 during acute and chronic stages of hepatitis C virus infection. I. Astrakhantseva, David S. Campo, Aufra Araujo, C. Teo, Y. Khudyakov, S. Kamili. 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 72-78. Published 2011-11-12. DOI: 10.1109/BIBMW.2011.6112357
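The abstract reports intra-host HVR1 diversity but does not name the metric. Below is a minimal sketch that assumes diversity is the mean pairwise p-distance, meaning the fraction of differing positions, among aligned amino-acid variants from one host. The sequences are hypothetical toy fragments, not HVR1 data from the study. The function names `p_distance` and `mean_pairwise_diversity` are illustrative only.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_diversity(seqs: list[str]) -> float:
    """Mean p-distance over all unordered pairs of variants from one host (0.0 if fewer than two)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy variants from two hosts (not data from the study).
acute_host = ["ETHVTGG", "ETHVTGG", "ETHVSGG"]
chronic_host = ["ETHVTGG", "QTHVSGA", "ETYVTRG"]

acute = mean_pairwise_diversity(acute_host)
chronic = mean_pairwise_diversity(chronic_host)
print(f"acute={acute:.3f} chronic={chronic:.3f} ratio={chronic / acute:.2f}")
```

A ratio computed this way per host, then averaged per group, is one plausible reading of "1.8 times higher". The study may have used a different distance or weighting.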
Pub Date: 2011-11-12, DOI: 10.1109/BIBMW.2011.6112381
N. A. Yousri, D. Elkaffash
Gene expression arrays provide a rich source of information on the behaviour of thousands of genes across several clinical conditions of a particular tumor/cancer. When integrated with the functional classification of genes, such expression sets enrich the information provided by both sources. Motivated by the need to score relations between functional groups of genes and the multiple clinical types associated with a tumor, this study proposes using Jaccard similarity. For any set of genes, this measure can quantify the association between two sets of gene classes/groups obtained from two different sources of information. In the proposed study, we specifically consider subsets of overexpressed genes in cancer expression sets. This enables the identification of unique genes and the association of their most correlated sample clinical types with their functional groups. Experiments on a breast cancer expression set illustrate the use of the proposed measure.
Associating gene functional groups with multiple clinical conditions using Jaccard similarity. N. A. Yousri, D. Elkaffash. 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 241-246. Published 2011-11-12. DOI: 10.1109/BIBMW.2011.6112381
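The Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|. The sketch below assumes that each functional group and each clinical type can be represented as a plain set of gene identifiers. For a clinical type, that set is the genes overexpressed in its samples. The paper's exact construction of these sets may differ. The group names and genes are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def association_table(functional_groups, clinical_groups):
    """Score every (functional group, clinical type) pair by the Jaccard similarity of their gene sets."""
    return {
        (f_name, c_name): jaccard(f_genes, c_genes)
        for f_name, f_genes in functional_groups.items()
        for c_name, c_genes in clinical_groups.items()
    }


# Hypothetical functional groups (gene identifiers are placeholders).
functional_groups = {
    "cell_cycle": {"g1", "g2", "g3", "g4"},
    "immune_response": {"g5", "g6", "g7"},
}
# Hypothetical genes found overexpressed in samples of each clinical type.
clinical_groups = {
    "type_A": {"g1", "g2", "g5"},
    "type_B": {"g6", "g7", "g8"},
}

table = association_table(functional_groups, clinical_groups)
for (f, c), score in sorted(table.items()):
    print(f"{f:16s} {c:7s} {score:.2f}")
```

Each functional group can then be linked to the clinical type with the highest score. Ties and empty sets need an explicit policy, which the abstract does not specify.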
L. D. Lopez, Jingyi Yu, C. Arighi, Hongzhan Huang, H. Shatkay, Cathy H. Wu
Figures in biomedical articles often constitute direct evidence of experimental results. Image analysis methods can be coupled with text-based methods to improve knowledge discovery. However, automatically harvesting figures along with their associated captions from full-text articles remains challenging. In this paper, we present an automatic system for robustly harvesting figures from biomedical literature. Our approach relies on the idea that the PDF specification of the document layout can be used to identify encoded figures and figure boundaries within the PDF and to enforce constraints among figure regions. This allows us to harvest fragments of figures (subfigures) from the PDF, correctly identify subfigures that belong to the same figure, and identify the captions associated with each figure. Our method simultaneously recovers figures and captions and applies additional filtering to remove irrelevant figures such as logos, to eliminate text passages that were incorrectly identified as captions, and to regroup subfigures into a putative figure. Finally, we associate figures with captions. Our preliminary experiments suggest that our method achieves an accuracy of 95% in harvesting figure-caption pairs from a set of 2,035 full-text biomedical documents from BioCreative III, containing 12,574 figures.
An Automatic System for Extracting Figures and Captions in Biomedical PDF Documents. L. D. Lopez, Jingyi Yu, C. Arighi, Hongzhan Huang, H. Shatkay, Cathy H. Wu. 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 578-581. Published 2011-11-12. DOI: 10.1109/BIBM.2011.26
Pub Date: 2011-11-12, DOI: 10.1109/BIBMW.2011.6112408
A. Casagrande, Francesco Fabris
Protein domain classification is a useful tool for inferring the functional properties of proteins. Several databases that collect domains of known structure have been introduced, and SCOP is probably the most widely used. It classifies domains in a four-level hierarchy and groups sequences according to both structural similarity and phylogenetic relationship. Many automatic tools for classifying domains according to the available databases have been proposed. In this paper we introduce the notion of a “fingerprint” as a concise, readable digest of the similarities between a sequence and an entire set of sequences; this concept provides the rationale for an automatic SCOP classifier that assigns a query sequence to the most likely family. Fingerprint-based analysis has been implemented in a software tool, and we report experimental validation of it.
SCOP family fingerprints: An information theoretic approach to structural classification of protein domains. A. Casagrande, Francesco Fabris. 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 416-423. Published 2011-11-12. DOI: 10.1109/BIBMW.2011.6112408
Pub Date: 2011-11-12, DOI: 10.1109/BIBMW.2011.6112370
Ruifei Xie, Bin Han, Lihua Li, Juan Zhang, Lei Zhu
Extracting significant features from high-dimensional, small-sample-size microarray data is a challenging problem. Unlike wrapper or filter methods, the novel feature selection algorithm we propose integrates ideas from professional tennis player rankings, such as seed players and dynamic ranking, with Monte Carlo simulation. Seed players make each ‘game’ more competitive and selective, which improves selection efficiency. In addition, feature ranks are updated dynamically, ensuring that the current best players always take part in each competition. The proposed algorithm is tested on widely used public datasets. Results demonstrate that, compared with other methods, it converges faster, is more stable, and performs well in classification, making it an efficient algorithm for feature selection.
Professional tennis player ranking strategy based Monte Carlo feature selection. Ruifei Xie, Bin Han, Lihua Li, Juan Zhang, Lei Zhu. 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 165-172. Published 2011-11-12. DOI: 10.1109/BIBMW.2011.6112370
We have developed a computational approach that predicts the protein kinases that may regulate transitions between the blood developmental stages of Plasmodium falciparum (P. falciparum). To improve prediction accuracy, synchronized gene expression levels are reconstructed from observed microarray data generated by ensembles of non-synchronized cells. Peaks in annotated protein kinase transcript levels are hypothesized to correlate directly with the period during which the encoded protein kinases function. Protein kinases that putatively regulate a given developmental stage transition are therefore identified by the peaks in their synchronized gene expression levels. Analysis of a publicly available microarray data set indicates that a few protein kinases are strongly associated with developmental stage transitions. Two of these (PF13_0211, PFB0815w) have recently been implicated in the schizont-to-ring transition [1], [2]. Another (MAL7P1.144) has been found to influence the erythrocyte membrane in both the trophozoite and schizont stages [3]. Overall, these results suggest that further functional analysis of the other protein kinases we have predicted may reveal new insights into P. falciparum blood stage development.
Deconvolution of Microarray Data Predicts Transcriptionally Regulated Protein Kinases of Plasmodium falciparum. Wei Zhao, J. Dauwels, J. Niles, Jianshu Cao. 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 286-289. Published 2011-11-12. DOI: 10.1109/BIBM.2011.31
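The peak-based identification step can be sketched as follows. This assumes the deconvolution has already produced synchronized expression profiles; the deconvolution itself is not sketched. Each kinase is assigned to the transition window that contains its peak time. The profiles, kinase names and window boundaries below are hypothetical placeholders, not values from the paper.

```python
import numpy as np


def peak_times(profiles: np.ndarray, times: np.ndarray) -> np.ndarray:
    """Return, for each gene (row of profiles), the time point of its highest expression."""
    return times[np.argmax(profiles, axis=1)]


def assign_to_transition(peaks, transitions):
    """Map each peak time to the label of the first (start, end, label) window containing it.

    Peaks that fall outside every window are mapped to None.
    """
    return [
        next((label for start, end, label in transitions if start <= t < end), None)
        for t in peaks
    ]


# Hypothetical synchronized profiles (rows = kinases, columns = hours post-invasion).
times = np.arange(0, 48, 8)  # 0, 8, 16, 24, 32, 40
profiles = np.array([
    [1.0, 2.0, 5.0, 3.0, 1.0, 1.0],  # kinase_a
    [1.0, 1.0, 1.0, 2.0, 6.0, 2.0],  # kinase_b
    [2.0, 1.0, 1.0, 1.0, 3.0, 7.0],  # kinase_c
])
# Illustrative transition windows in hours; not boundaries taken from the paper.
transitions = [
    (12, 24, "ring->trophozoite"),
    (24, 36, "trophozoite->schizont"),
    (36, 48, "schizont->ring"),
]

peaks = peak_times(profiles, times)
labels = assign_to_transition(peaks, transitions)
for name, t, label in zip(["kinase_a", "kinase_b", "kinase_c"], peaks, labels):
    print(name, t, label)
```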
Pub Date: 2011-11-12, DOI: 10.1109/BIBMW.2011.6112348
Junbo Duan, Ji-Gang Zhang, J. Lefante, H. Deng, Yu-ping Wang
Detecting copy number variation is important for understanding complex diseases such as autism, schizophrenia, and cancer. In this paper we propose a method to detect copy number variation from next generation sequencing data. Compared with conventional platforms for detecting copy number variation, such as array comparative genomic hybridization (aCGH), next generation sequencing data provide higher resolution of genomic variations. Many methods exist for detecting copy number variation from next generation sequencing data, and most of them are based on statistical hypothesis testing. In this paper, we approach the problem from an optimization point of view. The proposed method optimizes a total variation penalized least squares criterion, which involves the ℓ1 norm. Inspired by the analytical study of a statics system, we propose an iterative algorithm to find the optimal solution of this optimization problem. A comparative study with other existing methods on simulated data demonstrates that our method can detect relatively small copy number variants (low copy number and small single-copy length) with a low false positive rate.
Detection of copy number variation from next generation sequencing data with total variation penalized least square optimization. Junbo Duan, Ji-Gang Zhang, J. Lefante, H. Deng, Yu-ping Wang. 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 3-12. Published 2011-11-12. DOI: 10.1109/BIBMW.2011.6112348
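The abstract names the criterion but not its exact form. A common total-variation-penalized least-squares criterion, assumed here, is: minimize over x the quantity ½‖y − x‖² + λ Σᵢ |x[i+1] − x[i]|. Here y is a per-bin read-depth (or log-ratio) signal along the genome. The sketch below solves it with projected gradient on the dual, often called iterative clipping. That solver is a standard technique swapped in for illustration; it is not the statics-inspired algorithm the authors propose.

```python
import numpy as np


def tv_denoise(y: np.ndarray, lam: float, n_iter: int = 20000) -> np.ndarray:
    """Minimize 0.5 * ||y - x||^2 + lam * sum_i |x[i+1] - x[i]| over x.

    Solver: projected gradient on the dual ("iterative clipping").
    With D the first-difference operator, x = y - D^T z and the dual
    variable z is kept inside the box [-lam, lam].
    """
    y = np.asarray(y, dtype=float)
    z = np.zeros(len(y) - 1)
    alpha = 4.0  # upper bound on the largest eigenvalue of D D^T (step size 1/alpha)
    for _ in range(n_iter):
        x = y + np.diff(np.concatenate(([0.0], z, [0.0])))  # x = y - D^T z
        z = np.clip(z + np.diff(x) / alpha, -lam, lam)       # gradient step, then project onto the box
    return y + np.diff(np.concatenate(([0.0], z, [0.0])))


# Hypothetical noise-free track with one elevated segment (bins 20-29).
y = np.concatenate([np.zeros(20), np.full(10, 3.0), np.zeros(20)])
x = tv_denoise(y, lam=1.0)
print(np.round(x[[0, 25, 45]], 3))
```

The ℓ1 penalty on successive differences keeps the breakpoints while shrinking each segment toward its neighbours. The fitted signal is therefore piecewise constant, and candidate copy-number segments can be read off by thresholding it. The penalty weight λ trades sensitivity to small variants against false positives.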
Heme-protein interactions are essential for various biological processes such as electron transfer, catalysis, signal transduction, and the control of gene expression. Knowledge of heme binding residues can provide crucial clues for understanding the mechanism of heme-protein interactions and aid in functional annotation. In the present work, we propose a sequence-based approach for accurately predicting heme binding residues using a novel integrative sequence profile that couples position-specific scoring matrices with heme-specific physicochemical properties. In particular, we design an intuitive feature selection scheme for informative physicochemical properties. As shown in our primary results, the integrative sequence profile approach outperforms conventional methods based on amino acid and evolutionary information, both in 5-fold cross-validation and on an independent test.
Prediction of Heme Binding Sites in Heme Proteins Using an Integrative Sequence Profile Coupling Evolutionary Information with Physicochemical Properties. Y. Xiong, Wen Zhang, Tao Zeng, Juan Liu. 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 143-146. Published 2011-11-12. DOI: 10.1109/BIBM.2011.8
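The abstract does not give the window size, the encoding, or the selected heme-specific properties. Below is a minimal sketch of one common way to couple evolutionary and physicochemical information per residue. It assumes a sliding window centred on each residue and a precomputed L×20 PSSM, such as one produced by PSI-BLAST. The Kyte-Doolittle hydropathy scale stands in for the paper's selected properties. The function `window_features` and the example sequence are hypothetical, and the paper's feature selection scheme is not reproduced.

```python
import numpy as np

# Kyte-Doolittle hydropathy, used here only as a stand-in physicochemical scale;
# the heme-specific properties selected in the paper are not listed in the abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(seq: str, pssm: np.ndarray, window: int = 7) -> np.ndarray:
    """One feature row per residue: for each window position, 20 PSSM values plus 1 property value.

    Window positions that fall outside the sequence are zero-padded.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it is centred on the residue")
    n = len(seq)
    if pssm.shape != (n, 20):
        raise ValueError("pssm must have shape (len(seq), 20)")
    half = window // 2
    props = np.array([[KYTE_DOOLITTLE[aa]] for aa in seq])     # (n, 1)
    per_pos = np.hstack([pssm, props])                          # (n, 21)
    padded = np.vstack([np.zeros((half, 21)), per_pos, np.zeros((half, 21))])
    return np.stack([padded[i:i + window].ravel() for i in range(n)])


seq = "MKHCG"  # hypothetical fragment
pssm = np.zeros((len(seq), 20))  # placeholder for a real PSSM
feats = window_features(seq, pssm, window=3)
print(feats.shape)
```

Each row could then be labelled binding or non-binding and passed to any binary classifier. Evaluation would use the 5-fold cross-validation and independent test described above.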