Overcoming drug resistance by co-targeting
M. Ayati, Golnaz Taheri, S. Arab, L. Wong, C. Eslahchi
2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2010-12-01. DOI: 10.1109/BIBM.2010.5706562
Removal or suppression of key proteins in an essential pathway of a pathogen is expected to disrupt the pathway and prevent the pathogen from performing a vital function. Thus, disconnecting multiple essential pathways should disrupt the survival of a pathogen even when it has multiple pathways to drug resistance. We consider a scenario where the drug-resistance pathways are unknown. To disrupt these pathways, we consider a cut set S of G, where G is a connected simple graph representing the protein interaction network of the pathogen, so that G − S splits into two partitions such that the endpoints of each pathway are in different partitions. If the two partitions differ greatly in size, the probability that a functioning pathway exists within one partition increases. Thus, we need to partition the graph into two balanced partitions. We approximate the balanced bipartitioning problem with spectral bipartitioning, since finding a (2,1)-separator is NP-complete. We test our technique on E. coli and C. jejuni. We show that over 50% of genes in the cut sets are essential. Moreover, all proteins in the cut sets have fundamental roles in the cell, and inhibition of each of them is harmful to cell survival. Also, 20% and 17% of known targets are in the vertex cuts of E. coli and C. jejuni, respectively. Hence our approach has produced plausible “co-targets” whose inhibition should counter a pathogen's drug resistance.
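The spectral bipartitioning step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the textbook formulation (split vertices by the sign of the Fiedler vector of the graph Laplacian), which may differ from the authors' exact procedure. The function names `fiedler_partition` and `cut_edges` are hypothetical, and the eigenvector is approximated by shifted power iteration so that only the standard library is needed.

```python
def fiedler_partition(n, edges, iters=500):
    """Split vertices 0..n-1 into two sets by the sign of an approximate
    Fiedler vector (eigenvector of the second-smallest Laplacian eigenvalue).
    Illustrative sketch only, not the authors' implementation."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = [len(a) for a in adj]
    shift = 2 * max(deg)  # upper bound on the largest Laplacian eigenvalue
    # Deterministic start vector that is already orthogonal to the all-ones vector.
    x = [i - (n - 1) / 2 for i in range(n)]
    for _ in range(iters):
        # y = (shift * I - L) x with L = D - A
        y = [(shift - deg[i]) * x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]  # deflate the trivial constant eigenvector
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    part_a = {i for i in range(n) if x[i] < 0}
    return part_a, set(range(n)) - part_a


def cut_edges(edges, part_a):
    """Edges crossing the partition; their endpoints are candidates for a vertex cut."""
    return [(u, v) for u, v in edges if (u in part_a) != (v in part_a)]


# Toy network (an assumption for illustration): two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
part_a, part_b = fiedler_partition(6, edges)
print(sorted(part_a), sorted(part_b), cut_edges(edges, part_a))
```

On a real protein interaction network a sparse eigensolver would be the more practical choice; the power iteration here is only meant to keep the sketch self-contained.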
Hierarchical decomposition of vessel skeletons for graph creation and feature extraction
K. Drechsler, C. O. Laura
2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2010-12-01. DOI: 10.1109/BIBM.2010.5706609
Graphs are useful representations of the liver vasculature. They support tree matching algorithms in landmark-based registration, they are useful for separating connected vessels that belong to two different vessel systems, and they are the basis of vessel annotation tools. In this paper, we propose a hierarchical decomposition of vessel skeletons into sub-branches. This simplifies the process of creating labeled graphs and extracting features. Furthermore, we propose a measure to classify voxels as branch voxels. We applied our method to several datasets with satisfactory results and found that the number of sub-branches is normally distributed under rotation.
Decomposing protein interactome networks by graph entropy
Hao Lian, C. Song, Young-Rae Cho
2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2010-12-01. DOI: 10.1109/BIBM.2010.5706633
Recent high-throughput experimental methods have generated protein-protein interaction data at the genome scale, called the interactome. Various graph clustering algorithms have been applied to protein interactome networks to identify protein complexes and predict functional modules. Although the previous algorithms are scalable and robust, their accuracy is still limited because of the complex connectivity of the networks. In this study, we propose a novel information-theoretic definition, Graph Entropy, as a measure of the structural complexity of a graph. A loss of graph entropy represents an increase in the modularity of the graph. Based on this concept, we present a graph clustering algorithm. Starting from a random seed vertex and its neighbors as a seed cluster, the algorithm iteratively adds or removes vertices on the border of the cluster to minimize graph entropy. We further improve the algorithm so that it can generate overlapping clusters. In experiments with the yeast protein interactome network, we show that the graph entropy-based approach has higher accuracy in predicting functional modules than other competing methods.
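The entropy-driven clustering loop described in the abstract above might be sketched as follows. The abstract does not give the formula, so the definition used here is an assumption: each vertex contributes the binary entropy of the fraction of its neighbors that lie inside the cluster, and graph entropy is the sum over all vertices. The greedy loop (toggle one border vertex at a time, keep the change if entropy drops) is likewise a simplification, and all function names are hypothetical.

```python
import math


def vertex_entropy(v, cluster, adj):
    """Binary entropy of the fraction of v's neighbors inside the cluster (assumed definition)."""
    neighbors = adj[v]
    if not neighbors:
        return 0.0
    p_in = sum(1 for u in neighbors if u in cluster) / len(neighbors)
    return -sum(p * math.log2(p) for p in (p_in, 1.0 - p_in) if p > 0)


def graph_entropy(cluster, adj):
    """Sum of vertex entropies over the whole graph."""
    return sum(vertex_entropy(v, cluster, adj) for v in adj)


def grow_cluster(seed, adj):
    """Start from the seed and its neighbors, then greedily add or remove
    border vertices while graph entropy strictly decreases."""
    cluster = {seed} | set(adj[seed])
    best = graph_entropy(cluster, adj)
    improved = True
    while improved:
        improved = False
        outer = {u for v in cluster for u in adj[v]} - cluster
        inner = {v for v in cluster if any(u not in cluster for u in adj[v])} - {seed}
        for v in sorted(outer | inner):
            candidate = cluster ^ {v}  # add v if outside, remove it if inside
            score = graph_entropy(candidate, adj)
            if score < best - 1e-12:
                cluster, best, improved = candidate, score, True
                break
    return cluster


# Toy network (an assumption for illustration): two triangles joined by one edge.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(grow_cluster(0, adj))
```

The sketch fixes the seed rather than drawing it at random, and it does not cover the overlapping-cluster extension mentioned in the abstract.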
An Iterated Conditional Modes solution for sparse Bayesian factor modeling of transcriptional regulatory networks
Jia Meng, Jianqiu Zhang, Yidong Chen, Yufei Huang
2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2010-12-01. DOI: 10.1109/BIBM.2010.5706587
The problem of uncovering transcriptional regulation by transcription factors (TFs) based on microarray data is considered. A novel Bayesian sparse correlated rectified factor model (BSCRFM), coupled with its ICM solution, is proposed. BSCRFM models the unknown TF protein-level activity, the correlated regulations between TFs, and the sparse nature of TF-regulated genes, and it admits prior knowledge from existing databases regarding TF-regulated target genes. An efficient Iterated Conditional Modes (ICM) algorithm is developed, and a maximum a posteriori (MAP) solution is calculated from multiple ICM results to avoid the local maximum problem. A context-specific transcriptional regulatory network, specific to the experimental condition of the microarray data, can then be obtained. The proposed model's ICM algorithm and MAP solution are evaluated on simulated systems, and the results demonstrate the validity and effectiveness of the proposed approach. The proposed model is also applied to breast cancer microarray data, and a TF-regulated network is obtained.
Exploring a multi-source fusion approach for genomics information retrieval
Qinmin Hu, Xiangji Huang, Jun Miao
2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2010-12-01. DOI: 10.1109/BIBM.2010.5706649
In this paper, we focus on the biomedicine domain and propose a multi-source fusion approach for improving information retrieval performance. First, we consider a common scenario for a metasearch system that has access to multiple baselines, each retrieving and ranking documents/passages with its own model. Second, given selected baselines from multiple sources, we employ two modified fusion rules in the proposed approach, reciprocal and combMNZ, to rerank the candidates as the output for evaluation. Third, our empirical study on both the 2007 and 2006 genomics data sets demonstrates the viability of the proposed approach for improving fusion performance. Fourth, the experimental results show that the reciprocal method provides notable improvements over the individual baselines, especially on the passage2-level MAP, which measures passage effectiveness, and the aspect-level MAP, which measures diversity.
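The two fusion rules named in the abstract above can be sketched in their commonly cited textbook forms. The abstract says the authors use modified versions whose details are not given, so the code below is not their method. It assumes reciprocal rank fusion with a smoothing constant `k` (60 is a conventional default, chosen here as an assumption) and CombMNZ with min-max score normalization; the function names are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists that return it."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def comb_mnz(score_lists):
    """Fuse scored lists: sum of min-max normalized scores times the number of lists returning the document."""
    total = defaultdict(float)
    hits = defaultdict(int)
    for scores in score_lists:
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for doc, s in scores.items():
            total[doc] += (s - lo) / span
            hits[doc] += 1
    fused = {d: total[d] * hits[d] for d in total}
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical baseline outputs, for illustration only.
print(reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]]))
print(comb_mnz([{"a": 3.0, "b": 1.0}, {"b": 2.0, "c": 1.0}]))
```

Rank-based fusion needs only the order of each baseline's output, while CombMNZ needs comparable scores, which is why the sketch normalizes each list before summing.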
Prediction of low coverage prone regions for Illumina sequencing projects using a support vector machine
Zejun Zheng, B. Schmidt, G. Bourque
2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2010-12-01. DOI: 10.1109/BIBM.2010.5706527
Applications of next-generation sequencing technologies have the potential to bring revolutionary changes to medicine and biology. However, coverage bias can pose a challenge to short-read data analysis tools, which rely on high coverage. To address this issue, we have developed a support vector machine (SVM) based method for predicting low coverage prone (LCP) regions on a given genome. These predictions can assist data processing procedures based on Illumina sequencing technology, such as de novo sequencing and transcriptome analysis.
Utilizing Cox regression model to assess the relations between predefined gene sets and the survival outcome of lung adenocarcinoma
Jo-Yang Lu, E. Chuang, C. K. Hsiao, M. Tsai, L. Lai, Pei-Chun Chen
2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2010-12-01. DOI: 10.1109/BIBM.2010.5706566
The risk of relapse for lung adenocarcinoma patients remains higher than 30%, even after complete surgical resection at early stages. Although many prognostic studies using genome-wide profiling have been published, the biological meaning of the prognostic genes and the interactions among them are poorly understood. Therefore, we developed a novel method that integrates gene set enrichment analysis with a Cox proportional hazards regression model to investigate the relations between predefined gene sets and the survival outcome in lung cancer. The method selects gene sets associated with the survival outcome, clusters the prognostic gene sets, and selects a representative gene set from each cluster. Furthermore, a kernel matrix is used to visualize the similarities between the representative gene sets. In addition to survival outcome, our method can also use other continuous variables to explore other biological interpretations concealed in the predefined gene sets.
Concurrent analysis of copy number variations and expression profiles to identify genes associated with tumorigenesis and survival outcome in lung adenocarcinoma
T. Lu, L. Lai, C. K. Hsiao, Pei-Chun Chen, M. Tsai, E. Chuang
2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2010-12-01. DOI: 10.1109/BIBM.2010.5706578
Lung cancer is one of the major causes of cancer-related death worldwide. To predict survival outcomes of lung cancer patients, many prognostic gene sets have been identified using gene expression microarrays. However, these gene sets were often inconsistent across independent cohorts. To identify genes with more consistency, we combined gene expression and copy number variations (CNVs). Affymetrix SNP 6.0 and U133 Plus 2.0 microarray assays were performed on 42 pairs of lung adenocarcinoma patients. Copy number varied regions (CNVRs) present in more than 30% of samples were identified, and 475 differentially expressed genes with concordant changes were selected for pathway analysis. Thirteen pathways were significantly enriched among the 475 CNV-associated genes, and survival analyses showed that these pathways had generally consistent and significant prediction probabilities across three independent microarray studies. Therefore, integrating gene expression and copy number may help to lower the false discovery rate and identify genes that can be used to predict survival outcomes.
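The gene selection step in the abstract above (copy number changes present in more than 30% of samples, combined with concordant differential expression) could be expressed as a simple filter. This is a sketch under stated assumptions: the record fields `gain_fraction`, `loss_fraction`, `log2_fold_change`, and `differentially_expressed` are hypothetical, the gene names are placeholders, and "concordant" is read as gain with up-regulation or loss with down-regulation.

```python
def concordant_genes(genes, min_fraction=0.30):
    """Return genes whose copy number change is frequent and matches the
    direction of their expression change (assumed reading of 'concordant')."""
    selected = []
    for name, info in genes.items():
        if not info["differentially_expressed"]:
            continue
        gain = info["gain_fraction"] > min_fraction
        loss = info["loss_fraction"] > min_fraction
        change = info["log2_fold_change"]
        if (gain and change > 0) or (loss and change < 0):
            selected.append(name)
    return sorted(selected)


# Placeholder records, not data from the study.
genes = {
    "G1": {"gain_fraction": 0.5, "loss_fraction": 0.0, "log2_fold_change": 1.2, "differentially_expressed": True},
    "G2": {"gain_fraction": 0.1, "loss_fraction": 0.4, "log2_fold_change": -0.8, "differentially_expressed": True},
    "G3": {"gain_fraction": 0.5, "loss_fraction": 0.0, "log2_fold_change": -1.0, "differentially_expressed": True},
    "G4": {"gain_fraction": 0.6, "loss_fraction": 0.0, "log2_fold_change": 2.0, "differentially_expressed": False},
}
print(concordant_genes(genes))
```

In this toy input, G3 is dropped because its expression moves against its copy number gain, and G4 because it is not flagged as differentially expressed.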
Template-based scoring functions for visualizing biological insights of H-2Kb-peptide-TCR complexes
I. Liu, Yu-Shu Lo, Jinn-Moon Yang
2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2010-12-01. DOI: 10.1109/BIBM.2010.5706550
Class I major histocompatibility complex (MHC) molecules, peptides, and T-cell receptors (TCRs) play essential roles in adaptive immune responses. Many prediction servers are available for identifying peptides that bind to MHC class I molecules. These servers often lack the detailed interacting residues and binding models needed to analyze MHC-peptide-TCR interaction mechanisms. This study makes several enhancements to the template-based scoring function derived from protein-protein interactions in order to identify MHC-peptide-TCR binding models. The scoring function considers both template similarity and interaction forces to ensure statistically significant interface similarity between the peptide candidates and the structure templates. The results show that our scoring function is comparable to public web servers for identifying MHC-binding peptides. Our model, which considers both the MHC-peptide and peptide-TCR interfaces, is able to provide visualization of, and biological insights into, MHC-peptide-TCR binding models. We believe that our model is useful for the development of peptide-based vaccines.
Accurate prediction of ATP-binding residues using sequence and sequence-derived structural descriptors
Ke Chen, M. Mizianty, Lukasz Kurgan
2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2010-12-01. DOI: 10.1109/BIBM.2010.5706533
ATP is a ubiquitous nucleotide that provides energy for cellular activities, catalyzes chemical reactions, and is involved in cellular signaling. Knowledge of ATP-protein interactions helps with the annotation of protein functions and finds applications in drug design. We propose a high-throughput machine learning-based predictor, ATPsite, which identifies ATP-binding residues from protein sequences. Statistical tests show that ATPsite significantly outperforms the existing ATPint predictor and other solutions that utilize sequence alignment and residue conservation scoring. The improvements stem from the use of novel custom-designed input features that are based on the sequence, evolutionary profiles, and sequence-predicted structural descriptors, including secondary structure, solvent accessibility, and dihedral angles. A simple consensus of ATPsite with the sequence alignment-based predictor is shown to give further improvements.