Background: Spleen tyrosine kinase (SYK) is a protein implicated in various diseases. Aberrant SYK expression often drives the initiation and progression of several diseases, including cancer and autoimmune diseases. Despite the importance of inhibiting SYK and identifying candidate inhibitors, no clinically effective inhibitors have been reported to date, so novel SYK inhibitors are needed. Results: Candidate compounds were investigated by in silico screening with chooseLD, which simulates the docking of ligands to proteins. With this system, known inhibitors were correctly recognized as compounds with high affinity for SYK. Furthermore, many compounds in the DrugBank database were newly identified as having high affinity for the ATP-binding site in the kinase domain, comparable to that of previously reported inhibitors. Conclusions: Many drug candidate compounds from the DrugBank database were newly identified as potential SYK inhibitors. Because compounds registered in DrugBank are expected to have fewer side effects than currently available compounds, these newly identified compounds may be clinically useful SYK inhibitors for the treatment of various diseases.
In silico Spleen Tyrosine Kinase Inhibitor Screening by ChooseLD. H. Umeyama, M. Iwadate, Y-h. Taguchi. IPSJ Transactions on Bioinformatics, vol. 8, pp. 14-20, 2015-03-13. https://doi.org/10.2197/IPSJTBIO.8.14
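For the SYK screening abstract above, the two steps it describes (checking that known inhibitors score highly, then keeping DrugBank compounds with comparable predicted affinity) can be sketched as follows. This is a minimal illustration, not chooseLD itself: it assumes a precomputed dictionary of docking scores in which a higher score means stronger predicted binding, and the function name and compound IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def screen_candidates(
    scores: Dict[str, float], known_inhibitors: Set[str]
) -> Tuple[Dict[str, int], List[str]]:
    """Rank compounds by a docking score and flag new candidates.

    Assumption: a higher score means stronger predicted binding to SYK.
    Returns (rank of each scored known inhibitor, other compounds whose
    score is at least that of the weakest known inhibitor).
    """
    # Sort compound IDs from strongest to weakest predicted binding.
    ranked = sorted(scores, key=scores.get, reverse=True)
    rank = {name: i + 1 for i, name in enumerate(ranked)}

    # Validation step: where do the known inhibitors land in the ranking?
    known_ranks = {name: rank[name] for name in known_inhibitors if name in rank}
    if not known_ranks:
        raise ValueError("none of the known inhibitors has a score")

    # Keep new compounds scoring at least as well as the weakest known inhibitor.
    threshold = min(scores[name] for name in known_ranks)
    candidates = [n for n in ranked if n not in known_inhibitors and scores[n] >= threshold]
    return known_ranks, candidates
```

Using the weakest known inhibitor as the cut-off mirrors the abstract's criterion of "similar affinity to previously reported inhibitors"; a stricter screen could use the median known-inhibitor score instead.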
Yuki Endo, Fubito Toyama, C. Chiba, H. Mori, K. Shoji
Sequencing the whole genome of various species has many applications, not only in understanding biological systems but also in medicine, pharmacy, and agriculture. In recent years, high-throughput next-generation sequencing technologies have dramatically reduced the time and cost of whole-genome sequencing. These technologies provide ultrahigh throughput at a lower per-unit data cost. However, the data consist of very short DNA fragments, so algorithms for merging these fragments are essential. Merging the fragments without a reference is called de novo assembly, and many de novo assembly algorithms have been proposed in recent years. Velvet and SOAPdenovo2 are well-known assemblers that perform well in terms of memory and time consumption. However, their memory consumption increases dramatically as the input data grow larger. An alternative algorithm with low memory usage is therefore needed. In this paper, we propose a de novo assembly algorithm that uses less memory. In experiments using E. coli K-12 strain MG1655 and human chromosome 14, our algorithm consumed less memory than other popular assemblers.
A Memory Efficient Short Read De Novo Assembly Algorithm. Yuki Endo, Fubito Toyama, C. Chiba, H. Mori, K. Shoji. IPSJ Transactions on Bioinformatics, vol. 8, pp. 2-8, 2015-01-01. https://doi.org/10.2197/IPSJTBIO.8.2
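Velvet and SOAPdenovo2 are both de Bruijn graph assemblers. As a rough illustration of why memory use grows with the input, the sketch below builds a naive de Bruijn graph in which every distinct k-mer in the reads becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, so memory scales with the number of distinct k-mers. It is a generic textbook construction with hypothetical function names, not the memory-efficient algorithm proposed in the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: prefix (k-1)-mer -> suffix (k-1)-mer.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def count_distinct_kmers(reads: Iterable[str], k: int) -> int:
    """Number of distinct k-mers, a rough proxy for the graph's memory footprint."""
    return len({read[i:i + k] for read in reads for i in range(len(read) - k + 1)})
```

For example, the overlapping reads "ACGTA" and "CGTAC" with k = 3 share two k-mers, so they yield only four distinct k-mers and a single cycle AC -> CG -> GT -> TA -> AC.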
Yuuichi Nakano, M. Iwadate, H. Umeyama, Y-h. Taguchi
Type III secretion system (T3SS) effector proteins are part of bacterial secretion systems. T3SSs exist in both pathogenic and symbiotic bacteria, so how the T3SS effector proteins of these two classes differ is of interest. In this paper, we successfully discriminated the T3SS effector proteins of plant pathogenic, animal pathogenic, and plant symbiotic bacteria using feature vectors computed, by the method of Yahara et al., from amino acid sequences alone. This suggests that these three classes of bacteria employ distinct T3SS effector proteins. We also hypothesized that the feature vectors proposed by Yahara et al. represent protein structure, possibly the protein folds defined in the Structural Classification of Proteins (SCOP) database.
Bacterial Type III Secretion System Effector Proteins are Distinct between Plant Symbiotic, Plant Pathogenic and Animal Pathogenic Bacteria. Yuuichi Nakano, M. Iwadate, H. Umeyama, Y-h. Taguchi. IPSJ Transactions on Bioinformatics, vol. 7, pp. 2-15, 2014-01-01. https://doi.org/10.2197/IPSJTBIO.7.2
Genetic network inference is the problem of obtaining mathematical models that can explain observed time series of gene expression levels. A number of models have been proposed to describe genetic networks, and the S-system model is one of the most studied. Because of its advantageous features, numerous inference algorithms based on the S-system model have been proposed. However, the S-system model has more parameters than the other well-studied models, so inferring S-system models of genetic networks requires more gene expression data. To reduce the amount of gene expression data required, this study simplifies the S-system model by fixing some of its parameters to 0; we call this simplified model a reduced S-system model. We then propose a new inference method that estimates the parameters of the reduced S-system model by minimizing two-dimensional functions. Finally, we verify the effectiveness of the proposed method through numerical experiments on artificial and actual genetic network inference problems.
An Effective Method for the Inference of Reduced S-system Models of Genetic Networks. Shuhei Kimura, Masanao Sato, M. Okada-Hatakeyama. IPSJ Transactions on Bioinformatics, vol. 7, pp. 30-38, 2014-01-01. https://doi.org/10.2197/IPSJTBIO.7.30
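For the reduced S-system abstract above, the standard S-system model for N genes can be written as below, where X_i is the expression level of gene i, alpha_i and beta_i are non-negative rate constants, and g_ij and h_ij are kinetic orders. The parameter count shows why the full model needs so much data. The abstract does not say which parameters the reduced model fixes to 0, so the last comment only shows the general effect of fixing an exponent to 0.

```latex
% Standard S-system model (one equation per gene i = 1, ..., N)
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{N} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{N} X_j^{h_{ij}}

% Parameters per gene: \alpha_i, \beta_i, g_{i1}, ..., g_{iN}, h_{i1}, ..., h_{iN}  =>  2N + 2
% Total: N(2N + 2) = 2N(N + 1); for example, N = 5 genes gives 2 \cdot 5 \cdot 6 = 60 parameters.
% Fixing a kinetic order to 0 removes that gene's influence on the term: X_j^{0} = 1.
```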
Shin-ichi Utsunomiya, Yuichiro Fujita, Satoshi Tanaka, Shigeki Kajihara, K. Aoshima, Y. Oda, Koichi Tanaka
Mass++ is free platform software for mass spectrometry, developed mainly for the biological sciences, with which users can build their own functions or workflows as plug-ins. In this paper, we present an example of algorithm development with Mass++: a new baseline subtraction method. A signal processing technique previously developed to correct for atmospheric substances in infrared spectroscopy was adapted to baseline estimation for mass spectra, yielding a new method called Bottom Line Tracing (BLT). BLT can estimate a suitable baseline even for mass spectra whose waveforms change rapidly, and its parameters are easy to tune. We confirm that techniques or knowledge acquired in another field can help obtain a better solution to a problem, and that platform software such as Mass++ can considerably lower the practical barriers to developing and distributing algorithms.
Signal Processing Algorithm Development for Mass++ (Ver. 2): Platform Software for Mass Spectrometry. Shin-ichi Utsunomiya, Yuichiro Fujita, Satoshi Tanaka, Shigeki Kajihara, K. Aoshima, Y. Oda, Koichi Tanaka. IPSJ Transactions on Bioinformatics, vol. 7, pp. 24-29, 2014-01-01. https://doi.org/10.2197/IPSJTBIO.7.24
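The Mass++ abstract above does not describe Bottom Line Tracing (BLT) in enough detail to reproduce it. To illustrate what baseline subtraction does to a spectrum, the sketch below uses a simple, generic technique instead: a moving-minimum filter. It is not BLT and does not use any Mass++ plug-in API; the function names and the `half_window` parameter are illustrative assumptions.

```python
from typing import List, Sequence


def moving_min_baseline(intensities: Sequence[float], half_window: int) -> List[float]:
    """Baseline = minimum intensity in a window of width 2*half_window + 1,
    truncated at the spectrum edges."""
    n = len(intensities)
    return [
        min(intensities[max(0, i - half_window): min(n, i + half_window + 1)])
        for i in range(n)
    ]


def subtract_baseline(intensities: Sequence[float], half_window: int) -> List[float]:
    """Subtract the estimated baseline, clipping negative values to 0."""
    baseline = moving_min_baseline(intensities, half_window)
    return [max(0.0, y - b) for y, b in zip(intensities, baseline)]
```

On a flat baseline of 10 with one peak of height 50, the corrected spectrum keeps only the peak (height 40). A moving minimum lags behind a baseline that rises or falls steeply within the window, which is the kind of rapidly changing waveform BLT is reported to handle.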
Yeondae Kwon, Shogo Shimizu, H. Sugawara, S. Miyazaki
Identifying candidate target genes related to a particular disease is an important stage of drug development. A number of studies have extracted disease-related genes from the biomedical literature. Here we present a novel evaluation measure that uses the biomedical literature to identify disease-associated genes and to prioritize them as drug targets that are less likely to cause side effects. The measure evaluates how specific a gene is to a particular disease based on the number of diseases associated with that gene. Specificity can be measured with, for example, term frequency-inverse document frequency (tf-idf), which is widely used in Web information retrieval. We assume that if a gene is chosen as the drug target for a disease, side effects become more likely as the number of diseases associated with the gene increases. We verified the resulting rankings by checking the ranks of known drug targets: 177 known drug targets were ranked within the top 100 genes, and 21 were top-ranked. These results suggest that the proposed measure is useful as a primary filter for extracting candidate target genes from a large number of genes.
A novel evaluation measure for identifying drug targets from the biomedical literature. Yeondae Kwon, Shogo Shimizu, H. Sugawara, S. Miyazaki. IPSJ Transactions on Bioinformatics, vol. 7, pp. 16-23, 2014-01-01. https://doi.org/10.2197/IPSJTBIO.7.16
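For the drug-target abstract above, the tf-idf idea of penalizing genes linked to many diseases can be sketched as follows. The abstract names tf-idf only as an example and does not give the exact variant, so this sketch assumes tf = number of literature co-occurrences of a gene with the target disease and idf = log(total diseases / diseases associated with the gene). The data layout, gene names, and disease names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def rank_genes_for_disease(
    cooccurrence: Dict[str, Dict[str, int]], disease: str
) -> List[Tuple[str, float]]:
    """Rank genes for one disease by a tf-idf-style specificity score.

    cooccurrence[gene][disease] = number of documents mentioning both
    (assumed to contain only positive counts).
    """
    all_diseases = {d for counts in cooccurrence.values() for d in counts}
    n_diseases = len(all_diseases)
    scored = []
    for gene, counts in cooccurrence.items():
        tf = counts.get(disease, 0)
        if tf == 0:
            continue  # gene is never linked to this disease
        # Genes associated with many diseases get a small idf (0 if linked to all).
        idf = math.log(n_diseases / len(counts))
        scored.append((gene, tf * idf))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A gene mentioned with every disease in the corpus gets idf = log(1) = 0, so it falls to the bottom however often it co-occurs with the target disease, which matches the abstract's side-effect assumption.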
NegFinder: A Web Service for Identifying Negation Signals and Their Scopes. Kazuki Fujikawa, Kazuhiro Seki, K. Uehara. IPSJ Transactions on Bioinformatics, vol. 6, pp. 29-34, 2013-06-20. https://doi.org/10.2197/IPSJTBIO.6.29
In this paper, we propose a novel method, named SCPSSMpred (Smoothed and Condensed PSSM based prediction), that uses a simplified position-specific scoring matrix (PSSM) to predict ligand-binding sites. Although the simplified PSSM has only ten dimensions, it combines abundant features, such as amino acid arrangement, information on neighboring residues, physicochemical properties, and evolutionary information. Our method uses no predictions from other classifiers as input; that is, all of its features are extracted from the sequences alone. Three ligands (FAD, NAD, and ATP) were used to verify the versatility of our method, and three alternative traditional methods were analyzed for comparison. All methods were tested at both the residue level and the protein sequence level. Experimental results showed that SCPSSMpred achieved the best performance while eliminating 50% of the redundant features in the PSSM. In addition, when tested at the protein sequence level, it adapted markedly better to imbalanced data than the other methods. This study not only demonstrates the importance of reducing redundant features in the PSSM, but also identifies sequence-derived hallmarks of ligand-binding sites: both the arrangement and the physicochemical properties of neighboring residues significantly affect ligand-binding behavior.
SCPSSMpred: A General Sequence-based Method for Ligand-binding Site Prediction. Chun Fang, T. Noguchi, H. Yamana. IPSJ Transactions on Bioinformatics, vol. 6, pp. 35-42, 2013-06-01. https://doi.org/10.2197/IPSJTBIO.6.35
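The SCPSSMpred abstract above does not spell out how the 20-column PSSM is smoothed and condensed to ten dimensions. The sketch below assumes one commonly used smoothing scheme (summing the PSSM rows in a window centered on each residue, so each row also carries information from neighboring residues) and condenses by adding pairs of columns under a hypothetical grouping of amino acids. The window size, the grouping, and the function names are illustrative, not SCPSSMpred's actual settings.

```python
from typing import List, Sequence

AA_ORDER = "ARNDCQEGHILKMFPSTWYV"  # column order of a PSI-BLAST ASCII PSSM
# Hypothetical pairing of residues into 10 groups (NOT the paper's grouping).
GROUPS = [("I", "V"), ("L", "M"), ("F", "Y"), ("W", "H"), ("K", "R"),
          ("D", "E"), ("N", "Q"), ("S", "T"), ("A", "G"), ("C", "P")]


def smooth_pssm(pssm: Sequence[Sequence[float]], half_window: int) -> List[List[float]]:
    """Replace each row by the column-wise sum of the rows within +/- half_window."""
    n = len(pssm)
    smoothed = []
    for i in range(n):
        window = pssm[max(0, i - half_window): min(n, i + half_window + 1)]
        smoothed.append([sum(col) for col in zip(*window)])
    return smoothed


def condense_pssm(pssm: Sequence[Sequence[float]]) -> List[List[float]]:
    """Reduce 20 columns to 10 by summing the two columns of each group."""
    col = {aa: k for k, aa in enumerate(AA_ORDER)}
    return [[row[col[a]] + row[col[b]] for a, b in GROUPS] for row in pssm]
```

Smoothing first and condensing second keeps one row per residue while halving the width from 20 to 10 columns, the same dimensionality the abstract reports.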
Protein-ligand interaction prediction plays an important role in drug design and discovery. However, wet-lab procedures are inherently time-consuming and expensive because of the vast number of candidate compounds and target genes. Computational approaches have therefore become essential, and they are popular because of their promising results and practicality. To be useful, such methods must produce highly accurate and precise outputs, so devising such an algorithm remains very challenging. In this paper, we propose an algorithm that employs both support vector machines (SVM) and an extension of canonical correlation analysis (CCA). Following the assumptions of recent chemogenomic approaches, we explore the effect of incorporating a bias based on compound similarity. We introduce kernel weighted CCA as a means of uncovering underlying relationships between the similarity of ligands and the known ligands of target proteins. Experimental results show statistically significant improvements in the area under the ROC curve (AUC) and in F-measure compared with using SVM alone or SVM with kernel CCA, indicating better prediction quality.
Improved Protein-ligand Prediction Using Kernel Weighted Canonical Correlation Analysis. Raissa Relator, Tsuyoshi Kato, Richard S. Lemence. IPSJ Transactions on Bioinformatics, vol. 6, pp. 18-28, 2013-01-01. https://doi.org/10.2197/IPSJTBIO.6.18
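The kernel weighted CCA method in the abstract above is not described in enough detail to sketch, but the two evaluation metrics it reports are standard. This minimal pure-Python sketch computes both for binary protein-ligand interaction labels (assuming 1 = interacting, 0 = not interacting); the function names are illustrative.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = probability that a random positive outscores a random negative
    (ties count 1/2); equivalent to the normalized Mann-Whitney U statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def f_measure(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for binary predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is 0, so F is 0 by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

AUC is computed from the raw scores and is threshold-free, while F-measure needs hard predictions (for example, score >= 0.5), so reporting both, as the abstract does, covers ranking quality and classification quality.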
Junko Sato, Kouji Kozaki, Susumu Handa, Takashi Ikeda, Ryotaro Saka, K. Tomizuka, Y. Nishiyama, Toshiyuki Okumura, S. Hirai, Tadashi Ohno, Mamoru Ohta, S. Date, Haruki Nakamura
Protein Experimental Information Management System (PREIMS) Based on Ontology: Development and Applications. Junko Sato, Kouji Kozaki, Susumu Handa, Takashi Ikeda, Ryotaro Saka, K. Tomizuka, Y. Nishiyama, Toshiyuki Okumura, S. Hirai, Tadashi Ohno, Mamoru Ohta, S. Date, Haruki Nakamura. IPSJ Transactions on Bioinformatics, vol. 6, pp. 9-17, 2013-01-01. https://doi.org/10.2197/IPSJTBIO.6.9