IPSJ Transactions on Bioinformatics: Latest Articles


In silico Spleen Tyrosine Kinase Inhibitor Screening by ChooseLD

Q3 Biochemistry, Genetics and Molecular Biology

IPSJ Transactions on Bioinformatics

Pub Date : 2015-03-13 DOI: 10.2197/IPSJTBIO.8.14

H. Umeyama, M. Iwadate, Y-h. Taguchi

Background: Spleen tyrosine kinase (SYK) is a protein related to various diseases. Aberrant SYK expression often causes the progression and initiation of several diseases including cancer and autoimmune diseases. Despite the importance of inhibiting SYK and identifying candidate inhibitors, no clinically effective inhibitors have been reported to date. Therefore, there is a need for novel SYK inhibitors. Results: Candidate compounds were investigated using in silico screening by chooseLD, which simulates ligand docking to proteins. Using this system, known inhibitors were correctly recognized as compounds with high affinity to SYK. Furthermore, many compounds in the DrugBank database were newly identified as having high affinity to the ATP-binding sites in the kinase domain with a similar affinity to previously reported inhibitors. Conclusions: Many drug candidate compounds from the DrugBank database were newly identified as inhibitors of SYK. Because compounds registered in the DrugBank are expected to have fewer side effects than currently available compounds, these newly identified compounds may be clinically useful inhibitors of SYK for the treatment of various diseases.


Citations: 0

A Memory Efficient Short Read De Novo Assembly Algorithm

Q3 Biochemistry, Genetics and Molecular Biology

IPSJ Transactions on Bioinformatics

Pub Date : 2015-01-01 DOI: 10.2197/IPSJTBIO.8.2

Yuki Endo, Fubito Toyama, C. Chiba, H. Mori, K. Shoji

Sequencing the whole genome of various species has many applications, not only in understanding biological systems, but also in medicine, pharmacy, and agriculture. In recent years, the emergence of high-throughput next-generation sequencing technologies has dramatically reduced the time and costs for whole genome sequencing. These new technologies provide ultrahigh throughput with a lower per-unit data cost. However, the data are generated from very short fragments of DNA. Thus, it is very important to develop algorithms for merging these fragments. One method of merging these fragments without using a reference dataset is called de novo assembly. Many algorithms for de novo assembly have been proposed in recent years. Velvet and SOAPdenovo2 are well-known assembly algorithms, which have good performance in terms of memory and time consumption. However, memory consumption increases dramatically when the size of input fragments is larger. Therefore, it is necessary to develop an alternative algorithm with low memory usage. In this paper, we propose an algorithm for de novo assembly with lower memory. In our experiments using E. coli K-12 strain MG1655 and human chromosome 14, the memory consumption of our proposed algorithm was less than that of other popular assemblers.
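
The abstract does not describe the proposed algorithm itself, so the following is only a background sketch of the de Bruijn graph idea that assemblers such as Velvet build on. It is not the authors' method. The function names, the toy reads, and the choice of k = 4 are assumptions made for illustration. The sketch ignores reverse complements, sequencing errors, and in-degree checks.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Walk from `start` while the current node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Toy overlapping reads (hypothetical data).
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = extend_unambiguous(graph, "ATG")
print(contig)
```

Every distinct k-mer in the input adds an entry to `graph`, which gives some intuition for why memory use grows with input size in graph-based assemblers.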


Citations: 0

Bacterial Type III Secretion System Effector Proteins are Distinct between Plant Symbiotic, Plant Pathogenic and Animal Pathogenic Bacteria

Q3 Biochemistry, Genetics and Molecular Biology

IPSJ Transactions on Bioinformatics

Pub Date : 2014-01-01 DOI: 10.2197/IPSJTBIO.7.2

Yuuichi Nakano, M. Iwadate, H. Umeyama, Y-h. Taguchi

Type III secretion system (T3SS) effector proteins are part of bacterial secretion systems. T3SS exists in pathogenic and symbiotic bacteria. How the T3SS effector proteins in these two classes differ from each other is an interesting question. In this paper, we successfully discriminated T3SS effector proteins between plant pathogenic, animal pathogenic and plant symbiotic bacteria based on feature vectors inferred computationally by Yahara et al. only from amino acid sequences. This suggests that these three classes of bacteria employ distinct T3SS effector proteins. We also hypothesized that the feature vector proposed by Yahara et al. represents protein structure, possibly protein folds defined in the Structural Classification of Proteins (SCOP) database.


Citations: 1

An Effective Method for the Inference of Reduced S-system Models of Genetic Networks

Q3 Biochemistry, Genetics and Molecular Biology

IPSJ Transactions on Bioinformatics

Pub Date : 2014-01-01 DOI: 10.2197/IPSJTBIO.7.30

Shuhei Kimura, Masanao Sato, M. Okada‐Hatakeyama

The inference of genetic networks is the problem of obtaining mathematical models that can explain observed time series of gene expression levels. A number of models have been proposed to describe genetic networks. The S-system model is one of the most studied among them. Due to its advantageous features, numerous inference algorithms based on the S-system model have been proposed. The number of parameters in the S-system model is, however, larger than that of other well-studied models. Therefore, when trying to infer S-system models of genetic networks, we need to provide a larger amount of gene expression data to the inference method. In order to reduce the amount of gene expression data required for the inference of genetic networks, this study simplifies the S-system model by fixing some of its parameters to 0. In this study, we call this simplified S-system model a reduced S-system model. We then propose a new inference method that estimates the parameters of the reduced S-system model by minimizing two-dimensional functions. Finally, we check the effectiveness of the proposed method through numerical experiments on artificial and actual genetic network inference problems.
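
For readers unfamiliar with the model, the standard S-system describes each gene's expression level X_i by dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, which has 2N(N+1) parameters for N genes. The sketch below evaluates that right-hand side in plain Python. The function names and the toy parameter values are assumptions for illustration. The abstract does not say which parameters the reduced model fixes to 0, so no particular reduction is shown here.

```python
import math


def s_system_rhs(x, alpha, beta, g, h):
    """Return dX/dt for the standard S-system model.

    x, alpha, beta are length-N lists; g and h are N x N kinetic-order matrices.
    """
    n = len(x)
    rhs = []
    for i in range(n):
        production = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        rhs.append(production - degradation)
    return rhs


def count_parameters(n):
    """alpha and beta contribute 2N parameters, g and h contribute 2N^2."""
    return 2 * n * (n + 1)


# Toy two-gene system (hypothetical values), chosen to sit at a steady state.
x = [2.0, 4.0]
alpha = [1.0, 2.0]
beta = [1.0, 1.0]
g = [[0.0, 0.5], [1.0, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0]]
derivatives = s_system_rhs(x, alpha, beta, g, h)
print(derivatives, count_parameters(len(x)))
```

Setting an entry of `g` or `h` to 0 removes the corresponding gene's influence from that term, which is the sense in which fixing parameters to 0 shrinks the model.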


Citations: 5

Signal Processing Algorithm Development for Mass++ (Ver. 2): Platform Software for Mass Spectrometry

Q3 Biochemistry, Genetics and Molecular Biology

IPSJ Transactions on Bioinformatics

Pub Date : 2014-01-01 DOI: 10.2197/IPSJTBIO.7.24

Shin-ichi Utsunomiya, Yuichiro Fujita, Satoshi Tanaka, Shigeki Kajihara, K. Aoshima, Y. Oda, Koichi Tanaka

Mass++ is free platform software for mass spectrometry, mainly developed for biological science, with which users can construct their own functions or workflows for use as plug-ins. In this paper, we present an algorithm development example using Mass++ that performs a new baseline subtraction method. A signal processing technique previously developed to correct the atmospheric substances in infrared spectroscopy was converted to adjust to the mass spectrum baseline estimation, and a new method called Bottom Line Tracing (BLT) was constructed. BLT can estimate a suitable baseline for a mass spectrum with rapid changes in its waveform with easy parameter tuning. We confirm that it is beneficial to utilize techniques or knowledge acquired in another field to obtain a better solution for a problem, and that the practical barriers to algorithm development and distribution will be considerably reduced by platform software like Mass++.
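
The abstract does not give the details of Bottom Line Tracing, so the sketch below shows only what baseline subtraction means in general, using a simple rolling-minimum estimate. This is a swapped-in generic technique, not BLT and not the Mass++ plug-in API. The function names, the window size, and the toy spectrum are assumptions.

```python
def rolling_min_baseline(intensities, half_window):
    """Estimate the baseline at each point as the minimum inside a local window."""
    n = len(intensities)
    return [
        min(intensities[max(0, i - half_window):min(n, i + half_window + 1)])
        for i in range(n)
    ]


def subtract_baseline(intensities, half_window):
    """Return the spectrum with the rolling-minimum baseline removed."""
    baseline = rolling_min_baseline(intensities, half_window)
    return [y - b for y, b in zip(intensities, baseline)]


# Toy spectrum (hypothetical): two peaks on a baseline that steps from 2 to 3.
spectrum = [2.0, 2.0, 9.0, 2.0, 3.0, 3.0, 10.0, 3.0]
corrected = subtract_baseline(spectrum, half_window=1)
print(corrected)
```

A fixed window like this one tracks a slowly varying baseline but can misjudge one that changes rapidly, which is the situation the abstract says BLT is designed to handle.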


Citations: 6

A novel evaluation measure for identifying drug targets from the biomedical literature

Q3 Biochemistry, Genetics and Molecular Biology

IPSJ Transactions on Bioinformatics

Pub Date : 2014-01-01 DOI: 10.2197/IPSJTBIO.7.16

Yeondae Kwon, Shogo Shimizu, H. Sugawara, S. Miyazaki

Identification of candidate target genes related to a particular disease is an important stage in drug development. A number of studies have extracted disease-related genes from the biomedical literature. We herein present a novel evaluation measure that identifies disease-associated genes and prioritizes the identified genes as drug target genes in terms of fewer side-effects using the biomedical literature. The proposed measure evaluates the specificity of a gene to a particular disease based on the number of diseases associated with the gene. The specificity of a gene is measured by means of, for example, term frequency-inverse document frequency (tf-idf), which is widely used in Web information retrieval. We assume that if a gene is chosen as a target gene for a disease, then side-effects are more likely to occur as the number of diseases associated with the gene increases. We verified the obtained ranking results by checking the ranks of known drug targets. As a result, 177 known drug targets were found to be ranked within the top 100 genes, and 21 drug targets were top ranked. The results suggest that the proposed measure is useful as a primary filter for extracting candidate target genes from a large number of genes.
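
The idea of scoring a gene's specificity with tf-idf can be sketched as follows, assuming that tf is the number of literature co-mentions of a gene with the disease of interest and that idf penalizes genes associated with many diseases. The exact weighting used in the paper is not given in the abstract, so this formula, the function name, and the gene and disease labels are all illustrative assumptions.

```python
import math


def gene_specificity_scores(cooccurrence, disease):
    """Score genes for `disease` with a tf-idf style weight.

    cooccurrence maps gene -> {disease: positive co-mention count}.
    """
    all_diseases = {d for per_gene in cooccurrence.values() for d in per_gene}
    n_diseases = len(all_diseases)
    scores = {}
    for gene, per_disease in cooccurrence.items():
        tf = per_disease.get(disease, 0)
        if tf == 0:
            continue  # gene never co-mentioned with this disease
        # Genes linked to fewer diseases get a larger idf.
        idf = math.log(n_diseases / len(per_disease))
        scores[gene] = tf * idf
    return scores


# Hypothetical co-mention counts.
cooccurrence = {
    "GENE_A": {"asthma": 5},
    "GENE_B": {"asthma": 5, "diabetes": 3, "cancer": 8},
    "GENE_C": {"diabetes": 2, "asthma": 1},
}
scores = gene_specificity_scores(cooccurrence, "asthma")
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Here `GENE_B` is mentioned with asthma as often as `GENE_A`, but it is linked to every disease in the toy data, so it scores lowest. That matches the abstract's assumption that side effects become more likely as the number of associated diseases grows.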


Citations: 3

NegFinder: A Web Service for Identifying Negation Signals and Their Scopes

Q3 Biochemistry, Genetics and Molecular Biology

IPSJ Transactions on Bioinformatics

Pub Date : 2013-06-20 DOI: 10.2197/IPSJTBIO.6.29

Kazuki Fujikawa, Kazuhiro Seki, K. Uehara

Citations: 3

SCPSSMpred: A General Sequence-based Method for Ligand-binding Site Prediction

Q3 Biochemistry, Genetics and Molecular Biology

IPSJ Transactions on Bioinformatics

Pub Date : 2013-06-01 DOI: 10.2197/IPSJTBIO.6.35

Chun Fang, T. Noguchi, H. Yamana

In this paper, we propose a novel method, named SCPSSMpred (Smoothed and Condensed PSSM based prediction), which uses a simplified position-specific scoring matrix (PSSM) for predicting ligand-binding sites. Although the simplified PSSM has only ten dimensions, it combines abundant features, such as amino acid arrangement, information of neighboring residues, physicochemical properties, and evolutionary information. Our method employs no predicted results from other classifiers as input, i.e., all features used in this method are extracted from the sequences only. Three ligands (FAD, NAD and ATP) were used to verify the versatility of our method, and three alternative traditional methods were also analyzed for comparison. All the methods were tested at both the residue level and the protein sequence level. Experimental results showed that the SCPSSMpred method achieved the best performance besides reducing 50% of redundant features in PSSM. In addition, it showed a remarkable adaptability in dealing with unbalanced data compared to other methods when tested on the protein sequence level. This study not only demonstrates the importance of reducing redundant features in PSSM, but also identifies sequence-derived hallmarks of ligand-binding sites, such that both the arrangements and physicochemical properties of neighboring residues significantly impact ligand-binding behavior.
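
One way that information from neighboring residues can enter a PSSM is by smoothing each row over a sliding window. The sketch below shows that step only, as a plain moving average. The window size, the averaging rule, and the function name are assumptions, and the condensing step that brings the matrix down to ten dimensions is not shown because the abstract does not describe it.

```python
def smooth_pssm(pssm, window=3):
    """Average each PSSM row with its neighbours inside an odd-sized window.

    pssm is a list of rows, one per residue; the window is truncated at the
    sequence ends.
    """
    half = window // 2
    n = len(pssm)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        rows = pssm[lo:hi]
        smoothed.append([sum(col) / len(rows) for col in zip(*rows)])
    return smoothed


# Toy matrix with three residues and two columns (a real PSSM has 20 columns).
pssm = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
smoothed = smooth_pssm(pssm, window=3)
print(smoothed)
```

After smoothing, each residue's feature row reflects its local sequence context rather than a single position.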


Citations: 1

Improved Protein-ligand Prediction Using Kernel Weighted Canonical Correlation Analysis

Q3 Biochemistry, Genetics and Molecular Biology

IPSJ Transactions on Bioinformatics

Pub Date : 2013-01-01 DOI: 10.2197/IPSJTBIO.6.18

Raissa Relator, Tsuyoshi Kato, Richard S. Lemence

Protein-ligand interaction prediction plays an important role in drug design and discovery. However, wet lab procedures are inherently time consuming and expensive due to the vast number of candidate compounds and target genes. Hence, computational approaches have become imperative, and they have become popular due to their promising results and practicality. Such methods must produce highly accurate and precise outputs to be useful, so devising such an algorithm remains very challenging. In this paper we propose an algorithm employing both support vector machines (SVM) and an extension of canonical correlation analysis (CCA). Following assumptions of recent chemogenomic approaches, we explore the effects of incorporating bias on similarity of compounds. We introduce kernel weighted CCA as a means of uncovering any underlying relationship between similarity of ligands and known ligands of target proteins. Experimental results indicate a statistically significant improvement in the area under the ROC curve (AUC) and F-measure values compared with those obtained when only SVM, or SVM with kernel CCA, is employed, which translates to better quality of prediction.


Citations: 1

Protein Experimental Information Management System (PREIMS) Based on Ontology: Development and Applications

Q3 Biochemistry, Genetics and Molecular Biology

IPSJ Transactions on Bioinformatics

Pub Date : 2013-01-01 DOI: 10.2197/IPSJTBIO.6.9

Junko Sato, Kouji Kozaki, Susumu Handa, Takashi Ikeda, Ryotaro Saka, K. Tomizuka, Y. Nishiyama, Toshiyuki Okumura, S. Hirai, Tadashi Ohno, Mamoru Ohta, S. Date, Haruki Nakamura

Citations: 0
