{"title":"[Dedicated to Prof. T. Okada and Prof. T. Nishioka: data science in chemistry]Artificial Intelligence, Knowledge Discovery and Data Mining Thirty Years of Experience in Cheminformatics","authors":"T. Okada","doi":"10.2751/JCAC.18.3","DOIUrl":"https://doi.org/10.2751/JCAC.18.3","url":null,"abstract":"","PeriodicalId":41457,"journal":{"name":"Journal of Computer Aided Chemistry","volume":"18 1","pages":"3-14"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2751/JCAC.18.3","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69255363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Solvent dipole ordering virtual screening (SDO-VS) is a virtual screening method that focuses on the shape of the SDO region at the binding site of the protein. In SDO-VS, pseudo molecules (PMs) are generated to reproduce the shape of the SDO region. Compounds that have shapes (or volumes) similar to those of the PMs are then screened from a 3D structure database. The original implementation of SDO-VS involved PMs with only sp3-hybridized carbon atoms. However, utilization of sp2- and sp-hybridized atoms and/or small molecular fragments, in addition to sp3-hybridized atoms, is expected to provide more efficient screening. To this end, this study investigated the effect of sp3-, sp2-, and sp-hybridized atoms and phenyl rings as fragments for PM generation in the SDO-VS method. The screening efficiencies were compared with the original method for several drug target proteins. Overall, this new method improved screening efficiencies, as measured by the area under the curve of the corresponding receiver operating characteristic plots.
{"title":"Improvement of Pseudo-molecule Generation on Solvent Dipole Ordering Virtual Screening (SDO-VS)","authors":"Shinya Nakamura, Hayao Kitayoshi, I. Nakanishi","doi":"10.2751/JCAC.18.149","DOIUrl":"https://doi.org/10.2751/JCAC.18.149","url":null,"abstract":"Solvent dipole ordering virtual screening (SDO-VS) is a virtual screening method that focuses on the shape of the SDO region at the binding site of the protein. In SDO-VS, pseudo molecules (PMs) are generated to reproduce the shape of the SDO region. Compounds that have shapes (or volumes) similar to those of the PMs are then screened from a 3D structure database. The original implementation of SDO-VS involved PMs with only sp3-hybridized carbon atoms. However, utilization of sp2- and sp-hybridized atoms and/or small molecular fragments, in addition to sp3-hybridized atoms, is expected to provide more efficient screening. To this end, this study investigated the effect of sp3-, sp2-, and sp-hybridized atoms and phenyl rings as fragments for PM generation in the SDO-VS method. The screening efficiencies were compared with the original method for several drug target proteins. Overall, this new method improved screening efficiencies, as measured by the area under the curve of the corresponding receiver operating characteristic plots.","PeriodicalId":41457,"journal":{"name":"Journal of Computer Aided Chemistry","volume":"18 1","pages":"149-158"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2751/JCAC.18.149","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69255297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Altaf-Ul-Amin, S. Wijaya, D. Chandra, S. Kanaya
It has long been investigated and understood that the centrality of proteins in protein-protein interaction (PPI) networks is related to their essentiality. In the present work, we validate the relations between the essentiality of yeast proteins and their centrality measures in a PPI network by following a different approach that uses the concept of the receiver operating characteristic (ROC) curve. We found that all centrality measures are related to essentiality; however, degree centrality performed better than the others for the data we used. By closely examining the different centrality values of yeast proteins, we find that they are not highly correlated, which has led us to hypothesize that centralities might have some relation to gene/protein functions. Indeed, we found that many of the clusters generated based on the pattern of centrality values are rich in proteins of similar function. Different types of centrality values imply different types of importance of a node in a network, and the functions of genes are of various types. In the present work, we hypothesized that important genes of different functions may tend to show different patterns of centralities, and here we show some preliminary links between groups of genes of similar function and profiles of centrality values. The concepts of network biology discussed in this paper are applicable to other networks, including networks of chemical compounds.
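As a rough illustration of the ROC-based validation described above, the sketch below scores each node of a tiny undirected network by degree centrality and computes the area under the ROC curve for predicting essentiality. The edge list, the essentiality labels, and the function names are hypothetical and are not taken from the paper; the AUC is computed in its pairwise-comparison form (ties count as one half), which is one standard way to obtain it.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy network and essentiality labels (illustration only).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("E", "F")]
essential = {"A": True, "B": False, "C": False, "D": True, "E": False, "F": False}

centrality = degree_centrality(edges)
nodes = sorted(centrality)
auc = roc_auc([centrality[n] for n in nodes], [essential[n] for n in nodes])
print(f"AUC of degree centrality as an essentiality predictor: {auc:.4f}")
```

An AUC near 0.5 would mean the centrality measure carries no information about essentiality, while values closer to 1 indicate that essential proteins tend to have higher centrality. Other centrality measures could be compared by swapping in a different scoring function.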
{"title":"[Dedicated to Prof. T. Okada and Prof. T. Nishioka: data science in chemistry]Centrality Values of Yeast Proteins in a PPI Network Are Related to Their Essentiality and Functions","authors":"M. Altaf-Ul-Amin, S. Wijaya, D. Chandra, S. Kanaya","doi":"10.2751/JCAC.18.94","DOIUrl":"https://doi.org/10.2751/JCAC.18.94","url":null,"abstract":"It has long been investigated and understood that the centrality of proteins in protein-protein interaction (PPI) networks is related to their essentiality. In the present work, we validate the relations between the essentiality of yeast proteins and their centrality measures in a PPI network by following a different approach that uses the concept of the receiver operating characteristic (ROC) curve. We found that all centrality measures are related to essentiality; however, degree centrality performed better than the others for the data we used. By closely examining the different centrality values of yeast proteins, we find that they are not highly correlated, which has led us to hypothesize that centralities might have some relation to gene/protein functions. Indeed, we found that many of the clusters generated based on the pattern of centrality values are rich in proteins of similar function. Different types of centrality values imply different types of importance of a node in a network, and the functions of genes are of various types. In the present work, we hypothesized that important genes of different functions may tend to show different patterns of centralities, and here we show some preliminary links between groups of genes of similar function and profiles of centrality values. The concepts of network biology discussed in this paper are applicable to other networks, including networks of chemical compounds.","PeriodicalId":41457,"journal":{"name":"Journal of Computer Aided Chemistry","volume":"18 1","pages":"94-109"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2751/JCAC.18.94","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69256094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[Dedicated to Prof. T. Okada and Prof. T. Nishioka: data science in chemistry]Chemical Annotation of ESI-MS/MS Spectral Data","authors":"T. Nishioka, H. Horai","doi":"10.2751/JCAC.18.15","DOIUrl":"https://doi.org/10.2751/JCAC.18.15","url":null,"abstract":"","PeriodicalId":41457,"journal":{"name":"Journal of Computer Aided Chemistry","volume":"18 1","pages":"15-23"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2751/JCAC.18.15","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69255345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ryohei Eguchi, N. Ono, H. Horai, Md. Altuf-Ul Amin, Aki Hirai, J. Kawahara, S. Kasahara, Tomoaki Endo, S. Kanaya
Systematic representation of alkaloid biosynthetic pathways based on ring skeletons has been proposed because the skeleton nucleus of an alkaloid is the main criterion for determining its biosynthetic pathway. We therefore extended the idea of ring skeletons to classify and systematize alkaloid compounds, and examined how well this approach predicts biosynthetic pathways based on module elements. We constructed a two-dimensional binary matrix corresponding to 2546 subring skeletons (SRS) and 478 pathway-known alkaloid compounds. If the i-th subring skeleton is present in a target compound, the i-th element is set to 1; otherwise, it is set to 0. The relationships between alkaloid compounds and biosynthetic pathways were examined based on the dendrogram produced by applying the Ward clustering method to the matrix. Of 12,243 alkaloid compounds accumulated in KNApSAcK Core DB (http://kanaya.naist.jp/knapsack_jsp/top.html), 3,124 compounds (25.5%) correspond to the pathway-known ring skeletons (187 ring skeletons), but the remaining 9,119 (74.5%) compounds do not. By examining the subring skeleton similarity of the remaining compounds, it might be possible to obtain clues to pathway information and to the systematization of all alkaloid compounds. Therefore, the present work focuses on comprehensive systematization of the alkaloid compounds and construction principles of ring skeletons in alkaloids based on subring skeleton profiling.
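The binary profiling and Ward clustering steps can be sketched on a toy scale as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the skeleton names, the four compounds, and the helper functions are hypothetical, and the clustering is a naive agglomerative loop that uses the Ward merge cost (the increase in within-cluster sum of squares) rather than an optimized library routine.

```python
from itertools import combinations


def binary_profile(compound_skeletons, all_skeletons):
    """Row of the binary matrix: 1 if the i-th skeleton occurs in the compound."""
    return [1 if s in compound_skeletons else 0 for s in all_skeletons]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge.
    Each cluster is a (size, centroid) pair."""
    (na, ca), (nb, cb) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq_dist


def ward_merges(rows):
    """Naive Ward agglomeration; returns [(members_a, members_b, cost), ...]
    in merge order, which is the information a dendrogram is drawn from."""
    clusters = {(i,): (1, [float(x) for x in row]) for i, row in enumerate(rows)}
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        cost = ward_cost(clusters[a], clusters[b])
        (na, ca), (nb, cb) = clusters.pop(a), clusters.pop(b)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[tuple(sorted(a + b))] = (na + nb, centroid)
        merges.append((a, b, cost))
    return merges


# Hypothetical skeleton vocabulary and compounds (illustration only).
skeletons = ["indole", "pyridine", "piperidine", "quinoline"]
compounds = [
    {"indole", "pyridine"},
    {"indole", "pyridine"},
    {"piperidine"},
    {"piperidine", "quinoline"},
]

matrix = [binary_profile(c, skeletons) for c in compounds]
merges = ward_merges(matrix)
for a, b, cost in merges:
    print(f"merge {a} + {b} at cost {cost:.3f}")
```

Compounds with identical skeleton profiles merge first at zero cost, and the merge order gives the dendrogram from which groups can be compared with known pathways. For a matrix of the size described above (478 compounds by 2546 skeletons), a library implementation of Ward linkage would be the practical choice.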
{"title":"[Dedicated to Prof. T. Okada and Prof. T. Nishioka: data science in chemistry]Classification of Alkaloid Compounds Based on Subring Skeleton (SRS) Profiling: On Finding Relationship of Compounds with Metabolic Pathways","authors":"Ryohei Eguchi, N. Ono, H. Horai, Md. Altuf-Ul Amin, Aki Hirai, J. Kawahara, S. Kasahara, Tomoaki Endo, S. Kanaya","doi":"10.2751/JCAC.18.58","DOIUrl":"https://doi.org/10.2751/JCAC.18.58","url":null,"abstract":"Systematic representation of alkaloid biosynthetic pathways based on ring skeletons has been proposed because the skeleton nucleus of an alkaloid is the main criterion for determining its biosynthetic pathway. We therefore extended the idea of ring skeletons to classify and systematize alkaloid compounds, and examined how well this approach predicts biosynthetic pathways based on module elements. We constructed a two-dimensional binary matrix corresponding to 2546 subring skeletons (SRS) and 478 pathway-known alkaloid compounds. If the i-th subring skeleton is present in a target compound, the i-th element is set to 1; otherwise, it is set to 0. The relationships between alkaloid compounds and biosynthetic pathways were examined based on the dendrogram produced by applying the Ward clustering method to the matrix. Of 12,243 alkaloid compounds accumulated in KNApSAcK Core DB (http://kanaya.naist.jp/knapsack_jsp/top.html), 3,124 compounds (25.5%) correspond to the pathway-known ring skeletons (187 ring skeletons), but the remaining 9,119 (74.5%) compounds do not. By examining the subring skeleton similarity of the remaining compounds, it might be possible to obtain clues to pathway information and to the systematization of all alkaloid compounds. Therefore, the present work focuses on comprehensive systematization of the alkaloid compounds and construction principles of ring skeletons in alkaloids based on subring skeleton profiling.","PeriodicalId":41457,"journal":{"name":"Journal of Computer Aided Chemistry","volume":"18 1","pages":"58-75"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2751/JCAC.18.58","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69255491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Nakano, Yuji Mochidzuki, Kaori Fukuzawa, Yoshio Okiyama, C. Watanabe
{"title":"A Preliminary Study of Correction for Inter Fragment Interaction Energy (IFIE) between Fragments Sharing Bond Detached Atom (BDA)","authors":"T. Nakano, Yuji Mochidzuki, Kaori Fukuzawa, Yoshio Okiyama, C. Watanabe","doi":"10.2751/JCAC.18.143","DOIUrl":"https://doi.org/10.2751/JCAC.18.143","url":null,"abstract":"","PeriodicalId":41457,"journal":{"name":"Journal of Computer Aided Chemistry","volume":"18 1","pages":"143-148"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2751/JCAC.18.143","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69255290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[Dedicated to Prof. T. Okada and Prof. T. Nishioka: data science in chemistry]The Contribution of Lipid Identification Tools Powered by In Silico MS/MS Spectral Libraries to Lipidomics","authors":"Takumi Ogawa, A. Okazawa, D. Ohta","doi":"10.2751/jcac.18.51","DOIUrl":"https://doi.org/10.2751/jcac.18.51","url":null,"abstract":"","PeriodicalId":41457,"journal":{"name":"Journal of Computer Aided Chemistry","volume":"18 1","pages":"51-57"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2751/jcac.18.51","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69255436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The identification of new compound-protein interactions has long been a fundamental quest in the field of medicinal chemistry. With increasing amounts of biochemical data, advanced machine learning techniques such as active learning have proven beneficial for building high-performance prediction models from subsets of such complex data. In a recently published paper, chemogenomic active learning was applied to the interaction spaces of kinases and G protein-coupled receptors featuring over 150,000 compound-protein interactions. Prediction models were actively trained based on random forest classification using 500 decision trees per experiment. In a new direction for chemogenomic active learning, we address the question of how forest size influences model evolution and performance. In addition to the original chemogenomic active learning findings that highly predictive models could be constructed from a small fraction of the available data, we find here that model complexity, as measured by forest size, can be reduced to one-fourth or one-fifth of the previously investigated forest size while still maintaining reliable prediction performance. Thus, chemogenomic active learning can yield predictive models with reduced complexity based on only a fraction of the data available for model construction.
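To make the idea of comparing forest sizes in an active learning loop concrete, the following is a minimal, standard-library sketch and not the method of the paper. Several simplifications are assumptions made here for brevity: the "forest" is a bagged ensemble of one-feature decision stumps instead of full decision trees, the selection strategy is plain uncertainty sampling (pick the pool item whose vote fraction is closest to 0.5), and the data are random two-feature points with a synthetic label. All function names are hypothetical.

```python
import random


def train_stump(X, y, rng):
    """Fit a one-feature threshold classifier on a bootstrap sample."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    f = rng.randrange(len(X[0]))  # random feature, as in random forests
    best = None
    for t in sorted({X[i][f] for i in idx}):
        for sign in (1, -1):
            errors = sum((sign * (X[i][f] - t) >= 0) != (y[i] == 1) for i in idx)
            if best is None or errors < best[0]:
                best = (errors, f, t, sign)
    _, f, t, sign = best
    return f, t, sign


def train_forest(X, y, n_trees, rng):
    return [train_stump(X, y, rng) for _ in range(n_trees)]


def vote_fraction(forest, x):
    """Fraction of ensemble members voting for class 1."""
    votes = sum(1 for f, t, sign in forest if sign * (x[f] - t) >= 0)
    return votes / len(forest)


def active_learning(X, y, n_trees, n_rounds, seed=0):
    """Return [(n_labeled_after_round, accuracy_on_all_data), ...]."""
    rng = random.Random(seed)
    pool = list(range(len(X)))
    rng.shuffle(pool)
    labeled, pool = pool[:2], pool[2:]  # start from two labeled examples
    history = []
    for _ in range(n_rounds):
        forest = train_forest([X[i] for i in labeled], [y[i] for i in labeled],
                              n_trees, rng)
        # Uncertainty sampling: query the example the ensemble is least sure of.
        pick = min(pool, key=lambda i: abs(vote_fraction(forest, X[i]) - 0.5))
        pool.remove(pick)
        labeled.append(pick)
        correct = sum((vote_fraction(forest, X[i]) >= 0.5) == (y[i] == 1)
                      for i in range(len(X)))
        history.append((len(labeled), correct / len(X)))
    return history


# Synthetic toy data (illustration only): label is 1 when x0 + x1 > 1.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(40)]
y = [1 if a + b > 1 else 0 for a, b in X]

results = {n: active_learning(X, y, n_trees=n, n_rounds=10) for n in (100, 25)}
for n_trees, history in results.items():
    print(n_trees, "stumps:", [round(acc, 2) for _, acc in history])
```

Running the loop with a full-size ensemble and one a quarter of its size gives two learning curves that can be compared round by round, which mirrors the kind of comparison the abstract describes. The toy numbers carry no evidential weight about the paper's results.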
{"title":"Small Random Forest Models for Effective Chemogenomic Active Learning","authors":"C. Rakers, D. Reker, J. B. Brown","doi":"10.2751/JCAC.18.124","DOIUrl":"https://doi.org/10.2751/JCAC.18.124","url":null,"abstract":"The identification of new compound-protein interactions has long been a fundamental quest in the field of medicinal chemistry. With increasing amounts of biochemical data, advanced machine learning techniques such as active learning have proven beneficial for building high-performance prediction models from subsets of such complex data. In a recently published paper, chemogenomic active learning was applied to the interaction spaces of kinases and G protein-coupled receptors featuring over 150,000 compound-protein interactions. Prediction models were actively trained based on random forest classification using 500 decision trees per experiment. In a new direction for chemogenomic active learning, we address the question of how forest size influences model evolution and performance. In addition to the original chemogenomic active learning findings that highly predictive models could be constructed from a small fraction of the available data, we find here that model complexity, as measured by forest size, can be reduced to one-fourth or one-fifth of the previously investigated forest size while still maintaining reliable prediction performance. Thus, chemogenomic active learning can yield predictive models with reduced complexity based on only a fraction of the data available for model construction.","PeriodicalId":41457,"journal":{"name":"Journal of Computer Aided Chemistry","volume":"18 1","pages":"124-142"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2751/JCAC.18.124","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69255281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}