Journal of Computer Aided Chemistry: Latest Articles

[Dedicated to Prof. T. Okada and Prof. T. Nishioka: data science in chemistry] Artificial Intelligence, Knowledge Discovery and Data Mining: Thirty Years of Experience in Cheminformatics
Pub Date: 2017-01-01 DOI: 10.2751/JCAC.18.3
T. Okada
Journal of Computer Aided Chemistry, vol. 18, pp. 3-14
Citations: 0
[Special issue commemorating the retirement of Prof. Takashi Okada and Prof. Takaaki Nishioka: chemical data science] Comprehensive Analysis of Expression Changes in the Metabolic Pathways of Euglena gracilis Using Bayes ANOVA
Pub Date: 2017-01-01 DOI: 10.2751/JCAC.18.110
直亮 小野, 直己 横山, Md.Altuf-Ul Amin, 雅俊 中本, 大策 太田
Journal of Computer Aided Chemistry, vol. 18, pp. 110-116
Citations: 0
Improvement of Pseudo-molecule Generation on Solvent Dipole Ordering Virtual Screening (SDO-VS)
Pub Date: 2017-01-01 DOI: 10.2751/JCAC.18.149
Shinya Nakamura, Hayao Kitayoshi, I. Nakanishi
Solvent dipole ordering virtual screening (SDO-VS) is a virtual screening method that focuses on the shape of the SDO region at the binding site of a protein. In SDO-VS, pseudo-molecules (PMs) are generated to reproduce the shape of the SDO region. Compounds whose shapes (or volumes) are similar to those of the PMs are then screened from a 3D structure database. The original implementation of SDO-VS used PMs consisting only of sp3-hybridized carbon atoms. However, using sp2- and sp-hybridized atoms and/or small molecular fragments in addition to sp3-hybridized atoms is expected to provide more efficient screening. To this end, this study investigated the effect of sp3-, sp2-, and sp-hybridized atoms and phenyl rings as fragments for PM generation in the SDO-VS method. The screening efficiencies were compared with those of the original method for several drug target proteins. Overall, the new method improved screening efficiency, as measured by the area under the curve of the corresponding receiver operating characteristic plots.
Journal of Computer Aided Chemistry, vol. 18, pp. 149-158
Citations: 0
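The abstract above reports screening efficiency as the area under the ROC curve. As a rough illustration of that metric only (not of SDO-VS itself), the sketch below computes the AUC for a ranked screening list with the trapezoidal rule. The scores and active/decoy labels are made-up toy values, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve (trapezoidal rule); tied scores are grouped."""
    pos = sum(labels)                 # number of actives
    neg = len(labels) - pos           # number of decoys
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    auc = 0.0
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    i = 0
    while i < len(ranked):
        j = i
        # Consume every compound that shares the current score.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        tpr, fpr = tp / pos, fp / neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return auc

# Toy data (assumed): higher score = more similar to a pseudo-molecule.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]       # 1 = known active, 0 = decoy
toy_auc = roc_auc(toy_scores, toy_labels)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a ranking that places every active ahead of every decoy.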
[Dedicated to Prof. T. Okada and Prof. T. Nishioka: data science in chemistry] Centrality Values of Yeast Proteins in a PPI Network Are Related to Their Essentiality and Functions
Pub Date: 2017-01-01 DOI: 10.2751/JCAC.18.94
M. Altaf-Ul-Amin, S. Wijaya, D. Chandra, S. Kanaya
It has long been understood that the centrality of proteins in protein-protein interaction (PPI) networks is related to their essentiality. In the present work, we validate the relations between the essentiality of yeast proteins and their centrality measures in a PPI network by a different approach based on the receiver operating characteristic (ROC) curve. We found that all centrality measures are related to essentiality; however, degree centrality performed best on the data we used. A closer examination of the different centrality values of yeast proteins shows that they are not highly correlated, which led us to hypothesize that centralities might be related to gene/protein functions. Indeed, we found that many of the clusters generated from the pattern of centrality values are rich in proteins of similar function. Different types of centrality values imply different types of importance of a node in a network, and the functions of genes are of various types. We hypothesized that important genes of different functions may tend to show different patterns of centralities, and here we show some preliminary links between groups of genes with similar functions and profiles of centrality values. The concepts of network biology discussed in this paper are applicable to other networks, including networks of chemical compounds.
Journal of Computer Aided Chemistry, vol. 18, pp. 94-109
Citations: 1
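To make the notion of "different centrality measures" concrete, here is a minimal sketch that computes degree and closeness centrality on a tiny undirected graph. The protein names and edges are invented for illustration, and the choice of these two measures is an assumption; the abstract does not list which centralities the authors used beyond degree.

```python
from collections import deque

# Hypothetical toy PPI network (undirected edges between made-up proteins).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5"), ("P5", "P6")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def degree_centrality(adj):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(adj)
    return {v: len(nb) / (n - 1) for v, nb in adj.items()}

def closeness_centrality(adj):
    """(Reachable nodes) / (sum of shortest-path lengths), via BFS."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result

degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
for protein in sorted(adj):
    print(protein, round(degree[protein], 3), round(closeness[protein], 3))
```

In this toy graph P1 has the highest degree, yet P4 ties with it on closeness, which is a small example of how two centrality measures can rank nodes differently.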
[Dedicated to Prof. T. Okada and Prof. T. Nishioka: data science in chemistry] Chemical Annotation of ESI-MS/MS Spectral Data
Pub Date: 2017-01-01 DOI: 10.2751/JCAC.18.15
T. Nishioka, H. Horai
Journal of Computer Aided Chemistry, vol. 18, pp. 15-23
Citations: 0
[Dedicated to Prof. T. Okada and Prof. T. Nishioka: data science in chemistry] Classification of Alkaloid Compounds Based on Subring Skeleton (SRS) Profiling: On Finding Relationship of Compounds with Metabolic Pathways
Pub Date: 2017-01-01 DOI: 10.2751/JCAC.18.58
Ryohei Eguchi, N. Ono, H. Horai, Md. Altuf-Ul Amin, Aki Hirai, J. Kawahara, S. Kasahara, Tomoaki Endo, S. Kanaya
A systematic representation of alkaloid biosynthetic pathways based on ring skeletons has been proposed, because the skeleton nucleus of an alkaloid is the main criterion for determining its biosynthetic pathway. We therefore extended the idea of ring skeletons to classify and systematize alkaloid compounds, and examined how well this approach predicts biosynthetic pathways based on module elements. We constructed a 2-dimensional binary matrix corresponding to 2546 SRS and 478 pathway-known alkaloid compounds. Here, if the ith subring skeleton is present in a target compound, the ith element is set to 1; otherwise, it is set to 0. The relationship of alkaloid compounds with biosynthetic pathways is examined based on the dendrogram produced by applying the Ward clustering method to the matrix. Of 12,243 alkaloid compounds accumulated in KNApSAcK Core DB (http://kanaya.naist.jp/knapsack_jsp/top.html), 3,124 compounds (25.5%) correspond to the pathway-known ring skeletons (187 ring skeletons), but the remaining 9,119 (74.5%) do not. By examining the subring skeleton similarity of the remaining compounds, it might be possible to obtain clues to pathway information and to systematize all alkaloid compounds. Therefore, the present work focuses on the comprehensive systematization of alkaloid compounds and on the construction principles of ring skeletons in alkaloids, based on subring skeleton profiling.
Journal of Computer Aided Chemistry, vol. 18, pp. 58-75
Citations: 4
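The binary compound-by-SRS matrix and the Ward clustering step described above can be sketched as follows. This is a minimal pure-Python illustration on invented data: the compound and skeleton identifiers are hypothetical, and the naive agglomeration loop stands in for whatever clustering software the authors actually used.

```python
from itertools import combinations

# Hypothetical toy data: compound -> set of subring skeleton (SRS) identifiers.
compound_srs = {
    "cpd_A": {"srs1", "srs2"},
    "cpd_B": {"srs1", "srs2", "srs3"},
    "cpd_C": {"srs4"},
    "cpd_D": {"srs4", "srs5"},
}

srs_ids = sorted({s for srs in compound_srs.values() for s in srs})
names = sorted(compound_srs)
# Element [c][i] is 1 if the i-th SRS is present in compound c, else 0.
matrix = [[1 if s in compound_srs[c] else 0 for s in srs_ids] for c in names]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a), len(b)
    ca = [sum(col) / na for col in zip(*a)]   # centroid of a
    cb = [sum(col) / nb for col in zip(*b)]   # centroid of b
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * d2

def ward_clustering(rows, labels):
    """Naive agglomerative clustering with Ward's criterion; returns merges."""
    clusters = {i: ([row], [lab]) for i, (row, lab) in enumerate(zip(rows, labels))}
    merges = []
    next_id = len(clusters)
    while len(clusters) > 1:
        i, j = min(
            combinations(clusters, 2),
            key=lambda p: ward_cost(clusters[p[0]][0], clusters[p[1]][0]),
        )
        rows_i, labs_i = clusters.pop(i)
        rows_j, labs_j = clusters.pop(j)
        merges.append((sorted(labs_i), sorted(labs_j)))
        clusters[next_id] = (rows_i + rows_j, labs_i + labs_j)
        next_id += 1
    return merges

merges = ward_clustering(matrix, names)
for left, right in merges:
    print(left, "+", right)
```

The list of merges is the information a dendrogram draws: compounds sharing subring skeletons join early, and unrelated groups join last. The reported shares also follow from the counts: 3,124 / 12,243 is about 25.5% and 9,119 / 12,243 is about 74.5%.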
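[Special issue commemorating the retirement of Prof. Takashi Okada and Prof. Takaaki Nishioka: chemical data science] Foreword: Special Feature on the Retirement of Prof. Takashi Okada and Prof. Takaaki Nishioka: The Expansion of Chemoinformatics as Data Science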
Pub Date: 2017-01-01 DOI: 10.2751/jcac.18.1
重彦 金谷
Journal of Computer Aided Chemistry, vol. 18, pp. 1-2
Citations: 0
A Preliminary Study of Correction for Inter Fragment Interaction Energy (IFIE) between Fragments Sharing Bond Detached Atom (BDA)
Pub Date: 2017-01-01 DOI: 10.2751/JCAC.18.143
T. Nakano, Yuji Mochidzuki, Kaori Fukuzawa, Yoshio Okiyama, C. Watanabe
Journal of Computer Aided Chemistry, vol. 18, pp. 143-148
Citations: 2
[Dedicated to Prof. T. Okada and Prof. T. Nishioka: data science in chemistry] The Contribution of Lipid Identification Tools Powered by In Silico MS/MS Spectral Libraries to Lipidomics
Pub Date: 2017-01-01 DOI: 10.2751/jcac.18.51
Takumi Ogawa, A. Okazawa, D. Ohta
Journal of Computer Aided Chemistry, vol. 18, pp. 51-57
Citations: 0
Small Random Forest Models for Effective Chemogenomic Active Learning
Pub Date: 2017-01-01 DOI: 10.2751/JCAC.18.124
C. Rakers, D. Reker, J. B. Brown
The identification of new compound-protein interactions has long been a fundamental quest in medicinal chemistry. With increasing amounts of biochemical data, advanced machine learning techniques such as active learning have proven beneficial for building high-performance prediction models from subsets of such complex data. In a recently published paper, chemogenomic active learning was applied to the interaction spaces of kinases and G protein-coupled receptors, featuring over 150,000 compound-protein interactions. Prediction models were actively trained by random forest classification using 500 decision trees per experiment. In a new direction for chemogenomic active learning, we address the question of how forest size influences model evolution and performance. In addition to the original finding that highly predictive models can be constructed from a small fraction of the available data, we find here that model complexity, as measured by forest size, can be reduced to one-fourth or one-fifth of the previously investigated forest size while still maintaining reliable prediction performance. Thus, chemogenomic active learning can yield predictive models of reduced complexity based on only a fraction of the data available for model construction.
Journal of Computer Aided Chemistry, vol. 18, pp. 124-142
Citations: 16
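The idea of running the same active learning loop with forests of different sizes can be sketched as below. This is a heavily simplified, pure-Python stand-in and not the authors' protocol: the "trees" are depth-1 stumps trained on bootstrap samples with one random feature each, the selection rule (pick the pool example with the most evenly split vote) is an assumed uncertainty-sampling strategy, the data are synthetic, and forest sizes of 20 and 100 are scaled-down placeholders for the one-fifth comparison against 500 trees. All function names are hypothetical.

```python
import random

def train_stump(X, y, rng):
    """Depth-1 tree on a bootstrap sample, split on one random feature."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
    f = rng.randrange(len(X[0]))                     # random feature
    best = None
    for t in sorted({X[i][f] for i in idx}):
        for low in (0, 1):                           # class predicted when x[f] <= t
            errors = sum((low if X[i][f] <= t else 1 - low) != y[i] for i in idx)
            if best is None or errors < best[0]:
                best = (errors, f, t, low)
    return best[1:]

def train_forest(X, y, n_trees, rng):
    return [train_stump(X, y, rng) for _ in range(n_trees)]

def vote_fraction(forest, x):
    """Fraction of trees voting for class 1."""
    return sum(low if x[f] <= t else 1 - low for f, t, low in forest) / len(forest)

def active_learning(X, y, n_trees, n_rounds, seed=0):
    rng = random.Random(seed)
    pool = list(range(len(X)))
    rng.shuffle(pool)
    labeled = [pool.pop(), pool.pop()]               # two random seed examples
    for _ in range(n_rounds):
        forest = train_forest([X[i] for i in labeled], [y[i] for i in labeled], n_trees, rng)
        # Assumed strategy: query the example the forest is least sure about.
        pick = min(pool, key=lambda i: abs(vote_fraction(forest, X[i]) - 0.5))
        pool.remove(pick)
        labeled.append(pick)
    forest = train_forest([X[i] for i in labeled], [y[i] for i in labeled], n_trees, rng)
    return forest, labeled

def accuracy(forest, X, y):
    hits = sum((1 if vote_fraction(forest, x) >= 0.5 else 0) == label for x, label in zip(X, y))
    return hits / len(X)

# Synthetic stand-in for compound-protein pairs: two features, binary label.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if x[0] > 0.5 else 0 for x in X]

results = {}
for n_trees in (20, 100):
    forest, labeled = active_learning(X, y, n_trees=n_trees, n_rounds=20)
    results[n_trees] = (len(labeled), accuracy(forest, X, y))
    print(n_trees, results[n_trees])
```

Comparing the accuracies stored in `results` for the two forest sizes mirrors, on a toy scale, the question the abstract asks about how forest size affects model performance.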