Artificial intelligence in the life sciences: latest articles

Modeling bioconcentration factors in fish with explainable deep learning
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100047
Linlin Zhao, Floriane Montanari, Henry Heberle, Sebastian Schmidt

The Bioconcentration Factor (BCF) is an important parameter in the environmental risk assessment of chemicals. It is relevant for industrial and academic research and is required in many regulatory contexts. It represents the potential of a substance to accumulate in organic tissues or whole animals and is most frequently measured in fish. However, animal welfare concerns, throughput limitations, and costs drive the need for alternative methods that allow accurate and reliable estimation of BCF in silico. We present a new deep learning model to predict BCF values from chemical structures that outperforms currently available models (R² of 0.68 and RMSE of 0.59 log units on an external test set; R² of 0.70 and RMSE of 0.74 log units in a demanding cluster split validation). The model is based on molecular representations encoded as CDDD descriptors and exploits a large in-house dataset with measured logD values as an auxiliary task.

Additionally, we developed a post-hoc explainability method based on SMILES character substitutions to accompany our predictions with atom-level interpretations. These sensitivity scores highlight the most influential moieties in the molecule and can help to understand the predictions better and design new molecules.
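
The character-substitution idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the substitution rules or the model interface, so everything here is an assumption: `toy_predictor` is a hypothetical stand-in for a trained BCF model, the replacement character `*` is an arbitrary choice, and the score is simply the absolute change in the prediction when one character is replaced.

```python
from typing import Callable, List


def toy_predictor(smiles: str) -> float:
    """Hypothetical stand-in for a trained model (not the authors' model).

    It scores a string by counting characters, so that the effect of a
    substitution is easy to follow by hand.
    """
    return 1.0 * smiles.count("F") + 0.2 * smiles.count("C")


def sensitivity_scores(
    smiles: str,
    predict: Callable[[str], float],
    replacement: str = "*",
) -> List[float]:
    """Return one score per SMILES character.

    Each alphabetic character is replaced in turn by `replacement`, and the
    score is the absolute change in the prediction. Non-alphabetic characters
    (bond symbols, ring digits, brackets) are skipped and scored 0.0.
    """
    baseline = predict(smiles)
    scores = []
    for i, ch in enumerate(smiles):
        if not ch.isalpha():
            scores.append(0.0)
            continue
        perturbed = smiles[:i] + replacement + smiles[i + 1:]
        scores.append(abs(predict(perturbed) - baseline))
    return scores


if __name__ == "__main__":
    for ch, score in zip("CCF", sensitivity_scores("CCF", toy_predictor)):
        print(ch, round(score, 3))
```

With a real model, perturbed strings would also need to remain valid SMILES, which this sketch does not check.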

Citations: 1
Symbolic regression for the interpretation of quantitative structure-property relationships
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100046
Katsushi Takaki, Tomoyuki Miyao

The interpretation of quantitative structure–activity or structure–property relationships is important in the field of chemoinformatics. Although multivariate linear regression models are typically interpretable, they do not generally have high predictive abilities. Symbolic regression (SR) combined with genetic programming (GP) is a well-established technique for generating the mathematical expressions that describe the relationships within a dataset. However, SR sometimes produces complicated expressions that are hard for humans to interpret. This paper proposes a method for generating simpler expressions by incorporating three filters into GP-based SR. The filters are further combined with nonlinear least-squares optimization to give filter-introduced GP (FIGP), which improves the predictive ability of SR models while retaining simple expressions. As a proof-of-concept, the quantitative estimate of drug-likeness and the synthetic accessibility score are predicted based on the chemical structures of compounds. Overall, FIGP generates less-complicated expressions than previous SR methods. In terms of predictive ability, FIGP is better than GP, but is outperformed by a support vector machine with a radial basis function kernel. Furthermore, quantitative structure–activity relationship models are constructed for three matching molecular series with biological targets. In the case of one target, the activity prediction models given by FIGP exhibit better predictive ability than multivariate linear regression and support vector regression with the radial basis function kernel, whereas for the remaining cases, FIGP is slightly less accurate than multivariate linear regression.
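
To make the notion of filtering candidate expressions concrete, here is a small sketch. The abstract does not say what the three filters are, so the size filter below is purely a hypothetical example of the kind of rule that could reject complicated expressions. The tuple-based expression format, the operator names, and the `max_nodes` threshold are likewise assumptions made for illustration.

```python
import math
from typing import Dict, Union

# An expression is a variable name, a numeric constant, or a tuple
# (operator, argument, ...). This representation is an assumption.
Expr = Union[str, float, tuple]


def node_count(expr: Expr) -> int:
    """Count operators, variables, and constants in an expression tree."""
    if isinstance(expr, tuple):
        return 1 + sum(node_count(arg) for arg in expr[1:])
    return 1


def evaluate(expr: Expr, variables: Dict[str, float]) -> float:
    """Evaluate an expression tree for given variable values."""
    if isinstance(expr, tuple):
        op, *args = expr
        values = [evaluate(arg, variables) for arg in args]
        if op == "add":
            return values[0] + values[1]
        if op == "mul":
            return values[0] * values[1]
        if op == "exp":
            return math.exp(values[0])
        raise ValueError(f"unknown operator: {op}")
    if isinstance(expr, str):
        return variables[expr]
    return float(expr)


def passes_size_filter(expr: Expr, max_nodes: int = 7) -> bool:
    """Hypothetical filter: keep only expressions with few nodes."""
    return node_count(expr) <= max_nodes


if __name__ == "__main__":
    simple = ("add", ("mul", 2.0, "x"), 1.0)  # 2*x + 1
    nested = ("exp", ("exp", ("add", ("mul", "x", "x"), ("mul", "x", "x"))))
    print(evaluate(simple, {"x": 3.0}), passes_size_filter(simple))
    print(node_count(nested), passes_size_filter(nested))
```

In a genetic programming loop, a check like this would typically run on each candidate before fitness evaluation, so that oversized expressions never enter the population.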

Citations: 1
Deepitope: Prediction of HLA-independent T-cell epitopes mediated by MHC class II using a convolutional neural network
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100038
Raphael Trevizani, Fábio Lima Custódio

Computational linear T-cell epitope prediction tools allow cost and labor reduction in downstream in vitro testing, but the quality of currently available methods is compromised by the scarcity of experimental data and extensive HLA polymorphism. However, it is possible to improve prediction quality by forgoing HLA dependency, which allows all immunogenic sequences to be treated as a single group. This reduces the problem to a much simpler two-class classification: determining whether a peptide is immunogenic or not. Here, we use a deep convolutional neural network capable of predicting linear T-cell epitope regions in primary structures, trained using all peptides deposited in the IEDB website. We also investigate the possibility of using peptides derived from known human proteins as non-immunogenic counterexamples. We compare our model with a state-of-the-art tool and analyze the benefits of using larger databases. Our results corroborate the usefulness of HLA-free methods for practical applications that require the identification of immunogenic sequences. Deepitope is an open-source project that can be found at https://github.com/raphaeltrevizani/deepitope.
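
As a rough illustration of how a protein sequence can be prepared for a two-class convolutional classifier, the sketch below cuts a sequence into overlapping peptide windows and one-hot encodes each window. This is not taken from the Deepitope code base: the window width, the 20-letter alphabet ordering, and the function names are assumptions for the example.

```python
from typing import List

# The 20 standard amino acids in one-letter code (ordering is arbitrary).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def sliding_windows(sequence: str, width: int) -> List[str]:
    """Return every contiguous peptide of length `width` in `sequence`."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


def one_hot(peptide: str) -> List[List[int]]:
    """Encode a peptide as a len(peptide) x 20 matrix of 0/1 values."""
    rows = []
    for aa in peptide:
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1  # raises KeyError on non-standard residues
        rows.append(row)
    return rows


if __name__ == "__main__":
    windows = sliding_windows("ACDEFGHIK", width=5)
    print(windows)
    encoded = [one_hot(w) for w in windows]
    print(len(encoded), len(encoded[0]), len(encoded[0][0]))
```

Each encoded window could then be passed to a binary classifier, and per-window scores mapped back to sequence positions to mark candidate epitope regions.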

Citations: 1
Corrigendum to “Machine Learning Based Prediction of COVID-19 Mortality Suggests Repositioning of Anticancer Drug for Treating Severe Cases” [Artificial Intelligence in Life Sciences] 1 (2021), 100020
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100032
Thomas Linden, Frank Hanses, Daniel Domingo-Fernández, Lauren Nicole DeLong, Alpha Tom Kodamullil, Jochen Schneider, Maria J.G.T. Vehreschild, Julia Lanznaster, Maria Madeleine Ruethrich, Stefan Borgmann, Martin Hower, Kai Wille, Torsten Feldt, Siegbert Rieg, Bernd Hertenstein, Christoph Wyen, Christoph Roemmele, Jörg Janne Vehreschild, Carolin E.M. Jakob, Melanie Stecher, Holger Fröhlich
Citations: 0
AI for drug design: From explicit rules to deep learning
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100041
Lewis Mervin, Samuel Genheden, Ola Engkvist
Citations: 2
Deep learning of protein–ligand interactions—Remembering the actors
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100037
Jürgen Bajorath
Citations: 2
Understanding uncertainty in deep learning builds confidence
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100033
Jürgen Bajorath
Citations: 1
Interpretation of multi-task clearance models from molecular images supported by experimental design
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100048
Andrés Martínez Mora, Mickael Mogemark, Vigneshwari Subramanian, Filip Miljković

Recent methodological advances in deep learning (DL) architectures have not only improved the performance of predictive models but also enhanced their interpretability potential, thus considerably increasing their transparency. In the context of medicinal chemistry, the potential to not only accurately predict molecular properties, but also chemically interpret them, would be strongly preferred. Previously, we developed accurate multi-task convolutional neural network (CNN) and graph convolutional neural network (GCNN) models to predict a set of diverse intrinsic metabolic clearance parameters from image- and graph-based molecular representations, respectively. Herein, we introduce several model interpretability frameworks to answer whether the model explanations obtained from CNN and GCNN multi-task clearance models could be applied to predict chemical transformations associated with experimentally confirmed metabolic products. We show a strong correlation between the CNN pixel intensities and corresponding clearance predictions, as well as their robustness to different molecular orientations. Using actual case examples, we demonstrate that both CNN and GCNN interpretations frequently complement each other, suggesting their high potential for combined use in guiding medicinal chemistry design.

Citations: 0
Understanding the performance of knowledge graph embeddings in drug discovery
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100036
Stephen Bonner, Ian P. Barrett, Cheng Ye, Rowan Swiers, Ola Engkvist, Charles Tapley Hoyt, William L. Hamilton

Knowledge Graphs (KGs) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process that can result in lab-based experiments being performed, or can impact other decisions, incurring significant time and financial costs and, most importantly, ultimately influencing patient healthcare. For KGE models to have impact in this domain, a better understanding not only of performance, but also of the various factors that determine it, is required.

In this study, we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs. Our goal is not to focus on the best overall model or configuration. Instead, we take a deeper look at how performance can be affected by changes in the training setup, choice of hyperparameters, model parameter initialisation seed, and different splits of the datasets. Our results highlight that these factors have a significant impact on performance and can even affect the ranking of models. Indeed, these factors should be reported along with model architectures to ensure complete reproducibility and fair comparisons of future work, and we argue this is critical for the acceptance, use, and impact of KGEs in a biomedical setting.
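
The point that the initialisation seed can change which model ranks best can be shown with a small sketch. The abstract does not name its evaluation metrics, so the use of mean reciprocal rank (MRR) and Hits@k here is an assumption (both are common in link prediction), and the rank lists in the usage example are invented toy inputs, not results from the study.

```python
from statistics import mean
from typing import Dict, List


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """Average of 1/rank over all test triples (rank 1 is best)."""
    return mean(1.0 / r for r in ranks)


def hits_at_k(ranks: List[int], k: int) -> float:
    """Fraction of test triples whose true entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def best_model_per_seed(
    ranks: Dict[str, Dict[int, List[int]]]
) -> Dict[int, str]:
    """For each seed, return the model name with the highest MRR.

    `ranks` maps model name -> seed -> list of ranks on the test set.
    """
    seeds = sorted({seed for per_seed in ranks.values() for seed in per_seed})
    winners = {}
    for seed in seeds:
        winners[seed] = max(
            ranks, key=lambda model: mean_reciprocal_rank(ranks[model][seed])
        )
    return winners


if __name__ == "__main__":
    # Invented toy ranks for two hypothetical models and two seeds.
    toy = {
        "model_a": {0: [1, 2, 4], 1: [2, 4, 4]},
        "model_b": {0: [2, 2, 2], 1: [1, 2, 4]},
    }
    print(best_model_per_seed(toy))
    print(round(hits_at_k(toy["model_a"][0], k=2), 3))
```

Reporting the per-seed winners, rather than a single run, is one simple way to expose the kind of ranking instability the abstract describes.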

Citations: 31
HematoNet: Expert level classification of bone marrow cytology morphology in hematological malignancy with deep learning
Pub Date : 2022-12-01 DOI: 10.1016/j.ailsci.2022.100043
Satvik Tripathi, Alisha Isabelle Augustin, Rithvik Sukumaran, Suhani Dheer, Edward Kim

There have been few efforts made to automate the cytomorphological categorization of bone marrow cells. For bone marrow cell categorization, deep-learning algorithms have been limited to a small number of samples or disease classifications. In this paper, we proposed a pipeline to classify bone marrow cells despite these limitations. Data augmentation was used throughout the data to resolve any class imbalances. Then, random transformations such as rotating between 0° and 90°, zooming in/out, flipping horizontally and/or vertically, and translating were performed. The model used in the pipeline was a CoAtNet, which was compared with two baseline models, EfficientNetV2 and ResNext50. We then analyzed the CoAtNet model using SmoothGrad and Grad-CAM, two recently developed algorithms that have been shown to meet the fundamental requirements for explainability methods. After evaluating all three models’ performance for each of the distinct morphological classes, the proposed CoAtNet model outperformed the EfficientNetV2 and ResNext50 models, an improvement attributed to its attention mechanism, with performance represented using a precision-recall curve.
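
The random transformations listed in the abstract can be sketched on a tiny image stored as a nested list. This is a simplified illustration under stated assumptions, not the authors' pipeline: it covers flips, integer-pixel translation with zero padding, and only the 90° endpoint of the rotation range, because arbitrary-angle rotation and zooming require interpolation and are normally delegated to an image library. The 0.5 probabilities and the one-pixel shift range are arbitrary choices.

```python
import random
from typing import List

Image = List[List[int]]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def translate(img: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Shift the image by (dx, dy) pixels, padding exposed pixels with `fill`."""
    height, width = len(img), len(img[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                out[ny][nx] = img[y][x]
    return out


def random_augment(img: Image, rng: random.Random) -> Image:
    """Apply each transformation with probability 0.5, then a small shift."""
    if rng.random() < 0.5:
        img = flip_horizontal(img)
    if rng.random() < 0.5:
        img = flip_vertical(img)
    if rng.random() < 0.5:
        img = rotate_90(img)
    return translate(img, rng.randint(-1, 1), rng.randint(-1, 1))


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    print(random_augment(image, random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible, which makes it easier to compare models trained on the same augmented data.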

Citations: 0